Spreadsheets have their CHAR() function, and VBA has its Chr() function. Both return the text character for the specified numerical input, 0 to 255. And spreadsheets have their CODE() function, and VBA has its Asc() function. Both of those return the ASCII code for the leading character of a text string. All well-worn stuff.
But what if you want or need to work with Unicode values? All four functions fail you. As an example, assume you want the true prime character (‘, Unicode 2032) in a string. The prime character, technically, is not an italicized apostrophe (‘), a right single curly quote (‘), or an acute accent (‘).
VBA provides the ChrW() function that does that. ChrW() expects a long as input, but also accepts hexadecimal. Unicode is in hex numbering, so there are two choices: Change U2032 to decimal, or tell ChrW() that the input is in Hex. Since HEX2DEC(2032) is 8242, these two are equivalent:
- ChrW(8242)
- ChrW(&H2032)
Both will put ‘ into a string. If ChrW() repeated the same functionality of Chr() below 256, things would be simple. However, the Windows character set deviates from the Unicode character set for ASCII(128) to ASCII(159). In that range, Chr(CharCode) and ChrW(CharCode) produce different results. As WikiPedia says, Windows “coincides with ISO-8859-1 for all codes except the range 128 to 159 (hex 80 to 9F), where the little-used C1 controls are replaced with additional characters.” Not sure what C1 controls (probably a printer), but if we want to get Unicode to the spreadsheet, do we want it to give the functionality of CHAR()/Chr(), or that of ChrW() which is ISO-8859-1 compliant? Or, optionally both. The function CHARW() takes the optional route. If you set Exact_functionality to TRUE, you can put those C1 controls in your spreadsheet. The default is to do otherwise.
‘Use a Leading “U” or “u” to indicate Unicode values
‘Exact_functionality returns the Unicode characters for Ascii(128) to Ascii(159) rather than
‘the Windows characters
If UCase(Left$(CharCode, 1)) = “U” Then CharCode = Replace(CharCode, “U”, “&H”, 1, 1, vbTextCompare)
CharCode = CLng(CharCode)
If CharCode < 256 Then
If Exact_functionality Then
CHARW = ChrW(CharCode)
Else
CHARW = Chr(CharCode)
End If
Else
CHARW = ChrW(CharCode)
End If
End Function
One very nice thing is that you can feed Clng() a hex value, and it will do the HEX2DEC conversion for you.
The VBA function AscW() goes the other way, and has the same ISO problems. It will tell you the decimal code of the first character in a Unicode string, with no regard to the Windows character set. We can make another UDF CODEW() that can optionally specify either the decimal or hex value for the first character is returned, and whether or not to be ISO compliant. The default is to return the HEX unicode (as Uxxxx) and not to comply.
Optional Exact_functionality As Boolean = False) As Variant
‘ Exact Functionality returns exact Unicode for characters as AscW() does
‘ rather than Windows characters as Asc() does
Dim Characters As String
Dim i As Long
If Exact_functionality Then
CODEW = AscW(Character)
If Unicode_value Then CODEW = “U” & Hex(CODEW)
Exit Function
End If
For i = 128 To 159 ‘where non-compliant
Characters = Characters & Chr(i)
Next i
If InStr(1, Characters, Left$(Character, 1), vbBinaryCompare) Then
CODEW = Asc(Character)
Else
CODEW = AscW(Character)
End If
If Unicode_value Then CODEW = “U” & Hex(CODEW)
End Function
The default will return U2032 when the first character is ‘, and 8242 when Unicode_value is set FALSE. For another example, is CHAR(128), Chr(128), ChrW(8354), CHARW(128), CHARW(“U80”), CHARW(“U20AC”,TRUE) and CHARW(8364,TRUE).
CODEW(“”) is “U80”, CODEW(“”,FALSE) is 128, CODEW(“”,,TRUE) is “U20AC”, and CODEW(“”,FALSE,TRUE) is 8354.
To see Unicode characters, the cell’s font has to be set to a Unicode font.
…mrt
AscW returns negative numbers sometimes. This is documented in the MS Excel help files. Does the macro above also work in those cases?
Stefano –
There is nothing in the UDF to protect for bad AscW behavior.
…mrt
The workaround:
http://support.microsoft.com/kb/272138
Stefano –
This would be the modified code:
Optional Exact_functionality As Boolean = False) As Variant
‘ Exact Functionality returns exact Unicode for characters as AscW() does
‘ correcting for http://support.microsoft.com/kb/272138 behavior
‘ rather than Windows characters as Asc() does
Dim Characters As String
Dim i As Long
If Exact_functionality Then
CODEW = AscW(Character)
If CODEW < 0 the CODEW = 65536 + CODEW
If Unicode_value Then CODEW = “U” & Hex(CODEW)
Exit Function
End If
For i = 128 To 159 ‘where non-compliant
Characters = Characters & Chr(i)
Next i
If InStr(1, Characters, Left$(Character, 1), vbBinaryCompare) Then
CODEW = Asc(Character)
Else
CODEW = AscW(Character)
If CODEW < 0 the CODEW = 65536 + CODEW
End If
If Unicode_value Then CODEW = “U” & Hex(CODEW)
End Function
Can’t really call it “Exact Functionality” any more :roll:
…mrt
Related problem is a get inserted symbol’s code. Solution is to use dialog capabilities in context of selected symbol:
http://word.mvps.org/faqs/macrosvba/FindReplaceSymbols.htm
Sub GetCharNoAndFont()
With Dialogs(wdDialogInsertSymbol)
Debug.Print “Font: ” & .Font
Debug.Print “Char number ” & .CharNum
End With
End Sub
Thanks for this even though it’s an old post it helped with using ChrW(&H20AC) instead of chr(128). I believe this helps alleviate the issue of U.S. and non-U.S. character sets.
Do you have any thoughts on using the Unichar function for Chinese characters? It works in Excel, but doesn’t seem to be available in MS Access.
Not to derail the thread, but…I’m trying to build a db of Chinese characters and can’t find a char set that supports all 8,105 characters pubished by the PRC. Excel works fine with Unichar / Unicode functions, but I need similar functionality in MySQL, MS Access. So, in addition to the comments above, I need a character set that supports these outliers so they can be stored…
UTF8 works on some DB’s (not MySQL)
UTF8MB4 still doesn’t support all characters
GB18030 is supposed to support all officially listed characters, but still doesn’t quite work