Appendix D. Unicode String Encode Support

The Data Matrix barcode (as defined in ISO 16022) does not support Unicode natively. The default character set is ISO8859-1. In theory, Unicode support could be provided through ECI (Extended Channel Interpretation); however, no ECI has been published for any Unicode encoding (UTF-16, UTF-8, etc.). Most 2D barcode readers also lack ECI support.

Because there is demand for encoding characters outside ISO8859-1, several methods have been developed. The common approach is to encode the characters in a native (local) character set and to configure the reader to decode according to its default locale. This approach produces the smallest possible barcode, with one major caveat: the same Data Matrix code decodes into different text when read by readers configured with different locales. In many use cases this is not an issue; for example, a 2D barcode containing Chinese text is usually intended for use in China only.
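The caveat can be illustrated with a short Python sketch. It does not use the encoder API at all; it only shows how one byte sequence reads under two code pages. The sample text and the choice of GBK as the "local" character set are assumptions made for illustration.

```python
# Illustration only: the same payload bytes interpreted under two locales.
text = "中文"

# Bytes that an encoder working in a Chinese local character set (GBK) would store.
payload = text.encode("gbk")

# A reader configured for the Chinese locale recovers the original text.
as_chinese = payload.decode("gbk")

# A reader that assumes the default ISO8859-1 character set sees other characters.
as_western = payload.decode("latin-1")

print(payload.hex())   # d6d0cec4
print(as_chinese)      # 中文
print(as_western)      # ÖÐÎÄ
```

The barcode holds only four bytes here, which is why this approach is compact, but the bytes carry no indication of which character set produced them.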

A new API, DataMatrixEncode2W, has been added to accept a UTF-16 string. Internally, the encoder examines the contents of the UTF-16 string. If all characters fall within ISO8859-1, it converts them to ISO8859-1 and encodes them as is. Otherwise, it converts the UTF-16 string to UTF-8 with a BOM and encodes the result. You can still use the DataMatrixEncode2 API and perform the character set conversion yourself in cases where you are required to use a local character set.
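The selection rule described above can be sketched in Python as follows. This is not the library's implementation; the function name choose_payload is hypothetical, and a Python str stands in for the UTF-16 input string.

```python
import codecs


def choose_payload(text: str) -> bytes:
    """Return the bytes to place in the barcode (hypothetical helper).

    Mirrors the rule in the text: ISO8859-1 when every character fits,
    otherwise UTF-8 preceded by a BOM.
    """
    try:
        # Succeeds only if every character is in the range U+0000..U+00FF.
        return text.encode("latin-1")
    except UnicodeEncodeError:
        # At least one character is outside ISO8859-1: emit the BOM
        # (bytes EF BB BF) followed by the UTF-8 form of the whole string.
        return codecs.BOM_UTF8 + text.encode("utf-8")


print(choose_payload("café").hex())   # 636166e9
print(choose_payload("中文").hex())    # efbbbfe4b8ade69687
```

Note that the decision is made for the string as a whole: a single character outside ISO8859-1 switches the entire payload to UTF-8, so accented Latin characters in that string grow from one byte to two.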

Several components that accept Unicode parameters have been updated in the 5.1 release: the DataMatrix ActiveX control, the GUI encoder and the Crystal Reports UFL. If you work exclusively with ASCII or ISO8859-1, you won't see any change in the results. Previously, characters outside ISO8859-1 were converted to their ANSI counterparts using the default locale. Starting with the 5.1 release, a string containing such characters is converted in its entirety to UTF-8 with a BOM. This makes the Data Matrix code portable across countries.
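On the receiving side, an application that processes the scanned bytes could use the BOM to tell the two cases apart. The following Python sketch assumes the reader hands over the raw payload bytes unchanged; the function name decode_payload is hypothetical and is not part of any reader or of this product.

```python
import codecs


def decode_payload(payload: bytes) -> str:
    """Decode bytes scanned from a barcode (hypothetical helper).

    Assumption: a payload that starts with the UTF-8 BOM was written as
    UTF-8 with BOM; anything else is treated as ISO8859-1.
    """
    if payload.startswith(codecs.BOM_UTF8):
        # Drop the three BOM bytes and decode the remainder as UTF-8.
        return payload[len(codecs.BOM_UTF8):].decode("utf-8")
    return payload.decode("latin-1")


print(decode_payload(b"caf\xe9"))                            # café
print(decode_payload(b"\xef\xbb\xbf\xe4\xb8\xad\xe6\x96\x87"))  # 中文
```

One limitation of this check: an ISO8859-1 payload that happens to begin with the bytes EF BB BF would be misread as UTF-8, so the sketch suits only data known to come from an encoder that follows the rule above.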