D Unicode String Encode Support - Morovia QRCode Fonts & Encoder 5 Reference Manual

Appendix D. Unicode String Encode Support

The most recent QR code standard, ISO/IEC 18004:2015, does not support Unicode natively. The standard specifies ISO 8859-1 as the default character set and requires an Extended Channel Interpretation (ECI) designator to switch to a different character set. Unfortunately, most barcode readers do not support ECI.

Because there is demand for encoding characters outside ISO 8859-1, several methods have been developed. The common approach is to encode the characters in a native character set and to configure the reader to decode according to its default locale. This approach produces the smallest possible barcode, with one major caveat: the same QR code is decoded into different text by readers configured with different locales. In many use cases this is not an issue, because a QR code that encodes Chinese text, for example, is intended to be used in China only.
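The locale dependence can be illustrated with a minimal Python sketch. It assumes the text was written to the symbol in GBK and uses Python's built-in codecs to stand in for two differently configured readers. The function name read_with_locale is hypothetical and is not part of the encoder.

```python
def read_with_locale(payload: bytes, codec: str) -> str:
    """Decode raw QR payload bytes the way a reader set to `codec` would."""
    return payload.decode(codec)

# Text stored in a native (non-Unicode) character set, here GBK.
payload = "中文".encode("gbk")

as_chinese = read_with_locale(payload, "gbk")      # reader configured for Chinese
as_western = read_with_locale(payload, "latin-1")  # reader left at ISO 8859-1

print(as_chinese)  # the original text
print(as_western)  # different, unintended characters from the same bytes
```

The GBK payload is 4 bytes, while the same text takes 6 bytes in UTF-8, which is why the native approach yields the smaller symbol.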

The open source library zxing creates all QR codes by first converting the Unicode string into UTF-8 and then encoding the resulting bytes as is. Because this method can encode all Unicode characters without using ECI, and because the library is free, the so-called "UTF-8 encoded QR code" has become mainstream. Many 2D barcode scanners provide a configuration option to interpret this kind of QR code, and some industry standards have started to mandate it. For example, Swiss Payment Standards version 2.2, published in February 2021, states "the Swiss QR Code must also be UTF-8 encoded".
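As a rough illustration of why the reader must be configured for this type of code, the following Python sketch models only the byte level: the text is converted to UTF-8 and the bytes are stored without any ECI header. The two decode calls stand in for a reader in UTF-8 mode and a reader in its default ISO 8859-1 mode. The sample text is an assumption chosen for illustration.

```python
text = "Zürich 中文"

# Bytes placed in the symbol: plain UTF-8, no ECI designator.
payload = text.encode("utf-8")

utf8_reader = payload.decode("utf-8")     # reader configured for UTF-8
latin_reader = payload.decode("latin-1")  # reader in default ISO 8859-1 mode

print(utf8_reader == text)   # True
print(latin_reader == text)  # False: each multi-byte sequence shows as several characters
```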

The QR Code encoder handles the string conversion whenever a Unicode string is passed, for example when you call the QRCodeEncode2W function or use a Unicode-capable component (such as the Crystal Reports UFL or QR Code ActiveX).

In versions after 5.1 and before 5.3.3, the input Unicode string is examined for characters outside ISO 8859-1. If there are none, no conversion is performed and the string is encoded in the way that complies with the ISO standard. Otherwise, the encoder converts the string to UTF-8 with a byte order mark (BOM), then encodes the UTF-8 bytes in the same way as an ISO 8859-1 string. This method produces standard-compliant QR codes when every character in the string falls within ISO 8859-1 (which includes ASCII).
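The rule described above can be sketched in Python as follows. This is an illustration of the decision only, not the encoder's actual implementation. The function name legacy_payload is hypothetical, and Python's latin-1 codec is used as a stand-in for ISO 8859-1.

```python
import codecs

def legacy_payload(text: str) -> bytes:
    """Sketch of the pre-5.3.3 rule: ISO 8859-1 if possible, else UTF-8 with BOM."""
    try:
        # Every character fits in ISO 8859-1: no conversion needed.
        return text.encode("latin-1")
    except UnicodeEncodeError:
        # At least one character is outside ISO 8859-1.
        return codecs.BOM_UTF8 + text.encode("utf-8")

print(legacy_payload("café"))  # one byte per character
print(legacy_payload("中"))    # BOM followed by the UTF-8 bytes
```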

In version 5.3.3 and above, the input string is always converted to UTF-8, producing an unambiguous "UTF-8 encoded QR code". The decoded bytes should be treated as UTF-8 and processed accordingly. To keep the previous behavior, set the environment variable morovia.qrencoder5.mode to ISO2015.
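On the receiving side, an application that gets raw bytes from a scanner might decode them along the following lines. This is a minimal sketch, assuming the scanner delivers the payload bytes unmodified. The function name decode_payload is hypothetical. Python's utf-8-sig codec is used because it drops a leading BOM if one is present and otherwise behaves like plain UTF-8.

```python
def decode_payload(payload: bytes) -> str:
    """Interpret scanned bytes as UTF-8, dropping a leading BOM if present."""
    return payload.decode("utf-8-sig")

# Works with or without a BOM in front of the UTF-8 bytes.
print(decode_payload(b"\xef\xbb\xbf\xe4\xb8\xad"))
print(decode_payload("Zürich".encode("utf-8")))
```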

Users should evaluate this change when upgrading to version 5.3.3 and above.

If all characters in the input string are ASCII, the QR code does not change. Any reader should read the QR code correctly.
If the string contains characters outside ISO 8859-1, the QR code does not change. Phone readers will read the QR code correctly, and 2D barcode readers must be configured for UTF-8 mode.
For Latin strings (all characters within ISO 8859-1, with at least one non-ASCII character), the QR code will change. Versions prior to 5.3.3 produce standard-compliant QR codes, which 2D barcode readers should read correctly in ISO 8859-1 mode or standard-compliant mode. Phone readers generally read them correctly by recognizing Latin characters. In version 5.3.3 and above, UTF-8 encoded QR codes are produced.
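To check which of the three cases applies to existing data before upgrading, a small helper like the following could be run over sample strings. This is a sketch only. The function name upgrade_impact and its return labels are hypothetical, and Python's latin-1 codec is used as a stand-in for ISO 8859-1.

```python
def upgrade_impact(text: str) -> str:
    """Classify a string by how its QR code is affected by the 5.3.3 change."""
    if text.isascii():
        return "ascii: unchanged"
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        # Characters outside ISO 8859-1: UTF-8 encoded before and after.
        return "non-latin: unchanged, reader needs UTF-8 mode"
    # Non-ASCII characters that all fit in ISO 8859-1.
    return "latin: changed, now UTF-8 encoded"

for sample in ("INV-2021-001", "Zürich", "中文"):
    print(sample, "->", upgrade_impact(sample))
```

Only strings reported in the "latin" category need their readers re-tested after the upgrade.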
