The wxJSONValue::AsXxxxx() functions can be used to get the value stored in a JSON value object, but you first have to check that it is of the expected type. So you would probably write code like this:
    wxJSONValue v;
    v["key"] = 100;
    int i;
    if ( v["key"].IsInt() ) {
        i = v["key"].AsInt();
    }
    else {
        cout << "Error: value is not of the expected type";
    }
This release adds a new overload of every AsXxxxx() function, which stores the value in the provided argument and returns TRUE if the value stored in the JSON value object is of the correct type. This is the function prototype for integer values:
bool AsInt( int& i );
Now you can get the value and check if it is of the expected type in only one call:
    wxJSONValue v;
    v["key"] = 100;
    int i;
    if ( !v["key"].AsInt( i ) ) {
        cout << "Error: value is not of the expected type";
    }
In version 3.0 of the wxWidgets GUI framework, the developers introduced a radical change to Unicode support, and the internal organisation of the wxString class changed completely. In particular, the wxString class now stores strings in UTF-16 encoding on Windows and in UTF-8 on Unix systems. The drawback is that on *nix systems the usual character access by subscript, such as:
wxString s; s[n];
is VERY inefficient because of the UTF-8 encoding. The consequence is that wxJSON has a speed issue when the JSON text input comes from a wxString, and not only when it comes from streams.
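To illustrate why subscript access is costly on a UTF-8 buffer, here is a minimal sketch in standard C++. It does not use wxString, and `nthCodePointOffset` is a hypothetical helper introduced only for this example. Because a character occupies between one and four bytes, locating the n-th character means scanning from the start of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of wxWidgets or wxJSON.
// Returns the byte offset of the n-th code point (0-based) in a
// well-formed UTF-8 buffer, or std::string::npos if the buffer holds
// fewer than n+1 code points. The cost is linear in the offset:
// every byte before the target has to be inspected.
std::size_t nthCodePointOffset(const std::string& utf8, std::size_t n)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        // Continuation bytes have the bit pattern 10xxxxxx; skip them.
        if ((c & 0xC0) == 0x80) {
            continue;
        }
        if (seen == n) {
            return i;
        }
        ++seen;
    }
    return std::string::npos;
}
```

A loop that calls such a lookup for every index ends up with quadratic cost over the whole string, which is consistent with the slowdown described above.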
wxString
wxInput/OutputStream
wxWidgets
2.9
. For streams the encoding is always UTF-8. A further encoding is used in ANSI mode: locale-dependent one-byte characters.
These encoding differences greatly complicate the organization of the writer and the reader, because each character read from or written to JSON text has to be converted to a single type for processing. Currently, each char is converted to a wchar_t, and this happens in ANSI mode too. This conversion slows down processing considerably. A further complication is that wxWidgets 2.9 no longer returns a char or wchar_t when accessing string objects but a helper class, wxUniChar, which has its own encoding format and therefore has to be further converted to wchar_t.
The solution is to use only one encoding format for all types of I/O, build modes and wxWidgets versions: UTF-8 is the only one applicable to all these cases. Using UTF-8 as the single I/O format has several advantages:
wxString::FromUTF8()
function.
{ "us-ascii" : "abcABC", "latin1" : "àèì©®", "greek" : "αβγδ", "cyrillic" : "ФХЦЧ" }
We read the file in a wxWidgets application built in ANSI mode and localized for Western Europe, thus using the ISO-8859-1 (Latin1) character set. Because the Latin1 charset has no support for Greek and Cyrillic characters, the reader cannot store such values in the wxJSONValue object, because it contains a wxString object which can only store one-byte, locale-dependent characters.
In order to keep the original meaning of the data, the wxJSON library converted each character that cannot be represented in the current locale into a Unicode escape sequence. Below is a representation of the content of the wxJSONValue when the file is read:
{ "us-ascii" : "abcABC", "latin1" : "àèì©®", "greek" : "\u03B1\u03B2\u03B3\u03B4", "cyrillic" : "\u0424\u0425\u0426\u0427" }
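The escaping step shown above can be sketched as follows. This is an illustration in standard C++, not the wxJSON 1.0 code: `escapeForLatin1` is a hypothetical name, the locale is assumed to be Latin1, and the input is assumed to be already decoded into code points of the Basic Multilingual Plane.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of wxJSON.
// Code points that fit in Latin1 (<= 0xFF) are kept as single bytes;
// every other code point is rewritten as a \uXXXX escape sequence.
std::string escapeForLatin1(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            // Assumption of this sketch: no surrogate pairs are needed.
            assert(cp <= 0xFFFF);
            char buf[7];  // backslash, 'u', four hex digits, terminator
            std::snprintf(buf, sizeof buf, "\\u%04X",
                          static_cast<unsigned>(cp));
            out += buf;
        }
    }
    return out;
}
```

Passing the four Greek letters of the example file through this function yields the same `\u03B1\u03B2\u03B3\u03B4` text shown in the representation above.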
I thought that this would be an elegant solution for reading UTF-8 streams in ANSI mode and that data could be exchanged safely from ANSI to Unicode and vice versa but... there are some drawbacks in this solution:
The solution suggested by Piotr Likus in his e-mail was pretty simple and very fast: who cares about the internal encoding of wxString? When a double-quote character is encountered, just copy the whole stream up to the next unescaped double-quote character; only process escape sequences. The wxString object will, therefore, contain UTF-8 octets in ALL modes and on all platforms.
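A minimal sketch of this copy-until-closing-quote idea, under some stated assumptions: it works on a std::string rather than a stream, `readRawString` is a hypothetical name, and the processing of escape sequences is left to a later pass (the sketch only makes sure an escaped double quote does not end the string).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of wxJSON.
// 'pos' must index the opening double quote. The raw bytes up to the
// next unescaped double quote are copied to 'out' without decoding.
// On success 'pos' is moved just past the closing quote.
bool readRawString(const std::string& in, std::size_t& pos, std::string& out)
{
    assert(pos < in.size() && in[pos] == '"');
    out.clear();
    std::size_t i = pos + 1;
    while (i < in.size()) {
        char c = in[i];
        if (c == '\\') {
            if (i + 1 >= in.size()) {
                return false;          // dangling backslash
            }
            out.push_back(c);          // keep the escape as-is
            out.push_back(in[i + 1]);
            i += 2;
        } else if (c == '"') {
            pos = i + 1;               // unescaped quote: end of string
            return true;
        } else {
            out.push_back(c);          // any other byte, UTF-8 included
            ++i;
        }
    }
    return false;                      // unterminated string
}
```

Because no per-character conversion takes place, multi-byte UTF-8 sequences pass through untouched, which is where the speed of this approach comes from.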
Although this would be a very fast solution, one problem still remains: what if the stored strings have to be used, processed or displayed by the application? They surely need to be converted to the native internal encoding, which is platform- and mode-dependent.
So, I decided to do the conversion in the wxJSON reader: string values are always stored in the native format so that they can be immediately processed by the application. For speed, the conversion is done for the whole string in one step. In Unicode builds the conversion of the UTF-8 buffer always succeeds. In ANSI builds it may or may not succeed. If the conversion fails, the UTF-8 buffer is copied to the wxString object unchanged.
Note that this behaviour is different from version 1.0, which instead stores Unicode escape sequences. This is not a compatibility break but a bug fix, for the reasons I gave before.
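The convert-in-one-step-with-fallback behaviour of an ANSI build can be sketched in standard C++ as follows. This does not reproduce the library's conversion code: `utf8ToLatin1` and `storeString` are hypothetical names, Latin1 is assumed as the current locale, and std::string stands in for wxString.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for an ANSI-build conversion. Succeeds only
// if every character of the UTF-8 buffer fits in Latin1.
bool utf8ToLatin1(const std::string& utf8, std::string& latin1)
{
    latin1.clear();
    for (std::size_t i = 0; i < utf8.size(); ) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) {
            latin1.push_back(static_cast<char>(c));   // plain ASCII
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size()) {
            // U+0080..U+00FF: lead byte C2 or C3 plus one continuation.
            unsigned char c2 = static_cast<unsigned char>(utf8[i + 1]);
            if ((c2 & 0xC0) != 0x80) {
                return false;                         // malformed input
            }
            latin1.push_back(
                static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F)));
            i += 2;
        } else {
            return false;          // outside Latin1, or malformed input
        }
    }
    return true;
}

// Converts the whole buffer in one step; if that fails, the UTF-8
// octets are stored unchanged, as described in the text.
std::string storeString(const std::string& utf8)
{
    std::string converted;
    if (utf8ToLatin1(utf8, converted)) {
        return converted;
    }
    return utf8;
}
```

With this sketch, a Latin1 string such as "àèì" is stored as one byte per character, while a Greek string keeps its original UTF-8 octets instead of being turned into escape sequences.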