What's new in version 1.1

A new AsXxxxxx() function

The wxJSONValue::AsXxxxxx() functions return the value stored in a JSON value object, but you first have to check that the value is of the expected type. So you would probably write code like this:

  wxJSONValue v;
  v["key"] = 100;
  int i;
  if ( v["key"].IsInt() ) {
    i = v["key"].AsInt();
  }
  else {
    cout << "Error: value is not of the expected type";
  }

This release adds a new overload of every AsXxxxxx() function, which stores the value in the provided argument and returns TRUE if the value stored in the JSON value object is of the correct type. This is the function prototype for integer values:

  bool AsInt( int& i );

Now you can get the value and check if it is of the expected type in only one call:

  wxJSONValue v;
  v["key"] = 100;
  int i;
  if ( !v["key"].AsInt( i ) ) {
    cout << "Error: value is not of the expected type";
  }

The new reader and writer organization

Until version 1.0 the wxJSON reader and writer had some issues, mostly related to speed. The problem was that both the reader and the writer performed a character conversion between UTF-8 and Unicode for every char read from / written to streams. Worse, in ANSI builds every char underwent a double conversion in both the reader and the writer (the following is for the reader):

Also note that such a conversion is, for most characters, not needed at all because those chars are in the US-ASCII charset (0x00..0x7F).
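The point about US-ASCII can be illustrated with a small sketch. The helper names isPureAscii and countNonAscii are hypothetical and are not part of wxJSON; the code only shows that bytes in the range 0x00..0x7F can be recognized with a single comparison and passed through unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of wxJSON.
// Returns true when every byte of the buffer is in the US-ASCII range
// (0x00..0x7F). Such bytes have the same value in UTF-8 and in ISO-8859-1,
// so they can be copied without any conversion.
bool isPureAscii(const std::string& bytes)
{
    for (unsigned char c : bytes) {
        if (c > 0x7F) {
            return false;   // part of a multi-byte UTF-8 sequence
        }
    }
    return true;
}

// Hypothetical helper, not part of wxJSON.
// Counts the bytes that are outside the US-ASCII range, that is, the only
// bytes for which a charset conversion would be needed at all.
std::size_t countNonAscii(const std::string& bytes)
{
    std::size_t count = 0;
    for (unsigned char c : bytes) {
        if (c > 0x7F) {
            ++count;
        }
    }
    return count;
}
```

For a typical JSON text made of punctuation, numbers and English keys, countNonAscii() returns zero or a very small number, which is why converting every single char is wasted work.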

In version 2.9 of wxWidgets, the developers introduced a radical change to Unicode support, and the internal organisation of the wxString class changed completely. In particular, the wxString class now stores strings in UTF-16 encoding on Windows and in UTF-8 on unix systems. The drawback is that on *nix systems the usual character access using subscripts such as:

        wxString s;
        s[n];

is VERY inefficient because of the UTF-8 encoding. The consequence is that wxJSON has a speed issue also when the JSON text input comes from a wxString and not only when it comes from streams.
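The reason subscripting is slow on a UTF-8 buffer is that characters have a variable length (1 to 4 bytes), so the position of the n-th character is only known after scanning all the previous ones. The following sketch uses a plain std::string as a stand-in for the internal buffer; utf8OffsetOf is a hypothetical name and the code assumes well-formed UTF-8. It is not how wxString is implemented, only an illustration of the cost.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of wxWidgets or wxJSON.
// Returns the byte offset of the n-th character (0-based) in a UTF-8
// buffer, or std::string::npos if the buffer holds fewer characters.
// The loop has to walk the buffer from the start, so the cost grows
// linearly with n.
std::size_t utf8OffsetOf(const std::string& utf8, std::size_t n)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        // Continuation bytes have the form 10xxxxxx; any other byte
        // starts a new character.
        if ((c & 0xC0) != 0x80) {
            if (seen == n) {
                return i;
            }
            ++seen;
        }
    }
    return std::string::npos;
}
```

Reading a string of N characters with s[0], s[1], ... s[N-1] in this way would visit on the order of N*N bytes, while a sequential scan of the same buffer visits each byte once.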

What are the goals of the new 1.1 version

In order to find the best organization for the reader and the writer, I first have to point out the goals of this new release of wxJSON:

The new wxJSON organisation

The wxJSON library allows you to write / read JSON text to / from two different types of objects:

These two kinds of I/O classes are very different because of the internal representation of the JSON text: in particular, wxString uses UTF-16 on Windows and UTF-32 on *nix systems up to wxWidgets 2.8. UTF-8 is used on *nix systems in wxWidgets 2.9. For streams the encoding is always UTF-8. Yet another encoding is used in ANSI mode: locale-dependent one-byte characters.

Figure (ver11.gif): Encoding formats in the different wxWidgets modes / versions / platforms

These encoding differences greatly complicate the organization of the writer and the reader, because every character read from / written to JSON text has to be converted to a single type for processing. Currently each char is converted to a wchar_t, and this happens in ANSI mode, too. This conversion slows down the processing considerably. A further complication is that wxWidgets 2.9 no longer returns a char or wchar_t when accessing string objects but a helper class, wxUniChar, which has its own encoding format and therefore has to be further converted to wchar_t.

The solution is to use only one encoding format for all types of I/O, build modes and wxWidgets versions: UTF-8 is the only one applicable to all these cases. Using UTF-8 as the single I/O format has several advantages:

The only drawback is when input / output is not from / to a stream (which is in UTF-8 format) but from / to a wxString object. The solution I found is:

So, as opposed to the previous versions, the read / write operations are faster on streams and slower on strings because of the construction of the temporary UTF-8 memory buffers.

Issues in ANSI mode

In versions up to 1.0 the wxJSON library gave you limited Unicode support in ANSI mode when reading UTF-8 streams. For example, suppose we have a UTF-8 file that contains the following text:

{
  "us-ascii" : "abcABC",
  "latin1"   : "àèì©®",
  "greek"    : "αβγδ",
  "cyrillic" : "ФХЦЧ"
}

We read the file in a wxWidgets application built in ANSI mode and localized for Western Europe, thus using the ISO-8859-1 (Latin1) character set. Because the Latin1 charset has no support for Greek and Cyrillic characters, the reader cannot store such values in the wxJSONValue object: it contains a wxString object, which in this mode can only store one-byte, locale-dependent characters.

In order to keep the original meaning of the data, the wxJSON library converted each character that cannot be represented in the current locale into a Unicode escape sequence. Below you find a representation of the content of the wxJSONValue when the file is read:

{
  "us-ascii" : "abcABC",
  "latin1"   : "àèì©®",
  "greek"    : "\u03B1\u03B2\u03B3\u03B4",
  "cyrillic" : "\u0424\u0425\u0426\u0427"
}
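The escaping used by version 1.0 can be sketched as follows. This is only an illustration and not the wxJSON 1.0 code: the names appendEscape and toLatin1WithEscapes are hypothetical, the input is assumed to be already decoded into Unicode code points, and the target charset is assumed to be ISO-8859-1. Writing code points above U+FFFF as a surrogate pair is what JSON escapes require; the text above does not say whether wxJSON handled that case.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: appends one "\uXXXX" escape for a 16-bit unit.
static void appendEscape(std::string& out, unsigned unit)
{
    char buf[7];                                  // 6 chars + terminating NUL
    std::snprintf(buf, sizeof buf, "\\u%04X", unit & 0xFFFFu);
    out += buf;
}

// Hypothetical helper, not part of wxJSON.
// Converts code points to a Latin1 byte string. Characters that Latin1
// cannot represent (above U+00FF) are replaced by Unicode escapes.
std::string toLatin1WithEscapes(const std::u32string& codePoints)
{
    std::string out;
    for (char32_t cp : codePoints) {
        if (cp <= 0xFF) {
            // Latin1 maps code points U+0000..U+00FF to the same byte value.
            out.push_back(static_cast<char>(cp));
        } else if (cp <= 0xFFFF) {
            appendEscape(out, static_cast<unsigned>(cp));
        } else {
            // Assumption: outside the BMP, emit a UTF-16 surrogate pair.
            unsigned v = static_cast<unsigned>(cp) - 0x10000u;
            appendEscape(out, 0xD800u + (v >> 10));
            appendEscape(out, 0xDC00u + (v & 0x3FFu));
        }
    }
    return out;
}
```

With the Greek value of the example file as input, the function returns the six-character escapes shown for the "greek" key, while the "us-ascii" and "latin1" values are returned as one byte per character.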

I thought that this would be an elegant solution for reading UTF-8 streams in ANSI mode and that data could be exchanged safely from ANSI to Unicode and vice versa but... there are some drawbacks in this solution:

Because in the new organization the reader and the parser only process UTF-8 streams, there is a problem when the string contains unrepresentable UTF-8 characters. Note that this only happens in the parser class and when the JSON text input is actually from a stream: it does not happen if the processed stream is a temporary UTF-8 buffer obtained by converting the wxString input text.

The solution suggested by Piotr Likus in his e-mail was pretty simple and very fast: who cares about the internal encoding of wxString? When a double-quote character is encountered, just copy the whole stream up to the next unescaped double-quote char; only process escape sequences. The wxString object will, therefore, contain UTF-8 octets in ALL modes and on all platforms.
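A minimal sketch of this "copy up to the next unescaped double-quote" idea is given below. The function name copyRawString is hypothetical and the code is not taken from wxJSON or from the e-mail. For brevity the sketch copies escape sequences verbatim instead of decoding them. Scanning byte by byte is safe on UTF-8 input because the bytes 0x22 (double-quote) and 0x5C (backslash) never occur inside a multi-byte UTF-8 sequence.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of wxJSON.
// 'start' is the index of the first byte after the opening double-quote.
// On success the raw UTF-8 bytes of the string are stored in 'out' and the
// index of the closing double-quote is returned. If the string is not
// terminated, 'out' is cleared and std::string::npos is returned.
std::size_t copyRawString(const std::string& json, std::size_t start,
                          std::string& out)
{
    out.clear();
    for (std::size_t i = start; i < json.size(); ++i) {
        char c = json[i];
        if (c == '\\') {
            if (i + 1 >= json.size()) {
                break;                    // dangling backslash at end of input
            }
            out.push_back(c);             // keep the escape as it is,
            out.push_back(json[++i]);     // including an escaped double-quote
        } else if (c == '"') {
            return i;                     // unescaped quote: end of the string
        } else {
            out.push_back(c);             // any other byte, UTF-8 or ASCII
        }
    }
    out.clear();
    return std::string::npos;
}
```

No character conversion takes place in the loop, which is what makes the approach fast; the price is that the caller receives UTF-8 octets regardless of the native encoding.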

Although this would be a very fast solution, one problem still remains: what if the stored strings have to be used / processed / displayed by the application? They surely need to be converted to the native internal encoding, which is platform- and mode-dependent.

So, I decided to do the conversion in the wxJSON reader: string values are always stored in the native format so that they can be immediately processed by the application. For speed purposes, the conversion is done for the whole string, in one step. In Unicode builds the conversion of the UTF-8 buffer always succeeds. In ANSI builds it may succeed or not. If the conversion fails then the UTF-8 buffer is copied to the wxString object.
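The "convert the whole string, fall back to the UTF-8 buffer" behaviour in ANSI builds can be sketched like this. The code is an illustration under stated assumptions and not the wxJSON implementation: the names utf8ToLatin1 and storeStringValue are hypothetical, std::string stands in for wxString, and the locale charset is assumed to be ISO-8859-1.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of wxJSON.
// Tries to convert a whole UTF-8 buffer to ISO-8859-1 in one step.
// Returns false, leaving 'latin1' untouched, if a character cannot be
// represented in Latin1 or if the input is malformed.
bool utf8ToLatin1(const std::string& utf8, std::string& latin1)
{
    std::string result;
    for (std::size_t i = 0; i < utf8.size(); ) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) {
            result.push_back(static_cast<char>(c));   // US-ASCII: copy as is
            ++i;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size()) {
            // Two-byte sequences with lead byte 0xC2 or 0xC3 encode
            // exactly U+0080..U+00FF, the upper half of Latin1.
            unsigned char c2 = static_cast<unsigned char>(utf8[i + 1]);
            if ((c2 & 0xC0) != 0x80) {
                return false;                         // malformed sequence
            }
            unsigned cp = ((c & 0x1Fu) << 6) | (c2 & 0x3Fu);
            result.push_back(static_cast<char>(cp));
            i += 2;
        } else {
            return false;                             // not representable
        }
    }
    latin1 = result;
    return true;
}

// Hypothetical helper: the value that would end up in the string object.
std::string storeStringValue(const std::string& utf8)
{
    std::string native;
    if (utf8ToLatin1(utf8, native)) {
        return native;      // conversion succeeded: native encoding
    }
    return utf8;            // conversion failed: keep the UTF-8 octets
}
```

Under these assumptions the "latin1" value of the example file would be stored as one byte per character, while the "greek" and "cyrillic" values would be stored as their original UTF-8 octets.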

Note that this behaviour is different from version 1.0, which instead stored Unicode escape sequences. This is not a compatibility break but a bug fix, for the reasons I wrote before.


Generated on Thu Oct 22 18:15:09 2009 for wxJSON by  doxygen 1.5.5