Monday, February 27, 2017

PHP-Style JSON Parsing in Java with Jsoniter

JSON originated in JavaScript, a weakly typed, dynamic language. There is an impedance mismatch between JSON's dynamic nature and Java's rigid typing. I found existing solutions too focused on the concept of data binding, which is too heavyweight in some circumstances. Contrast that with PHP, where we have the all-in-one data type Array and can parse a complex JSON document with a single call to json_decode. Jsoniter is a new library written in Java, determined to make JSON parsing in Java as easy as in PHP through a similar data type: Any. Its most remarkable feature is the underlying lazy-parsing technique, which makes parsing not only easy but also very fast.

Why JSON Is Hard to Process in Java

There are three reasons why JSON documents can be hard to process using existing parsers. I call this the "JSON impedance mismatch".

Reason 1: Type Mismatch

When JSON is used as a data exchange format between Java and dynamic languages like PHP, the object field types might become a problem. For example, have a look at this JSON:

{
"order_id": 100098,
"order_details": {"pay_type": "cash"}
}

99% of the time, the PHP code returns exactly the structure we expect. But it might also return slightly different JSON for different input conditions, because most PHP developers do not care whether a variable is a string or an int:

{
"order_id": "100098",
"order_details": []
}

Why is order_details an empty array instead of an empty object? It is a common problem when working with PHP, where everything is an array. An array used as a non-empty map will be encoded like {"key":"value"} but an empty map is just an empty array, which will be encoded as [] instead of {}.

It is not a big problem, definitely fixable, but for historical data like logs, we have to deal with it anyway.
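On the Java side, tolerating both variants usually means checking the runtime type of each value before using it. The following is only a minimal sketch: it assumes the document has already been parsed by some generic parser into java.util maps and lists, and the LenientJson class and its asLong and asObject methods are hypothetical names invented for this illustration, not part of any library.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not from any library) that tolerates the two PHP quirks above.
final class LenientJson {

    // Accepts 100098 (a Number) as well as "100098" (a String).
    static long asLong(Object value) {
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        if (value instanceof String) {
            return Long.parseLong(((String) value).trim());
        }
        throw new IllegalArgumentException("not a number: " + value);
    }

    // Accepts a JSON object, and treats PHP's empty array [] as an empty object.
    @SuppressWarnings("unchecked")
    static Map<String, Object> asObject(Object value) {
        if (value instanceof Map) {
            return (Map<String, Object>) value;
        }
        if (value instanceof List && ((List<?>) value).isEmpty()) {
            return Collections.emptyMap();
        }
        throw new IllegalArgumentException("not an object: " + value);
    }

    public static void main(String[] args) {
        // Stand-in for the second document, as a generic parser might return it.
        Map<String, Object> order = Map.of(
                "order_id", "100098",
                "order_details", List.of());
        System.out.println(asLong(order.get("order_id")));                   // prints 100098
        System.out.println(asObject(order.get("order_details")).isEmpty());  // prints true
    }
}
```

Every field that might vary needs a wrapper like this, which is exactly the kind of ceremony that makes such data tedious to consume from Java.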

Reason 2: Heterogeneous Data

In Java, we are used to homogeneous data. For example, [1, 2, 3] is an int array and ["1", "2", "3"] is a String array. But how do you represent [1, 2, "3"] in Java? An object array, Object[], is awkward to work with. How about [1, ["2", 3]]? Java does not have a convenient container to hold this kind of data.
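To make the awkwardness concrete, here is a rough sketch of what consuming such data looks like when it is held in untyped lists. It assumes the arrays have already been parsed into java.util.List values; the MixedArrayDemo class and its sum method are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: adds up every number in a mixed, possibly nested list.
final class MixedArrayDemo {

    // Each element has to be inspected at runtime because the list is untyped.
    static long sum(List<?> values) {
        long total = 0;
        for (Object value : values) {
            if (value instanceof Number) {
                total += ((Number) value).longValue();
            } else if (value instanceof String) {
                total += Long.parseLong((String) value);
            } else if (value instanceof List) {
                total += sum((List<?>) value);
            } else {
                throw new IllegalArgumentException("unsupported element: " + value);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // [1, 2, "3"]
        List<Object> flat = Arrays.asList(1, 2, "3");
        // [1, ["2", 3]]
        List<Object> nested = Arrays.asList(1, Arrays.asList("2", 3));
        System.out.println(sum(flat));    // prints 6
        System.out.println(sum(nested));  // prints 6
    }
}
```

The instanceof chain grows with every type the data might contain, and the compiler cannot help with any of it.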

Moreover, it is very common in JSON to have slightly different structures representing the same thing. For example, a success response:

{
"code": 0,
"data": "Success"
}

But for an error response:

{
"code": -1,
"error": {"msg": "Wrong Parameter", "stacktrace": "…"}
}

If we want to get the data or the error message, we have to make a number of null checks. Assuming the response is represented as Map<String, Object>, the code to extract the error message looks as follows:

// The "error" field is absent in a success response
Object errorObj = response.get("error");
if (errorObj == null)
    return "N/A";

// Unchecked cast: we only assume "error" is a JSON object
Map<String, Object> error = (Map<String, Object>) errorObj;
Object msgObj = error.get("msg");
if (msgObj == null)
    return "N/A";

return (String) msgObj;

The type casting and null checking are not fun at all. Unfortunately, it is common to have to extract a value from JSON five levels deep!
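One way to contain the repetition is to fold the checks into a single path-based lookup. This is a sketch using plain java.util types rather than any particular parser; the NestedLookup class and its getString method are hypothetical names made up for this illustration.

```java
import java.util.Map;

// Hypothetical helper: walks nested maps along a key path, with a fallback value.
final class NestedLookup {

    // Returns the String found at the end of the path, or the fallback if any
    // step is missing, is not a JSON object, or the final value is not a String.
    static String getString(Object root, String fallback, String... path) {
        Object current = root;
        for (String key : path) {
            if (!(current instanceof Map)) {
                return fallback;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current instanceof String ? (String) current : fallback;
    }

    public static void main(String[] args) {
        // Stand-ins for the error and success responses shown earlier.
        Map<String, Object> failure = Map.of(
                "code", -1,
                "error", Map.of("msg", "Wrong Parameter"));
        Map<String, Object> success = Map.of("code", 0, "data", "Success");

        System.out.println(getString(failure, "N/A", "error", "msg")); // prints Wrong Parameter
        System.out.println(getString(success, "N/A", "error", "msg")); // prints N/A
    }
}
```

A helper like this hides the casts but does not remove them, and it has to be rewritten for every value type we want to read.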

Reason 3: Performance and Flexibility Balance

By going with JSON, we have already chosen flexibility over raw performance. However, it still feels bad to parse a JSON document as Map<String, Object>, knowing that it will be very costly. I am not arguing that we should choose performance over expressiveness, but the guilt of deliberately compromising performance constantly troubles me. It is a dilemma I find myself in frequently:

  • Parse the JSON as Map<String, Object> and read values from it. This saves the trouble of defining a schema class, but we have to unmarshal all the bytes, regardless of whether we need them or not.
  • Define a class and use data binding. It can skip unneeded parsing work, and accessing an object field is faster than a hash map lookup. But is it worth the trouble every time?
  • Use a streaming API, which some JSON parsers provide, but which is generally considered too low-level.

There is a long way between totally typeless parsing and rigid data binding. It would be better if we had more options for trading off performance against flexibility, or for getting both.

Parsing JSON in Java like in PHP with Jsoniter

How Jsoniter Solves the JSON Impedance Mismatch

Continue reading: PHP-Style JSON Parsing in Java with Jsoniter


by Tao Wen via SitePoint
