Created: 2009-01-14 ~ Updated: 2009-02-21

Code : JsonParser

A Compact Pull Parser for JSON

This is another situation where I needed something leaner than what I could find on the Internet, and I found myself lying awake in bed early one morning wondering, "how hard could it be?". Written from about 3 am to 10 am, this parser is efficient, lean and functional.

Update 2009-02-21: I have enhanced the parser to support the deviations from strict JSON syntax as options, which must be explicitly specified when the parser is constructed. This updated version also uses a nested Escape class to provide coded exceptions that are specific to the failure, for more precise error handling.

Relaxation of the JSON Syntax

I mildly dislike a few aspects of the JSON specification (except for the requirement to quote keywords, which I detest), so my parser accepts some relaxations. Note that this still conforms to the spirit of the specification, which specifically allows a parser to accept looser syntax. I relax the syntax primarily because I use JSON heavily for configuration files, not just data interchange, and these exceptions make the configurations far more readable and workable for a human being.

The optional behaviors are:

  • OPT_UNQUOTED_KEYWORDS: Allow keyword strings to be unquoted.
  • OPT_EOL_IS_COMMA: Allow an end-of-line to be treated as a comma.
  • OPT_MULTILINE_COMMENTS: Allow multiline comments using /* and */.
  • OPT_MULTILINE_STRINGS: Allow multiline strings - this permits strings to be broken over multiple lines.
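
As a rough illustration of what "explicitly specified when the parser is constructed" can look like, here is a minimal sketch that assumes the options are int bit flags combined with a bitwise OR. The class RelaxedOptionsSketch, its constructor, the allows method and the numeric values are all assumptions made for this example; only the option names come from the list above, and the actual JsonParser constructor is not shown in this article.

```java
// Hypothetical stand-in, not the real JsonParser: shows one common way to
// pass optional behaviors to a constructor as OR-ed bit flags.
class RelaxedOptionsSketch {
    // The names mirror the list above; the numeric values are assumed.
    static final int OPT_UNQUOTED_KEYWORDS  = 1;
    static final int OPT_EOL_IS_COMMA       = 2;
    static final int OPT_MULTILINE_COMMENTS = 4;
    static final int OPT_MULTILINE_STRINGS  = 8;

    private final int options;

    // Nothing is relaxed unless the caller asks for it (0 means strict).
    RelaxedOptionsSketch(int options) {
        this.options = options;
    }

    // True if the given option bit was set at construction.
    boolean allows(int option) {
        return (options & option) != 0;
    }

    public static void main(String[] args) {
        RelaxedOptionsSketch cfg =
            new RelaxedOptionsSketch(OPT_UNQUOTED_KEYWORDS | OPT_EOL_IS_COMMA);
        System.out.println(cfg.allows(OPT_EOL_IS_COMMA));
        System.out.println(cfg.allows(OPT_MULTILINE_STRINGS));
    }
}
```

With this arrangement a strict parser is simply one constructed with no option bits set, so the relaxations can never be switched on by accident.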

Note also that I typically use a layer on top of the parser for reading documents that treats repeated keywords as if they represent array elements, which vastly improves the readability of some complex arrays.
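
The layer that treats repeated keywords as array elements is not part of the parser itself. The following is only a minimal sketch of the idea, assuming a plain map stands in for the document structure. The class RepeatedKeyMap and its add and get methods are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JsonParser: members added under the same
// name accumulate in order instead of overwriting one another.
class RepeatedKeyMap {
    private final Map<String, List<Object>> members = new LinkedHashMap<>();

    // Record one member; a repeated name appends to the values already seen.
    void add(String name, Object value) {
        members.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Returns the single value for a name seen once, a List of values for a
    // repeated name, or null if the name was never added.
    Object get(String name) {
        List<Object> values = members.get(name);
        if (values == null) {
            return null;
        }
        return values.size() == 1 ? values.get(0) : values;
    }

    public static void main(String[] args) {
        RepeatedKeyMap unit = new RepeatedKeyMap();
        unit.add("Name", "FunctionKeyHotSpots");
        unit.add("HotSpot", "first");
        unit.add("HotSpot", "second");
        System.out.println(unit.get("Name"));
        System.out.println(unit.get("HotSpot"));
    }
}
```

One consequence of this design is that a name appearing once yields a plain value while a name appearing twice yields a list, so code reading such a structure has to be prepared for either.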

Unquoted Names

I find the requirement for quoting names to be draconian and downright ugly. It kills readability, and it also totally defeats a big benefit of syntax highlighting when manually editing the JSON text. Furthermore, it seems that in general the programming community agrees that the quoting of keywords is an unfortunate side-effect of deriving JSON from the JavaScript ECMA specification.

Implied Commas

In my opinion, requiring a comma at the end of line is just adding "noise".

Multiline Comments

This is really a convenience for marking temporary changes and debugging by a human editor. It's not necessary at all for interchange, but it can be quite useful in complex configuration files.

Multiline Strings

This one is really "pushing the friendship". Again, only for human-edited configuration files, it should be enabled only if the JSON data truly exists in a limited/closed context. That said, I have seen this used to great benefit to allow long and complex SQL statements to span lines so as to render them in a way that reflected their structure.

Compare this strict JSON:


"ProcessingUnit": {
  "Name": "FunctionKeyHotSpots",

  "Condition": { "Key": "Variable#Level.DetectFunctionKey", "Comparator": "GE", "Value": 1.0  },

  "Defaults": [{
    "Name": "FunctionKey_Patterns1",
    "Pattern": [
      "[^Letter,^OpenBracket,BRK]F[(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label]",
      "[^Letter,^OpenBracket,BRK]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label]",
      "[OpenBracket][Space?]F[(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label][^CloseBracket*][$][CloseBracket]",
      "[OpenBracket][Space?]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label][^CloseBracket*][$][CloseBracket]"
      ]
    }, {
    "Name": "FunctionKey_Patterns2",
    "Pattern": [
      "[^Letter,^OpenBracket,BRK]F[(0)?][$Key][Digit{1,2}][$][Space?][Space?][(-:)][$Label]",
      "[^Letter,^OpenBracket,BRK]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label]",
      "[OpenBracket][Space?]F[(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label][^CloseBracket*][$][CloseBracket]",
      "[OpenBracket][Space?]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label][^CloseBracket*][$][CloseBracket]"
      ]
    }, {
    "Name": "FunctionKey_Common",
    "Macro": "CMD([$Key])",
    "Metadata": { "Type": "FunctionKey", "Description": "Press F[$Key]", "Label": "[$Label]", "Value": "[$Key]" }
    }],

  "HotSpot": [{
    "FindText": {
      "InheritDefaults" : "FunctionKey_Patterns1",
      "MinRow"          : "Window.Bottom-2",
      "MaxRow"          : "Window.Bottom+1",
      "TrimStart"       : { "CharSpec": "Space" },
      "TrimEnd"         : { "CharSpec": "Space" },
      "ExpandEnd"       : { "Pattern": [ "  " ], "InheritDefaults": ["FunctionKey_Patterns1"] }
      },
    "InheritDefaults"   : "FunctionKey_Common"
    }, {
    "Condition": { "Key": "Variable#Level.DetectFunctionKey", "Comparator": "GE", "Value": 2 },
    "FindText": {
      "InheritDefaults" : "FunctionKey_Patterns2",
      "MinRow"          : "Window.Bottom-2",
      "MaxRow"          : "Window.Bottom+1",
      "TrimStart"       : { "CharSpec": "Space" },
      "TrimEnd"         : { "CharSpec": "Space" },
      "ExpandEnd"       : { "Pattern": [ "  " ], "InheritDefaults": ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      },
    "InheritDefaults"   : "FunctionKey_Common"
    }, {
    "Condition": { "Key": "Variable#Level.DetectFunctionKey", "Comparator": "GE", "Value": 3 },
    "FindText": {
      "InheritDefaults" : "FunctionKey_Patterns1",
      "MinRow"          : "Window.Top",
      "MaxRow"          : "Window.Bottom-3",
      "TrimStart"       : { "CharSpec": "Space" },
      "TrimEnd"         : { "CharSpec": "Space" },
      "ExpandEnd"       : { "Pattern": [ "  " ], "InheritDefaults": ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      },
    "InheritDefaults" : "FunctionKey_Common"
    }, {
    "Condition": { "Key": "Variable#Level.DetectFunctionKey", "Comparator": "GE", "Value": 4 },
    "FindText": {
      "InheritDefaults" : "FunctionKey_Patterns2",
      "MinRow"          : "Window.Top",
      "MaxRow"          : "Window.Bottom-3",
      "TrimStart"       : { "CharSpec": "Space" },
      "TrimEnd"         : { "CharSpec": "Space" },
      "ExpandEnd"       : { "Pattern": [ "  " ], "InheritDefaults": ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      },
    "InheritDefaults" : "FunctionKey_Common"
    }]
  }

to this relaxed JSON syntax:


ProcessingUnit: {
  Name: "FunctionKeyHotSpots"

  Condition: { Key: "Variable#Level.DetectFunctionKey", Comparator: "GE", Value: 1    }

  Defaults: {
    Name: "FunctionKey_Patterns1"
    Pattern: [
      "[^Letter,^OpenBracket,BRK]F[(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label]"
      "[^Letter,^OpenBracket,BRK]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label]"
      "[OpenBracket][Space?]F[(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label][^CloseBracket*][$][CloseBracket]"
      "[OpenBracket][Space?]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(=)][$Label][^CloseBracket*][$][CloseBracket]"
      ]
    }

  Defaults: {
    Name: "FunctionKey_Patterns2"
    Pattern: [
      "[^Letter,^OpenBracket,BRK]F[(0)?][$Key][Digit{1,2}][$][Space?][Space?][(-:)][$Label]"
      "[^Letter,^OpenBracket,BRK]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label]"
      "[OpenBracket][Space?]F[(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label][^CloseBracket*][$][CloseBracket]"
      "[OpenBracket][Space?]CMD[Space,(-)?][(0)?][$Key][Digit{1,2}][$][Space?][(-:)][$Label][^CloseBracket*][$][CloseBracket]"
      ]
    }

  Defaults: {
    Name: "FunctionKey_Common"
    Macro: "CMD([$Key])"
    Metadata: { Type: "FunctionKey", Description: "Press F[$Key]", Label: "[$Label]", Value: "[$Key]" }
    }

  HotSpot: {
    FindText: {
      InheritDefaults : "FunctionKey_Patterns1"
      MinRow          : "Window.Bottom-2"
      MaxRow          : "Window.Bottom+1"
      TrimStart       : { CharSpec: "Space" }
      TrimEnd         : { CharSpec: "Space" }
      ExpandEnd       : { Pattern: [ "  " ], InheritDefaults: ["FunctionKey_Patterns1"] }
      }
    InheritDefaults : "FunctionKey_Common"
    }

  HotSpot: {
    Condition: { Key: "Variable#Level.DetectFunctionKey", Comparator: "GE", Value: 2 }
    FindText: {
      InheritDefaults : "FunctionKey_Patterns2"
      MinRow          : "Window.Bottom-2"
      MaxRow          : "Window.Bottom+1"
      TrimStart       : { CharSpec: "Space" }
      TrimEnd         : { CharSpec: "Space" }
      ExpandEnd       : { Pattern: [ "  " ], InheritDefaults: ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      }
    InheritDefaults : "FunctionKey_Common"
    }

  HotSpot: {
    Condition: { Key: "Variable#Level.DetectFunctionKey", Comparator: "GE", Value: 3 }
    FindText: {
      InheritDefaults : "FunctionKey_Patterns1"
      MinRow          : "Window.Top"
      MaxRow          : "Window.Bottom-3"
      TrimStart       : { CharSpec: "Space" }
      TrimEnd         : { CharSpec: "Space" }
      ExpandEnd       : { Pattern: [ "  " ], InheritDefaults: ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      }
    InheritDefaults : "FunctionKey_Common"
    }

  HotSpot: {
    Condition: { Key: "Variable#Level.DetectFunctionKey", Comparator: "GE", Value: 4 }
    FindText: {
      InheritDefaults : "FunctionKey_Patterns2"
      MinRow          : "Window.Top"
      MaxRow          : "Window.Bottom-3"
      TrimStart       : { CharSpec: "Space" }
      TrimEnd         : { CharSpec: "Space" }
      ExpandEnd       : { Pattern: [ "  " ], InheritDefaults: ["FunctionKey_Patterns1","FunctionKey_Patterns2"] }
      }
    InheritDefaults : "FunctionKey_Common"
    }
  }

How It Works

Essentially, the caller simply calls next() in a loop, with each call returning a parsing event code. For each event, the caller queries the parser for the details it needs to process the parsed data. The parser takes care of all decoding and delivers values as Strings, which the caller may then convert into Java objects and values. Note that although a String value is ECMA unescaped, the quotes are left on so that string values can be differentiated from numbers and from true, false and null.
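
To show why leaving the quotes on is useful, here is a minimal sketch of how a caller might turn one delivered value into a Java object. It assumes the value arrives as an already unescaped String. The class JsonValueSketch and its convert method are hypothetical, and the sketch does its own quote check rather than calling the parser's helpers so that it stands alone. Using BigDecimal for numbers is my choice for the example, not something the parser prescribes.

```java
import java.math.BigDecimal;

// Hypothetical caller-side conversion, not part of JsonParser.
class JsonValueSketch {
    // Converts a raw value string to a String, Boolean, null or BigDecimal.
    static Object convert(String raw) {
        // Quotes still present: this is a string, even if it reads "true" or "42".
        if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
            return raw.substring(1, raw.length() - 1);
        }
        if (raw.equals("true")) {
            return Boolean.TRUE;
        }
        if (raw.equals("false")) {
            return Boolean.FALSE;
        }
        if (raw.equals("null")) {
            return null;
        }
        // Anything else is taken to be a number; BigDecimal throws
        // NumberFormatException if it is not one.
        return new BigDecimal(raw);
    }

    public static void main(String[] args) {
        System.out.println(convert("\"true\"").getClass().getSimpleName());
        System.out.println(convert("true").getClass().getSimpleName());
        System.out.println(convert("1.5").getClass().getSimpleName());
    }
}
```

Without the retained quotes, the string "true" and the literal true would be indistinguishable by the time they reach the caller.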

The essence of the parser is the next() method, which implements a low-level state machine (the events returned themselves constitute a higher level state machine). Most of the rest of the class is methods for accessing the values for the higher level events.

After an up-front test for comments (my parser allows single-line comments starting with "*", "#" and "//", and non-nested multiline comments using "/* ... */"), the parser arrives at the next event by progressing through the keyword, in-quotes, divider and value states. Parsing is made considerably simpler by the fact that the JSON syntax is simple and rigorous: the rules are strict, and any violation results in an exception.
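
The up-front comment test can be pictured with the following minimal sketch. It is not the code from JsonParser: the class CommentSkipSketch and the method skipComment are hypothetical, and the sketch works on a String and an offset purely to keep the example self-contained. It recognizes the comment forms listed above and returns the offset at which parsing would resume.

```java
// Hypothetical illustration of an up-front comment test, not JsonParser code.
class CommentSkipSketch {
    // If a comment starts at pos, returns the offset just past it; otherwise
    // returns pos unchanged.
    static int skipComment(String text, int pos) {
        if (pos >= text.length()) {
            return pos;
        }
        // Multiline form: ends at the first closing marker, so it cannot nest.
        if (text.startsWith("/*", pos)) {
            int end = text.indexOf("*/", pos + 2);
            if (end < 0) {
                throw new IllegalArgumentException("Unterminated comment at offset " + pos);
            }
            return end + 2;
        }
        // Single-line forms: "*", "#" and "//" run to the end of the line.
        char c = text.charAt(pos);
        if (c == '*' || c == '#' || text.startsWith("//", pos)) {
            int eol = text.indexOf('\n', pos);
            return eol < 0 ? text.length() : eol;
        }
        return pos;
    }

    public static void main(String[] args) {
        String text = "# settings\nName: \"x\"";
        System.out.println(text.substring(skipComment(text, 0)).trim());
    }
}
```

The single-line branch stops at the line break rather than after it. That is an assumption on my part for the sketch: a parser that can treat an end-of-line as a comma would presumably still need to see that line break.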

Two skip methods, one for objects and one for arrays, provide for convenient stream processing of select members.
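
One way to think about skipping is as depth counting over the event stream. The sketch below illustrates that idea only. It uses a single combined method rather than separate object and array methods, it reads stand-in event codes from an Iterator instead of a real parser, and the real skip methods may well work differently (for example, directly on the input characters). The class SkipSketch, its constants and skipStructure are all hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration of skipping a nested structure by counting depth.
class SkipSketch {
    // Stand-in event codes; the real JsonParser.EVT_* values are not shown here.
    static final int OBJECT_BEGIN = 1;
    static final int OBJECT_ENDED = 2;
    static final int ARRAY_BEGIN  = 3;
    static final int ARRAY_ENDED  = 4;
    static final int MEMBER       = 5;

    // Call after a begin event has been read. Consumes events up to and
    // including the matching end event and returns how many were consumed.
    static int skipStructure(Iterator<Integer> events) {
        int depth = 1;
        int consumed = 0;
        while (depth > 0) {
            if (!events.hasNext()) {
                throw new IllegalStateException("Input ended inside a structure");
            }
            int evt = events.next();
            consumed++;
            if (evt == OBJECT_BEGIN || evt == ARRAY_BEGIN) {
                depth++;                 // nested structure opens
            } else if (evt == OBJECT_ENDED || evt == ARRAY_ENDED) {
                depth--;                 // a structure closes
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        // Events following an already-read OBJECT_BEGIN.
        Iterator<Integer> events = Arrays.asList(
            MEMBER, ARRAY_BEGIN, MEMBER, ARRAY_ENDED, OBJECT_ENDED, MEMBER).iterator();
        System.out.println(skipStructure(events));
    }
}
```

After the call returns, the next event read belongs to the enclosing structure, which is what makes it convenient to pick out select members from a stream.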

A very simple layer on top of the pull parser can load a document into a Java multi-map structure - here is an example. This example uses a Callback to actually create any objects, delegating the responsibility back to the calling code, allowing very precise control (a default callback method is also shown). Note that this example turns repeated keywords into an array.


/**
 * Parse a generalized data structure from a JSON input stream.
 * <p>
 * All values are added using the <code>crtmbrcbk</code> callback.
 * <p>
 * <b><u>Reminder</u></b>
 * <p>
 * When using a reflected method, don't forget to configure your code obfuscator to retain it in unobfuscated form.
 *
 * @param psr       The parser to use.
 * @param tgt       Target object to which to add members; if this is null a new object is created using the callback.
 * @param maxlvl    Maximum level to recursively parse substructures, including arrays (objects at a deeper level are silently ignored).
 * @param crtmbrcbk A callback object invoked to create a member value.
 */
static public Object parseObject(JsonParser psr, Object tgt, int maxlvl, Callback crtmbrcbk) {
    return _parseObject(psr,tgt,maxlvl,new Callback.WithParms(crtmbrcbk,4),false);
    }

static private Object _parseObject(JsonParser psr, Object tgt, int maxlvl, Callback.WithParms crtmbrcbk, boolean arr) {
    int                                 evt;                                    // event code

    if(tgt==null) { tgt=crtmbrcbk.invoke(psr,null,"",null); }

    while((evt=psr.next())!=JsonParser.EVT_INPUT_ENDED && evt!=JsonParser.EVT_OBJECT_ENDED && evt!=JsonParser.EVT_ARRAY_ENDED) {
        String  nam=psr.getMemberName();

        switch(evt) {
            case JsonParser.EVT_OBJECT_BEGIN : {
                if(nam.length()>0) {
                    if(maxlvl>1) { _parseObject(psr,crtmbrcbk.invoke(psr,tgt,nam,null),(maxlvl-1),crtmbrcbk,false); }
                    else         { psr.skipObject();                                                                              }
                    }
                else {
                    _parseObject(psr,tgt,maxlvl,crtmbrcbk,false);
                    }
                } break;

            case JsonParser.EVT_ARRAY_BEGIN : {
                if(!arr) {
                    _parseObject(psr,tgt,maxlvl,crtmbrcbk,true);                // first level of any array is added directly to the inherently list-supporting object
                    }
                else {
                    if(maxlvl>1) { _parseObject(psr,crtmbrcbk.invoke(psr,tgt,nam,null),(maxlvl-1),crtmbrcbk,true); }
                    else         { psr.skipArray();                                                                              }
                    }
                } break;

            case JsonParser.EVT_OBJECT_MEMBER : {
                crtmbrcbk.invoke(psr,tgt,nam,psr.getMemberValue());
                } break;
            }
        }
    return tgt;
    }

/**
 * Default callback method for creating a typed value and/or adding it to a DataStruct.
 * <p>
 * The rules used to determine type are the standard JSON interpretation of the input value, except that numerics are created as Strings:
 * <ol>
 *   <li>A null value is returned as a new DataStruct().
 *   <li>An unquoted text value of "null" is returned as null.
 *   <li>An unquoted text value of "true" is returned as Boolean.TRUE.
 *   <li>An unquoted text value of "false" is returned as Boolean.FALSE.
 *   <li>A quoted text value is returned as a String with the quotes stripped.
 *   <li>Any other value (which must be a String) is returned unchanged.
 *   </ol>
 * <p>
 *
 * @param tgt       The optional target to which to add the new member - if null the member object is created without adding it to anything.
 * @param nam       The name to use to add the new member.
 * @param val       The value of the member to add - must be null or String.
 * @return          The newly created object, which is one of: DataStruct, null, Boolean, or String.
 */
static public Object callback_crtMemberDefault(JsonParser psr, Object tgt, String nam, Object val) {
    String                              txt=(String)val;

    if     (val==null                    ) { val=new DataStruct();            }
    else if("null" .equalsIgnoreCase(txt)) { val=null;                        }
    else if("true" .equalsIgnoreCase(txt)) { val=Boolean.TRUE;                }
    else if("false".equalsIgnoreCase(txt)) { val=Boolean.FALSE;               }
    else if(JsonParser.isQuoted     (txt)) { val=JsonParser.stripQuotes(txt); }

    if(tgt!=null) { ((DataStruct)tgt).addField(nam,val); }

    return val;
    }

Get The Source

The package statement has been stripped from the source for convenience. I do not advocate unpackaged classes - you should add a package according to your own requirements.

The source compiles to Java 3, but only minor changes should be required to target Java 2 or earlier.

Download JsonParser.java