Is JParse Fast

November 6, 2024


This article originally appeared on LinkedIn on Feb 19th, 2024 by Rick Hightower

JParse: The most efficient JSON parser for the JVM yet!

Rick Hightower, Engineering Consultant focused on AI

February 19, 2023

JParse

JParse is the most efficient JSON parser for the JVM yet.

Why JParse?

JParse is the most efficient JSON parser for the JVM yet: it uses an index overlay to deliver lightning-fast parsing speeds.

The JParse parser is designed to revolutionize how developers process and analyze JSON data by providing a more intelligent, more efficient approach to JSON parsing.

So, what is an index overlay, and why is it such a game-changer? Put simply, an index overlay is a mechanism that allows our JSON parser to access and analyze data in real time as it is being parsed (before it is completely parsed and deserialized). This means that instead of waiting until the entire JSON document is parsed before analyzing it, our parser can extract and analyze just the specific pieces of data you want.

This dramatically speeds up the parsing process and reduces the memory required to parse large JSON documents. It can also speed up mapping JSON objects and arrays to Java objects, and it avoids a great many buffer copies along the way. This is especially true if you only want a portion of the JSON payload and use the built-in JSONPath support.
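
For example, if you only need one subtree of a large payload, you can address it directly with the built-in JSONPath support shown later in this article. The following is a minimal sketch: the file name and path expression are hypothetical, but the Json, Sources, and Path calls are the same ones used in the examples below.

// Hypothetical file and path, for illustration only.
final File ordersFile = new File("./orders.json");
final var root = Json.toRootNode(Sources.fileSource(ordersFile));

// The idea: only the addressed subtree gets turned into nodes; the rest of the
// document stays as indexed, unparsed characters in the original buffer.
final var firstOrderItems = Path.atPath("orders[0].items", root)
        .asCollection().asArray();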

But that’s not all - with our JSON parser you can quickly implement advanced features, such as support for incremental parsing, in-memory compression, and automatic schema generation. The index overlay feature is a boon and makes our parser the most efficient, flexible, and developer-friendly JSON parser available today.

Whether you’re processing massive data sets, building complex data pipelines, or simply looking for a faster, more efficient way to parse JSON data, our index overlay JSON parser is the ideal solution. Try it today and experience the power of real-time JSON parsing for yourself!

What is an Index overlay parser?

An index overlay parser is a type of parser used for processing structured data, particularly JSON data. Unlike traditional parsers that typically parse the entire data document before returning the results, an index overlay parser uses a mechanism that allows for real-time data access and analysis before the data is fully parsed.

Specifically, an index overlay parser creates an index, or mapping, of the data elements in the JSON document during the parsing process. This index allows the parser to quickly access and retrieve specific data elements in the document as needed, without re-parsing the entire document. This dramatically speeds up the parsing process, especially for large or complex JSON documents, and can also reduce the memory requirements for parsing.
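
To make the idea concrete, here is a hypothetical illustration of what such an index might look like. This is not JParse's internal representation, just a sketch of the technique.

// Hypothetical sketch of an index overlay: the parser records where each element
// sits in the original character buffer instead of building objects up front.
enum TokenType { OBJECT_START, OBJECT_END, ARRAY_START, ARRAY_END, KEY, STRING, NUMBER, BOOLEAN, NULL }

record Token(int start, int end, TokenType type) { }   // [start, end) offsets into the buffer

// For the input {"id":42,"name":"Bob"} the overlay might contain roughly:
//   Token[start=0,  end=22, type=OBJECT_START]
//   Token[start=2,  end=4,  type=KEY]      -> id
//   Token[start=6,  end=8,  type=NUMBER]   -> 42
//   Token[start=10, end=14, type=KEY]      -> name
//   Token[start=17, end=20, type=STRING]   -> Bob
// A lookup walks this flat list and converts characters to values only on demand.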

For example, suppose you have a large JSON document containing multiple nested objects and only need to extract specific data from it. With a traditional parser, you must parse the entire document and traverse the object hierarchy to find the needed data. With an index overlay parser, however, the parser can access the specific data element directly using the index it created during parsing, significantly reducing the time and memory required to process the document.

In summary, an index overlay parser offers a more efficient and faster way to process JSON data by creating an index of the data elements during parsing, allowing for real-time data access and analysis before it is completely parsed.

Let’s compare an index overlay parser to an event-based and DOM-style parser.

The usual basic approach for JSON parsing (creating a DOM):

  • Parse the input JSON string character by character.
  • Identify the different types of JSON objects (arrays, objects, strings, numbers, booleans, null) and their respective syntax rules.
  • Use a stack or recursively walk the JSON characters to keep track of the current level of nested objects and store the parsed data in a structured data format (e.g., a hash table or linked list).
  • Handle errors and invalid input (e.g., incorrect syntax, unexpected characters).
  • Return the parsed data structure.

An index overlay approach to JSON Parsing:

  • Parse the input JSON string character by character.
  • Identify the different types of JSON objects (arrays, objects, strings, numbers, booleans, null) and their respective syntax rules.
  • Use a stack or recursively walk the JSON characters to keep track of the current level of nested objects, and store the unparsed data in an index overlay format (e.g., an array or flat list of tokens that record start position, end position, and token type).
  • Handle errors and invalid input (e.g., incorrect syntax, unexpected characters).
  • Return a lazy version of the parsed data structure (e.g., a hash table or linked list) where the final parse happens only when a subelement is requested (see the lazy-lookup sketch after this list).
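
Here is a minimal sketch of that lazy step; this is hypothetical rather than JParse's real classes. The node keeps only buffer offsets recorded during the index pass and converts characters to a value when asked.

// Hypothetical lazy node backed by the original buffer and a recorded token.
final class LazyNumberNode {
    private final char[] buffer;  // untouched input characters
    private final int start;      // token start offset (inclusive)
    private final int end;        // token end offset (exclusive)

    LazyNumberNode(char[] buffer, int start, int end) {
        this.buffer = buffer;
        this.start = start;
        this.end = end;
    }

    // The conversion cost is paid here, only when the value is actually requested.
    int intValue() {
        return Integer.parseInt(new String(buffer, start, end - start));
    }
}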

An event-based approach to JSON Parsing:

  • Parse the input JSON string character by character.
  • Identify the different types of JSON objects (arrays, objects, strings, numbers, booleans, null) and their respective syntax rules.
  • Use a stack or recursively walk the JSON characters to keep track of the current level of nested objects, and instead of storing anything, issue events like START_ARRAY, START_OBJECT, START_OBJECT_ATTRIBUTE, END_ARRAY, etc. The events should carry the index location and access to the buffer at that index range.
  • Handle errors and invalid input (e.g., incorrect syntax, unexpected characters).
  • Return nothing; the caller reacts to the stream of events instead (a minimal handler sketch follows this list).
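
For contrast, an event-driven parser hands control to a callback. A minimal, generic handler interface might look like the sketch below; it is illustrative only, not a real JParse or Jackson API.

// Illustrative SAX-style callbacks for an event-driven JSON parser.
// The offsets let the handler slice the underlying buffer itself if needed.
interface JsonEventHandler {
    void startObject(int offset);
    void endObject(int offset);
    void startArray(int offset);
    void endArray(int offset);
    void objectKey(CharSequence buffer, int start, int end);
    void scalarValue(CharSequence buffer, int start, int end);
}

The parser walks the input once and fires these callbacks; it builds no data structure of its own, which is why the comparison below likens it to an engine rather than a whole car.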

Comparing an event-based approach to building a full DOM or an index overlay is like comparing an engine to a car: the DOM and index overlay parsers give you the whole car, while the event-based parser gives you just the engine to build on.

If you would like to learn more about index overlay parsers vs. DOM and Event parsers, read this JSON parser description: DOM vs. Index Overlay vs. Event Driven.

Why not just update Boon?

I am one of Boon's original authors. Boon was a utility library that became a JSON parser. It got a lot of attention back in 2014, when it was the fastest way to parse JSON and serialize to and from JavaBeans.

So why not just update Boon? Boon was meant to be a utility lib and became a JSON parser. It is 90,000 lines of code (main Java source only, not counting test classes) and does too many things that no one uses. It was due for a complete redesign. It also uses Unsafe, which is no longer viable in recent versions of Java.

JParse is feature complete and is only 4,570 lines long vs. Boon's 90,000 LoC. Jackson core is 55,000 LoC, and additional libraries are needed for various data types and mappings if you use Jackson.

What is JParse?

JParse is a JSON parser plus a small subset of JSONPath. It is small (about 4,200 lines of code). JParse uses an index overlay from the ground up, which lays the foundation for quick JSONPath lookups as well as very fast mapping. Its feature set will not grow; any other features will be part of other libs.

Is JParse fast?

Yes.

Benchmark                             Mode  Cnt        Score   Error  Units
BenchMark.jParseBigDecimalArrayFast  thrpt    2  1201663.430          ops/s
BenchMark.jacksonBigDecimalArray     thrpt    2   722409.093          ops/s

BenchMark.jParseDoubleArrayFast      thrpt    2   890538.018          ops/s
BenchMark.jacksonDoubleArray         thrpt    2   627404.869          ops/s

Benchmark                                      Mode  Cnt        Score   Error  Units
BenchMark.readGlossaryJParse                  thrpt    2  1034323.573          ops/s
BenchMark.readGlossaryNoggit                  thrpt    2   830511.356          ops/s
BenchMark.readGlossaryNoggitObjectBuilder     thrpt    2   541948.355          ops/s
BenchMark.readGlossaryJackson                 thrpt    2   468925.690          ops/s

To see more JSON benchmarks, go here. Although JParse is fast, Jackson is well known and has 15 to 20 years of maturity (bug fixes). Unless you are looking for a small JSON parser, you are always better off using Jackson. JParse is excellent for folks who want to embed a parser (copy and paste JParse into their project, giving credit to JParse, of course) or who want a lightweight JSON parser that works well for things like AWS Lambda or a small Docker image. If you want powerful JSON parsing with a small footprint, JParse is your pick. See the caveats at the end of the article.
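
For context, throughput numbers in this "Benchmark / Mode / Cnt / Score" format come from JMH. A minimal sketch of how such a benchmark is typically structured follows; the file path is hypothetical, the JParse imports are omitted as in the other listings, and the project's real benchmarks presumably parse an in-memory buffer rather than hitting the file system on every call.

import java.io.File;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Sketch of a JMH throughput benchmark; not the project's actual benchmark code.
@State(Scope.Benchmark)
public class ParseBenchmarkSketch {

    // Hypothetical path to the glossary document used in the results above.
    private final File file = new File("./src/test/resources/json/glossary.json");

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public Object readGlossaryJParse() {
        // Same parse call the article uses in the examples below.
        return Json.toRootNode(Sources.fileSource(file));
    }
}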

Using JParse

Sample JSON file for departments

{
  "departments": [
    {
      "departmentName" : "Engineering",
      "employees": [
        {
          "firstName": "Bob",
          "lastName": "Jones",
          "dob": "05/22/1990",
          "manager": true,
          "id": 111,
          "managerId": -1
        },
        {
          "firstName": "Rick",
          "lastName": "Hightower",
          "dob": "05/22/1990",
          "manager": false,
          "id": 777,
          "managerId": 111
        },
        {
          "firstName": "Cindy",
          "lastName": "Torre-alto",
          "dob": "04/15/1993",
          "manager": true,
          "id": 999,
          "managerId": 111
        }
      ]
    },
    {
      "departmentName" : "HR",
      "employees": [
        {
          "firstName": "Sarah",
          "lastName": "Jones",
          "dob": "05/22/1990",
          "manager": true,
          "id": 222,
          "managerId": 111
        },
        {
          "firstName": "Sam",
          "lastName": "Smith",
          "dob": "05/22/1990",
          "manager": true,
          "id": 555,
          "managerId": 222
        },
        {
          "firstName": "Suzy",
          "lastName": "Jones",
          "dob": "04/15/1993",
          "manager": true,
          "id": 551,
          "managerId": 222
        }
      ]
    }
  ]
}

Note that this JSON payload has two departments of employees, namely Engineering and HR.

Read JSON from a file

final File file = new File("./src/test/resources/json/depts.json");

final var rootNode = Json.toRootNode(Sources.fileSource(file));

Grab just the employees from the engineering department

final File file = new File("./src/test/resources/json/depts.json");

final var rootNode = Json.toRootNode(Sources.fileSource(file));

final var engineeringEmployees = Path.atPath("departments[0].employees", rootNode)
        .asCollection().asArray();

System.out.println("      " + engineeringEmployees.toJsonString());

The engineeringEmployees object is a JSON node that contains only the data from that part of the JSON document.

Grab just the employees from the engineering department Output

 [
        {
          "firstName": "Bob",
          "lastName": "Jones",
          "dob": "05/22/1990",
          "manager": true,
          "id": 111,
          "managerId": -1
        },
        {
          "firstName": "Rick",
          "lastName": "Hightower",
          "dob": "05/22/1990",
          "manager": false,
          "id": 777,
          "managerId": 111
        },
        {
          "firstName": "Cindy",
          "lastName": "Torre-alto",
          "dob": "04/15/1993",
          "manager": true,
          "id": 999,
          "managerId": 111
        }
      ]

See how easy it is to slice and dice up a JSON doc?

Get the Cindy node from the engineeringEmployees array

final var cindy = Path.atPath("[2]", engineeringEmployees);

System.out.println("      " + cindy.toJsonString());

Get the Cindy node from the engineeringEmployees array Output

 {
          "firstName": "Cindy",
          "lastName": "Torre-alto",
          "dob": "04/15/1993",
          "manager": true,
          "id": 999,
          "managerId": 111
        }

Use different JSONPath expressions to get parts of the cindy node

final var cindyName = Path.atPath("[2].firstName", engineeringEmployees);
final var cindyId = Path.atPath(".id", cindy);
final var manager = Path.atPath("[2].manager", engineeringEmployees);

Note that the nodes returned by JSONPath lookups are always basic Java types, so the API is easy to use. There are no surprises.

Just List, Map, Number, and CharSequences. No complicated API

if (manager.asScalar().booleanValue()) {
    System.out.printf("This employee %s is a manager %s \n", cindyName, manager);
}

if (cindyName.asScalar().equalsString("Cindy")) {
    System.out.printf("The employee's name is Cindy %s \n", cindyId);
}

if (cindyName instanceof CharSequence) {
    System.out.println("cindyName is a CharSequence");
}

if (cindyId instanceof Number) {
    System.out.println("cindyId is a Number");
}

if (engineeringEmployees instanceof List) {
    System.out.println("engineeringEmployees is a List " + engineeringEmployees.getClass().getName());
}

if (cindy instanceof Map) {
    System.out.println("cindy is a Map " + cindy.getClass().getName());
}

Keep in mind that there is no complicated API to learn:

  • java.lang.Number (JSON number)
  • java.util.List (JSON array)
  • Java array (e.g., int[]) (JSON array)
  • java.lang.CharSequence (JSON String and every node)
  • java.util.Map (JSON object)

In fact, the API for lookups and mappings is more or less the Java streams API.

Using Java streams and Java functional programming with JParse

final var rick = engineeringEmployees.stream()
        .map(node -> node.asCollection().asObject())
        .filter(objectNode ->
                objectNode.getString("firstName").equals("Rick")
        ).findFirst();

rick.ifPresent(node -> {
    System.out.println("Found  " + node);

    exploreNode(node);
});

We have automated and sped up common mappings and filtering.

Using JParse to do functional mapping and find operations

final var rick2 = engineeringEmployees.findObjectNode(
        objectNode ->
                objectNode.getString("firstName").equals("Rick")
);

rick2.ifPresent(node -> {
    System.out.println("Found  " + node);

    exploreNode(node);
});

The above does the same as the previous example but in fewer lines of code.

Object mappings are powerful and easy to implement

public record Employee(String firstName, String lastName,
                       String dob, boolean manager,
                       int id, int managerId) {
}
...
final var employees = engineeringEmployees.mapObjectNode(on ->
        new Employee(on.getString("firstName"), on.getString("lastName"),
                on.getString("dob"), on.getBoolean("manager"),
                on.getInt("id"), on.getInt("managerId"))
);

employees.forEach(System.out::println);

Output

 Employee[firstName=Bob, lastName=Jones, dob=05/22/1990, manager=true, id=111, managerId=-1]
Employee[firstName=Rick, lastName=Hightower, dob=05/22/1990, manager=false, id=777, managerId=111]
Employee[firstName=Cindy, lastName=Torre-alto, dob=04/15/1993, manager=true, id=999, managerId=111]

Object mappings are powerful and easy to implement for nested cases too

public record Employee(String firstName, String lastName,
                       String dob, boolean manager,
                       int id, int managerId) {
}

public record Department(String name,
                         List<Employee> employees) {
}

...
final var departmentsNode = Path.atPath("departments", json).asCollection().asArray();

final var departments = departmentsNode.mapObjectNode(on ->
        new Department(on.getString("departmentName"),
                on.getArrayNode("employees").mapObjectNode(en ->
                        new Employee(en.getString("firstName"), en.getString("lastName"),
                                en.getString("dob"), en.getBoolean("manager"),
                                en.getInt("id"), en.getInt("managerId"))
                )));

departments.forEach(System.out::println);

Working with object attributes is quite easy, and you have many options.

Output

 Department[name=Engineering, employees=[Employee[firstName=Bob, lastName=Jones, dob=05/22/1990, manager=true, id=111, managerId=-1], Employee[firstName=Rick, lastName=Hightower, dob=05/22/1990, manager=false, id=777, managerId=111], Employee[firstName=Cindy, lastName=Torre-alto, dob=04/15/1993, manager=true, id=999, managerId=111]]]
Department[name=HR, employees=[Employee[firstName=Sarah, lastName=Jones, dob=05/22/1990, manager=true, id=222, managerId=111], Employee[firstName=Sam, lastName=Smith, dob=05/22/1990, manager=true, id=555, managerId=222], Employee[firstName=Suzy, lastName=Jones, dob=04/15/1993, manager=true, id=551, managerId=222]]]

Working with object attributes

 private static void exploreNode(ObjectNode node) {

        int id = node.getInt("id");
        String name = node.getString("firstName");
        String dob = node.getString("dob");
        boolean isManager = node.getBoolean("manager");
        int managerId = node.getInt("managerId");

        System.out.printf("%d %s %s %s %d \n", id, name, dob, isManager, managerId);

        final var idNode = node.getNumberNode("id");
        final var nameNode = node.getStringNode("firstName");
        final var dobNode = node.getStringNode("dob");
        final var isManagerNode = node.getBooleanNode("manager");
        final var managerIdNode = node.getNumberNode("managerId");

        System.out.printf("%s %s %s %s %s \n", idNode, nameNode, dobNode, isManagerNode, managerIdNode);

        System.out.printf("%d %s %s %s %d \n", idNode.intValue(),
                nameNode.toString(), dobNode.toString(),
                isManagerNode.booleanValue(), managerIdNode.intValue());

    }

Output

777 Rick 05/22/1990 false 111
777 Rick 05/22/1990 false 111
777 Rick 05/22/1990 false 111

Caveats and limitations

  • Not done yet (the benchmarks have changed a bit; some got slightly better, some got slightly worse).
  • Error handling still needs improvement (a solid first pass is done, but there is room for more).
  • Boon had excellent error messages, and JParse will too (Boon was not as good as I thought, actually)
  • Does not support any extras (by design)
  • Does not support NaN, +Infinity, -Infinity because that is not in the JSON org spec.
  • Does not support any attribute key types other than Strings, as per the JSON org spec.
  • Strict JSON only, no lax/minimal JSON (Boon and others support strict and non-strict modes; JParse only does strict).
  • It is new, so there could be bugs and mistakes.

Conclusion

Thanks for reading this article and allowing us to introduce the most efficient JSON parser for the JVM. JParse uses an index overlay to deliver lightning-fast parsing speeds and unparalleled accuracy. This JSON parser is designed to revolutionize the way developers process and analyze JSON data, by providing a smarter, more efficient approach to JSON parsing. Please check JParse out and provide feedback.

About the Author

Rick Hightower is a seasoned software engineer and technology innovator with decades of experience in the field. As the creator of JParse, Rick brings his extensive knowledge of Java and JSON parsing to deliver cutting-edge solutions for developers. With a passion for efficiency and performance, Rick has contributed to numerous open-source projects and is a respected voice in the software development community.

Throughout his career, Rick has focused on developing tools and frameworks that simplify complex tasks for developers. His work on JParse exemplifies his commitment to creating high-performance, user-friendly technologies that push the boundaries of what’s possible in data processing and analysis.

When not coding, Rick enjoys sharing his knowledge through technical writing and speaking at industry conferences. He is dedicated to mentoring the next generation of software engineers and promoting best practices in software development.

For more insights from Rick, check out his articles on LinkedIn or follow his latest projects on GitHub.
