I want to make a simple database system, and possibly use JSON as the main data format for importing and exporting (including full database backups). So my question is: how fast is it to parse JSON, even from big JSON structures (think gigabytes), compared to importing from other formats such as (faster) binary files or (slower) XML?
EDIT: to clarify, I am wondering how fast it is to parse JSON (into some internal database format), not how fast it would be as an internal storage mechanism. So the JSON data would not be queried directly; it would just be parsed into another format.
Also, my main reason for asking is that I wonder whether JSON is any easier to parse than XML because of its smaller delimiters (']' or '}' instead of '<tag>' or '</tag>'), and whether it might even approach binary formats in speed because the delimiters are so simple. (For example, maybe JSON can be parsed roughly like this: record delimiter = ASCII code xx (xx being a brace or bracket), except where preceded by ASCII xx (xx being some escape character).)
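To illustrate the kind of scan described above, here is a minimal sketch of my own (not part of the original question): it walks a JSON array byte by byte and yields the offsets of top-level records. Note that it has to track string and escape state, because braces can legally appear inside string values, so it is slightly more involved than "split on the delimiter unless escaped".

```python
def record_spans(json_bytes):
    """Yield (start, end) byte offsets of top-level objects in a JSON array.

    A minimal sketch: tracks brace depth, string state, and escapes so that
    '{' or '}' inside string values are not mistaken for record delimiters.
    """
    depth = 0
    in_string = False
    escaped = False
    start = None
    for i, b in enumerate(json_bytes):
        c = chr(b)
        if in_string:
            if escaped:
                escaped = False
            elif c == "\\":
                escaped = True
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif c == "{":
            if depth == 0:
                start = i
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                yield (start, i + 1)


data = b'[{"name": "a{b}c"}, {"name": "d"}]'
for s, e in record_spans(data):
    print(data[s:e])   # b'{"name": "a{b}c"}', then b'{"name": "d"}'
```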
It's definitely much, much slower than MySQL (for a server) or SQLite (for a client), which are preferable.
Also, JSON parsing speed depends almost solely on the implementation. For instance, you could eval() it, but not only is that very risky, it's also slower than a real parser. At any rate, there are probably far better-optimized XML parsers than JSON parsers out there, simply because XML is the more widely used format. (So take a GB-sized XML file, imagine the same results, and expect JSON to be slower still.)
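As a rough illustration of the eval()-versus-real-parser point, here is a sketch of my own (not the answer's): it times Python's json.loads against ast.literal_eval, the safe stand-in for eval() on literal data. On typical CPython builds the dedicated C-backed JSON parser wins comfortably.

```python
import ast
import json
import timeit

# Only strings and numbers here, so the JSON text is also a valid Python
# literal (true/false/null would break literal_eval).
doc = json.dumps([{"id": i, "name": f"row{i}"} for i in range(1000)])

t_json = timeit.timeit(lambda: json.loads(doc), number=200)
t_eval = timeit.timeit(lambda: ast.literal_eval(doc), number=200)

print(f"json.loads:       {t_json:.3f}s")
print(f"ast.literal_eval: {t_eval:.3f}s")
```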
Seriously, JSON was never meant for big things. Use a real database if possible.
Edit: why is JSON much slower than a database?
Many reasons. I'll try to list a few.
- JSON relies on matching delimiters such as {} (much like XML's <>).
This means a parser has to find where each object block ends. The same applies to [] and "". In a conventional database there's no "ending tag" or "ending bracket" to look for, so it's easier to read.
- JSON parsers need to read each and every character before being able to understand the whole object structure.
So before you can use any part of the JSON, you have to read the whole file. For the sizes you mention that means waiting minutes at best, whereas a database is ready to be queried in under a second (because its structure is stored up front).
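A hedged way to see this for yourself; the sketch below is mine, not the answer's. It loads the same records from a JSON file and from an indexed SQLite table: the JSON lookup must parse the entire file first, while SQLite answers through its index.

```python
import json
import sqlite3
import time

rows = [{"id": i, "name": f"user{i}"} for i in range(100_000)]

# JSON side: even a single lookup requires parsing the whole (throwaway) file.
with open("users.json", "w") as f:
    json.dump(rows, f)

t0 = time.perf_counter()
with open("users.json") as f:
    data = json.load(f)
hit = next(r for r in data if r["id"] == 99_999)
t_json = time.perf_counter() - t0

# SQLite side: the primary-key index answers the lookup directly.
con = sqlite3.connect("users.db")
con.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT OR REPLACE INTO users VALUES (?, ?)",
                [(r["id"], r["name"]) for r in rows])
con.commit()

t0 = time.perf_counter()
hit = con.execute("SELECT name FROM users WHERE id = ?", (99_999,)).fetchone()
t_sqlite = time.perf_counter() - t0

print(f"JSON full parse + scan: {t_json:.4f}s")
print(f"SQLite indexed lookup:  {t_sqlite:.4f}s")
```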
- In JSON you can't precalculate offsets.
In a database, size is traded for performance. You can make a VARCHAR(512) column and all strings will be null-padded to occupy 512 bytes. Why? Because that way you know, for example, that the 4th value is at offset 2048. You can't do that with JSON, hence performance suffers.
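For illustration only, a sketch of my own showing what precalculated offsets buy you: with fixed-width records you can seek straight to the Nth entry without parsing anything before it, which has no equivalent in plain JSON.

```python
import struct

# Hypothetical fixed-width record: 4-byte int id + 60-byte padded name = 64 bytes.
RECORD = struct.Struct("<i60s")

def write_records(path, names):
    with open(path, "wb") as f:
        for i, name in enumerate(names):
            f.write(RECORD.pack(i, name.encode().ljust(60, b"\x00")))

def read_nth(path, n):
    # Offset arithmetic replaces parsing: record n starts at n * record size.
    with open(path, "rb") as f:
        f.seek(n * RECORD.size)
        rec_id, raw = RECORD.unpack(f.read(RECORD.size))
        return rec_id, raw.rstrip(b"\x00").decode()

write_records("fixed.dat", [f"user{i}" for i in range(10_000)])
print(read_nth("fixed.dat", 4))   # -> (4, 'user4') without touching records 0..3
```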
- JSON is optimized for small file sizes.
...because it's a web format. That may look like a pro, but from a parsing-performance perspective it's a con.
- JSON is a JavaScript subset.
So some parsers may allow unnecessary data to be present and considered, such as comments. Chrome's native JSON parser used to allow comments, for example (it no longer does). No database engine uses eval(), right?
- JSON is meant to have some error resilience.
People might put anything into a JSON file, so parsers are defensive and will sometimes try to read invalid files. Databases aren't supposed to silently repair a broken file. You might hand-write a JSON file, but not a database!
- JSON is a newer, poorly supported, and badly tested format.
There are bugs in some native parsers (like IE8's), and support in most browsers is still very preliminary and slower than, say, the fastest XML parser out there. Simply because XML has been in use for ages and Steve Ballmer has an XML fetish, companies please him by making almost anything under the sun XML-compatible, while JSON is one of Crockford's successful weekend pastimes.
- The best JSON parsers are in browsers.
If you pick a random open-source JSON parser for your favourite language, what are the chances that it's the best possible parser under the sun? For XML you have awesome, battle-tested parsers; what is there for JSON?
Need more reasons why JSON should be relegated to its intended use case?
If you treat JSON as an intermediate format for data transfer, you might want to look at binary alternatives as well, because they need less disk space and network bandwidth (both compressed and uncompressed), and you may get faster parsing simply because there is less input to parse.
- MessagePack
- BSON (binary JSON)
- Google Protocol Buffers
- Apache & Facebook Thrift
- Python pickle (the C-implemented cPickle module, using the highest protocol version)
- Python Marshal (very fast but architecture- and version-dependent)
If you run your own benchmark, make sure to benchmark multiple parsers for the same language; e.g., a JSON parser implemented in pure Python is expected to be much slower than one written in C, but you may also find a significant speed difference (a factor of 2, sometimes as much as 5) between different implementations in the same programming language.
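As a starting point for such a benchmark, here is a small sketch of my own that times Python's standard-library json, pickle, and marshal modules on the same structure; the actual numbers depend heavily on the data shape, interpreter, and library versions.

```python
import json
import marshal
import pickle
import timeit

data = [{"id": i, "name": f"row{i}", "values": list(range(10))} for i in range(5_000)]

# Serialize once up front so only the parse/loads step is measured.
blob_json = json.dumps(data)
blob_pickle = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
blob_marshal = marshal.dumps(data)

print("sizes (bytes):", {"json": len(blob_json.encode()),
                         "pickle": len(blob_pickle),
                         "marshal": len(blob_marshal)})

for name, fn in [
    ("json.loads", lambda: json.loads(blob_json)),
    ("pickle.loads", lambda: pickle.loads(blob_pickle)),
    ("marshal.loads", lambda: marshal.loads(blob_marshal)),
]:
    print(f"{name:14s} {timeit.timeit(fn, number=50):.3f}s")
```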
Benchmarks of JSON, XML, and lots of other things can be found in the JVM Serializers project. The results are too complicated to reproduce here, but the best JSON results (comparing both manual and databound classes) are quite a bit better than the best XML results. That comparison isn't complete, but it's a starting point.
EDIT: as of right now (2012-10-30), there are no published results, because the benchmark is being revised. However, there are some preliminary results available.
A database is essentially a file format with fast seeking built in. If you can achieve the same with JSON, then things are easy: you have to build something that can seek into the JSON file quickly.
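One way to approximate that faster seeking on top of JSON, sketched by me under the assumption that the data is stored as newline-delimited JSON records: build a side index of byte offsets once, then seek directly to a record instead of re-parsing the whole file.

```python
import json

def build_offset_index(path):
    """Map record id -> byte offset, assuming one JSON object per line."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:                     # binary mode: len(line) is exact
            rec = json.loads(line)
            index[rec["id"]] = offset
            offset += len(line)
    return index

def fetch(path, index, rec_id):
    # Seek straight to the record and parse only that one line.
    with open(path, "rb") as f:
        f.seek(index[rec_id])
        return json.loads(f.readline())

# Write some newline-delimited JSON, index it once, then seek directly.
with open("records.jsonl", "w") as f:
    for i in range(10_000):
        f.write(json.dumps({"id": i, "name": f"user{i}"}) + "\n")

idx = build_offset_index("records.jsonl")
print(fetch("records.jsonl", idx, 1234))   # -> {'id': 1234, 'name': 'user1234'}
```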