Alright, tweaked the code a bit more to support all the data types that JSON has, and be backward compatible with existing netstrings:

http://codepad.org/gfpQLnZP

That will parse the original netstring as a "blob", which in Python is just a string.

A couple of design points:

1. It's actually easier to get the type after you load the data. You have to read the size, then the ':', then read the full message anyway, so having the type ahead of time doesn't buy you much. Having it at the end makes it easy to recursively deal with or loop over.

2. Putting balanced separators makes the parsing code harder. In Python you'd have to split with a regex on all the possible chars instead of just ':'.

3. Backward compatibility with netstrings is crucial because then people can use this library to handle the netstrings we already use even if they use JSON. Looking at the code for a lot of netstrings libs out there, it's kind of bad stuff.

4. So far the other implementations people have offered up are showing that parsing this simple text is really easy (as compared with binary formats):

http://codepad.org/8NZTj74s
http://re-factor.blogspot.com/2011/03/typed-netstrings.html

Next up, generating this from the above. I'll probably have to make an explicit Blob() class for Python so that it round-trips.

--
Zed A. Shaw
http://zedshaw.com/
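The design points in the message above (read the size, split on ':', read the payload, then look at the trailing type character) can be sketched roughly as follows. This is not the code from the codepad links; the function names `parse` and `decode` are hypothetical, and the type characters are an assumption taken from the tnetstrings format as it was later published (`,` bytes, `#` integer, `^` float, `!` boolean, `~` null, `]` list, `}` dict), which may differ from the draft being discussed here.

```python
def parse(data: bytes):
    """Parse one tnetstring from the front of data.

    Returns (value, remainder) so callers can loop over concatenated items.
    """
    # Only ':' needs to be searched for; no regex over several separators.
    length, sep, rest = data.partition(b":")
    if not sep or not length.isdigit():
        raise ValueError("missing or malformed length prefix")
    size = int(length)
    if len(rest) < size + 1:
        raise ValueError("payload shorter than declared length")
    payload = rest[:size]
    tag = rest[size:size + 1]      # the type tag comes after the payload
    remainder = rest[size + 1:]
    return decode(payload, tag), remainder


def decode(payload: bytes, tag: bytes):
    """Turn a payload into a Python value based on its trailing type tag.

    The tag characters here are assumed, not taken from the thread.
    """
    if tag == b",":                # plain netstring: bytes, no interpretation
        return payload
    if tag == b"#":
        return int(payload)
    if tag == b"^":
        return float(payload)
    if tag == b"!":
        if payload not in (b"true", b"false"):
            raise ValueError("invalid boolean payload")
        return payload == b"true"
    if tag == b"~":
        if payload:
            raise ValueError("null must have an empty payload")
        return None
    if tag == b"]":                # list: payload is a run of tnetstrings
        items = []
        while payload:
            item, payload = parse(payload)
            items.append(item)
        return items
    if tag == b"}":                # dict: payload is alternating key, value
        result = {}
        while payload:
            key, payload = parse(payload)
            if not payload:
                raise ValueError("dict key without a value")
            value, payload = parse(payload)
            result[key] = value
        return result
    raise ValueError("unknown type tag: %r" % tag)
```

Because an ordinary netstring already ends in `,`, something like `parse(b"5:hello,")` goes through the same path as every other type, which is the backward compatibility point 3 asks for.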
On Sun, Mar 20, 2011 at 3:52 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote: > Alright, tweaked the code a bit more to support all the data types that > JSON has, and be backward compatible with existing netstrings: > > http://codepad.org/gfpQLnZP Ruby version based on http://codepad.org/Uj42SuMo : http://codepad.org/qhgbXDPP Benchmark results added at the bottom in a comment. Summary is that it's ~3.5x faster than the pure-Ruby JSON, but ~2x slower than JRuby's Java-based JSON. Only tested on JRuby 1.6 and Zed's 'TESTS' sample tnetstring data. -Isaac
On Mon, Mar 21, 2011 at 02:18:14AM -0700, Isaac Force wrote: > Ruby version based on http://codepad.org/Uj42SuMo : > > http://codepad.org/qhgbXDPP Alright so it's coming out to be about 100 lines of code in any language to implement. Sounds like a winner for simplicity. > Benchmark results added at the bottom in a comment. Summary is that > it's ~3.5x faster than the pure-Ruby JSON, but ~2x slower than JRuby's > Java-based JSON. Is this comparing pure Ruby tnetstring to JRuby? Can you do JRuby-tnetstring against JRuby-json? I'm curious if it's the jruby or the json that's faster. -- Zed A. Shaw http://zedshaw.com/
On Mon, Mar 21, 2011 at 9:37 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> On Mon, Mar 21, 2011 at 02:18:14AM -0700, Isaac Force wrote:
>> http://codepad.org/qhgbXDPP
>
>> Benchmark results added at the bottom in a comment. Summary is that
>> it's ~3.5x faster than the pure-Ruby JSON, but ~2x slower than JRuby's
>> Java-based JSON.
>
> Is this comparing pure Ruby tnetstring to JRuby? Can you do
> JRuby-tnetstring against JRuby-json? I'm curious if it's the jruby or
> the json that's faster.

All tests were done with the same JRuby VM. When you require 'json', logic is included that attempts to load a binary acceleration library for the platform it's running on; in this case that's a Java lib. Requiring 'json/pure' skips that and uses a pure-Ruby JSON lib.

tnetstring vs. json/pure = 3.5x faster
tnetstring vs. json(+java) = 2x slower

Pure-Ruby vs. pure-Ruby in the same VM, the tnetstring code is faster than the JSON lib that ships with Ruby.

Is that what you're asking?

-Isaac
On Mon, Mar 21, 2011 at 11:28:51AM -0700, Isaac Force wrote: > tnetstring vs. json/pure = 3.5x faster > tnetstring vs. json(+java) = 2x slower > > Pure-Ruby vs. pure-Ruby in the same VM the tnetstring code is faster > than the JSON lib that ships with Ruby. > > Is that what you're asking? Yep, perfect. That's actually kind of interesting because cjson under the Python library is idiotic fast compared to tnetstrings or simplejson. Like 250x faster. I was thinking the JVM might be able to make naive tnetstrings fast, and at least be able to do json fast, but looks like not really. Now I'm curious what a C implementation of tnetstrings can do. -- Zed A. Shaw http://zedshaw.com/
On Tue, 2011-03-22 at 14:38 +1100, Ryan Kelly wrote:
> On Mon, 2011-03-21 at 15:36 -0700, Zed A. Shaw wrote:
> > Yep, perfect. That's actually kind of interesting because cjson under
> > the Python library is idiotic fast compared to tnetstrings or
> > simplejson. Like 250x faster.
> >
> > Now I'm curious what a C implementation of tnetstrings can do.
>
> Attached is a start. It's a "_tnetstring" module for python written in
> the style of the cjson module, i.e. a pure-C parsing core with hooks
> back into the python API.
>
> On my machine, it goes head-to-head with cjson:
>
>   $> python shootout.py
>   cjson: 0.00308704376221
>   _tnetstring 0.0030951499939

Ahem. As Tordek points out, those are stupidly small numbers. There was an error in my shootout code which meant it was basically timing nothing. Fortunately, the results are still very similar when it actually runs the two parsers:

  $> python shootout.py
  cjson: 1.35818314552
  _tnetstring 1.35400009155

  Ryan

> That's the result of about 2 hours of hacking, so there's probably room
> for a fair bit of optimisation. Currently it only parses, doesn't
> render. It also segfaults on bad input from time to time.
>
> Still, I think beating cjson is very doable without much work.
>
> The parser core is written to use a struct of callback functions to
> build up the result, so it should be straightforward to adapt for use
> outside of python. If I get a chance, I'll try to do a version using
> the ADTs from mongrel2.
>
> Cheers,
>
>   Ryan

--
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details
On Sun, 2011-03-20 at 15:52 -0700, Zed A. Shaw wrote: > Alright, tweaked the code a bit more to support all the data types that > JSON has, and be backward compatible with existing netstrings: > > http://codepad.org/gfpQLnZP > > That will parse the original netstring as a "blob" which in python is > just a string. > > > Next up, generating this from the above. I'll probably have to make an > explicit Blob() class for Python so that it round-trips. The sqlite3 bindings make you wrap strings with the built-in buffer object to indicate bytes-vs-text. Might be simpler than creating your own Blob class. Ryan -- Ryan Kelly http://www.rfk.id.au | This message is digitally signed. Please visit ryan@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
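For readers unfamiliar with the sqlite3 behaviour mentioned above, here is a minimal sketch of it. It assumes Python 3, where `bytes` plays the role that the `buffer` object played in the Python 2 of this thread; the table name `t` and column name `v` are made up for the illustration.

```python
import sqlite3

# In-memory database; the column is left untyped so SQLite stores each
# value with the storage class the binding hands it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v)")

# A str is bound as TEXT, a bytes object as BLOB.
conn.execute("INSERT INTO t VALUES (?)", ("hello",))
conn.execute("INSERT INTO t VALUES (?)", (b"hello",))

# typeof() is SQLite's built-in function for reporting the storage class.
stored_types = [
    row[0] for row in conn.execute("SELECT typeof(v) FROM t ORDER BY rowid")
]
conn.close()

print(stored_types)
```

The same characters end up tagged differently depending only on the Python type that was passed in, which is the kind of bytes-vs-text marker a Blob() wrapper would otherwise have to provide.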
On Sun, 2011-03-20 at 15:52 -0700, Zed A. Shaw wrote: > Alright, tweaked the code a bit more to support all the data types that > JSON has, and be backward compatible with existing netstrings: > > http://codepad.org/gfpQLnZP > > That will parse the original netstring as a "blob" which in python is > just a string. So what's the difference between a string and a blob? Since this is a wire-format, they're both coming in as bytes. Is there an encoding or something? I'd rather not see a python-style "mongrel3" transition just to sort out a strings-vs-bytes issue that we left ambiguous early in the design :-) Ryan -- Ryan Kelly http://www.rfk.id.au | This message is digitally signed. Please visit ryan@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
On Mon, Mar 21, 2011 at 10:04:14AM +1100, Ryan Kelly wrote: > So what's the difference between a string and a blob? Since this is a > wire-format, they're both coming in as bytes. Is there an encoding or > something? Good point, maybe say just , and no " and say it's bytes always, with no interpretation? -- Zed A. Shaw http://zedshaw.com/
On 2011-03-21 00:11, Zed A. Shaw wrote:
> On Mon, Mar 21, 2011 at 10:04:14AM +1100, Ryan Kelly wrote:
>> So what's the difference between a string and a blob? Since this is a
>> wire-format, they're both coming in as bytes. Is there an encoding or
>> something?
>
> Good point, maybe say just , and no " and say it's bytes always, with no
> interpretation?

Yes please. Starting to distinguish strings from blobs is a really big can of worms. You will get endless problems because so many people don't even know the concept of an encoding. In particular, the length written in the netstring will not match the length given by the encoding-aware len("string").

Please, no encoding dependency at the protocol level.

loïc
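The length mismatch described above is easy to reproduce. The sketch below assumes Python 3 and UTF-8 as the encoding, neither of which the thread specifies; the variable names are only for illustration.

```python
# "lo\u00efc" is "loïc": four characters, but 'ï' takes two bytes in UTF-8.
text = "lo\u00efc"
encoded = text.encode("utf-8")

char_count = len(text)       # what an encoding-aware len() reports
byte_count = len(encoded)    # what actually goes on the wire

# Framing with the character count declares one byte too few.
wrong = str(char_count).encode("ascii") + b":" + encoded + b","

# Framing with the byte count matches the payload exactly.
right = str(byte_count).encode("ascii") + b":" + encoded + b","

print(char_count, byte_count)
print(wrong)
print(right)
```

A receiver that trusts the prefix in `wrong` reads four bytes and then finds the second byte of 'ï' where it expects the ',' terminator. Treating the payload as bytes only, with the length always counted in bytes, avoids the problem at the protocol level.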
On Sun, 2011-03-20 at 16:11 -0700, Zed A. Shaw wrote: > On Mon, Mar 21, 2011 at 10:04:14AM +1100, Ryan Kelly wrote: > > So what's the difference between a string and a blob? Since this is a > > wire-format, they're both coming in as bytes. Is there an encoding or > > something? > > Good point, maybe say just , and no " and say it's bytes always, with no > interpretation? Or to quote the mongrel2 manual: "Sorry, Unicodians, It's All ASCII" Cheers, Ryan -- Ryan Kelly http://www.rfk.id.au | This message is digitally signed. Please visit ryan@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
It's not ascii though, just make it 8 bit clean and everybody's happy. Our wire formats may be ascii, or not, it shouldn't matter to the protocol. On Mon, Mar 21, 2011 at 10:14 AM, Ryan Kelly <ryan@rfk.id.au> wrote: > On Sun, 2011-03-20 at 16:11 -0700, Zed A. Shaw wrote: > > On Mon, Mar 21, 2011 at 10:04:14AM +1100, Ryan Kelly wrote: > > > So what's the difference between a string and a blob? Since this is a > > > wire-format, they're both coming in as bytes. Is there an encoding or > > > something? > > > > Good point, maybe say just , and no " and say it's bytes always, with no > > interpretation? > > Or to quote the mongrel2 manual: "Sorry, Unicodians, It's All ASCII" > > Cheers, > > Ryan > > -- > Ryan Kelly > http://www.rfk.id.au | This message is digitally signed. Please visit > ryan@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details > >