I got the naive tnetstrings working well and now I have a simple speed
test harness going:
http://codepad.org/Uj42SuMo
Preliminary tests, doing the roundtrip test with 300k runs, give
roughly:
JSON:
real 3m24.780s
user 3m24.520s
sys 0m0.020s
tnetstrings:
real 1m5.692s
user 1m5.650s
sys 0m0.010s
SimpleJSON:
real 0m57.427s
user 0m57.350s
sys 0m0.030s
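The codepad harness itself isn't reproduced in this thread. Below is a rough sketch of what a roundtrip benchmark like this might look like. It times `loads(dumps(data))` for any encoder/decoder pair using `timeit`. The sample `DATA` and the name `time_roundtrip` are hypothetical, and only the stdlib `json` module is plugged in here.

```python
import json
import timeit

# Hypothetical sample payload, loosely shaped like a request envelope.
DATA = {
    "headers": {"host": "localhost", "cookie": ["a=1", "b=2"]},
    "path": "/index.html",
    "status": 200,
    "keepalive": True,
}


def time_roundtrip(dumps, loads, data, runs):
    """Return the seconds taken to encode then decode `data` `runs` times."""
    # Sanity check first: a codec that doesn't roundtrip isn't worth timing.
    assert loads(dumps(data)) == data
    return timeit.timeit(lambda: loads(dumps(data)), number=runs)


if __name__ == "__main__":
    elapsed = time_roundtrip(json.dumps, json.loads, DATA, 10_000)
    print(f"json: {elapsed:.3f}s for 10000 roundtrips")
```

Any other codec with the same `dumps`/`loads` shape can be passed in instead. Absolute numbers depend heavily on the machine and interpreter.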
So, the stock JSON in Python sort of sucks, but apparently simplejson
includes a little _speedups.c module that triples its performance.
That's very interesting to find out (and sort of annoying).
The naive tnetstrings, then, is nearly as fast as simplejson with its C
library (about 66s vs. 57s), and 3x the speed of stock JSON. Now the
question is whether that naive implementation can be sped up any, or
whether it's only possible to speed it up with C the way simplejson did.
As for code size, tnetstrings wins by a whole hell of a lot, being
about 1/10th the size.
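A naive implementation stays small because of how the format works. A tnetstring is `LENGTH:PAYLOAD` followed by a one-byte type tag: `,` string, `#` integer, `!` boolean, `~` null, `]` list, `}` dict. Below is a minimal Python 3 sketch of that idea. It is not the code behind the codepad link. The names `dump` and `parse` are hypothetical, floats are left out, and strings always decode as bytes.

```python
def dump(value):
    """Encode a Python value as a tnetstring (bytes)."""
    if value is None:
        payload, tag = b"", b"~"
    elif isinstance(value, bool):  # check bool before int: bool subclasses int
        payload, tag = (b"true" if value else b"false"), b"!"
    elif isinstance(value, int):
        payload, tag = str(value).encode("ascii"), b"#"
    elif isinstance(value, bytes):
        payload, tag = value, b","
    elif isinstance(value, str):
        payload, tag = value.encode("utf-8"), b","
    elif isinstance(value, list):
        payload, tag = b"".join(dump(item) for item in value), b"]"
    elif isinstance(value, dict):
        payload = b"".join(dump(k) + dump(v) for k, v in value.items())
        tag = b"}"
    else:
        raise TypeError(f"cannot encode {type(value).__name__}")
    # The prefix counts payload bytes only, so children must be fully
    # rendered before the parent's length can be written.
    return str(len(payload)).encode("ascii") + b":" + payload + tag


def parse(data):
    """Decode one tnetstring from the front of `data`; return (value, rest)."""
    length, _, rest = data.partition(b":")
    n = int(length)
    # Slice by length: no scanning for delimiters or quotes.
    payload, tag, rest = rest[:n], rest[n:n + 1], rest[n + 1:]
    if tag == b",":
        return payload, rest
    if tag == b"#":
        return int(payload), rest
    if tag == b"!":
        return payload == b"true", rest
    if tag == b"~":
        if payload:
            raise ValueError("payload must be 0 length for null")
        return None, rest
    if tag == b"]":
        items = []
        while payload:
            item, payload = parse(payload)
            items.append(item)
        return items, rest
    if tag == b"}":
        result = {}
        while payload:
            key, payload = parse(payload)
            result[key], payload = parse(payload)
        return result, rest
    raise ValueError(f"invalid payload type: {tag!r}")
```

For example, `dump([1, True, None])` gives `b"14:1:1#4:true!0:~]"`, and `parse` on that returns the list plus an empty remainder.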
Finally, the real test will be how it fares in C. The whole purpose is
really to have a fast thing that Mongrel2 can use for "internal" stuff
like proxy handlers, control port stuff, etc.
--
Zed A. Shaw
http://zedshaw.com/
On 17:19 Sun 20 Mar, Zed A. Shaw wrote:
> I got the naive tnetstrings working well and now I have a simple speed
> test harness going:

I said this on IRC, but I'm posting it here for the record.

I've implemented a simple tnetstrings library for Common Lisp here:

https://github.com/jasom/tnetstring

Note that for deeply nested structures, writing out tnetstrings is going
to be slower than JSON, since tnetstrings needs to render all child
structures before it can write out the length of the parent structure.

On the other hand, decoding deeply nested tnetstrings is a big win over
JSON, since you don't have to search for (possibly quoted) delimiters.

I don't think this will affect Mongrel2 much, though, since I think the
maximum depth Mongrel2 is going to encode should be 3:

    headers dictionary:
        cookies dictionary:
            list of cookies

-Jason

P.S. If you want to see the asymmetry in relative performance on
encode/decode, try out something like:

"243:238:233:228:223:218:213:208:203:198:193:188:183:178:173:168:163:158:153:148:143:138:133:128:123:118:113:108:103:99:95:91:87:83:79:75:71:67:63:59:55:51:47:43:39:35:31:27:23:19:15:11:hello-there,]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]"
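Below is a sketch of the encode side of that asymmetry. It assumes a stripped-down encoder that handles only byte strings and lists; the helper `nested` is hypothetical. Wrapping `b"hello-there"` in 51 one-element lists produces the same chain of length prefixes as Jason's example (243, 238, ... 15, 11). Each level has to join everything beneath it just to learn its own length.

```python
def dump(value):
    """Encode bytes or (nested) lists of bytes as a tnetstring."""
    if isinstance(value, bytes):
        payload, tag = value, b","
    elif isinstance(value, list):
        # Every child is rendered in full before the parent's length
        # prefix is known, creating one temporary bytes object per level.
        payload, tag = b"".join(dump(item) for item in value), b"]"
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return str(len(payload)).encode("ascii") + b":" + payload + tag


def nested(depth, leaf=b"hello-there"):
    """Wrap `leaf` in `depth` single-element lists (hypothetical helper)."""
    value = leaf
    for _ in range(depth):
        value = [value]
    return value


encoded = dump(nested(51))
print(encoded[:20])  # b'243:238:233:228:223:'
```

Decoding goes the other way. Each level reads its prefix and slices the payload, so it never has to scan for a closing delimiter.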
On Tue, 2011-03-22 at 15:24 -0700, Jason Miller wrote:
> Note that for deeply-nested structures, writing out tnetstrings is going
> to be slower than json, since tnetstrings needs to render all children
> structures before it can write-out the length of the parent structure.

Yeah, that's a pain. The pure-Python version has the same problem: it
has to generate lots of little string objects to figure out their
lengths, then join them all together.

A trick that seems to work nicely in the C-extension version is to
render everything in reverse, i.e. you output the typecode tag, then
recursively render the nested structures in reverse, then write the
length (also reversed). This lets you write everything into one big
char* buffer, then do a single in-place reverse to fix it up at the end.
Not sure how feasible this would be in higher-level languages.

Interestingly, the cjson module also renders JSON by producing lots of
little strings and joining them all together, even though it doesn't
have to. Because of this, _tnetstring absolutely *smokes* cjson
rendering your deeply nested list example: almost 85% speedup on my
machine.

Cheers,

Ryan

-- 
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ryan@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
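Below is a Python sketch of the reverse-rendering idea, limited to byte strings and lists. Ryan's version is in C. The names `_render_reversed` and `dump_reversed` are hypothetical. Each value is written tag first, then its children in reverse order, then `:` and the length digits, all back to front. A single `bytearray.reverse()` at the end turns the buffer into the normal encoding.

```python
def _render_reversed(value, buf):
    """Append the tnetstring for `value` to `buf`, back to front."""
    if isinstance(value, bytes):
        buf.append(ord(","))          # type tag goes in first...
        start = len(buf)
        buf.extend(value[::-1])       # ...then the payload, reversed
    elif isinstance(value, list):
        buf.append(ord("]"))
        start = len(buf)
        for item in reversed(value):  # last child first
            _render_reversed(item, buf)
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    # Children are already in the buffer, so the payload length is just how
    # far the buffer has grown; no per-level joined string is built.
    length = len(buf) - start
    buf.extend(b":")
    buf.extend(str(length).encode("ascii")[::-1])  # digits reversed too


def dump_reversed(value):
    """Render into one buffer, then fix it up with one in-place reverse."""
    buf = bytearray()
    _render_reversed(value, buf)
    buf.reverse()
    return bytes(buf)
```

For example, `dump_reversed([b"a", b"bc"])` yields `b"9:1:a,2:bc,]"`, the same bytes a forward encoder produces. In Python, the `[::-1]` slices still allocate, so this mostly shows the idea rather than promising a speedup.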
http://dpaste.de/dFOH/

Here's a simple PHP one. It's 10 times slower than the native JSON
parser, which I assume is written in C.
On 2011-03-23 06:46, Tordek wrote:
> http://dpaste.de/dFOH/
>
> Here's a simple PHP one. It's 10 times slower than the native JSON
> parser, which I assume is done in C.

Great! I will work on it to find the hot spots and see if we can do
better. Anyway, I will create a PHP C extension for tnetstrings.

Yes, json_encode and json_decode are written in C. Also note that
json_encode is not binary-safe in PHP, whereas tnetstrings are.

loïc
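The binary-safety point comes straight from the length prefix: the decoder slices the payload by length and never inspects its contents. The sketch below shows this in Python rather than PHP. The helper names `dump_bytes` and `parse_bytes` are hypothetical. It roundtrips all 256 byte values, including NUL, `:`, `,` and `]`. For contrast, Python's stdlib `json.dumps` raises `TypeError` on a raw `bytes` object.

```python
import json


def dump_bytes(payload):
    """Encode a byte string as a tnetstring: LENGTH ':' PAYLOAD ','."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def parse_bytes(data):
    """Decode one byte-string tnetstring; return (payload, remaining bytes)."""
    length, sep, rest = data.partition(b":")
    if not sep:
        raise ValueError("missing ':' after length prefix")
    n = int(length)
    # The payload is sliced by length, so its contents are never examined.
    if rest[n:n + 1] != b",":
        raise ValueError("not a string tnetstring")
    return rest[:n], rest[n + 1:]


blob = bytes(range(256))  # every byte value, including NUL, ':', ',' and ']'
decoded, rest = parse_bytes(dump_bytes(blob))
assert decoded == blob and rest == b""

# The stdlib json module, by contrast, refuses raw bytes outright.
try:
    json.dumps(blob)
except TypeError as exc:
    print("json:", exc)
```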
On Wed, 2011-03-23 at 08:52 +0100, Loic d'Anterroches wrote:
> Great! I will work on it to find the hot spots and see if we can do
> better. Anyway, I will create a PHP C extension for tnetstrings.

I've put my Python C-extension up on GitHub; feel free to cannibalise
any of it if you want. The stuff in tns_core.c should be fairly
re-usable:

https://github.com/rfk/tnetstring/blob/master/tnetstring/tns_core.c

Cheers,

Ryan
On Wed, Mar 23, 2011 at 07:11:28PM +1100, Ryan Kelly wrote:
> I've put my python C-extension up on github, feel free to cannibalise
> any of it if you want. The stuff in tns_core.c should be fairly
> re-usable:

Damn, I gotta read the mailing list more often. I'll check this out
more and maybe ditch what I've got.
On 2011-03-23 09:11, Ryan Kelly wrote:
> I've put my python C-extension up on github, feel free to cannibalise
> any of it if you want. The stuff in tns_core.c should be fairly
> re-usable:
>
> https://github.com/rfk/tnetstring/blob/master/tnetstring/tns_core.c

Thanks a lot! And your code is a pleasure to read: nicely indented,
with nice spacing and vertical rhythm.

loïc
On 20/03/11 21:19, Zed A. Shaw wrote:
> Now the question is if that naive implementation can be sped up any,
> or is it only possible to speed it up with C the way SimpleJSON did.

With some trivial optimizations (still naïve; just faster naïve) I've
gotten a ~10% speedup that now reliably beats simplejson on my machine
(seconds for 100,000 runs):

    simplejson: 14.1191418171
    Zed's:      15.4922268391
    Tordek's:   14.0526180267

http://codepad.org/4pnvwqCd

Each is tested with:

    print "Tordek's: ", timeit.timeit("thrash_tnetstrings()",
        "from tnetstr import thrash_tnetstrings", number=100000)

or similar.

-- 
Guillermo O. «Tordek» Freschi. Programmer, Writer, Evil Genius.
http://tordek.com.ar :: http://twitter.com/tordek
http://www.arcanopedia.com.ar - Role-playing games in Argentina
On Mon, Mar 21, 2011 at 03:03:29PM -0300, Tordek wrote:
> With some trivial optimizations (still naïve; just faster naïve)
> I've gotten ~10% speedup that now reliably beats simplejson on my
> machine:

Cool, I'll incorporate this when I hack on it tonight.
On 21/03/11 19:33, Zed A. Shaw wrote:
> Cool, I'll incorporate this when I hack on it tonight.

And here's Darren Rush's idea, which skips an assignment; more
repetition, but another 10% or so:

    # parse_payload, parse_dict and parse_list are the existing helpers
    # from the tnetstrings module.
    def parse_tnetstring(data):
        payload, payload_type, remain = parse_payload(data)

        # Return directly from each branch instead of assigning first.
        if payload_type == ',':
            return payload, remain
        elif payload_type == '#':
            return int(payload), remain
        elif payload_type == '!':
            return payload == 'true', remain
        elif payload_type == '~':
            assert len(payload) == 0, "Payload must be 0 length for null."
            return None, remain
        elif payload_type == '}':
            return parse_dict(payload), remain
        elif payload_type == ']':
            return parse_list(payload), remain
        else:
            assert False, "Invalid payload type: %r" % payload_type
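To check the effect Darren's change aims for on a given interpreter, here is a hypothetical micro-benchmark. It assumes the earlier code stored each branch's result in a variable and returned once at the end; that version isn't shown in this thread. `decode_assign`, `decode_return`, `CASES` and `bench` are illustrative names. Lists and dicts are left out, and any difference will vary with the Python version.

```python
import timeit


def decode_assign(payload, payload_type, remain):
    """Assign-then-return style: every branch stores into `value`."""
    if payload_type == ",":
        value = payload
    elif payload_type == "#":
        value = int(payload)
    elif payload_type == "!":
        value = payload == "true"
    elif payload_type == "~":
        value = None
    else:
        raise ValueError(f"Invalid payload type: {payload_type!r}")
    return value, remain


def decode_return(payload, payload_type, remain):
    """Darren's style: return straight out of each branch."""
    if payload_type == ",":
        return payload, remain
    elif payload_type == "#":
        return int(payload), remain
    elif payload_type == "!":
        return payload == "true", remain
    elif payload_type == "~":
        return None, remain
    raise ValueError(f"Invalid payload type: {payload_type!r}")


# (payload, payload_type, remain) triples, as parse_payload would return.
CASES = [("hello", ",", ""), ("42", "#", ""), ("true", "!", ""), ("", "~", "")]


def bench(fn, number=200_000):
    """Seconds to run `fn` over all sample cases `number` times."""
    return timeit.timeit(lambda: [fn(*case) for case in CASES], number=number)


if __name__ == "__main__":
    print("assign:", bench(decode_assign))
    print("return:", bench(decode_return))
```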
I get mostly similar results. Runs of 50k (I'm impatient), like:

    [bpowers@vyse tns]$ time python zed.py tns 50000

(using 'user' time)

    [bpowers@vyse tns]$ python --version
    Python 2.7.1

    Zed:    0m8.486s
    Tordek: 0m7.708s
    Darren: 0m7.544s

But json got a lot faster:

    json:       0m7.510s
    simplejson: 0m7.320s (simplejson v2.1.3)

Wow! json tripled in speed, and is now almost as fast as simplejson!
This perplexed me until I found this gem in the Python 2.7 release
notes:

  * Updated module: The json module was upgraded to version 2.0.9 of the
    simplejson package, which includes a C extension that makes encoding
    and decoding faster. (Contributed by Bob Ippolito; issue 4136.)

yours,
Bobby

On Mon, Mar 21, 2011 at 4:36 PM, Tordek <kedrot@gmail.com> wrote:
> And here's Darren Rush's idea, which skips an assigment; more
> repetition, but another 10% or so:
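Whether you're getting the C-accelerated `json` depends on whether CPython's optional `_json` extension loaded. The sketch below is written for Python 3. It reads the module-level fallbacks that CPython's `json` package sets to `None` when `_json` is unavailable: `json.scanner.c_make_scanner`, `json.decoder.c_scanstring` and `json.encoder.c_make_encoder`. These are internal names, not a documented API, so they could change between releases. The function name `json_speedups` is hypothetical.

```python
import json.decoder
import json.encoder
import json.scanner


def json_speedups():
    """Report which of the json package's optional C accelerators loaded."""
    # Each attribute is None when the corresponding _json import failed
    # and the pure-Python fallback is in use.
    return {
        "scanner": json.scanner.c_make_scanner is not None,
        "scanstring": json.decoder.c_scanstring is not None,
        "encoder": json.encoder.c_make_encoder is not None,
    }


if __name__ == "__main__":
    print(json_speedups())
```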