librelist archives

Proposing An Alternative To JSON

From:
Zed A. Shaw
Date:
2011-03-20 @ 19:12
Hey, so I'm tinkering with the idea of a proxy as a handler, but
realizing that we sort of need a second protocol that's faster and
easier to parse in C (and others) than JSON.  JSON's great for getting
things going, but I think we can support both JSON and another protocol.

As an idea, I cooked up "tagged netstrings".  This is simply the idea
that you encode JSON style data as a sequence of nested netstrings with
their character terminators saying what's inside.

Here's a python implementation of parsing it:

http://codepad.org/xct0E5ac

This needs to have one more type of Blob with ',' terminator so that
it's backward compatible with regular netstrings, and you can transmit
binary data safely on platforms where that's not possible (javascript),
but otherwise this is very easy to parse and generate.
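
For readers without access to the codepad link, here is a minimal Python sketch of the idea. It is not the linked implementation: the function name `parse` is made up for illustration, and the tag set is limited to the characters that appear in this thread (`,` and `"` for strings/blobs, `#` for integers, `]` for lists, `}` for dictionaries).

```python
def parse(data: bytes):
    """Parse one tagged netstring from the front of data.

    Returns (value, remaining_bytes). Layout assumed: <length>:<payload><tag>
    """
    length, sep, rest = data.partition(b":")
    if not sep or not length.isdigit():
        raise ValueError("missing or malformed length prefix")
    size = int(length)
    payload, tag, remaining = rest[:size], rest[size:size + 1], rest[size + 1:]

    if tag in (b",", b'"'):          # blob / string: payload is returned as-is
        return payload, remaining
    if tag == b"#":                  # integer written in decimal
        return int(payload), remaining
    if tag == b"]":                  # list: payload is a run of tagged netstrings
        items = []
        while payload:
            item, payload = parse(payload)
            items.append(item)
        return items, remaining
    if tag == b"}":                  # dict: payload alternates key, value
        result = {}
        while payload:
            key, payload = parse(payload)
            value, payload = parse(payload)
            result[key] = value
        return result, remaining
    raise ValueError(f"unknown or missing type tag: {tag!r}")


value, rest = parse(b'34:5:hello"22:11:12345678901#4:this"]}')
print(value, rest)
```

Because `,` is treated as a blob tag, a regular netstring such as `5:hello,` goes through the same function unchanged.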

Can someone in another language try to replicate this and see how hard
it is?  I'll be doing C next as a test, but I'd like to see a few others
to compare.

Also, no, we won't use protocol bufs, BIRT, or others since those are
hard as hell to parse compared to this and probably don't buy much in
terms of speed for the usability costs.

Thanks!

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Proposing An Alternative To JSON

From:
Ciprian Dorin Craciun
Date:
2011-03-22 @ 17:15
On Sun, Mar 20, 2011 at 21:12, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> [...]
> Also, no, we won't use protocol bufs, BIRT, or others since those are
> hard as hell to parse compared to this and probably don't buy much in
> terms of speed for the usability costs.
>
> Thanks!
>
> --
> Zed A. Shaw
> http://zedshaw.com/


    I apologize for another off-topic email, but I've seen suggestions
for most serialization formats, except the one proposed by Joe
Armstrong, UBF:
        http://www.sics.se/~joe/ubf/site/home.html
        (or a nice doc) http://norton.github.com/ubf/ubf-user-guide.en.html

    It is ASCII based -- thus easily parsable (at least by a computer,
as the syntax is pre-fixed) -- but at the same time it offers both
native binary payload support, and type hinting, being as expressive
as JSON.

    One such Python implementation I've found below, but as you'll see
it's trivial to implement.
    http://www.eighty-twenty.org/~tonyg/Darcs/ubf/python/ubf.py

    Ciprian.

Re: [mongrel2] Proposing An Alternative To JSON

From:
Hedge Hog
Date:
2011-03-21 @ 00:35
On Mon, Mar 21, 2011 at 6:12 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> Hey, so I'm tinkering with the idea of a proxy as a handler, but
> realizing that we sort of need a second protocol that's faster and
> easier to parse in C (and others) than JSON.  JSON's great for getting
> things going, but I think we can support both JSON and another protocol.
>
> As an idea, I cooked up "tagged netstrings".  This is simply the idea
> that you encode JSON style data as a sequence of nested netstrings with
> their character terminators saying what's inside.
>
> Here's a python implementation of parsing it:
>
> http://codepad.org/xct0E5ac
>
> This needs to have one more type of Blob with ',' terminator so that
> it's backward compatible with regular netstrings, and you can transmit
> binary data safely on platforms where that's not possible (javascript),
> but otherwise this is very easy to parse and generate.
>
> Can someone in another language try to replicate this and see how hard
> it is?  I'll be doing C next as a test, but I'd like to see a few others
> to compare.
>
> Also, no, we won't use protocol bufs, BIRT, or others since those are
> hard as hell to parse compared to this and probably don't buy much in
> terms of speed for the usability costs.

I am curious about whether you considered extprot.
I appreciate the need to keep it simple, and for speed - for most
people those are traded-off against functionality.
But maybe the mongrel2 proxy/handler could add value by taking care of
the serialization and de-serialization steps.
The added complexity is a protocol definition file, and whether that
is worth it...

Anyway, some might find extprot of use elsewhere in their stack, so
hopefully this mail is not pure noise.
HTH

>
> Thanks!
>
> --
> Zed A. Shaw
> http://zedshaw.com/
>



-- 
πόλλ' οἶδ ἀλώπηξ, ἀλλ' ἐχῖνος ἓν μέγα
[The fox knows many things, but the hedgehog knows one big thing.]
  Archilochus, Greek poet (c. 680 BC – c. 645 BC)
http://wiki.hedgehogshiatus.com

Re: [mongrel2] Proposing An Alternative To JSON

From:
Hedge Hog
Date:
2011-03-21 @ 02:05
On Mon, Mar 21, 2011 at 11:35 AM, Hedge Hog <hedgehogshiatus@gmail.com> wrote:
> On Mon, Mar 21, 2011 at 6:12 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
>> Hey, so I'm tinkering with the idea of a proxy as a handler, but
>> realizing that we sort of need a second protocol that's faster and
>> easier to parse in C (and others) than JSON.  JSON's great for getting
>> things going, but I think we can support both JSON and another protocol.
>>
>> As an idea, I cooked up "tagged netstrings".  This is simply the idea
>> that you encode JSON style data as a sequence of nested netstrings with
>> their character terminators saying what's inside.
>>
>> Here's a python implementation of parsing it:
>>
>> http://codepad.org/xct0E5ac
>>
>> This needs to have one more type of Blob with ',' terminator so that
>> it's backward compatible with regular netstrings, and you can transmit
>> binary data safely on platforms where that's not possible (javascript),
>> but otherwise this is very easy to parse and generate.
>>
>> Can someone in another language try to replicate this and see how hard
>> it is?  I'll be doing C next as a test, but I'd like to see a few others
>> to compare.
>>
>> Also, no, we won't use protocol bufs, BIRT, or others since those are
>> hard as hell to parse compared to this and probably don't buy much in
>> terms of speed for the usability costs.
>
> I am curious about whether you considered extprot.

Apologies. I should have included this link:
https://github.com/mfp/extprot

> I appreciate the need to keep it simple, and for speed - for most
> people those are traded-off against functionality.
> But maybe the mongrel2 proxy/handler could add value by taking care of
> the serialization and de-serialization steps.
> The added complexity is a protocol definition file, and whether that
> is worth it...
>
> Anyway, some might find extprot of use elsewhere in their stack, so
> hopefully this mail is not pure noise.
> HTH



-- 
πόλλ' οἶδ ἀλώπηξ, ἀλλ' ἐχῖνος ἓν μέγα
[The fox knows many things, but the hedgehog knows one big thing.]
  Archilochus, Greek poet (c. 680 BC – c. 645 BC)
http://wiki.hedgehogshiatus.com

Re: [mongrel2] Proposing An Alternative To JSON

From:
S. Günther
Date:
2011-03-21 @ 03:10
Here's a pretty horrible transliteration to haskell:

http://codepad.org/wDGmjjUc

Sorry for the ugliness. (But it seems to work.)

I would also like to note that the "bencode" encoding used in the
bittorrent protocol looks kind of similar to the proposed typed
netstrings. There are some key differences though and if those rule out
the format completely, I apologise for adding to the growing list of
proposed alternatives.

kind regards
Stephan Günther

--------------------------------------------------------------------------------

Re: [mongrel2] Proposing An Alternative To JSON

From:
Zed A. Shaw
Date:
2011-03-21 @ 04:12
On Mon, Mar 21, 2011 at 04:10:25AM +0100, S. Günther wrote:
> Here's a pretty horrible transliteration to haskell:
> 
> http://codepad.org/wDGmjjUc

Super cool, so far it's looking pretty good for implementation.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Proposing An Alternative To JSON

From:
joshua simmons
Date:
2011-03-21 @ 04:24
As a side note, you actually don't have to have the protobuf definition to
use protocol buffers; there's enough data in the format to read it off the
wire without needing any nasty code gen. It's just not how Google wants it.

On Mon, Mar 21, 2011 at 3:12 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:

> On Mon, Mar 21, 2011 at 04:10:25AM +0100, S. Günther wrote:
> > Here's a pretty horrible transliteration to haskell:
> >
> > http://codepad.org/wDGmjjUc
>
> Super cool, so far it's looking pretty good for implementation.
>
> --
> Zed A. Shaw
> http://zedshaw.com/
>

Re: [mongrel2] Proposing An Alternative To JSON

From:
Alex Gartrell
Date:
2011-03-22 @ 18:36
+1 here.  I implemented a google protocol buffer parser in C for a
research project (and then we switched to Thrift, which is more of the
same but more limited), so it's doable.  Varint parsing in a language
like javascript seems like it would blow though.  The other thing to
note is that field ids are used rather than field names, so you'll
have to keep "Field 3 = the IP field" available somewhere, which can
be a little less clear/convenient than IP=... (Keep in mind I'm
talking about field with ID 3 rather than the third field, protobufs
allow you to omit or reorder fields).
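
To make the varint and field-id points concrete, here is a small Python sketch of base-128 varint decoding as used on the protocol buffers wire. The function name `decode_varint` is hypothetical; the example bytes assume a field with ID 3 carrying the integer 300.

```python
def decode_varint(data: bytes, pos: int = 0):
    """Decode one base-128 varint starting at data[pos].

    Returns (value, next_position). Each byte carries 7 payload bits,
    least significant group first; the high bit means "more bytes follow".
    Raises IndexError if the input ends in the middle of a varint.
    """
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


wire = b"\x18\xac\x02"               # key byte, then the value 300 as a varint
key, pos = decode_varint(wire)
field_id, wire_type = key >> 3, key & 0x07
value, pos = decode_varint(wire, pos)
print(field_id, wire_type, value)    # the receiver only learns "field 3", not a name
```

The masking and shifting is the bit-twiddling in question, and the decoded key yields a number rather than a name, so the mapping from field 3 to "IP" has to live somewhere outside the message.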

I think we can go with either of <length> <content> <type> or <length>
<type> <content>.  LTC is the pattern used by Proto bufs and Thrift,
but, in practice, there's no difference because you've read the whole
thing into memory anyway, and you're just skipping ahead by N bytes.



On Mon, Mar 21, 2011 at 12:24 AM, joshua simmons <simmons.44@gmail.com> wrote:
> As a side note, you actually don't have to have the protobuf to use protocol
> buffers, there's enough data in the format to read it off the wire without
> needing any nasty code gen. It's just not how google want it.
>
> On Mon, Mar 21, 2011 at 3:12 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
>>
>> On Mon, Mar 21, 2011 at 04:10:25AM +0100, S. Günther wrote:
>> > Here's a pretty horrible transliteration to haskell:
>> >
>> > http://codepad.org/wDGmjjUc
>>
>> Super cool, so far it's looking pretty good for implementation.
>>
>> --
>> Zed A. Shaw
>> http://zedshaw.com/
>
>

Re: [mongrel2] Proposing An Alternative To JSON

From:
James Dennis
Date:
2011-03-21 @ 04:28
Another side note, stomp allows specifying content length which avoids the
escape paradox.

I don't take issue with anything else said regarding stomp and neither did
anyone I forwarded this to. :)


On Mon, Mar 21, 2011 at 12:24 AM, joshua simmons <simmons.44@gmail.com> wrote:

> As a side note, you actually don't have to have the protobuf to use
> protocol buffers, there's enough data in the format to read it off the wire
> without needing any nasty code gen. It's just not how google want it.
>
> On Mon, Mar 21, 2011 at 3:12 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
>
>> On Mon, Mar 21, 2011 at 04:10:25AM +0100, S. Günther wrote:
>> > Here's a pretty horrible transliteration to haskell:
>> >
>> > http://codepad.org/wDGmjjUc
>>
>> Super cool, so far it's looking pretty good for implementation.
>>
>> --
>> Zed A. Shaw
>> http://zedshaw.com/
>>
>
>

Re: [mongrel2] Proposing An Alternative To JSON

From:
Ryan Kelly
Date:
2011-03-21 @ 00:42
On Mon, 2011-03-21 at 11:35 +1100, Hedge Hog wrote:
> On Mon, Mar 21, 2011 at 6:12 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> > Hey, so I'm tinkering with the idea of a proxy as a handler, but
> > realizing that we sort of need a second protocol that's faster and
> > easier to parse in C (and others) than JSON.  JSON's great for getting
> > things going, but I think we can support both JSON and another protocol.
> >
> > As an idea, I cooked up "tagged netstrings".  This is simply the idea
> > that you encode JSON style data as a sequence of nested netstrings with
> > their character terminators saying what's inside.
> >
> > Here's a python implementation of parsing it:
> >
> > http://codepad.org/xct0E5ac
> >
> > This needs to have one more type of Blob with ',' terminator so that
> > it's backward compatible with regular netstrings, and you can transmit
> > binary data safely on platforms where that's not possible (javascript),
> > but otherwise this is very easy to parse and generate.
> >
> > Can someone in another language try to replicate this and see how hard
> > it is?  I'll be doing C next as a test, but I'd like to see a few others
> > to compare.
> >
> > Also, no, we won't use protocol bufs, BIRT, or others since those are
> > hard as hell to parse compared to this and probably don't buy much in
> > terms of speed for the usability costs.
> 
> I am curious about whether you considered extprot.
> I appreciate the need to keep it simple, and for speed - for most
> people those are traded-off against functionality.
> But maybe the mongrel2 proxy/handler could add value by taking care of
> the serialization and de-serialization steps.
> The added complexity is a protocol definition file, and whether that
> is worth it...
> 
> Anyway, some might find extprot of use elsewhere in their stack, so
> hopefully this mail is not pure noise.
> HTH

I love extprot, and in fact I maintain the python implementation.
Anyone considering something like protobuf or thrift should definitely
give it a look.

But I think it falls squarely in the "parsing is too complicated" camp
for the uses that Zed has in mind here.  Anything that involves
bit-twiddling is probably out of the question.


  Ryan

-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] Proposing An Alternative To JSON

From:
Zed A. Shaw
Date:
2011-03-21 @ 04:11
On Mon, Mar 21, 2011 at 11:42:32AM +1100, Ryan Kelly wrote:
> I love extprot, and in fact I maintain the python implementation.
> Anyone considering something like protobuf or thrift should definitely
> give it a look.
> 
> But I think it falls squarely in the "parsing is too complicated" camp
> for the uses that Zed has in mind here.  Anything that involves
> bit-twiddling is probably out of the question.

It also falls into the "type safety is impossible in network protocols" camp.
Protobufs and extprot make you "compile" the protocol:

(* this is a comment (* and this a nested comment *) *)
message user = {
  id : int;
  name : string;
}

In every protocol that's like this (corba, dce, dcom, onc-rpc, etc.) it
becomes nearly impossible to upgrade the protocol if you add fields.
You end up having to add version numbers and altering the protocol to
handle various versions and stubs.  Eventually it becomes a nightmare to
coordinate the release of these protocols.  It's this combination of
structure and semantics that doesn't work because the semantics usually
have to change over time, but the structure usually doesn't.

By comparison, protocols that work well and last are ones that define
structure but not semantics.  Take JSON as an example.  It defines
structures, but not what goes in them so the semantics are left to me.
If I add a field to a hashmap, it'll still get processed by the receiver
and most older clients can just ignore it.  The semantics are controlled
at the application layer and not at the protocol layer so it degrades
better and stands up longer.
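
A rough Python illustration of that point, reusing the `id` and `name` fields from the example above. The `email` field and the `read_user` helper are invented for the sketch; the idea is only that an older receiver picks out what it knows and ignores the rest.

```python
import json

# Fields the "old" receiver knows about; anything else is ignored.
KNOWN_FIELDS = ("id", "name")


def read_user(raw: str) -> dict:
    """Decode a JSON object and keep only the fields this receiver understands."""
    message = json.loads(raw)
    return {field: message[field] for field in KNOWN_FIELDS}


old_message = '{"id": 1, "name": "zed"}'
new_message = '{"id": 1, "name": "zed", "email": "zed@example.com"}'

# The newer sender added a field, and the older receiver still works.
print(read_user(old_message) == read_user(new_message))
```

No version number or regenerated stub is involved; what the extra field means is decided in application code.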

A good way to describe the above is if you had to compile all your HTTP
client requests to match exactly what the server expected, right down to
the URLs and header contents.  It'd get pretty impossible to make the
web work if that were the case.

Finally, the myth is that this is faster, but there's rarely any
evidence to back this up.  Usually the few metrics showing speed are
just for simple stuff like the above and not for anything that's deeply
nested and connected.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Proposing An Alternative To JSON

From:
Ryan Kelly
Date:
2011-03-21 @ 04:33
On Sun, 2011-03-20 at 21:11 -0700, Zed A. Shaw wrote:
> On Mon, Mar 21, 2011 at 11:42:32AM +1100, Ryan Kelly wrote:
> > I love extprot, and in fact I maintain the python implementation.
> > Anyone considering something like protobuf or thrift should definitely
> > give it a look.
> > 
> > But I think it falls squarely in the "parsing is too complicated" camp
> > for the uses that Zed has in mind here.  Anything that involves
> > bit-twiddling is probably out of the question.
> 
> It also falls into the "type safety is impossible in network protocols".
> Protobufs and extprot make you "compile" the protocol:
> 
> (* this is a comment (* and this a nested comment *) *)
> message user = {
>   id : int;
>   name : string;
> }
>
> In every protocol that's like this (corba, dce, dcom, onc-rpc, etc.) it
> becomes nearly impossible to upgrade the protocol if you add fields.
> You end up having to add version numbers and altering the protocol to
> handle various versions and stubs.  Eventually it becomes a nightmare to
> coordinate the release of these protocols.  It's this combination of
> structure and semantics that doesn't work because the semantics usually
> have to change over time, but the structure usually doesn't.

While there is a compilation step in extprot, you don't need to have the
type definition to understand the message.  You can decode an arbitrary
extprot message into a "skeleton" very much like you'd get out of JSON
(a list of ints, a hashmap, a five-element tuple, etc)

So the message encodes its own structure, and the compiled type
definition provides the intended semantics by mapping the raw structure
into your application domain.

The "ext" in extprot stands for "extensible" and it has well-defined
allowances for extending the protocol while maintaining both backwards-
and forwards-compatibility:

   http://eigenclass.org/R2/writings/protocol-extension-with-extprot


I still think it's wholly unsuited for this use-case though.

> Finally, the myth is that this is faster, but there's rarely any
> evidence to back this up.  Usually the few metrics showing speed are
> just for simple stuff like the above and not for anything that's deeply
> nested and connected.

Yep, in extprot's case at least there is basically no speed advantage
derived from compiling the protocol definition.

The only case where it wins you any speed is if the underlying message
doesn't match the typedef; then you can bail out sooner than if you had
to parse it all and validate at the end.  Not exactly a common case.

Plus, in a high-level interpreted language like Python, any supposed speed
advantages disappear as soon as you need to start bit-twiddling to
decode the embedded type tags in the message.

I think your tnetstrings strike a really nice balance between speed,
compactness, and ease of implementation.


  Ryan


-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] Proposing An Alternative To JSON

From:
Ryan Kelly
Date:
2011-03-20 @ 21:56
could not decode message

Re: [mongrel2] Proposing An Alternative To JSON

From:
Zed A. Shaw
Date:
2011-03-20 @ 22:31
On Mon, Mar 21, 2011 at 08:56:38AM +1100, Ryan Kelly wrote:
> The python was pretty easy to transliterate into javascript,
> implementation attached.  Works as expected, but I doubt it will be
> faster than JSON in this context :-)

Ha, yeah not going to beat JSON on javascript, but definitely easier to
implement.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Proposing An Alternative To JSON

From:
joshua simmons
Date:
2011-03-20 @ 22:39
JSON parsing is a hot spot in mongrel2-lua too. It's not a very fast
protocol to parse and with luajit's ffi a simple protocol parser would be
able to near C's speed.

On Mon, Mar 21, 2011 at 9:31 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:

> On Mon, Mar 21, 2011 at 08:56:38AM +1100, Ryan Kelly wrote:
> > The python was pretty easy to transliterate into javascript,
> > implementation attached.  Works as expected, but I doubt it will be
> > faster than JSON in this context :-)
>
> Ha, yeah not going to beat JSON on javascript, but definitely easier to
> implement.
>
> --
> Zed A. Shaw
> http://zedshaw.com/
>

Re: [mongrel2] Proposing An Alternative To JSON

From:
Samuel Tardieu
Date:
2011-03-20 @ 21:42
2011/3/20 Zed A. Shaw <zedshaw@zedshaw.com>

> Hey, so I'm tinkering with the idea of a proxy as a handler, but
> realizing that we sort of need a second protocol that's faster and
> easier to parse in C (and others) than JSON.  JSON's great for getting
> things going, but I think we can support both JSON and another protocol.
>
> As an idea, I cooked up "tagged netstrings".  This is simply the idea
> that you encode JSON style data as a sequence of nested netstrings with
> their character terminators saying what's inside.
>

It won't play well in embedded systems contexts where you can't use a lot of
memory. With "tagged netstrings" you must receive and store the string
before being able to decode it, even if it ends up being an integer. A
prefix-based type system (instead of a suffix-based one) would let you
decode data as you receive it should you want to do so.
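
A small Python sketch of what a type prefix would allow, assuming a hypothetical `<size>:<type>:<content>` layout with `#` as the integer tag. The function name `read_prefixed_int` is made up. The integer is accumulated digit by digit as bytes arrive, so the payload never needs to be buffered.

```python
def read_prefixed_int(stream) -> int:
    """Decode '<size>:#:<digits>' from an iterator that yields one byte at a time.

    Only non-negative integers are handled in this sketch.
    """
    size = 0
    for byte in stream:                      # read the decimal size up to ':'
        if byte == ord(":"):
            break
        if not ord("0") <= byte <= ord("9"):
            raise ValueError("size must be decimal digits")
        size = size * 10 + (byte - ord("0"))

    if next(stream) != ord("#"):             # the type is known before any content
        raise ValueError("not an integer")
    if next(stream) != ord(":"):
        raise ValueError("expected ':' after the type tag")

    value = 0
    for _ in range(size):                    # fold each digit in as it arrives
        byte = next(stream)
        if not ord("0") <= byte <= ord("9"):
            raise ValueError("integer content must be decimal digits")
        value = value * 10 + (byte - ord("0"))
    return value


print(read_prefixed_int(iter(b"5:#:12345")))
```

With the tag at the end, the same receiver would have to hold all `size` bytes before learning that they spell an integer.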

Re: [mongrel2] Proposing An Alternative To JSON

From:
Ryan Kelly
Date:
2011-03-20 @ 22:01
On Sun, 2011-03-20 at 22:42 +0100, Samuel Tardieu wrote:
> 
> 
> 2011/3/20 Zed A. Shaw <zedshaw@zedshaw.com>
> 
>         Hey, so I'm tinkering with the idea of a proxy as a handler,
>         but
>         realizing that we sort of need a second protocol that's faster
>         and
>         easier to parse in C (and others) than JSON.  JSON's great for
>         getting
>         things going, but I think we can support both JSON and another
>         protocol.
>         
>         As an idea, I cooked up "tagged netstrings".  This is simply
>         the idea
>         that you encode JSON style data as a sequence of nested
>         netstrings with
>         their character terminators saying what's inside.
> 
> It won't play well in embedded systems contexts where you can't use a
> lot of memory. With "tagged netstrings" you must receive and store the
> string before being able to decode it, even if it ends up being an
> integer. A prefix-based type system (instead of a suffix-based one)
> would let you decode data as you receive it should you want to do so.

True, but doesn't 0mq force you to receive the whole message at once
anyway?  Or is there a way to incrementally read the message that I
haven't come across?


   Ryan

-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] Proposing An Alternative To JSON

From:
Tordek
Date:
2011-03-20 @ 22:41
I'm gonna go ahead and agree with everyone that's rooting for
prefixes. Now, you can be even bolder and replace the colon separator
for the type character. This saves a few characters, one main problem:
The strings are a bit less readable (it's hard to see where a number
ends and where and the next thing begins for a human).

Eg:

"0{" : {},
"0[" : [],
'34{5"hello22[11#123456789014"this': {'hello': [12345678901, 'this']},
'5#12345' : 12345
'0"' : ""
'24[5#123455#678905"xxxxx' : [12345, 67890, 'xxxxx']


But it should be relatively easy to parse in C.
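
A quick Python sketch of an encoder for this colon-free variant, assuming the tags shown above (`{`, `[`, `#`, `"`) and ASCII-only strings. The name `dump_prefixed` is invented for illustration. Each length counts only the payload that follows the type character, so some of the numbers it produces differ from the examples above, whose lengths look as if they were carried over from the colon-and-terminator encoding.

```python
def dump_prefixed(value) -> str:
    """Encode value as <length><type><payload>, with no ':' and no trailing tag."""
    if isinstance(value, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, int):
        body, tag = str(value), "#"
    elif isinstance(value, str):
        body, tag = value, '"'               # assumes ASCII, so chars == bytes
    elif isinstance(value, list):
        body, tag = "".join(dump_prefixed(item) for item in value), "["
    elif isinstance(value, dict):
        body = "".join(dump_prefixed(k) + dump_prefixed(v) for k, v in value.items())
        tag = "{"
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return f"{len(body)}{tag}{body}"


print(dump_prefixed({"hello": [12345678901, "this"]}))
print(dump_prefixed([12345, 67890, "xxxxx"]))
```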

Re: [mongrel2] Proposing An Alternative To JSON

From:
Ryan Kelly
Date:
2011-03-20 @ 22:51
On Sun, 2011-03-20 at 19:41 -0300, Tordek wrote:
> I'm gonna go ahead and agree with everyone that's rooting for
> prefixes. Now, you can be even bolder and replace the colon separator
> for the type character. This saves a few characters, one main problem:
> The strings are a bit less readable (it's hard to see where a number
> ends and where and the next thing begins for a human).
> 
> Eg:
> 
> "0{" : {},
> "0[" : [],
> '34{5"hello22[11#123456789014"this': {'hello': [12345678901, 'this']},
> '5#12345' : 12345
> '0"' : ""
> '24[5#123455#678905"xxxxx' : [12345, 67890, 'xxxxx']
> 
> 
> But it should be relatively easy to parse in C.


If maintaining human-scanability is important then you could always
duplicate the type marker at the end:

"0{}" : {},
"0[]" : [],
'34{5"hello"22[11#12345678901#4"this"]}': {'hello': [12345678901, 'this']},
'5#12345#: 12345
'0""' : ""
'24[5#12345#5#67890#5"xxxxx"]' : [12345, 67890, 'xxxxx']


But it starts to look like some sort of zombie length-delimited
whitespace-free JSON encoding.


  Ryan



-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] Proposing An Alternative To JSON

From:
Zed A. Shaw
Date:
2011-03-20 @ 22:46
On Sun, Mar 20, 2011 at 07:41:31PM -0300, Tordek wrote:
> I'm gonna go ahead and agree with everyone that's rooting for
> prefixes. Now, you can be even bolder and replace the colon separator
> for the type character. This saves a few characters, one main problem:
> The strings are a bit less readable (it's hard to see where a number
> ends and where and the next thing begins for a human).

Tried that, but it's actually easier to parse if there's only one ':' to
look for as the separator, and it's backward compatible with netstrings
now that I added ',' as the blob char.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Proposing An Alternative To JSON

From:
Zed A. Shaw
Date:
2011-03-20 @ 22:28
On Mon, Mar 21, 2011 at 09:01:57AM +1100, Ryan Kelly wrote:
> True, but doesn't 0mq force you to receive the whole message at once
> anyway?  Or is there a way to incrementally read the message that I
> haven't come across?

There is, but it's not well documented and hard to use so I avoid it.
Also, ehem, I like to hedge my bets and not depend on the 0mq API for
the wire protocol.  You know, just in case. :-)

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Proposing An Alternative To JSON

From:
Ryan Kelly
Date:
2011-03-20 @ 22:46
On Sun, 2011-03-20 at 15:28 -0700, Zed A. Shaw wrote:
> On Mon, Mar 21, 2011 at 09:01:57AM +1100, Ryan Kelly wrote:
> > True, but doesn't 0mq force you to receive the whole message at once
> > anyway?  Or is there a way to incrementally read the message that I
> > haven't come across?
> 
> There is, but it's not well documented and hard to use so I avoid it.
> Also, ehem, I like to hedge my bets and not depend on the 0mq API for
> the wire protocol.  You know, just in case. :-)

Of course, but there's always a little push-back from YAGNI.

If you *did* want to go 0mq-all-the-way-down, you could use its
multi-part messages instead of an internally-delimited format like
netstring, and have a good hunk of the parsing done for free by your
messaging API.

But that's not a serious suggestion.

+1 for Matt's idea of going <size>:<type>:<content>, since you're
already breaking with the netstring format anyway.

I believe the trailing comma in netstrings was meant to aid
human-readability, which would be diminished by moving it to the front
of the message.  But really, can you parse something like:

   34:5:hello"22:11:12345678901#4:this"]}

into the appropriate structure just by looking at it?  I actually find
the reverse notation a little more readable, apart from the numbers
being all smooshed together:

   34:{:5:":hello22:[:11:#:123456789014:":this



  Ryan



-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] Proposing An Alternative To JSON

From:
Matt Nunogawa
Date:
2011-03-20 @ 21:57
<size>:<type>:<content>

might be easier to parse as well...  your zero-size case and your non-zero
case both would have consistent ordering, instead of:

<zero-size>:<type>
<non-zero-size>:<content>:<type>

They aren't really netstrings at that point though...



On Sun, Mar 20, 2011 at 2:42 PM, Samuel Tardieu <sam@rfc1149.net> wrote:

>
>
> 2011/3/20 Zed A. Shaw <zedshaw@zedshaw.com>
>
>
> Hey, so I'm tinkering with the idea of a proxy as a handler, but
>> realizing that we sort of need a second protocol that's faster and
>> easier to parse in C (and others) than JSON.  JSON's great for getting
>> things going, but I think we can support both JSON and another protocol.
>>
>> As an idea, I cooked up "tagged netstrings".  This is simply the idea
>> that you encode JSON style data as a sequence of nested netstrings with
>> their character terminators saying what's inside.
>>
>
> It won't play well in embedded systems contexts where you can't use a lot
> of memory. With "tagged netstrings" you must receive and store the string
> before being able to decode it, even if it ends up being an integer. A
> prefix-based type system (instead of a suffix-based one) would let you
> decode data as you receive it should you want to do so.
>

Re: [mongrel2] Proposing An Alternative To JSON

From:
James Dennis
Date:
2011-03-20 @ 21:51
Maybe STOMP is worth considering?

http://stomp.codehaus.org/Protocol

On Sunday, March 20, 2011, Samuel Tardieu <sam@rfc1149.net> wrote:
>
>
> 2011/3/20 Zed A. Shaw <zedshaw@zedshaw.com>
>
>
>
> Hey, so I'm tinkering with the idea of a proxy as a handler, but
> realizing that we sort of need a second protocol that's faster and
> easier to parse in C (and others) than JSON.  JSON's great for getting
> things going, but I think we can support both JSON and another protocol.
>
> As an idea, I cooked up "tagged netstrings".  This is simply the idea
> that you encode JSON style data as a sequence of nested netstrings with
> their character terminators saying what's inside.
>
> It won't play well in embedded systems contexts where you can't use a
> lot of memory. With "tagged netstrings" you must receive and store the
> string before being able to decode it, even if it ends up being an
> integer. A prefix-based type system (instead of a suffix-based one) would
> let you decode data as you receive it should you want to do so.
>
>

Re: [mongrel2] Proposing An Alternative To JSON

From:
Zed A. Shaw
Date:
2011-03-20 @ 22:27
On Sun, Mar 20, 2011 at 05:51:09PM -0400, James Dennis wrote:
> Maybe STOMP is worth considering?
> 
> http://stomp.codehaus.org/Protocol

Ugh, STOMP.  Why'd you bring that up man?  I thought we were friends.
:-)

Ok, I think it's time to have a lesson in how *not* to design a
protocol:

http://stomp.codehaus.org/Protocol

If you look at that you get this wonderful message format:

SEND
destination:/queue/a
receipt:message-12345

Hello a!^@

Alright, see anything wrong with that?  What if I want to send a message
that is a sequence of the ^@ terminators?  Oh, that means I have to
escape the terminators?  Ok, so \^@ which means now I have to escape the
escape.  Now my parser has to handle \\ and \^@ just to handle a
message, oh and also need to escape newlines. Oh and \r and \n newlines
need escaping too probably.

This is the problem with terminated protocols.  You always have the
"escape paradox" where in order to send the message you need to either
invent an escaping system, a guard system (like multipart mime), or a
presize system (like chunked-encoding).  Every protocol designed this
way is vulnerable to all sorts of attacks related to streaming insane
amounts of data, exploits of the protocol grammar, and other problems
that Mongrel2 already has to work around.
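
As a rough contrast, here is a Python sketch of length-prefixed framing in the netstring style (`<length>:<payload>,`). The helper names `frame` and `unframe` are made up. The body can contain the terminator byte, commas, colons, or backslashes without any escaping, because the receiver counts bytes instead of scanning for a marker.

```python
def frame(payload: bytes) -> bytes:
    """Wrap payload as a netstring: <length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def unframe(data: bytes):
    """Return (payload, remaining_bytes) for the first netstring in data."""
    length, sep, rest = data.partition(b":")
    if not sep or not length.isdigit():
        raise ValueError("missing or malformed length prefix")
    size = int(length)
    if rest[size:size + 1] != b",":
        raise ValueError("payload is not followed by ','")
    return rest[:size], rest[size + 1:]


# A body full of NUL bytes, commas, colons and backslashes survives untouched.
awkward = b"\x00\x00,:\\"
print(unframe(frame(awkward)) == (awkward, b""))

# A reader that splits on the NUL terminator cuts a similar body short.
print((b"Hello\x00a!" + b"\x00").split(b"\x00")[0])
```

Since the size is known before the body arrives, a receiver could also refuse an oversized message up front, although this sketch does not enforce any limit.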

Next, let's look at what protocol they're replicating.  Oh why it's
HTTP, that awesome success story of clarity and parseability.  To even
come close to a reliable parsing method I have to use a full on state
machine compiler to generate a parser, so now, to handle messages I have
to do the same for this?  Great.

Finally, the entire semantics are jacked.  They assume there's a
centralized server, can't handle partitioning, require explicit
connection management, have no specification for message durability, and
no defined de facto API that people implement.

Compared to 0mq and AMQP the STOMP protocol is a massive joke.  It's all
the disadvantages of HTTP for none of the benefits you can just get from
0mq.

</rant>

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Proposing An Alternative To JSON

From:
James Dennis
Date:
2011-03-20 @ 23:14
Ha! Well, I have only heard about it from advocates. Thought I'd test
the waters.

But seriously, I appreciate the long and clear response.


On Mar 20, 2011, at 6:28 PM, "Zed A. Shaw" <zedshaw@zedshaw.com> wrote:

> On Sun, Mar 20, 2011 at 05:51:09PM -0400, James Dennis wrote:
>> Maybe STOMP is worth considering?
>>
>> http://stomp.codehaus.org/Protocol
>
> Ugh, STOMP.  Why'd you bring that up man?  I thought we were friends.
> :-)
>
> Ok, I think it's time to have a lesson in how *not* to design a
> protocol:
>
> http://stomp.codehaus.org/Protocol
>
> If you look at that you get this wonderful message format:
>
> SEND
> destination:/queue/a
> receipt:message-12345
>
> Hello a!^@
>
> Alright, see anything wrong with that?  What if I want to send a message
> that is a sequence of the ^@ terminators?  Oh, that means I have to
> escape the terminators?  Ok, so \^@, which means now I have to escape the
> escape.  Now my parser has to handle \\ and \^@ just to handle a
> message, and it probably needs to handle escaped \r and \n newlines
> too.
>
> This is the problem with terminated protocols.  You always have the
> "escape paradox" where in order to send the message you need to either
> invent an escaping system, a guard system (like multipart mime), or a
> presize system (like chunked-encoding).  Every protocol designed this
> way is vulnerable to all sorts of attacks related to streaming insane
> amounts of data, exploits of the protocol grammar, and other problems
> that Mongrel2 already has to work around.
>
> Next, let's look at what protocol they're replicating.  Oh why it's
> HTTP, that awesome success story of clarity and parseability.  To even
> come close to a reliable parsing method I have to use a full on state
> machine compiler to generate a parser, so now, to handle messages I have
> to do the same for this?  Great.
>
> Finally, the entire semantics are jacked.  They assume there's a
> centralized server, can't handle partitioning, require explicit
> connection management, have no specification for message durability, and
> no defined de facto API that people implement.
>
> Compared to 0mq and AMQP the STOMP protocol is a massive joke.  It's all
> the disadvantages of HTTP for none of the benefits you can just get from
> 0mq.
>
> </rant>
>
> --
> Zed A. Shaw
> http://zedshaw.com/

Re: [mongrel2] Proposing An Alternative To JSON

From:
Loic d'Anterroches
Date:
2011-03-20 @ 21:26
Hello,

In PHP it will end up being slower, as we would have to parse in PHP,
whereas for JSON we just call json_decode($payload), which is itself
implemented in C. If really needed, I can create a C extension for PHP
to provide tnets_encode and tnets_decode, so this is not really a big
issue.

But you also wrote: "I think we can support both JSON and another
protocol.". If we keep the ease of the current JSON protocol by default,
then, I must say, go for it.

Do you want to configure the protocol like this:

handler_test = Handler(
                   # protocol='json',
                   protocol='tnets',
                   send_spec='tcp://127.0.0.1:9997',
                   send_ident='34f9ceee-cd52-4b7f-b197-88bf2f0ec378',
                   recv_spec='tcp://127.0.0.1:9996',
                   recv_ident='')

And would a "proxy" handler then work only with the tnets protocol (or
whatever it ends up being called)?

loïc


> Hey, so I'm tinkering with the idea of a proxy as a handler, but
> realizing that we sort of need a second protocol that's faster and
> easier to parse in C (and others) than JSON.  JSON's great for getting
> things going, but I think we can support both JSON and another protocol.
> 
> As an idea, I cooked up "tagged netstrings".  This is simply the idea
> that you encode JSON style data as a sequence of nested netstrings with
> their character terminators saying what's inside.
> 
> Here's a python implementation of parsing it:
> 
> http://codepad.org/xct0E5ac
> 
> This needs to have one more type of Blob with ',' terminator so that
> it's backward compatible with regular netstrings, and you can transmit
> binary data safely on platforms where that's not possible (javascript),
> but otherwise this is very easy to parse and generate.
> 
> Can someone in another language try to replicate this and see how hard
> it is?  I'll be doing C next as a test, but I'd like to see a few others
> to compare.
> 
> Also, no, we won't use protocol bufs, BIRT, or others since those are
> hard as hell to parse compared to this and probably don't buy much in
> terms of speed for the usability costs.
> 
> Thanks!
> 

Re: [mongrel2] Proposing An Alternative To JSON

From:
Zed A. Shaw
Date:
2011-03-20 @ 21:36
On Sun, Mar 20, 2011 at 10:26:50PM +0100, Loic d'Anterroches wrote:
> Hello,
> 
> in PHP it will end up being slower as we would have to parse in PHP
> where for json we just do a json_decode($payload) which itself is coded
> in C. If really needed, I can create a C extension for PHP to provide
> tnets_encode and tnets_decode. So, this is not really a big issue.
> 
> But you also wrote: "I think we can support both JSON and another
> protocol.". If we keep the ease of the current JSON protocol by default,
> then, I must say, go for it.

Exactly, there are always places where JSON will be understood no
matter what.  This would just be for those who want to remove that
overhead as well (assuming tnetstrings turn out to be faster in
practice).

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Proposing An Alternative To JSON

From:
Henry Baragar
Date:
2011-03-20 @ 20:49
What about BSON (http://bsonspec.org/), binary encoded JSON?

It's the native storage format for MongoDB (http://www.mongodb.org/), a popular 
NoSQL database.

Cheers,
Henry

On March 20, 2011 03:12:08 pm Zed A. Shaw wrote:
> Hey, so I'm tinkering with the idea of a proxy as a handler, but
> realizing that we sort of need a second protocol that's faster and
> easier to parse in C (and others) than JSON.  JSON's great for getting
> things going, but I think we can support both JSON and another protocol.
> 
> As an idea, I cooked up "tagged netstrings".  This is simply the idea
> that you encode JSON style data as a sequence of nested netstrings with
> their character terminators saying what's inside.
> 
> Here's a python implementation of parsing it:
> 
> http://codepad.org/xct0E5ac
> 
> This needs to have one more type of Blob with ',' terminator so that
> it's backward compatible with regular netstrings, and you can transmit
> binary data safely on platforms where that's not possible (javascript),
> but otherwise this is very easy to parse and generate.
> 
> Can someone in another language try to replicate this and see how hard
> it is?  I'll be doing C next as a test, but I'd like to see a few others
> to compare.
> 
> Also, no, we won't use protocol bufs, BIRT, or others since those are
> hard as hell to parse compared to this and probably don't buy much in
> terms of speed for the usability costs.
> 
> Thanks!

-- 
Henry Baragar
Instantiated Software

Re: [mongrel2] Proposing An Alternative To JSON

From:
Zed A. Shaw
Date:
2011-03-20 @ 21:35
On Sun, Mar 20, 2011 at 04:49:04PM -0400, Henry Baragar wrote:
> What about BSON (http://bsonspec.org/), binary encoded JSON?

The C code for BSON isn't very good, and as I mentioned with msgpack,
binary protocols are hard to parse in a lot of languages.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Proposing An Alternative To JSON

From:
Andrew Cholakian
Date:
2011-03-20 @ 19:34
What are your thoughts as far as MessagePack?

It correlates well to JSON, has a very compact representation, and has a
very fast set of widely available bindings.

http://msgpack.org/

On Sun, Mar 20, 2011 at 12:12 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:

> Hey, so I'm tinkering with the idea of a proxy as a handler, but
> realizing that we sort of need a second protocol that's faster and
> easier to parse in C (and others) than JSON.  JSON's great for getting
> things going, but I think we can support both JSON and another protocol.
>
> As an idea, I cooked up "tagged netstrings".  This is simply the idea
> that you encode JSON style data as a sequence of nested netstrings with
> their character terminators saying what's inside.
>
> Here's a python implementation of parsing it:
>
> http://codepad.org/xct0E5ac
>
> This needs to have one more type of Blob with ',' terminator so that
> it's backward compatible with regular netstrings, and you can transmit
> binary data safely on platforms where that's not possible (javascript),
> but otherwise this is very easy to parse and generate.
>
> Can someone in another language try to replicate this and see how hard
> it is?  I'll be doing C next as a test, but I'd like to see a few others
> to compare.
>
> Also, no, we won't use protocol bufs, BIRT, or others since those are
> hard as hell to parse compared to this and probably don't buy much in
> terms of speed for the usability costs.
>
> Thanks!
>
> --
> Zed A. Shaw
> http://zedshaw.com/
>



-- 
Andrew Cholakian
http://www.andrewvc.com

Re: [mongrel2] Proposing An Alternative To JSON

From:
Zed A. Shaw
Date:
2011-03-20 @ 21:34
On Sun, Mar 20, 2011 at 12:34:14PM -0700, Andrew Cholakian wrote:
> What are your thoughts as far as MessagePack?
> 
> It correlates well to JSON, has a very compact representation, and has a
> very fast set of widely available bindings.
> 
> http://msgpack.org/

msg = [1,2,3].to_msgpack  #=> "\x93\x01\x02\x03"

Says it all.  That's damn hard to parse well in lots of languages, most
notably javascript.  Basically, netstrings are parseable by everything
that can handle ascii text, but most other "fast" formats like msgpack
and BIRT are not.
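
For comparison, here is a rough Python sketch of how the same list
might be generated as a tagged netstring.  The type tags used here
(',' string, '#' integer, '!' boolean, '~' null, ']' list, '}' dict)
are an assumption and may not match the codepad draft linked earlier;
tnet_dump is a hypothetical name, and encoding str as UTF-8 is a choice
made for this sketch only.

```python
def tnet_dump(value) -> bytes:
    """Encode a JSON-style value as b"<len>:<payload><tag>" (tags assumed)."""
    if value is None:
        payload, tag = b"", b"~"
    elif isinstance(value, bool):  # must be checked before int
        payload, tag = (b"true" if value else b"false"), b"!"
    elif isinstance(value, int):
        payload, tag = str(value).encode("ascii"), b"#"
    elif isinstance(value, bytes):
        payload, tag = value, b","
    elif isinstance(value, str):
        # Assumption for this sketch: text goes out as UTF-8 bytes.
        payload, tag = value.encode("utf-8"), b","
    elif isinstance(value, list):
        payload, tag = b"".join(tnet_dump(v) for v in value), b"]"
    elif isinstance(value, dict):
        payload = b"".join(tnet_dump(k) + tnet_dump(v) for k, v in value.items())
        tag = b"}"
    else:
        raise TypeError("cannot encode %r" % type(value))
    return str(len(payload)).encode("ascii") + b":" + payload + tag


print(tnet_dump([1, 2, 3]))  # b'12:1:1#1:2#1:3#]'
```

That is 16 bytes of ASCII against 4 bytes of msgpack, so the trade is
size for something any language with string slicing can read.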

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Proposing An Alternative To JSON

From:
joshua simmons
Date:
2011-03-20 @ 20:33
MessagePack is horribly implemented. Get that protocol with a nice plain C
implementation and I'll be there. But not before.

On Mon, Mar 21, 2011 at 6:34 AM, Andrew Cholakian <andrew@andrewvc.com> wrote:

> What are your thoughts as far as MessagePack?
>
> It correlates well to JSON, has a very compact representation, and has a
> very fast set of widely available bindings.
>
> http://msgpack.org/
>
> On Sun, Mar 20, 2011 at 12:12 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
>
>> Hey, so I'm tinkering with the idea of a proxy as a handler, but
>> realizing that we sort of need a second protocol that's faster and
>> easier to parse in C (and others) than JSON.  JSON's great for getting
>> things going, but I think we can support both JSON and another protocol.
>>
>> As an idea, I cooked up "tagged netstrings".  This is simply the idea
>> that you encode JSON style data as a sequence of nested netstrings with
>> their character terminators saying what's inside.
>>
>> Here's a python implementation of parsing it:
>>
>> http://codepad.org/xct0E5ac
>>
>> This needs to have one more type of Blob with ',' terminator so that
>> it's backward compatible with regular netstrings, and you can transmit
>> binary data safely on platforms where that's not possible (javascript),
>> but otherwise this is very easy to parse and generate.
>>
>> Can someone in another language try to replicate this and see how hard
>> it is?  I'll be doing C next as a test, but I'd like to see a few others
>> to compare.
>>
>> Also, no, we won't use protocol bufs, BIRT, or others since those are
>> hard as hell to parse compared to this and probably don't buy much in
>> terms of speed for the usability costs.
>>
>> Thanks!
>>
>> --
>> Zed A. Shaw
>> http://zedshaw.com/
>>
>
>
>
> --
> Andrew Cholakian
> http://www.andrewvc.com
>

Re: [mongrel2] Proposing An Alternative To JSON

From:
joshua simmons
Date:
2011-03-20 @ 20:43
And yeah, something that's easier to generate in C would be nice too.

Request_to_payload was (is) one of the major hot spots in the mongrel2 code
and even after changing it up it's still quite nasty.

A protocol that minimises work there would be highly beneficial.

On Mon, Mar 21, 2011 at 7:33 AM, joshua simmons <simmons.44@gmail.com> wrote:

> MessagePack is horribly implemented. Get that protocol with a nice plain C
> implementation and I'll be there. But not before.
>
>
> On Mon, Mar 21, 2011 at 6:34 AM, Andrew Cholakian <andrew@andrewvc.com> wrote:
>
>> What are your thoughts as far as MessagePack?
>>
>> It correlates well to JSON, has a very compact representation, and has a
>> very fast set of widely available bindings.
>>
>> http://msgpack.org/
>>
>> On Sun, Mar 20, 2011 at 12:12 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
>>
>>> Hey, so I'm tinkering with the idea of a proxy as a handler, but
>>> realizing that we sort of need a second protocol that's faster and
>>> easier to parse in C (and others) than JSON.  JSON's great for getting
>>> things going, but I think we can support both JSON and another protocol.
>>>
>>> As an idea, I cooked up "tagged netstrings".  This is simply the idea
>>> that you encode JSON style data as a sequence of nested netstrings with
>>> their character terminators saying what's inside.
>>>
>>> Here's a python implementation of parsing it:
>>>
>>> http://codepad.org/xct0E5ac
>>>
>>> This needs to have one more type of Blob with ',' terminator so that
>>> it's backward compatible with regular netstrings, and you can transmit
>>> binary data safely on platforms where that's not possible (javascript),
>>> but otherwise this is very easy to parse and generate.
>>>
>>> Can someone in another language try to replicate this and see how hard
>>> it is?  I'll be doing C next as a test, but I'd like to see a few others
>>> to compare.
>>>
>>> Also, no, we won't use protocol bufs, BIRT, or others since those are
>>> hard as hell to parse compared to this and probably don't buy much in
>>> terms of speed for the usability costs.
>>>
>>> Thanks!
>>>
>>> --
>>> Zed A. Shaw
>>> http://zedshaw.com/
>>>
>>
>>
>>
>> --
>> Andrew Cholakian
>> http://www.andrewvc.com
>>
>
>

Re: [mongrel2] Proposing An Alternative To JSON

From:
Bobby Powers
Date:
2011-03-20 @ 19:30
I like the extension.  So the difference between ',' and '"' is that '"' is
a (utf8? null-terminated?) string, and ',' is a blob of bytes?

yours,
Bobby

On Sun, Mar 20, 2011 at 12:12 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:

> Hey, so I'm tinkering with the idea of a proxy as a handler, but
> realizing that we sort of need a second protocol that's faster and
> easier to parse in C (and others) than JSON.  JSON's great for getting
> things going, but I think we can support both JSON and another protocol.
>
> As an idea, I cooked up "tagged netstrings".  This is simply the idea
> that you encode JSON style data as a sequence of nested netstrings with
> their character terminators saying what's inside.
>
> Here's a python implementation of parsing it:
>
> http://codepad.org/xct0E5ac
>
> This needs to have one more type of Blob with ',' terminator so that
> it's backward compatible with regular netstrings, and you can transmit
> binary data safely on platforms where that's not possible (javascript),
> but otherwise this is very easy to parse and generate.
>
> Can someone in another language try to replicate this and see how hard
> it is?  I'll be doing C next as a test, but I'd like to see a few others
> to compare.
>
> Also, no, we won't use protocol bufs, BIRT, or others since those are
> hard as hell to parse compared to this and probably don't buy much in
> terms of speed for the usability costs.
>
> Thanks!
>
> --
> Zed A. Shaw
> http://zedshaw.com/
>

Re: [mongrel2] Proposing An Alternative To JSON

From:
Zed A. Shaw
Date:
2011-03-20 @ 21:33
On Sun, Mar 20, 2011 at 12:30:50PM -0700, Bobby Powers wrote:
> I like the extension.  So the difference between ',' and '"' is that '"' is
> a (utf8? null-terminated?) string, and ',' is a blob of bytes?

Well, whatever the language thinks a "string" is.  It's hard to dictate
utf8 because in C that requires a metric pain of crap to handle well.

Additionally, we'll need:

boolean
null

Since JSON has those too.  I'll be updating the Python sample soon.
In the meantime, here's the thing in Factor:

http://re-factor.blogspot.com/2011/03/typed-netstrings.html
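
As a rough idea of what a parser with boolean and null added could
look like, here is a small Python sketch.  It is not the codepad
sample: the tag characters ('!' for boolean, '~' for null, ',' string,
'#' integer, ']' list, '}' dict) and the name tnet_parse are
assumptions made for illustration, and strings are left as raw bytes,
in line with not dictating an encoding.

```python
def tnet_parse(data: bytes):
    """Parse one tagged netstring; return (value, remaining_bytes)."""
    head, sep, tail = data.partition(b":")
    if not sep or not head.isdigit():
        raise ValueError("bad length prefix")
    size = int(head)
    if len(tail) < size + 1:
        raise ValueError("truncated input")
    payload, tag, rest = tail[:size], tail[size:size + 1], tail[size + 1:]

    if tag == b",":                      # string / blob, kept as bytes
        return payload, rest
    if tag == b"#":                      # integer
        return int(payload.decode("ascii")), rest
    if tag == b"!":                      # boolean (assumed tag)
        if payload not in (b"true", b"false"):
            raise ValueError("bad boolean payload")
        return payload == b"true", rest
    if tag == b"~":                      # null (assumed tag)
        if payload:
            raise ValueError("null must be empty")
        return None, rest
    if tag == b"]":                      # list: values back to back
        items = []
        while payload:
            item, payload = tnet_parse(payload)
            items.append(item)
        return items, rest
    if tag == b"}":                      # dict: alternating key, value
        result = {}
        while payload:
            key, payload = tnet_parse(payload)
            if not payload:
                raise ValueError("dict key without a value")
            value, payload = tnet_parse(payload)
            result[key] = value
        return result, rest
    raise ValueError("unknown type tag %r" % tag)


value, rest = tnet_parse(b"16:5:hello,5:world,}")
print(value)  # {b'hello': b'world'}
```

Nested values are handled by recursing on the payload slice, so the
outer length bounds everything parsed inside it.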


-- 
Zed A. Shaw
http://zedshaw.com/