Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Ciprian Dorin Craciun
- Date:
- 2011-03-22 @ 17:15
On Sun, Mar 20, 2011 at 21:12, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> [...]
> Also, no, we won't use protocol bufs, BIRT, or others since those are
> hard as hell to parse compared to this and probably don't buy much in
> terms of speed for the usability costs.
>
> Thanks!
>
> --
> Zed A. Shaw
> http://zedshaw.com/
I apologize for another off-topic email, but I've seen suggestions
for most serialization formats, except the one proposed by Joe
Armstrong, UBF:
http://www.sics.se/~joe/ubf/site/home.html
(or a nice doc) http://norton.github.com/ubf/ubf-user-guide.en.html
It is ASCII based -- thus easily parsable (at least by a computer,
as the syntax is pre-fixed) -- but at the same time it offers both
native binary payload support, and type hinting, being as expressive
as JSON.
I've found one such Python implementation, linked below, but as you'll
see it's trivial to implement.
http://www.eighty-twenty.org/~tonyg/Darcs/ubf/python/ubf.py
Ciprian.
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Hedge Hog
- Date:
- 2011-03-21 @ 00:35
On Mon, Mar 21, 2011 at 6:12 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> Hey, so I'm tinkering with the idea of a proxy as a handler, but
> realizing that we sort of need a second protocol that's faster and
> easier to parse in C (and others) than JSON. JSON's great for getting
> things going, but I think we can support both JSON and another protocol.
>
> As an idea, I cooked up "tagged netstrings". This is simply the idea
> that you encode JSON style data as a sequence of nested netstrings with
> their character terminators saying what's inside.
>
> Here's a python implementation of parsing it:
>
> http://codepad.org/xct0E5ac
>
> This needs to have one more type of Blob with ',' terminator so that
> it's backward compatible with regular netstrings, and you can transmit
> binary data safely on platforms where that's not possible (javascript),
> but otherwise this is very easy to parse and generate.
>
> Can someone in another language try to replicate this and see how hard
> it is? I'll be doing C next as a test, but I'd like to see a few others
> to compare.
>
> Also, no, we won't use protocol bufs, BIRT, or others since those are
> hard as hell to parse compared to this and probably don't buy much in
> terms of speed for the usability costs.
I am curious about whether you considered extprot.
I appreciate the need to keep it simple, and for speed - for most
people those are traded-off against functionality.
But maybe the mongrel2 proxy/handler could add value by taking care of
the serialization and de-serialization steps.
The added complexity is a protocol definition file, and whether that
is worth it...
Anyway, some might find extprot of use elsewhere in their stack, so
hopefully this mail is not pure noise.
HTH
>
> Thanks!
>
> --
> Zed A. Shaw
> http://zedshaw.com/
>
--
πόλλ' οἶδ ἀλώπηξ, ἀλλ' ἐχῖνος ἓν μέγα
[The fox knows many things, but the hedgehog knows one big thing.]
Archilochus, Greek poet (c. 680 BC – c. 645 BC)
http://wiki.hedgehogshiatus.com
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Hedge Hog
- Date:
- 2011-03-21 @ 02:05
On Mon, Mar 21, 2011 at 11:35 AM, Hedge Hog <hedgehogshiatus@gmail.com> wrote:
> On Mon, Mar 21, 2011 at 6:12 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
>> Hey, so I'm tinkering with the idea of a proxy as a handler, but
>> realizing that we sort of need a second protocol that's faster and
>> easier to parse in C (and others) than JSON. JSON's great for getting
>> things going, but I think we can support both JSON and another protocol.
>>
>> As an idea, I cooked up "tagged netstrings". This is simply the idea
>> that you encode JSON style data as a sequence of nested netstrings with
>> their character terminators saying what's inside.
>>
>> Here's a python implementation of parsing it:
>>
>> http://codepad.org/xct0E5ac
>>
>> This needs to have one more type of Blob with ',' terminator so that
>> it's backward compatible with regular netstrings, and you can transmit
>> binary data safely on platforms where that's not possible (javascript),
>> but otherwise this is very easy to parse and generate.
>>
>> Can someone in another language try to replicate this and see how hard
>> it is? I'll be doing C next as a test, but I'd like to see a few others
>> to compare.
>>
>> Also, no, we won't use protocol bufs, BIRT, or others since those are
>> hard as hell to parse compared to this and probably don't buy much in
>> terms of speed for the usability costs.
>
> I am curious about whether you considered extprot.
Apologies. I should have included this link:
https://github.com/mfp/extprot
> I appreciate the need to keep it simple, and for speed - for most
> people those are traded-off against functionality.
> But maybe the mongrel2 proxy/handler could add value by taking care of
> the serialization and de-serialization steps.
> The added complexity is a protocol definition file, and whether that
> is worth it...
>
> Anyway, some might find extprot of use elsewhere in their stack, so
> hopefully this mail is not pure noise.
> HTH
>
>>
>> Thanks!
>>
>> --
>> Zed A. Shaw
>> http://zedshaw.com/
>>
>
>
>
> --
> πόλλ' οἶδ ἀλώπηξ, ἀλλ' ἐχῖνος ἓν μέγα
> [The fox knows many things, but the hedgehog knows one big thing.]
> Archilochus, Greek poet (c. 680 BC – c. 645 BC)
> http://wiki.hedgehogshiatus.com
>
--
πόλλ' οἶδ ἀλώπηξ, ἀλλ' ἐχῖνος ἓν μέγα
[The fox knows many things, but the hedgehog knows one big thing.]
Archilochus, Greek poet (c. 680 BC – c. 645 BC)
http://wiki.hedgehogshiatus.com
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- S. Günther
- Date:
- 2011-03-21 @ 03:10
Here's a pretty horrible transliteration to haskell:
http://codepad.org/wDGmjjUc
Sorry for the ugliness. (But it seems to work.)
I would also like to note that the "bencode" encoding used in the
bittorrent protocol looks kind of similar to the proposed typed
netstrings. There are some key differences though and if those rule out
the format completely, I apologise for adding to the growing list of
proposed alternatives.
kind regards
Stephan Günther
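For comparison with the bencode format mentioned above, here is a minimal Python sketch of a bencode encoder. It is only an illustration of the key differences: bencode puts the type marker first ("i", "l", "d"), terminates integers, lists and dicts with "e", and only length-prefixes strings. The function name `bencode` and the choice to accept `str` values by encoding them as UTF-8 are assumptions made for this sketch.

```python
def bencode(value):
    """Encode ints, strings, lists and dicts in BitTorrent's bencode format."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value                      # i<digits>e
    if isinstance(value, str):
        value = value.encode("utf-8")               # assumption: UTF-8 text
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)       # <length>:<bytes>
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("unsupported type: %r" % type(value))


encoded = bencode({"hello": [12345678901, "this"]})
```

Unlike the proposed tagged netstrings, a bencoded list or dict carries no total length, so a reader has to walk every element to find where the container ends.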
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Zed A. Shaw
- Date:
- 2011-03-21 @ 04:12
On Mon, Mar 21, 2011 at 04:10:25AM +0100, S. Günther wrote:
> Here's a pretty horrible transliteration to haskell:
>
> http://codepad.org/wDGmjjUc
Super cool, so far it's looking pretty good for implementation.
--
Zed A. Shaw
http://zedshaw.com/
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- joshua simmons
- Date:
- 2011-03-21 @ 04:24
As a side note, you don't actually need the .proto definition to read protocol
buffers; there's enough data in the format to read it off the wire without
needing any nasty code gen. It's just not how Google wants it.
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Alex Gartrell
- Date:
- 2011-03-22 @ 18:36
+1 here. I implemented a google protocol buffer parser in C for a
research project (and then we switched to Thrift, which is more of the
same but more limited), so it's doable. Varint parsing in a language
like javascript seems like it would blow though. The other thing to
note is that field ids are used rather than field names, so you'll
have to keep "Field 3 = the IP field" available somewhere, which can
be a little less clear/convenient than IP=... (Keep in mind I'm
talking about field with ID 3 rather than the third field, protobufs
allow you to omit or reorder fields).
I think we can go with either of <length> <content> <type> or <length>
<type> <content>. LTC is the pattern used by Proto bufs and Thrift,
but, in practice, there's no difference because you've read the whole
thing into memory anyway, and you're just skipping ahead by N bytes.
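To make the varint and field-ID points concrete, here is a minimal Python sketch of reading a protocol buffers varint and field key straight off the wire, with no generated code. The helper names `read_varint` and `read_key` are made up for this example; the encoding itself (7 payload bits per byte, high bit set on every byte but the last, key = field number shifted left by 3 plus a 3-bit wire type) follows the published protobuf encoding.

```python
def read_varint(buf, pos=0):
    """Decode one base-128 varint from buf starting at pos.

    Returns (value, next_pos).
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift    # low 7 bits carry data
        if not byte & 0x80:                 # high bit clear: last byte
            return result, pos
        shift += 7


def read_key(buf, pos=0):
    """Decode a field key. Returns (field_number, wire_type, next_pos)."""
    key, pos = read_varint(buf, pos)
    return key >> 3, key & 0x07, pos


# Field number 1, wire type 0 (varint), value 150.
message = b"\x08\x96\x01"
field_number, wire_type, pos = read_key(message)
value, pos = read_varint(message, pos)
```

Note that the wire only gives you "field 1", not a name; the mapping from field number to something like "the IP field" still has to live in the application.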
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- James Dennis
- Date:
- 2011-03-21 @ 04:28
Another side note, stomp allows specifying content length which avoids the
escape paradox.
I don't take issue with anything else said regarding stomp and neither did
anyone I forwarded this to. :)
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Ryan Kelly
- Date:
- 2011-03-21 @ 00:42
On Mon, 2011-03-21 at 11:35 +1100, Hedge Hog wrote:
> On Mon, Mar 21, 2011 at 6:12 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> > Hey, so I'm tinkering with the idea of a proxy as a handler, but
> > realizing that we sort of need a second protocol that's faster and
> > easier to parse in C (and others) than JSON. JSON's great for getting
> > things going, but I think we can support both JSON and another protocol.
> >
> > As an idea, I cooked up "tagged netstrings". This is simply the idea
> > that you encode JSON style data as a sequence of nested netstrings with
> > their character terminators saying what's inside.
> >
> > Here's a python implementation of parsing it:
> >
> > http://codepad.org/xct0E5ac
> >
> > This needs to have one more type of Blob with ',' terminator so that
> > it's backward compatible with regular netstrings, and you can transmit
> > binary data safely on platforms where that's not possible (javascript),
> > but otherwise this is very easy to parse and generate.
> >
> > Can someone in another language try to replicate this and see how hard
> > it is? I'll be doing C next as a test, but I'd like to see a few others
> > to compare.
> >
> > Also, no, we won't use protocol bufs, BIRT, or others since those are
> > hard as hell to parse compared to this and probably don't buy much in
> > terms of speed for the usability costs.
>
> I am curious about whether you considered extprot.
> I appreciate the need to keep it simple, and for speed - for most
> people those are traded-off against functionality.
> But maybe the mongrel2 proxy/handler could add value by taking care of
> the serialization and de-serialization steps.
> The added complexity is a protocol definition file, and whether that
> is worth it...
>
> Anyway, some might find extprot of use elsewhere in their stack, so
> hopefully this mail is not pure noise.
> HTH
I love extprot, and in fact I maintain the python implementation.
Anyone considering something like protobuf or thrift should definitely
give it a look.
But I think it falls squarely in the "parsing is too complicated" camp
for the uses that Zed has in mind here. Anything that involves
bit-twiddling is probably out of the question.
Ryan
--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ryan@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Zed A. Shaw
- Date:
- 2011-03-21 @ 04:11
On Mon, Mar 21, 2011 at 11:42:32AM +1100, Ryan Kelly wrote:
> I love extprot, and in fact I maintain the python implementation.
> Anyone considering something like protobuf or thrift should definitely
> give it a look.
>
> But I think it falls squarely in the "parsing is too complicated" camp
> for the uses that Zed has in mind here. Anything that involves
> bit-twiddling is probably out of the question.
It also falls into the "type safety is impossible in network protocols" camp.
Protobufs and extprot make you "compile" the protocol:
    (* this is a comment (* and this a nested comment *) *)
    message user = {
      id : int;
      name : string;
    }
In every protocol that's like this (corba, dce, dcom, onc-rpc, etc.) it
becomes nearly impossible to upgrade the protocol if you add fields.
You end up having to add version numbers and altering the protocol to
handle various versions and stubs. Eventually it becomes a nightmare to
coordinate the release of these protocols. It's this combination of
structure and semantics that doesn't work because the semantics usually
have to change over time, but the structure usually doesn't.
By comparison, protocols that work well and last are ones that define
structure but not semantics. Take JSON as an example. It defines
structures, but not what goes in them so the semantics are left to me.
If I add a field to a hashmap, it'll still get processed by the receiver
and most older clients can just ignore it. The semantics are controlled
at the application layer and not at the protocol layer so it degrades
better and stands up longer.
A good way to describe the above is if you had to compile all your HTTP
client requests to match exactly what the server expected, right down to
the URLs and header contents. It'd get pretty impossible to make the
web work if that were the case.
Finally, the myth is that this is faster, but there's rarely any
evidence to back this up. Usually the few metrics showing speed are
just for simple stuff like the above and not for anything that's deeply
nested and connected.
--
Zed A. Shaw
http://zedshaw.com/
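The point about JSON leaving semantics to the application can be sketched in a few lines of Python. This is only an illustration: `handle_user` is a hypothetical receiver written against the original two-field message, and the extra `email` field is invented for the example.

```python
import json


def handle_user(raw):
    """An 'old' receiver: it only knows about id and name."""
    msg = json.loads(raw)
    # Fields the receiver does not know about are simply never looked at.
    return {"id": msg["id"], "name": msg["name"]}


old_message = '{"id": 1, "name": "zed"}'
# A newer sender adds a field without telling anyone.
new_message = '{"id": 1, "name": "zed", "email": "zed@example.com"}'

old_result = handle_user(old_message)
new_result = handle_user(new_message)
```

Both messages parse with the same code and produce the same result for the old receiver, with no version number or regenerated stub involved.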
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Ryan Kelly
- Date:
- 2011-03-21 @ 04:33
On Sun, 2011-03-20 at 21:11 -0700, Zed A. Shaw wrote:
> On Mon, Mar 21, 2011 at 11:42:32AM +1100, Ryan Kelly wrote:
> > I love extprot, and in fact I maintain the python implementation.
> > Anyone considering something like protobuf or thrift should definitely
> > give it a look.
> >
> > But I think it falls squarely in the "parsing is too complicated" camp
> > for the uses that Zed has in mind here. Anything that involves
> > bit-twiddling is probably out of the question.
>
> It also falls into the "type safety is impossible in network protocols".
> Protobufs and extprot make you "compile" the protocol:
>
> (* this is a comment (* and this a nested comment *) *)
> message user = {
> id : int;
> name : string;
> }
>
> In every protocol that's like this (corba, dce, dcom, onc-rpc, etc.) it
> becomes nearly impossible to upgrade the protocol if you add fields.
> You end up having to add version numbers and altering the protocol to
> handle various versions and stubs. Eventually it becomes a nightmare to
> coordinate the release of these protocols. It's this combination of
> structure and semantics that doesn't work because the semantics usually
> have to change over time, but the structure usually doesn't.
While there is a compilation step in extprot, you don't need to have the
type definition to understand the message. You can decode an arbitrary
extprot message into a "skeleton" very much like you'd get out of JSON
(a list of ints, a hashmap, a five-element tuple, etc)
So the message encodes its own structure, and the compiled type
definition provides the intended semantics by mapping the raw structure
into your application domain.
The "ext" in extprot stands for "extensible" and it has well-defined
allowances for extending the protocol while maintaining both backwards-
and forwards-compatibility:
http://eigenclass.org/R2/writings/protocol-extension-with-extprot
I still think it's wholly unsuited for this use-case though.
> Finally, the myth is that this is faster, but there's rarely any
> evidence to back this up. Usually the few metrics showing speed are
> just for simple stuff like the above and not for anything that's deeply
> nested and connected.
Yep, in extprot's case at least there is basically no speed advantage
derived from compiling the protocol definition.
The only case where it wins you any speed is if the underlying message
doesn't match the typedef; then you can bail out sooner than if you had
to parse it all and validate at the end. Not exactly a common case.
Plus, in a high-level interpreted language like Python, any supposed speed
advantages disappear as soon as you need to start bit-twiddling to
decode the embedded type tags in the message.
I think your tnetstrings strike a really nice balance between speed,
compactness, and ease of implementation.
Ryan
--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ryan@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Ryan Kelly
- Date:
- 2011-03-20 @ 21:56
could not decode message
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Zed A. Shaw
- Date:
- 2011-03-20 @ 22:31
On Mon, Mar 21, 2011 at 08:56:38AM +1100, Ryan Kelly wrote:
> The python was pretty easy to transliterate into javascript,
> implementation attached. Works as expected, but I doubt it will be
> faster than JSON in this context :-)
Ha, yeah not going to beat JSON on javascript, but definitely easier to
implement.
--
Zed A. Shaw
http://zedshaw.com/
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- joshua simmons
- Date:
- 2011-03-20 @ 22:39
JSON parsing is a hot spot in mongrel2-lua too. It's not a very fast
protocol to parse, and with luajit's ffi a simple protocol parser would be
able to get near C's speed.
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Samuel Tardieu
- Date:
- 2011-03-20 @ 21:42
2011/3/20 Zed A. Shaw <zedshaw@zedshaw.com>
> Hey, so I'm tinkering with the idea of a proxy as a handler, but
> realizing that we sort of need a second protocol that's faster and
> easier to parse in C (and others) than JSON. JSON's great for getting
> things going, but I think we can support both JSON and another protocol.
>
> As an idea, I cooked up "tagged netstrings". This is simply the idea
> that you encode JSON style data as a sequence of nested netstrings with
> their character terminators saying what's inside.
>
It won't play well in embedded systems contexts where you can't use a lot of
memory. With "tagged netstrings" you must receive and store the string
before being able to decode it, even if it ends up being an integer. A
prefix-based type system (instead of a suffix-based one) would let you
decode data as you receive it should you want to do so.
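A rough Python sketch of what decoding as you receive could look like with a prefix type. It assumes a hypothetical `<size>:<type>:<content>` layout with `#` marking an integer; the class name `PrefixedIntDecoder` is invented for this example. Because the type is known before the content arrives, the decoder keeps only a running value and a byte counter instead of buffering the whole string.

```python
class PrefixedIntDecoder:
    """Incrementally decode b'<size>:#:<digits>' fed in arbitrary chunks."""

    def __init__(self):
        self.state = "size"
        self.remaining = 0   # content bytes still expected
        self.value = 0       # integer accumulated so far
        self.done = False

    def feed(self, chunk):
        """Consume a chunk of bytes; return True once the value is complete."""
        for byte in chunk:
            if self.done:
                raise ValueError("data after end of message")
            is_digit = 48 <= byte <= 57          # ASCII '0'..'9'
            if self.state == "size":
                if is_digit:
                    self.remaining = self.remaining * 10 + (byte - 48)
                elif byte == ord(":"):
                    self.state = "type"
                else:
                    raise ValueError("bad size prefix")
            elif self.state == "type":
                if byte != ord("#"):
                    raise ValueError("this sketch only handles integers")
                self.state = "separator"
            elif self.state == "separator":
                if byte != ord(":"):
                    raise ValueError("expected ':' after type")
                self.state = "digits"
                self.done = self.remaining == 0
            else:
                if not is_digit:
                    raise ValueError("non-digit in integer content")
                self.value = self.value * 10 + (byte - 48)
                self.remaining -= 1
                self.done = self.remaining == 0
        return self.done


decoder = PrefixedIntDecoder()
progress = [decoder.feed(b"5:#"), decoder.feed(b":12"), decoder.feed(b"345")]
```

With the type at the end instead, the same decoder would have to hold on to all five content bytes until the trailing tag said what they were.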
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Ryan Kelly
- Date:
- 2011-03-20 @ 22:01
On Sun, 2011-03-20 at 22:42 +0100, Samuel Tardieu wrote:
>
>
> 2011/3/20 Zed A. Shaw <zedshaw@zedshaw.com>
>
> > Hey, so I'm tinkering with the idea of a proxy as a handler, but
> > realizing that we sort of need a second protocol that's faster and
> > easier to parse in C (and others) than JSON. JSON's great for getting
> > things going, but I think we can support both JSON and another protocol.
> >
> > As an idea, I cooked up "tagged netstrings". This is simply the idea
> > that you encode JSON style data as a sequence of nested netstrings with
> > their character terminators saying what's inside.
>
> It won't play well in embedded systems contexts where you can't use a
> lot of memory. With "tagged netstrings" you must receive and store the
> string before being able to decode it, even if it ends up being an
> integer. A prefix-based type system (instead of a suffix-based one)
> would let you decode data as you receive it should you want to do so.
True, but doesn't 0mq force you to receive the whole message at once
anyway? Or is there a way to incrementally read the message that I
haven't come across?
Ryan
--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ryan@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Tordek
- Date:
- 2011-03-20 @ 22:41
I'm gonna go ahead and agree with everyone that's rooting for
prefixes. Now, you can be even bolder and replace the colon separator
with the type character. This saves a few characters, with one main problem:
the strings are a bit less readable (it's hard for a human to see where a
number ends and where the next thing begins).
Eg:
"0{" : {},
"0[" : [],
'34{5"hello22[11#123456789014"this': {'hello': [12345678901, 'this']},
'5#12345': 12345
'0"' : ""
'24[5#123455#678905"xxxxx' : [12345, 67890, 'xxxxx']
But it should be relatively easy to parse in C.
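A minimal Python sketch of this colon-free `<length><type><content>` layout, assuming the four type characters used in the examples (`#` integer, `"` string, `[` list, `{` dict). The names `dump` and `load` are made up for illustration. The lengths in the sample strings above look like they were carried over from the colon-separated form, so the sketch recomputes them rather than reproducing those strings byte for byte.

```python
def dump(value):
    """Encode a value as <length><type char><content>."""
    if isinstance(value, bool):
        raise TypeError("no boolean type in this sketch")
    if isinstance(value, int):
        body, tag = str(value).encode("ascii"), b"#"
    elif isinstance(value, str):
        body, tag = value.encode("utf-8"), b'"'
    elif isinstance(value, list):
        body, tag = b"".join(dump(v) for v in value), b"["
    elif isinstance(value, dict):
        body = b"".join(dump(k) + dump(v) for k, v in value.items())
        tag = b"{"
    else:
        raise TypeError("unsupported type: %r" % type(value))
    return str(len(body)).encode("ascii") + tag + body


def load(data):
    """Decode one value from the front of data. Returns (value, rest)."""
    i = 0
    while i < len(data) and 48 <= data[i] <= 57:   # scan the length digits
        i += 1
    if i == 0 or i == len(data):
        raise ValueError("expected <length><type>")
    length, tag = int(data[:i]), data[i:i + 1]
    body, rest = data[i + 1:i + 1 + length], data[i + 1 + length:]
    if len(body) != length:
        raise ValueError("truncated content")
    if tag == b"#":
        return int(body), rest
    if tag == b'"':
        return body.decode("utf-8"), rest
    if tag in (b"[", b"{"):
        items = []
        while body:
            item, body = load(body)
            items.append(item)
        if tag == b"[":
            return items, rest
        if len(items) % 2:
            raise ValueError("dict key without a value")
        return dict(zip(items[0::2], items[1::2])), rest
    raise ValueError("unknown type character: %r" % tag)


encoded = dump({"hello": [12345678901, "this"]})
decoded, leftover = load(encoded)
```

The parser has to scan for the first non-digit instead of a fixed `:`, which is the extra branch the single-separator form avoids.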
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Ryan Kelly
- Date:
- 2011-03-20 @ 22:51
On Sun, 2011-03-20 at 19:41 -0300, Tordek wrote:
> I'm gonna go ahead and agree with everyone that's rooting for
> prefixes. Now, you can be even bolder and replace the colon separator
> for the type character. This saves a few characters, one main problem:
> The strings are a bit less readable (it's hard to see where a number
> ends and where and the next thing begins for a human).
>
> Eg:
>
> "0{" : {},
> "0[" : [],
> '34{5"hello22[11#123456789014"this': {'hello': [12345678901, 'this']},
> '5#12345': 12345
> '0"' : ""
> '24[5#123455#678905"xxxxx' : [12345, 67890, 'xxxxx']
>
>
> But it should be relatively easy to parse in C.
If maintaining human-scanability is important then you could always
duplicate the type marker at the end:
"0{}" : {},
"0[]" : [],
'34{5"hello"22[11#12345678901#4"this"]}': {'hello': [12345678901, 'this']},
'5#12345#': 12345
'0""' : ""
'24[5#12345#5#67890#5"xxxxx"]' : [12345, 67890, 'xxxxx']
But it starts to look like some sort of zombie length-delimited
whitespace-free JSON encoding.
Ryan
--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ryan@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Zed A. Shaw
- Date:
- 2011-03-20 @ 22:46
On Sun, Mar 20, 2011 at 07:41:31PM -0300, Tordek wrote:
> I'm gonna go ahead and agree with everyone that's rooting for
> prefixes. Now, you can be even bolder and replace the colon separator
> for the type character. This saves a few characters, one main problem:
> The strings are a bit less readable (it's hard to see where a number
> ends and where and the next thing begins for a human).
Tried that, but it's actually easier to parse if there's only one ':' to
look for as the separator, and it's backward compatible with netstrings
now that I added ',' as the blob char.
--
Zed A. Shaw
http://zedshaw.com/
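For reference, a minimal Python sketch of a parser for the single-':' form discussed here. It is not the codepad implementation; the type characters (`#` integer, `"` string, `]` list, `}` dict, `,` blob) are inferred from the examples in this thread, and the function names are chosen for illustration only.

```python
def parse_payload(data):
    """Split b'<length>:<payload><tag>' off the front of data.

    Returns (payload, tag, remainder).
    """
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_text)
    if len(rest) <= length:
        raise ValueError("payload or type tag is truncated")
    return rest[:length], rest[length:length + 1], rest[length + 1:]


def parse(data):
    """Parse one tagged netstring. Returns (value, remainder)."""
    payload, tag, remain = parse_payload(data)
    if tag == b"#":
        value = int(payload)
    elif tag == b'"':
        value = payload.decode("utf-8")
    elif tag == b",":
        value = payload                 # blob: raw bytes, plain netstring
    elif tag == b"]":
        value = parse_list(payload)
    elif tag == b"}":
        value = parse_dict(payload)
    else:
        raise ValueError("unknown type tag: %r" % tag)
    return value, remain


def parse_list(payload):
    items = []
    while payload:
        item, payload = parse(payload)
        items.append(item)
    return items


def parse_dict(payload):
    result = {}
    while payload:
        key, payload = parse(payload)
        if not payload:
            raise ValueError("dict key without a value")
        value, payload = parse(payload)
        result[key] = value
    return result


value, remain = parse(b'34:5:hello"22:11:12345678901#4:this"]}')
```

Every value is located the same way: find the one `:`, read the length, skip that many bytes, and look at the single character after them.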
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Zed A. Shaw
- Date:
- 2011-03-20 @ 22:28
On Mon, Mar 21, 2011 at 09:01:57AM +1100, Ryan Kelly wrote:
> True, but doesn't 0mq force you to receive the whole message at once
> anyway? Or is there a way to incrementally read the message that I
> haven't come across?
There is, but it's not well documented and hard to use so I avoid it.
Also, ehem, I like to hedge my bets and not depend on the 0mq API for
the wire protocol. You know, just in case. :-)
--
Zed A. Shaw
http://zedshaw.com/
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Ryan Kelly
- Date:
- 2011-03-20 @ 22:46
On Sun, 2011-03-20 at 15:28 -0700, Zed A. Shaw wrote:
> On Mon, Mar 21, 2011 at 09:01:57AM +1100, Ryan Kelly wrote:
> > True, but doesn't 0mq force you to receive the whole message at once
> > anyway? Or is there a way to incrementally read the message that I
> > haven't come across?
>
> There is, but it's not well documented and hard to use so I avoid it.
> Also, ehem, I like to hedge my bets and not depend on the 0mq API for
> the wire protocol. You know, just in case. :-)
Of course, but there's always a little push-back from YAGNI.
If you *did* want to go 0mq-all-the-way-down, you could use its
multi-part messages instead of an internally-delimited format like
netstring, and have a good hunk of the parsing done for free by your
messaging API.
But that's not a serious suggestion.
+1 for Matt's idea of going <size>:<type>:<content>, since you're
already breaking with the netstring format anyway.
I believe the trailing comma in netstrings was meant to aid
human-readability, which would be diminished by moving it to the front
of the message. But really, can you parse something like:
34:5:hello"22:11:12345678901#4:this"]}
into the appropriate structure just by looking at it? I actually find
the reverse notation a little more readable, apart from the numbers
being all smooshed together:
34:{:5:":hello22:[:11:#:123456789014:":this
Ryan
--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ryan@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Matt Nunogawa
- Date:
- 2011-03-20 @ 21:57
<size>:<type>:<content>
might be easier to parse as well... your zero-size case and your non-zero
case both would have consistent ordering, instead of:
<zero-size>:<type>
<non-zero-size>:<content>:<type>
They aren't really netstrings at that point though...
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- James Dennis
- Date:
- 2011-03-20 @ 21:51
Maybe STOMP is worth considering?
http://stomp.codehaus.org/Protocol
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Zed A. Shaw
- Date:
- 2011-03-20 @ 22:27
On Sun, Mar 20, 2011 at 05:51:09PM -0400, James Dennis wrote:
> Maybe STOMP is worth considering?
>
> http://stomp.codehaus.org/Protocol
Ugh, STOMP. Why'd you bring that up man? I thought we were friends.
:-)
Ok, I think it's time to have a lesson in how *not* to design a
protocol:
http://stomp.codehaus.org/Protocol
If you look at that you get this wonderful message format:
    SEND
    destination:/queue/a
    receipt:message-12345

    Hello a!^@
Alright, see anything wrong with that? What if I want to send a message
that is a sequence of the ^@ terminators? Oh, that means I have to
escape the terminators? Ok, so \^@ which means now I have to escape the
escape. Now my parser has to handle \\ and \^@ just to handle a
message, oh and also need to escape newlines. Oh and \r and \n newlines
need escaping too probably.
This is the problem with terminated protocols. You always have the
"escape paradox" where in order to send the message you need to either
invent an escaping system, a guard system (like multipart mime), or a
presize system (like chunked-encoding). Every protocol designed this
way is vulnerable to all sorts of attacks related to streaming insane
amounts of data, exploits of the protocol grammar, and other problems
that Mongrel2 already has to work around.
Next, let's look at what protocol they're replicating. Oh why it's
HTTP, that awesome success story of clarity and parseability. To even
come close to a reliable parsing method I have to use a full on state
machine compiler to generate a parser, so now, to handle messages I have
to do the same for this? Great.
Finally, the entire semantics are jacked. They assume there's a
centralized server, can't handle partitioning, require explicit
connection management, have no specification for message durability, and
no defined de facto API that people implement.
Compared to 0mq and AMQP the STOMP protocol is a massive joke. It's all
the disadvantages of HTTP for none of the benefits you can just get from
0mq.
</rant>
--
Zed A. Shaw
http://zedshaw.com/
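The "escape paradox" can be shown in a few lines of Python. This is a simplified sketch, not STOMP itself: `frame_terminated` stands in for any NUL-terminated framing without escaping, and `frame_netstring` for a length-prefixed one. All four function names are invented for the example.

```python
def frame_terminated(body):
    """Terminated framing: body followed by a NUL byte, no escaping."""
    return body + b"\x00"


def read_terminated(stream):
    """Read up to the first NUL. Returns (body, remainder)."""
    body, _, rest = stream.partition(b"\x00")
    return body, rest


def frame_netstring(body):
    """Length-prefixed framing: <length>:<body>,"""
    return str(len(body)).encode("ascii") + b":" + body + b","


def read_netstring(stream):
    """Read exactly <length> bytes after the colon. Returns (body, remainder)."""
    size, sep, rest = stream.partition(b":")
    if not sep or not size.isdigit():
        raise ValueError("bad length prefix")
    length = int(size)
    if rest[length:length + 1] != b",":
        raise ValueError("missing ',' terminator")
    return rest[:length], rest[length + 1:]


payload = b"a\x00b"   # a body that happens to contain the terminator

terminated_body, terminated_rest = read_terminated(frame_terminated(payload))
netstring_body, netstring_rest = read_netstring(frame_netstring(payload))
```

The terminated reader hands back only `b"a"` and leaves the rest of the body to be misread as the next frame, while the length-prefixed reader returns the payload untouched without inspecting its contents.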
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- James Dennis
- Date:
- 2011-03-20 @ 23:14
Ha! Well, I have only heard about it from advocates. Thought I'd test
the waters.
But seriously, I appreciate the long and clear response.
On Mar 20, 2011, at 6:28 PM, "Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
> On Sun, Mar 20, 2011 at 05:51:09PM -0400, James Dennis wrote:
>> Maybe STOMP is worth considering?
>>
>> http://stomp.codehaus.org/Protocol
>
> Ugh, STOMP. Why'd you bring that up man? I thought we were friends.
> :-)
>
> Ok, I think it's time to have a lesson in how *not* to design a
> protocol:
>
> http://stomp.codehaus.org/Protocol
>
> If you look at that you get this wondeful message format:
>
> SEND
> destination:/queue/a
> receipt:message-12345
>
> Hello a!^@
>
> Alright, see anything wrong with that? What if I want to send a message
> that is a sequence of the ^@ terminators? Oh, that means I have to
> escape the terminators? Ok, so \^@ which means now I have to escape the
> escape. Now my parser has to handle \\ and \^@ just to handle a
> message, oh and also need to escape newlines. Oh and \r and \n newlines
> need escaping too probably.
>
> This is the problem with terminated protocols. You always have the
> "escape paradox" where in order to send the message you need to either
> invent an escaping system, a guard system (like multipart mime), or a
> presize system (like chunked-encoding). Every protocol designed this
> way is vulnerable to all sorts of attacks related to streaming insane
> amounts of data, exploits of the protocol grammar, and other problems
> that Mongrel2 already has to work around.
>
> Next, let's look at what protocol they're replicating. Oh why it's
> HTTP, that awesome success story of clarity and parseability. To even
> come close to a reliable parsing method I have to use a full on state
> machine compiler to generate a parser, so now, to handle messages I have
> to do the same for this? Great.
>
> Finally, the entire semantics are jacked. They assume there's a
> centralized server, can't handle partitioning, require explicit
> connection management, have no specification for message durability, and
> no defined de facto API that people implement.
>
> Compared to 0mq and AMQP the STOMP protocol is a massive joke. It's all
> the disadvantages of HTTP for none of the benefits you can just get from
> 0mq.
>
> </rant>
>
> --
> Zed A. Shaw
> http://zedshaw.com/
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Loic d'Anterroches
- Date:
- 2011-03-20 @ 21:26
Hello,
In PHP it will end up being slower, as we would have to parse in PHP
itself, whereas for JSON we just call json_decode($payload), which is
implemented in C. If really needed, I can create a C extension for PHP to
provide tnets_encode and tnets_decode. So, this is not really a big issue.
But you also wrote: "I think we can support both JSON and another
protocol.". If we keep the ease of the current JSON protocol by default,
then, I must say, go for it.
Do you want to configure the protocol like this:
    handler_test = Handler(
        # protocol='json',
        protocol='tnets',
        send_spec='tcp://127.0.0.1:9997',
        send_ident='34f9ceee-cd52-4b7f-b197-88bf2f0ec378',
        recv_spec='tcp://127.0.0.1:9996',
        recv_ident='')
given that a "proxy" handler would work only with the tnets (or whatever
it ends up being called) protocol?
loïc
> Hey, so I'm tinkering with the idea of a proxy as a handler, but
> realizing that we sort of need a second protocol that's faster and
> easier to parse in C (and others) than JSON. JSON's great for getting
> things going, but I think we can support both JSON and another protocol.
>
> As an idea, I cooked up "tagged netstrings". This is simply the idea
> that you encode JSON style data as a sequence of nested netstrings with
> their character terminators saying what's inside.
>
> Here's a python implementation of parsing it:
>
> http://codepad.org/xct0E5ac
>
> This needs to have one more type of Blob with ',' terminator so that
> it's backward compatible with regular netstrings, and you can transmit
> binary data safely on platforms where that's not possible (javascript),
> but otherwise this is very easy to parse and generate.
>
> Can someone in another language try to replicate this and see how hard
> it is? I'll be doing C next as a test, but I'd like to see a few others
> to compare.
>
> Also, no, we won't use protocol bufs, BIRT, or others since those are
> hard as hell to parse compared to this and probably don't buy much in
> terms of speed for the usability costs.
>
> Thanks!
>
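The "nested netstrings with type terminators" idea quoted above can be sketched as a small recursive parser. This is not the codepad implementation linked in the message. The tag characters used here (`,` string, `#` integer, `]` list, `}` dict) follow the tnetstrings format as it was later published, as best I recall it, so treat them as an assumption relative to the draft under discussion. `parse_tnetstring` is a hypothetical name.

```python
def parse_tnetstring(data: bytes):
    """Parse one tagged netstring; return (value, remaining_bytes)."""
    head, sep, rest = data.partition(b":")
    if not sep or not head.isdigit():
        raise ValueError("bad length prefix")
    length = int(head)
    if len(rest) < length + 1:
        raise ValueError("truncated input")
    body = rest[:length]
    tag = rest[length:length + 1]   # the terminator doubles as the type tag
    remain = rest[length + 1:]

    if tag == b",":                 # string / blob: body is used as-is
        return body, remain
    if tag == b"#":                 # integer in ASCII decimal
        return int(body), remain
    if tag == b"]":                 # list: body is a run of tagged netstrings
        items = []
        while body:
            item, body = parse_tnetstring(body)
            items.append(item)
        return items, remain
    if tag == b"}":                 # dict: body alternates key, value
        result = {}
        while body:
            key, body = parse_tnetstring(body)
            value, body = parse_tnetstring(body)
            result[key] = value
        return result, remain
    raise ValueError("unknown type tag: %r" % tag)


value, rest = parse_tnetstring(b"12:1:1#1:2#1:3#]")
```

Each level reads a length, slices exactly that many bytes, and then looks at a single trailing character, so no escaping or lookahead is needed at any depth.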
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Zed A. Shaw
- Date:
- 2011-03-20 @ 21:36
On Sun, Mar 20, 2011 at 10:26:50PM +0100, Loic d'Anterroches wrote:
> Hello,
>
> in PHP it will end up being slower as we would have to parse in PHP
> where for json we just do a json_decode($payload) which itself is coded
> in C. If really needed, I can create a C extension for PHP to provide
> tnets_encode and tnets_decode. So, this is not really a big issue.
>
> But you also wrote: "I think we can support both JSON and another
> protocol.". If we keep the ease of the current JSON protocol by default,
> then, I must say, go for it.
Exactly, there's always things where JSON will be understood no matter
what. This would just be for those who want to remove that overhead as
well (assuming tnetstrings turn out to be faster in practice).
--
Zed A. Shaw
http://zedshaw.com/
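One way a handler library might keep JSON as the default while allowing a second protocol is a small decoder registry keyed by the protocol name. This is only a sketch. `DECODERS`, `register_decoder`, `decode_payload` and `decode_plain_netstring` are hypothetical names, the `'tnets'` key is borrowed from the config suggestion earlier in the thread, and the stand-in decoder handles only a plain `<length>:<payload>,` frame rather than full tagged netstrings.

```python
import json

# JSON stays registered by default, so existing handlers keep working.
DECODERS = {"json": json.loads}


def register_decoder(name, func):
    """Make an additional payload decoder selectable by protocol name."""
    DECODERS[name] = func


def decode_payload(payload: bytes, protocol: str = "json"):
    """Decode a payload with the decoder registered for `protocol`."""
    try:
        decoder = DECODERS[protocol]
    except KeyError:
        raise ValueError("unknown protocol: %r" % protocol) from None
    return decoder(payload)


def decode_plain_netstring(payload: bytes) -> bytes:
    """Stand-in decoder: strip a <length>:<payload>, frame (no type tags)."""
    head, _, rest = payload.partition(b":")
    return rest[:int(head)]


register_decoder("tnets", decode_plain_netstring)
headers = decode_payload(b'{"PATH": "/"}')           # default: JSON
blob = decode_payload(b"5:hello,", protocol="tnets")  # opt-in alternative
```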
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Henry Baragar
- Date:
- 2011-03-20 @ 20:49
What about BSON (http://bsonspec.org/), binary encoded JSON?
It's the native storage format for MongoDB (http://www.mongodb.org/), a popular
no-sql database.
Cheers,
Henry
On March 20, 2011 03:12:08 pm Zed A. Shaw wrote:
> [...]
--
Henry Baragar
Instantiated Software
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Zed A. Shaw
- Date:
- 2011-03-20 @ 21:35
On Sun, Mar 20, 2011 at 04:49:04PM -0400, Henry Baragar wrote:
> What about BSON (http://bsonspec.org/), binary encoded JSON?
The C code for BSON isn't very good, and as I mentioned with msgpack,
binary protocols are hard to parse in a lot of languages.
--
Zed A. Shaw
http://zedshaw.com/
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Andrew Cholakian
- Date:
- 2011-03-20 @ 19:34
What are your thoughts as far as MessagePack?
It correlates well to JSON, has a very compact representation, and has a
very fast set of widely available bindings.
http://msgpack.org/
On Sun, Mar 20, 2011 at 12:12 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> [...]
--
Andrew Cholakian
http://www.andrewvc.com
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Zed A. Shaw
- Date:
- 2011-03-20 @ 21:34
On Sun, Mar 20, 2011 at 12:34:14PM -0700, Andrew Cholakian wrote:
> What are your thoughts as far as MessagePack?
>
> It correlates well to JSON, has a very compact representation, and has a
> very fast set of widely available bindings.
>
> http://msgpack.org/
msg = [1,2,3].to_msgpack #=> "\x93\x01\x02\x03"
Says it all. That's damn hard to parse well in lots of languages, most
notably javascript. Basically, netstrings are parseable by everything
that can handle ascii text, but most other "fast" formats like msgpack
and BIRT are not.
--
Zed A. Shaw
http://zedshaw.com/
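For context on the four bytes in that example: in MessagePack, 0x93 is a "fixarray" header (0x90 plus the element count in the low nibble) and 0x01, 0x02, 0x03 are "positive fixint" values. The sketch below decodes only that tiny subset, to show the bit-level work a binary format involves. `decode_fixarray_of_fixints` is a hypothetical helper, not part of any MessagePack library. The tagged-netstring rendering of the same list is shown for comparison and assumes the tag characters of the later published tnetstrings format.

```python
def decode_fixarray_of_fixints(data: bytes):
    """Decode a MessagePack fixarray whose elements are all positive fixints."""
    if not data:
        raise ValueError("empty input")
    header = data[0]
    if header & 0xF0 != 0x90:        # fixarray headers are 0x90..0x9f
        raise ValueError("not a fixarray")
    count = header & 0x0F            # low nibble holds the element count
    items = list(data[1:1 + count])
    if len(items) != count:
        raise ValueError("truncated array")
    if any(item > 0x7F for item in items):  # positive fixint is 0x00..0x7f
        raise ValueError("element is not a positive fixint")
    return items


MSGPACK_SAMPLE = b"\x93\x01\x02\x03"      # bytes from the message above
TNETSTRING_SAMPLE = b"12:1:1#1:2#1:3#]"   # same list, assumed tag characters

numbers = decode_fixarray_of_fixints(MSGPACK_SAMPLE)
sizes = (len(MSGPACK_SAMPLE), len(TNETSTRING_SAMPLE))
```

The binary form is 4 bytes against 16 for the ASCII form, which is the trade being argued here: compactness on one side, a format readable with nothing more than string slicing on the other.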
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- joshua simmons
- Date:
- 2011-03-20 @ 20:33
MessagePack is horribly implemented. Get that protocol with a nice plain C
implementation and I'll be there. But not before.
On Mon, Mar 21, 2011 at 6:34 AM, Andrew Cholakian <andrew@andrewvc.com> wrote:
> [...]
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- joshua simmons
- Date:
- 2011-03-20 @ 20:43
And yeah, something that's easier to generate in C would be nice too.
Request_to_payload was (is) one of the major hot spots in the mongrel2 code
and even after changing it up it's still quite nasty.
A protocol that minimises work there would be highly beneficial.
On Mon, Mar 21, 2011 at 7:33 AM, joshua simmons <simmons.44@gmail.com> wrote:
> [...]
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Bobby Powers
- Date:
- 2011-03-20 @ 19:30
I like the extension. So the difference between ',' and '"' is that '"' is
a (utf8? null-terminated?) string, and ',' is a blob of bytes?
yours,
Bobby
On Sun, Mar 20, 2011 at 12:12 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> [...]
Re: [mongrel2] Proposing An Alternative To JSON
- From:
- Zed A. Shaw
- Date:
- 2011-03-20 @ 21:33
On Sun, Mar 20, 2011 at 12:30:50PM -0700, Bobby Powers wrote:
> I like the extension. So the difference between ',' and '"' is that '"' is
> a (utf8? null-terminated?) string, and ',' is a blob of bytes?
Well, whatever the language thinks a "string" is. It's hard to dictate
utf8 because in C that requires a metric pain of crap to handle well.
Additionally, we'll need:
boolean
null
Since JSON has those too. I'll be updating the python sample soon.
In the meantime, here's the thing in Factor:
http://re-factor.blogspot.com/2011/03/typed-netstrings.html
--
Zed A. Shaw
http://zedshaw.com/
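A rough sketch of what an encoder covering the two extra types might look like. The message above only says that boolean and null are needed. The specific tags used here (`!` with a body of `true`/`false`, and `~` with an empty body) come from the tnetstrings format as later published, as best I recall it, and are an assumption relative to this thread. Encoding Python `str` as UTF-8 is likewise a choice made for the example, since the message deliberately leaves "string" up to the language. `dump_tnetstring` is a hypothetical name.

```python
def dump_tnetstring(value) -> bytes:
    """Encode a Python value as a tagged netstring (assumed tag characters)."""
    if value is None:
        body, tag = b"", b"~"                       # null: empty body
    elif isinstance(value, bool):                   # check before int: bool is an int subclass
        body, tag = (b"true" if value else b"false"), b"!"
    elif isinstance(value, int):
        body, tag = str(value).encode("ascii"), b"#"
    elif isinstance(value, bytes):
        body, tag = value, b","
    elif isinstance(value, str):
        body, tag = value.encode("utf-8"), b","     # assumption: UTF-8 for text
    elif isinstance(value, (list, tuple)):
        body, tag = b"".join(dump_tnetstring(v) for v in value), b"]"
    elif isinstance(value, dict):
        body = b"".join(dump_tnetstring(k) + dump_tnetstring(v)
                        for k, v in value.items())
        tag = b"}"
    else:
        raise TypeError("cannot encode %s" % type(value).__name__)
    # The length prefix counts only the body bytes; the tag follows the body.
    return str(len(body)).encode("ascii") + b":" + body + tag


encoded = dump_tnetstring({"ok": True, "next": None})
```

Generation needs no escaping pass: each value is written once and its length is prepended, which is the property the thread is after for the C side as well.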