librelist archives

Streaming Data

From:
Alexander Kern
Date:
2010-07-13 @ 15:34
How exactly (if at all) will Mongrel2 handle streaming data, such as  
file uploads or large PUT or POST requests? Will the entity body first  
be completely downloaded into a temporary file and the filename sent  
to the handler, or will the handler be able to incrementally read the  
data as it comes in? One of the major points at which Node.js excels  
is its ability to efficiently handle streaming data, but Mongrel2's  
architecture seems to be focused more on synchronous applications.

Re: [mongrel2] Streaming Data

From:
Zed A. Shaw
Date:
2010-07-13 @ 16:56
On Tue, Jul 13, 2010 at 08:34:01AM -0700, Alexander Kern wrote:
> How exactly (if at all) will Mongrel2 handle streaming data, such as  
> file uploads or large PUT or POST requests? Will the entity body first  
> be completely downloaded into a temporary file and the filename sent  
> to the handler, or will the handler be able to incrementally read the  
> data as it comes in? One of the major points at which Node.js excels  
> is its ability to efficiently handle streaming data, but Mongrel2's  
> architecture seems to be focused more on synchronous applications.

So, I've been cooking up schemes in my mind, but right now I'm
contemplating kind of a "split" design on it with the goal of making it
easy to both tell the browser the request was done, but not block the
browser while you actually handle the upload contents.

What I've got so far, and tell me what you think, is having it do:

1. Mongrel2 sees the "MOBY" request, which means a big body.
2. It notifies the backend handler right away that one is coming in,
with all the relevant headers, but not the actual body yet.
3. The backend will reply with the response that should go out when the upload
is done after doing the usual checks.
4. Mongrel2 will then deal with the browser's request by streaming the
actual contents to a temp file somewhere.
5. When the upload is complete, Mongrel2 shoots the response it has
saved from #3 above to the browser, and then...
6. Sends a new message to another "upload handler" telling it the upload
was done, and where to get the tmpfile.
7. The upload handler then does whatever needs to be done.  Video
transcoding, S3 push, notify other handlers, whatever.

It may even be possible to have the handler from #3 indicate what the
final "upload route target" should be, but probably as a later feature.

What do you think?

-- 
Zed A. Shaw
http://zedshaw.com/
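
A minimal sketch of the seven-step flow proposed above, written as a plain Python simulation rather than anything Mongrel2 actually ships. The names `precheck_handler`, `upload_handler`, and `handle_moby_request` are hypothetical, the in-memory stream stands in for the browser connection, and the `log` list stands in for "send to browser" and "notify handler".

```python
import os
import tempfile

def precheck_handler(headers):
    """Steps 2-3: sees only the headers and returns the response to send
    once the upload has finished (hypothetical handler)."""
    return "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"

def upload_handler(path, headers, log):
    """Step 7: is told where the temp file lives and does the real work
    (transcoding, pushing to storage, ...). Here it only records the size."""
    log.append(("uploaded", os.path.getsize(path)))

def handle_moby_request(headers, body_stream, log, bufsize=16 * 1024):
    saved_response = precheck_handler(headers)        # steps 2-3
    fd, path = tempfile.mkstemp(prefix="moby-")
    try:
        with os.fdopen(fd, "wb") as tmp:              # step 4: stream to disk
            while True:
                block = body_stream.read(bufsize)
                if not block:
                    break
                tmp.write(block)
        log.append(("response", saved_response))      # step 5: reply to browser
        upload_handler(path, headers, log)            # steps 6-7
    finally:
        os.remove(path)   # the sketch cleans up; a real upload handler would own the file
    return saved_response
```

The point of the ordering is that the browser gets its answer at step 5, before the slow work in step 7 starts.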

Re: [mongrel2] Streaming Data

From:
Alexander Kern
Date:
2010-07-13 @ 17:12
Interesting idea. Some comments:

> 1. Mongrel2 sees the "MOBY" request, which means a big body.
Where do you draw the line between the size of the bodies? Maybe a  
specific SQLite setting that could be set as the buffer size (16kb or  
something small). On top of that, two ways of accessing the same  
information could be annoying: what if a form can be submitted with or  
without a file attached? This would complicate handler code.

> 3. The backend will reply with the response that should go out when the
> upload is done after doing the usual checks.
What if the checks involve processing of the body itself? For example,  
the browser (or that god-awful thing called Flash) could send an  
invalid or unknown (think application/octet-stream, which Flash  
stupidly sends uploads with) Content-Type because of its inability to  
guess the MIME from the file extension. If you send one of these to  
something like an image upload service it *should* reply with a 415  
Unsupported Media Type (or 400 if you're lazy). This check can't be  
done unless you are able to check the file itself.

> 4. Mongrel2 will then deal with the browser's request by streaming the
> actual contents to a temp file somewhere.
Where would this tempfile be, and how would it be accessed? One of the  
benefits of using something like 0mq is that the handler can be  
located on the network rather than locally on the filesystem. Would  
you give the handler a URL that it could stream the data from (using  
some streaming protocol or raw TCP)?

Re: [mongrel2] Streaming Data

From:
Zed A. Shaw
Date:
2010-07-13 @ 17:26
On Tue, Jul 13, 2010 at 10:12:11AM -0700, Alexander Kern wrote:
> Interesting idea. Some comments:
> 
> >1. Mongrel2 sees the "MOBY" request, which means a big body.
> Where do you draw the line between the size of the bodies? Maybe a
> specific SQLite setting that could be set as the buffer size (16kb
> or something small). On top of that, two ways of accessing the same
> information could be annoying: what if a form can be submitted with
> or without a file attached? This would complicate handler code.

Yep, there'd be a cutoff somewhere in the config, so if you set it high
enough you wouldn't deal with the uploads.  Also, the first handler
could easily give a response of "screw it, just hand it to me" to make a
better decision.

As for complicating the handler, yep it does do that, but since handlers
are fairly easy to write, and you'd write two, it's hopefully not too
hard and makes your app deal with file uploads way better.


> >3. The backend will reply with the response that should go out when
> >the upload is done after doing the usual checks.
> What if the checks involve processing of the body itself? For
> example, the browser (or that god-awful thing called Flash) could
> send an invalid or unknown (think application/octet-stream, which
> Flash stupidly sends uploads with) Content-Type because of its
> inability to guess the MIME from the file extension. If you send one
> of these to something like an image upload service it *should* reply
> with a 415 Unsupported Media Type (or 400 if you're lazy). This
> check can't be done unless you are able to check the file itself.

So in this case you'd need a way for the handler that deals with the
actual file to give the response, not the first?  That's doable, and
actually it might be simple to just have handlers only send responses if
they're supposed to.  Remember, Mongrel2 doesn't care who sends what to
a connected browser, so both, none, or one of the handlers you've got
can give responses.

In this case, just don't have the first handler say anything if the
request is alright.  Then the second handler does its thing with the
file and sends the 415 response if needed.  That also simplifies the
design a bunch.

> >4. Mongrel2 will then deal with the browser's request by streaming the
> >actual contents to a temp file somewhere.
> Where would this tempfile be, and how would it be accessed? One of
> the benefits of using something like 0mq is that the handler can be
> located on the network rather than locally on the filesystem. Would
> you give the handler a URL that it could stream the data from (using
> some streaming protocol or raw TCP)?

There's three proposals I've thought of for this:

1. On a file on disk, it's up to your upload processing handler to
figure it out from there.
2. Out of an HTTP directory, so something on the network can grab it.
3. Off a raw socket so you can just connect FTP style and pull the whole
thing down.

Of the three I like #1 since it means you can implement #2 and #3 if you
need and it works for everyone in the simplest case, while letting
people get more complex if they need.  #3 is teh suck to me since that
means defining a new protocol for very little benefit.

-- 
Zed A. Shaw
http://zedshaw.com/
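
A minimal sketch of the revised idea above, where the handler that sees the actual file decides whether to send a 415. It assumes an image upload service that accepts only PNG and JPEG and checks the leading bytes of the upload instead of trusting the Content-Type header. The function names are hypothetical; the two signatures are the standard PNG and JPEG file prefixes.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"       # JPEG start-of-image marker plus next marker byte

def sniff_image_type(fileobj):
    """Return a MIME type guessed from the first bytes, or None if unknown."""
    head = fileobj.read(8)
    if head.startswith(PNG_MAGIC):
        return "image/png"
    if head.startswith(JPEG_MAGIC):
        return "image/jpeg"
    return None

def upload_handler_response(fileobj):
    """Hypothetical second handler: only it responds, since only it has the body."""
    if sniff_image_type(fileobj) is None:
        return "HTTP/1.1 415 Unsupported Media Type\r\nContent-Length: 0\r\n\r\n"
    return "HTTP/1.1 201 Created\r\nContent-Length: 0\r\n\r\n"
```

With this split, an upload labelled application/octet-stream can still be accepted if its contents turn out to be a valid image.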

Re: [mongrel2] Streaming Data

From:
Alexander Kern
Date:
2010-07-13 @ 17:44
> Yep, there'd be a cutoff somewhere in the config, so if you set it high
> enough you wouldn't deal with the uploads.  Also, the first handler
> could easily give a response of "screw it, just hand it to me" to make a
> better decision.
I think the reverse would also be useful, something like "never send  
me the entity body". Certain resources will almost always receive  
POSTs or PUTs with a binary attached (image upload services  
especially). It'd be beneficial to the handler of such a service to  
have only one way of accessing the body, even if the image is a 2kb  
PNG or a 10mb JPG.  Just a thought.

> In this case, just don't have the first handler say anything if the
> request is alright.  Then the second handler does its thing with the
> file and sends the 415 response if needed.  That also simplifies the
> design a bunch.
This would work, but why have 2 handlers then? Couldn't you just skip  
the first handler and let the second one take care of business?

> 1. On a file on disk, it's up to your upload processing handler to
> figure it out from there.
Definitely do this.

> 2. Out of an HTTP directory, so something on the network can grab it.
Wouldn't this defeat one of the major benefits of Mongrel2? This would  
force the client/handler/whoever to parse HTTP using whatever  
(possibly slow) parser they have instead of the ragel one.

> 3. Off a raw socket so you can just connect FTP style and pull the whole
> thing down.
Blech, seems hacky and unfinished, I agree.

Could the handler be configured instead to receive something like  
chunked HTTP, with it receiving an initial header message, then  
blocking until it receives subsequent body messages?
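
A minimal sketch of the message pattern asked about above: one header message, then body messages that the handler blocks on. It uses a standard-library queue in place of a real transport, and it assumes an empty body message marks the end of the request (similar to the zero-length last chunk in chunked HTTP). Both the function name and that end marker are assumptions, not anything Mongrel2 defines.

```python
import queue

def read_streamed_request(inbox):
    """Yield ("headers", dict) once, then ("body", bytes) for each body
    message, stopping when an empty body message arrives."""
    headers = inbox.get()        # blocks until the header message arrives
    yield "headers", headers
    while True:
        part = inbox.get()       # blocks until the next body message arrives
        if part == b"":
            return
        yield "body", part
```

A handler written this way can reject the request after the first item without ever reading the body messages.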

Re: [mongrel2] Streaming Data

From:
Zed A. Shaw
Date:
2010-07-13 @ 18:05
On Tue, Jul 13, 2010 at 10:44:30AM -0700, Alexander Kern wrote:
> > In this case, just don't have the first handler say anything if the
> > request is alright.  Then the second handler does its thing with the
> > file and sends the 415 response if needed.  That also simplifies the
> > design a bunch.
> This would work, but why have 2 handlers then? Couldn't you just skip  
> the first handler and let the second one take care of business?

Because you don't want Mongrel2 to process any files that it shouldn't.
A *very* common scenario is that the front web server in a cluster
completes a whole request, then hits the backend only to find out that
the URL is wrong, or that request isn't valid, or the login is wrong,
or it's too big, etc.  Something that's easily checked from just headers
on an initial hit to a handler.

What this cuts down on is, if the base HTTP request indicates that the
upload should not happen, then Mongrel2 can cut the browser off right
away and shoot a response without having to complete the upload.

But, nothing prevents you from having the same handler deal with it.
It's just routing after all, and since the message formats are
universal you just have to deal with both requests and do your thing.

> > 1. On a file on disk, it's up to your upload processing handler to
> > figure it out from there.
> Definitely do this.
> 
> > 2. Out of an HTTP directory, so something on the network can grab it.
> Wouldn't this defeat one of the major benefits of Mongrel2? This would  
> force the client/handler/whoever to parse HTTP using whatever  
> (possibly slow) parser they have instead of the ragel one.

Yep, thus why I'm not so interested in this.  I think the majority of
folks who do any serious uploads are most likely going to be pushing the
uploaded file to some "S3 like thing" and dealing with it there.  #1
makes this and anything else possible.  Like I said, you can implement
#2 if you have #1, but the inverse is harder.

> Could the handler be configured instead to receive something like  
> chunked HTTP, with it receiving an initial header message, then  
> blocking until it receives subsequent body messages?

Yep, except 99% of all HTTP libraries suck and couldn't handle this type
of streaming.  It'd be possible to implement it easily though using the
basic primitives.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Streaming Data

From:
Fred Alger
Date:
2010-07-13 @ 19:52
On Jul 13, 2010, at 14:05 , Zed A. Shaw wrote:

> On Tue, Jul 13, 2010 at 10:44:30AM -0700, Alexander Kern wrote:
>>> In this case, just don't have the first handler say anything if the
>>> request is alright.  Then the second handler does its thing with the
>>> file and sends the 415 response if needed.  That also simplifies the
>>> design a bunch.
>> This would work, but why have 2 handlers then? Couldn't you just skip  
>> the first handler and let the second one take care of business?
> 
> Because you don't want Mongrel2 to process any files that it shouldn't.
> A *very* common scenario is that the front web server in a cluster
> completes a whole request, then hits the backend only to find out that
> the URL is wrong, or that request isn't valid, or the login is wrong,
> or it's too big, etc.  Something that's easily checked from just headers
> on an initial hit to a handler.
> 
> What this cuts down on is, if the base HTTP request indicates that the
> upload should not happen, then Mongrel2 can cut the browser off right
> away and shoot a response without having to complete the upload.
This is brilliant; I haven't ever seen a web server that would do 
something other than blindly wait for the request to finish before 
processing and sending a response.  In order to still handle HTTP 
correctly, it seems like you'd need to introduce another mongrel2 state, 
like, "we're going to tell the client to piss off, but they're still 
sending garbage, so throw away what they send and then respond."  Then 
again, given the internal FSM, that should be straightforward.

But yeah, overall, I like the dual-handler design for uploads with a "fast
fail" if one of the handlers rejects the request outright; makes a hell of
a lot of sense to me.

best,
- Fred.
http://weblog.fredalger.net/
@_phred

Re: [mongrel2] Streaming Data

From:
Zed A. Shaw
Date:
2010-07-13 @ 22:20
On Tue, Jul 13, 2010 at 03:52:28PM -0400, Fred Alger wrote:
> On Jul 13, 2010, at 14:05 , Zed A. Shaw wrote:
> > What this cuts down on is, if the base HTTP request indicates that
> > the upload should not happen, then Mongrel2 can cut the browser off
> > right away and shoot a response without having to complete the
> > upload.
>
> This is brilliant; I haven't ever seen a web server that would do
> something other than blindly wait for the request to finish before
> processing and sending a response.  In order to still handle HTTP
> correctly, it seems like you'd need to introduce another mongrel2
> state, like, "we're going to tell the client to piss off, but they're
> still sending garbage, so throw away what they send and then respond."
> Then again, given the internal FSM, that should be straightforward.

I think actually this would be a new state of "MobyRequest", not to be
confused with the musician. :-)  It'd be similar to Proxying, but
instead it's negotiating the upload of a giant request body.

> But yeah, overall, I like the dual-handler design for uploads with a
> "fast fail" if one of the handlers rejects the request outright; makes
> a hell of a lot of sense to me.

About the only thing that'd have to be worked out is if this is kosher
with the protocol.  I think the server is allowed to close the socket
violently and send a reply, but not sure if browsers will like that or
get it.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Streaming Data

From:
Alexander Kern
Date:
2010-07-13 @ 18:24
> Because you don't want Mongrel2 to process any files that it shouldn't.
> A *very* common scenario is that the front web server in a cluster
> completes a whole request, then hits the backend only to find out that
> the URL is wrong, or that request isn't valid, or the login is wrong,
> or it's too big, etc.  Something that's easily checked from just headers
> on an initial hit to a handler.
>
> What this cuts down on is, if the base HTTP request indicates that the
> upload should not happen, then Mongrel2 can cut the browser off right
> away and shoot a response without having to complete the upload.
>
> But, nothing prevents you from having the same handler deal with it.
> It's just routing after all, and since the message formats are
> universal you just have to deal with both requests and do your thing.
I love the way WebMachine deals with this. Basically it maps the HTTP  
protocol to a set of callback functions and has a decision engine  
behind them. Only after parsing the headers does it even touch the  
body (and yes, it does support streaming). I think we're thinking of  
the same thing, basically letting the handler (or multiple handlers)  
parse stuff in any order they want, sending a response once they have  
enough information.

(This is the perfect use case for 100 Continue, by the way. If only  
browsers actually *used* it...)

> Yep, except 99% of all HTTP libraries suck and couldn't handle this type
> of streaming.  It'd be possible to implement it easily though using the
> basic primitives.
I completely agree. True HTTP support in general sucks. When it  
exists, the interface usually sucks or is too low level to make code  
expressive. I'm writing a Ruby/Node library right now that deals with  
just this (since Node's is too low level and Ruby's just sucks).

Mongrel2 will require some changes in application deployment, so why  
not encourage users to use streaming HTTP libraries? :)

Re: [mongrel2] Streaming Data

From:
Zed A. Shaw
Date:
2010-07-13 @ 18:47
On Tue, Jul 13, 2010 at 11:24:47AM -0700, Alexander Kern wrote:
> Mongrel2 will require some changes in application deployment, so why  
> not encourage users to use streaming HTTP libraries? :)

Because nobody would write them.  HTTP chunked encoding is a bizarre,
often abused corner of the standard, and it's horribly inefficient.
It's way better to just let whoever needs and wants this use the basics
to implement it than trying to do it myself and spend the next year
convincing people to do it my way.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Streaming Data

From:
Eric Wong
Date:
2010-07-13 @ 20:56
"Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
> On Tue, Jul 13, 2010 at 11:24:47AM -0700, Alexander Kern wrote:
> > Mongrel2 will require some changes in application deployment, so why  
> > not encourage users to use streaming HTTP libraries? :)
> 
> Because nobody would write them.  HTTP chunked encoding is a bizarre,
> often abused corner of the standard, and it's horribly inefficient.
> It's way better to just let whoever needs and wants this use the basics
> to implement it than trying to do it myself and spend the next year
> convincing people to do it my way.

While definitely a corner case and rarely seen, chunked encoding
can be useful and more efficient if used in a pipeline.

Mobile devices can stream compressed voice data to a server even
while that stream is active (somebody is speaking into it).  A
chunk-aware server can then start processing that data before the
speaker has even finished speaking and return a result sooner after the
last phrase is spoken.  Since processing audio data can be expensive, it
makes even more sense to process it incrementally as the client uploads
it.


Another potentially useful case is if I run out of space on my
local machine and need to backup to a storage provider:

   tar zcf - pr0n/ | curl -T- http://example.com/my_faxes.tar.gz

I've been meaning to teach curl to calculate and write Content-MD5:
trailers, too, so the data can be streamed once and checksummed
on-the-fly for the server to verify.


I find the above examples quite useful in cases where writing large
amounts of data to the local filesystem isn't possible.  The extra
memory bandwidth on the server needed to decode chunks in userspace
shouldn't be much compared to filesystem I/O on the client.

-- 
Eric Wong
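
A minimal sketch of the "stream once, checksum on the fly" idea from the message above. It copies a stream in fixed-size blocks while feeding each block to an incremental MD5, so the data never has to sit on disk or in memory as a whole. The function name is hypothetical; the return value is base64 of the binary digest, which is how RFC 1864 defines the Content-MD5 value.

```python
import base64
import hashlib

def copy_with_md5(source, sink, bufsize=64 * 1024):
    """Copy source to sink block by block and return the Content-MD5 value."""
    digest = hashlib.md5()
    while True:
        block = source.read(bufsize)
        if not block:
            break
        digest.update(block)   # checksum grows as the data streams past
        sink.write(block)
    return base64.b64encode(digest.digest()).decode("ascii")
```

Because the digest is only known after the last block, it has to travel after the body, which is why the message talks about sending it as a trailer.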

Re: [mongrel2] Streaming Data

From:
Zed A. Shaw
Date:
2010-07-13 @ 22:16
On Tue, Jul 13, 2010 at 01:56:14PM -0700, Eric Wong wrote:
> "Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
> > On Tue, Jul 13, 2010 at 11:24:47AM -0700, Alexander Kern wrote:
> > > Mongrel2 will require some changes in application deployment, so why  
> > > not encourage users to use streaming HTTP libraries? :)
> > 
> > Because nobody would write them.  HTTP chunked encoding is a bizarre,
> > often abused corner of the standard, and it's horribly inefficient.
> > It's way better to just let whoever needs and wants this use the basics
> > to implement it than trying to do it myself and spend the next year
> > convincing people to do it my way.
> 
> While definitely a corner case and rarely seen, chunked encoding
> can be useful and more efficient if used in a pipeline.

For everything you said, you could replace chunked encoding with faster
and more reliable 0MQ messages, or "plain old sockets".  It almost
always breaks down that if you're trying to "stream" chunks over to a
server from a client, and you use chunked encoding, then you don't grok
sockets.  They *already* stream.  They're sockets.  That's what they do.
Stream.  No chunks needed.

For example, sending a chunked encoding from a client is retarded
because it's only gotta deal with one buffer.  There's no "memory
limit", it's a single buffer.  You call malloc, and then read/write from
it a bunch.  Why chunked encoding ever comes into play is beyond me.
That's like adding 500 bytes of overhead per message to TCP/IP just so
you can feel safer like when you wear belts and suspenders.

If however you're using chunked encoding like some ghetto RPC, then use
0MQ instead. Or RabbitMQ, or nearly anything else.  Trying to do
bidirectional chunked encoding as a message protocol is just baffling.

Anyway, sorry about the rant, just every time I see someone claiming to
need client side chunked encoding I call bullshit.  No offense
personally intended.

when it's all working then try it out.  I'm sure it's actually not hard
to implement, just something that every person is going to want to do
totally differently.

-- 
Zed A. Shaw
http://zedshaw.com/
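
A minimal sketch of the "single buffer" point made above: a plain socket can move an arbitrarily large payload through one fixed-size buffer with no framing at all. `copy_stream` is a hypothetical helper name; `recv_into` and `memoryview` are standard-library features.

```python
import socket

def copy_stream(sock, sink, bufsize=4096):
    """Read from sock until the peer closes, reusing one buffer, and write
    everything to sink. Returns the number of bytes copied."""
    buf = bytearray(bufsize)       # allocated once, reused for every read
    view = memoryview(buf)
    total = 0
    while True:
        n = sock.recv_into(view)   # fills at most bufsize bytes
        if n == 0:                 # peer closed: end of stream
            return total
        sink.write(view[:n])
        total += n
```

The trade-off is that end-of-data is signalled only by closing the connection, which is the gap that a length prefix or chunk framing fills when the connection has to stay open.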

Re: [mongrel2] Streaming Data

From:
Eric Wong
Date:
2010-07-14 @ 02:05
"Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
> On Tue, Jul 13, 2010 at 01:56:14PM -0700, Eric Wong wrote:
> > "Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
> > > On Tue, Jul 13, 2010 at 11:24:47AM -0700, Alexander Kern wrote:
> > > > Mongrel2 will require some changes in application deployment, so why  
> > > > not encourage users to use streaming HTTP libraries? :)
> > > 
> > > Because nobody would write them.  HTTP chunked encoding is a bizarre,
> > > often abused corner of the standard, and it's horribly inefficient.
> > > It's way better to just let whoever needs and wants this use the basics
> > > to implement it than trying to do it myself and spend the next year
> > > convincing people to do it my way.
> > 
> > While definitely a corner case and rarely seen, chunked encoding
> > can be useful and more efficient if used in a pipeline.
> 
> For everything you said, you could replace chunked encoding with faster
> and more reliable 0MQ messages, or "plain old sockets".  It almost
> always breaks down that if you're trying to "stream" chunks over to a
> server from a client, and you use chunked encoding, then you don't grok
> sockets.  They *already* stream.  They're sockets.  That's what they do.
> Stream.  No chunks needed.

Of course plain old sockets will always be faster than chunking.  But
HTTP overhead isn't that much with large bodies/chunks.  HTTP is already
ubiquitous and trying to get client developers to adopt/learn new stuff
isn't easy.

I don't do much client-side development, but the popular libcurl already
supports HTTP chunking and I suspect other client libraries do, too,
especially when it comes to mobile devices.

> For example, sending a chunked encoding from a client is retarded
> because it's only gotta deal with one buffer.  There's no "memory
> limit", it's a single buffer.  You call malloc, and then read/write from
> it a bunch.  Why chunked encoding ever comes into play is beyond me.
> That's like adding 500 bytes of overhead per message to TCP/IP just so
> you can feel safer like when you wear belts and suspenders.

Assuming a 4K chunk, I only count 8 bytes of overhead per chunk:

  "1000\r\n", payload, "\r\n"

I haven't studied 0MQ, but since it can use TCP (and most likely, must
use TCP when dealing with remote/mobile clients) it would have to deal
with message boundaries to split them into messages, too.

I'm fine with paying an extra few bytes to avoid introducing the
maintenance overhead of more protocols.

> If however you're using chunked encoding like some ghetto RPC, then use
> 0MQ instead. Or RabbitMQ, or nearly anything else.  Trying to do
> bidirectional chunked encoding as a message protocol is just baffling.

It's a bit weird, yes, but it was fun to try once upon a time
and I could've used it to get around firewalls.

> Anyway, sorry about the rant, just every time I see someone claiming to
> need client side chunked encoding I call bullshit.  No offense
> personally intended.

None taken.

> when it's all working then try it out.  I'm sure it's actually not hard
> to implement, just something that every person is going to want to do
> totally differently.

Not hard for people on this mailing list, sure, but I've seen plenty of
"programmers" struggle to even make HTTP GET requests with whatever
libraries they're using.  Introducing them to new libraries/protocols
would take quite a lot of effort.

-- 
Eric Wong
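
A minimal sketch that checks the 8-byte figure above by building an HTTP/1.1 chunk and measuring the framing. The function names are hypothetical; the wire format (hex length, CRLF, payload, CRLF, with a zero-length chunk ending the body) is the standard chunked transfer coding.

```python
def encode_chunk(payload):
    """Frame one chunk: hex length, CRLF, payload, CRLF."""
    return b"%x\r\n" % len(payload) + payload + b"\r\n"

def chunk_overhead(size):
    """Bytes of framing added to a payload of the given size."""
    return len(encode_chunk(b"\0" * size)) - size

LAST_CHUNK = b"0\r\n\r\n"   # terminates a chunked body that has no trailers

# 4096 bytes is 0x1000, so the size line is "1000\r\n" (6 bytes) and the
# trailing CRLF adds 2 more: 8 bytes, about 0.2% of a 4K payload.
```

The overhead depends only on the number of hex digits in the length, so it grows by one byte each time the chunk size crosses a power of 16.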

Re: [mongrel2] Streaming Data

From:
Zed A. Shaw
Date:
2010-07-14 @ 06:50
On Wed, Jul 14, 2010 at 02:05:16AM +0000, Eric Wong wrote:
> Of course plain old sockets will always be faster than chunking.  But
> HTTP overhead isn't that much with large bodies/chunks.  HTTP is already
> ubiquitous and trying to get client developers to adopt/learn new stuff
> isn't easy.

First off, HTTP is not ubiquitous.  It's not the only major protocol on
the internet, and that is still no reason to use it for everything.  If
you mean "everywhere" as in there's a library in every language, so are
sockets, and bastardizing HTTP to be some kind of lame socket protocol
just because the library is there is backwards.

Also, you realize you're advocating basing a protocol on the HTTP
libraries that are in most languages, which are total crap, and which
are then on top of TCP anyway.


> I don't do much client-side development, but the popular libcurl already
> supports HTTP chunking and I suspect other client libraries do, too,
> especially when it comes to mobile devices.

Nope, they don't, not as HTTP requests with chunked encodings in them.
Hell, the majority of them can't even get mime encoding for file uploads
right.  I mean seriously man, how are they going to get chunked encoding
right?

> Assuming a 4K chunk, I only count 8 bytes of overhead per chunk:
> 
>   "1000\r\n", payload, "\r\n"

Alright, but what's that get you?  What's its purpose again?  So far
you've advocated it for:

1. Constrained RAM:  Nope, every socket library there is can allow me to
use a buffer of even 1 character in size, so that's not accurate.
2. Constrained Disk:  Again, sockets allow for arbitrary sized storage
as the buffer and do not require the entire dataset to be in RAM or on
Disk to use them.
3. To send chunks: A tautology, you need to use "chunked" encoding so
you can send chunks of data, on a socket which can already do that.
4. To avoid additional protocols: So rather than write a clean protocol
for an odd purpose, you would rather stack that protocol on HTTP which
is then on top of sockets?  What's next, SSL inside SOAP inside HTTP
inside SSL inside sockets?
5. For developer simplicity: First, it's a total myth that sockets are
hard. Second, how is X on top of HTTP on top of sockets easier than just
sockets?  All the *exact* same errors from sockets are there, plus any
other layers.
6. To send through port 80: First off, Mongrel2 shows that the port
doesn't matter.  The parser handles two different protocols just fine on
the same port.  Secondly, this is a security hack and solid proof that
this use case is just that, a hack.

Pretty much none of the reasoning stands, and so far your argument, and
that of other people's, is some weird idea that HTTP is simpler for
developers, so let's do everything through HTTP.

This belief that programmers are too stupid to understand basic sockets,
but that they'll understand a protocol written on top of sockets is just
maddening.

> I'm fine with paying an extra few bytes to avoid introducing the
> maintenance overhead of more protocols.

If you are putting chunks of video inside chunked encoding, you have
just invented a new protocol inside another protocol being used in an
odd way.  There is *no* way that's easier to maintain, debug, or operate
with.

The real solution is not, "Coders get HTTP, so put everything inside it no
matter what."  It is actually, "Create your protocol that works best for
you, then write a good library they can just use."  That's basically
what you are really trying to get with this neckbeard feature of
protocols inside HTTP.  If the problem is developer usability, then the
solution is not using something familiar, but to give them a *usable*
way to access your protocol.

For example, this is why in one day someone was able to craft a C++
library for doing mongrel handlers, and someone else helped them.
Because *I* crafted a protocol that was easy to understand, and then
wrote a nice clean simple library for others to work with.

That's the real solution, not this HTTP cargo culting.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Streaming Data

From:
Timothy M Rodriguez
Date:
2010-07-14 @ 13:05
To add an interesting side point: more and more protocols and
functionality have been layered over HTTP.  Firewalls used to be
relatively simple in that you could block most ports except those
necessary, and you'd block a huge swath of attacks.  Now those ports
barely even matter.  Everything is tunneled over HTTP, so we need fancy
application firewalls that use DPI to figure out what the heck is being
tunneled.  Add SSL into the mix, and you can see how this rationale has
gotten us into a bit of a quagmire.

-Tim


On Jul 14, 2010, at 2:50 AM, Zed A. Shaw wrote:

> On Wed, Jul 14, 2010 at 02:05:16AM +0000, Eric Wong wrote:
>> Of course plain old sockets will always be faster than chunking.  But
>> HTTP overhead isn't that much with large bodies/chunks.  HTTP is already
>> ubiquitous and trying to get client developers to adopt/learn new stuff
>> isn't easy.
> 
> First off, HTTP is not ubiquitous.  It's not the only major protocol on
> the internet, and that is still no reason to use it for everything.  If
> you mean "everywhere" as in there's a library in every language, so are
> sockets, and bastardizing HTTP to be some kind of lame socket protocol
> just because the library is there is backwards.
> 
> Also, you realize you're advocating basing a protocol on the HTTP
> libraries that are in most languages, which are total crap, and which
> are then on top of TCP anyway.
> 
> 
>> I don't do much client-side development, but the popular libcurl already
>> supports HTTP chunking and I suspect other client libraries do, too,
>> especially when it comes to mobile devices.
> 
> Nope, they don't, not as HTTP requests with chunked encodings in them.
> Hell, the majority of them can't even get mime encoding for file uploads
> right.  I mean seriously man, how are they going to get chunked encoding
> right?
> 
>> Assuming a 4K chunk, I only count 8 bytes of overhead per chunk:
>> 
>>  "1000\r\n", payload, "\r\n"
> 
> Alright, but what's that get you?  What's its purpose again?  So far
> you've advocated it for:
> 
> 1. Constrained RAM:  Nope, every socket library there is can allow me to
> use a buffer of even 1 character in size, so that's not accurate.
> 2. Constrained Disk:  Again, sockets allow for arbitrary sized storage
> as the buffer and do not require the entire dataset to be in RAM or on
> Disk to use them.
> 3. To send chunks: A tautology, you need to use "chunked" encoding so
> you can send chunks of data, on a socket which can already do that.
> 4. To avoid additional protocols: So rather than write a clean protocol
> for an odd purpose, you would rather stack that protocol on HTTP which
> is then on top of sockets?  What's next, SSL inside SOAP inside HTTP
> inside SSL inside sockets?
> 5. For developer simplicity: First, it's a total myth that sockets are
> hard. Second, how is X on top of HTTP on top of sockets easier than just
> sockets?  All the *exact* same errors from sockets are there, plus any
> other layers.
> 6. To send through port 80: First off, Mongrel2 shows that the port
> doesn't matter.  The parser handles two different protocols just fine on
> the same port.  Secondly, this is a security hack and solid proof that
> this use case is just that, a hack.
> 
> Pretty much none of the reasoning stands, and so far your argument, and
> that of other people, is some weird idea that HTTP is simpler for
> developers, so let's do everything through HTTP.
> 
> This belief that programmers are too stupid to understand basic sockets,
> but that they'll understand a protocol written on top of sockets is just
> maddening.
> 
>> I'm fine with paying an extra few bytes to avoid introducing the
>> maintenance overhead of more protocols.
> 
> If you are putting chunks of video inside chunked encoding, you have
> just invented a new protocol inside another protocol being used in an
> odd way.  There is *no* way that's easier to maintain, debug, or operate
> with.
> 
> The real solution is not, "Coders get HTTP, so put everything inside it no
> matter what."  It is actually, "Create your protocol that works best for
> you, then write a good library they can just use."  That's basically
> what you are really trying to get with this neckbeard feature of
> protocols inside HTTP.  If the problem is developer usability, then the
> solution is not using something familiar, but to give them a *usable*
> way to access your protocol.
> 
> For example, this is why in one day someone was able to craft a C++
> library for doing mongrel handlers, and someone else helped them.
> Because *I* crafted a protocol that was easy to understand, and then
> wrote a nice clean simple library for others to work with.
> 
> That's the real solution, not this HTTP cargo culting.
> 
> -- 
> Zed A. Shaw
> http://zedshaw.com/
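
Eric's 8-byte figure in the quoted exchange above can be checked with a few lines of Python. This is only a sketch of HTTP/1.1 chunk framing (hex size, CRLF, payload, CRLF); the function name encode_chunk is made up for illustration and is not part of Mongrel2 or of any library mentioned in the thread.

```python
def encode_chunk(payload: bytes) -> bytes:
    """Frame one payload as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    return b"%x\r\n" % len(payload) + payload + b"\r\n"


# A 4K payload, matching the chunk size assumed in the thread.
payload = b"x" * 4096
framed = encode_chunk(payload)

# Overhead is everything in the frame that is not payload:
# 4 hex digits ("1000") + CRLF before the data + CRLF after it.
overhead = len(framed) - len(payload)
print(framed[:6])                  # the size line that precedes the data
print(overhead)                    # framing bytes per chunk
print(overhead / len(payload))     # framing bytes as a fraction of payload
```

The count depends on the number of hex digits in the size line, so smaller chunks carry proportionally more framing. It says nothing about the other costs debated in the thread, such as parser complexity or client library support.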

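Point 1 in Zed's list above (a plain socket can be read through a buffer of any size, even one byte) can be sketched as follows. The helper names drain_to_file and demo are hypothetical, and the local socket pair simply stands in for a real client connection; this is not how Mongrel2 itself reads uploads.

```python
import socket
import tempfile


def drain_to_file(conn, out, bufsize=1):
    """Copy everything from conn to out through a fixed-size buffer."""
    total = 0
    while True:
        data = conn.recv(bufsize)   # never holds more than bufsize bytes
        if not data:                # empty read: the peer closed its side
            break
        out.write(data)
        total += len(data)
    return total


def demo(body: bytes, bufsize: int) -> bytes:
    """Send body over a local socket pair and read it back via a temp file."""
    sender, receiver = socket.socketpair()
    with sender, receiver, tempfile.TemporaryFile() as out:
        # The body is kept small here so sendall() fits in the kernel
        # buffer without a concurrent reader; a real upload would be
        # written by a separate peer.
        sender.sendall(body)
        sender.shutdown(socket.SHUT_WR)   # signal end of stream
        drain_to_file(receiver, out, bufsize)
        out.seek(0)
        return out.read()


body = b"abc" * 100
print(demo(body, 1) == body)      # one-byte buffer
print(demo(body, 4096) == body)   # 4K buffer, same result
```

The buffer size only changes how many recv() calls are made; memory use stays bounded by bufsize either way, which is the property the argument relies on.
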
Re: [mongrel2] Streaming Data

From:
Zed A. Shaw
Date:
2010-07-14 @ 17:39
On Wed, Jul 14, 2010 at 09:05:07AM -0400, Timothy M Rodriguez wrote:
> To add an interesting side point.  It's interesting how more and more
> protocols and functionality have been layered over HTTP.  Firewalls
> used to be relatively simple in that you could block most ports except
> those necessary, and you'd block a huge swath of attacks.  Now those
> ports barely even matter.  Everything is tunneled over HTTP, so we
> need fancy application firewalls that use DPI to figure out what the
> heck is being tunneled.  Add SSL into the mix, and you can see how
> this rationale has gotten us into a bit of a quagmire.

True, but I think a counter to that is that you still need a cooperating
server on the other side that understands the layering.  I've found that
it's only companies who want to block traffic going out that have this
problem, or governments.  Traffic coming in still has to be HTTP since it
would need a cooperating server internally to function.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Streaming Data

From:
Timothy M Rodriguez
Date:
2010-07-14 @ 20:02
Good point.  It doesn't matter as much on ingress.


On Jul 14, 2010, at 1:39 PM, Zed A. Shaw wrote:

> On Wed, Jul 14, 2010 at 09:05:07AM -0400, Timothy M Rodriguez wrote:
>> To add an interesting side point.  It's interesting how more and more
>> protocols and functionality have been layered over HTTP.  Firewalls
>> used to be relatively simple in that you could block most ports except
>> those necessary, and you'd block a huge swath of attacks.  Now those
>> ports barely even matter.  Everything is tunneled over HTTP, so we
>> need fancy application firewalls that use DPI to figure out what the
>> heck is being tunneled.  Add SSL into the mix, and you can see how
>> this rationale has gotten us into a bit of a quagmire.
> 
> True, but I think a counter to that is that you still need a cooperating
> server on the other side that understands the layering.  I've found that
> it's only companies who want to block traffic going out that have this
> problem, or governments.  Traffic coming in still has to be HTTP since it
> would need a cooperating server internally to function.
> 
> -- 
> Zed A. Shaw
> http://zedshaw.com/

Re: [mongrel2] Streaming Data

From:
Andrew Cholakian
Date:
2010-07-13 @ 16:32
I was wondering the same thing myself. Since ZMQ messages are atomic, you'd
need to send multiple messages to do streaming (which the backend protocol
doesn't seem to support yet). You could use the raw TCP backend though.

On Tue, Jul 13, 2010 at 8:34 AM, Alexander Kern <alex@kernul.com> wrote:

> How exactly (if at all) will Mongrel2 handle streaming data, such as
> file uploads or large PUT or POST requests? Will the entity body first
> be completely downloaded into a temporary file and the filename sent
> to the handler, or will the handler be able to incrementally read the
> data as it comes in? One of the major points at which Node.js excels
> is its ability to efficiently handle streaming data, but Mongrel2's
> architecture seems to be focused more on synchronous applications.
>



-- 
Andrew Cholakian
http://www.andrewvc.com
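
Andrew's observation that atomic messages force a stream to be split into several messages can be sketched as below. The frame layout (upload id, sequence number, last-frame flag, data) and the names split_into_messages and reassemble are assumptions made up for illustration. The thread itself says the backend protocol did not support this at the time, and no ZMQ calls are used here.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical frame: (upload_id, sequence, is_last, data)
Frame = Tuple[str, int, bool, bytes]


def split_into_messages(upload_id: str, body: bytes, max_size: int) -> Iterator[Frame]:
    """Cut one body into self-contained messages of at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    # Ceiling division; an empty body still yields one (empty) final frame.
    count = max(1, -(-len(body) // max_size))
    for seq in range(count):
        piece = body[seq * max_size:(seq + 1) * max_size]
        yield (upload_id, seq, seq == count - 1, piece)


def reassemble(frames: Iterable[Frame]) -> bytes:
    """Rebuild the body on the receiving side, checking for gaps."""
    ordered = sorted(frames, key=lambda f: f[1])   # defensive ordering
    if not ordered or not ordered[-1][2]:
        raise ValueError("final frame missing")
    for expected, frame in enumerate(ordered):
        if frame[1] != expected:
            raise ValueError("gap in sequence at %d" % expected)
    return b"".join(frame[3] for frame in ordered)


frames = list(split_into_messages("upload-1", b"abcdefghij", 4))
print(len(frames))
print(reassemble(frames))
```

Each frame would travel as one atomic message, so the receiver needs the sequence number and the last-frame flag to know when the body is complete.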