Re: [mongrel2] Handling large requests across machines
From: Matt Towers
Date: 2011-08-15 @ 16:16
Is there any built-in support for setting a max size for file uploads or
is the preferred method to use the callback mechanism?
✈ Matt
On Aug 12, 2011, at 09:41 , Zed A. Shaw wrote:
> On Sun, Aug 07, 2011 at 12:31:01PM -0400, Jim Fulton wrote:
>> The manual describes handling large requests by having the server
>> create temporary files. That won't work if the handler is on a
>> different machine, at least not without some sort of network file
>> system.
>
> You'll have this problem no matter how you do it. The file has to be
> put somewhere, and you don't want to stream huge files into RAM, instead
> you want it to go to disk. But, if it goes to disk then it has to be
> where the server is.
>
>> I suppose I could implement a local handler
>> that forwards to other handlers, breaking the original
>> message into multiple messages.
>
> This is what I do, and when I have large upload volumes I use a
> *separate* upload server that handles just this. With zeromq it's
> pretty easy to make this work:
>
> 1. Upload goes to upload.mysite.com, mongrel2 streams to a diskstore.
> 2. Handler on upload.mysite.com handles the nitty gritty of receiving
> the uploaded file, checking it, altering it, authenticating, etc.
> 3. Handler on upload.mysite.com then shoves it into a permanent place,
> like S3 or another server, and then notifies anyone who cares that
> there's a new file.
> 4. Another handler on a different server then can complete the HTTP
> request after it sees the upload is done.
>
> There's quite a few ways to slice it, but if you don't want to stream
> into RAM, then you have to make a handler local to the upload machine.
>
> --
> Zed A. Shaw
> http://zedshaw.com/
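
Steps 2 and 3 of the quoted list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the upload has already been written to a temp file on the upload machine, a local directory stands in for "S3 or another server", a SHA-256 comparison stands in for "checking it", and `notify` is a hypothetical callback. None of these names come from Mongrel2.

```python
import hashlib
import os
import shutil
from typing import Callable, Optional


def finalize_upload(temp_path: str,
                    store_dir: str,
                    notify: Callable[[str], None],
                    expected_sha256: Optional[str] = None) -> str:
    """Check a spooled upload, move it to permanent storage, then notify."""
    digest = hashlib.sha256()
    with open(temp_path, "rb") as spooled:
        # Read in fixed-size blocks so the whole file never sits in RAM.
        for block in iter(lambda: spooled.read(64 * 1024), b""):
            digest.update(block)
    hexdigest = digest.hexdigest()

    # Step 2: "checking it" (here: an optional checksum comparison).
    if expected_sha256 is not None and hexdigest != expected_sha256:
        os.remove(temp_path)
        raise ValueError("checksum mismatch")

    # Step 3: move to the permanent place and tell whoever cares.
    final_path = os.path.join(store_dir, hexdigest)
    shutil.move(temp_path, final_path)
    notify(final_path)
    return final_path
```

Step 4 would then be whatever handler receives the notification and completes the HTTP request.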
Re: [mongrel2] Handling large requests across machines
From: Zed A. Shaw
Date: 2011-08-15 @ 17:40
On Mon, Aug 15, 2011 at 09:16:22AM -0700, Matt Towers wrote:
> Is there any built-in support for setting a max size for file uploads
> or is the preferred method to use the callback mechanism?
I didn't add any, but it could be added, with the caveat that
chunked-encoding will be a little odd.
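
To illustrate why chunked encoding is the odd case, here is a rough sketch of a size check a handler might make. It assumes the handler already has the request headers as a dictionary. `MAX_UPLOAD_BYTES` and both function names are made up for this example and are not Mongrel2 settings or APIs.

```python
from typing import Mapping, Optional

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # hypothetical limit, not a Mongrel2 setting


def declared_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the body size the client declared, or None if it declared none."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "chunked" in lowered.get("transfer-encoding", "").lower():
        return None  # a chunked body carries no total length up front
    value = lowered.get("content-length")
    if value is not None and value.strip().isdigit():
        return int(value)
    return None


def upload_decision(headers: Mapping[str, str],
                    max_bytes: int = MAX_UPLOAD_BYTES) -> str:
    """Return 'accept', 'reject', or 'count' for an incoming upload."""
    length = declared_length(headers)
    if length is None:
        # No usable Content-Length: the only option is to count bytes
        # as they arrive and cut the upload off once the limit is passed.
        return "count"
    return "reject" if length > max_bytes else "accept"
```

With a Content-Length header the decision can be made before any body bytes are read. With chunked encoding it can only be made part-way through the upload.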
--
Zed A. Shaw
http://zedshaw.com/
Re: [mongrel2] Handling large requests across machines
From: michael j pan
Date: 2011-08-12 @ 17:49
> You'll have this problem no matter how you do it. The file has to be
> put somewhere, and you don't want to stream huge files into RAM, instead
> you want it to go to disk. But, if it goes to disk then it has to be
> where the server is.
I'm just a newbie that started looking into mongrel2 today, but could
mongrel2 just push the bytes into a zmq socket, and let any
subscribers handle it (including one for writing to disk)?
Mike
Re: [mongrel2] Handling large requests across machines
From: Zed A. Shaw
Date: 2011-08-13 @ 16:31
On Sat, Aug 13, 2011 at 01:49:24AM +0800, michael j pan wrote:
> > You'll have this problem no matter how you do it. The file has to be
> > put somewhere, and you don't want to stream huge files into RAM, instead
> > you want it to go to disk. But, if it goes to disk then it has to be
> > where the server is.
>
> I'm just a newbie that started looking into mongrel2 today, but could
> mongrel2 just push the bytes into a zmq socket, and let any
> subscribers handle it (including one for writing to disk)?
It does this already, and you can pump the threshold for a request size
up with a setting, but we're talking files that are really huge that you
don't want to store in RAM. Like imagine a video site where uploads
average 300M. You'd need a ton of RAM just to handle even a few
simultaneous uploads. Instead, mongrel2 lets you do a slightly more
complicated streaming scheme that helps you avoid getting files until
they're really worth handling.
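
The RAM concern can be made concrete with a small sketch. It is not Mongrel2's implementation, only an illustration of the general technique: write each chunk to a temp file as it arrives, so that memory use is bounded by the chunk size rather than the upload size. Both function names are invented for this example.

```python
import os
import tempfile
from typing import Iterable, Tuple


def ram_needed_mb(upload_mb: int, concurrent: int) -> int:
    """RAM needed if every in-flight upload is held fully in memory."""
    return upload_mb * concurrent


def spool_to_disk(chunks: Iterable[bytes], directory: str) -> Tuple[str, int]:
    """Write chunks to a temp file as they arrive; return (path, total bytes)."""
    total = 0
    fd, path = tempfile.mkstemp(dir=directory, prefix="upload-")
    with os.fdopen(fd, "wb") as out:
        for chunk in chunks:
            out.write(chunk)      # only one chunk is held in memory at a time
            total += len(chunk)
    return path, total
```

For scale, `ram_needed_mb(300, 10)` is 3000: ten simultaneous 300M uploads held in memory would need about 3G, while spooling keeps each upload's footprint at roughly one chunk.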
--
Zed A. Shaw
http://zedshaw.com/
Re: [mongrel2] Handling large requests across machines
From: Jim Fulton
Date: 2011-08-12 @ 17:38
On Fri, Aug 12, 2011 at 12:41 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> On Sun, Aug 07, 2011 at 12:31:01PM -0400, Jim Fulton wrote:
>> The manual describes handling large requests by having the server
>> create temporary files. That won't work if the handler is on a
>> different machine, at least not without some sort of network file
>> system.
>
> You'll have this problem no matter how you do it. The file has to be
> put somewhere, and you don't want to stream huge files into RAM, instead
> you want it to go to disk. But, if it goes to disk then it has to be
> where the server is.
>
>> I suppose I could implement a local handler
>> that forwards to other handlers, breaking the original
>> message into multiple messages.
>
> This is what I do, and when I have large upload volumes I use a
> *separate* upload server that handles just this. With zeromq it's
> pretty easy to make this work:
>
> 1. Upload goes to upload.mysite.com, mongrel2 streams to a diskstore.
> 2. Handler on upload.mysite.com handles the nitty gritty of receiving
> the uploaded file, checking it, altering it, authenticating, etc.
> 3. Handler on upload.mysite.com then shoves it into a permanent place,
> like S3 or another server, and then notifies anyone who cares that
> there's a new file.
> 4. Another handler on a different server then can complete the HTTP
> request after it sees the upload is done.
That's reasonable, although I'd prefer not to have to special case
these sorts of requests.
For my apps, if I took this approach, I'd end up writing blobs into
databases. This would end up involving streaming the data to the
database server and then to the apps that need it. (The database
network protocol supports streaming blob data.) So I'd end up
streaming data across machines anyway.
> There's quite a few ways to slice it, but if you don't want to stream
> into RAM, then you have to make a handler local to the upload machine.
If Mongrel2 split large requests into multiple (0mq) frames and used
0mq flow control, then it could take data off of the HTTP port only as
fast as it could send data to 0mq. This would allow streaming of
large inputs across machines without using up lots of RAM. This would
require handlers to set the 0mq high-water mark on their PULL sockets
(and Mongrel2 to set it on its push socket) and, of course, it would
require a change to the Mongrel2 handler protocol.
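
The flow-control idea can be sketched without any networking. The example below is an analogy in plain Python, not 0mq code and not the Mongrel2 handler protocol: a bounded `queue.Queue` stands in for a socket with a high-water mark, the reading loop stands in for taking data off the HTTP port, and `HWM` is a hypothetical value counted in chunks.

```python
import queue
import threading
from typing import Iterable, List

HWM = 4  # hypothetical high-water mark, counted in chunks


def stream_with_backpressure(source: Iterable[bytes], hwm: int = HWM) -> bytes:
    """Forward chunks through a bounded queue and return what the consumer got."""
    pipe = queue.Queue(maxsize=hwm)  # at most `hwm` chunks are ever in flight
    received: List[bytes] = []

    def consumer() -> None:
        while True:
            chunk = pipe.get()
            if chunk is None:        # sentinel: the upload is complete
                return
            received.append(chunk)

    worker = threading.Thread(target=consumer)
    worker.start()
    for chunk in source:
        # put() blocks while the queue is full, so the source is read
        # only as fast as the consumer drains it.
        pipe.put(chunk)
    pipe.put(None)
    worker.join()
    return b"".join(received)
```

However large the input, the amount buffered between producer and consumer never exceeds `hwm` chunks, which is the property the proposal relies on.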
I'm still evaluating Mongrel2. If I use it, I'll implement a broker
(for dynamic work allocation based on request characteristics, worker
availability and load), which could run on the same machine as the
Mongrel2 machine and stream large requests to workers using a tweaked
handler protocol.
Jim
--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Re: [mongrel2] Handling large requests across machines
From: Zed A. Shaw
Date: 2011-08-13 @ 16:28
On Fri, Aug 12, 2011 at 01:38:35PM -0400, Jim Fulton wrote:
> I'm still evaluating Mongrel2. If I use it, I'll implement a broker
> (for dynamic work allocation based on request characteristics, worker
> availability and load), which could run on the same machine as the
> Mongrel2 machine and stream large requests to workers using a tweaked
> handler protocol.
I'd be down with that, and do you mean something that answers the upload
requests and does the chunked streaming out? That's pretty easy to
implement in just about any scripting language.
--
Zed A. Shaw
http://zedshaw.com/
Re: [mongrel2] Handling large requests across machines
From: Jim Fulton
Date: 2011-08-13 @ 17:11
On Sat, Aug 13, 2011 at 12:28 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> On Fri, Aug 12, 2011 at 01:38:35PM -0400, Jim Fulton wrote:
>> I'm still evaluating Mongrel2. If I use it, I'll implement a broker
>> (for dynamic work allocation based on request characteristics, worker
>> availability and load), which could run on the same machine as the
>> Mongrel2 machine and stream large requests to workers using a tweaked
>> handler protocol.
>
> I'd be down with that, and do you mean something that answers the upload
> requests and does the chunked streaming out?
Yup.
> That's pretty easy to
> implement in just about any scripting language.
Yup.
Jim
--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Re: [mongrel2] Handling large requests across machines
From: Jim Fulton
Date: 2011-08-12 @ 18:59
On Fri, Aug 12, 2011 at 1:38 PM, Jim Fulton <jim@zope.com> wrote:
> On Fri, Aug 12, 2011 at 12:41 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
>> On Sun, Aug 07, 2011 at 12:31:01PM -0400, Jim Fulton wrote:
...
>> There's quite a few ways to slice it, but if you don't want to stream
>> into RAM, then you have to make a handler local to the upload machine.
>
> If Mongrel2 split large requests into multiple (0mq) frames and used
> 0mq flow control, then it could take data off of the HTTP port only as
> fast as it could send data to 0mq. This would allow streaming of
> large inputs across machines without using up lots of RAM. This would
> require handlers to set the 0mq high-water mark on their PULL sockets
> (and Mongrel2 to set it on its push socket) and, of course, it would
> require a change to the Mongrel2 handler protocol.
Thinking about this some more, this wouldn't work with Mongrel2's
push model, as there's no way to route related frames to the right
handler.
Jim
--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Re: [mongrel2] Handling large requests across machines
From: Zed A. Shaw
Date: 2011-08-13 @ 16:29
On Fri, Aug 12, 2011 at 02:59:05PM -0400, Jim Fulton wrote:
> > If Mongrel2 split large requests into multiple (0mq) frames and used
> > 0mq flow control, then it could take data off of the HTTP port only as
> > fast as it could send data to 0mq. This would allow streaming of
> > large inputs across machines without using up lots of RAM. This would
> > require handlers to set the 0mq high-water mark on their PULL sockets
> > (and Mongrel2 to set it on its push socket) and, of course, it would
> > require a change to the Mongrel2 handler protocol.
>
> Thinking about this some more, this wouldn't work with Mongrel2's
> push model, as there's no way to route related frames to the right
> handler.
You'd definitely have to use an XREP socket and target specific
receivers. It's going to have all sorts of other issues too, like if
the connection is closed, then you just wasted a ton of time spewing
data at your handlers for no reason.
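
The routing requirement can be sketched as a small bookkeeping table. This is plain Python rather than 0mq code, and the class and method names are invented for the example: it models only the decision an addressed socket would need, namely that every frame of one upload goes to the same receiver, and that the mapping is dropped when the connection closes.

```python
from typing import Dict, List, Optional


class StickyRouter:
    """Pin each connection to one handler for the lifetime of its upload."""

    def __init__(self, handlers: List[str]) -> None:
        if not handlers:
            raise ValueError("need at least one handler")
        self._handlers = list(handlers)
        self._next = 0
        self._assigned: Dict[str, str] = {}

    def route(self, conn_id: str) -> str:
        """Return the handler for this connection, assigning one round-robin
        on the first frame and reusing it for every later frame."""
        if conn_id not in self._assigned:
            handler = self._handlers[self._next % len(self._handlers)]
            self._assigned[conn_id] = handler
            self._next += 1
        return self._assigned[conn_id]

    def close(self, conn_id: str) -> Optional[str]:
        """Forget a closed connection; return the handler that should be
        told to discard the partial upload, if there was one."""
        return self._assigned.pop(conn_id, None)
```

The `close` method marks where the wasted-work problem shows up: by the time a connection drops, the chosen handler may already have received most of the file.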
--
Zed A. Shaw
http://zedshaw.com/