I'm thinking of using mongrel2 for an embedded application, mostly because the handler format is very attractive and easy to support. However, something I need to do is stream large (many gigabytes) amounts of generated data; considerably more data than my device has RAM. Currently, there appears to be no form of flow control available, so when handling a large request where bandwidth is limited on the other side, the mongrel2 process's memory usage bloats up immensely until it is killed. The only way I could see to fix this within the current 0mq handler scheme would be to add flow control messages, similar to the current JSON disconnect message. Preferably these would report the server's buffer size for the connection once it's past a certain threshold, so the handler can decide to choke off output until the other side catches up. Are there any plans for something like this in the works, or any better idea on how to fix this problem? (Obviously I could proxy the HTTP or otherwise handle the socket directly, but I'm trying to avoid that.) -bcd
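The flow-control message proposed above does not exist in mongrel2; it is a hypothetical extension. Assuming mongrel2 did report the number of bytes buffered for a connection, a handler could gate its output with two watermarks so it doesn't flap on and off around a single threshold. A minimal Python sketch (the `BufferGate` name and the thresholds are illustrative, not part of mongrel2):

```python
from dataclasses import dataclass


@dataclass
class BufferGate:
    """Decide whether a handler may keep sending, based on a HYPOTHETICAL
    per-connection buffer-size report from the server.

    Hysteresis: pause once the reported buffer reaches `high` bytes, and
    resume only after it drains to `low` bytes or fewer.
    """
    high: int
    low: int
    paused: bool = False

    def __post_init__(self) -> None:
        if not 0 <= self.low < self.high:
            raise ValueError("need 0 <= low < high")

    def update(self, buffered_bytes: int) -> bool:
        """Feed the latest buffer report; return True if sending is allowed."""
        if self.paused and buffered_bytes <= self.low:
            self.paused = False
        elif not self.paused and buffered_bytes >= self.high:
            self.paused = True
        return not self.paused
```

With `high=8 MiB, low=1 MiB`, a report of 9 MB pauses the handler, a later report of 4 MB keeps it paused, and only a report at or below 1 MiB lets it resume.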
On Tue, Aug 09, 2011 at 01:12:02AM -0500, Brian Downing wrote: > I'm thinking of using mongrel2 for an embedded application, mostly > because the handler format is very attractive and easy to support. Very cool. > However, something I need to do is stream large (many gigabytes) > amounts of generated data; considerably more data than my device has RAM. > Currently, there appears to be no form of flow control available, so when > handling a large request where bandwidth is limited on the other side, the > mongrel2 process's memory usage bloats up immensely until it is killed. Do you need to stream large gigabytes TO the mongrel2 server or FROM? In other words, are you trying to abuse HTTP to create a bidirectional chat protocol that streams out gigabytes of data perpetually in both directions? Or, do you want to have someone connect and then you stream out huge amounts of data? OR, do you want to have someone send you huge amounts of data streamed piece by piece? -- Zed A. Shaw http://zedshaw.com/
On Fri, Aug 12, 2011 at 09:45:51AM -0700, Zed A. Shaw wrote: > > However, something I need to do is stream large (many gigabytes) > > amounts of generated data; considerably more data than my device has RAM. > > Currently, there appears to be no form of flow control available, so when > > handling a large request where bandwidth is limited on the other side, the > > mongrel2 process's memory usage bloats up immensely until it is killed. > > Do you need to stream large gigabytes TO the mongrel2 server or FROM? > > In other words, are you trying to abuse HTTP to create a bidirectional > chat protocol that streams out gigabytes of data perpetually in both > directions? Nope. > Or, do you want to have someone connect and then you stream out huge > amounts of data? I want to stream FROM mongrel2. Typically this will be a simple GET request, so there's no incoming body from the client at all. However, I will be generating the response to send to the client on the fly, and I will be able to do that much faster than a WiFi network or most Internet connections can transmit. I want to be able to send at full available bandwidth to the client without filling memory. With current mongrel2, the handler generating the data has no way of knowing the socket to the HTTP client is full and just keeps happily spewing out to the PUB socket, and the mongrel2 process's outgoing io buffer bloats up until it is oomkilled. If I were handling the socket myself it is obviously trivial to cease filling the write buffer when socket writes start returning EAGAIN. This is one scenario. In another I am streaming data, again FROM mongrel2, to more than one connected client at once. Unlike the above scenario, the data are being generated live. In most cases they will be generated slower than the bandwidth available to the client, but if this is not the case and a client gets too far behind I want to close the connection to that client rather than have it get data that are too out-of-date. 
(This is not a real-time application; Internet latencies are fine, but I don't want to keep getting further and further behind if the client can't keep up.) > OR, do you want to have someone send you huge amounts of data streamed > piece by piece? Nope, nothing like that. -bcd
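To make the EAGAIN point concrete: if a handler owned the client socket, it could generate data only as fast as the kernel accepts it. A minimal Python sketch, assuming the socket is already in non-blocking mode; in Python, EAGAIN/EWOULDBLOCK on such a socket surfaces as `BlockingIOError`. The `pump` and `produce` names are illustrative:

```python
import socket
from typing import Callable


def pump(sock: socket.socket, produce: Callable[[], bytes], pending: bytes = b"") -> bytes:
    """Write to a NON-BLOCKING socket, generating data only as fast as the
    kernel accepts it.

    `produce()` returns the next chunk, or b"" when the stream is finished.
    Returns the bytes not yet written (b"" when done); call again with that
    remainder once the socket becomes writable.
    """
    while True:
        if not pending:
            pending = produce()
            if not pending:
                return b""  # producer finished and everything was written
        try:
            sent = sock.send(pending)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: the send buffer is full, so stop producing.
            return pending
        pending = pending[sent:]  # keep any unsent tail of a partial write
```

In a real loop you would wait for writability, for example with `select.select([], [sock], [])` or the `selectors` module, before calling `pump` again with the returned remainder. Memory use stays bounded by one chunk plus the kernel buffer, which is exactly what a mongrel2 handler cannot do today because it never sees the client socket.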
On Fri, Aug 12, 2011 at 01:05:15PM -0500, Brian Downing wrote: > I want to stream FROM mongrel2. Typically this will be a simple GET > request, so there's no incoming body from the client at all. Ok, so it's easy to do with a handler, but... > With current mongrel2, the handler generating the data has no way of > knowing the socket to the HTTP client is full and just keeps happily > spewing out to the PUB socket, and the mongrel2 process's outgoing io > buffer bloats up until it is oomkilled. If I were handling the socket > myself it is obviously trivial to cease filling the write buffer when > socket writes start returning EAGAIN. Yeah, that's a problem, because you aren't really in control of the socket directly so you have very little idea what's going on. If I were to tackle this I'd be looking at the control port's status information (which you can work with through a simple JSON protocol). It keeps track of average bytes transferred and how long the connection's been active. You could use that to throttle your handler, but it'd get complicated. To be honest, it sounds to me like for this application Mongrel2 might not work for you. You *really* want to have full control of the socket in this case and without direct access to the socket you'd be screwed. You'll always be a 2nd class citizen. If you can think of some data you could use to make this work I'd be happy to entertain adding it, or work up something that does it off the control port. One final thing to try is just setting the client's timeout such that if it can't keep up with your transfer rate requirements then mongrel2 will kill it. Look at the limits.min_ping, limits.min_read, limits.min_write settings in "Tweakable Expert Settings": http://mongrel2.org/static/mongrel2-manual.html#x1-400003.10 I think if you set those to something reasonable, and set the limits.tick_timer low enough, then Mongrel2 will monitor those connections for you and throw them out. 
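One way to "throttle your handler" from the control port's average-bytes and connection-age figures is to estimate each client's throughput and pace output with a token bucket. A minimal Python sketch; the inputs `bytes_sent` and `seconds_active` stand in for whatever the control port actually reports, and both those names and how you would extract them from the status output are assumptions:

```python
import time
from typing import Callable


def estimate_rate(bytes_sent: int, seconds_active: float, floor: float = 1024.0) -> float:
    """Average throughput in bytes/second from HYPOTHETICAL cumulative stats.
    `floor` avoids stalling completely on a brand-new or idle connection."""
    if seconds_active <= 0:
        return floor
    return max(floor, bytes_sent / seconds_active)


class Pacer:
    """Token bucket: on average at most `rate` bytes/second,
    with bursts of up to `burst` bytes."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.clock = clock
        self.last = clock()

    def try_consume(self, n: int) -> bool:
        """Return True (and spend tokens) if `n` bytes may be sent now.
        Note: a chunk larger than `burst` can never pass."""
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

Each time fresh stats arrive, the handler could set `pacer.rate = estimate_rate(...)`. Pacing at exactly the observed average can never discover that more bandwidth has become available, so one option is to pace at a modest multiple of the estimate and accept a bounded amount of buffering in mongrel2. This is the "it'd get complicated" part: the estimate lags behind the real link.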
If you want better control than that, then hit me up with ideas. > This is one scenario. In another I am streaming data, again FROM > mongrel2, to more than one connected client at once. Unlike the above > scenario, the data are being generated live. In most cases they will > be generated slower than the bandwidth available to the client, but if > this is not the case and a client gets too far behind I want to close > the connection to that client rather than have it get data that are too > out-of-date. (This is not a real-time application; Internet latencies > are fine, but I don't want to keep getting further and further behind > if the client can't keep up.) This one shouldn't be hard. Assuming you know that the client is going too slow, you just send it a close (which is a 0-length message to that client). Also, you know you can send one message to target up to 128 clients at a time, right? That includes closing them, so you can easily handle large amounts of streaming. Let me know if you want to play with the control port idea as well. Basically, fire up "m2sh control" and try this:

    status what=net

If you have a few connections going at the time, then you can see what data it tracks. This control port is fully accessible from a programming language, since it's just a simple tnetstring protocol. With that data you could possibly make a little thing that watches the connections and signals your handlers to back off when it sees they're getting overloaded. -- Zed A. Shaw http://zedshaw.com/
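Both suggestions, closing a slow client with a zero-length message and addressing up to 128 connections per message, come down to how handler replies are framed. This Python sketch assumes the reply layout used by mongrel2's Python handler library, `UUID SIZE:ID ID ID, BODY` (a netstring of space-separated connection IDs, then the body); the 128 limit is taken from this thread, and the helper names are illustrative, not a mongrel2 API. It also includes a minimal tnetstring encoder for talking to the control port. Encoding `status what=net` as the list `["status", {"what": "net"}]` is an assumption about what `m2sh control` sends, so check it against the mongrel2 source first.

```python
from typing import Iterable, List

MAX_TARGETS = 128  # per-message target limit as stated in the thread


def frame_response(sender_uuid: str, conn_ids: Iterable[int], body: bytes) -> bytes:
    """One handler->mongrel2 reply: UUID SIZE:ID ID ID, BODY"""
    ids = " ".join(str(c) for c in conn_ids).encode()
    return b"%s %d:%s, %s" % (sender_uuid.encode(), len(ids), ids, body)


def frame_batches(sender_uuid: str, conn_ids: Iterable[int], body: bytes) -> List[bytes]:
    """Split a broadcast into replies of at most MAX_TARGETS connection IDs."""
    ids = list(conn_ids)
    return [frame_response(sender_uuid, ids[i:i + MAX_TARGETS], body)
            for i in range(0, len(ids), MAX_TARGETS)]


def frame_close(sender_uuid: str, conn_ids: Iterable[int]) -> List[bytes]:
    """A zero-length body asks mongrel2 to close those connections."""
    return frame_batches(sender_uuid, conn_ids, b"")


def tnet_encode(value) -> bytes:
    """Minimal tnetstring encoder: LENGTH ':' PAYLOAD TYPE-TAG."""
    if isinstance(value, bool):  # must precede int: bool is an int subclass
        payload, tag = (b"true" if value else b"false"), b"!"
    elif isinstance(value, int):
        payload, tag = str(value).encode(), b"#"
    elif isinstance(value, str):
        payload, tag = value.encode(), b","
    elif isinstance(value, bytes):
        payload, tag = value, b","
    elif isinstance(value, list):
        payload, tag = b"".join(tnet_encode(v) for v in value), b"]"
    elif isinstance(value, dict):
        payload = b"".join(tnet_encode(k) + tnet_encode(v) for k, v in value.items())
        tag = b"}"
    elif value is None:
        payload, tag = b"", b"~"
    else:
        raise TypeError(f"cannot encode {type(value).__name__}")
    return b"%d:%s%s" % (len(payload), payload, tag)
```

For the live-streaming scenario, a handler that tracks how far behind each client is could collect the laggards' IDs and publish `frame_close(...)` for them in batches of 128.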
> I want to stream FROM mongrel2. Typically this will be a simple GET > request, so there's no incoming body from the client at all. However, > I will be generating the response to send to the client on the fly, > and I will be able to do that much faster than a WiFi network or most > Internet connections can transmit. I want to be able to send at full > available bandwidth to the client without filling memory. We have a similar requirement and here's how we solve it. Our current implementation is not using mongrel2, but mongrel2 would only be the mediator.

- client requests data
- mediator forwards the request to the handler
- the handler replies to the mediator with the socket info to which it will push the data (e.g. a zmq endpoint tcp://1.1.1.1:5000)
- the client connects to that endpoint using zmq sub
- the handler binds to that endpoint, and publishes all the data there

HTH Mike
On Sat, Aug 13, 2011 at 02:18:12AM +0800, michael j pan wrote: > We have a similar requirement and here's how we solve it. Our current > implementation is not using mongrel2, but mongrel2 would only be the > mediator. > > - client requests data > - mediator forwards the request to the handler > - the handler replies to the mediator with the socket info to which it will > push the data (e.g. a zmq endpoint tcp://1.1.1.1:5000) > - the client connects to that endpoint using zmq sub > - the handler binds to that endpoint, and publishes all the data there Just to check to see if I am understanding this correctly - the data never goes out over HTTP, and instead the client makes a second zmq connection directly to the handler? Unfortunately in my situation having it go out as part of the HTTP connection is a hard requirement - it should look like a normal GET request to the client, which will typically be a web browser. -bcd
On Sat, Aug 13, 2011 at 02:25, Brian Downing <bdowning@lavos.net> wrote: > On Sat, Aug 13, 2011 at 02:18:12AM +0800, michael j pan wrote: >> We have a similar requirement and here's how we solve it. Our current >> implementation is not using mongrel2, but mongrel2 would only be the >> mediator. >> >> - client requests data >> - mediator forwards the request to the handler >> - the handler replies to the mediator with the socket info to which it will >> push the data (e.g. a zmq endpoint tcp://1.1.1.1:5000) >> - the client connects to that endpoint using zmq sub >> - the handler binds to that endpoint, and publishes all the data there > > Just to check to see if I am understanding this correctly - the data never > goes out over HTTP, and instead the client makes a second zmq connection > directly to the handler? Unfortunately in my situation having it go out > as part of the HTTP connection is a hard requirement - it should look like > a normal GET request to the client, which will typically be a web browser. > Yup, you understand our scenario correctly. In your case though, you could have the mediator tell the client(s) an HTTP endpoint (as opposed to a ZMQ one). The client would then make an HTTP GET request to that endpoint. Though in this case, one might ask what's the point of mongrel2 for your use case... Mike
On Tue, Aug 9, 2011 at 2:12 AM, Brian Downing <bdowning@lavos.net> wrote: > I'm thinking of using mongrel2 for an embedded application, mostly > because the handler format is very attractive and easy to support. > However, something I need to do is stream large (many gigabytes) > amounts of generated data; considerably more data than my device has RAM. > Currently, there appears to be no form of flow control available, so when > handling a large request where bandwidth is limited on the other side, the > mongrel2 process's memory usage bloats up immensely until it is killed. > > The only way I could see to fix this within the current 0mq handler > scheme would be to add flow control messages, similar to the current > JSON disconnect message. Preferably these would report the server's > buffer size for the connection once it's past a certain threshold, so the > handler can decide to choke off output until the other side catches up. > Are there any plans for something like this in the works, or any better > idea on how to fix this problem? (Obviously I could proxy the HTTP or > otherwise handle the socket directly, but I'm trying to avoid that.) If mongrel2 used push-pull sockets rather than pub-sub sockets for getting responses from handlers, then 0mq's built-in flow control could be used. The mongrel2 documentation threatens to make the socket types configurable in a later version. I suspect the simplest way to fix this would be to allow push-pull sockets to be used for returning responses and to allow configuration of their high-water marks. This would be far less intrusive at the application level than some sort of application-level flow control. (I'm assuming that mongrel2 only pulls data off its incoming sockets as fast as it could send it to HTTP clients.) Then again, I'm new to both mongrel2 and 0mq, so I may not know what I'm talking about. :) Jim -- Jim Fulton http://www.linkedin.com/in/jimfulton
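The difference between the two socket types matters here: a 0MQ PUB socket drops messages for a subscriber once its high-water mark is reached, whereas a PUSH socket blocks the sender. In pyzmq the knob is the `SNDHWM` socket option (0MQ 2.x had a single `HWM` option). The sketch below is only an analogy built on Python's standard library, not 0MQ: a bounded queue stands in for a PUSH socket with a high-water mark, showing how a blocking `put()` holds a fast producer to the consumer's pace. The end-to-end effect for mongrel2 still depends on the assumption stated above, that mongrel2 stops pulling when the HTTP client backs up.

```python
import queue
import threading
from typing import Callable, Iterable


def stream_with_hwm(chunks: Iterable[bytes], hwm: int,
                    deliver: Callable[[bytes], None]) -> int:
    """Producer/consumer over a bounded queue of at most `hwm` chunks.

    put() blocks while the queue is full, so a fast producer is held back
    to the consumer's pace instead of growing memory without bound.
    Returns the deepest queue depth the producer observed.
    """
    q = queue.Queue(maxsize=hwm)
    done = object()  # sentinel marking the end of the stream
    peak = 0

    def producer() -> None:
        nonlocal peak
        for chunk in chunks:
            q.put(chunk)  # blocks at the "high-water mark"
            peak = max(peak, q.qsize())
        q.put(done)

    t = threading.Thread(target=producer)
    t.start()
    while (item := q.get()) is not done:
        deliver(item)  # a slow consumer here throttles the producer
    t.join()
    return peak
```

However fast the generator runs, at most `hwm` chunks are ever buffered, which is the property the original question needs from the handler-to-mongrel2 link.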
On 01:12 Tue 09 Aug, Brian Downing wrote: > The only way I could see to fix this within the current 0mq handler > scheme would be to add flow control messages, similar to the current > JSON disconnect message. Preferably these would report the server's > buffer size for the connection once it's past a certain threshold, so the > handler can decide to choke off output until the other side catches up. > Are there any plans for something like this in the works, or any better > idea on how to fix this problem? (Obviously I could proxy the HTTP or > otherwise handle the socket directly, but I'm trying to avoid that.) > > -bcd > The simplest implementation would be for mongrel2 to just send a message when the data has gone out. Then you can control the amount of outstanding data by: 1) The size of messages you send 2) The number of outstanding messages at a time Something like that ought to solve your problem. I think whether or not mongrel sends these messages ought to be an option rather than just sending them to every handler. Thoughts? -Jason
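The "message when the data has gone out" proposal amounts to a credit window on the handler side: bound both the number and the total size of unacknowledged messages per connection. A minimal Python sketch, assuming a hypothetical mongrel2 option that emits one ack per handler message, in send order, after that message's bytes have been written to the client (no such ack exists in mongrel2; `AckWindow` is an illustrative name):

```python
from collections import deque


class AckWindow:
    """Bound the data a handler has in flight toward one connection.

    Relies on HYPOTHETICAL per-message acks arriving in send order.
    """

    def __init__(self, max_msgs: int, max_bytes: int) -> None:
        self.max_msgs = max_msgs
        self.max_bytes = max_bytes
        self.sizes: deque = deque()  # sizes of unacked messages, oldest first
        self.outstanding = 0         # total unacked bytes

    def can_send(self, size: int) -> bool:
        return (len(self.sizes) < self.max_msgs
                and self.outstanding + size <= self.max_bytes)

    def sent(self, size: int) -> None:
        if not self.can_send(size):
            raise RuntimeError("window full; wait for an ack")
        self.sizes.append(size)
        self.outstanding += size

    def acked(self) -> None:
        """Record one ack, releasing the oldest outstanding message."""
        self.outstanding -= self.sizes.popleft()
```

With `max_msgs=2, max_bytes=10_000`, two 4096-byte sends fill the message slots; after one ack, 4096 bytes remain outstanding, so a 5904-byte message fits but a 5905-byte one does not. This only works if mongrel2 can tell when a particular handler message has been fully written, i.e. if its output buffering keeps per-message boundaries.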
On Tue, Aug 09, 2011 at 10:23:09AM -0700, Jason Miller wrote: > The simplest implementation would be for mongrel2 to just send a message > when the data has gone out. Then you can control the amount of > outstanding data by: > 1) The size of messages you send > 2) The number of outstanding messages at a time > > Something like that ought to solve your problem. I think whether or not > mongrel sends these messages ought to be an option rather than just > sending them to every handler. Thoughts? That would work for me, though I'm not sure that'd actually be the simplest to implement, since I doubt the socket buffering code keeps the message boundaries from the handler intact (though I admit I have not looked very deeply at the code). -bcd