I've got some changes to the proxy code that might take me a while to get right so I've started a new branch. I'm also tinkering with the idea of testing doing proxying as a separate handler written in C. Any thoughts on this? It might simplify things but I'm not sure how well it will work or if it'll be painfully slow. -- Zed A. Shaw http://zedshaw.com/
Attached is a prototype handler->proxy layer.

I'm sure there are lots and lots of bugs (plus it copies the data many times) but it should work as a proof-of-concept.

It is designed to run as the "handlertest" and it strips the "/handlertest" off of the path. Everything is hard-coded, and it connects to localhost:8080 so you can do an easy compare with the default web-proxy setup. To change what it connects to, search for "8080".

My C is rusty, so don't laugh too loudly.

It relies on the mongrel2_c_handler, which is linked from the mongrel2 homepage.

-Jason
Aha! I was wondering if someone would use the C handler :-D

So I'm looking over your code and I have some ideas for moving your work into the C library.

* A connection pool/lookup structure (for quick lookups: the hash lib used inside mongrel2)
* Should I add the ZMQ socket to the request? Should requests from mongrel2_recv have that info built in?

Ideas

* Use mongrel2_request_for_disconnect(mongrel2_request *req) to determine whether a connection is closed (line 246). Should I rename this 'mongrel2_request_is_for_disconnect' or something else?
* Make the while loop flatter
* Use in-proc ZMQ, http://api.zeromq.org/2-1-1:zmq-inproc , instead of TCP and handle files as a separate thread in Mongrel2. Basic dir handling shouldn't require procer. Mongrel2 should start those directory handlers in-process at boot. What say you Zed?
** An entry point like mongrel2_dir_handler(directory, pull_socket, pub_socket) as a friendly entry point for mongrel2 to invoke?
** What about ignoring certain files? Sometimes people want to express 'ignore .svn/' in their directory handling. The current directory handler does not do this so I'm not worried about it.
* Use ragel to generate the allowable states/transitions and use the task library in mongrel2. Essentially, take the existing parts of the connection state machine used to handle files and move them to this handler.
* find_request_by_id should not depend on a global variable. Who knows if people will want one handler for multiple directories, backends, or something. Or they'll want to tweak the connection store a certain way, etc.

I'm not asking you to do any of those, but I'm throwing them out for review.

And now I'm racking my noodle on how to efficiently saturate the pipes to the client. I don't think we'll be using the multipart support in ZMQ, as it will only deliver the message to mongrel2 after all the parts have been delivered (aka, 200mb in some ZMQ buffer before m2 even sees it).
How would a Linux distro use this to host their DVD ISOs? I'm curious to see how ZMQ pollout option to zmq_poll works. And finally, if you had any issues installing or using the C api, let me know. I'm not happy with the API just yet and some feedback would help. Thanks for contributing the handler code to the list! Xavier On Tue, Mar 15, 2011 at 4:41 PM, Jason Miller <jason@milr.com> wrote: > Attached is a prototype handler->proxy layer. > > I'm sure there are lots and lots of bugs (plus it copies the data many > times) but it should work as a proof-of-concept. > > It is designed to run as the "handlertest" and it strips the > "/handlertest" off of the path. Everything is hard-coded, and it > connects to localhost:8080 so you can do an easy compare with the > default web-proxy setup. To change what it connects to search for > "8080" > > My C is rusty, so don't laugh too loudly. > > It relies on the mongrel2_c_handler, which is linked from the mongrel2 > homepage. > > -Jason > >
On 22:33 Tue 15 Mar , Xavier Lange wrote: > Aha! I was wondering if someone would use the C handler :-D > > * Should I add the ZMQ socket to the request? Should requests from > mongrel2_recv have that info built in? The Python backend is kind of the de-facto reference backend, so I'll tell you what it does: There is a "connection" object that encapsulates the pub/sub and push/pull connections to mongrel2, and there are request objects that encapsulate each message sent from mongrel2. So if you want to be more like other backends, you could wrap the 2 zmq sockets in a struct and call it mongrel2_connection or something. I don't think it's a big deal though. > > Ideas > ><snip> > I'm not asking you to do any of those, but I'm throwing them out for review. All of these seem reasonable. However, this is throw-away code. It's a half-assed state-machine that uses the poll flags to determine what state a connection is in. Look at the part where I strip "/handlertest" from the beginning of the location string by incrementing a pointer by 12 if you aren't convinced this shouldn't be the basis of anything. If we decide that a handler->proxy backend is superior to a built-in mongrel2 backend, I would suggest starting from scratch rather than trying to improve this code. It was about 4 hours of work so you're not losing much. > > And now I'm racking my noodle on how to efficiently saturate the pipes > to the client. I don't think we'll be using the multipart support in > ZMQ, as it will only deliver the message the mongrel2 after all the > parts have been delivered (aka, 200mb in some ZMQ buffer before m2 > even sees it). How would a Linux distro use this to host their DVD > ISOs? I'm curious to see how ZMQ pollout option to zmq_poll works. I think other people have already mentioned this, but you can send your data for a request back piecemeal. > > And finally, if you had any issues installing or using the C api, let > me know. 
I'm not happy with the API just yet and some feedback would > help.

The biggest one is that there are no functions to send without using a request object. You ought to be able to send with just a connection id. I had to keep the request objects around forever; I would have liked to be able to finalize them as soon as I sent the data out.

Also, see the deliver* functions in other backends that let you send to multiple connections at one time. I didn't need that here, but I looked for it as a possible way to respond without a request object.

-Jason
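A minimal sketch of why a reply needs only a connection id, not a request object. It assumes the handler reply format described in the Mongrel2 manual (`UUID SIZE:ID ID ID, BODY`); `format_reply` is a hypothetical helper written for illustration and is not part of mongrel2_c_handler or any existing backend. Actually publishing the message on the handler's ZMQ socket is left out.

```python
def format_reply(sender_uuid, conn_ids, body):
    """Build one handler reply addressed to one or more connection ids.

    Assumed wire format: b"UUID SIZE:ID ID ID, BODY", where SIZE is the
    length of the space-separated id list.
    """
    ids = " ".join(str(i) for i in conn_ids).encode("ascii")
    header = b"%s %d:%s, " % (sender_uuid.encode("ascii"), len(ids), ids)
    return header + body


# Only the handler's UUID and the connection ids are needed, so the
# original request object could be freed before the reply is built.
single = format_reply("54c6755b", [7], b"hello")
several = format_reply("54c6755b", [1, 2], b"hi")
```

Because the id list is part of the message itself, the same helper covers the "deliver to multiple connections" case mentioned above.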
On Tue, Mar 15, 2011 at 10:33:29PM -0700, Xavier Lange wrote: > Aha! I was wondering if someone would use the C handler :-D Actually, my plan was to use the code already in Mongrel2 for doing proxies (which works great) and just take it out of mongrel2 core and put it in a 0mq handler. > And now I'm racking my noodle on how to efficiently saturate the pipes > to the client. I don't think we'll be using the multipart support in > ZMQ, as it will only deliver the message the mongrel2 after all the > parts have been delivered (aka, 200mb in some ZMQ buffer before m2 > even sees it). How would a Linux distro use this to host their DVD > ISOs? I'm curious to see how ZMQ pollout option to zmq_poll works. Yes, not sure what the performance of this is, but I think people trying to saturate a pipe by sending a big DVD from a proxy backend are probably doing it wrong. They should be sending it directly out of a directory from Mongrel2. -- Zed A. Shaw http://zedshaw.com/
On 09:04 Wed 16 Mar , Zed A. Shaw wrote: > Actually, my plan was to use the code already in Mongrel2 for doing > proxies (which works great) and just take it out of mongrel2 core and > put it in a 0mq handler. That sounds great. My intuition was that if you do that, 0mq won't be a significant bottleneck. I hacked this little demo up to see if my intuition was right, and it seems to be. -Jason
On 09:04 Wed 16 Mar , Zed A. Shaw wrote: > On Tue, Mar 15, 2011 at 10:33:29PM -0700, Xavier Lange wrote: > > Aha! I was wondering if someone would use the C handler :-D > > Actually, my plan was to use the code already in Mongrel2 for doing > proxies (which works great) and just take it out of mongrel2 core and > put it in a 0mq handler. > > > And now I'm racking my noodle on how to efficiently saturate the pipes > > to the client. I don't think we'll be using the multipart support in > > ZMQ, as it will only deliver the message the mongrel2 after all the > > parts have been delivered (aka, 200mb in some ZMQ buffer before m2 > > even sees it). How would a Linux distro use this to host their DVD > > ISOs? I'm curious to see how ZMQ pollout option to zmq_poll works. > > Yes, not sure what the performance of this is, but I think people trying > to saturate a pipe by sending a big DVD from a proxy backend are > probably doing it wrong. They should be sending it directly out of a > directory from Mongrel2. Performance is fine for large files through handlers; I've saturated GigE on a year-old laptop. Are you talking about a limitation of the existing proxy code? > > > -- > Zed A. Shaw > http://zedshaw.com/ >
Hello, On 2011-03-16 17:04, Zed A. Shaw wrote: > On Tue, Mar 15, 2011 at 10:33:29PM -0700, Xavier Lange wrote: >> Aha! I was wondering if someone would use the C handler :-D > > Actually, my plan was to use the code already in Mongrel2 for doing > proxies (which works great) and just take it out of mongrel2 core and > put it in a 0mq handler. > >> And now I'm racking my noodle on how to efficiently saturate the pipes >> to the client. I don't think we'll be using the multipart support in >> ZMQ, as it will only deliver the message the mongrel2 after all the >> parts have been delivered (aka, 200mb in some ZMQ buffer before m2 >> even sees it). How would a Linux distro use this to host their DVD >> ISOs? I'm curious to see how ZMQ pollout option to zmq_poll works. > > Yes, not sure what the performance of this is, but I think people trying > to saturate a pipe by sending a big DVD from a proxy backend are > probably doing it wrong. They should be sending it directly out of a > directory from Mongrel2. Another perspective, for me the happy side of Mongrel2 is that I can push my websites one level deeper in my infrastructure without the need to worry about where Mongrel2 is. I just need to know: where to connect my handlers, a bit of path, host configuration and nothing more. client <-> Mongrel2 <- my LAN -> my happy farm of handlers I can send a very large file from my handler to my client by just sending a series of smaller chunks to the given connection ID. In fact I am working on a MongoDB GridFS to send my files by iterating over the 256K chunks. This is why I am happy you published mulletdb with grace as it means it should be possible to create a small multithreaded handler to do the job pretty easily. I will not serve a DVD, but at the moment files up to 45MB. This is also why I am not using the tmp files for the upload and let zmq do the spooling for me. 
Because if I set up an upload path, I need either a shared folder to read the files from or another little zmq daemon that my handlers request the file from. That is another piece of logic I do not want to have or manage.

An added bonus of serving ISOs through handlers is that I always have several backups of everything, so I can put my handlers directly on my file servers and get a MogileFS for free.

I need to stop. The ability to address any payload from any backend to a given client with the connection id is so liberating that all the hacks I have can be replaced by a clean farm of handlers. I think again and again that this is the point people are not catching when they think "Mongrel2 is just like WSGI but the same for all the languages or just a proxy".

Performance-wise, I was able to get 150Mbps throughput doing a siege on a 50kB page served by my handlers. It was PHP doing PHP work, so I suppose we can saturate any link. True, the day I need to process more than 8k req/s is not here yet.

Again, sorry to write emails which are so long...

loïc
On Wed, Mar 16, 2011 at 9:04 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote: > On Tue, Mar 15, 2011 at 10:33:29PM -0700, Xavier Lange wrote: >> Aha! I was wondering if someone would use the C handler :-D > > Actually, my plan was to use the code already in Mongrel2 for doing > proxies (which works great) and just take it out of mongrel2 core and > put it in a 0mq handler. Yes, that makes sense. For some reason I thought proxymustdie also meant dirmustdie.
Hello,

> And now I'm racking my noodle on how to efficiently saturate the pipes
> to the client. I don't think we'll be using the multipart support in
> ZMQ, as it will only deliver the message the mongrel2 after all the
> parts have been delivered (aka, 200mb in some ZMQ buffer before m2
> even sees it). How would a Linux distro use this to host their DVD
> ISOs?

The structure of Mongrel2 does not force you to go this way. At the handler level you can buffer up to, let's say, 1MB and send it as a message to the connection id doing the request, then continue in chunks of 1MB. You don't need the multipart messages of ZMQ to do so; it is just "long polling/streaming" and we already have it for free.

loïc

> On Tue, Mar 15, 2011 at 4:41 PM, Jason Miller <jason@milr.com> wrote:
>> Attached is a prototype handler->proxy layer.
>>
>> I'm sure there are lots and lots of bugs (plus it copies the data many
>> times) but it should work as a proof-of-concept.
>>
>> It is designed to run as the "handlertest" and it strips the
>> "/handlertest" off of the path. Everything is hard-coded, and it
>> connects to localhost:8080 so you can do an easy compare with the
>> default web-proxy setup. To change what it connects to search for
>> "8080"
>>
>> My C is rusty, so don't laugh too loudly.
>>
>> It relies on the mongrel2_c_handler, which is linked from the mongrel2
>> homepage.
>>
>> -Jason
>>
>>
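The chunked approach loïc describes could be sketched roughly as follows. This is an illustration under stated assumptions, not code from any Mongrel2 backend: `stream_response` is a hypothetical name, the 1MB default simply mirrors the figure in the message, and each yielded piece is meant to be sent as its own handler message to the same connection id (the sending itself is omitted).

```python
import io


def stream_response(fileobj, total_size, chunk_size=1024 * 1024):
    """Yield an HTTP response head, then the body in bounded pieces.

    Each yielded bytes object would go out as a separate message to the
    same connection id, so the handler never holds more than chunk_size
    bytes of body in memory at once.
    """
    head = "HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % total_size
    yield head.encode("ascii")
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Small stand-in for a large file, with a tiny chunk size for the demo.
payload = b"x" * 2500
parts = list(stream_response(io.BytesIO(payload), len(payload), chunk_size=1000))
```

Declaring the full Content-Length up front means the client sees one ordinary response, with no chunked transfer encoding and no ZMQ multipart messages involved.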
On 12:36 Mon 14 Mar , Zed A. Shaw wrote:
> I've got some changes to the proxy code that might take me a while to
> get right so I've started a new branch. I'm also tinkering with the
> idea of testing doing proxying as a separate handler written in C.
>
> Any thoughts on this? It might simplify things but I'm not sure how
> well it will work or if it'll be painfully slow.

I like the idea of the handler code and the proxy code being the same code-path.

My intuition is that the overhead wouldn't be too bad for most dynamic content. The backend would be basically JSON headers -> HTTP headers, then copy the body unmodified, right? Plus perhaps rewriting one or two of the header fields (e.g. path).

If you want I could prototype a handler that does just this.

Whether or not it qualifies as painfully slow really depends on what is behind the proxy. If it's nginx serving a static file, then it obviously would be slow in comparison, but that begs the question of why you didn't just register it as a static file in mongrel2.

-Jason
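The "JSON headers -> HTTP headers, then copy the body unmodified" step could look something like the sketch below. It assumes that Mongrel2 puts request metadata under upper-case keys such as METHOD, URI, PATH and VERSION and the client's HTTP headers under lower-case keys; `to_http_request` and its `strip_prefix` parameter are hypothetical names used only for illustration.

```python
import json


def to_http_request(headers_json, body=b"", strip_prefix=""):
    """Turn a handler request (JSON headers + raw body) into HTTP bytes."""
    headers = json.loads(headers_json)
    method = headers.get("METHOD", "GET")
    uri = headers.get("URI") or headers.get("PATH", "/")
    version = headers.get("VERSION", "HTTP/1.1")

    # Rewrite the path: drop the route prefix only on a path boundary,
    # so "/handlertestfoo" is left alone.
    if strip_prefix and uri.startswith(strip_prefix):
        rest = uri[len(strip_prefix):]
        if rest == "" or rest[0] in "/?":
            uri = rest if rest.startswith("/") else "/" + rest

    lines = ["%s %s %s" % (method, uri, version)]
    for name, value in headers.items():
        if name.isupper():
            continue  # assumed Mongrel2 metadata, not a client header
        values = value if isinstance(value, list) else [value]
        for v in values:
            lines.append("%s: %s" % (name, v))

    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("latin-1") + body  # body is copied unmodified


example = to_http_request(
    '{"METHOD": "GET", "URI": "/handlertest/index.html",'
    ' "VERSION": "HTTP/1.1", "PATH": "/handlertest/index.html",'
    ' "host": "localhost"}',
    strip_prefix="/handlertest",
)
```

Matching the prefix by name rather than skipping a fixed number of bytes keeps the rewrite correct if the route is ever renamed.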
Hello,

tl;dr: Good, this is flexible and will simplify Mongrel2 itself, but I wonder about really big files.

On 2011-03-15 07:22, Jason Miller wrote:
> On 12:36 Mon 14 Mar , Zed A. Shaw wrote:
>> I've got some changes to the proxy code that might take me a while to
>> get right so I've started a new branch. I'm also tinkering with the
>> idea of testing doing proxying as a separate handler written in C.
>>
>> Any thoughts on this? It might simplify things but I'm not sure how
>> well it will work or if it'll be painfully slow.
> I like the idea of the handler code and the proxy code being the same
> code-path.
>
> My intuition is that the overhead wouldn't be too bad for most dynamic
> content. The backend would be basically JSON headers -> HTTP headers,
> then copy the body unmodified, right? Plus perhaps rewriting one or two
> of the header fields (e.g. path)

In fact this would be great. My use case for a proxy is to proxy Apache serving Subversion. One thing one always needs to do is rewrite the destination header when you are running SSL in the front and not in the back. One could do that at the handler level.

But, what about large files? At the moment, we get and send the data from/to Mongrel2 with one (possibly big) ZMQ message. If my users are doing a checkout of a 200MB file or if you simply proxy a 200MB file, we will need to be smart by sending the messages in chunks (no chunk encoding, but several messages to the same connection id - yes this ability to address a message to a given client connection is insanely great) or reuse the ZMQ multipart messages. I don't know which is best.

> If you want I could prototype a handler that does just this.
>
> Whether or not it qualifies painfully slow really depends on what is
> behind the proxy. If it's nginx serving a static file, then it
> obviously would be slow in comparison, but that begs the question of why
> you didn't just register it as a static file in mongrel2.
"Slow" is always very relative to your needs, in fact. The dispatching of a request from Mongrel2 to the handler and back (I will call that the dispatch loop of Mongrel2) is submillisecond. With the handler approach, you can add more handlers in the back to proxy your server. As an added benefit, you get a simpler FSM for Mongrel2 because you can remove all the proxy code, so it may even speed up Mongrel2.

The only point I am not sure about is whether Mongrel2 can take full advantage of a multicore system. I am not familiar enough (the name rings some bells) with the coroutine/FSM approach of Mongrel2 to understand if it can. If it can, it will be a very long time before a single box is not enough to handle all my traffic at the front-end level. If I need more than one Mongrel2 in front to handle the traffic, I will need the right routing at the backend to send replies back to the right Mongrel2 (not very complex, but still another layer).

loïc
On Tue, Mar 15, 2011 at 09:04:12AM +0100, Loic d'Anterroches wrote: > Hello, > > But, what about large files? At the moment, we get and send the data > from/to Mongrel2 with one (possibly big) ZMQ message. Yeah, that's the part that needs figuring out. I'll probably just test it out based on Jason's experiments so far. -- Zed A. Shaw http://zedshaw.com/