how to safely shut down a handler?

From:
Ryan Kelly
Date:
2011-01-24 @ 13:57
could not decode message

Re: [mongrel2] how to safely shut down a handler?

From:
Zed A. Shaw
Date:
2011-01-25 @ 02:44
On Tue, Jan 25, 2011 at 12:57:06AM +1100, Ryan Kelly wrote:
> 
> Hi All,
> 
>   I am currently getting up to speed with mongrel2, and am trying to
> understand all the bits and pieces by writing my own mongrel2 => wsgi
> gateway in python.  I've got the handler up and running and serving
> requests fine, but I'm not happy with the procedure for shutting it
> down.

Cool, that's a good way to do it.  There's already a couple of these I
believe.

>   There doesn't seem to be any way for a handler to close its 0mq socket
> without potentially dropping requests on the floor and causing clients
> to hang forever.

Yes, the current flaw in Mongrel2 is there's no timeout mechanism so
it's possible to lose a client request and have them hang around for too
long.  I didn't want to come up with a timeout that was generic since
everyone had bizarre ideas of how things should time out.  Instead
there's gear in the control port for doing your own control, but most
folks don't use it.  I'll have to add a timeout that people can disable
just so the common case is handled.

However, this isn't really true:

>   If a request comes in after the call to c.recv() but before the
> sockets are closed, 0mq may queue it to this handler while it is in the
> process of shutting down.  Such a request will never be handled - when
> the handler closes its socket, it will be silently dropped on the floor.

What happens is 0mq has a very good backlog mechanism as long as you
name the receiver.  It's actually my favorite feature of Mongrel2 and
0MQ.  Close doesn't work the same in 0mq as in a regular socket, where
you will lose messages if you're not careful.  In 0mq a close just stops
receiving messages and you can actually not even do that.  Then when
your handler starts back up it continues where it left off and gets the
next message.

Problem is, 0mq will "fake" push out more requests than it can really
send, and then have a thread do the real processing.  It does this by
putting them in the backlog on shutdown.  But, if your process isn't
running then it can't finish the work.  So, in order to make sure your
messages continue going out when you restart, you have to *name the
sender*.

Naming the sender consistently makes it possible for you to stop a
process, then kick it back up and all the messages that hadn't gone out
will get restarted.

>   I have written a small test case that can reproduce this
> connection-hanging behaviour (see attached).  The file testhandler.py is
> a simple handler as outlined above, and the file testrunner.py launches
> mongrel with two instances of this handler, then runs some requests
> against them.  Shutting down one of the handler processes causes a
> client connection to hang indefinitely, despite another handler being
> available to service it.

Yep, this happens because of this:

    send_ident = str(uuid.uuid4())

On line 20 of your test handler you're making a totally random
sender_ident, so any previous messages that were in the queue won't go
out and get dropped on the restart.  Set it to a fixed ident for the two
handlers and it should work fine.
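
As a minimal sketch of the difference (not the actual test handler
code), assuming the 0mq 2.x identity behaviour discussed in this thread:
`random_ident` mirrors the uuid4 line above, while `fixed_ident` is a
hypothetical helper that derives the same ident on every start.
`connect_responder` assumes pyzmq and assumes the handler replies on a
PUB socket whose identity is set before connecting; all three names are
illustrative only.

```python
import uuid


def random_ident() -> str:
    # What the test handler did: a brand new ident on every start.
    return str(uuid.uuid4())


def fixed_ident(handler_name: str) -> str:
    # Deterministic: the same handler name gives the same ident on every start.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, "mongrel2-handler/" + handler_name))


def connect_responder(ctx, pub_addr: str, send_ident: str):
    """Open the response socket with its identity set before connecting."""
    import zmq  # pyzmq; imported lazily so the helpers above work without it

    resp = ctx.socket(zmq.PUB)
    resp.setsockopt(zmq.IDENTITY, send_ident.encode("ascii"))
    resp.connect(pub_addr)
    return resp
```

A hard-coded string works just as well as `fixed_ident`; the only point
is that the value survives a restart.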

In fact, I do this all the time when coding with mongrel2.  I'll kill a
handler, accidentally hit it with a browser, which causes it to hang,
then I go and start the handler back up and it just completes the
request like nothing happened.  I'll also be thrashing a handler then
kill it mid-swing, start it back up and, like magic the messages start
going out again from the previous run.

>   On my machine, `python testrunner.py` will reliably hang when making
> its final request.
> 
> 
>   So:  is there a reliable way to shut down a handler without dropping
> connections on the floor?

Make sure you use a consistent ident like I mention above.  Also look at
options for the high water mark, queueing, and setting a send ident as
well, as they all have an effect on this in 0mq.  When in doubt, go look
at what we've already made, since I do this all the time without
problems and get reliable restarts.


-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] how to safely shut down a handler?

From:
Ryan Kelly
Date:
2011-01-25 @ 10:01
could not decode message

Re: [mongrel2] how to safely shut down a handler?

From:
Zed A. Shaw
Date:
2011-01-25 @ 10:28
On Tue, Jan 25, 2011 at 09:01:54PM +1100, Ryan Kelly wrote:
> I guess my understanding of 0mq isn't quite up to speed yet.  How does
> this backlog mechanism mesh with the manpage for zmq_close, which states
> that "any outstanding messages physically received from the network but
> not yet received by the application with zmq_recv() shall also be
> dropped"?

That's the receive side, what you're using is the send log.  You can
also give the receive side a name and it'll have a similar backlog and
some other stuff, but I haven't delved into that much.

But, at a certain point, you can't rely on the network for giving you
perfect total message transport.  0mq isn't like rabbit or IBM mq, it's
just more like a non-mq (thus the 0 in it).  You could build a "message
is always transmitted and stored and totally always delivered no matter
what" system, but you'd have to do that.  With 0mq it'd be trivially easy
but you'd need some storage, and then a bit of a protocol for it.
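
A rough sketch of the "some storage, and then a bit of a protocol" idea,
assuming SQLite as the storage and a simple store/ack scheme.  `Outbox`
and its methods are hypothetical, not part of 0mq or mongrel2: a message
is written to disk before it is sent and deleted only once the other
side acknowledges it, so anything still pending after a restart can be
sent again.

```python
import sqlite3


class Outbox:
    """Keep each message on disk until the receiver acknowledges it."""

    def __init__(self, path: str):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body BLOB NOT NULL)"
        )
        self.db.commit()

    def store(self, body: bytes) -> int:
        # Persist first; the caller sends the message only after this returns.
        cur = self.db.execute("INSERT INTO outbox (body) VALUES (?)", (body,))
        self.db.commit()
        return cur.lastrowid

    def ack(self, msg_id: int) -> None:
        # The receiver confirmed delivery, so the stored copy can go.
        self.db.execute("DELETE FROM outbox WHERE id = ?", (msg_id,))
        self.db.commit()

    def pending(self):
        # Everything stored but not yet acknowledged, oldest first.
        rows = self.db.execute("SELECT id, body FROM outbox ORDER BY id")
        return [(row[0], bytes(row[1])) for row in rows]

    def close(self) -> None:
        self.db.close()
```

On startup a sender would walk `pending()` and resend each entry before
taking new work; the acknowledgement message itself is the "bit of a
protocol" and is left out here.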

> Just to clarify, do you mean give both handler processes the same ident,
> or give each handler process a different-but-fixed ident?

Nope you need two idents for this test, but they should be consistent.
It's the way the mongrel2 handlers are configured anyway.  In Tir I just
read the ident and junk out of the .sqlite based on the route.  Makes it
very easy to config.

> Trying to give both handlers the same ident gives me an assertion error
> down in the bowels of zmq.  I'll double-check my build and try again
> tomorrow.

But, you should be able to start X handlers with the same ident.
They'll get the messages round-robin then, but I do it all the time.

> But this case is slightly different, in that I am not *restarting* any
> handlers.  The test begins with two handlers running initially and
> load-balancing requests between them.  Then after serving some requests
> one of the handlers goes away forever.

So, thinking about that for a second, you're describing this:

1. I have two handlers A and B.
2. Both A and B begin handling requests, with the idents A and B.
3. I then kill B and never restart it.
4. Why is the message B tried to send not being sent by A?

Simple:  B and A don't know about each other so if you never restart B
it'll never send its messages.  It needs to be running to send them.

> By way of explanation, I'm trying to simulate on-demand scaling of the
> number of handler processes - as load goes up, start more handlers; as
> load goes down, kill some off.

In order for that to work, you'll need to make sure all of the handlers
that are running have the exact same send_idents, and you'll have to get
into the 0mq docs on how to adjust the queue and delivery options.
Right now the scenario you're emulating is more of what I describe
above, where you're effectively taking an entire handler offline then
wondering why it's not sending messages.

Really, it sounds to me like you need to do this with a real application
that's using the existing Python code I have and a simple handler.
Fire up say 3 of them and simulate the same thing, then figure out what
options get you close to what you want.  Rather than trying to both
figure out how to make 0mq do what you want and write a wsgi handler.

> Ideally, the handlers that die off will do so cleanly and not take any
> queued requests along with them.  It won't be the end of the world if a
> few requests time out, as long as they don't get stuck forever.  So I'll
> also take a closer look at timing out requests via the control port.

I still need to do a timeout mechanism.  Play with the control port to
cook up whatever you want for timeout, and when you have something (or
an idea) that isn't totally crazy then let me know and I'll just put it
in mongrel2.
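
One possible shape for the bookkeeping half of such a timeout, as a
sketch only.  `RequestTimeouts` is a hypothetical class, and the `kill`
callback stands in for whatever control port call actually closes a
connection, which this thread does not spell out.

```python
import time


class RequestTimeouts:
    """Track when each request was handed out and report the stale ones."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.started = {}  # connection id -> time the request was forwarded

    def start(self, conn_id) -> None:
        self.started[conn_id] = self.clock()

    def finish(self, conn_id) -> None:
        # A response went out, so the connection is no longer at risk.
        self.started.pop(conn_id, None)

    def expired(self):
        now = self.clock()
        return [cid for cid, t0 in self.started.items()
                if now - t0 > self.timeout_s]

    def reap(self, kill) -> None:
        # `kill` is supplied by the caller; it is an assumption, not a
        # real mongrel2 API.
        for cid in self.expired():
            kill(cid)
            self.finish(cid)
```

Calling `reap` periodically bounds how long a client can hang when its
request is lost, without deciding what the right timeout value is.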

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] how to safely shut down a handler?

From:
Ryan Kelly
Date:
2011-01-25 @ 11:03
On Tue, 2011-01-25 at 02:28 -0800, Zed A. Shaw wrote:
> On Tue, Jan 25, 2011 at 09:01:54PM +1100, Ryan Kelly wrote:
> > I guess my understanding of 0mq isn't quite up to speed yet.  How does
> > this backlog mechanism mesh with the manpage for zmq_close, which states
> > that "any outstanding messages physically received from the network but
> > not yet received by the application with zmq_recv() shall also be
> > dropped"?
> 
> That's the receive side, what you're using is the send log.  You can
> also give the receive side a name and it'll have a similar backlog and
> some other stuff, but I haven't delved into that much.

I think that it's more the recv side than the send side that is the
cause of my trouble - see next response.  I will try using a proper
ident on the recv side as well and see if it makes a difference.

> > Just to clarify, do you mean give both handler processes the same ident,
> > or give each handler process a different-but-fixed ident?
> 
> Nope you need two idents for this test, but they should be consistent.
> It's the way the mongrel2 handlers are configured anyway.  In Tir I just
> read the ident and junk out of the .sqlite based on the route.  Makes it
> very easy to config.
> 
> > Trying to give both handlers the same ident gives me an assertion error
> > down in the bowels of zmq.  I'll double-check my build and try again
> > tomorrow.
> 
> But, you should be able to start X handlers with the same ident.
> They'll get the messages round-robin then, but I do it all the time.
> 
> > But this case is slightly different, in that I am not *restarting* any
> > handlers.  The test begins with two handlers running initially and
> > load-balancing requests between them.  Then after serving some requests
> > one of the handlers goes away forever.
> 
> So, thinking about that for a second, you're describing this:
> 
> 1. I have two handlers A and B.
> 2. Both A and B begin handling requests, with the idents A and B.
> 3. I then kill B and never restart it.
> 4. Why is the message B tried to send not being sent by A?
>
> Simple:  B and A don't know about each other so if you never restart B
> it'll never send its messages.  It needs to be running to send them.

Hmmm...I don't *think* that's the scenario I'm describing, but it could
be that I don't have the mental model of mongrel2/0mq sorted out quite
right...

I'm not worried about messages *sent* by B being lost, of course that
will happen when B goes offline.  What I'm worried about is that B seems
to take down an unrelated request when it dies.

Here's my mental picture of things:

  1.  I have Mongrel2 configured with a single Handler, sending reqs
      out on socket X and receiving response data on socket Y.
  2.  I have two handler processes A and B, with idents A and B.
  3.  Both A and B start handling requests, receiving them on socket
      X and sending responses on socket Y.
  4.  This works nicely for a while, with reqs distributed round-robin.
  5.  I ask B to shut itself down, and never restart it.
  6.  A request hangs forever, apparently lost when B closed its socket.

Zooming in, here's the race condition that I think is happening, based
on debugging the thing with print statements and reading up on 0mq:

  5.1  I ask B to shut down. It breaks out of the recv() loop and
       prepares to close its socket.
  5.2  Meanwhile, a new request R arrives.  Mongrel sends R out on
       socket X, and 0mq round-robin delivers it to B.
  5.3  B closes its connection to socket X.  The unreceived request R
       is dropped on the floor.
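
Steps 5.1 to 5.3 can be modelled with plain Python queues.  This is a
toy model of the mental picture above, with no 0mq involved;
`ToyHandler` and `run_race` are made-up names, and the per-handler queue
stands in for messages delivered to a handler but not yet read with
recv().

```python
from collections import deque
from itertools import cycle


class ToyHandler:
    def __init__(self, name):
        self.name = name
        self.queue = deque()  # delivered to this handler, not yet recv()'d
        self.handled = []

    def recv_one(self):
        if self.queue:
            self.handled.append(self.queue.popleft())

    def close(self):
        """Close the socket and return whatever was still queued."""
        dropped = list(self.queue)
        self.queue.clear()
        return dropped


def run_race():
    a, b = ToyHandler("A"), ToyHandler("B")
    targets = cycle([a, b])  # round-robin delivery between A and B

    # Normal operation: each handler receives and handles one request.
    for req in ["r1", "r2"]:
        next(targets).queue.append(req)
    a.recv_one()
    b.recv_one()

    # 5.1: B has left its recv() loop and will not call recv_one() again.
    # 5.2: two more requests arrive; round-robin still hands one to B.
    next(targets).queue.append("r3")  # A's turn
    next(targets).queue.append("R")   # B's turn
    a.recv_one()

    # 5.3: B closes its socket while R is still sitting in its queue.
    dropped = b.close()
    return a.handled, b.handled, dropped
```

In this model `run_race()` reports R as dropped even though A stayed up
the whole time, which is the hang described in step 6.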

Since this is a clean shutdown, B could quite happily respond to R if it
knew about it, but I can't find a way for B to say "give me any requests
you have queued, but don't send me any more".

> > By way of explanation, I'm trying to simulate on-demand scaling of the
> > number of handler processes - as load goes up, start more handlers; as
> > load goes down, kill some off.
> 
> In order for that to work, you'll need to make sure all of the handlers
> that are running have the exact same send_idents, and you'll have to get
> into the 0mq docs on how to adjust the queue and delivery options.
> Right now the scenario you're emulating is more of what I describe
> above, where you're effectively taking an entire handler offline then
> wondering why it's not sending messages.
> 
> Really, it sounds to me like you need to do this with a real application
> that's using the existing Python code I have and a simple handler.
> Fire up say 3 of them and simulate the same thing, then figure out what
> options get you close to what you want.  Rather than trying to both
> figure out how to make 0mq do what you want and write a wsgi handler.

I did try using the bundled "mongrel2" python module, but the Connection
class doesn't have any methods for closing down the connection.  So I
pulled the relevant bits out directly into my test code.

I will try again with your prebuilt python code, having the handler
process simply die rather than attempting a clean shutdown.


> > Ideally, the handlers that die off will do so cleanly and not take any
> > queued requests along with them.  It won't be the end of the world if a
> > few requests time out, as long as they don't get stuck forever.  So I'll
> > also take a closer look at timing out requests via the control port.
> 
> I still need to do a timeout mechanism.  Play with the control port to
> cook up whatever you want for timeout, and when you have something (or
> an idea) that isn't totally crazy then let me know and I'll just put it
> in mongrel2.

Will do.


  Thanks,


     Ryan


-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] how to safely shut down a handler?

From:
Zed A. Shaw
Date:
2011-01-25 @ 19:28
I rewrote it so you have to do the handler restarts manually making it
easier to debug and figure out what's going on.  I then wrote a correct
HTTP reply method and a little time+counter so you can see what messages
are getting sent.

Here's my code:

http://zedshaw.com/m2test-handler-close-zed.tar.gz

Run it like this:

1. Start mongrel2 in one window.
2. Start testhandler.py in a window.
3. Run testrunner.py in a 3rd window.
4. Go back to the testhandler.py window and restart it, message
   completes.
5. Do this a whole bunch manually and see that it works reliably.

Now, look at like line 35.  That's the secret sauce.  If you want to get
messages with reliable delivery you need to set the ident on the side
you need it.  In your case you want send and receive durability so ident
both sides.

However, there's kind of a weird thing:  you have to restart mongrel2 if
you *remove* the recv ident on a handler.  No idea why, it's some 0mq
limitation.  So, comment out line 36 so you're not setting the recv
ident.  Go back and restart mongrel2.

Do the above test again, and by about the 3rd cycle you'll lose a
message and everything gets stuck.

A lot of your code for starting mongrel2 and firing up processes just
confounded the test, since you can actually make it happen with manual
restarts.  Now you can probably take this code and put your process
starting stuff back in to do your test, but I'd suggest creating a 3rd
script that does the process things.  Have testrunner.py just send HTTP.
testhandler.py be the handler.  And testprocess.py be the thing screwing
with processes.

Hope that helps.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] how to safely shut down a handler?

From:
Zed A. Shaw
Date:
2011-01-25 @ 18:14
> Here's my mental picture of things:
> 
>   1.  I have Mongrel2 configured with a single Handler, sending reqs
>       out on socket X and receiving response data on socket Y.
>   2.  I have two handler processes A and B, with idents A and B.

Stop right there.  If you have two handlers with two different idents
using the same route -> 0mq socket from mongrel2 then this is why they
aren't resending messages.

I think I'm going to have to write code for you, since English I think
is failing.  Give me a little bit.


-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] how to safely shut down a handler?

From:
Ryan Kelly
Date:
2011-01-25 @ 22:23
On Tue, 2011-01-25 at 10:14 -0800, Zed A. Shaw wrote:
> > Here's my mental picture of things:
> > 
> >   1.  I have Mongrel2 configured with a single Handler, sending reqs
> >       out on socket X and receiving response data on socket Y.
> >   2.  I have two handler processes A and B, with idents A and B.
> 
> Stop right there.  If you have two handlers with two different idents
> using the same route -> 0mq socket from mongrel2 then this is why they
> aren't resending messages.
> 
> I think I'm going to have to write code for you, since English I think
> is failing.

No, it's me that's failing - looking back at the messages, I appear to
have sent an old version of the test code that shows a different issue.
No wonder we seem to be talking about different things!  Sorry.

Thankfully all is not lost, as it's been a most illuminating discussion
from my end.

I'm going to reboot my test code using the pre-built mongrel2 module and
taking all your advice in this thread on board.


  Thanks,

      Ryan


-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] how to safely shut down a handler?

From:
Ryan Kelly
Date:
2011-01-25 @ 22:54
could not decode message

Re: [mongrel2] how to safely shut down a handler?

From:
Ryan Kelly
Date:
2011-01-25 @ 20:51
On Tue, 2011-01-25 at 10:14 -0800, Zed A. Shaw wrote:
> > Here's my mental picture of things:
> > 
> >   1.  I have Mongrel2 configured with a single Handler, sending reqs
> >       out on socket X and receiving response data on socket Y.
> >   2.  I have two handler processes A and B, with idents A and B.
> 
> Stop right there.  If you have two handlers with two different idents
> using the same route -> 0mq socket from mongrel2 then this is why they
> aren't resending messages.

Fair enough.  But can I run two handlers simultaneously with the same
ident?  It always crashes with an assertion error for me.

Using your modified code:

 1. Start mongrel2 in one window.
 2. Start testhandler.py in a new window.
 3. Start another testhandler.py in a new window
 =>  mongrel2 crashes with "Assertion failed: !engine (session.cpp:287)"

I guess this is the "segfault Mongrel2 from your handler 
by not using the identities correctly" that Loic mentioned.

(BTW, I'm using the 1.5 release of mongrel2 and have tried this with
both 0mq 2.0.10 and 0mq 2.1.0)


  Ryan

-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

Re: [mongrel2] how to safely shut down a handler?

From:
Zed A. Shaw
Date:
2011-01-25 @ 23:54
On Wed, Jan 26, 2011 at 07:51:28AM +1100, Ryan Kelly wrote:
> On Tue, 2011-01-25 at 10:14 -0800, Zed A. Shaw wrote:
> Fair enough.  But can I run two handlers simultaneously with the same
> ident?  It always crashes with an assertion error for me.
> 
> Using your modified code:
> 
>  1. Start mongrel2 in one window.
>  2. Start testhandler.py in a new window.
>  3. Start another testhandler.py in a new window
>  =>  mongrel2 crashes with "Assertion failed: !engine (session.cpp:287)"
> 
> I guess this is the "segfault Mongrel2 from your handler 
> by not using the identities correctly" that Loic mentioned.

Well that's special. WTF.  Let me check this out some more, 'cause
that's annoying as hell.



-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] how to safely shut down a handler?

From:
Loic d'Anterroches
Date:
2011-01-25 @ 18:31

On 2011-01-25 19:14, Zed A. Shaw wrote:
>> Here's my mental picture of things:
>>
>>    1.  I have Mongrel2 configured with a single Handler, sending reqs
>>        out on socket X and receiving response data on socket Y.
>>    2.  I have two handler processes A and B, with idents A and B.
>
> Stop right there.  If you have two handlers with two different idents
> using the same route ->  0mq socket from mongrel2 then this is why they
> aren't resending messages.
>
> I think I'm going to have to write code for you, since English I think
> is failing.  Give me a little bit.

I started to really understand what was going on with these identities 
when reading the guide on "Transient vs. Durable Sockets". It explains 
very well all the machinery behind it and how it affects the message 
delivery. It even shows how you can segfault Mongrel2 from your handler 
by not using the identities correctly.

http://zguide.zeromq.org/chapter:all#toc37

loïc