librelist archives

« back to archive

Error message question

Error message question

From:
Rob LaRubbio
Date:
2012-01-17 @ 00:17
I'm just starting to look into the mongrel2 code and test my assumptions, 
I'm looking for some feedback to let me know if I'm looking in the correct
direction. 

We recently had an issue in our production environment with missing 
database index that caused our handlers to take a long time to respond.  
That resulted in these error messages from mongrel2 (which quickly filled 
up our disk)

[ERROR] (src/task/fd.c:229: errno: None) Error adding fd: -1 or socket: 
0x1cd9460 to task wait list.
[ERROR] (src/handler.c:180: errno: None) Receive on handler socket failed.
[ERROR] (src/superpoll.c:140: errno: Resource temporarily unavailable) Too
many handler requests outstanding, your handler isn't running: 2560 is 
greater than hot 2560 max.

After fixing the database our handlers were responding in a reasonable 
time, but now we are getting these errors from mongrel2 on stdout:

[ERROR] (src/register.c:199: errno: None) Invalid FD given for exists check
[ERROR] (src/register.c:214: errno: None) Nothing registered under id 103.


We are getting a lot of these, so I've had to set up scripts to cat 
/dev/null onto the stdout log.

Looking at the code I'm wondering if we have too many incoming requests so
mongrel2 is holding open more sockets than the os allowed file descriptors
(I haven't changed that value from it's default)  So when the response 
comes back from the handler and mongrel2 attempts to find the socket to 
send the response to it's been closed.

So my first question is if my assumption sounds correct?  Has anyone dealt
with this before or have tips on optimized setup for mongrel2?  (FYI I'm 
running on ubuntu server)

My second question is about printing timestamps in the logs.  Is there a 
reason that isn't being done?  Would a patch with that be accepted?

My third question is about the "hot 2560 max" is that a list of handlers?
Can that number be increased?  I didn't find a way to do that in the docs 
or the code (but again I've just started looking at the code)

Thanks

-- 
Rob LaRubbio
Zipline Games
Creators of Moai - The Mobile Platform for Pro Game Developers
http://getmoai.com

Re: [mongrel2] Error message question

From:
Dalton Barreto
Date:
2012-01-17 @ 13:02
2012/1/16 Rob LaRubbio <rob@ziplinegames.com>:
> I'm just starting to look into the mongrel2 code and test my assumptions,
> I'm looking for some feedback to let me know if I'm looking in the correct
> direction.
>
> We recently had an issue in our production environment with missing database
> index that caused our handlers to take a long time to respond.  That
> resulted in these error messages from mongrel2 (which quickly filled up our
> disk)
>
> [ERROR] (src/task/fd.c:229: errno: None) Error adding fd: -1 or socket:
> 0x1cd9460 to task wait list.
> [ERROR] (src/handler.c:180: errno: None) Receive on handler socket failed.
> [ERROR] (src/superpoll.c:140: errno: Resource temporarily unavailable) Too
> many handler requests outstanding, your handler isn't running: 2560 is
> greater than hot 2560 max.
>
> After fixing the database our handlers were responding in a reasonable time,
> but now we are getting these errors from mongrel2 on stdout:
>
> [ERROR] (src/register.c:199: errno: None) Invalid FD given for exists check
> [ERROR] (src/register.c:214: errno: N one) Nothing registered under id 103.
>
> We are getting a lot of these, so I've had to set up scripts to cat
> /dev/null onto the stdout log.
>
> Looking at the code I'm wondering if we have too many incoming requests so
> mongrel2 is holding open more sockets than the os allowed file descriptors
> (I haven't changed that value from it's default)  So when the response comes
> back from the handler and mongrel2 attempts to find the socket to send the
> response to it's been closed.
>
> So my first question is if my assumption sounds correct?  Has anyone dealt
> with this before or have tips on optimized setup for mongrel2?  (FYI I'm
> running on ubuntu server)
>

Hello Rob,

Don't know if mongrel2 would even be able to hold more sockets that
the max permitted
by the OS. I think that when trying to open the next socket beyond the
max, the OS would return
an error and the client would probably receive a connection refused
error. Note that I may be totally wrong here =)
I'm just guessing what I think would happen.

Can you measure the amount of requests that are hitting your mongrel2 instance?

I tested my mongrel2 instance with blitz.io, using 1024 concurrent
connections for 60 segonds
and mongrel2 responded very well. Of course this could be nothing
compared to the load your mongrel2
is handling. Just an example. =)


> My second question is about printing timestamps in the logs.  Is there a
> reason that isn't being done?  Would a patch with that be accepted?
>

This is implemented in the develop branch [1] and will be available on
the next release.

> My third question is about the "hot 2560 max" is that a list of handlers?

According to the code this is just a number. This value is returned by
SuperPoll_get_max_hot(SupoerPoll *sp), that is
just a macro that return sp->max_hot.

>  Can that number be increased?  I didn't find a way to do that in the docs
> or the code (but again I've just started looking at the code)

This value (sp->max_hot) is calculated by:

max_hot = SuperPoll_get_max_fd(sp) / hot_dividend

where "hot_dividend" comes for the setting: "superpoll.hot_dividend".

If you don't have EPOLL enabled, max_hot is just the value of
SuperPoll_get_max_fd(sp).

This calculation is done in the function SuperPoll_create()[2] and the
documentation about hot_dividend
is in the "3.10 Tweakable Expert Settings" section [3] of the mongrel2 manual.

Hope it helps, thanks.

[1] https://github.com/zedshaw/mongrel2/pull/95
[2] https://github.com/zedshaw/mongrel2/blob/develop/src/superpoll.c#L84
[3] http://mongrel2.org/static/book-finalch4.html#x6-410003.10


-- 
Dalton Barreto
http://daltonmatos.com

Re: [mongrel2] Error message question

From:
Rob LaRubbio
Date:
2012-01-18 @ 18:07
You're probably correct about the max sockets.  At least if I was 
designing an OS I'd probably not accept a new connection when the limit is
exceeded instead of randomly closing an already open one.  That seems like
the better thing to do. :) 

Our load wasn't near 1024/sec, we were getting about 150 req/s however 
given the slow response of the handlers I would have expected outstanding 
requests to exceed 1024 (but not the 65k available in the REGISTRATIONS 
array).  When I figure out what is going on if it is something that seems 
like it would be of interest to the group I'll post a follow up.  Thanks.

-Rob 

-- 
Rob LaRubbio
Zipline Games
Creators of Moai - The Mobile Platform for Pro Game Developers
http://getmoai.com


On Tuesday, January 17, 2012 at 5:02 AM, Dalton Barreto wrote:

> 2012/1/16 Rob LaRubbio <rob@ziplinegames.com (mailto:rob@ziplinegames.com)>:
> > I'm just starting to look into the mongrel2 code and test my assumptions,
> > I'm looking for some feedback to let me know if I'm looking in the correct
> > direction.
> > 
> > We recently had an issue in our production environment with missing database
> > index that caused our handlers to take a long time to respond.  That
> > resulted in these error messages from mongrel2 (which quickly filled up our
> > disk)
> > 
> > [ERROR] (src/task/fd.c:229: errno: None) Error adding fd: -1 or socket:
> > 0x1cd9460 to task wait list.
> > [ERROR] (src/handler.c:180: errno: None) Receive on handler socket failed.
> > [ERROR] (src/superpoll.c:140: errno: Resource temporarily unavailable) Too
> > many handler requests outstanding, your handler isn't running: 2560 is
> > greater than hot 2560 max.
> > 
> > After fixing the database our handlers were responding in a reasonable time,
> > but now we are getting these errors from mongrel2 on stdout:
> > 
> > [ERROR] (src/register.c:199: errno: None) Invalid FD given for exists check
> > [ERROR] (src/register.c:214: errno: N one) Nothing registered under id 103.
> > 
> > We are getting a lot of these, so I've had to set up scripts to cat
> > /dev/null onto the stdout log.
> > 
> > Looking at the code I'm wondering if we have too many incoming requests so
> > mongrel2 is holding open more sockets than the os allowed file descriptors
> > (I haven't changed that value from it's default)  So when the response comes
> > back from the handler and mongrel2 attempts to find the socket to send the
> > response to it's been closed.
> > 
> > So my first question is if my assumption sounds correct?  Has anyone dealt
> > with this before or have tips on optimized setup for mongrel2?  (FYI I'm
> > running on ubuntu server)
> > 
> 
> 
> Hello Rob,
> 
> Don't know if mongrel2 would even be able to hold more sockets that
> the max permitted
> by the OS. I think that when trying to open the next socket beyond the
> max, the OS would return
> an error and the client would probably receive a connection refused
> error. Note that I may be totally wrong here =)
> I'm just guessing what I think would happen.
> 
> Can you measure the amount of requests that are hitting your mongrel2 instance?
> 
> I tested my mongrel2 instance with blitz.io (http://blitz.io), using 
1024 concurrent
> connections for 60 segonds
> and mongrel2 responded very well. Of course this could be nothing
> compared to the load your mongrel2
> is handling. Just an example. =)
> 
> 
> > My second question is about printing timestamps in the logs.  Is there a
> > reason that isn't being done?  Would a patch with that be accepted?
> > 
> 
> 
> This is implemented in the develop branch [1] and will be available on
> the next release.
> 
> > My third question is about the "hot 2560 max" is that a list of handlers?
> 
> According to the code this is just a number. This value is returned by
> SuperPoll_get_max_hot(SupoerPoll *sp), that is
> just a macro that return sp->max_hot.
> 
> >  Can that number be increased?  I didn't find a way to do that in the docs
> > or the code (but again I've just started looking at the code)
> > 
> 
> 
> This value (sp->max_hot) is calculated by:
> 
> max_hot = SuperPoll_get_max_fd(sp) / hot_dividend
> 
> where "hot_dividend" comes for the setting: "superpoll.hot_dividend".
> 
> If you don't have EPOLL enabled, max_hot is just the value of
> SuperPoll_get_max_fd(sp).
> 
> This calculation is done in the function SuperPoll_create()[2] and the
> documentation about hot_dividend
> is in the "3.10 Tweakable Expert Settings" section [3] of the mongrel2 manual.
> 
> Hope it helps, thanks.
> 
> [1] https://github.com/zedshaw/mongrel2/pull/95
> [2] https://github.com/zedshaw/mongrel2/blob/develop/src/superpoll.c#L84
> [3] http://mongrel2.org/static/book-finalch4.html#x6-410003.10
> 
> 
> -- 
> Dalton Barreto
> http://daltonmatos.com
> 
> 

Re: [mongrel2] Error message question

From:
Zed A. Shaw
Date:
2012-02-08 @ 02:57
Hey Rob, quick response, but what that message means is your backend
isn't handling the requests fast enough so they're piling up in
Mongrel2.  It's not a socket error it's a zeromq error really.

Try running more backend handlers to deal with the requests and also
trace how many req/sec your backends can handle.  Most likely you've got
to also look at DNS load balancing among multiple mongrel2's on the
front to share load.  A lot of this is just common "web server with slow
app" problems in all web apps.

On Wed, Jan 18, 2012 at 10:07:12AM -0800, Rob LaRubbio wrote:
> You're probably correct about the max sockets.  At least if I was
> designing an OS I'd probably not accept a new connection when the
> limit is exceeded instead of randomly closing an already open one.
> That seems like the better thing to do. :) 
> 
> Our load wasn't near 1024/sec, we were getting about 150 req/s however
> given the slow response of the handlers I would have expected
> outstanding requests to exceed 1024 (but not the 65k available in the
> REGISTRATIONS array).  When I figure out what is going on if it is
> something that seems like it would be of interest to the group I'll
> post a follow up.  Thanks.

-- 
Zed A. Shaw
http://zedshaw.com/