Problem with large numbers of hot sockets on Linux?
From: Amos, Matt
Date: 2011-10-11 @ 13:02
Hi,
We've run into a little problem trying to deploy Mongrel2. It's not clear
to me whether this is a configuration problem or an out-and-out bug and
we'd appreciate any help in figuring this out.
What seems to happen (with my limited knowledge of Mongrel2 internals) is
that the number of hot connections exceeds the defined hot limit. That in
itself is fine, but sometimes it happens while the handler socket is idle,
and then the handler socket can't be brought back into the hot set. This
appears to cause a loop, with the following logged over and over,
apparently indefinitely:
[ERROR] (src/task/fd.c:229: errno: None) Error adding fd: -1 or socket: 0x7fb9fc021ea0 to task wait list.
[ERROR] (src/handler.c:180: errno: None) Receive on handler socket failed.
[ERROR] (src/superpoll.c:140: errno: Resource temporarily unavailable) Too many handler requests outstanding, your handler isn't running: 256 is greater than hot 256 max.
Although the message suggests that the handler isn't running, it is; it's
blocking on recv - apparently waiting for the next request. The server
continues to loop, long after the driving requests have stopped.
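To tell a sustained loop from a short burst, it can help to count how
often each of these messages recurs in the error log. A minimal sketch,
assuming the log lines have the format quoted above; the
count_loop_errors helper and the sample array are hypothetical and not
part of Mongrel2 or m2r:

```ruby
# Hypothetical helper (not part of Mongrel2 or m2r): tally the three
# messages that make up the loop quoted above.
LOOP_PATTERNS = {
  :wait_list   => /Error adding fd: -?\d+ or socket:/,
  :recv_failed => /Receive on handler socket failed/,
  :hot_limit   => /Too many handler requests outstanding/
}

# Accepts anything that responds to #each and yields log lines.
def count_loop_errors(lines)
  counts = Hash.new(0)
  lines.each do |line|
    LOOP_PATTERNS.each do |name, pattern|
      counts[name] += 1 if line =~ pattern
    end
  end
  counts
end

# Sample input: the three log entries from this message, one per line.
sample = [
  "[ERROR] (src/task/fd.c:229: errno: None) Error adding fd: -1 or socket: 0x7fb9fc021ea0 to task wait list.",
  "[ERROR] (src/handler.c:180: errno: None) Receive on handler socket failed.",
  "[ERROR] (src/superpoll.c:140: errno: Resource temporarily unavailable) Too many handler requests outstanding, your handler isn't running: 256 is greater than hot 256 max."
]

counts = count_loop_errors(sample)
counts.each { |name, n| puts "#{name}: #{n}" }
```

Against a real log, the same helper could be fed File.foreach(path),
where path is whatever error_log resolves to on the machine in question.
Counts that keep growing after httperf has finished would match the
behaviour described here.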
The immediate reaction is to increase the superpoll.max_fd setting, but
this only postpones the problem to a larger number of connections. It's
particularly bad when running behind a load-balancer: a spike on one host
may cause it to loop and be taken out of the rotation, and the failure may
then cascade across all the servers.
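The 256 in the log is consistent with the hot limit being a fixed
fraction of superpoll.max_fd. A small sketch of that arithmetic, assuming
(not verified against the 1.7.5 source here) that Mongrel2 computes the
hot limit as superpoll.max_fd divided by a superpoll.hot_dividend setting
that defaults to 4; hot_limit is a hypothetical helper name:

```ruby
# Assumption: hot limit = superpoll.max_fd / superpoll.hot_dividend,
# using integer division, with hot_dividend defaulting to 4.
def hot_limit(max_fd, hot_dividend = 4)
  max_fd / hot_dividend
end

# 1024 is the max_fd value from the mongrel2.conf in this message;
# the larger values are only illustrative.
[1024, 4096, 10240].each do |max_fd|
  puts "max_fd=#{max_fd} -> hot limit #{hot_limit(max_fd)}"
end
```

If that assumption holds, raising max_fd scales the hot limit linearly,
which matches the observation that a larger setting only moves the
threshold rather than removing it.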
It's not obvious to me where the problem lies, and we'd greatly appreciate
any help in figuring this out.
Many thanks,
Matt
--- Test case ---
I've tried to make a minimal test-case for this (based on the m2r
example), which can be triggered on my machine by running the following:
# mkdir run logs
# m2sh load
# m2sh start -every
# ruby handler.rb
# httperf --hog --wsess=750,5,5 --rate 750 --server localhost --port 8080
I'm not sure if the 750 figure above is load-dependent, so it might need
adjustment for different machines. The versions of relevant software I'm
running are:
Linux 2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12 21:17:25 UTC 2011 x86_64
ruby 1.9.2p136
ffi-rzmq (0.8.2)
m2r (0.0.3)
Mongrel2/1.7.5
zeromq-2.1.10
And the config / handler files are:
--- mongrel2.conf ---
default_handler = Handler(send_spec='ipc://run/mongrel_send',
                          send_ident='MONGREL2',
                          recv_spec='ipc://run/mongrel_recv',
                          recv_ident='')

server_localhost = Server(
    uuid="e6145e04-a663-4b1e-907c-417a7f4c5487",
    access_log="/logs/mongrel2_localhost_access_8002.log",
    error_log="/logs/mongrel2_localhost_error_8002.log",
    chroot='.',
    default_host="(.+)",
    name="localhost",
    pid_file="/run/mongrel2.pid",
    port=8080,
    hosts = [
        Host(name="(.+)", routes={
            '/' : default_handler
        })
    ]
)

settings = {"zeromq.threads": 1, "superpoll.max_fd": 1024}

servers = [server_localhost]
--- handler.rb ---
require 'rubygems'
require 'json'   # JSON.generate is used below
require 'm2r'

class Http0MQHandler < Mongrel2::Handler
  def on_disconnect
    puts "DISCONNECT"
  end

  # Echo the request details back as a preformatted HTML block.
  def process(req)
    response = "<pre>\nSENDER: %s\nIDENT:%s\nPATH:%s\nHEADERS:%s\nBODY:%s</pre>" % [
      req.sender.inspect, req.conn_id.inspect, req.path.inspect,
      JSON.generate(req.headers).inspect, req.body.inspect]
    #puts response
    response
  end
end

sender_id = "C2256F34-14A1-45DD-BB73-97CAE25E25B4"
handler = Http0MQHandler.new(sender_id, "ipc://run/mongrel_send",
                             "ipc://run/mongrel_recv")
handler.listen