Mongrel2 corruption

From:
Tang Daogang
Date:
2011-08-18 @ 02:16
hi,

I have been using mongrel2 1.7.5 for some time, and I find it is not very
stable. It sometimes disappears (corrupted?) during our development. I checked
the access.log and error.log, but found nothing about this kind of corruption.

My configuration is:

handler_test = Handler(send_spec='tcp://127.0.0.1:9999',
                send_ident='e884a439-31be-4f74-8050-a93565795b20',
                recv_spec='tcp://127.0.0.1:9998', recv_ident='')

server1 = Server(
    uuid="505417b8-1de4-454f-98b6-07eb9225cca1"
    access_log="/logs/access.log"
    error_log="/logs/error.log"
    chroot="./"
    pid_file="/run/mongrel2.pid"
    default_host="xxxxx"
    name="server1"
    port=80
    hosts=[
        Host(   name="xxxxx",
                matching="xxxxx.com",
                routes={
                    '/': handler_xxxxx,
                    '/favicon.ico': static_xxxxx,
                    '/media/': static_xxxxx
                }
        )
     ]
)
settings = {    "zeromq.threads": 1,
        'limits.content_length': 20971520,
        'upload.temp_store': '/tmp/mongrel2.upload.XXXXXX'
}
servers = [server1]



-- 
Nothing is impossible.

Re: [mongrel2] Mongrel2 corruption

From:
Zed A. Shaw
Date:
2011-08-18 @ 23:10
On Thu, Aug 18, 2011 at 10:16:11AM +0800, Tang Daogang wrote:
> hi,
> 
> I have been using mongrel2 1.7.5 for some time, and I find it is not very
> stable. It sometimes disappears (corrupted?) during our development. I checked
> the access.log and error.log, but found nothing about this kind of corruption.

I've been doing a bunch of code reviews and found a few bad spots where
it could corrupt RAM in certain cases involving routes.  Can you try
the develop branch on GitHub and see if that works better?

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Mongrel2 corruption

From:
Nathan Duran
Date:
2011-08-18 @ 03:11
Pretty sure you're going to need to define what "corruption" means to you 
before anyone can even hazard a guess here. 

Re: [mongrel2] Mongrel2 corruption

From:
Tang Daogang
Date:
2011-08-18 @ 03:39
I'm sorry, what I meant was 'breakdown'. I mistakenly wrote
'corruption'  -_-!!!

On Thu, Aug 18, 2011 at 11:11 AM, Nathan Duran <principal@khiltd.com> wrote:

> Pretty sure you're going to need to define what "corruption" means to you
> before anyone can even hazard a guess here.
>



-- 
Nothing is impossible.

Re: [mongrel2] Mongrel2 corruption

From:
Victor Young
Date:
2011-08-18 @ 03:35
I guess it means "crash"...

2011/8/18 Nathan Duran <principal@khiltd.com>

> Pretty sure you're going to need to define what "corruption" means to you
> before anyone can even hazard a guess here.
>

Re: [mongrel2] Mongrel2 corruption

From:
Josh Simmons
Date:
2011-08-18 @ 03:41
On Thu, Aug 18, 2011 at 1:35 PM, Victor Young <littlehaker@gmail.com> wrote:
> I guess it means "crash"...
>
> 2011/8/18 Nathan Duran <principal@khiltd.com>
>>
>> Pretty sure you're going to need to define what "corruption" means to you
>> before anyone can even hazard a guess here.
>
>

Crash though is not very helpful either, is there any chance you could
run mongrel under gdb and try and get a stack trace when it fails? We
can provide instruction on how to do this if necessary?

Are there any particular events that seem to precipitate the crash?
You say there's nothing in the logs, however it might be useful to see
the activity before the failure at any rate in case there is some
pattern showing?

Really as much information as you can give us is likely to be helpful
in tracking down your issue.

Cheers,
Josh.

Re: [mongrel2] Mongrel2 corruption

From:
Tang Daogang
Date:
2011-08-18 @ 10:33
Thanks for your suggestion. I will run it under gdb.

But before that, I have found another clue on it: when my handler crashed,
mongrel2 crashed at the same time. But not always like this.

I ran into this case once today.


>
> Crash though is not very helpful either, is there any chance you could
> run mongrel under gdb and try and get a stack trace when it fails? We
> can provide instruction on how to do this if necessary?
>
> Are there any particular events that seem to precipitate the crash?
> You say there's nothing in the logs, however it might be useful to see
> the activity before the failure at any rate in case there is some
> pattern showing?
>
> Really as much information as you can give us is likely to be helpful
> in tracking down your issue.
>
> Cheers,
> Josh.
>



-- 
Nothing is impossible.

Re: [mongrel2] Mongrel2 corruption

From:
Zed A. Shaw
Date:
2011-08-18 @ 23:12
On Thu, Aug 18, 2011 at 06:33:58PM +0800, Tang Daogang wrote:
> Thanks for your suggestion. I will run it under gdb.
> 
> But before that, I have found another clue on it: when my handler crashed,
> mongrel2 crashed at the same time. But not always like this.

ZeroMQ! Whenever that's happened to me it's been because I upgraded
zeromq, but didn't update *all* of the other libraries that use it.
Like I updated to 2.1.7 but didn't rebuild the python and lua bindings.

This causes incredibly weird crashes like, if a handler explodes it
takes everything down on other servers.

Also, do *not* use 2.1.6, it caused this same error over and over.
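
One way to look for the kind of mismatch described above is to compare the
libzmq version the system loader finds with the version a binding reports.
This is only a minimal sketch, not part of Mongrel2: the helper names
(system_libzmq_version, binding_libzmq_version, diagnose) are made up for
illustration, and it assumes the Python binding in use is pyzmq. It flags an
obvious version difference or the 2.1.6 release mentioned above; it cannot
prove that a binding was actually rebuilt.

```python
import ctypes
import ctypes.util

# The release this thread warns against.
KNOWN_BAD = {(2, 1, 6)}


def system_libzmq_version():
    """Return (major, minor, patch) of the libzmq the loader finds, or None."""
    path = ctypes.util.find_library("zmq")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # void zmq_version(int *major, int *minor, int *patch) is libzmq C API.
    lib.zmq_version.restype = None
    major, minor, patch = ctypes.c_int(), ctypes.c_int(), ctypes.c_int()
    lib.zmq_version(ctypes.byref(major), ctypes.byref(minor), ctypes.byref(patch))
    return (major.value, minor.value, patch.value)


def binding_libzmq_version():
    """Return the libzmq version pyzmq reports, or None if pyzmq is missing."""
    try:
        import zmq
    except ImportError:
        return None
    return tuple(int(part) for part in zmq.zmq_version().split(".")[:3])


def diagnose(system_version, binding_version):
    """Return a list of warnings; an empty list means nothing looks wrong."""
    warnings = []
    for label, version in (("system", system_version),
                           ("binding", binding_version)):
        if version in KNOWN_BAD:
            text = ".".join(str(n) for n in version)
            warnings.append("%s libzmq %s is a release to avoid" % (label, text))
    if system_version and binding_version and system_version != binding_version:
        warnings.append("system and binding libzmq versions differ: %r vs %r"
                        % (system_version, binding_version))
    return warnings
```

Calling diagnose(system_libzmq_version(), binding_libzmq_version()) on the
affected machine would give a quick first hint. Note that some pyzmq packages
ship their own copy of libzmq, so a difference is a reason to look closer
rather than proof of a problem.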

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Mongrel2 corruption

From:
Tang Daogang
Date:
2011-08-19 @ 03:42
Hi Shaw, my ZeroMQ version is 2.1.7. Let me see what to test next.

Can you tell me how to run mongrel2 using gdb?

On Fri, Aug 19, 2011 at 7:12 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:

> On Thu, Aug 18, 2011 at 06:33:58PM +0800, Tang Daogang wrote:
> > Thanks for your suggestion. I will run it under gdb.
> >
> > But before that, I have found another clue on it: when my handler crashed,
> > mongrel2 crashed at the same time. But not always like this.
>
> ZeroMQ! Whenever that's happened to me it's been because I upgraded
> zeromq, but didn't update *all* of the other libraries that use it.
> Like I updated to 2.1.7 but didn't rebuild the python and lua bindings.
>
> This causes incredibly weird crashes like, if a handler explodes it
> takes everything down on other servers.
>
> Also, do *not* use 2.1.6, it caused this same error over and over.
>
> --
> Zed A. Shaw
> http://zedshaw.com/
>



-- 
Nothing is impossible.

Re: [mongrel2] Mongrel2 corruption

From:
Zed A. Shaw
Date:
2011-08-19 @ 21:10
On Fri, Aug 19, 2011 at 11:42:42AM +0800, Tang Daogang wrote:
> Hi Shaw, my ZeroMQ version is 2.1.7. Let me see what to test next.

So, make sure you reinstall all the Lua and Python ZMQ libraries,
probably by removing them entirely and then installing them again to
make sure.

> Can you tell me how to run mongrel2 using gdb?

Easiest is to start mongrel2 like normal, then attach to the process:

http://ftp.gnu.org/old-gnu/Manuals/gdb-5.1.1/html_node/gdb_22.html

Another option, which may be better, is to tell Linux to dump a core
file:

http://stackoverflow.com/questions/17965/generate-a-core-dump-in-linux

That way, when it dies, you won't block inside gdb.  Instead you wait
for it to crash, then open the core file in gdb and do the backtrace:

gdb /usr/local/bin/mongrel2 corefile

Then inside gdb do:

backtrace

And send me that.
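
The same two steps can be scripted so the backtrace lands in a text you can
paste into an email. This is a minimal sketch, assuming gdb is on the PATH;
the helper names (backtrace_command, collect_backtrace) are made up for
illustration, and the binary and core file paths are the ones used above.

```python
import subprocess


def backtrace_command(binary, corefile):
    """Build a gdb command line that prints a backtrace and then exits."""
    # -batch makes gdb exit after running its commands; -ex runs one command.
    return ["gdb", "-batch", "-ex", "backtrace", binary, corefile]


def collect_backtrace(binary, corefile):
    """Run gdb on the core file and return everything it printed as text."""
    result = subprocess.run(
        backtrace_command(binary, corefile),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

For example, collect_backtrace("/usr/local/bin/mongrel2", "corefile") would
return the output of the backtrace command shown above.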

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Mongrel2 corruption

From:
Tang Daogang
Date:
2011-08-20 @ 04:23
Will "ulimit -c unlimited" generate a core dump file, or should I use gcore?

On Sat, Aug 20, 2011 at 5:10 AM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:

> On Fri, Aug 19, 2011 at 11:42:42AM +0800, Tang Daogang wrote:
> > Hi Shaw, my ZeroMQ version is 2.1.7. Let me see what to test next.
>
> So, make sure you reinstall all the Lua and Python ZMQ libraries,
> probably by removing them entirely and then installing them again to
> make sure.
>
> > Can you tell me how to run mongrel2 using gdb?
>
> Easiest is to start mongrel2 like normal, then attach to the process:
>
> http://ftp.gnu.org/old-gnu/Manuals/gdb-5.1.1/html_node/gdb_22.html
>
> Another option, which may be better, is to tell Linux to dump a core
> file:
>
> http://stackoverflow.com/questions/17965/generate-a-core-dump-in-linux
>
> That way, when it dies, you won't block inside gdb.  Instead you wait
> for it to crash, then open the core file in gdb and do the backtrace:
>
> gdb /usr/local/bin/mongrel2 corefile
>
> Then inside gdb do:
>
> backtrace
>
> And send me that.
>
> --
> Zed A. Shaw
> http://zedshaw.com/
>



-- 
Nothing is impossible.

Re: [mongrel2] Mongrel2 corruption

From:
Zed A. Shaw
Date:
2011-08-20 @ 14:02
On Sat, Aug 20, 2011 at 12:23:39PM +0800, Tang Daogang wrote:
> Will "ulimit -c unlimited" generate a core dump file, or should I use gcore?

Should indicate that it can generate core files.  Be careful though,
because you can generate enough to fill your disk if you're not careful.
I'd set up monit to watch the mongrel2 process and email you whenever it
dies so you can go clean it out.
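
If mongrel2 is started from a Python wrapper, the limit can also be raised
from there, and the disk usage of old core files checked at the same time.
This is a rough sketch under assumptions: the helper names are made up, and
the "core*" file pattern is only a guess, since the real core file name
depends on the kernel's core_pattern setting.

```python
import glob
import os
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit for this process.

    Child processes started afterwards inherit the new limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return hard  # resource.RLIM_INFINITY means "unlimited"


def core_files_usage(directory, pattern="core*"):
    """Return the total size in bytes of files matching pattern in directory."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sum(os.path.getsize(p) for p in paths if os.path.isfile(p))
```

A cron job or monit check could call core_files_usage on the directory where
cores are written and warn when the total passes a threshold you choose.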

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Mongrel2 corruption

From:
Tang Daogang
Date:
2011-08-22 @ 08:28
Hi, I found something new.

When our handler crashes (an error in non-protected mode), the console prints
this:

    Bad file descriptor
    rc != -1 (epoll.cpp:110)

The version of mongrel2 is v1.7.3; this occurred on my colleague's
machine. I will update his environment to the v1.7.5 development branch later.


On Sat, Aug 20, 2011 at 10:02 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:

> On Sat, Aug 20, 2011 at 12:23:39PM +0800, Tang Daogang wrote:
> > Will "ulimit -c unlimited" generate a core dump file, or should I use gcore?
>
> Should indicate that it can generate core files.  Be careful though,
> because you can generate enough to fill your disk if you're not careful.
> I'd set up monit to watch the mongrel2 process and email you whenever it
> dies so you can go clean it out.
>
> --
> Zed A. Shaw
> http://zedshaw.com/
>



-- 
Nothing is impossible.

Re: [mongrel2] Mongrel2 corruption

From:
Zed A. Shaw
Date:
2011-08-22 @ 16:36
On Mon, Aug 22, 2011 at 04:28:32PM +0800, Tang Daogang wrote:
> Hi, I found something new.
> 
> When our handler crashes (an error in non-protected mode), the console prints
> this:
> 
>     Bad file descriptor
>     rc != -1 (epoll.cpp:110)
> 
> The version of mongrel2 is v1.7.3; this occurred on my colleague's
> machine. I will update his environment to the v1.7.5 development branch later.

Ok, try the development version, as there's been some fixing.  This bug
is probably related to closing a client connection.  See if I'm right
about that.

-- 
Zed A. Shaw
http://zedshaw.com/

Re: [mongrel2] Mongrel2 corruption

From:
Dalton Barreto
Date:
2011-08-18 @ 15:14
2011/8/18 Tang Daogang <daogangtang@gmail.com>:
> Thanks for your suggestion. I will run it under gdb.
>
> But before that, I have found another clue on it: when my handler crashed,
> mongrel2 crashed at the same time. But not always like this.
>
> I ran into this case once today.

Are you using more than one handler with the *same* ident when
connecting to the zmq endpoint?

Would it be possible to provide the relevant code of your handler that
makes the connection to the zmq endpoints?
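
To make the question concrete, here is roughly what such connection code
tends to look like. This is a minimal sketch, assuming a Python handler that
uses pyzmq and follows the usual pattern of a PULL socket for requests and a
PUB socket for responses; the helper names (new_ident, connect_handler) are
made up for illustration and are not from your handler. The point is that
each handler process gets its own ident instead of sharing one.

```python
import uuid


def new_ident():
    """Return a fresh identity string, so every handler process differs."""
    return str(uuid.uuid4())


def connect_handler(send_spec, recv_spec, ident):
    """Connect one handler and return (context, requests, responses)."""
    import zmq  # imported here so new_ident() works without pyzmq installed

    ctx = zmq.Context()
    # Mongrel2 pushes requests out on its send_spec; the handler pulls them.
    requests = ctx.socket(zmq.PULL)
    requests.connect(send_spec)
    # Mongrel2 subscribes on its recv_spec; the handler publishes replies.
    responses = ctx.socket(zmq.PUB)
    responses.setsockopt(zmq.IDENTITY, ident.encode("ascii"))
    responses.connect(recv_spec)
    return ctx, requests, responses
```

With the addresses from the posted config, a handler would call
connect_handler("tcp://127.0.0.1:9999", "tcp://127.0.0.1:9998", new_ident()).
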

Not so long ago we had a thread here [1] that touched on this subject,
specifically this message by Ryan Kelly [2]. Maybe this is the same
problem you are facing.

Note: I'm having some trouble reading the archives, but the message at
[2] is the most important part and I could browse it. I don't know if it is
a local problem or one at librelist.com.

[1] http://librelist.com/browser//mongrel2/2011/1/24/how-to-safely-shut-down-a-handler/
[2] http://librelist.com/browser//mongrel2/2011/1/24/how-to-safely-shut-down-a-handler/#389f1ab7c2a9cc5e09fd1c40eee55f82

-- 
Dalton Barreto
http://daltonmatos.wordpress.com
http://wsgid.com

Re: Mongrel2 corruption

From:
Tang Daogang
Date:
2011-08-18 @ 02:18
Also, the interval between these failures is about 6 hours to 3 days.

On Thu, Aug 18, 2011 at 10:16 AM, Tang Daogang <daogangtang@gmail.com> wrote:

> hi,
>
> I have been using mongrel2 1.7.5 for some time, and I find it is not very
> stable. It sometimes disappears (corrupted?) during our development. I checked
> the access.log and error.log, but found nothing about this kind of corruption.
>
> My configuration is:
>
> handler_test = Handler(send_spec='tcp://127.0.0.1:9999',
>                 send_ident='e884a439-31be-4f74-8050-a93565795b20',
>                 recv_spec='tcp://127.0.0.1:9998', recv_ident='')
>
> server1 = Server(
>     uuid="505417b8-1de4-454f-98b6-07eb9225cca1"
>     access_log="/logs/access.log"
>     error_log="/logs/error.log"
>     chroot="./"
>     pid_file="/run/mongrel2.pid"
>     default_host="xxxxx"
>     name="server1"
>     port=80
>     hosts=[
>         Host(   name="xxxxx",
>                 matching="xxxxx.com",
>                 routes={
>                     '/': handler_xxxxx,
>                     '/favicon.ico': static_xxxxx,
>                     '/media/': static_xxxxx
>                 }
>         )
>      ]
> )
> settings = {    "zeromq.threads": 1,
>         'limits.content_length': 20971520,
>         'upload.temp_store': '/tmp/mongrel2.upload.XXXXXX'
> }
> servers = [server1]
>
>
>
> --
> Nothing is impossible.
>
>


-- 
Nothing is impossible.