librelist archives

« back to archive

Invalid transaction persisting across requests

Invalid transaction persisting across requests

From:
Matt Phipps
Date:
2014-04-28 @ 15:27
I've also posted this at Stack Overflow (

http://stackoverflow.com/questions/23301968/invalid-transaction-persisting-across-requests)
but it seems like a problem that would need Flask specialist knowledge.
This is a weird one, guys...

Summary
=======
One of our threads in production hit an error and is now producing
`InvalidRequestError: This session is in 'prepared' state; no further SQL
can be emitted within this transaction.` errors, on every request with a
query that it serves, for the rest of its life! It's been doing this for
*days*, now! How is this possible, and how can we prevent it going forward?

Background
==========
We are using a Flask app on uWSGI (4 processes, 2 threads), with
Flask-SQLAlchemy providing us DB connections to SQL Server.

The problem seemed to start when one of our threads in production was
tearing down its request, inside this Flask-SQLAlchemy method:

    @teardown
    def shutdown_session(response_or_exc):
        if app.config['SQLALCHEMY_COMMIT_ON_TEARDOWN']:
            if response_or_exc is None:
                self.session.commit()
        self.session.remove()
        return response_or_exc

...and somehow managed to call `self.session.commit()` when the transaction
was invalid. This resulted in `sqlalchemy.exc.InvalidRequestError: Can't
reconnect until invalid transaction is rolled back` getting output to
stdout, in defiance of our logging configuration, which makes sense since
it happened during the app context tearing down, which is never supposed to
raise exceptions. I'm not sure *how* the transaction got to be invalid
without `response_or_exec` getting set, but that's actually the lesser
problem AFAIK.

The bigger problem is, that's when the "'prepared' state" errors started,
and haven't stopped since. Every time this thread serves a request that
hits the DB, it 500s. Every other thread seems to be fine: as far as I can
tell, even the thread that's in the same process is doing OK.

Wild guess
==========
The SQLAlchemy mailing list has an entry about the "'prepared' state" error
saying it happens if a session started committing and hasn't finished yet,
and something else tries to use it. My guess is that the session in this
thread never got to the `self.session.remove()` step, and now it never will.

I still feel like that doesn't explain how this session is persisting
*across requests* though. We haven't modified Flask-SQLAlchemy's use of
request-scoped sessions, so the session should get returned to SQLAlchemy's
pool and rolled back at the end of the request, even the ones that are
erroring (though admittedly, probably not the first one, since that raised
during the app context tearing down). Why are the rollbacks not happening?
I could understand it if we were seeing the "invalid transaction" errors on
stdout (in uwsgi's log) every time, but we're not: I only saw it once, the
first time. But I see the "'prepared' state" error (in our app's log) every
time the 500s occur.

Configuration details
=====================
We've turned off `expire_on_commit` in the `session_options`, and we've
turned on `SQLALCHEMY_COMMIT_ON_TEARDOWN`. We're only reading from the
database, not writing yet. We're also using Dogpile-Cache for all of our
queries (using the memcached lock since we have multiple processes, and
actually, 2 load-balanced servers). The cache expires every minute for our
major query.

Resolution steps
============
Restarting the server seems to have fixed the problem, which isn't entirely
surprising. That said, I expect to see it again until we figure out how to
stop it. Someone suggested writing our own teardown callback with exception
handling around the commit, but I feel like the bigger problem is that the
thread was messed up for the rest of its life. The fact that this *didn't*
go away after a request or two really makes me nervous!

Thanks,
Matt Phipps