flask and multithreading

From:
John Fries
Date:
2011-11-03 @ 08:16
I'm running a very basic Flask setup with Apache as my webserver. I
have a python list that lives at the global level (aka at the same
nesting level as my function declarations). Looks like this:

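from flask import Flask

app = Flask(__name__)
mylist = []  # module-level, so shared by every thread in this process

@app.route('/append_to_mylist')
def append_to_mylist():
    mylist.append(1)
    return str(mylist)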


So as http://www.mysite.com/append_to_mylist is invoked, it should
return a growing list of ones (unless the server is restarted, in
which case I am back to the beginning).

I'm trying to understand if I'm inadvertently screwing myself in the
case where my Flask setup is multi-threaded and mylist could somehow
be accessed by more than one thread, potentially corrupting it. Is
there some other construct I should be using to handle that situation?
I know I could stick mylist in memcached, but that seems to be
overkill for my needs.

I'm not even sure how to tell if my configuration is multi-threaded.
The only part that mentioned threads during configuration was in my
httpd.conf file:
   WSGIDaemonProcess jomit user=ubuntu group=ubuntu threads=5

This seems to indicate that Apache might start as many as 5 WSGI
threads, so I guess my code is actually multi-threaded?

Thanks and apologies for the n00b questions,
John

Re: [flask] flask and multithreading

From:
Simon Sapin
Date:
2011-11-03 @ 13:06
On 03/11/2011 09:16, John Fries wrote:
> I'm trying to understand if I'm inadvertently screwing myself in the
> case where my Flask setup is multi-threaded and mylist could somehow
> be accessed by more than one thread, potentially corrupting it. Is
> there some other construct I should be using to handle that situation?
> I know I could stick mylist in memcached, but that seems to be
> overkill for my needs.
>
> I'm not even sure how to tell if my configuration is multi-threaded.
> The only part that mentioned threads during configuration was in my
> httpd.conf file:
>     WSGIDaemonProcess jomit user=ubuntu group=ubuntu threads=5

Hi,

In short: don’t do that. Use some kind of shared data store to keep data 
across requests and clients.

This configuration does use multiple threads, but the Python GIL (global
interpreter lock) makes sure that e.g. a list will never get corrupted.
However, more complex code may fail in subtle ways. For example,
some_global += 1 is actually three operations: read, increment and
write. Each of these is atomic, but the thread may be interrupted
in between. If another thread changes the value between a read and the
matching write, you get incorrect results. So in general you should use
locks to protect the relevant code areas:

http://docs.python.org/library/threading.html#lock-objects
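
A minimal sketch of the locking approach described above. The names SharedCounter and run_workers are hypothetical and only serve the illustration; the point is that the read, increment and write happen inside one `with` block, so no other thread can slip in between them.

```python
import threading


class SharedCounter:
    """Counter whose increment is a single locked read-modify-write."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:          # only one thread at a time runs this block
            self.value += 1       # read, add and write back under the lock


def run_workers(counter, n_threads=5, times=10000):
    """Call counter.increment() `times` times from each of n_threads threads."""
    def work():
        for _ in range(times):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the final value should equal n_threads * times; without it, some increments can be lost.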

However, using global state is considered a bad idea anyway. This seems
not to be the case with the configuration you pasted, but if your server
has multiple processes, they will each have their own version of the
list. Each process does not know about the others.

Regards,
-- 
Simon Sapin

Re: [flask] flask and multithreading

From:
Matthew Frazier
Date:
2011-11-03 @ 13:17
On Nov 3, 2011, at 9:06 , Simon Sapin wrote:

> On 03/11/2011 09:16, John Fries wrote:
>> I'm trying to understand if I'm inadvertently screwing myself in the
>> case where my Flask setup is multi-threaded and mylist could somehow
>> be accessed by more than one thread, potentially corrupting it. Is
>> there some other construct I should be using to handle that situation?
>> I know I could stick mylist in memcached, but that seems to be
>> overkill for my needs.
> 
> Hi,
> 
> In short: don’t do that. Use some kind of shared data store to keep data 
> across requests and clients.

I would recommend Redis (http://redis.io/, use the redis-py library at 
https://github.com/andymccurdy/redis-py) for this. It's easy to set up, 
and has really fast operations on lists, sets, counters, etc. etc. 
Persistent too.
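
As a rough sketch of what this could look like with redis-py, assuming a Redis server on localhost at the default port 6379 and a hypothetical key name "mylist". The helper names connect and append_and_read are made up for illustration; rpush and lrange are the redis-py methods for the RPUSH and LRANGE commands.

```python
def connect(host="localhost", port=6379):
    """Return a redis-py client; host and port are assumed defaults."""
    import redis  # imported lazily so append_and_read has no hard dependency
    return redis.Redis(host=host, port=port)


def append_and_read(client, key="mylist"):
    """Append a 1 to the Redis list at `key` and return the whole list."""
    client.rpush(key, 1)              # RPUSH runs atomically on the server
    return client.lrange(key, 0, -1)  # LRANGE 0 -1 returns every element
```

Because the list lives in the Redis server, every thread and every process of the app sees the same data. Note that redis-py returns list elements as bytes unless the client is configured to decode responses.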

Thanks,
Matthew Frazier
http://leafstorm.us/

Re: [flask] flask and multithreading

From:
Joe Esposito
Date:
2011-11-03 @ 14:46
+1 for Redis. It has great list support and is surprisingly easy to start
using.

Re: [flask] flask and multithreading

From:
Cheng-Han Lee
Date:
2011-11-03 @ 20:10
Redis is a great solution.

You can also use mongodb, as it has atomic operations.
http://www.mongodb.org/display/DOCS/Atomic+Operations

-Jonathan

Re: [flask] flask and multithreading

From:
John Fries
Date:
2011-11-04 @ 19:21
I'm less concerned about inconsistency between processes/threads than
I am about the global list just getting flat-out corrupted (although
Simon says that the GIL will protect me from that).

I understand that redis or mongodb as an off-process atomic cache is a
natural solution for this problem. However, my concern is performance.
It seems that even an ideal atomic store is going to take at least
100ms round-trip, so it seems inefficient to cache data there without
first checking some smaller in-process cache. Does anyone see a flaw
in my reasoning in the case where eventual consistency between
processes is acceptable? It seems surprising to me that this is not a
more common pattern.
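
The pattern described above (check a small in-process cache first, fall back to the external store) could be sketched roughly as follows. This is an illustration under stated assumptions, not a recommendation: LocalTTLCache, the loader callable and the ttl value are all hypothetical, and each process would hold its own copy, so processes only agree once entries expire.

```python
import threading
import time


class LocalTTLCache:
    """Small in-process cache that re-fetches a key after `ttl` seconds."""

    def __init__(self, loader, ttl=1.0, clock=time.monotonic):
        self._loader = loader      # hypothetical callable: key -> value from the external store
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and entry[0] > now:
                return entry[1]    # still fresh: no round trip needed
        value = self._loader(key)  # slow path, done outside the lock
        with self._lock:
            self._entries[key] = (now + self._ttl, value)
        return value
```

Two threads that miss at the same moment will both call the loader; that is harmless here but means the loader should be safe to call concurrently.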

Re: [flask] flask and multithreading

From:
Luca Lesinigo
Date:
2011-11-04 @ 22:23
On 04 Nov 2011, at 20:21, John Fries wrote:
> It seems that even an ideal atomic store is going to take at least 100ms
> round-trip,
Where does that number come from?

> so it seems inefficient to cache data there without first checking some
> smaller in-process cache.
I'd guess (no scientific reasoning here, you've been warned) that 
interacting with a well written daemon on the same system will be faster 
than waiting on a poorly written atomic store in the same process.

Anyway, just in case you haven't noticed, Apache and mod_wsgi themselves
are using multiple processes and they pass around requests - so it mustn't
be that slow. Your HTTP request will probably come in to an Apache
process, which passes the request to another Apache process (the prefork
MPM child), which passes the request to another process (the mod_wsgi
DaemonProcess you mentioned in your first post).

> I'm not even sure how to tell if my configuration is multi-threaded. The
> only part that mentioned threads during configuration was in my httpd.conf
> file:
>    WSGIDaemonProcess jomit user=ubuntu group=ubuntu threads=5
Your code is indeed multithreaded, as you noted. And you have no choice
about it, because if you switch to the multiprocess model (setting the
'processes' parameter of the WSGIDaemonProcess directive) your code would
break: different processes would have different sets of global data
(i.e., no shared lists or anything between them), so different HTTP
requests would be answered by processes of your app with different
internal state.

Your only choices are:
1- use an external store as already suggested
2- strictly stick to the single-process mod_wsgi mode and implement a good
multithreaded store (maybe look for an existing implementation, there
must be something somewhere).
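
For option 2, a very small in-process store might look like the sketch below. LockedList is a hypothetical name, and this assumes a single mod_wsgi process; with more than one process each would still get its own independent copy.

```python
import threading


class LockedList:
    """List wrapper whose operations each run under one lock."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def append(self, item):
        with self._lock:
            self._items.append(item)

    def append_and_snapshot(self, item):
        """Append and copy in one critical section, so the returned copy
        includes `item` and cannot change while the caller formats it."""
        with self._lock:
            self._items.append(item)
            return list(self._items)

    def snapshot(self):
        """Return a copy of the current contents."""
        with self._lock:
            return list(self._items)
```

A view function could then return str(store.append_and_snapshot(1)), where store is a module-level LockedList instance.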

--
Luca Lesinigo

Re: [flask] flask and multithreading

From:
John Fries
Date:
2011-11-04 @ 23:09
hmm, your feedback is sinking in now


http://www.quora.com/What-are-the-numbers-that-every-computer-engineer-should-know-according-to-Jeff-Dean

I'm used to dealing with key-value stores that are across datacenters,
so mentally I budget about 100 to 150ms for these. But looking at it
more closely, I realize that most of this is round-trip time, and
within a datacenter we could probably get something close to 1ms
latency (half a ms for round-trip time between machines, plus however
much time redis needs to do its thing).

Is 1ms latency in line with what people are seeing on their redis or
memcached installations? If so, I don't think I will miss having an
in-process cache, if it ends up causing me more concurrency headaches.

Re: [flask] flask and multithreading

From:
Cheng-Han Lee
Date:
2011-11-04 @ 20:38
If you are set on using a native python list, you need to make sure the
critical sections of your code are wrapped in synchronization
primitives (e.g. semaphores or mutexes), so those sections are guaranteed
to execute atomically.

But you'll run into a whole new set of issues when writing synchronized
code.  You need to account for issues such as deadlocks, thread starvation,
and such. Not to mention, debugging multi-threaded code can be a painful
experience.

Hope this helps.

Re: [flask] flask and multithreading

From:
Simon Sapin
Date:
2011-11-04 @ 19:35
On 04/11/2011 20:21, John Fries wrote:
> However, my concern is performance.

Measure, measure, measure.
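
One way to do that, sketched with only the standard library. The helper name measure_latency_ms is hypothetical; `operation` is any zero-argument callable, for example a single round trip to whatever store is being evaluated.

```python
import statistics
import time


def measure_latency_ms(operation, repeats=1000):
    """Call operation() `repeats` times; return (median, worst) latency in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)
```

For instance, assuming `client` is a connected redis-py client, measure_latency_ms(lambda: client.get("some_key")) would time one GET round trip per sample. Running it from the same machine the app runs on gives numbers that reflect the real network path.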