librelist archives

Threading/Tasking with Flask

From:
Jan Riechers
Date:
2012-08-19 @ 09:01
Hello,

I posted this message already on google groups, but Steven pointed me 
here (thx again for the hints and tips in your previous post on the 
google groups).
--------------------------------------------------

I am currently developing a quite big web application using Flask, but 
have concerns about the tasking in general.

At present the setup looks like the following:
Webserver: nginx
Routing: werkzeug
Templating: jinja2 (what else?)
Database: mongoDB
Python-Interpreter: pypy 1.9
wsgi-middleware: none at present, later uWSGI

My question arises from the fact that I can't make use of
greenlet/eventlet under pypy, and I'm unsure whether I can keep the
application available for web users with only the above setup.

In the meantime I read that pypy implements greenlet as a spin-off of
Stackless Python, but I haven't found a way to use this in the
application.

At present the application runs without any kind of threading, but this
will of course be required.

I have all application logic done: each route (called a handle)
communicates with the mongoDB backend, aggregates the data, processes
the results and then returns them to the user.

The question is: how can I make this logic multi-threaded or, better,
how can I serve more than one request at a time without blocking the
interpreter while a database query is being processed to provide the
information needed for display?
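
For illustration only, and not part of the setup described above: a
minimal sketch of serving several requests at once with nothing but the
Python standard library. It relies on the fact that a Flask application
object is a WSGI callable, so any threaded WSGI server can host it. The
names ThreadingWSGIServer, QuietHandler, slow_app and
make_threaded_server are hypothetical, and slow_app merely stands in for
a route that waits on the database. A production deployment would more
likely rely on the worker processes/threads of a server such as uWSGI.

```python
import time
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """WSGI server that handles every incoming request in its own thread."""
    daemon_threads = True  # do not keep the process alive for open requests


class QuietHandler(WSGIRequestHandler):
    """Request handler that suppresses the per-request log line."""

    def log_message(self, format, *args):
        pass


def slow_app(environ, start_response):
    """Hypothetical stand-in for a route that blocks on a database query."""
    time.sleep(0.2)
    body = b"done\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def make_threaded_server(app, host="127.0.0.1", port=0):
    """Build a threaded server for any WSGI app (port 0 picks a free port)."""
    return make_server(host, port, app,
                       server_class=ThreadingWSGIServer,
                       handler_class=QuietHandler)
```

Usage would be along the lines of
make_threaded_server(slow_app, port=8000).serve_forever(); a Flask app
object could be passed in place of slow_app. With this server a slow
request no longer delays the next one, although CPU-heavy work still
competes for the interpreter.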

I also found out about several message brokers, like Celery (Steven
also recommended the information quoted below), as I have some i/o
which locks up my system. At present I develop on Windows 7, but I want
to switch to a Linux system for production because of non-blocking file
input/output.

QUOTE from Steven -------------------------------------------------
Celery is one system that does this, but there are others. I have used
pyres in the past with flask and really liked how it worked:

https://github.com/binarydud/pyres/

This looked interesting as well: http://python-rq.org/

There's also this flask snippet: http://flask.pocoo.org/snippets/73/
-------------------------------------------------


Celery does support pypy (which I have chosen for speed reasons) and
mongoDB as a message broker/task scheduler database. pyres does too,
but it only works with Redis at present. If possible I would like to
avoid using another database in the system, as I have quite good
knowledge of mongoDB and integrating another DB might increase the
difficulty of implementing the overall system.

On the one hand, I might need a messaging system, as I have 2-3 tasks
which would run better scheduled, but perhaps this can also be achieved
using only the pypy implementation of greenlet?
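
To make the idea of a task queue concrete, here is a rough in-process
sketch using only the standard library. It is not how Celery, pyres or
RQ are implemented; those keep their jobs in an external store such as
mongoDB or Redis and run the workers in separate processes. The names
task_queue, results, enqueue and start_worker are hypothetical.

```python
import queue
import threading

task_queue = queue.Queue()  # pending jobs, filled by the request handlers
results = {}                # job id -> return value (or the raised exception)


def worker():
    """Run queued jobs one after another until a None sentinel arrives."""
    while True:
        job = task_queue.get()
        if job is None:
            task_queue.task_done()
            break
        job_id, func, args = job
        try:
            results[job_id] = func(*args)
        except Exception as exc:  # keep the worker alive if a job fails
            results[job_id] = exc
        finally:
            task_queue.task_done()


def enqueue(job_id, func, *args):
    """Called from a route: queue the job and return immediately."""
    task_queue.put((job_id, func, args))


def start_worker():
    """Start the background worker thread and return it."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

A route would call enqueue(...) and return at once, while the worker
picks the job up in the background. Jobs queued this way are lost when
the process restarts, which is one reason to prefer a real broker.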

As for now, the main operations which lock my interpreter are:
file storage (upload and post-processing to create image thumbnails in
a separate module) and the calculation of several values from database
data.

Does anyone have suggestions on how I can avoid locking the interpreter
at the route/routing level of the application while acquiring and
serving data for the templating and output?

To give an example of how it looks (as pseudocode):

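@app.route('/handle/getAUser')
def handles_getUser():
    # Blocking database lookup; `database` is the application's DB handle
    data = getUserInfoFromdatabase(database)
    # `input` stands for the request parameters (pseudocode)
    output = processTheInput(input, data)
    return render_template('gotUser.html', input=output)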

other functions look like this (i/o problems):

@app.route('/handleFormData', methods=['POST'])
def handles_processForm():
    # request.files maps form field names to the uploaded files
    for file in request.files.values():
        storeTheImageOneTime()
        # Each thumbnail step blocks until it has finished
        processTheSavedImageAndCreateThumbnail1()
        processTheSavedImageAndCreateThumbnail2()
        processTheSavedImageAndCreateThumbnailX()

    saveInformationToDatabase()

    return render_template('success.html')
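
One possible way to keep the thumbnail work from blocking the upload
route, sketched under assumptions rather than taken from the
application: hand each thumbnail job to a thread pool and return as soon
as the jobs are queued. concurrent.futures is part of the Python 3
standard library; make_thumbnail is a hypothetical placeholder for the
real image processing, and handle_upload, the pool size and the
thumbnail sizes are likewise made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared pool for background work; the size of 4 is an arbitrary assumption.
executor = ThreadPoolExecutor(max_workers=4)


def make_thumbnail(path, size):
    """Placeholder for the real image processing; returns the output name."""
    return "%s.%dx%d.thumb" % (path, size[0], size[1])


def handle_upload(saved_path, sizes=((64, 64), (128, 128), (256, 256))):
    """Queue one thumbnail job per size and return the futures immediately."""
    return [executor.submit(make_thumbnail, saved_path, size) for size in sizes]
```

The route would store the image once, call handle_upload(path) and
render the success page without waiting. Calling result() on a returned
future waits for that thumbnail, should a later step need it.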



I only make use of Flask without any additional modules at present, so 
no caching or similar is involved.

So the bottom line is that the routing functions and their returns
block each other, as no "threading" of any kind is involved. This
shouldn't be the case in the production environment with, let's say,
10k users or more, and I assume there will be traffic on that scale
from the start.

Jan