Beginner's question

From:
Garrett Robinson
Date:
2011-09-03 @ 05:10
I'm working on a site that will have a queue for transferring files
uploaded by its users, to better control bandwidth. I'm wondering how
to design the queue.

My thoughts are:
1) keep track of it inside the Flask app using Python's Queue; but
then I don't know how to poll it regularly because everything in Flask
is request-driven
2) use a file or database to keep track of the queue, then write a
separate Python script to manage it, and run it regularly with cron
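
A minimal sketch of option 1, assuming Python 3 (where the module is named `queue` rather than `Queue`). Instead of polling, a background thread blocks on the queue and wakes up whenever a request handler puts something on it. The names `transfer_queue`, `transfer_file`, `worker` and `start_worker` are hypothetical, and `transfer_file` is only a placeholder for the real copy step.

```python
import queue
import threading

# Hypothetical names throughout; transfer_file stands in for the real copy step.
transfer_queue = queue.Queue()
transferred = []  # records finished jobs, only so the sketch is observable


def transfer_file(filename):
    """Placeholder for moving one file to the file server."""
    transferred.append(filename)


def worker():
    """Block on the queue and handle one job at a time; None stops the loop."""
    while True:
        filename = transfer_queue.get()
        try:
            if filename is None:
                return
            transfer_file(filename)
        finally:
            transfer_queue.task_done()


def start_worker():
    """Start the consumer as a daemon thread so it never blocks shutdown."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

A request handler would then only call `transfer_queue.put(filename)`. Note that such a queue lives in the memory of a single process: it is lost on restart, and if the app runs as several server processes each one gets its own separate queue.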

Then is there anything special about accessing the file that I need to
know re: Flask, multithreading, etc.? Sorry if this is a n00b
question, I don't know much about multithreading/file access issues -
just enough to know that it could be a problem : )

I'd really appreciate any help or guidance.

Re: [flask] Beginner's question

From:
Karsten Hoffrath
Date:
2011-09-05 @ 09:37
I did something similar: the user uploads a file, the file gets processed on one of
multiple nodes, and the result is returned to the user.
Redis is used for queueing and caching.

It basically works like this:

- the user uploads a file
- the flask process saves the file under a unique name in an upload directory and
puts the filename on a queue in Redis
- a python process watches this queue and processes the file (in your case
it would move the file to another server).
- during the process the state of the file is updated in another Redis database
- the website polls the database for the file state and shows the progress
to the user.
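
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the setup described in the message: `client` is expected to behave like a redis-py client created with `decode_responses=True` (so values come back as strings), the key names are made up, and the state is kept under prefixed keys in the same database rather than in a second Redis database.

```python
import uuid

QUEUE_KEY = "upload-queue"      # hypothetical key name
STATE_PREFIX = "upload-state:"  # hypothetical key prefix


def unique_name(original_name):
    """Prefix the original name with a random UUID to avoid collisions."""
    return uuid.uuid4().hex + "_" + original_name


def enqueue_upload(client, stored_name):
    """Called by the web process after saving the upload to disk."""
    client.set(STATE_PREFIX + stored_name, "queued")
    client.lpush(QUEUE_KEY, stored_name)


def process_next(client, handler, timeout=5):
    """Called in a loop by the worker process; returns the name handled, or None."""
    item = client.brpop(QUEUE_KEY, timeout=timeout)
    if item is None:  # timed out, the queue was empty
        return None
    _key, stored_name = item
    client.set(STATE_PREFIX + stored_name, "processing")
    handler(stored_name)
    client.set(STATE_PREFIX + stored_name, "done")
    return stored_name


def get_state(client, stored_name):
    """Called by the view that the page polls for progress."""
    return client.get(STATE_PREFIX + stored_name)
```

Pushing with `lpush` and popping with `brpop` gives first-in, first-out order, and `brpop` blocks until a job arrives or the timeout expires, so the worker does not need a busy polling loop.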



Re: [flask] Beginner's question

From:
Rodrigo Aliste P.
Date:
2011-09-03 @ 05:25
AFAIK file uploading works on a per-request basis, so I think at best you
could have more than one server and load-balance uploads between them.

For post-processing or moving the files I would encourage you to think of
other (nicer) options, such as Gearman <http://gearman.org/> or
Flask-Celery <https://github.com/ask/flask-celery/>.


-- 
Rodrigo Aliste P.

Re: [flask] Beginner's question

From:
Joe Esposito
Date:
2011-09-03 @ 14:45
The Django community has had some talk about using the multiprocessing
module (Python 2.6+). That would probably be the easiest thing to set up and
get you going since there are no external dependencies and it's all pure
Python code. I have yet to try it out, but I also have an upcoming project
that requires extensive server-side processing.

I've tried Celery so far. If you can get it going, I think it'll be nice. I
had a lot of problems with it though. Seems to do too much, so it's hard to
find docs on what you actually want to do with it. Was also looking for an
alternative.

Anyone have any good experience with multiprocessing?
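
For reference, a minimal sketch of what a multiprocessing-based worker might look like, written for Python 3. The function names are hypothetical and the actual transfer is left as a placeholder; the worker simply reports each filename back on a results queue.

```python
import multiprocessing


def transfer_worker(jobs, results):
    """Consume filenames from jobs until a None sentinel arrives."""
    while True:
        filename = jobs.get()
        if filename is None:
            break
        # Placeholder: the real transfer to the file server would go here.
        results.put((filename, "done"))


def start_worker():
    """Start one worker process and return it with its two queues."""
    jobs = multiprocessing.Queue()
    results = multiprocessing.Queue()
    process = multiprocessing.Process(target=transfer_worker, args=(jobs, results))
    process.daemon = True  # do not keep the parent alive on exit
    process.start()
    return process, jobs, results
```

The web process would call `jobs.put(filename)` after each upload. The queues only connect the parent to the child it started, so if the site runs as several server processes, each one would end up with its own worker and its own queue.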


Re: [flask] Beginner's question

From:
Garrett Robinson
Date:
2011-09-03 @ 14:50
Thanks for the replies. I agree, Celery seems like too much for me
too. All I want to implement is a simple file queue to rate-limit
the transfer of files from the web server to a dedicated file server,
in a thread-safe way.
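
One way to get such a queue without any extra service is the "file or database" idea from the first message. The sketch below uses SQLite from the standard library, which is a substitution for illustration and not something proposed in the thread; the table layout and function names are hypothetical. The web app only inserts rows, and a separate script (run from cron or as a loop) claims one job at a time inside a write transaction, so two consumers cannot take the same job.

```python
import sqlite3


def open_queue(path):
    """Open (and create if needed) the queue database in autocommit mode."""
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " filename TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'queued')"
    )
    return conn


def enqueue(conn, filename):
    """Add one file to the queue; called from the web app."""
    conn.execute("INSERT INTO jobs (filename) VALUES (?)", (filename,))


def claim_next(conn):
    """Atomically mark the oldest queued job as running and return it, or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, filename FROM jobs WHERE state = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET state = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row


def mark_done(conn, job_id):
    """Record that a claimed job finished."""
    conn.execute("UPDATE jobs SET state = 'done' WHERE id = ?", (job_id,))
```

Each thread or process should open its own connection with `open_queue`. Rate limiting then comes down to how often the consumer calls `claim_next` and how fast it copies each file.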

I will check out multiprocessing. Thanks!


Re: [flask] Beginner's question

From:
Cheng-Han Lee
Date:
2011-09-03 @ 21:12
Celery can be a bit difficult to set up.

I would also recommend beanstalkd. It's a work queue similar to Celery,
but it's easier to set up and use (at least from what I've read).


Re: [flask] Beginner's question

From:
Joe Esposito
Date:
2011-09-03 @ 23:05
Wow Beanstalk looks amazing. Going to try it out. Thanks for sharing!

Some observations:

   - beanstalkd is the external dependency. It's very lightweight though and
   has no configuration files. You simply run it as a daemon
   <http://kr.github.com/beanstalkd/>.
   - There are many clients written for various languages. Two for Python,
   though beanstalkc <http://github.com/earl/beanstalkc/> is the more active
   one and is the only one on PyPI.
   - The clients look incredibly simple to use. Simply import, create a
   Connection using the host/port specified in the beanstalkd cmdline, then:
      - On the producer side, call "put" to create a job
      - On the consumer side, call "reserve" to safely get a job and
      "delete" once the job is complete
   - You can have multiple named job queues (called "tubes")
   - Jobs can be persistent (configurable via beanstalkd cmdline)
   - All messages are strings. This shouldn't be a problem though since you
   should really use a DB for more complex communication.
   - There's no built-in mechanism for reporting progress, though using a
   tube and the job id, you can manually communicate in the other direction
   <http://groups.google.com/group/beanstalk-talk/browse_thread/thread/9039eb85e8cefba0/241a115af42d8d26?lnk=gst&q=progress#241a115af42d8d26>
   - There's no Windows support. Perhaps you could handle the
   beanstalkc.SocketError to call the job function directly when in debug mode
   if you're developing on Windows.
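
The producer and consumer sides described above might look roughly like this with beanstalkc. It is a sketch under assumptions: the tube name "transfers" and the helper names are hypothetical, 11300 is beanstalkd's default port, and beanstalkc was written for Python 2, so a port or a different client may be needed on Python 3. The import is done lazily inside `connect` so the other helpers can be read and exercised without the library installed.

```python
def connect(host="localhost", port=11300):
    """Open a connection to beanstalkd (11300 is its default port)."""
    import beanstalkc  # third-party client, imported lazily
    return beanstalkc.Connection(host=host, port=port)


def submit_transfer(conn, filename, tube="transfers"):
    """Producer side: put one job on a named tube and return its job id."""
    conn.use(tube)
    return conn.put(filename)


def handle_one(conn, handler, tube="transfers", timeout=5):
    """Consumer side: reserve a job, run handler on its body, then delete it."""
    conn.watch(tube)
    job = conn.reserve(timeout=timeout)
    if job is None:  # nothing arrived within the timeout
        return None
    handler(job.body)
    job.delete()  # only reached if the handler did not raise
    return job.body
```

Because the job is deleted only after the handler returns, a crash in the middle of a transfer leaves the job reserved, and beanstalkd makes it available again once its time-to-run expires.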

References:
Beanstalkd <http://kr.github.com/beanstalkd/>
Beanstalkc (Python client) <https://github.com/earl/beanstalkc>
Beanstalk FAQ <https://github.com/kr/beanstalkd/wiki/faq>
A simple tutorial <http://parand.com/say/index.php/2008/10/12/beanstalkd-python-basic-tutorial/>


Re: [flask] Beginner's question

From:
Garrett Robinson
Date:
2011-09-29 @ 21:11
Wow, thanks for all the responses, everybody. Sorry this is much later
on. I'm currently using beanstalkd, with the beanstalkc python
library. It is the perfect solution for what I'm doing. Thanks to Joe
for recommending it, and thanks to everyone else for all your
thoughtful suggestions!

If anyone is interested, I'm using beanstalkd in these projects:
https://github.com/handsomeransoms/haps (with Flask)
https://github.com/handsomeransoms/haps-hidserv (no Flask here)


Re: [flask] Beginner's question

From:
Benjamin Sergeant
Date:
2011-09-03 @ 16:30
One thing I did in the past (and it worked great, did that twice) was
using an inotify daemon on Linux.

In your case you would copy files to an inotify-monitored folder, and from
there you could copy those files to your dedicated file server, maybe with
rsync, giving it a rate-limit option. Once your upload is done, you'll have
copied your uploaded file to some folder.
The event to listen for is the CLOSE event (can't remember the exact name
off the top of my head), which means the file was written and then closed.

You can use inotifywait (comes with an Ubuntu package, or can be compiled
from source) from a pure shell program, or shell out to it from Python. If
you are copying random files to a single transfer folder, you'll have to
generate unique names.
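
A rough sketch of the "shell out from Python" variant, assuming inotifywait and rsync are installed. With inotifywait the event in question is spelled `close_write`, and rsync's rate limit is the `--bwlimit` option. The function names, the default limit and the destination format are hypothetical.

```python
import os
import subprocess


def inotifywait_command(directory):
    """Command that prints one filename per line each time a written file is closed."""
    return ["inotifywait", "--monitor", "--quiet",
            "--event", "close_write", "--format", "%f", directory]


def rsync_command(path, destination, limit_kbps):
    """Command that copies one file with rsync's bandwidth limit applied."""
    return ["rsync", "--bwlimit={}".format(limit_kbps), path, destination]


def watch_and_transfer(directory, destination, limit_kbps=500):
    """Copy each file to destination as soon as its writer closes it."""
    watcher = subprocess.Popen(inotifywait_command(directory),
                               stdout=subprocess.PIPE, text=True)
    for line in watcher.stdout:
        path = os.path.join(directory, line.rstrip("\n"))
        # Transfers run one after another, so the limit applies to the whole queue.
        subprocess.check_call(rsync_command(path, destination, limit_kbps))
```

Calling `watch_and_transfer("/srv/outbox", "files@fileserver:/srv/incoming/")` would block forever and push files as they appear; both paths here are made up for the example.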

Sounds like a fun thing to write :)

Cheers,
- Benjamin


Re: [flask] Beginner's question

From:
Sean Lynch
Date:
2011-09-03 @ 16:52
Another option is to use the Google App Engine Task Queue (
http://code.google.com/appengine/docs/python/taskqueue/overview.html) if you
want to host there.

There is also a MapReduce API that uses the task queue (
http://code.google.com/p/appengine-mapreduce/).

I use the MapReduce API to let a user upload a CSV, which I then process
line by line into datastore entities.  I plan to write a Flask snippet in the
future on how to make this work (it will probably be a few weeks, as I'll be
away for the next week).


Re: [flask] Beginner's question

From:
Joe Esposito
Date:
2011-09-03 @ 17:32
Thanks, I was wondering whether GAE would be able to help with this.
MapReduce sounds interesting, I'd really like to see your snippet when you
get around to it =)
