Re: [flask] Beginner's question
- From:
- Cheng-Han Lee
- Date:
- 2011-09-03 @ 21:12
Celery can be a bit difficult to set up.
I would also recommend beanstalkd. It's a work queue similar to Celery,
but it's easier to set up and use (at least from what I've read).
On Sat, Sep 3, 2011 at 7:50 AM, Garrett Robinson <
garrett.f.robinson@gmail.com> wrote:
> Thanks for the replies. I agree, Celery seems like too much for me
> too. All I want to implement is a simple file queue to rate-limit
> the transfer of files from the web server to a dedicated file server,
> in a thread-safe way.
>
> I will check out multiprocessing. Thanks!
>
> On Sat, Sep 3, 2011 at 10:45 AM, Joe Esposito <espo58@gmail.com> wrote:
> > The Django community has had some talk about using the multiprocessing
> > module (Python 2.6+). That would probably be the easiest thing to set up
> > and get you going, since there are no external dependencies and it's all
> > pure Python code. I have yet to try it out, but I also have an upcoming
> > project that requires extensive server-side processing.
> > I've tried Celery so far. If you can get it going, I think it'll be nice.
> > I had a lot of problems with it, though. It seems to do too much, so it's
> > hard to find docs on what you actually want to do with it. I was also
> > looking for an alternative.
> > Anyone have any good experience with multiprocessing?
> >
> > On Sat, Sep 3, 2011 at 1:25 AM, Rodrigo Aliste P. <raliste@gmail.com> wrote:
> >>
> >> AFAIK file uploading works on a per-request basis, so I think at best you
> >> could have more than one server and load-balance uploads between them.
> >> For post-processing or moving the files, I would encourage you to think of
> >> other (nicer) options, such as Gearman or Flask-Celery.
> >> On Sat, Sep 3, 2011 at 2:10 AM, Garrett Robinson
> >> <garrett.f.robinson@gmail.com> wrote:
> >>>
> >>> I'm working on a site that will have a queue for transferring files
> >>> uploaded by its users, to better control bandwidth. I'm wondering how
> >>> to design the queue.
> >>>
> >>> My thoughts are:
> >>> 1) keep track of it inside the Flask app using Python's Queue; but
> >>> then I don't know how to poll it regularly because everything in Flask
> >>> is request-driven
> >>> 2) use a file or database to keep track of the queue, then write a
> >>> separate Python script to manage it, and run it regularly with cron
> >>>
> >>> Then is there anything special about accessing the file that I need to
> >>> know re: Flask, multithreading, etc.? Sorry if this is a n00b
> >>> question, I don't know much about multithreading/file access issues -
> >>> just enough to know that it could be a problem : )
> >>>
> >>> I'd really appreciate any help or guidance.
> >>
> >>
> >>
> >> --
> >> Rodrigo Aliste P.
> >>
> >
> >
>
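For what it's worth, option 1 in the original question (a Python Queue inside the Flask app) does not strictly need polling: a background thread can block on the queue until a job arrives. Below is a minimal sketch of that idea, not a recommendation over the other suggestions in this thread. It uses Python 3's `queue` module (the `Queue` module in Python 2), and the names `transfer_queue`, `transfer_file`, `worker` and `start_worker` are made up for illustration. The sleep between jobs is only a crude stand-in for real bandwidth limiting.

```python
import queue
import threading
import time

# Hypothetical module-level queue shared by the views and the worker thread.
# queue.Queue does its own locking, so put() and get() are thread-safe.
transfer_queue = queue.Queue()


def transfer_file(path):
    """Placeholder for the real copy to the file server (an assumption)."""
    print("transferring", path)


def worker(q, handler, min_interval=1.0):
    """Handle queued paths one at a time, pausing min_interval seconds between jobs."""
    while True:
        path = q.get()  # blocks until a job is available, so no polling loop
        if path is None:  # sentinel value: stop the worker
            q.task_done()
            break
        try:
            handler(path)
        finally:
            q.task_done()
        time.sleep(min_interval)  # crude rate limit between transfers


def start_worker(q=transfer_queue, handler=transfer_file, min_interval=1.0):
    """Start the worker as a daemon thread so it never blocks interpreter exit."""
    t = threading.Thread(target=worker, args=(q, handler, min_interval), daemon=True)
    t.start()
    return t
```

A view would then call `transfer_queue.put(saved_path)` after saving an upload. Keep in mind that the queue lives in a single process: if the app runs under several server processes, each one gets its own queue, and pending jobs are lost on restart. That is the gap an external queue such as beanstalkd fills.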
Re: [flask] Beginner's question
- From:
- Joe Esposito
- Date:
- 2011-09-03 @ 23:05
Wow Beanstalk looks amazing. Going to try it out. Thanks for sharing!
Some observations:
- beanstalkd is the external dependency. It's very lightweight, though, and
  has no configuration files. You simply run it as a
  daemon <http://kr.github.com/beanstalkd/>.
- There are many clients written for various languages. There are two for
  Python, though beanstalkc <http://github.com/earl/beanstalkc/> is the more
  active one and is the only one on PyPI.
- The clients look incredibly simple to use. Simply import, connect using
  the host/port specified on the beanstalkd command line, then:
  - On the producer side, call "put" to create a job
  - On the consumer side, call "reserve" to safely get a job and
    "delete" once the job is complete
- You can have multiple named job queues (called "tubes")
- Jobs can be persistent (configurable via the beanstalkd command line)
- All messages are strings. This shouldn't be a problem, though, since you
  should really use a DB for more complex communication.
- There's no built-in mechanism for reporting progress, though using a
  tube and the job id, you can manually communicate in the other
  direction <http://groups.google.com/group/beanstalk-talk/browse_thread/thread/9039eb85e8cefba0/241a115af42d8d26?lnk=gst&q=progress#241a115af42d8d26>
- There's no Windows support. Perhaps you could catch
  beanstalkc.SocketError and call the job function directly when in debug
  mode if you're developing on Windows.
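To make the put/reserve/delete cycle above concrete, here is a rough sketch using beanstalkc. It assumes a beanstalkd daemon is already running on localhost at its default port 11300. The tube name "transfers" and the helper names `connect`, `enqueue` and `work_one` are invented for this example, and beanstalkc is imported lazily so the two helpers work with any object exposing the same methods.

```python
def connect(host="localhost", port=11300):
    """Open a connection to a running beanstalkd (11300 is its default port)."""
    import beanstalkc  # third-party client, imported only when actually connecting
    return beanstalkc.Connection(host=host, port=port)


def enqueue(conn, body, tube="transfers"):
    """Producer side: select the tube, then put a string job on it."""
    conn.use(tube)
    return conn.put(body)  # beanstalkd hands back the new job's id


def work_one(conn, handler, tube="transfers", timeout=5):
    """Consumer side: reserve one job, run the handler, delete the job when done.

    Returns True if a job was handled, False if none arrived within timeout.
    """
    conn.watch(tube)
    job = conn.reserve(timeout=timeout)  # None when the timeout expires
    if job is None:
        return False
    try:
        handler(job.body)
    except Exception:
        job.release()  # give the job back to the queue so it can be retried
        raise
    job.delete()  # only delete after the work has actually succeeded
    return True
```

The web app would call `enqueue(conn, path)` from a view, while a separate long-running script loops over `work_one(conn, transfer)`, where `transfer` is whatever function does the actual copy. Deleting only after the handler returns is the point of "reserve": if the worker dies mid-job, beanstalkd makes the job available again once its time-to-run expires.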
References:
Beanstalkd <http://kr.github.com/beanstalkd/>
Beanstalkc (Python client) <https://github.com/earl/beanstalkc>
Beanstalk FAQ <https://github.com/kr/beanstalkd/wiki/faq>
A simple tutorial <http://parand.com/say/index.php/2008/10/12/beanstalkd-python-basic-tutorial/>
Re: [flask] Beginner's question
- From:
- Garrett Robinson
- Date:
- 2011-09-29 @ 21:11
Wow, thanks for all the responses, everybody. Sorry this is much later
on. I'm currently using beanstalkd, with the beanstalkc Python
library. It is the perfect solution for what I'm doing. Thanks to Joe
for recommending it, and thanks to everyone else for all your
thoughtful suggestions!
If anyone is interested, I'm using beanstalkd in these projects:
https://github.com/handsomeransoms/haps (with Flask)
https://github.com/handsomeransoms/haps-hidserv (no Flask here)
Re: [flask] Beginner's question
- From:
- Benjamin Sergeant
- Date:
- 2011-09-03 @ 16:30
One thing I did in the past (and it worked great, I did it twice) was
using an inotify daemon on Linux.
In your case, once an upload is done you would copy the uploaded file to an
inotify-monitored folder, and from there you could copy it to your dedicated
file server, maybe with rsync, giving it a rate-limit option.
The event to listen for is the CLOSE event (I can't remember the exact name
off the top of my head), which means the file was written and then closed.
You can use inotifywait (it comes with an Ubuntu package, or can be compiled
from source) from a pure shell program, or shell out to it from Python. If you
are copying random files to a single transfer folder, you'll have to generate
unique names.
Sounds like a fun thing to write :)
Cheers,
- Benjamin
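A rough sketch of the "shell it out from Python" variant, under a few assumptions: inotifywait (from inotify-tools) and rsync are installed, the event wanted is close_write (inotifywait's name for a file closed after being opened for writing), and the rate limit is rsync's --bwlimit option. The function names, the destination string and the 500 KB/s default are all made up for illustration.

```python
import os
import subprocess
import uuid


def unique_name(filename):
    """Prefix the base name with a random UUID so two uploads cannot collide."""
    return uuid.uuid4().hex + "_" + os.path.basename(filename)


def watch_cmd(folder):
    """inotifywait in monitor mode, printing the full path of each file
    that was closed after being written."""
    return ["inotifywait", "-m", "-q", "-e", "close_write",
            "--format", "%w%f", folder]


def rsync_cmd(path, dest, kbps):
    """rsync a single file with a bandwidth cap, removing the local copy
    once it has been transferred."""
    return ["rsync", "--bwlimit=%d" % kbps, "--remove-source-files", path, dest]


def watch_and_transfer(folder, dest, kbps=500):
    """Block forever, sending each finished file to dest, one at a time."""
    watcher = subprocess.Popen(watch_cmd(folder), stdout=subprocess.PIPE, text=True)
    for line in watcher.stdout:
        subprocess.call(rsync_cmd(line.rstrip("\n"), dest, kbps))
```

The Flask side would then only need to save each upload into the watched folder under `unique_name(original_filename)`. Because the loop runs one rsync at a time, the folder itself acts as the queue and the bandwidth cap applies to the whole transfer stream.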
Re: [flask] Beginner's question
- From:
- Sean Lynch
- Date:
- 2011-09-03 @ 16:52
Another option is to use the Google App Engine Task Queue
(http://code.google.com/appengine/docs/python/taskqueue/overview.html) if you
want to host there.
There is also a MapReduce API that uses the task queue
(http://code.google.com/p/appengine-mapreduce/).
I use the MapReduce API to let a user upload a CSV file, which I then process
line by line into datastore entities. I plan to write a Flask snippet in the
future on how to make this work (it will probably be a few weeks, as I'll be
away for the next week).
Re: [flask] Beginner's question
- From:
- Joe Esposito
- Date:
- 2011-09-03 @ 17:32
Thanks, I was wondering whether GAE would be able to help with this.
MapReduce sounds interesting, I'd really like to see your snippet when you
get around to it =)