librelist archives

Continuous call to a third-party site

From:
Juan Christian
Date:
2014-10-14 @ 23:31
I have a CLI app that continuously calls one website using requests and
time.sleep. Now I need to do the same thing using Flask. Is there a good
way to continuously call a site in order to check whether there is new content?

My target: http://steamcommunity.com/app/440/tradingforum/

I need to continuously check whether there are new topics. Sadly, Steam
doesn't provide any sort of API for the forums, so I'm getting all the data
I need using bs4.

My objective: check the site at a delay defined in code, and whenever a
new topic comes up, get its URL (the full site URL or just the 'new'
part of it).

Is it possible?

Re: [flask] Continuous call to a third-party site

From:
Jack Maney
Date:
2014-10-16 @ 00:17
I'm sure that such a thing can be done with Flask. However, I'm not sure
that Flask is the right tool for the job, since it's a framework for
building a web server that *listens* for traffic.

Why not just set up a cron job that runs, say, every five minutes, scrapes
the page, checks against the last list of topics (which can be stored in a
database somewhere), and then fires off a relevant alert if new topics
appear?
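
One way to sketch the "check against the last list of topics" step is to
keep the topic URLs already seen in a small SQLite table and report only
the ones that are not stored yet. This is a minimal sketch under
assumptions: the table name seen_topics and the functions open_store and
record_new_topics are hypothetical, and the scraping itself (requests
plus bs4) is left out.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the SQLite store of topic URLs seen so far."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen_topics (url TEXT PRIMARY KEY)")
    return conn


def record_new_topics(conn, scraped_urls):
    """Return the URLs not stored yet, in scraped order, and store them."""
    new_urls = []
    for url in scraped_urls:
        already_seen = conn.execute(
            "SELECT 1 FROM seen_topics WHERE url = ?", (url,)
        ).fetchone()
        if already_seen is None:
            # First time this URL shows up: remember it and report it.
            conn.execute("INSERT INTO seen_topics (url) VALUES (?)", (url,))
            new_urls.append(url)
    conn.commit()
    return new_urls
```

A scheduled job would pass the URLs it just scraped to
record_new_topics() and send an alert for each URL that comes back. Passing
a file path instead of ":memory:" to open_store() keeps the list between
runs.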

-- 
"Structures are the weapons of the mathematician."
--Bourbaki

Re: [flask] Continuous call to a third-party site

From:
Juan Christian
Date:
2014-10-16 @ 00:28
"Cron job", what do you mean by that?

Re: [flask] Continuous call to a third-party site

From:
Jack Maney
Date:
2014-10-16 @ 00:39
A job set up in Cron: http://en.wikipedia.org/wiki/Cron

Of course, any other task scheduler would suffice.

Re: [flask] Continuous call to a third-party site

From:
Juan Christian
Date:
2014-10-16 @ 00:53
I need something that isn't OS-specific; it seems that cron is Unix-only.

Re: [flask] Continuous call to a third-party site

From:
Scott Lipsig
Date:
2014-10-16 @ 02:29
Not sure why you’d run flask/python/anything on something that isn’t 
unix-based, but you *can* use celery to do pretty much the same thing. 
I’ve used it to good effect with scrapers that needed built-in delays in 
the past.


http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html
http://celery.readthedocs.org/en/latest/reference/celery.schedules.html

Of course, if you’re just running something every five minutes, there 
isn’t any reason to use anything more complex than the standard library’s 
time.sleep() function. 
e.g.:


import time  # the loop calls time.sleep(), so import the module itself

while True:
    scrape_things()  # placeholder for your own scraping function
    time.sleep(300)  # 300 seconds = 5 minutes


Re: [flask] Continuous call to a third-party site

From:
Juan Christian
Date:
2014-10-16 @ 03:13
It's more like 5 seconds. Is sleep still good for this?

Re: [flask] Continuous call to a third-party site

From:
Jack Maney
Date:
2014-10-16 @ 03:35
Yes, the argument to time.sleep() is the number of seconds to sleep:
https://docs.python.org/2/library/time.html#time.sleep

That said, you might want to really think about whether or not you need to
know whether a new forum topic pops up every *five seconds*. Not only does
that seem like overkill, but you might find your IP banned by Steam.

Re: [flask] Continuous call to a third-party site

From:
Jack Maney
Date:
2014-10-16 @ 03:37
And to add to what Scott mentioned earlier: you should really consider
running this in a Linux environment. Unless you're coding in C# or .NET,
Windows is the red-headed stepchild of coding environments.

Re: [flask] Continuous call to a third-party site

From:
Juan Christian
Date:
2014-10-16 @ 14:33
Yes, everyone, I know about sleep() and how it works, but I thought it
would be something of a kludge. I am in fact using it in the 'kludge
version' of my script that runs in the terminal, but for the Flask version
I wanted something more 'professional'. So if calling the site every 5
seconds isn't a good idea because I may get IP banned, how can I
continuously check for new topics then? =/

I'm coding it in a Windows environment using PyCharm, and the final version
will run on my NAS server, which runs Linux. Still, I'd prefer not to use
anything OS-dependent, whether that's Windows or Linux.

Re: [flask] Continuous call to a third-party site

From:
Jack Maney
Date:
2014-10-16 @ 14:51
You neither want nor need to check continuously to see if a new topic
springs up in a forum. The closest you can get to that is to have no call
to time.sleep in your while loop, which would be a very bad idea, as it
could easily be interpreted as an attempted DoS attack.

I'm curious as to how you are envisioning a Flask version of this app. At
its core, it would still consist of a loop containing a delay and then the
actual scraping work (and figuring out which topics are new, etc).
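
As a rough illustration of that "loop containing a delay" running next to
a web app, the loop can live in a background thread of the same process.
This is only a sketch using the standard library: run_poller, start_poller,
and check_once are hypothetical names, and check_once stands in for the
scraping and new-topic detection.

```python
import threading


def run_poller(check_once, delay_seconds, stop_event):
    """Call check_once(), wait delay_seconds, and repeat until stopped."""
    while not stop_event.is_set():
        check_once()
        # Event.wait() pauses like time.sleep() but returns early
        # as soon as stop_event is set, so shutdown is prompt.
        stop_event.wait(delay_seconds)


def start_poller(check_once, delay_seconds):
    """Start the loop in a daemon thread; return (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=run_poller,
        args=(check_once, delay_seconds, stop_event),
        daemon=True,
    )
    thread.start()
    return thread, stop_event
```

Calling start_poller(check_once, 300) once at startup would check every
five minutes. If the web server runs several worker processes and each one
calls start_poller, each worker gets its own loop, so a separate scheduled
script may still be the simpler design.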

Re: [flask] Continuous call to a third-party site

From:
Juan Christian
Date:
2014-10-16 @ 15:33
What I want:

1. Check for a new topic
2. Find a new one
3. Do something with the user and topic data (I already have this done)
4. Save it to the DB
5. Post a new 'topic panel' or something similar (
http://getbootstrap.com/components/#panels) on the site running
Flask-Bootstrap, with all the info I got from #3
6. Go to #1

The problem is that you said I may get IP banned if I do this all the time
(I do it this way, i.e. with a 5-second delay, in my 'kludge terminal
version' and haven't been banned yet), but I don't see any other way. I do
need to check at intervals of 5 seconds or less, because sometimes topics
there are posted like crazy and sometimes there is a 10-minute gap between
them. The scraping part is OK; I'm using bs4.

Let's say I want the page on my Flask app to have the topic panels posted
without the need to reload it, what do I need to do?


On Thu, Oct 16, 2014 at 11:51 AM, Jack Maney <jackmaney@gmail.com> wrote:

> You neither want nor need to check continuously to see if a new topic
> springs up in a forum. The closest you can get to that is to have no call
> to time.sleep in your while loop, which would be a very bad idea, as it
> could easily be interpreted as an attempted DoS attack.
>
> I'm curious as to how you are envisioning a Flask version of this app. At
> its core, it would still consist of a loop containing a delay and then the
> actual scraping work (and figuring out which topics are new, etc).

Re: [flask] Continuous call to a thir-party site

From:
Jack Maney
Date:
2014-10-16 @ 15:58
You can hit the forum page every five seconds or multiple times per second.
However, you risk having your IP address blocked. Especially if you hit the
page on a regular interval, your traffic will stick out like a sore thumb.
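
One way to make the interval less regular is to add random jitter to each
pause. The sketch below is only an illustration: the name jittered_delay and
the default jitter_fraction are assumptions, and nothing here guarantees that
any particular site will tolerate the resulting traffic.

```python
import random


def jittered_delay(base_seconds, jitter_fraction=0.5):
    """Return a pause drawn uniformly from base*(1-j) to base*(1+j) seconds."""
    low = base_seconds * (1 - jitter_fraction)
    high = base_seconds * (1 + jitter_fraction)
    return random.uniform(low, high)
```

In a polling loop this would replace a fixed pause, for example
time.sleep(jittered_delay(30)) instead of time.sleep(30).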

In order to update a page without reloading it, you either need some AJAX
(that polls your server every so often for updates) or you can fire up a
websocket connection from your front end to your server (which allows your
server to push out updates). In the latter case, I believe there is a Flask
extension for websocket connections.
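
For the AJAX variant, the server side mostly needs a way to answer "what is
new since the last id the page has seen?". A minimal sketch, assuming each
stored topic is a dict with an integer "id" (the names updates_since,
load_topics, and the /updates route are hypothetical):

```python
import json


def updates_since(topics, last_id):
    """Return a JSON string with the topics whose id is greater than last_id."""
    fresh = [t for t in topics if t["id"] > last_id]
    # If nothing is new, echo back the id the client already has.
    newest = max((t["id"] for t in fresh), default=last_id)
    return json.dumps({"last_id": newest, "topics": fresh})


# In a Flask app this could be wired to a route roughly like this
# (load_topics is a hypothetical helper that reads the stored topics):
#
#   @app.route("/updates")
#   def updates():
#       last_id = request.args.get("since", 0, type=int)
#       body = updates_since(load_topics(), last_id)
#       return app.response_class(body, mimetype="application/json")
```

The page's JavaScript would then request /updates?since=<last_id> every so
often and append a panel for each returned topic.
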
On Oct 16, 2014 10:42 AM, "Juan Christian" <juan0christian@gmail.com> wrote:

> What I want:
>
> 1. Check if new topic
> 2. Found a new one
> 3. Do something with user and topic data (I have this done already)
> 4. Save on DB
> 5. Post a new 'topic panel' or anything similar (
> http://getbootstrap.com/components/#panels) in the site running
> Flask-Bootstrap with all the info I got from #3
> 6. Go to #1
>
> The problem is that you said I may get IP banned if I do this all the time
> (I do this way, aka 5 seconds delay, in my 'kludge terminal version' and
> din't get banned yet), but I don't see any other way, I do need to check in
> a 5 seconds or less delay, because topics there are post like crazy
> sometimes and sometimes there is a 10 min delay between them. The scrap
> part is OK, I'm using bs4.
>
> Let's say I want the page on my Flask app to have the topic panels posted
> without the need to reload it, what do I need to do?

Re: [flask] Continuous call to a thir-party site

From:
Juan Christian
Date:
2014-10-17 @ 01:17
So, no way to safely get the forum topics without risking my IP?

On Thu, Oct 16, 2014 at 12:58 PM, Jack Maney <jackmaney@gmail.com> wrote:

> You can hit the forum page every five seconds or multiple times per
> second. However, you risk having your IP address blocked. Especially if you
> hit the page on a regular interval, your traffic will stick out like a sore
> thumb.
>
> In order to update a page without reloading it, you either need some AJAX
> (that polls your server every so often for updates) or you can fire up a
> websocket connection from your front end to your server (which allows your
> server to push out updates). In the latter case, I believe there is a Flask
> extension for websocket connections.

Re: [flask] Continuous call to a thir-party site

From:
Jack Maney
Date:
2014-10-17 @ 02:54
No, that's not what we're saying.

Look. There is no such thing as "continuously making calls to a site". The
closest you can get to that is:

while True:
    scrape_site()

but it still takes time for the network traffic, parsing the returned HTML,
etc. So, if you stick to the while loop above (with no sleep delays),
you'll be hitting the forum several times per second.

It's a good bet that Valve does some monitoring of their incoming web
traffic to defend against DDoS attacks. So, you'll have to take a guess at
a frequency that won't trip their alarms. For test purposes, you might want
to set up a cloud instance of your scraper (e.g. AWS, Heroku, etc.).
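
Since the safe frequency can only be guessed, one common approach is to slow
down whenever the server starts refusing requests. The sketch below is an
assumption-laden illustration: the name next_delay, the 30-second base, the
15-minute cap, and the choice of status codes are all made up for the
example, and Valve's actual thresholds are not known.

```python
def next_delay(current, status_code, base=30.0, maximum=900.0):
    """Pick the pause before the next request from the last HTTP status."""
    if status_code in (403, 429) or status_code >= 500:
        # The server is refusing or struggling: double the pause, up to a cap.
        return min(current * 2, maximum)
    # Normal response: go back to the base interval.
    return base
```

A scraping loop would keep the returned value and pass it to time.sleep()
before the next request.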

Good luck!

On Thu, Oct 16, 2014 at 8:17 PM, Juan Christian <juan0christian@gmail.com>
wrote:

> So, no way to safely get the forum topics without risking my IP?


-- 
"Structures are the weapons of the mathematician."
--Bourbaki

Re: [flask] Continuous call to a thir-party site

From:
Bruce Adams
Date:
2014-10-16 @ 15:10
The ideal solution would be for the forum you are scraping to have some
kind of publish/subscribe API (see, for example, WAMP, the Web Application
Messaging Protocol, at wamp.ws), but if it's not under your control I guess
that is not possible. It's probably still worth asking if you have a good
use case (e.g. you want to know about any posts on Steam about the game
you've written). The site owners won't want you loading them with too many
unnecessary requests either, and might be reasonable about helping their
platform evolve.

It is also worth remembering that if the page hasn't changed you should
receive a 304 response (see Wikipedia's list of HTTP status codes), so you
won't have to rescrape the page and they won't have to send it. Note that a
server only answers 304 to a conditional request, i.e. one that carries an
If-None-Match or If-Modified-Since header.
Of course, the page will also change when there are new posts, not just
new topics.
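
The 304 idea above can be sketched roughly as follows. This is only an
illustration under assumptions: the names conditional_headers and
fetch_if_changed are made up for this example, `get` is expected to behave
like requests.get, and nothing here confirms that the Steam forum actually
sends ETag or Last-Modified headers.

```python
def conditional_headers(state):
    """Build request headers from validators saved on the previous fetch."""
    headers = {}
    if state.get("etag"):
        headers["If-None-Match"] = state["etag"]
    if state.get("last_modified"):
        headers["If-Modified-Since"] = state["last_modified"]
    return headers


def fetch_if_changed(get, url, state):
    """Return the page body, or None if the server answered 304 Not Modified.

    `get` is any callable shaped like requests.get(url, headers=..., timeout=...).
    `state` is a plain dict that carries the validators between calls.
    """
    response = get(url, headers=conditional_headers(state), timeout=10)
    if response.status_code == 304:
        return None  # nothing changed, so there is nothing to re-scrape
    response.raise_for_status()
    # Remember the validators (if the server sent any) for the next request.
    state["etag"] = response.headers.get("ETag")
    state["last_modified"] = response.headers.get("Last-Modified")
    return response.text
```

With the requests library installed, a call would look like
fetch_if_changed(requests.get, FORUM_URL, state), keeping the same `state`
dict alive between polls. If the server sends no validators, every fetch
simply returns the full page as before.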


  
             
  
 



>________________________________
> From: Jack Maney <jackmaney@gmail.com>
>To: flask@librelist.com 
>Sent: Thursday, October 16, 2014 3:51 PM
>Subject: Re: [flask] Continuous call to a thir-party site
> 
>
>
>You neither want nor need to check continuously to see if a new topic 
springs up in a forum. The closest you can get to that is to have no call 
to time.sleep in your while loop, which would be a very bad idea, as it 
could easily be interpreted as an attempted DoS attack.
>I'm curious as to how you are envisioning a Flask version of this app. At
its core, it would still consist of a loop containing a delay and then the
actual scraping work (and figuring out which topics are new, etc).
>On Oct 16, 2014 9:37 AM, "Juan Christian" <juan0christian@gmail.com> wrote:
>
>Yes people, I knew about the sleep() and how it works, but I thought it 
would be kind of a kludge, indeed I'm using it on the 'kludge version' of 
my script that runs on terminal, but for the Flask version I wanted 
something more 'professional'. So, calling the site every 5 seconds isn't 
that good because I may get IP banned, how can I continuously check for 
new topics then? =/
>>
>>
>>I'm coding it in a Windows env using PyCharm, but the final version will
run on my NAS Server that runs Linux, but I do prefer to not use something
OS dependent, doesn't matter if it is Win or Linux.
>>
>>
>>On Thu, Oct 16, 2014 at 12:37 AM, Jack Maney <jackmaney@gmail.com> wrote:
>>
>>And to add to what Scott mentioned earlier: you should really consider 
running this in a Linux environment. Unless you're coding in C# or .Net, 
Windows is the red-headed stepchild of coding environments.
>>>
>>>
>>>On Wed, Oct 15, 2014 at 10:35 PM, Jack Maney <jackmaney@gmail.com> wrote:
>>>
>>>Yes, the argument to time.sleep() is the number of seconds to sleep: 
https://docs.python.org/2/library/time.html#time.sleep
>>>>
>>>>
>>>>That said, you might want to really think about whether or not you 
need to know whether a new forum topic pops up every *five seconds*. Not 
only does that seem like overkill, but you might find your IP banned by 
Steam.
>>>>
>>>>
>>>>On Wed, Oct 15, 2014 at 10:13 PM, Juan Christian 
<juan0christian@gmail.com> wrote:
>>>>
>>>>It's more like 5seconds, sleep still good for this?
>>>>>
>>>>>
>>>>>On Wed, Oct 15, 2014 at 11:29 PM, Scott Lipsig 
<scott.lipsig@gmail.com> wrote:
>>>>>
>>>>>Not sure why you’d run flask/python/anything on something that isn’t 
unix-based, but you *can* use celery to do pretty much the same thing. 
I’ve used it to good effect with scrapers that needed built-in delays in 
the past.
>>>>>>
>>>>>>

>>>>>>http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html
>>>>>>http://celery.readthedocs.org/en/latest/reference/celery.schedules.html
>>>>>>
>>>>>>
>>>>>>Of course, if you’re just running something every five minutes, 
there isn’t any reason to use anything more complex than the standard 
library’s time.sleep() function, e.g.:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>from time import sleep
>>>>>>
>>>>>>
>>>>>>while True:
>>>>>>    scrape_things()
>>>>>>    time.sleep(300) # 300 seconds = 5 minutes
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>On Oct 15, 2014, at 5:53 PM, Juan Christian 
<juan0christian@gmail.com> wrote:
>>>>>>
>>>>>>I need something not OS related, it seems that cron is UNIX only.
>>>>>>>
>>>>>>>
>>>>>>>On Wed, Oct 15, 2014 at 9:39 PM, Jack Maney <jackmaney@gmail.com> wrote:
>>>>>>>
>>>>>>>A job set up in Cron: http://en.wikipedia.org/wiki/Cron
>>>>>>>>
>>>>>>>>
>>>>>>>>Of course, any other task scheduler would suffice.
>>>>>>>>
>>>>>>>>
>>>>>>>>On Wed, Oct 15, 2014 at 7:28 PM, Juan Christian 
<juan0christian@gmail.com> wrote:
>>>>>>>>
>>>>>>>>"Cron job", what do you mean by that?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>On Wed, Oct 15, 2014 at 9:17 PM, Jack Maney <jackmaney@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>I'm sure that such a thing can be done with Flask. However, I'm 
not sure that Flask is the right tool for the job, since it's a framework 
for building a web server that *listens* for traffic.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>Why not just set up a cron job that runs, say, every five 
minutes, scrapes the page, checks against the last list of topics (which 
can be stored in a database somewhere), and then fires off a relevant 
alert if new topics appear?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>On Tue, Oct 14, 2014 at 6:31 PM, Juan Christian 
<juan0christian@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>I have an CLi app that continuously call one website using 
requests and time.sleep. Now I need to do that same thing using Flask, is 
there a good way to continuously call a site in order to check if there 
are new content?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>My target: http://steamcommunity.com/app/440/tradingforum/
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>I need to continuously check if there are new topics, Steam 
sadly doesn't provide any sort of API for the forums and I'm getting all 
data needed using bs4.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>My objective: Code defined 'delay time' to check the site and 
whenever a new topic comes up I want to get the URL (full site URL or just
the 'new' part of it).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Is it possible?

Re: [flask] Continuous call to a thir-party site

From:
Matthias Urlichs
Date:
2014-10-16 @ 15:33
Hi,

Juan Christian:
> Yes people, I knew about the sleep() and how it works, but I thought it
> would be kind of a kludge, indeed I'm using it on the 'kludge version' of
> my script that runs on terminal, but for the Flask version I wanted
> something more 'professional'.

So create a (gevent or native) thread within your server script, and do the
sleep+fetch loop there.

But "continuous update" is antisocial and may get you banned. If the server
has an RSS feed, you shouldn't check more often than the feed specifies. If
not, your first task is to ask the server admin for an RSS feed :-P

If (you think) you need to do web scraping, as a server admin I'd consider
an update interval of ten minutes acceptable. Certainly not five seconds.
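
A rough sketch of the "sleep+fetch loop in a thread" suggestion, using a
native thread from the standard library. The class name Poller and its
parameters are invented for this illustration; `fetch` stands for whatever
scraping function you already have, and the interval is left to the caller.

```python
import threading


class Poller:
    """Run `fetch` every `interval` seconds on a background thread."""

    def __init__(self, fetch, on_result, interval):
        self._fetch = fetch
        self._on_result = on_result
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            try:
                self._on_result(self._fetch())
            except Exception as exc:  # keep polling after a failed fetch
                print("poll failed:", exc)
            # Event.wait doubles as a sleep that stop() can interrupt.
            self._stop.wait(self._interval)
```

Started once when the server script starts, this keeps the polling out of
the request handlers entirely. Using Event.wait instead of time.sleep is a
design choice that lets stop() end the loop without waiting out the delay.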

-- 
-- Matthias Urlichs

Re: [flask] Continuous call to a thir-party site

From:
Juan Christian
Date:
2014-10-16 @ 15:43
The Steam Discussions forum doesn't provide an RSS feed or any other good
way to fetch the forum! And they don't care about it.

The target is: http://steamcommunity.com/app/440/tradingforum/

In 10 minutes I would lose tons of possible trades, that's why I'm using 5
seconds. Sometimes different people post 15-20 topics within the same few
seconds, and I can't be 10 minutes late getting these topics, because by
the time I get there everything will be gone.

I already have the terminal version working and indeed it's doing very
well, I'm getting tons of good trades just because I was fast enough with
the right data about the user, his reputation, and his inventory. I'm doing
the work of 5 minutes in 5 seconds.

Re: [flask] Continuous call to a thir-party site

From:
Matthias Urlichs
Date:
2014-10-17 @ 07:16
Hi,

Juan Christian:
> Steam Discussions Forum doesn't provide RSS Feed nor any good way to fetch
> the forum! And they don't care about it.
> 
Bah.

> The target is: http://steamcommunity.com/app/440/tradingforum/
> 
> In 10 minutes I would lose tons of possible trades, that's why I'm using 5
> seconds.

Ah, OK, if the data is _that_ volatile then that's a different matter;
I expect you're not the only one who does something like that.

So. What you probably want to do is to use WAMP or flask-socketio between
your server and your web clients. The clients need to open a channel to
your server and display the data they get from that channel.
The server runs the scraper in a separate thread and sends whatever it gets
to the clients that are connected via SocketIO (or WAMP).
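
To make the data flow concrete, here is a small standard-library sketch of
the fan-out step: one scraper thread publishes, every connected client
receives. It is not SocketIO or WAMP; the Broadcaster class and its method
names are made up for this example, and a per-client queue merely stands in
for the channel a real SocketIO connection would provide.

```python
import json
import queue
import threading


class Broadcaster:
    """Fan out each scraped item to every connected client."""

    def __init__(self):
        self._lock = threading.Lock()
        self._clients = set()

    def connect(self):
        """Register a client and return the queue it should read from."""
        q = queue.Queue()
        with self._lock:
            self._clients.add(q)
        return q

    def disconnect(self, q):
        with self._lock:
            self._clients.discard(q)

    def publish(self, topic):
        """Serialize once, then hand the same JSON string to all clients."""
        message = json.dumps(topic)
        with self._lock:
            for q in self._clients:
                q.put(message)
```

The scraper thread would call publish() for each new topic, while each
client connection reads from its own queue. Zero, one, or many clients can
come and go without the scraper knowing about them.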

SocketIO has the distinct advantage that it works with web clients which
don't support the WebSocket protocol. That includes most Android devices.

I would send JSON to the clients and let JavaScript display the data,
using a template you load from the server, i.e. you render the template on
the client, using Mustache (standalone) or Angular (if you use that
framework anyway).

Alternately, of course, you can also send a rendered HTML snippet via a
Flask template.

You will need to write a few lines of Javascript for this; if that's too
annoying (it is to me …) there's a couple of ways to write Javascript with
more-or-less-Pythonic syntax.

Of course if Steam ever notices and blocks you, you're on your own.
You might want to host the scraper somewhere else and feed the data to your
server with HTTP POST requests containing JSON. Or even another
SocketIO stream. Or RPyC.

--
-- Matthias Urlichs

Re: [flask] Continuous call to a thir-party site

From:
Juan Christian
Date:
2014-10-17 @ 13:30
I'll be using Tor for that.

Re: [flask] Continuous call to a thir-party site

From:
Eric B
Date:
2014-10-17 @ 13:39
I don't want to step on Flask's toes here, but this really sounds like a job
for a non-blocking async framework like Tornado. It has a gen.Task helper
that can be looped indefinitely in a non-blocking manner without using the
sleep call. It also has async requests that make pulling from an API or
scraping data easier on the server.

http://tornadoweb.org

Re: [flask] Continuous call to a thir-party site

From:
Matthias Urlichs
Date:
2014-10-18 @ 17:53
Hi,

Eric B:
> I don't want to step on flasks toes here but this really sounds like a job
> for a non blocking async framework like tornado.

… or gevent. Which is a lot easier to work with if you already have code
you want to use.

-- 
-- Matthias Urlichs

Re: [flask] Continuous call to a thir-party site

From:
Yaroslav Kyrpych
Date:
2014-10-18 @ 19:11
JS is naturally a better-designed language for async jobs than Python, and 
node.js would be the tool of choice.

Re: [flask] Continuous call to a thir-party site

From:
Matthias Urlichs
Date:
2014-10-19 @ 08:06
Hi,

Yaroslav Kyrpych:
> JS is naturally a better-designed language for async jobs than Python,
> and node.js would be the tool of choice.

Well, yes, if you want to write your code as a heap of callbacks.
Which personally I do not -- debugging errors in such code is no fun
because there's no sensible stack trace.

Call me opinionated, but I'd much rather write my server-side code with
gevent. Or as separate processes.

-- 
-- Matthias Urlichs

Re: [flask] Continuous call to a thir-party site

From:
Inada Naoki
Date:
2014-10-19 @ 10:37
ES6 introduces "yield", which Python has had since 2.5.
node.js may catch up with Tornado.

But Python already has "yield from".
I feel Python is better than node.js because of it.



-- 
INADA Naoki  <songofacandy@gmail.com>

Re: [flask] Continuous call to a thir-party site

From:
Eric B
Date:
2014-10-19 @ 14:38
I agree. Having used both nodejs and tornado, I find that the asynchronous
nature of asyncio/tornado, coupled with the seemingly synchronous look of
the code, makes async in Python much easier to read, write, and maintain.
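
As an illustration of that "seemingly synchronous" style, a polling loop
with asyncio might look like the sketch below. Assumptions: it uses the
async/await syntax and asyncio.run, both of which are newer than this
thread (Python 3.7 or later); the names poll and fake_fetch are invented,
and fake_fetch only stands in for a real non-blocking HTTP request.

```python
import asyncio


async def poll(fetch, handle, interval, rounds=None):
    """Await `fetch` every `interval` seconds; stop after `rounds` if given."""
    done = 0
    while rounds is None or done < rounds:
        handle(await fetch())
        done += 1
        # Yields control to the event loop instead of blocking the thread.
        await asyncio.sleep(interval)


async def fake_fetch():
    """Stand-in for a real non-blocking HTTP request."""
    await asyncio.sleep(0)
    return "page"
```

Running asyncio.run(poll(fake_fetch, print, 5)) would loop until
interrupted. The loop reads top to bottom like the time.sleep version, but
other coroutines can run while it waits.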

Re: [flask] Continuous call to a thir-party site

From:
Juan Christian
Date:
2014-10-23 @ 18:49
So guys, let's say I use a while True and time.sleep to do the work, just
to get it working for now, and switch to a better approach later.

@app.route('/')
def index():
    while True:
        response = requests.get(FORUM_URL)
        soup = bs4.BeautifulSoup(response.text)
        topic_id = str(soup.select('a.forum_topic_overlay')[2].attrs.get('href')).split('/')[-2]
        topic_url = FORUM_URL + topic_id

        response = requests.get(topic_url)
        soup = bs4.BeautifulSoup(response.text)

        user_url = soup.select('div.authorline a')[0].attrs.get('href').strip()
        title = soup.select('div.topic')[0].get_text().strip()
        content = soup.select('div.content')[1].get_text().strip()

        user = vanity_url(user_url)
        db.session.add(Topic(topic_id, '11/11/11', 'tradeoffer.com', 'johny',
                             title, content, topic_url))
        db.session.commit()
        print(Topic.query.all())
        return render_template('index.html')

    time.sleep(5)

This code works (I'll add all the data later; as of now I only have the id,
title, message and url). How can I keep this while True running and still
have my page rendered? If I put the render_template before everything, the
while isn't executed; if I put the render_template after the while True, it
will never be reached. How can I get this working?

Re: [flask] Continuous call to a thir-party site

From:
Matthias Urlichs
Date:
2014-10-24 @ 08:12
Hi,

Juan Christian:
> So guys, let's say I'll use a while True and time.sleep to do the work just
> to get it working for now and later I do a better approach.
> 
I'm sorry to sound harsh, but …

Please start thinking, and/or reading what we wrote before.


Your task of polling the Steam forum (which you have ONE instance of) and
the task of talking to your clients (ZERO to MANY) are totally independent
of each other. They share some data, but that's it.

So why do you insist on burying both in the same procedure?

> return render_template('index.html')
> time.sleep(5)
> 
You should by now know that a procedure _returns_ when you call "return",
so anything that you want to happen afterwards (such as a sleep, which you
also forgot to put inside the loop, and in fact the next iteration of the
loop) simply will not happen.
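
A stripped-down illustration of that point, with a list standing in for the
scraping and rendering work. The function name first_pass_only and the
strings are made up for the example.

```python
def first_pass_only(log):
    """Mimic the view above: scrape, then return from inside the loop."""
    while True:
        log.append("scrape")
        return "rendered page"  # leaves the loop AND the function at once
    log.append("sleep")  # never reached, just like the time.sleep(5) above


log = []
result = first_pass_only(log)
```

After the call, `log` holds a single "scrape" entry and no "sleep": the
loop body ran exactly once and nothing after the return was executed.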

Also, this code's job is to render a complete page.
You do not want to replace the whole page (which would in theory be
possible by replacing the "return" with "yield" and adding some Flask
streaming magic, but in Real Life web browsers do not work that way); you
want to update just a tiny part of the page.

> How can I get this while True running and have my page rendered?

You cannot. What this code tries to do is impossible.
Also, you don't want to: having your forms cleared every five seconds is
not a nice user experience. What if the connection is slow? You'd push the
new page, i.e. clear the screen, as soon as the client sees your page.
Also not a nice user experience.

You need the client to go and fetch updates for its data. This means 
polling a REST-ish service and getting some JSON or HTML back, or opening
a SocketIO stream and having the server send updates to you.

Personally I like the SocketIO way much better (less server and network
load and, ultimately, less coding).

Both ways require you to write some JavaScript.

You might want to check out AngularJS or a similar client-side framework
which makes the job of updating the data on the client's page, once you get
it there, much easier to code (AngularJS does it all for you).

So go get your data retrieval loop sorted with gevent, get your client to
do AngularJS and/or JQuery, plus SocketIO (there are many tutorials out
there; for the AngularJS basics I like https://docs.angularjs.org/tutorial
because it comes with a comprehensive test suite; googling for "angularjs
socketio" gets a lot of hits I haven't investigated more closely yet), or
use something else entirely.

Come back when you have the basics down.

-- 
-- Matthias Urlichs

Re: [flask] Continuous call to a thir-party site

From:
Juan Christian
Date:
2014-10-24 @ 13:59
I'll stick with CLI only for now then, it's too much work just to have a
'cool' visual for my tool. My main goal with Flask was to use
Flask-Bootstrap to have a visual for my tool (indeed, it's very easy and
useful). I'd be the only 'client' here; even if I got it working with Flask,
I'd keep it on my local network and not share it.

Maybe I'll go with Tk or Qt to build a GUI. Anyway, thanks guys!

Re: [flask] Continuous call to a thir-party site

From:
Daniel Neuhäuser
Date:
2014-10-23 @ 19:34
> So guys, let's say I'll use a while True and time.sleep to do the work
> just to get it working for now and later I do a better approach.

You're making a big assumption here: that this approach can work. It can't.

> 
> @app.route('/')
> def index():
>     while True:
>         response = requests.get(FORUM_URL)
>         soup = bs4.BeautifulSoup(response.text)
>         topic_id = str(soup.select('a.forum_topic_overlay')[2].attrs.get('href')).split('/')[-2]
>         topic_url = FORUM_URL + topic_id
>
>         response = requests.get(topic_url)
>         soup = bs4.BeautifulSoup(response.text)
>
>         user_url = soup.select('div.authorline a')[0].attrs.get('href').strip()
>         title = soup.select('div.topic')[0].get_text().strip()
>         content = soup.select('div.content')[1].get_text().strip()
>
>         user = vanity_url(user_url)
>         db.session.add(Topic(topic_id, '11/11/11', 'tradeoffer.com', 'johny',
>                              title, content, topic_url))
>         db.session.commit()
>         print(Topic.query.all())
>         return render_template('index.html')
>
>     time.sleep(5)
> 
> This code works (I'll add all the data later, as of now I only have the
> id, title, message and url).

No, it obviously doesn't...

> How can I get this while True running and have my page rendered? If I
> put the render_template before everything the while isn't executed, if I
> put the render_template after the while True it will never be reached, how
> can I get this working?

...as you also seem to be realizing but somehow still feel pressured to
deny. Let me help you relieve that pressure: HTTP, modulo cookies, has no
state. Neither your approach nor any variation thereof can work. It is not
supposed to.


What you should do is use Celery (or something equivalent) to 
periodically call a task. This task should scrape the website and store 
the extracted data in your database or push it to streams (more on that 
later). You should then provide an endpoint doing one of the following 
things:

- Provide the current state of the data in your database.
- Provide an event stream that yields new data as it comes in (what I 
referred to earlier)

On your website you will then need to consume the endpoint, either by 
polling it or by asynchronously consuming the event stream; browsers 
provide an API for the latter.

I think the stream approach is more elegant and it should be more 
efficient but it's also more complicated. So you probably want to go with 
the other approach for now. 

This is how your problem can and should be solved. Doing anything else is 
wrong or - if it happens to be working - significantly more complicated 
and wrong.
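
The simpler of the two options (scraper stores, endpoint reports current
state) can be sketched with the standard library alone. Everything named
here is an assumption made for illustration: TopicStore is an in-memory
stand-in for the database, and each topic is assumed to be a dict with an
"id" key.

```python
import json
import threading


class TopicStore:
    """In-memory stand-in for the database shared by scraper and endpoint."""

    def __init__(self):
        self._lock = threading.Lock()
        self._topics = {}  # topic id -> topic dict, in insertion order

    def add_new(self, scraped):
        """Store unseen topics and return only those that were new."""
        new = []
        with self._lock:
            for topic in scraped:
                if topic["id"] not in self._topics:
                    self._topics[topic["id"]] = topic
                    new.append(topic)
        return new

    def as_json(self):
        """Current state, ready to be returned by a polling endpoint."""
        with self._lock:
            return json.dumps(list(self._topics.values()))
```

The periodic task would call add_new() with each scrape result, and a Flask
view could return as_json() with a JSON content type for the page's
JavaScript to poll. The return value of add_new() is also what you would
push to an event stream if you later move to that approach.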

Re: [flask] Continuous call to a thir-party site

From:
Matthias Urlichs
Date:
2014-10-18 @ 17:56
Hi,

Juan Christian:
> I'll be using Tor for that.
> 
I wonder how many requests will end up taking longer than 5sec ..?

--
-- Matthias Urlichs

Re: [flask] Continuous call to a thir-party site

From:
Scott Lipsig
Date:
2014-10-16 @ 03:33
Absolutely. Invoking time.sleep(5) will work. If you run into any issues 
with flood protection, you may want to randomize 
(https://docs.python.org/2/library/random.html) the number of seconds. I 
doubt you have to worry about it with Steam, but you’ll know it if you see
it.
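
One way to randomize the delay, as a sketch only. The function names and
the 20% spread are arbitrary choices for this example, and `scrape` stands
for your existing scraping function.

```python
import random
import time


def jittered_delay(base, spread=0.2):
    """Return `base` seconds varied by up to +/- `spread` (a fraction of base)."""
    return random.uniform(base * (1 - spread), base * (1 + spread))


def poll_forever(scrape, base=5.0, sleep=time.sleep):
    """Call `scrape` repeatedly, pausing a slightly different time each round."""
    while True:
        scrape()
        sleep(jittered_delay(base))
```

With base=5.0 the pauses fall between roughly 4 and 6 seconds. The `sleep`
parameter exists only so the loop can be exercised without real waiting.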

Note: my prior example should have read: sleep(300) # 300 seconds = 5 minutes

On Oct 15, 2014, at 8:13 PM, Juan Christian <juan0christian@gmail.com> wrote:

> It's more like 5seconds, sleep still good for this?
> 
> On Wed, Oct 15, 2014 at 11:29 PM, Scott Lipsig <scott.lipsig@gmail.com> wrote:
> Not sure why you’d run flask/python/anything on something that isn’t 
> unix-based, but you *can* use celery to do pretty much the same thing.
> I’ve used it to good effect with scrapers that needed built-in delays in
> the past.
> 
> 
> http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html
> http://celery.readthedocs.org/en/latest/reference/celery.schedules.html
> 
> Of course, if you’re just running something every five minutes, there 
> isn’t any reason to use anything more complex than the standard library’s
> time.sleep() function.
> e.g.:
> 
> 
> from time import sleep
> 
> while True:
>     scrape_things()
>     time.sleep(300) # 300 seconds = 5 minutes
> 
> 
> On Oct 15, 2014, at 5:53 PM, Juan Christian <juan0christian@gmail.com> wrote:
> 
>> I need something not OS related, it seems that cron is UNIX only.
>> 
>> On Wed, Oct 15, 2014 at 9:39 PM, Jack Maney <jackmaney@gmail.com> wrote:
>> A job set up in Cron: http://en.wikipedia.org/wiki/Cron
>> 
>> Of course, any other task scheduler would suffice.
>> 
>> On Wed, Oct 15, 2014 at 7:28 PM, Juan Christian
>> <juan0christian@gmail.com> wrote:
>> "Cron job", what do you mean by that?
>> 
>> On Wed, Oct 15, 2014 at 9:17 PM, Jack Maney <jackmaney@gmail.com> wrote:
>> I'm sure that such a thing can be done with Flask. However, I'm not
>> sure that Flask is the right tool for the job, since it's a framework for
>> building a web server that *listens* for traffic.
>> 
>> Why not just set up a cron job that runs, say, every five minutes, 
>> scrapes the page, checks against the last list of topics (which can be
>> stored in a database somewhere), and then fires off a relevant alert if
>> new topics appear?
>> 
>> On Tue, Oct 14, 2014 at 6:31 PM, Juan Christian
>> <juan0christian@gmail.com> wrote:
>> I have an CLi app that continuously call one website using requests and
>> time.sleep. Now I need to do that same thing using Flask, is there a good
>> way to continuously call a site in order to check if there are new
>> content?
>> 
>> My target: http://steamcommunity.com/app/440/tradingforum/
>> 
>> I need to continuously check if there are new topics, Steam sadly 
>> doesn't provide any sort of API for the forums and I'm getting all data
>> needed using bs4.
>> 
>> My objective: Code defined 'delay time' to check the site and whenever 
>> a new topic comes up I want to get the URL (full site URL or just the
>> 'new' part of it).
>> 
>> Is it possible?
>> 
>> 
>> 
>> -- 
>> "Structures are the weapons of the mathematician."
>> --Bourbaki
>> 
>> 
> 
>