librelist archives

Re-computation with line number changes: Bug or a feature?

From:
Shabnam Kadir
Date:
2014-04-15 @ 14:36
could not decode message

Re: Re-computation with line number changes: Bug or a feature?

From:
Shabnam Kadir
Date:
2014-04-15 @ 17:07
could not decode message

Re: [joblib] Re: Re-computation with line number changes: Bug or a feature?

From:
Yannick Schwartz
Date:
2014-04-16 @ 07:52
Hi Shabnam,

The first line number of the function is part of what joblib uses to check
that the function has not changed, to avoid possible collisions, so it's a
feature. The recommended way to go is to:
- cache only the functions in your code that take time and don't change
(too much); create another function if necessary
- if you tend to add stuff above your function definition, maybe your
function should be in another file...

Cheers,
Yannick


On Tue, Apr 15, 2014 at 7:07 PM, Shabnam Kadir <shabnam.kadir@gmail.com> wrote:

> Hi,
>     I just realised that I forgot to send a couple of modules in the
> example I attached in the last email. I am therefore re-sending a tidier
> version.
> Sorry about that.
>
> Thanks again,
> Shabnam
>
>
> On Tue, Apr 15, 2014 at 3:36 PM, Shabnam Kadir <shabnam.kadir@gmail.com> wrote:
>
>> Hi,
>>    After some further investigation I have discovered that changing the
>> line number at which a function is defined now causes re-computation, even
>> if the code itself is unchanged. I attach an IPython Notebook and some .py
>> files which reproduce this problem.
>>     My question: Is this a bug or a feature? From my perspective it is a
>> bug. It means that should I decide to comment some of my code (which is
>> often desirable), I shall have to recompute everything! Is there a way to
>> disable this?
>>
>> Thanks very much,
>> Shabnam
>>
>>
>>
>>
>
>
> --
> Dr. Shabnam Kadir
> Institute of Neurology, Department of Neuroscience, Physiology, and
> Pharmacology
> University College London
> 21 University Street
> London WC1E 6DE
> Tel: +44 (0)20 3108 2407
>

Re: [joblib] Re: Re-computation with line number changes: Bug or a feature?

From:
Shabnam Kadir
Date:
2014-04-16 @ 09:01
Hi Yannick,
     Thanks very much for getting back to me. Please could you clarify: is
this feature of joblib present in all versions of joblib? It would be nice
if the user had the option to turn this feature off, because I think the 
error of potentially defining the same function identically twice in the 
same file is a very rare one! If joblib does notice such a thing it could 
just issue a warning, thus forcing the user to get rid of one. 
     In science you often go back and change some analysis code but do not
want to re-run parts of your pipeline to obtain intermediate
results. It is inconvenient to keep putting new functions in new places
and never comment old functions. I think there will be demand for
turning off this feature.

Thanks, 
Shabnam


Re: [joblib] Re: Re-computation with line number changes: Bug or a feature?

From:
Yannick Schwartz
Date:
2014-04-16 @ 11:54
The first-line check has been there for a very long time. If you feel there
should be an option, you could open an issue on GitHub and discuss it there.



Re: [joblib] Re: Re-computation with line number changes: Bug or a feature?

From:
Gael Varoquaux
Date:
2014-04-21 @ 10:41
> The first line check has been there for a very long time. If you feel there
> should be an option you could open an issue on github, and discuss it there.

The reason for this is that a file could have two functions with the same
name. Yes, it's bad practice, but people do this, especially in research
labs.

As joblib needs to be robust to misbehavior, we need to have this feature
in.
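
To make the collision concrete, here is a minimal sketch (standard library only, not joblib's actual code) of a file that defines `f` twice. The two `cache_key_*` helpers are hypothetical names invented for this illustration; they only show why a key built from the name alone cannot tell the two definitions apart, while a key that also includes the first line number can.

```python
# A module source that (badly) defines the same function name twice.
SOURCE = (
    "def f():\n"
    "    return 'first'\n"
    "\n"
    "first_f = f  # keep a handle on the first definition\n"
    "\n"
    "def f():\n"
    "    return 'second'\n"
)

namespace = {}
exec(compile(SOURCE, "<two_defs>", "exec"), namespace)
first_f, second_f = namespace["first_f"], namespace["f"]


def cache_key_by_name(func):
    """Hypothetical cache key: the function name only."""
    return func.__name__


def cache_key_with_line(func):
    """Hypothetical cache key: name plus line of the 'def' statement."""
    return (func.__name__, func.__code__.co_firstlineno)


# Both definitions share a name, so a name-only key collides...
names_collide = cache_key_by_name(first_f) == cache_key_by_name(second_f)
# ...while adding the first line number keeps them apart.
lines_differ = cache_key_with_line(first_f) != cache_key_with_line(second_f)
print(names_collide, lines_differ)
```

The same property that separates the two definitions is also what makes such a key sensitive to moving a function within its file, which is the trade-off discussed in this thread.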

Re: [joblib] Re: Re-computation with line number changes: Bug or a feature?

From:
Shabnam Kadir
Date:
2014-04-21 @ 12:24
Indeed, that was what I suspected. However, when a function is defined
more than once in the same script, why not have joblib throw an exception,
thus forcing the user to delete the extraneous copy? This would nip such bad
practice in the bud. As it currently stands, the user, unless very
observant, just wonders 'why on earth is joblib recomputing? I didn't
change the function at all! I just displaced it by 10 lines in the script.'

I would just like to emphasise that apart from this one issue, I think
joblib is marvellous and shall probably always incorporate it in all my
Python code.

Shabnam



Re: [joblib] Re: Re-computation with line number changes: Bug or a feature?

From:
gael.varoquaux@normalesup.org
Date:
2014-04-21 @ 21:23
> How about, if you have the source code, you just do a regexp search
> for 'def f(' (with the appropriate spaces/word boundaries/etc.).

Regexp searches are very fragile, and will give errors. Think about the
following:

def f():
    pass

class F(object):

    def f(self):
        pass

if True:
    def f():
        pass

I think that this is valid code, and not horribly dirty code, and this is
likely to break a lot of simple analysis heuristics.
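
As a rough illustration of this fragility, the sketch below (standard library only, not taken from joblib) runs a naive regular expression over the snippet above and compares it with a walk over the parsed syntax tree. The pattern and the `find_f` helper are assumptions made up for this example. The regexp reports three hits without saying which is the method, whereas the syntax tree at least records the enclosing scope.

```python
import ast
import re

SOURCE = '''\
def f():
    pass

class F(object):

    def f(self):
        pass

if True:
    def f():
        pass
'''

# Naive approach: look for "def f(" at the start of a line, any indentation.
pattern = re.compile(r"^[ \t]*def[ \t]+f[ \t]*\(", re.MULTILINE)
regex_lines = [SOURCE.count("\n", 0, m.start()) + 1
               for m in pattern.finditer(SOURCE)]


def find_f(node, scope="module"):
    """Return (line, enclosing scope) for every definition named f.

    Bodies of other functions are not searched; this is only a sketch.
    """
    found = []
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.FunctionDef):
            if child.name == "f":
                found.append((child.lineno, scope))
        elif isinstance(child, ast.ClassDef):
            found.extend(find_f(child, "class " + child.name))
        else:
            # e.g. the "if True:" block, which still binds f at module level
            found.extend(find_f(child, scope))
    return found


ast_defs = find_f(ast.parse(SOURCE))
print(regex_lines)
print(ast_defs)
```

Even with the syntax tree, two of the three definitions end up bound to the module-level name `f`, so knowing the name is still not enough to decide which piece of source belongs to a given function object.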

It is better to have a suboptimal behavior that is predictable and
robust than a behavior that tries to be clever but ends up being
unpredictable. Hacks just lead to more problems in the long run.



More seriously, beyond these hacks, I think that storing the line number
is already a hack. I remember that I had to do it for a good reason, and
that it was related both to multiple functions with the same name and to
the difficulty of extracting function code given a name.

Now, if we slowly try to walk through the logic here, and try to find
what is foolproof and what is robust... Maybe we don't need to check for
line number, maybe the shadowing of two functions can be detected in a
different way. To answer this question, I am afraid that we are going to
have to go a bit deeper in the logic of joblib, and it's been 5 years
since I wrote this code...


G


PS: let's keep this on the mailing list, so that others can pitch in.

Re: [joblib] Re: Re-computation with line number changes: Bug or a feature?

From:
gael.varoquaux@normalesup.org
Date:
2014-04-21 @ 21:39
The part of the code of joblib that seems important is around line 569 of
memory.py, if you are interested in understanding what is going on.

Clearly, there is some effort that has been put in there to detect
collisions. I remember that this was tricky and important.

Gaël

Re: [joblib] Re: Re-computation with line number changes: Bug or a feature?

From:
gael.varoquaux@normalesup.org
Date:
2014-04-21 @ 22:06
After even more investigation, I find that I cannot reproduce the
problem. Indeed, if you put the code below in a file and execute it
several times, then remove the commented line (which will change the line
number of the definition of the function), and run the file again, joblib
does not recompute.

This is actually quite clear from the joblib source code: on line 550 of
memory.py we are testing for equality of the code. So this tells me that
the problem isn't the one claimed.

I think that I understand the issue: you are not reloading the function,
but you are changing the lines of code? This does ring a bell: there is
an unwanted behavior here. But it is much more involved than a simple
line number. It's related to the fact that the in-memory representation
of a function may be out of sync with its disk storage.

The way around it would probably require having two different ways of
invalidating the cache: one for a function that is already in memory, and
one for a function that is reloaded. That's a lot of work.

Gaël
_______________________________________________________________________________
from joblib import Memory

mem = Memory(cachedir='/tmp')

# Remove this line

def foo(x):
    return x

mfoo = mem.cache(foo)

mfoo(1)
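
The point made in this message, that moving a definition changes its line number but not its code, can be checked with the standard library alone. The sketch below does not use joblib and does not reproduce its comparison logic (which the message says tests equality of the code); it compares compiled bytecode purely as a stand-in, and `load_foo` is a hypothetical helper introduced for the illustration.

```python
# Two versions of the same file: the second has extra lines above foo.
SRC_A = "def foo(x):\n    return x\n"
SRC_B = "# Remove this line\n\n" + SRC_A


def load_foo(source):
    """Execute a source string and return the foo it defines."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace["foo"]


foo_a = load_foo(SRC_A)
foo_b = load_foo(SRC_B)

# The line on which the definition starts moves with the extra lines...
line_a = foo_a.__code__.co_firstlineno
line_b = foo_b.__code__.co_firstlineno

# ...but the compiled body is identical, so a check based on the code
# itself (rather than on its position) sees no change.
same_bytecode = foo_a.__code__.co_code == foo_b.__code__.co_code
print(line_a, line_b, same_bytecode)
```

A function object that is already loaded keeps the `co_firstlineno` it was compiled with, even if the file on disk is edited afterwards, which is one way the in-memory and on-disk views described above can drift apart.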

Re: [joblib] Re: Re-computation with line number changes: Bug or a feature?

From:
gael.varoquaux@normalesup.org
Date:
2014-04-22 @ 18:07
OK, I have dug into this, and implemented something that I believe is
a viable option:
https://github.com/joblib/joblib/pull/131

This is quite tricky, so it will need review and validation before it can
maybe be merged.

You two owe me a few hours of sleep (I finished that PR at 2:30 AM)...

Gaël

Re: [joblib] Re: Re-computation with line number changes: Bug or a feature?

From:
Gael Varoquaux
Date:
2014-04-21 @ 12:27
The goal of joblib is that if something works without joblib, it should
work with joblib.

Achieving this goal perfectly is impossible, of course, because of things
like side effects, but we try to get as close as possible.

Thanks a lot for your great feedback on joblib.

Gaël

-- 
    Gael Varoquaux
    Researcher, INRIA Parietal
    Laboratoire de Neuro-Imagerie Assistee par Ordinateur
    NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
    Phone:  ++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux