librelist archives

bug when caching methods?

From:
Nicolas Pinto
Date:
2011-03-30 @ 23:14
Hello,

I have two questions regarding method caching with joblib (I have to
run, sorry if the explanations are half baked):

(1) I'm trying to cache methods as explained in:
http://packages.python.org/joblib/memory.html#gotchas
but every time I use the same method with the same data, I don't hit
the cache because the object itself contains a reference to a
MemorizedFunc with a different timestamp (see code below).

Is this expected? How to avoid this? I guess filter_args could
somehow make the distinction -- let me know how I can contribute
there.

Here is the code:

import time

from nose.tools import assert_equal

from joblib import Memory
from joblib.func_inspect import filter_args
from joblib.hashing import hash

class Foo(object):

    def __init__(self):
        mem = Memory(cachedir='./bug/', verbose=True)
        # Replace the bound method with its cached (MemorizedFunc) version.
        self.method = mem.cache(self.method)

    def method(self, x):
        return x

foo1 = Foo()
foo2 = Foo()

# With identical timestamps, the hashes of the filtered arguments match.
foo1.method.timestamp = 0
foo2.method.timestamp = 0
fargs1 = filter_args(foo1.method.func, [], 1)
fargs2 = filter_args(foo2.method.func, [], 1)
assert_equal(hash(fargs1), hash(fargs2))

# With different timestamps, the hashes differ and this assertion fails.
foo1.method.timestamp = time.time()
foo2.method.timestamp = time.time()
fargs1 = filter_args(foo1.method.func, [], 1)
fargs2 = filter_args(foo2.method.func, [], 1)
assert_equal(hash(fargs1), hash(fargs2))


(2) in test_bound_methods() (test/test_func_inspect.py), we are
expecting different signatures, but we are not expecting different
hashes, right? If so, you may add the following unit test:

def test_bound_methods_hash():
   """ Make sure that calling the same method on two different instances
       of the same class does resolve to the same hashes.
   """
   from joblib.hashing import hash
   nose.tools.assert_equal(hash(filter_args(a.f, [], 1)),
                               hash(filter_args(b.f, [], 1)))

Thanks for your help.

--
Nicolas Pinto
http://web.mit.edu/pinto

Re: [joblib] bug when caching methods?

From:
Gael Varoquaux
Date:
2011-03-31 @ 05:29
On Wed, Mar 30, 2011 at 07:14:10PM -0400, Nicolas Pinto wrote:
> I have two questions regarding method caching with joblib [...]
> (1) I'm trying to cache methods as explained in:
> http://packages.python.org/joblib/memory.html#gotchas
> but every time I use the same method with the same data, I don't hit
> the cache because the object itself contains a reference to a
> MemorizedFunc with a different timestamp (see code below).

> Is this expected? How to avoid this? I guess filter_args could
> somehow make the distinction -- let me know how I can contribute
> there.

You are right, the addition of the timestamp broke that example, and it
wasn't tested. So, we need a fix for that example to work again, or at
least to have a way to provide the same functionality.

One simple option that I see would be to implement our own pickling
strategy for MemorizedFunc (and maybe for Memory too) to expose a
pickling that does not save the timestamp. As hashing relies on pickling,
that should do the trick.
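
To illustrate why the timestamp matters, here is a minimal sketch that uses only the standard library. `CachedFunc` and `pickle_hash` are hypothetical stand-ins, not joblib's MemorizedFunc or joblib.hashing.hash; the only assumption is that the hash is computed from the pickled bytes of the object.

```python
import hashlib
import pickle


class CachedFunc(object):
    """Hypothetical stand-in for a cached-function wrapper (not joblib code)."""

    def __init__(self, name, timestamp):
        self.name = name
        self.timestamp = timestamp  # in real use, set when the wrapper is created


def pickle_hash(obj):
    """Hash an object through its pickled bytes (assumed hashing strategy)."""
    return hashlib.sha256(pickle.dumps(obj, protocol=2)).hexdigest()


same_a = CachedFunc("method", timestamp=0)
same_b = CachedFunc("method", timestamp=0)
later = CachedFunc("method", timestamp=1301526850.0)

# Identical state gives identical pickles, so the hashes match.
print(pickle_hash(same_a) == pickle_hash(same_b))
# A different timestamp changes the pickled bytes, so the hashes differ.
print(pickle_hash(same_a) == pickle_hash(later))
```

Any attribute that ends up in the pickle takes part in the hash, which is why excluding the timestamp from pickling is enough to make the hashes agree.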

The easiest way should be to provide a __getinitargs__ method to these
objects, that does not include the timestamp:
http://docs.python.org/library/pickle.html#object.__getinitargs__
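
As far as I know, __getinitargs__ is only honoured for Python 2 old-style classes. A rough equivalent that also works on Python 3 is `__reduce__`, which tells pickle to rebuild the object from constructor arguments. The sketch below swaps in that technique; `CachedFunc` is a hypothetical class for illustration, not joblib's MemorizedFunc.

```python
import hashlib
import pickle
import time


class CachedFunc(object):
    """Hypothetical wrapper, used only to illustrate the idea."""

    def __init__(self, name):
        self.name = name
        self.timestamp = time.time()

    def __reduce__(self):
        # Ask pickle to rebuild the object by calling CachedFunc(name);
        # the timestamp is not part of the pickled data.
        return (self.__class__, (self.name,))


a = CachedFunc("method")
a.timestamp = 1.0  # force distinct timestamps for the demonstration
b = CachedFunc("method")
b.timestamp = 2.0

digest_a = hashlib.sha256(pickle.dumps(a, protocol=2)).hexdigest()
digest_b = hashlib.sha256(pickle.dumps(b, protocol=2)).hexdigest()
print(digest_a == digest_b)

# Unpickling calls __init__ again, so the clone gets a fresh timestamp.
clone = pickle.loads(pickle.dumps(a))
print(clone.name)
```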

I was hoping to release a new version of joblib today, to be able to
integrate it in scikits-learn and open the door to some Py3k work at the
sprint tomorrow. However, I fear that this is adding a bit too much stuff
on my plate, and I won't have enough time, given that I have 'real-life'
obligations. Any chance that you could fork joblib on github and
implement this?

> (2) in test_bound_methods() (test/test_func_inspect.py), we are
> expecting different signatures, but we are not expecting different
> hashes, right? If so, you may add the following unit test:

Good point. If you do fork joblib, could you please add the test?

Thanks,

G

Re: [joblib] bug when caching methods?

From:
Nicolas Pinto
Date:
2011-03-31 @ 13:07
> obligations. Any chance that you could fork joblib on github and
> implement this?

> Good point. If you do fork joblib, could you please add the test.

Sounds good. Will do asap.

Thanks.

-- 
Nicolas Pinto
http://web.mit.edu/pinto

Re: [joblib] bug when caching methods?

From:
Pietro Berkes
Date:
2011-03-31 @ 13:09
I'm not 100% sure I'm following the discussion, but wouldn't patch
e0b01e77a2577d36247097eee9a300f5b721ba74
solve the issue?
It redefined Hasher.save to allow methods to be decorated, and
repackages them in a new format that does not include timestamps.
P.



Re: [joblib] bug when caching methods?

From:
Gael Varoquaux
Date:
2011-04-01 @ 05:55
On Thu, Mar 31, 2011 at 09:09:12AM -0400, Pietro Berkes wrote:
> I'm not 100% sure I'm following the discussion, but wouldn't patch
> e0b01e77a2577d36247097eee9a300f5b721ba74
> solve the issue?
> It redefined Hasher.save to allow methods to be decorated, and
> repackages them in a new format that does not include timestamps.

Hi Pietro,

Your fix and Nico's problem were somewhat orthogonal issues, as Nico's
problem was related to a very specific usage pattern.

G

Re: [joblib] bug when caching methods?

From:
Pietro Berkes
Date:
2011-03-31 @ 13:14
:-)
The number is probably not very useful: it's the patch in the master
branch with comment
FIX: decorated methods could not be pickled
made on Sep 2, 2010.


Re: [joblib] bug when caching methods?

From:
Nicolas Pinto
Date:
2011-03-31 @ 20:54
Thanks Pietro. Hasher.save() is actually not used when you call
hash(). I followed Gael's suggestion and used the pickling protocol
with __getstate__ in MemorizedFunc to nuke the timestamp.
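
The actual change is in the pull request below. As a rough illustration only, the __getstate__ approach can be sketched with the standard library as follows; `CachedFunc` and `pickle_hash` are hypothetical names, and the `__setstate__` method is my own assumption about how a fresh timestamp could be restored on unpickling.

```python
import hashlib
import pickle
import time


class CachedFunc(object):
    """Hypothetical stand-in for MemorizedFunc (not joblib's actual code)."""

    def __init__(self, name):
        self.name = name
        self.timestamp = time.time()  # differs between instances

    def __getstate__(self):
        # Copy the instance dict and drop the timestamp, so that it is
        # neither pickled nor taken into account when hashing.
        state = self.__dict__.copy()
        state.pop("timestamp", None)
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and give the object a new timestamp.
        self.__dict__.update(state)
        self.timestamp = time.time()


def pickle_hash(obj):
    """Hash an object through its pickled bytes (assumed hashing strategy)."""
    return hashlib.sha256(pickle.dumps(obj, protocol=2)).hexdigest()


first = CachedFunc("method")
first.timestamp = 1.0  # force distinct timestamps for the demonstration
second = CachedFunc("method")
second.timestamp = 2.0

# The timestamps differ, but they are excluded from the pickled state.
print(pickle_hash(first) == pickle_hash(second))

restored = pickle.loads(pickle.dumps(first))
print(restored.name)
```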

Pull request at:
https://github.com/joblib/joblib/pull/3

HTH

Best,

N


-- 
Nicolas Pinto
http://web.mit.edu/pinto

Re: [joblib] bug when caching methods?

From:
Gael Varoquaux
Date:
2011-04-01 @ 05:53
On Thu, Mar 31, 2011 at 04:54:54PM -0400, Nicolas Pinto wrote:
> Thanks Pietro. Hasher.save() is actually not used when you call
> hash(). I followed Gael's suggestion and used the pickling protocol
> with __getstate__ in MemorizedFunc to nuke the timestamp.

> Pull request at:
> https://github.com/joblib/joblib/pull/3

Thanks. This should now be fixed. I am going to release 0.5.0 any time now.

G

Re: bug when caching methods?

From:
Nicolas Pinto
Date:
2011-03-30 @ 23:22
Related question: would it be a good idea to expose the 'timestamp'
parameter in the Memory.cache() decorator ?

N


-- 
Nicolas Pinto, PhD
http://web.mit.edu/pinto

Re: [joblib] Re: bug when caching methods?

From:
Gael Varoquaux
Date:
2011-03-31 @ 05:30
On Wed, Mar 30, 2011 at 07:22:11PM -0400, Nicolas Pinto wrote:
> Related question: would it be a good idea to expose the 'timestamp'
> parameter in the Memory.cache() decorator ?

I have no strong opinion on this. I wouldn't use it, but I have nothing
against it.

G