librelist archives


Can joblib save things as one file

From: Gael Varoquaux
Date: 2012-02-10 @ 06:38
I am replying to David Warde-Farley's tweet:
"""Any (relatively) simple way to make joblib save things as one file? I have whiny lab mates."""
https://twitter.com/#!/dwf/status/167729531334033408

I am answering here because I hate using Twitter as a replacement for
email.

David, I am not sure exactly what you mean. If the question is whether
the whole database of a Memory instance can be saved in one file, the
answer is no, and there is a good reason for it: storing all the entries
in one file would make handling the race conditions between multiple
processes sharing the same store much harder, and much slower. If the
question is whether it is possible to use 'joblib.dump' to store an
object in one single file, the answer is yes:
http://packages.python.org/joblib/generated/joblib.dump.html#joblib.dump
just pass 'compress=1' and 'cache_size=1e9'. As the argument
'cache_size=1e9' suggests, the risk is running out of memory on dump or
load: since it is much harder to fragment access to the data, more
memory will be used.
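For comparison, here is a minimal sketch of the single-file route with a recent joblib release. This is an assumption about newer versions, not the 2012 behaviour described above: as far as I know, since joblib 0.10 `joblib.dump` embeds NumPy arrays in the pickle stream, so one file is produced even without compression, and `cache_size` is deprecated and ignored. The payload and file name are made up for illustration.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical payload: NumPy arrays mixed with plain Python objects.
data = {
    "weights": np.arange(100_000, dtype=np.float64).reshape(500, 200),
    "labels": np.arange(500, dtype=np.int32),
    "meta": {"run": "example"},
}

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.pkl")

# compress=1 selects light zlib compression. cache_size is not passed:
# in recent joblib versions it is deprecated and has no effect.
joblib.dump(data, path, compress=1)

files_on_disk = os.listdir(tmp_dir)  # lists what dump actually wrote
restored = joblib.load(path)
```

With the old multi-file behaviour, `files_on_disk` would also contain one `.npy`-style file per array; with a recent joblib it should contain only `data.pkl`.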

Does that answer your question? And yes, your lab mates are whiny.

Gaël

Re: [joblib] Can joblib save things as one file

From: Olivier Grisel
Date: 2012-02-10 @ 07:17
2012/2/10 Gael Varoquaux <gael.varoquaux@normalesup.org>:
> just pass 'compress=1' and 'cache_size=1e9'. As the argument
> 'cache_size=1e9' suggests, the risk is running out of memory on dump or
> load: since it is much harder to fragment access to the data, more
> memory will be used.
>
> Does that answer your question? And yes, your lab mates are whiny.

I guess it would also be possible to do it without compression by
generating a file with the following structure:

[python pickle header][memmap sizes, shape and type infos]
[memmap for numpy array #0][memmap for numpy array #1][memmap for numpy array #2][...]
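The layout above can be sketched in a few lines. This is a hypothetical illustration, not joblib code: `dump_arrays` and `load_arrays` are made-up names, and the small fixed-size prefix (data offset plus header length) and the 64-byte padding are assumptions added so the reader can locate the pickled header and each array payload, which is then memory-mapped with `np.memmap(..., offset=...)`.

```python
import os
import pickle
import struct
import tempfile

import numpy as np

ALIGN = 64  # byte boundary for each array payload (an assumption, not from the thread)


def _round_up(n, align=ALIGN):
    return (n + align - 1) // align * align


def dump_arrays(arrays, path):
    """Write a dict of NumPy arrays to ONE file laid out as
    [data_start, header_len][pickled header][array #0][array #1]...
    The header maps each name to (dtype string, shape, offset in the data section)."""
    arrays = {name: np.ascontiguousarray(a) for name, a in arrays.items()}
    header, offset = {}, 0
    for name, arr in arrays.items():
        header[name] = (arr.dtype.str, arr.shape, offset)
        offset = _round_up(offset + arr.nbytes)
    header_bytes = pickle.dumps(header, protocol=pickle.HIGHEST_PROTOCOL)
    data_start = _round_up(16 + len(header_bytes))  # 16 = two uint64 prefix fields
    with open(path, "wb") as f:
        f.write(struct.pack("<QQ", data_start, len(header_bytes)))
        f.write(header_bytes)
        for name, arr in arrays.items():
            f.seek(data_start + header[name][2])  # skipped padding reads back as zeros
            f.write(arr.tobytes())


def load_arrays(path):
    """Read the pickled header, then memory-map every array in place (read-only)."""
    with open(path, "rb") as f:
        data_start, header_len = struct.unpack("<QQ", f.read(16))
        header = pickle.loads(f.read(header_len))  # only load files you trust
    return {
        name: np.memmap(path, dtype=np.dtype(dt), mode="r",
                        offset=data_start + off, shape=shape)
        for name, (dt, shape, off) in header.items()
    }


# Usage
path = os.path.join(tempfile.mkdtemp(), "arrays.bin")
original = {
    "a": np.arange(10, dtype=np.float64).reshape(2, 5),
    "b": np.array([1, 2, 3], dtype=np.int16),
}
dump_arrays(original, path)
loaded = load_arrays(path)
```

Because the arrays come back as memory maps, only the pages that are actually touched get read from disk, which keeps the single file cheap to open even when it is large.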

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: [joblib] Can joblib save things as one file

From: Gael Varoquaux
Date: 2012-02-10 @ 07:53
On Fri, Feb 10, 2012 at 08:17:37AM +0100, Olivier Grisel wrote:
> I guess it would also be possible to do it without compression by
> generating a file with the following structure:

> [python pickle header][memmap sizes, shape and type infos][memmap for
> numpy array #0][memmap for numpy array #1][memmap for numpy array

That would technically be possible. Whether it solves any useful
problem or is just one more feature in a feature-creep race, I
do not know.

G

Re: [joblib] Can joblib save things as one file

From: Olivier Grisel
Date: 2012-02-10 @ 08:00
2012/2/10 Gael Varoquaux <gael.varoquaux@normalesup.org>:
> That would technically be possible. Whether it solves any useful
> problem or is just one more feature in a feature-creep race, I
> do not know.

I agree. Maybe for streaming the data over a network, but there
might be better ways that would not incur any array copy at all.
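One copy-avoiding approach, sketched under assumptions: send a tiny JSON header (dtype and shape) followed by the array's own buffer, and receive straight into a preallocated array with `socket.recv_into`. The helper names and the 4-byte length prefix are hypothetical; the kernel still copies data into its socket buffers, but no intermediate `bytes` copy of the array is built in Python on either side.

```python
import json
import socket
import struct
import threading

import numpy as np


def _recv_into(sock, view):
    """Fill the writable buffer `view` completely from `sock`."""
    got = 0
    while got < len(view):
        n = sock.recv_into(view[got:])
        if n == 0:
            raise ConnectionError("peer closed the connection early")
        got += n


def send_array(sock, arr):
    """Hypothetical wire format: [4-byte header length][JSON header][raw array bytes]."""
    arr = np.ascontiguousarray(arr)
    header = json.dumps({"dtype": arr.dtype.str, "shape": arr.shape}).encode()
    sock.sendall(struct.pack("<I", len(header)) + header)
    # A flat uint8 view shares memory with `arr`: no Python-level copy of the data.
    sock.sendall(arr.reshape(-1).view(np.uint8))


def recv_array(sock):
    prefix = bytearray(4)
    _recv_into(sock, memoryview(prefix))
    (header_len,) = struct.unpack("<I", prefix)
    header = bytearray(header_len)
    _recv_into(sock, memoryview(header))
    meta = json.loads(header.decode())
    out = np.empty(tuple(meta["shape"]), dtype=np.dtype(meta["dtype"]))
    # Receive directly into the array's memory through a flat byte view.
    _recv_into(sock, memoryview(out.reshape(-1).view(np.uint8)))
    return out


# Usage: a local socket pair stands in for a network connection.
left, right = socket.socketpair()
sent = np.arange(12, dtype=np.float32).reshape(3, 4)
sender = threading.Thread(target=send_array, args=(left, sent))
sender.start()
received = recv_array(right)
sender.join()
left.close()
right.close()
```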

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: [joblib] Can joblib save things as one file

From: Gael Varoquaux
Date: 2012-02-10 @ 09:19
On Fri, Feb 10, 2012 at 09:00:59AM +0100, Olivier Grisel wrote:
> I agree. Maybe for streaming the data over some network

Or for putting it in a database: @schwarty is working on that right now,
but that's a different use case from saving to disk.
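The thread does not say how @schwarty stores arrays in a database, so the following is only a sketch of the general idea under stated assumptions: serialize each array to `.npy` bytes in memory (which keeps dtype and shape) and store it as a BLOB, here in an in-memory SQLite table with a made-up `arrays` schema.

```python
import io
import sqlite3

import numpy as np


def array_to_blob(arr):
    """Serialize an array (dtype and shape included) to .npy bytes in memory."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return buf.getvalue()


def blob_to_array(blob):
    """Inverse of array_to_blob."""
    return np.load(io.BytesIO(blob), allow_pickle=False)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrays (name TEXT PRIMARY KEY, data BLOB)")

original = np.linspace(0.0, 1.0, 6).reshape(2, 3)
conn.execute("INSERT INTO arrays VALUES (?, ?)", ("x", array_to_blob(original)))
(blob,) = conn.execute("SELECT data FROM arrays WHERE name = ?", ("x",)).fetchone()
restored = blob_to_array(blob)
conn.close()
```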

G