Re: [joblib] Can joblib save things as one file
From: Olivier Grisel
Date: 2012-02-10 @ 07:17
2012/2/10 Gael Varoquaux <gael.varoquaux@normalesup.org>:
> I am replying to David Warde-Farley's tweet:
> """Any (relatively) simple way to make
> joblib save things as one file? I have whiny lab mates."""
> https://twitter.com/#!/dwf/status/167729531334033408
>
> I am answering here, because I hate using Twitter as an email
> replacement.
>
> David, I am not sure what you mean exactly. If the question is whether
> the whole database of a Memory instance can be saved in one file, the
> answer is no. And there is a good reason for it: storing all the entries
> in one file would make dealing with the race conditions of multiple
> processes using the same store much harder, and much slower. If the
> question is whether it is possible to use 'joblib.dump' to store an
> object in one single file, the answer is yes:
> http://packages.python.org/joblib/generated/joblib.dump.html#joblib.dump
> just pass 'compress=1' and 'cache_size=1e9'. As you can guess from the
> argument 'cache_size=1e9', the risk is blowing up your memory on dump or
> load, as it is much harder to fragment the access to the data, and more
> memory will be used.
>
> Does that answer your question? And yes, your lab mates are whiny.
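For reference, the single-file 'joblib.dump' usage described in the quoted
message could look roughly like the sketch below. The helper name
'roundtrip' and the file name 'result.joblib' are made up for illustration.
Only 'compress=1' is passed: the 'cache_size' argument mentioned above
belongs to the joblib release discussed in this thread, and I assume it is
not accepted by newer releases, so it is left out.

```python
import os
import tempfile

import joblib
import numpy as np


def roundtrip(obj):
    """Dump obj with compression, then report what landed on disk.

    Hypothetical helper for illustration; not part of joblib.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "result.joblib")
        # With compression enabled, the object and its arrays are
        # written into the single file at `path`.
        written = joblib.dump(obj, path, compress=1)
        files_on_disk = sorted(os.listdir(tmp))
        # Loading reads and decompresses the whole object in memory,
        # which is the memory cost mentioned in the quoted message.
        restored = joblib.load(path)
    return written, files_on_disk, restored


if __name__ == "__main__":
    data = {"weights": np.arange(1000.0), "label": "demo"}
    written, files_on_disk, restored = roundtrip(data)
    print(files_on_disk)
```

The trade-off is the one stated in the quote: a compressed dump cannot be
memory-mapped, so the full object has to fit in memory on both dump and load.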
I guess it would also be possible to do it without compression by
generating a file with the following structure:
[python pickle header][memmap sizes, shape and type infos][memmap for
numpy array #0][memmap for numpy array #1][memmap for numpy array
#2][...]
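A minimal sketch of that layout, using only 'pickle', 'struct' and
'numpy.memmap', might look like this. It is not joblib code: the function
names 'dump_single_file' and 'load_single_file', the 8-byte length prefix
and the index format are all assumptions made for the example, and it only
handles non-empty arrays of plain (non-object) dtypes.

```python
import pickle
import struct

import numpy as np


def dump_single_file(arrays, path):
    """Write [header length][pickled index][raw array #0][raw array #1]...

    Hypothetical layout for illustration only.
    """
    arrays = [np.ascontiguousarray(a) for a in arrays]
    index = []
    offset = 0  # offset of each array, relative to the start of the data area
    for a in arrays:
        index.append({"dtype": a.dtype.str, "shape": a.shape, "offset": offset})
        offset += a.nbytes
    header = pickle.dumps(index, protocol=2)
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # 8-byte little-endian length
        f.write(header)
        for a in arrays:
            f.write(a.tobytes())


def load_single_file(path, mode="r"):
    """Return one memmap per stored array, without reading the array data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        index = pickle.loads(f.read(header_len))
    data_start = 8 + header_len
    return [
        np.memmap(
            str(path),
            dtype=np.dtype(entry["dtype"]),
            mode=mode,
            offset=data_start + entry["offset"],
            shape=entry["shape"],
        )
        for entry in index
    ]
```

Because every array is addressed by an offset into the same file, loading
only reads the small pickled index; the array bytes are paged in lazily by
the memmaps.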
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [joblib] Can joblib save things as one file
From: Gael Varoquaux
Date: 2012-02-10 @ 07:53
On Fri, Feb 10, 2012 at 08:17:37AM +0100, Olivier Grisel wrote:
> I guess it would also be possible to do it without compression by
> generating a file with the following structure:
> [python pickle header][memmap sizes, shape and type infos][memmap for
> numpy array #0][memmap for numpy array #1][memmap for numpy array
That would technically be possible. Whether it solves any useful
problem, or it is just one additional feature in a feature-creep race, I
do not know.
G
Re: [joblib] Can joblib save things as one file
From: Olivier Grisel
Date: 2012-02-10 @ 08:00
2012/2/10 Gael Varoquaux <gael.varoquaux@normalesup.org>:
> On Fri, Feb 10, 2012 at 08:17:37AM +0100, Olivier Grisel wrote:
>> I guess it would also be possible to do it without compression by
>> generating a file with the following structure:
>
>> [python pickle header][memmap sizes, shape and type infos][memmap for
>> numpy array #0][memmap for numpy array #1][memmap for numpy array
>
> That would technically be possible. Whether it solves any useful
> problem, or it is just one additional feature in a feature-creep race, I
> do not know.
I agree. Maybe for streaming the data over some network, but there
might be better ways that would not incur any array copy at all.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Re: [joblib] Can joblib save things as one file
From: Gael Varoquaux
Date: 2012-02-10 @ 09:19
On Fri, Feb 10, 2012 at 09:00:59AM +0100, Olivier Grisel wrote:
> I agree. Maybe for streaming the data over some network
Or putting it in a database; @schwarty is working on that right now, but
that's a different use case from saving to disk.
G