Upcoming joblib release
- Gael Varoquaux
- 2011-02-27 @ 23:24
I was looking at huge parallel for loops run with joblib.Parallel (to be
precise, in scikits.learn's GridSearchCV) and I realized that, because joblib
was dispatching all tasks immediately to sub-processes, it could create huge
temporaries. I therefore refactored the Parallel engine to enable late
dispatching, controlled by the pre_dispatch argument:
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> def producer():
...     for i in range(6):
...         print('Produced %s' % i)
...         yield i
>>> out = Parallel(n_jobs=2, verbose=1, pre_dispatch='1.5*n_jobs')(
...     delayed(sqrt)(i) for i in producer())
[Parallel(n_jobs=2)]: Done 1 out of 3+ |elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 2 out of 4+ |elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 3 out of 5+ |elapsed: 0.0s remaining: 0.0s
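To make the idea of late dispatching concrete, here is a minimal sketch of the principle. It is not joblib's actual implementation: lazy_dispatch is a hypothetical helper, the tasks run sequentially rather than in sub-processes, and pre_dispatch is a plain integer here. It only shows that the task generator is consumed a few items at a time, so no more than pre_dispatch tasks are ever waiting.

```python
from collections import deque
from itertools import islice
from math import sqrt


def lazy_dispatch(tasks, pre_dispatch):
    """Run (func, arg) tasks, keeping at most `pre_dispatch` of them pending.

    Hypothetical helper for illustration only: tasks run one after the
    other in this process, not in sub-processes as joblib would do.
    """
    iterator = iter(tasks)
    # Pull only the first `pre_dispatch` tasks out of the generator.
    pending = deque(islice(iterator, pre_dispatch))
    max_pending = len(pending)
    results = []
    while pending:
        func, arg = pending.popleft()
        results.append(func(arg))
        # A task has finished: pull at most one new task from the generator.
        pending.extend(islice(iterator, 1))
        max_pending = max(max_pending, len(pending))
    return results, max_pending


consumed = []  # records the order in which the generator is consumed


def producer():
    for i in range(6):
        consumed.append(i)
        yield (sqrt, i)


results, max_pending = lazy_dispatch(producer(), pre_dispatch=3)
```

With an eager strategy, all six tasks (and their arguments) would be created up front. Here the queue never holds more than three, which is what keeps the temporaries small when the arguments are large.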
I am planning to release joblib 0.5.0 with this feature in a few days.
The release will also contain small improvements that make joblib's
caching engine more robust when used with many processes.
The soon-to-be-released code can be found in the 0.5.X branch.
I am planning to use this in the near future to improve parallelism in
scikits.learn's GridSearchCV.
Any feedback is more than welcome.