librelist archives

Upcoming joblib release

From: Gael Varoquaux
Date: 2011-02-27 @ 23:24

Hi,

I was looking at huge parallel for loops run with joblib.Parallel (to be
precise, in scikits.learn's GridSearchCV) and I realized that, because
joblib was dispatching immediately to sub-processes, it could create huge
temporaries. I therefore refactored the Parallel engine to enable late
dispatch:

  >>> from math import sqrt
  >>> from joblib import Parallel, delayed

  >>> def producer():
  ...     for i in range(6):
  ...         print('Produced %s' % i)
  ...         yield i
  >>> out = Parallel(n_jobs=2, verbose=1, pre_dispatch='1.5*n_jobs')(
  ...     delayed(sqrt)(i) for i in producer())
  Produced 0
  Produced 1
  Produced 2
  [Parallel(n_jobs=2)]: Done 1 out of 3+ |elapsed: 0.0s remaining: 0.0s
  Produced 3
  [Parallel(n_jobs=2)]: Done 2 out of 4+ |elapsed: 0.0s remaining: 0.0s
  Produced 4
  [Parallel(n_jobs=2)]: Done 3 out of 5+ |elapsed: 0.0s remaining: 0.0s
  ...
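
To make the idea concrete, here is a rough, single-process sketch of late
dispatch in plain Python. It is not joblib's implementation: the names
late_dispatch_map and log are hypothetical, and tasks run sequentially
rather than in sub-processes. It only illustrates that the input generator
is consumed a little at a time, so that at most pre_dispatch tasks are
pending at once (3 here, matching '1.5*n_jobs' with n_jobs=2).

```python
from collections import deque
from itertools import islice
from math import sqrt


def producer(log, n=6):
    # Record when each task is generated, like the print in the example above.
    for i in range(n):
        log.append('Produced %s' % i)
        yield i


def late_dispatch_map(func, tasks, pre_dispatch, log):
    # Hypothetical helper: keeps at most `pre_dispatch` tasks pending.
    tasks = iter(tasks)
    # Pull only the first `pre_dispatch` tasks from the generator.
    pending = deque(islice(tasks, pre_dispatch))
    results = []
    while pending:
        results.append(func(pending.popleft()))
        log.append('Done %s' % len(results))
        # A slot was freed: pull at most one more task from the generator.
        pending.extend(islice(tasks, 1))
    return results


log = []
out = late_dispatch_map(sqrt, producer(log), pre_dispatch=3, log=log)
```

In this sketch the log starts with 'Produced 0', 'Produced 1',
'Produced 2', 'Done 1', 'Produced 3', the same interleaving as in the
output above: a new task is only generated once an earlier one has
completed.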

I am planning to release joblib 0.5.0 with this feature in a few days.
The release will also contain small improvements that make joblib's
caching engine more robust when used with many processes.

The soon-to-be-released code can be found in the 0.5.X branch.

I am planning to use this in the near future to improve parallelism in
scikits.learn's GridSearchCV.

Any feedback is more than welcome.

Gael