librelist archives

strange error with joblibs

From:
Mainak Jas
Date:
2014-12-18 @ 12:42
Hi everyone,

I am facing a weird problem when I use sklearn cross-validation with
n_jobs > 1. The full error trace is available here:
http://becs.aalto.fi/~jasm1/error.txt

I could not reproduce the problem on my own laptop. But this can be
consistently reproduced on the university servers where I have set up a
python installation using miniconda. Here is the list of packages
installed: http://becs.aalto.fi/~jasm1/list_of_packages.txt

To reproduce the bug, here is the script:
http://becs.aalto.fi/~jasm1/reproduce_bug.py and here is the data:
http://becs.aalto.fi/~jasm1/reproduce_bug.npz
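(For readers without access to those files: judging from the imports shown in
the traceback later in this thread, the script boils down to something like the
sketch below. The import paths are modernized to sklearn.model_selection — the
original used sklearn.cross_validation — and the data is replaced with a
synthetic stand-in, so this small input stays below joblib's memmapping
threshold and will not trigger the error by itself.)

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic stand-in for the reproduce_bug.npz data
rng = np.random.RandomState(0)
X = rng.randn(100, 20)
y = rng.randint(0, 2, 100)

# Scaling + SVC, cross-validated with more than one worker (n_jobs > 1)
clf = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, n_jobs=2)
print(scores.shape)
```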

Any help in diagnosing this would be greatly appreciated.

Thanks!

Re: [joblib] strange error with joblibs

From:
Olivier Grisel
Date:
2014-12-18 @ 15:29
Hi Mainak,

The following is weird:


/proj/rtmeg/dich_erp/miniconda/lib/python2.7/site-packages/sklearn/preprocessing/data.pyc
in transform(self=StandardScaler(copy=True, with_mean=True,
with_std=True), X=array([[  6.24320004e-12,   5.56367928e-12,
3....481577e-12,  -8.92378108e-12,  -6.00404796e-12]]), y=None,
copy=True)
    353                     "instead. See docstring for motivation and
alternatives.")
    354             if self.std_ is not None:
    355                 inplace_column_scale(X, 1 / self.std_)
    356         else:
    357             if self.with_mean:
--> 358                 X -= self.mean_
    359             if self.with_std:
    360                 X /= self.std_
    361         return X
    362

ValueError: output array is read-only

X can indeed be readonly if large enough when using joblib because of
the memory mapping mechanism. However, the copy=True of the standard
scaler should make sure that only a copy is updated.
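For context, that memmapping behavior can be observed directly (a hedged
sketch assuming joblib's documented max_nbytes parameter, lowered here so
that a modest array triggers the mechanism):

```python
import numpy as np
from joblib import Parallel, delayed

def describe(x):
    # Report what the worker process actually receives
    return type(x).__name__, bool(x.flags.writeable)

big = np.zeros((2000, 2000))  # ~32 MB, above the 1 MB threshold set below

# joblib dumps `big` to a temporary file and passes each worker a
# read-only np.memmap view instead of pickling the whole array.
out = Parallel(n_jobs=2, max_nbytes='1M')(
    delayed(describe)(big) for _ in range(2))
print(out)
```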


Let us create a readonly memmap array:

>>> import numpy as np
>>> _ = np.memmap('/tmp/a', shape=10, dtype=np.float32, mode='w+')
>>> a = np.memmap('/tmp/a', shape=10, dtype=np.float32, mode='r')
>>> a.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False

Let us simulate the internals of StandardScaler with copy=True:

>>> from sklearn.utils.validation import check_array
>>> b = check_array(a, copy=True)

In my case b is a regular writable array:

>>> b.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> b
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)
>>> b -= b.mean()
>>>

Can you run the same kind of experiment on the host that produced the
failure? Which version of numpy are you using on that host?

>>> print(np.__version__)
1.8.2
-- 
Olivier

Re: [joblib] strange error with joblibs

From:
Gael Varoquaux
Date:
2014-12-18 @ 15:41
> X can indeed be readonly if large enough when using joblib because of
> the memory mapping mechanism. However, the copy=True of the standard
> scaler should make sure that only a copy is updated.

Actually, I think that what needs to be done is "np.array(X).copy()", as
just doing the copy copies the memmap, and thus propagates the problem.
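A quick illustration of the gotcha (the temporary file path is just an
example):

```python
import numpy as np
import os
import tempfile

# Create a file-backed array, then reopen it read-only
path = os.path.join(tempfile.mkdtemp(), 'a')
_ = np.memmap(path, shape=10, dtype=np.float32, mode='w+')
a = np.memmap(path, shape=10, dtype=np.float32, mode='r')

b = a.copy()            # writable, but still an np.memmap instance
c = np.array(a).copy()  # a plain ndarray, fully detached from the file

print(type(b).__name__, type(c).__name__)
```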

These types of problems prompted us to write the "as_ndarray" code in
nilearn:

https://github.com/nilearn/nilearn/blob/master/nilearn/_utils/numpy_conversions.py#L30

Re: [joblib] strange error with joblibs

From:
Olivier Grisel
Date:
2014-12-18 @ 19:38
2014-12-18 16:41 GMT+01:00 Gael Varoquaux <gael.varoquaux@normalesup.org>:
>> X can indeed be readonly if large enough when using joblib because of
>> the memory mapping mechanism. However, the copy=True of the standard
>> scaler should make sure that only a copy is updated.
>
> Actually, I think that what needs to be done is "np.array(X).copy()", as
> just doing the copy copies the memmap, and thus propagates the problem.

check_array in master is already doing the right thing:


https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/validation.py#L264


-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: [joblib] strange error with joblibs

From:
Mainak Jas
Date:
2014-12-18 @ 15:39
Hi Olivier,

On Thu, Dec 18, 2014 at 5:29 PM, Olivier Grisel <olivier.grisel@ensta.org>
wrote:
>
> Hi Mainak,
>
> The following is weird:
>
> /proj/rtmeg/dich_erp/miniconda/lib/python2.7/site-packages/sklearn/preprocessing/data.pyc
> in transform(self=StandardScaler(copy=True, with_mean=True,
> with_std=True), X=array([[  6.24320004e-12,   5.56367928e-12,
> 3....481577e-12,  -8.92378108e-12,  -6.00404796e-12]]), y=None, copy=True)
>     353                     "instead. See docstring for motivation and
> alternatives.")
>     354             if self.std_ is not None:
>     355                 inplace_column_scale(X, 1 / self.std_)
>     356         else:
>     357             if self.with_mean:
> --> 358                 X -= self.mean_
>     359             if self.with_std:
>     360                 X /= self.std_
>     361         return X
>     362
>
> ValueError: output array is read-only
>
> X can indeed be readonly if large enough when using joblib because of
> the memory mapping mechanism. However, the copy=True of the standard
> scaler should make sure that only a copy is updated.
>
>
> Let us create a readonly memmap array:
>
> >>> import numpy as np
> >>> _ = np.memmap('/tmp/a', shape=10, dtype=np.float32, mode='w+')
> >>> a = np.memmap('/tmp/a', shape=10, dtype=np.float32, mode='r')
> >>> a.flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : True
>   OWNDATA : False
>   WRITEABLE : False
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> Let us simulate the internals of StandardScaler with copy=True:
>
> >>> from sklearn.utils.validation import check_array
>

do you mean check_arrays? I couldn't find any check_array in my version of
sklearn.


> >>> b = check_array(a, copy=True)
>
> In my case b is a regular writable array:
>
> >>> b.flags
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : True
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
> >>> b
> array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)
> >>> b -= b.mean()
> >>>
>
> Can you run the same kind of experiment on the host that produced the
> failure?
>

ok, I get a list, not a numpy array!

In [38]: b
Out[38]: [array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
dtype=float32)]

b.flags throws an error.


> Which version of numpy are you using on that host?
>
> >>> print(np.__version__)
> 1.8.2
>

This is 1.9.1

Thanks again,
Mainak



Re: [joblib] strange error with joblibs

From:
Olivier Grisel
Date:
2014-12-18 @ 19:37
2014-12-18 16:39 GMT+01:00 Mainak Jas <mainakjas@gmail.com>:
> Hi Olivier,
>
> On Thu, Dec 18, 2014 at 5:29 PM, Olivier Grisel <olivier.grisel@ensta.org>
> wrote:
>> [...]
>> >>> from sklearn.utils.validation import check_array
>
>
> do you mean check_arrays? I couldn't find any check_array in my version of
> sklearn.

No I mean check_array but you are right this is only in master. So the
bug might have been fixed there.

Can you please try to install scikit-learn master on your host to confirm?

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: [joblib] strange error with joblibs

From:
Mainak Jas
Date:
2014-12-18 @ 20:32
ok, now I am on master:

In [2]: sklearn.__version__
Out[2]: '0.16-git'

But now it seems SVC cannot be imported. Something is fishy with
check_array vs check_arrays as the error trace indicates.

Mainak

In [1]: run reproduce_bug.py
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/home/jasm1/.www/reproduce_bug.py in <module>()
      1 from sklearn import preprocessing
----> 2 from sklearn.svm import SVC
      3 from sklearn.pipeline import Pipeline
      4 from sklearn.cross_validation import cross_val_score, ShuffleSplit
      5

/proj/rtmeg/dich_erp/miniconda/lib/python2.7/site-packages/sklearn/svm/__init__.py
in <module>()
     11 # License: BSD 3 clause (C) INRIA 2010
     12
---> 13 from .classes import SVC, NuSVC, SVR, NuSVR, OneClassSVM,
LinearSVC, \
     14         LinearSVR
     15 from .bounds import l1_min_c

/proj/rtmeg/dich_erp/miniconda/lib/python2.7/site-packages/sklearn/svm/classes.py
in <module>()
      3 from .base import _fit_liblinear, BaseSVC, BaseLibSVM
      4 from ..base import BaseEstimator, RegressorMixin
----> 5 from ..linear_model.base import LinearClassifierMixin,
SparseCoefMixin, \
      6     LinearModel
      7 from ..feature_selection.from_model import _LearntSelectorMixin


/proj/rtmeg/dich_erp/miniconda/lib/python2.7/site-packages/sklearn/linear_model/__init__.py
in <module>()
     13
     14 from .bayes import BayesianRidge, ARDRegression
---> 15 from .least_angle import (Lars, LassoLars, lars_path, LarsCV,
LassoLarsCV,
     16                           LassoLarsIC)
     17 from .coordinate_descent import (Lasso, ElasticNet, LassoCV,
ElasticNetCV,


/proj/rtmeg/dich_erp/miniconda/lib/python2.7/site-packages/sklearn/linear_model/least_angle.py
in <module>()
     23 from ..base import RegressorMixin
     24 from ..utils import arrayfuncs, as_float_array, check_array,
check_X_y
---> 25 from ..cross_validation import _check_cv as check_cv
     26 from ..utils import ConvergenceWarning
     27 from ..externals.joblib import Parallel, delayed


/proj/rtmeg/dich_erp/miniconda/lib/python2.7/site-packages/sklearn/cross_validation.py
in <module>()
     29 from .externals.six import with_metaclass
     30 from .externals.six.moves import zip
---> 31 from .metrics.scorer import check_scoring
     32
     33 __all__ = ['Bootstrap',


/proj/rtmeg/dich_erp/miniconda/lib/python2.7/site-packages/sklearn/metrics/__init__.py
in <module>()
     27 from .classification import zero_one_loss
     28
---> 29 from . import cluster
     30 from .cluster import adjusted_mutual_info_score
     31 from .cluster import adjusted_rand_score


/proj/rtmeg/dich_erp/miniconda/lib/python2.7/site-packages/sklearn/metrics/cluster/__init__.py
in <module>()
     19 from .unsupervised import silhouette_samples
     20 from .unsupervised import silhouette_score
---> 21 from .bicluster import consensus_score
     22
     23 __all__ = ["adjusted_mutual_info_score",
"normalized_mutual_info_score",


/proj/rtmeg/dich_erp/miniconda/lib/python2.7/site-packages/sklearn/metrics/cluster/bicluster/__init__.py
in <module>()
----> 1 from .bicluster_metrics import consensus_score
      2
      3 __all__ = ['consensus_score']


/proj/rtmeg/dich_erp/miniconda/lib/python2.7/site-packages/sklearn/metrics/cluster/bicluster/bicluster_metrics.py
in <module>()
      4
      5 from sklearn.utils.linear_assignment_ import linear_assignment
----> 6 from sklearn.utils.validation import check_arrays
      7
      8

ImportError: cannot import name check_arrays


On Thu, Dec 18, 2014 at 9:37 PM, Olivier Grisel <olivier.grisel@ensta.org>
wrote:
>
> [...]
>
> No I mean check_array but you are right this is only in master. So the
> bug might have been fixed there.
>
> Can you please try to install scikit-learn master on your host to confirm?
>
> --
> Olivier

Re: [joblib] strange error with joblibs

From:
Olivier Grisel
Date:
2014-12-18 @ 21:21
Your install is not clean, you probably have old .pyc files from the
old folder. Try to uninstall first (delete the sklearn folder in
site-packages) and rebuild from a clean source folder (you can use
"make clean").

-- 
Olivier

Re: [joblib] strange error with joblibs

From:
Mainak Jas
Date:
2014-12-18 @ 22:41
Great, I got a clean installation of master, and this solved the problem!

Thanks a lot.

Mainak


Re: [joblib] strange error with joblibs

From:
Olivier Grisel
Date:
2014-12-19 @ 00:38
Good news, thanks for the feedback.

-- 
Olivier