librelist archives

OSError: [Errno 4] Interrupted system call

From:
Gustavo Goretkin
Date:
2011-07-29 @ 09:13
I'm running a very simple demo job, but I'm getting an exception from the
multiprocessing module. The probability of the exception occurring increases
with n_jobs. That is, it's rarer for me to get the exception with n_jobs =
2, but with n_jobs = 8, it's almost guaranteed. I'm running Ubuntu 10.10. I
can't tell what's interrupting the parent process. Any suggestions?

Thanks,
Gustavo


In [4]: Parallel(n_jobs=-1, verbose=1)(delayed(sleep)(.1) for _ in range(10))
[Parallel(n_jobs=-1)]: Done   1 out of  10 |elapsed:    0.1s remaining:    0.9s
[Parallel(n_jobs=-1)]: Done   2 out of  10 |elapsed:    0.1s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done   3 out of  10 |elapsed:    0.1s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done   4 out of  10 |elapsed:    0.1s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done   5 out of  10 |elapsed:    0.1s remaining:    0.1s
[Parallel(n_jobs=-1)]: Done   6 out of  10 |elapsed:    0.1s remaining:    0.1s
[Parallel(n_jobs=-1)]: Done   7 out of  10 |elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   8 out of  10 |elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   9 out of  10 |elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  10 out of  10 |elapsed:    0.1s remaining:    0.0s
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "/afs/athena.mit.edu/user/g/o/goretkin/py/lis_env/lib/python2.6/site-packages/joblib-0.4.4.dev-py2.6.egg/joblib/parallel.py", line 257, in __call__
    pool.join()
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 342, in join
    p.join()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 119, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 117, in wait
    return self.poll(0)
  File "/usr/lib/python2.6/multiprocessing/forking.py", line 106, in poll
    pid, sts = os.waitpid(self.pid, flag)
OSError: [Errno 4] Interrupted system call
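Errno 4 is EINTR: a signal arrived while the parent process was blocked in os.waitpid, and the Python 2.6 interpreter in this traceback raised that as an OSError instead of retrying the call. A common workaround on interpreters of that age is to retry the call whenever errno is EINTR. The sketch below assumes you control the call site; waitpid_retry is a hypothetical helper, not part of joblib or multiprocessing. Since Python 3.5 (PEP 475), os.waitpid retries automatically after a signal unless the signal handler raises, so the loop only matters on older interpreters.

```python
import errno
import os


def waitpid_retry(pid, options=0):
    """Call os.waitpid(pid, options), retrying if a signal interrupts it.

    Hypothetical helper for pre-3.5 interpreters; Python 3.5+ already
    retries os.waitpid on EINTR by itself (PEP 475).
    """
    while True:
        try:
            return os.waitpid(pid, options)
        except OSError as exc:
            # Only EINTR ("Interrupted system call") is retried; any other
            # error, such as ECHILD for an already-reaped child, propagates.
            if exc.errno != errno.EINTR:
                raise


if __name__ == "__main__":
    child = os.fork()  # POSIX only
    if child == 0:
        os._exit(3)  # child process: exit immediately with status 3
    pid, status = waitpid_retry(child)
    print(pid == child, os.WEXITSTATUS(status))  # True 3
```

The loop deliberately re-raises everything except EINTR, so genuine failures are not hidden behind the retry.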

Re: [joblib] OSError: [Errno 4] Interrupted system call

From:
Gael Varoquaux
Date:
2011-07-29 @ 12:23
On Fri, Jul 29, 2011 at 05:13:35AM -0400, Gustavo Goretkin wrote:
>    I'm running a very simple demo job, but I'm getting an exception from the
>    multiprocessing module. The probability of the exception occurring
>    increases with n_jobs. That is, it's rarer for me to get the exception
>    with n_jobs = 2, but with n_jobs = 8, it's almost guaranteed. I'm running
>    Ubuntu 10.10. I can't tell what's interrupting the parent process. Any
>    suggestions?

That's really strange. I cannot reproduce the problem on either my 10.04
computer with 12 CPUs or my 10.10 computer with 2 CPUs. I don't have
much of a guess as to what may be interrupting the process. I noticed
that you were running joblib 0.4.4. Have you tried the latest release
(0.5.3)?

Gael

Re: [joblib] OSError: [Errno 4] Interrupted system call

From:
Gustavo Goretkin
Date:
2011-07-29 @ 12:43
There is something very odd with my setup. Thankfully, I have gotten it to
work now. I'll try to dig further, but to give an idea:

I'm on a machine without root access, so I'm using virtualenv.
I was using IPython within the Spyder IDE.
Spyder might have been loading the IPython installed globally on the
machine, because running joblib inside an IPython session launched from the
command line worked fine.
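The diagnosis here comes down to which interpreter and which IPython a given session actually loads. A quick check from inside any session is sketched below for a current Python 3 interpreter; describe_interpreter is a hypothetical helper. The virtualenv test assumes that sys.base_prefix differs from sys.prefix inside an environment (as with the standard venv module and recent virtualenv releases) or that sys.real_prefix is set (as with older virtualenv releases).

```python
import importlib.util
import sys


def describe_interpreter(module_names=("IPython", "joblib")):
    """Report which interpreter is running and where modules would load from."""
    # Older virtualenv releases set sys.real_prefix; venv sets sys.base_prefix.
    base = getattr(sys, "real_prefix", getattr(sys, "base_prefix", sys.prefix))
    info = {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": sys.prefix != base,
        "modules": {},
    }
    for name in module_names:
        spec = importlib.util.find_spec(name)
        # None means the module is not importable from this interpreter.
        info["modules"][name] = spec.origin if spec is not None else None
    return info


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key, value)
```

Running it in the IDE console and in a terminal IPython session, then comparing the executable and module paths, would show whether the two are using different installations.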

The other occasional error was thrown in delayed() by the line
"pickle.dumps(function)" (full stack trace below; it is from a different
machine, so the joblib version is different). This happens on my personal
machine in IPython, with virtualenv out of the picture.

Thanks for the great module!
Gustavo


-----------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/joblib-0.5.3.dev-py2.6.egg/joblib/parallel.py", line 419, in __call__
    for function, args, kwargs in iterable:
  File "<ipython console>", line 1, in <genexpr>
  File "/usr/local/lib/python2.6/dist-packages/joblib-0.5.3.dev-py2.6.egg/joblib/parallel.py", line 86, in delayed
    pickle.dumps(function)
  File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle function objects

Re: [joblib] OSError: [Errno 4] Interrupted system call

From:
Gael Varoquaux
Date:
2011-07-29 @ 12:47
On Fri, Jul 29, 2011 at 08:43:25AM -0400, Gustavo Goretkin wrote:
>    I was using ipython within Spyder IDE.
>    Spyder might have been loading the ipython globally installed on the
>    machine, because running joblib inside ipython which I launched from the
>    command line worked fine.

OK, indeed. Spyder uses an oldish mechanism for embedding IPython, to
which I contributed a bit, and it is not particularly pretty. I remember
that we hacked the interprocess communication a bit. Thankfully, the core
IPython devs have worked on these issues, and I suspect that the newly
released IPython will be much cleaner in this respect.

>    The other occasional error was thrown in delayed() by the line
>    "pickle.dumps(function)" (full stack trace below, from a different machine
>    so joblib version is different). This happens on my personal machine in
>    ipython (virtualenv out of the picture).

>      File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
>        raise TypeError, "can't pickle %s objects" % base.__name__
>    TypeError: can't pickle function objects

Yes, you have to define the function that you are interested in
'delaying' in a separate module. This should solve that problem.
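The reason a separate module helps: pickle serializes a plain function by reference, recording its module and name rather than its code, so the worker processes must be able to look it up by importing. A function that cannot be found that way, such as a lambda or a function nested inside another function, fails to pickle. The sketch below shows this with a current Python 3 interpreter; is_picklable and make_scaler are hypothetical helpers for illustration, and the exact exception raised may differ between Python versions (the 2011 traceback shows a TypeError from Python 2.6).

```python
import math
import os.path
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_scaler(factor):
    # The nested function has no importable module-level name.
    def scale(x):
        return factor * x
    return scale


# A module-level function round-trips as a reference to the same object.
print(pickle.loads(pickle.dumps(math.sqrt)) is math.sqrt)  # True
print(is_picklable(os.path.join))    # True
print(is_picklable(lambda x: x))     # False
print(is_picklable(make_scaler(2)))  # False
```

In the thread's setting, that means defining the function in an importable module (for example a hypothetical mytasks.py that defines work) and passing delayed(work) to Parallel, rather than defining the function at the console.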

Cheers,

Gael