joblib.Parallel running out of memory at fork time in large processes

From: Reid Priedhorsky
Date: 2013-01-04 @ 17:48
Hi,

We are using joblib.Parallel() and are running out of memory when it 
seems there should be enough. E.g.:

> Exception in thread Thread-166:
> Traceback (most recent call last):
>   File "/home/reidpr/opt/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>     self.run()
>   File "/home/reidpr/opt/lib/python2.7/threading.py", line 504, in run
>     self.__target(*self.__args, **self.__kwargs)
>   File "/home/reidpr/opt/lib/python2.7/multiprocessing/pool.py", line 302, in _handle_workers
>     pool._maintain_pool()
>   File "/home/reidpr/opt/lib/python2.7/multiprocessing/pool.py", line 206, in _maintain_pool
>     self._repopulate_pool()
>   File "/home/reidpr/opt/lib/python2.7/multiprocessing/pool.py", line 199, in _repopulate_pool
>     w.start()
>   File "/home/reidpr/opt/lib/python2.7/multiprocessing/process.py", line 130, in start
>     self._popen = Popen(self)
>   File "/home/reidpr/opt/lib/python2.7/multiprocessing/forking.py", line 120, in __init__
>     self.pid = os.fork()
> OSError: [Errno 12] Cannot allocate memory

Specifically, my main process is about 22GB of virtual memory; the 
machine has 64GB of RAM and no swap. We have n_jobs == 4. Nothing else 
significant is running.
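
For reference, the invocation looks roughly like this (process_item and 
items are stand-ins for our real code):

from joblib import Parallel, delayed

def process_item(item):
    # Stand-in for the real per-item work; each task and its result are small.
    return item

items = range(1000)  # stand-in for our real work list
results = Parallel(n_jobs=4)(delayed(process_item)(i) for i in items)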

Looking at parallel.py, I hypothesize that when the multiprocessing.Pool 
is created in Parallel.__call__(), the fork fails because there isn't 
enough memory to duplicate three extra 22GB processes, even when memory 
overcommit is on.

The solution seems like it would be to create the multiprocessing pool 
early, when the main process is small, and let it persist (workers will 
stay small since they don't process large objects).
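
Concretely, I have in mind something like the following sketch (the data 
and task function are placeholders for our real code):

import multiprocessing

def process_item(item):
    # Placeholder for the real per-item work; arguments and results stay small.
    return item * item

if __name__ == "__main__":
    # Fork the worker pool while the parent is still small...
    pool = multiprocessing.Pool(processes=4)

    # ...then build the large structures in the parent only (placeholder here;
    # ~22GB of real data in our case).
    big_data = list(range(10 ** 6))

    # The workers stay small: they only ever see the small task arguments.
    results = pool.map(process_item, range(100))
    pool.close()
    pool.join()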

I see from the comment in Parallel.__init__() that the lazy creation of 
the pool is deliberate, but that seems troublesome for my use case.

Questions:

1. Does this make sense? Am I missing something?
2. Is joblib open to a pull request which makes the lazy creation of the 
pool optional?

This is joblib v.2.6.5 on Linux.

Thanks!
Reid

-- 
IM (Google Chat):    reid.priedhorsky@gmail.com (not a valid e-mail)

E-mail response time: I check e-mail periodically throughout the day,
not continually, so I might not see your note for several hours.
Please use IM or phone if you need an immediate response.

Re: [joblib] joblib.Parallel running out of memory at fork time in large processes

From: Gael Varoquaux
Date: 2013-01-04 @ 19:02
----- Original message -----
> Looking at parallel.py, I hypothesize that when the multiprocessing.Pool 
> is created in Parallel.__call__(), the fork fails because there isn't 
> enough memory to duplicate three extra 22GB processes, even when memory 
> overcommit is on.

I would think that this shouldn't be the case, as Unixes typically 
implement a copy-on-write mechanism that enables memory efficiency in 
such situations.

> The solution seems like it would be to create the multiprocessing pool 
> early, when the main process is small, and let it persist (workers will 
> stay small since they don't process large objects).

This is by design, both to benefit from copy-on-write and to minimize the 
number of possible zombie processes. Experience with multiprocessing has 
taught me that I want parallel code to be as short-lived as possible.

> 1. Does this make sense? Am I missing something?

Maybe copy-on-write, but I am a bit confused as to whether my argument is sound.

> 2. Is joblib open to a pull request which makes the lazy creation of the 
> pool optional?

No, sorry. That whole code needs to go through several steps of refactoring: 
to implement shared memory via memmapping (Olivier Grisel has a pull 
request on that) and to substantially change the architecture to implement 
a queue-based inter-process job dispatcher, as in the Grand Central 
Dispatch pattern. I do not want to undertake changes to the current 
architecture (the problem is that I badly lack time). Thanks for offering 
though, it's really appreciated.
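
To give a rough idea of the memmapping direction, the general pattern is 
something like the sketch below (only an illustration of the idea, not 
Olivier's actual pull request):

import numpy as np
from joblib import Parallel, delayed

def column_sum(path, shape, col):
    # Each worker re-opens the memmap read-only; the pages are shared through
    # the OS page cache rather than being copied into every process.
    data = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
    return data[:, col].sum()

if __name__ == "__main__":
    shape = (10000, 100)
    path = "/tmp/big_array.dat"  # illustrative location

    # Dump the large array to disk once, in the parent.
    big = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
    big[:] = np.random.rand(*shape)
    big.flush()

    # Workers receive only the path and the metadata, not the array itself.
    sums = Parallel(n_jobs=4)(
        delayed(column_sum)(path, shape, col) for col in range(shape[1]))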

Cheers,

Gael

Re: [joblib] joblib.Parallel running out of memory at fork time in large processes

From: Reid Priedhorsky
Date: 2013-01-07 @ 19:33
On 01/04/2013 12:02 PM, Gael Varoquaux wrote:
>
> ----- Original message -----
>>
>> Looking at parallel.py, I hypothesize that when the
>> multiprocessing.Pool is created in Parallel.__call__(), the fork
>> fails because there isn't enough memory to duplicate three extra
>> 22GB processes, even when memory overcommit is on.
>
> I would think that this shouldn't be the case, as Unixes typically
> implement a copy-on-write mechanism that enables memory efficiency in
> such situations.

I'm a little fuzzy on the details myself, but I'm pretty sure that the 
full amount of memory still has to be committed (accounted for by the 
kernel) at fork time, even though nothing is physically copied yet. 
Copy-on-write only defers the copying, which happens gradually as the 
processes diverge. Here is what I think is happening in our case.

As I mentioned, this is a 64GB box with no swap and a memory overcommit 
ratio of 0.5. Therefore, Linux is willing to allocate 96GB of memory 
(RAM + 0.5 × RAM overcommit + 0 swap).
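
(For what it's worth, the overcommit knobs can be checked directly; a 
quick sketch, assuming the standard procfs paths:)

# Read the Linux overcommit settings; overcommit_memory is a mode flag and
# overcommit_ratio is a percentage.
for name in ("overcommit_memory", "overcommit_ratio"):
    with open("/proc/sys/vm/" + name) as f:
        print("%s = %s" % (name, f.read().strip()))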

Working case: Main process is at 21GB of virtual memory. We call 
Parallel(n_jobs=4); this forks 3 additional processes, each also 21GB, 
for a total of 84GB. There's room for this, and the call succeeds. Only 
a small fraction of the 21GB allocation for the children is actually 
touched before they exit, so not much actual memory copying happens.

Failure case: Main process is at 22GB of virtual memory. We call 
Parallel(n_jobs=4); this forks 3 additional processes, each also 22GB, 
for a total of 88GB. That plus unrelated processes exceeds the 96GB the 
OS is willing to allocate, and one of the forks fails.

I can provide example code to demonstrate the failure.
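
A stripped-down version would look something like this (the allocation 
size has to be tuned to the machine, and as written it really does touch 
about 22GB):

import numpy as np
from joblib import Parallel, delayed

def tiny_task(i):
    # The work itself is trivial; what matters is how large the parent is at
    # the moment the worker pool is forked.
    return i

if __name__ == "__main__":
    # Grow the parent first: ~22GB of 8-byte floats (tune to the machine).
    big = np.ones(int(22e9 / 8))

    # Parallel() forks its pool here, from the now-large parent; with strict
    # enough overcommit settings, os.fork() fails with ENOMEM.
    results = Parallel(n_jobs=4)(delayed(tiny_task)(i) for i in range(16))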

>> The solution seems like it would be to create the multiprocessing
>> pool early, when the main process is small, and let it persist
>> (workers will stay small since they don't process large objects).
>
> This is by design, both to benefit from copy-on-write and to minimize
> the number of possible zombie processes. Experience with
> multiprocessing has taught me that I want parallel code to be as
> short-lived as possible.

I definitely agree. I don't see a way around forking early in my use 
case, though. I can ask for the overcommit ratio to be increased, but the 
value that would work for me is pretty large (i.e., it might cause 
trouble for others), and I'd like my code not to depend on OS tweaks.

>> 2. Is joblib open to a pull request which makes the lazy creation
>> of the pool optional?
>
> No, sorry. That whole code needs to go through several steps of

OK. I'll do some experimentation; perhaps I'll try to maintain a fork, 
or we might just transition to raw multiprocessing.Pool.

Thanks,
Reid
