I have this scenario: - I invoke each process seperately with different parameters (done by a cluster scheduling system -- my jobs could sit and wait for five hours, and then suddenly 20 of them are launched simultaneously -- or they could go one after another) - Each job needs some precomputed data that only depend on a few of the parameters. I'd like caching this to be automated. How about using joblib, and point each process to the same cache directory on a network drive? (I'm very CPU-bound.) I can't find anything hinting at locking in the source, so I'd think that I'm almost guaranteed to get race conditions, right? The solution would be to refactor Memory to use a Store, and implement one that does file-based locking. "lockf" (NOT "flock") works reliably across nodes on our cluster (GPFS), and also works locally and on some NFS setups. Given a Store that locks, the default behaviour would be to sleep until the other process finished the computation, and then fetch its results. An improvement over this is doing work-sharing, so that you can specify a list of tasks that can be processed in any order; sequentially, and in the same process, but so that one can skip to the next one when given a task is locked (because somebody else works on it). Would this be a welcome addition to joblib? (I'm basically evaluating where I should put in the necessary work to do this.) It seems to me that it should integrate nicely with what is already there, and it could be optional (although when stable I don't think making lockf-based locking the default would hurt). Any feedback welcome. Dag Sverre
Hi Dag, Good to see you on this (very quiet) mailing list. On Thu, Mar 17, 2011 at 02:31:32PM +0100, Dag Sverre Seljebotn wrote: > How about using joblib, and point each process to the same cache > directory on a network drive? (I'm very CPU-bound.) I can't find > anything hinting at locking in the source, so I'd think that I'm almost > guaranteed to get race conditions, right? You will. I use joblib's cache on my 12 CPUs box all the time. I do hit race conditions all the time. That said, I have tried adding locks, and it ended up slowing down everything. So I have adopted a 'better ask for forgiveness than permission' approach: any operation that could fail is in a try/except block (or should be, I sometimes find places where there is a race I didn't see, in which case I fix it, but more and more infrequently). The limitation of this approach is that the current codebase, when ran in parallel, might have 100 jobs computing the same item to be cached, and using up resources for no good reason. It won't crash, it will just gobble up resources. That's where a lock might come in handy, if there is some kind of scheduling system able to make us of the resources freed. > The solution would be to refactor Memory to use a Store, and implement > one that does file-based locking. "lockf" (NOT "flock") works reliably > across nodes on our cluster (GPFS), and also works locally and on some > NFS setups. Refactoring to use a store would be a good idea anyhow. A better design than the current one would make use of a sub object of memory that would expose a store API (to be defined). That way we could easily replace disk-based storage by memory storage, or storage in a key-value database, such as memcached, couchdb or mongodb. That would certainly be a great addition. In any case, I am more than open to refactoring of the codebase to do such modifications easier. I am planning to get there, but I wanted to get a cache replacement policy in place before. It turned ut that this feature pretty much requires a lock, and when I tried implementing it the first time, I ended up slowing down joblib too when running many jobs in parallel. I think I know how to avoid this problem (this would be the topic for another discussion) and I have started work on this, but it turned out to be too much work for my current available time. > Would this be a welcome addition to joblib? (I'm basically evaluating > where I should put in the necessary work to do this.) It seems to me > that it should integrate nicely with what is already there, and it could > be optional (although when stable I don't think making lockf-based > locking the default would hurt). Any feedback welcome. I don't know whether making the lock by default would hurt or not. I have three worries. 1. The portability of the locking operation. From what I had read when I was worrying about locks, filesystem locking is neither portable, nor reliable. I was adviced to use as a lock a mkdir. Indeed mkdir is an atomic operations on all file systems and all OSs that either succeeds, or fails if the directory exists. Thus acquiring the lock can be achieved by:: try: os.mkdir('foo.lock') except OSError, e: # Here use code that checks the number of the OSError to assert # that it indeed is a 'file exists' error, if so the lock is # acquired elsewhere. releasing the lock is done by removing the directory (also atomic if it's empty). 2. The slow down due to the lock (in particular in the memory access, rather than the memory write). This problem can be reduced as much as possible by having many locks, rather than a global lock, which always ends up being a bottleneck. One option would be to have a lock per function+arguments hash. That would probably work nicely with the current model. 3. Dead-locks. Joblib is designed to be resilient to hack crash: segfaults in the Python code (hell, code I write does segfault), machine running out of memory in the middle of a persistence operation. If we add locks, I'd like joblib to be able to detect deadlocks because of a dead process as much as possible. I'd like also a time out on the locks. In a single machine environment, detecting dead processes can be done by saving the PID of the process in the lock, and having code that checks if this process is still alive (and is still running Python, to catter for PID reuse). In a multi-machine environement, you need on top of that to save a hash that enables the identification of the node. I think it is OK to give up on testing of the process is still alive if the node that wrote the lock is not the same node as the node that is trying to acquire it. However, I think that parallel computing on a single machine is more and more frequent. I have 12 CPUs at the lab, in two years, I might have 24. I want to be able to use them, and with multiprocessing, from the standard library, I can easily achieve that quite well. Thus, I'd like us to cater as much as possible for the single machine case. Sorry, that's a pretty lengthy email: I had already given this problem a thought. My advice would be to play around as much as possible to see what can be achieved without writing too complex code before making any plans. The strategy that worked so far with joblib has been to avoid hard problems, and tackle only the simple/big gain ones. There were previous versions of joblib (0.1 and 0.2) that did dependency tracking. They never made it to a release on Internet, as they were too fragile and I would break them. In any case, if you start exploring any ideas, you should work only from the 0.5.X branch, and not the master branch, as the latter as features required for cache replacement policy that are not needed for your work, and will probably do more harm than good. I should have kept them in a feature branch, but I didn't foresee the locking problems (eventhough I was warned by friends :$). Thanks for your interest, it's exciting! Gael
2011/3/17 Gael Varoquaux <gael.varoquaux@normalesup.org>: > > 2. The slow down due to the lock (in particular in the memory access, > rather than the memory write). This problem can be reduced as much > as possible by having many locks, rather than a global lock, which > always ends up being a bottleneck. One option would be to have a lock > per function+arguments hash. That would probably work nicely with the > current model. That sounds like a potential simple yet useful strategy for scikit-learn. Suppose all calls to fit / transform inside a pipeline are wrapped with a @Memory(cachedir="always/the/same/path") memoizer before being passed to a GridSearchCV with n_jobs != 1 so that joblib.Parallel is used to run the grid search. Making the early stage of the pipeline lock would avoid saturating the CPU computing the exact same thing several times at the beginning. Other than that I am also in favor of keeping things as simple as possible. -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
On 03/17/2011 03:44 PM, Gael Varoquaux wrote: > Hi Dag, > > Good to see you on this (very quiet) mailing list. > > On Thu, Mar 17, 2011 at 02:31:32PM +0100, Dag Sverre Seljebotn wrote: >> How about using joblib, and point each process to the same cache >> directory on a network drive? (I'm very CPU-bound.) I can't find >> anything hinting at locking in the source, so I'd think that I'm almost >> guaranteed to get race conditions, right? > You will. I use joblib's cache on my 12 CPUs box all the time. I do hit > race conditions all the time. That said, I have tried adding locks, and > it ended up slowing down everything. So I have adopted a 'better ask for > forgiveness than permission' approach: any operation that could fail is > in a try/except block (or should be, I sometimes find places where there > is a race I didn't see, in which case I fix it, but more and more > infrequently). > > The limitation of this approach is that the current codebase, when ran in > parallel, might have 100 jobs computing the same item to be cached, and > using up resources for no good reason. It won't crash, it will just > gobble up resources. That's where a lock might come in handy, if there is > some kind of scheduling system able to make us of the resources freed. My tasks would have a granularity of minutes rather than milliseconds, so the locking overhead is the least of my worries. But it certainly means things should go via refactor and store API. I'm surprised that an FS stat (or process PID lookup?) drowns hashing a NumPy array (if that is what is going on?) What kind of jobs where these, and how many? > > I don't know whether making the lock by default would hurt or not. I have > three worries. > > 1. The portability of the locking operation. From what I had read when > I was worrying about locks, filesystem locking is neither > portable, nor reliable. I was adviced to use as a lock a mkdir. Indeed > mkdir is an atomic operations on all file systems and all OSs that either > succeeds, or fails if the directory exists. Thus acquiring the lock can > be achieved by:: > > try: > os.mkdir('foo.lock') > except OSError, e: > # Here use code that checks the number of the OSError to assert > # that it indeed is a 'file exists' error, if so the lock is > # acquired elsewhere. > > releasing the lock is done by removing the directory (also atomic if it's > empty). Interesting. I'm guessing this should have about the same performance as lockf (although I guess only testing could tell for sure). The downside is what you write below: Possible dead-locks etc. Within a single host I believe lockf should be rather safe, and those automatically disappear when a process dies. More below. > 2. The slow down due to the lock (in particular in the memory access, > rather than the memory write). This problem can be reduced as much > as possible by having many locks, rather than a global lock, which > always ends up being a bottleneck. One option would be to have a lock > per function+arguments hash. That would probably work nicely with the > current model. > > 3. Dead-locks. Joblib is designed to be resilient to hack crash: > segfaults in the Python code (hell, code I write does segfault), > machine running out of memory in the middle of a persistence > operation. If we add locks, I'd like joblib to be able to detect > deadlocks because of a dead process as much as possible. I'd like > also a time out on the locks. In a single machine environment, > detecting dead processes can be done by saving the PID of the > process in the lock, and having code that checks if this process is > still alive (and is still running Python, to catter for PID reuse). > In a multi-machine environement, you need on top of that to save a > hash that enables the identification of the node. I think it is OK to > give up on testing of the process is still alive if the node that > wrote the lock is not the same node as the node that is trying to > acquire it. However, I think that parallel computing on a single > machine is more and more frequent. I have 12 CPUs at the lab, in two > years, I might have 24. I want to be able to use them, and with > multiprocessing, from the standard library, I can easily achieve that > quite well. Thus, I'd like us to cater as much as possible for the > single machine case. (I'm afraid 24 cores won't do me much good :-) ) Indeed, avoiding dead-locks on segfaults is the single most important thing to get right. It seems to me that a number of approaches is necessary. Locking is intrinsically tied to what hardware, OS and software one is on, and what performance one needs. One could imagine scenarios where joblib would be unusable unless there's an MPI store backend that acts as cache using MPI -- or internet distributed stores using ZMQ -- but first things first :-) I'm just saying that perhaps two approaches -- the one that suits me (cluster), and the one that suits you (single host) -- are needed. But let's see. Two approaches: 1) Use mkdir for lock + probe whether process has died or not. I think this last step needs support for plugging in the appropriate solution. A few options (where we should take as few as possible, but probably at least 2): i) PID's for localhost, as you said ii) For multi-host, SSH in and check PID (Linux-centric etc., but still fairly portable) iii) Specific cluster support (in my case, by far the most reliable is to record $SLURM_JOB_ID, and ask the queue system whether the job still lives) iv) Each process publishes a ZMQ port that can be ping-ed and that responds with a unique hash. Run in it's own thread outside of CPython/GIL control. v) For MPI jobs, do the locking over MPI vi) Some time based solution where a process is required to write a time stamp regularly. Pretty fragile. 2) Use lockf. This a) should be reliable on single-host, b) is only supported on some network systems, but when it is it should better do the job (why else leave an NFS lockd running...). This has the advantage of the OS taking care of releasing when the process dies. I'm thinking of the following algorithm: a) Check if store/ab1234 exists. If so, result is computed, go on and use it. b) Open store/ab1234.lock (in "create but do not overwrite" mode) and attempt to get lock with lockf. b1) If a lock is acquired (= either we're first, or other process died): - Write some information to the lockfile that identifies the process. - Compute results to store/tmp-$hostname-$pid-ab1234. - When done, mv it to store/ab1234 - Release lock and delete store/ab1234.lock b2) If a lock is not acquired, it means some other process is alive and processing, so wait (or do something else) It's possible to do a bit of forking and ping-pong to test the lock up front, at least on single-host. Remains to look what is supported on Windows though. BTW, from what you say I'm glad the first joblib attempts died. What I like so much is exactly using simple mechanisms. But introducing (optional) locking seems necessary. Dag Sverre
Hey Dag, Replying at this mail, rather than the following, but I'll try to take them in account. On Thu, Mar 17, 2011 at 04:30:28PM +0100, Dag Sverre Seljebotn wrote: > > You will. I use joblib's cache on my 12 CPUs box all the time. I do hit > > race conditions all the time. That said, I have tried adding locks, and > > it ended up slowing down everything. > My tasks would have a granularity of minutes rather than milliseconds, > so the locking overhead is the least of my worries. But it certainly > means things should go via refactor and store API. Fair enough, but this might not apply to everybody. Of course we could have a switch, but I'd like the numbers of switch to be kept to a minimum: the package should do the right thing by default. The slowdown was noticeable on a task like using dynamical programming to speed up computations of a fibonacci serie. It also multiplied the time to run the test suite of joblib by three. > I'm surprised that an FS stat (or process PID lookup?) drowns hashing a > NumPy array (if that is what is going on?) What was going on was simply that multiple processes were trying to grab the lock. Thus all but one of them ended up having to wait. This waiting was the problem, not the time to acquire or release the lock. That said, it was a global lock, so the problem was quite different. Also, the processes were doing polling on the lock, so the polling time gave the speed limit. The lock was a sqlite lock, so I couldn't avoid the polling. Will a lockf based lock induce a release of lock without polling (eg with a blocking call that unblocks when the lock is released, à la 'select'). I am not convinced that a well-coded lock to answer the problem that you are trying to answer will necessarily lead to slow downs. I think it's a matter of coding and trying. I am just raising the point that we need to be careful about this possible issue. > Within a single host I believe lockf should be rather safe, and those > automatically disappear when a process dies. More below. > [lots of complicated options, involving MPI, ZMQ, ssh, mkdirs, lockf] Wow. You've convinced me. Let's go for the simple option :P. More seriously, the complexity of trying to cater for all the cases frightens me. Here is what I propose: on systems that support lockf, use it. It seems (from what you say) that it should be resilient to deadlocks and fast. I believe that most people doing highly parallel computing will be on such systems. Does that match your intuition too? For people doing moderately parallel computing, I don't think that the races should be any problem. They are try/excepted and the worst thing that can happen is to force a bit of extra computation. I've been doing a lot of 12-job computation without any problems. I believe that this compromise should keep a simple and efficient codebase, and solve the problem for 99% of the users. We can worry about the rest later, if the need ever arises. > BTW, from what you say I'm glad the first joblib attempts died. Retrospectively me too :). The golden rule of coding is really to _throw away all your prototypes_. It's crazy how important it is to step away from a codebase and think about the problems that it is trying to solve, at some point. > What I like so much is exactly using simple mechanisms. I'd really like to keep joblib as simple as possible. It is its value, in my eyes. > But introducing (optional) locking seems necessary. I still do wonder... given the try/except blocks. I'd be interested by an real world experiment on a highly-parallel environment. Thanks for your thoughts, Gael
On 03/17/2011 06:00 PM, Gael Varoquaux wrote: > > >> I'm surprised that an FS stat (or process PID lookup?) drowns hashing a >> NumPy array (if that is what is going on?) > What was going on was simply that multiple processes were trying to grab > the lock. Thus all but one of them ended up having to wait. This waiting > was the problem, not the time to acquire or release the lock. That said, > it was a global lock, so the problem was quite different. Also, the Ah OK. Yes, a global lock would be rather disastrous :-) And we're in luck -- Python fcntl.lockf (NOT to be confused with fcntl.flock, which will not work at this for all...) can be called in either blocking or nonblocking (probing) mode, and if called in blocking mode it blocks at the OS level. > More seriously, the complexity of trying to cater for all the cases > frightens me. Here is what I propose: on systems that support lockf, use > it. It seems (from what you say) that it should be resilient to deadlocks > and fast. I believe that most people doing highly parallel computing > will be on such systems. Does that match your intuition too? For people Yes. Except some poor souls are on Windows which has a completely different API, but that's a rather trivial case to add when somebody has an interest, and optimistic locking w/ mkdir will still work until then. I'll get to it then. Dag Sverre
On 03/17/2011 04:30 PM, Dag Sverre Seljebotn wrote: > On 03/17/2011 03:44 PM, Gael Varoquaux wrote: >> Hi Dag, >> >> Good to see you on this (very quiet) mailing list. >> >> On Thu, Mar 17, 2011 at 02:31:32PM +0100, Dag Sverre Seljebotn wrote: >>> How about using joblib, and point each process to the same cache >>> directory on a network drive? (I'm very CPU-bound.) I can't find >>> anything hinting at locking in the source, so I'd think that I'm almost >>> guaranteed to get race conditions, right? >> You will. I use joblib's cache on my 12 CPUs box all the time. I do hit >> race conditions all the time. That said, I have tried adding locks, and >> it ended up slowing down everything. So I have adopted a 'better ask for >> forgiveness than permission' approach: any operation that could fail is >> in a try/except block (or should be, I sometimes find places where there >> is a race I didn't see, in which case I fix it, but more and more >> infrequently). >> >> The limitation of this approach is that the current codebase, when ran in >> parallel, might have 100 jobs computing the same item to be cached, and >> using up resources for no good reason. It won't crash, it will just >> gobble up resources. That's where a lock might come in handy, if there is >> some kind of scheduling system able to make us of the resources freed. > My tasks would have a granularity of minutes rather than milliseconds, > so the locking overhead is the least of my worries. But it certainly > means things should go via refactor and store API. > > I'm surprised that an FS stat (or process PID lookup?) drowns hashing a > NumPy array (if that is what is going on?) What kind of jobs where > these, and how many? > >> I don't know whether making the lock by default would hurt or not. I have >> three worries. >> >> 1. The portability of the locking operation. From what I had read when >> I was worrying about locks, filesystem locking is neither >> portable, nor reliable. I was adviced to use as a lock a mkdir. Indeed >> mkdir is an atomic operations on all file systems and all OSs that either >> succeeds, or fails if the directory exists. Thus acquiring the lock can >> be achieved by:: >> >> try: >> os.mkdir('foo.lock') >> except OSError, e: >> # Here use code that checks the number of the OSError to assert >> # that it indeed is a 'file exists' error, if so the lock is >> # acquired elsewhere. >> >> releasing the lock is done by removing the directory (also atomic if it's >> empty). > Interesting. I'm guessing this should have about the same performance as > lockf (although I guess only testing could tell for sure). The downside > is what you write below: Possible dead-locks etc. > > Within a single host I believe lockf should be rather safe, and those > automatically disappear when a process dies. More below. > >> 2. The slow down due to the lock (in particular in the memory access, >> rather than the memory write). This problem can be reduced as much >> as possible by having many locks, rather than a global lock, which >> always ends up being a bottleneck. One option would be to have a lock >> per function+arguments hash. That would probably work nicely with the >> current model. >> >> 3. Dead-locks. Joblib is designed to be resilient to hack crash: >> segfaults in the Python code (hell, code I write does segfault), >> machine running out of memory in the middle of a persistence >> operation. If we add locks, I'd like joblib to be able to detect >> deadlocks because of a dead process as much as possible. I'd like >> also a time out on the locks. In a single machine environment, >> detecting dead processes can be done by saving the PID of the >> process in the lock, and having code that checks if this process is >> still alive (and is still running Python, to catter for PID reuse). >> In a multi-machine environement, you need on top of that to save a >> hash that enables the identification of the node. I think it is OK to >> give up on testing of the process is still alive if the node that >> wrote the lock is not the same node as the node that is trying to >> acquire it. However, I think that parallel computing on a single >> machine is more and more frequent. I have 12 CPUs at the lab, in two >> years, I might have 24. I want to be able to use them, and with >> multiprocessing, from the standard library, I can easily achieve that >> quite well. Thus, I'd like us to cater as much as possible for the >> single machine case. > (I'm afraid 24 cores won't do me much good :-) ) > > Indeed, avoiding dead-locks on segfaults is the single most important > thing to get right. > > It seems to me that a number of approaches is necessary. Locking is > intrinsically tied to what hardware, OS and software one is on, and what > performance one needs. One could imagine scenarios where joblib would be > unusable unless there's an MPI store backend that acts as cache using > MPI -- or internet distributed stores using ZMQ -- but first things > first :-) > > I'm just saying that perhaps two approaches -- the one that suits me > (cluster), and the one that suits you (single host) -- are needed. But > let's see. Two approaches: > > 1) Use mkdir for lock + probe whether process has died or not. I think > this last step needs support for plugging in the appropriate solution. A > few options (where we should take as few as possible, but probably at > least 2): > > i) PID's for localhost, as you said > ii) For multi-host, SSH in and check PID (Linux-centric etc., but > still fairly portable) > iii) Specific cluster support (in my case, by far the most reliable is > to record $SLURM_JOB_ID, and ask the queue system whether the job still > lives) > iv) Each process publishes a ZMQ port that can be ping-ed and that > responds with a unique hash. Run in it's own thread outside of > CPython/GIL control. > v) For MPI jobs, do the locking over MPI > vi) Some time based solution where a process is required to write a > time stamp regularly. Pretty fragile. > > 2) Use lockf. This a) should be reliable on single-host, b) is only > supported on some network systems, but when it is it should better do > the job (why else leave an NFS lockd running...). > > This has the advantage of the OS taking care of releasing when the > process dies. I'm thinking of the following algorithm: > > a) Check if store/ab1234 exists. If so, result is computed, go on and > use it. > b) Open store/ab1234.lock (in "create but do not overwrite" mode) and > attempt to get lock with lockf. > b1) If a lock is acquired (= either we're first, or other process > died): Insert here: - Check again, having the lock, that store/ab1234 doesn't exist now (which means that another process finished it a microsecond ago, in which case go to a)) > - Write some information to the lockfile that identifies the process. > - Compute results to store/tmp-$hostname-$pid-ab1234. > - When done, mv it to store/ab1234 > - Release lock and delete store/ab1234.lock > b2) If a lock is not acquired, it means some other process is alive > and processing, so wait (or do something else) > > It's possible to do a bit of forking and ping-pong to test the lock up > front, at least on single-host. Remains to look what is supported on > Windows though. > > BTW, from what you say I'm glad the first joblib attempts died. What I > like so much is exactly using simple mechanisms. But introducing > (optional) locking seems necessary. > > Dag Sverre
I think this is actually a lot simpler than all I wrote below: The downside of mkdir is it allows deadlocks. The upside of mkdir is that it is more portable. Lockf is deadlocksafe but unportable. So it should probably be possible to use both kinds of locks, which should turn into pessimistic (and deadlock safe) lockf locks on OSes that support it, an optimistic cpuwasting mkdir locks when the lockf mechanism fails for some reason. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote: On 03/17/2011 04:30 PM, Dag Sverre Seljebotn wrote: > On 03/17/2011 03:44 PM, Gael Varoquaux wrote: >> Hi Dag, >> >> Good to see you on this (very quiet) mailing list. >> >> On Thu, Mar 17, 2011 at 02:31:32PM +0100, Dag Sverre Seljebotn wrote: >>> How about using joblib, and point each process to the same cache >>> directory on a network drive? (I'm very CPU-bound.) I can't find >>> anything hinting at locking in the source, so I'd think that I'm almost >>> guaranteed to get race conditions, right? >> You will. I use joblib's cache on my 12 CPUs box all the time. I do hit >> race conditions all the time. That said, I have tried adding locks, and >> it ended up slowing down everything. So I have adopted a 'better ask for >> forgiveness than permission' approach: any operation that could fail is >> in a try/except block (or should be, I sometimes find places where there >> is a race I didn't see, in which case I fix it, but more and more >> infrequently). >> >> The limitation of this approach is that the current codebase, when ran in >> parallel, might have 100 jobs computing the same item to be cached, and >> using up resources for no good reason. It won't crash, it will just >> gobble up resources. That's where a lock might come in handy, if there is >> some kind of scheduling system able to make us of the resources freed. > My tasks would have a granularity of minutes rather than milliseconds, > so the locking overhead is the least of my worries. But it certainly > means things should go via refactor and store API. > > I'm surprised that an FS stat (or process PID lookup?) drowns hashing a > NumPy array (if that is what is going on?) What kind of jobs where > these, and how many? > >> I don't know whether making the lock by default would hurt or not. I have >> three worries. >> >> 1. The portability of the locking operation. From what I had read when >> I was worrying about locks, filesystem locking is neither >> portable, nor reliable. I was adviced to use as a lock a mkdir. Indeed >> mkdir is an atomic operations on all file systems and all OSs that either >> succeeds, or fails if the directory exists. Thus acquiring the lock can >> be achieved by:: >> >> try: >> os.mkdir('foo.lock') >> except OSError, e: >> # Here use code that checks the number of the OSError to assert >> # that it indeed is a 'file exists' error, if so the lock is >> # acquired elsewhere. >> >> releasing the lock is done by removing the directory (also atomic if it's >> empty). > Interesting. I'm guessing this should have about the same performance as > lockf (although I guess only testing could tell for sure). The downside > is what you write below: Possible dead-locks etc. > > Within a single host I believe lockf should be rather safe, and those > automatically disappear when a process dies. More below. > >> 2. The slow down due to the lock (in particular in the memory access, >> rather than the memory write). This problem can be reduced as much >> as possible by having many locks, rather than a global lock, which >> always ends up being a bottleneck. One option would be to have a lock >> per function+arguments hash. That would probably work nicely with the >> current model. >> >> 3. Dead-locks. Joblib is designed to be resilient to hack crash: >> segfaults in the Python code (hell, code I write does segfault), >> machine running out of memory in the middle of a persistence >> operation. If we add locks, I'd like joblib to be able to detect >> deadlocks because of a dead process as much as possible. I'd like >> also a time out on the locks. In a single machine environment, >> detecting dead processes can be done by saving the PID of the >> process in the lock, and having code that checks if this process is >> still alive (and is still running Python, to catter for PID reuse). >> In a multi-machine environement, you need on top of that to save a >> hash that enables the identification of the node. I think it is OK to >> give up on testing of the process is still alive if the node that >> wrote the lock is not the same node as the node that is trying to >> acquire it. However, I think that parallel computing on a single >> machine is more and more frequent. I have 12 CPUs at the lab, in two >> years, I might have 24. I want to be able to use them, and with >> multiprocessing, from the standard library, I can easily achieve that >> quite well. Thus, I'd like us to cater as much as possible for the >> single machine case. > (I'm afraid 24 cores won't do me much good :-) ) > > Indeed, avoiding dead-locks on segfaults is the single most important > thing to get right. > > It seems to me that a number of approaches is necessary. Locking is > intrinsically tied to what hardware, OS and software one is on, and what > performance one needs. One could imagine scenarios where joblib would be > unusable unless there's an MPI store backend that acts as cache using > MPI -- or internet distributed stores using ZMQ -- but first things > first :-) > > I'm just saying that perhaps two approaches -- the one that suits me > (cluster), and the one that suits you (single host) -- are needed. But > let's see. Two approaches: > > 1) Use mkdir for lock + probe whether process has died or not. I think > this last step needs support for plugging in the appropriate solution. A > few options (where we should take as few as possible, but probably at > least 2): > > i) PID's for localhost, as you said > ii) For multi-host, SSH in and check PID (Linux-centric etc., but > still fairly portable) > iii) Specific cluster support (in my case, by far the most reliable is > to record $SLURM_JOB_ID, and ask the queue system whether the job still > lives) > iv) Each process publishes a ZMQ port that can be ping-ed and that > responds with a unique hash. Run in it's own thread outside of > CPython/GIL control. > v) For MPI jobs, do the locking over MPI > vi) Some time based solution where a process is required to write a > time stamp regularly. Pretty fragile. > > 2) Use lockf. This a) should be reliable on single-host, b) is only > supported on some network systems, but when it is it should better do > the job (why else leave an NFS lockd running...). > > This has the advantage of the OS taking care of releasing when the > process dies. I'm thinking of the following algorithm: > > a) Check if store/ab1234 exists. If so, result is computed, go on and > use it. > b) Open store/ab1234.lock (in "create but do not overwrite" mode) and > attempt to get lock with lockf. > b1) If a lock is acquired (= either we're first, or other process > died): Insert here: - Check again, having the lock, that store/ab1234 doesn't exist now (which means that another process finished it a microsecond ago, in which case go to a)) > - Write some information to the lockfile that identifies the process. > - Compute results to store/tmp-$hostname-$pid-ab1234. > - When done, mv it to store/ab1234 > - Release lock and delete store/ab1234.lock > b2) If a lock is not acquired, it means some other process is alive > and processing, so wait (or do something else) > > It's possible to do a bit of forking and ping-pong to test the lock up > front, at least on single-host. Remains to look what is supported on > Windows though. > > BTW, from what you say I'm glad the first joblib attempts died. What I > like so much is exactly using simple mechanisms. But introducing > (optional) locking seems necessary. > > Dag Sverre