librelist archives

« back to archive

"Resource deadlock avoided" error?

"Resource deadlock avoided" error?

From:
Dan Christensen
Date:
2014-05-30 @ 22:58
This morning I got the following error while trying to create a backup
to a local repo that is also used to backup other machines.  Any idea
what might have caused it?  A later backup to this repo from a remote
machine succeeded.  Oh, the repo has corruption---see below.

# attic create --stats "--exclude-from=/etc/attic/volumes/system.exclude" 
"/Backups/attic/system.attic::boots-system-20140530-06:47" "/" 

Initializing cache...
Analyzing archive: boots-system-20140514-06:32
Analyzing archive: boots-system-20140521-06:46
...62 lines deleted...
Analyzing archive: speedy-system-20140228-02:03
Analyzing archive: stilton-system-20140519-03:35
attic: /lib/modules/3.2.0-4-686-pae/kernel/drivers/ssb/ssb.ko: [Errno 35] 
Resource deadlock avoided
Traceback (most recent call last):
  File "/usr/local/bin/attic", line 3, in <module>
    main()
  File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
702, in main
    exit_code = archiver.run(sys.argv[1:])
  File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
692, in run
    return args.func(args)
  File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
127, in do_create
    self._process(archive, cache, args.excludes, skip_inodes, path, restrict_dev)
  File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
174, in _process
    os.path.join(path, filename), restrict_dev)
  File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
174, in _process
    os.path.join(path, filename), restrict_dev)
  File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
174, in _process
    os.path.join(path, filename), restrict_dev)
  File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
162, in _process
    archive.process_file(path, st, cache)
  File "/usr/local/lib/python3.2/dist-packages/attic/archive.py", line 
389, in process_file
    chunks.append(cache.add_chunk(self.key.id_hash(chunk), chunk, self.stats))
  File "/usr/local/lib/python3.2/dist-packages/attic/cache.py", line 179, 
in add_chunk
    self.repository.put(id, data, wait=False)
  File "/usr/local/lib/python3.2/dist-packages/attic/repository.py", line 
360, in put
    self.segments.setdefault(segment, 0)
AttributeError: 'Repository' object has no attribute 'segments'

Any idea what "Resource deadlock avoided" means?  Could attic
have noticed the problem and exited with a more helpful message?

I then ran:

# attic check --repository-only /Backups/attic/system.attic
Starting repository check...
Index object count mismatch. 1024852 != 1024853
attic: Exiting with failure status due to previous errors

So I tried the following (not just for the repo, in case there are
other problems too):

# attic check --repair /Backups/attic/system.attic
attic: Warning: 'check --repair' is an experimental feature that might 
result in data loss.
Type "Yes I am sure" if you understand this and want to continue.
Do you want to continue? Yes I am sure
Starting repository check...
Repository check complete, no problems found.
Starting archive consistency check...
Analyzing archive boots-system-20140514-06:32 (1/66)
Analyzing archive boots-system-20140521-06:46 (2/66)
...
Analyzing archive stilton-system-20140523-03:37 (65/66)
Analyzing archive stilton-system-20140519-03:35 (66/66)
1 orphaned objects found

It's strange that this time there was no problem found with the
repository-level check.  Also, what does the final message mean?  Maybe
it should say "1 orphaned objects found and fixed" or something like that?

An earlier backup to this repo from a remote host failed, probably due
to a network problem, so maybe that is why it was corrupt?

Dan

Re: [attic] "Resource deadlock avoided" error?

From:
Jonas Borgström
Date:
2014-05-31 @ 13:18
On 2014-05-31 00:58, Dan Christensen wrote:
> This morning I got the following error while trying to create a backup
> to a local repo that is also used to backup other machines.  Any idea
> what might have caused it?  A later backup to this repo from a remote
> machine succeeded.  Oh, the repo has corruption---see below.
> 
> # attic create --stats 
"--exclude-from=/etc/attic/volumes/system.exclude" 
"/Backups/attic/system.attic::boots-system-20140530-06:47" "/" 
> 
> Initializing cache...
> Analyzing archive: boots-system-20140514-06:32
> Analyzing archive: boots-system-20140521-06:46
> ...62 lines deleted...
> Analyzing archive: speedy-system-20140228-02:03
> Analyzing archive: stilton-system-20140519-03:35
> attic: /lib/modules/3.2.0-4-686-pae/kernel/drivers/ssb/ssb.ko: [Errno 
35] Resource deadlock avoided
> Traceback (most recent call last):
>   File "/usr/local/bin/attic", line 3, in <module>
>     main()
>   File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
702, in main
>     exit_code = archiver.run(sys.argv[1:])
>   File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
692, in run
>     return args.func(args)
>   File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
127, in do_create
>     self._process(archive, cache, args.excludes, skip_inodes, path, 
restrict_dev)
>   File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
174, in _process
>     os.path.join(path, filename), restrict_dev)
>   File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
174, in _process
>     os.path.join(path, filename), restrict_dev)
>   File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
174, in _process
>     os.path.join(path, filename), restrict_dev)
>   File "/usr/local/lib/python3.2/dist-packages/attic/archiver.py", line 
162, in _process
>     archive.process_file(path, st, cache)
>   File "/usr/local/lib/python3.2/dist-packages/attic/archive.py", line 
389, in process_file
>     chunks.append(cache.add_chunk(self.key.id_hash(chunk), chunk, self.stats))
>   File "/usr/local/lib/python3.2/dist-packages/attic/cache.py", line 
179, in add_chunk
>     self.repository.put(id, data, wait=False)
>   File "/usr/local/lib/python3.2/dist-packages/attic/repository.py", 
line 360, in put
>     self.segments.setdefault(segment, 0)
> AttributeError: 'Repository' object has no attribute 'segments'
> 
> Any idea what "Resource deadlock avoided" means?  Could attic
> have noticed the problem and exited with a more helpful message?

This one was tricky to track down. It turns out that the fcntl.lockf()
api was changed between Python 3.2 and Python 3.3. Python 3.2 raises
IOError and Python 3.3 raises OSError.

So instead of properly bailing out with:

attic: Error: Failed to acquire write lock on /path/repos.attic/config

The exception was mistaken for a file specific error and you got:
attic: /the/first/modified/file: [Errno 35] Resource deadlock avoided

And attic tried to continue but failed.

This is now fixed in git.


> I then ran:
> 
> # attic check --repository-only /Backups/attic/system.attic
> Starting repository check...
> Index object count mismatch. 1024852 != 1024853
> attic: Exiting with failure status due to previous errors

This means the check found one object in the repository segment file
that was not in the repository index.

> So I tried the following (not just for the repo, in case there are
> other problems too):
> 
> # attic check --repair /Backups/attic/system.attic
> attic: Warning: 'check --repair' is an experimental feature that might 
result in data loss.
> Type "Yes I am sure" if you understand this and want to continue.
> Do you want to continue? Yes I am sure
> Starting repository check...
> Repository check complete, no problems found.
> Starting archive consistency check...
> Analyzing archive boots-system-20140514-06:32 (1/66)
> Analyzing archive boots-system-20140521-06:46 (2/66)
> ...
> Analyzing archive stilton-system-20140523-03:37 (65/66)
> Analyzing archive stilton-system-20140519-03:35 (66/66)
> 1 orphaned objects found
> 
> It's strange that this time there was no problem found with the
> repository-level check.  Also, what does the final message mean?  Maybe
> it should say "1 orphaned objects found and fixed" or something like that?

Yeah, the --repair mode output is a bit out of sync with the non-repair
mode. The repository-level repair silently added the newly found object
to the index. And later the archive mode deleted it since it was
orphaned (not used). I'll try to improve this soon.

> An earlier backup to this repo from a remote host failed, probably due
> to a network problem, so maybe that is why it was corrupt?

Attic 0.11 included a fix for a race condition that could result in
resurrection of previously deleted objects during segment file compaction.

So if this repository has been touched by Attic < 0.11 and has not been
repaired since then it's most likely due to this benign bug.

/ Jonas

Re: [attic] "Resource deadlock avoided" error?

From:
Dan Christensen
Date:
2014-06-01 @ 00:45
Jonas Borgström <jonas@borgstrom.se> writes:

> On 2014-05-31 00:58, Dan Christensen wrote:

>> This morning I got the following error while trying to create a backup
>> to a local repo that is also used to backup other machines.  Any idea
>> what might have caused it? 

...

>> attic: /lib/modules/3.2.0-4-686-pae/kernel/drivers/ssb/ssb.ko: [Errno 
35] Resource deadlock avoided

> This one was tricky to track down. It turns out that the fcntl.lockf()
> api was changed between Python 3.2 and Python 3.3. Python 3.2 raises
> IOError and Python 3.3 raises OSError.

Ah, so it was really a locking problem.  And checking my logs, I see
that this backup took longer than usual to rebuild the index (~2 hours;
my wireless network is of variable quality), and another backup started
just a minute or two before this one finished building the index and
started the actual backup (which only takes a minute or two).

> This is now fixed in git.

Great!

>> An earlier backup to this repo from a remote host failed, probably due
>> to a network problem, so maybe that is why it was corrupt?
>
> Attic 0.11 included a fix for a race condition that could result in
> resurrection of previously deleted objects during segment file compaction.
>
> So if this repository has been touched by Attic < 0.11 and has not been
> repaired since then it's most likely due to this benign bug.

I believe I created this repo using version

  0d248192e588a6eee67ef6f30dde8bf078c77097

from Feb 10, 2014, which was during the development of 0.11.  Is this
before or after the race condition was fixed?

Thanks for your help!

Dan

Re: [attic] "Resource deadlock avoided" error?

From:
Jonas Borgström
Date:
2014-06-01 @ 20:30
On 2014-06-01 02:45, Dan Christensen wrote:
> Jonas Borgström <jonas@borgstrom.se> writes:
> 
>> On 2014-05-31 00:58, Dan Christensen wrote:
> 
>>> This morning I got the following error while trying to create a backup
>>> to a local repo that is also used to backup other machines.  Any idea
>>> what might have caused it? 
> 
> ...
> 
>>> attic: /lib/modules/3.2.0-4-686-pae/kernel/drivers/ssb/ssb.ko: [Errno 
35] Resource deadlock avoided
> 
>> This one was tricky to track down. It turns out that the fcntl.lockf()
>> api was changed between Python 3.2 and Python 3.3. Python 3.2 raises
>> IOError and Python 3.3 raises OSError.
> 
> Ah, so it was really a locking problem.  And checking my logs, I see
> that this backup took longer than usual to rebuild the index (~2 hours;
> my wireless network is of variable quality), and another backup started
> just a minute or two before this one finished building the index and
> started the actual backup (which only takes a minute or two).
> 
>> This is now fixed in git.
> 
> Great!

Btw, I've also pushed some additional changes that completely removes
the possibility of deadlocks when creating or deleting archives.

>>> An earlier backup to this repo from a remote host failed, probably due
>>> to a network problem, so maybe that is why it was corrupt?
>>
>> Attic 0.11 included a fix for a race condition that could result in
>> resurrection of previously deleted objects during segment file compaction.
>>
>> So if this repository has been touched by Attic < 0.11 and has not been
>> repaired since then it's most likely due to this benign bug.
> 
> I believe I created this repo using version
> 
>   0d248192e588a6eee67ef6f30dde8bf078c77097
> 
> from Feb 10, 2014, which was during the development of 0.11.  Is this
> before or after the race condition was fixed?

Before:

commit 6425d16aa84be1eaaf88937124411438b260ad67
Author: Jonas Borgström <jonas@borgstrom.se>
Date:   Fri Feb 21 20:20:17 2014 +0100

    repository: Fix potential race condition

    If we crash between compact_segments() and write_index() and the
    transaction deletes objects that are newer than the current index
    might become undeleted.


/ Jonas