librelist archives

Attic vs Obnam

From:
Dan Christensen
Date:
2014-01-31 @ 19:35
Obnam and Attic are two de-duplicating backup programs that
seem to have a very similar design, so I've been comparing them.
Here are my observations and benchmarks.  I would be happy to
receive corrections and/or additions to this information.
The Attic author has also written about this at

  http://librelist.com/browser//attic/2013/8/13/attic-vs-obnam/#1f67440fa29dacce0ed2af4df5d0d8b7

Obnam pros:

- well documented
- active mailing list
- packages available

Obnam cons:

- very slow
- large backups

Attic pros:

- much smaller backups (even without deduplication)
- much better deduplication
- much faster

Attic cons:

- repository format not documented
- not a large user community

Important question:

How resilient are the repositories that Obnam and Attic use?  If a
single sector fails on the hard-drive, how much will be lost?  Do the
programs deal with read errors without crashing?

Testing:

Versions used:  Attic 0.10, Obnam 1.5-1ubuntu1, no encryption.

De-duplication:

For this test I used a single 33MB pdf file.  The initial repos were
33MB for Obnam and 28MB for Attic.  Then I added a single byte to the
start of the file.  The Obnam repo went to 65MB while the Attic repo
stayed at 28MB.  Both took about 1s for the initial backup and 0.5s
for the second backup.

The remaining testing was done with a Maildir folder containing
23000 files.  The total length of these files is 126MB, and they
occupy 177MB on disk according to du -sh, because of partially
filled blocks.  Here MB means 1024*1024 bytes.

Archive size, as measured by du -sh:

After one backup:

Attic:                65MB
Obnam:               190MB
Obnam with deflate:  127MB

After a second backup with no changes:  the same.

Speed:

From local SSD to itself.  Warm cache.  First number is the initial
backup, the second a repeat with no changes.  minutes:seconds

cp -a:           0:01
Attic:           0:08  0:01
Obnam:           0:51  0:05
Obnam deflate:   0:53  0:05

From local SSD to remote HD, over a so-so wifi connection:

rsync:           0:24  0:01
Attic ssh:       0:28  0:05
Attic sshfs:     0:51  0:08
Obnam sftp:      8:45  0:21
Obnam sshfs:    25:22  0:22

Note that when using Attic with sshfs, no software has to be installed
on the remote host.

As another data point, I use unison to do bidirectional
synchronization of my home directory with an offsite server.  5GB,
56000 files.  When there are no changes, it runs in 1 second (!), 
and when there are changes, it is also very efficient.  Can either
of these programs get close to that speed in the future?

Dan

Re: [attic] Attic vs Obnam

From:
Jonas Borgström
Date:
2014-01-31 @ 22:03
On 2014-01-31 20:35, Dan Christensen wrote:
> Obnam and Attic are two de-duplicating backup programs that
> seem to have a very similar design, so I've been comparing them.
> Here are my observations and benchmarks.  I would be happy to
> receive corrections and/or additions to this information.

Hi Dan and thanks for your comparison. I've written some comments and
additional information below:

> The Attic author has also written about this at
> 
> http://librelist.com/browser//attic/2013/8/13/attic-vs-obnam/#1f67440fa29dacce0ed2af4df5d0d8b7
> 
> Obnam pros:
> 
> - well documented
> - active mailing list
> - packages available

What kind of packages are you interested in? There are Attic packages
for Debian, Ubuntu and Arch Linux. See the homepage for more details.

> Obnam cons:
> 
> - very slow
> - large backups
> 
> Attic pros:
> 
> - much smaller backups (even without deduplication)
> - much better deduplication
> - much faster
> 
> Attic cons:
> 
> - repository format not documented

Good point, I'll try to document that.

> - not a large user community
> 
> Important question:
> 
> How resilient are the repositories that Obnam and Attic use?  If a
> single sector fails on the hard-drive, how much will be lost?  Do the
> programs deal with read errors without crashing?

Right now Attic will detect a checksum mismatch and abort. You will need
to explicitly --exclude the damaged files to continue beyond that point.
fsck/repair functionality is at the top of my list for 0.11. But since
Attic repositories contain no redundant information, even a single bad
sector will result in some data loss. Depending on the actual location,
a single file might be destroyed, but it could also kill an entire
archive or even the entire repository if you are extremely unlucky.

> Testing:
> 
> Versions used:  Attic 0.10, Obnam 1.5-1ubuntu1, no encryption.
> 
> De-duplication:
> 
> For this test I used a single 33MB pdf file.  The initial repos were
> 33MB for Obnam and 28MB for Attic.  Then I added a single byte to the
> start of the file.  The Obnam repo went to 65MB while the Attic repo
> stayed at 28MB.  Both took about 1s for the initial backup and 0.5s
> for the second backup.
> 
> The remaining testing was done with a Maildir folder containing
> 23000 files.  The total length of these files is 126MB, and they
> occupy 177MB on disk according to du -sh, because of partially
> filled blocks.  Here MB means 1024*1024 bytes.
> 
> Archive size, as measured by du -sh:
> 
> After one backup:
> 
> Attic:                65MB
> Obnam:               190MB
> Obnam with deflate:  127MB
> 
> After a second backup with no changes:  the same.
> 
> Speed:
> 
> From local SSD to itself.  Warm cache.  First number is initial
> backup, second is a repeat with no changes.  minutes:seconds
> 
> cp -a:           0:01
> Attic:           0:08  0:01
> Obnam:           0:51  0:05
> Obnam deflate:   0:53  0:05
> 
> From local SSD to remote HD, over a so-so wifi connection:
> 
> rsync:           0:24  0:01
> Attic ssh:       0:28  0:05
> Attic sshfs:     0:51  0:08
> Obnam sftp:      8:45  0:21
> Obnam sshfs:    25:22  0:22
> 
> Note that when using Attic with sshfs, no software has to be installed
> on the remote host.
> 
> As another data point, I use unison to do bidirectional
> synchronization of my home directory with an offsite server.  5GB,
> 56000 files.  When there are no changes, it runs in 1 second (!), 
> and when there are changes, it is also very efficient.  Can either
> of these programs get close to that speed in the future?

Attic is as fast as it is right now because it avoids having to read
unmodified files by using a file cache. So as long as most/all of your
56000 files are unmodified it should be fairly quick but 56000 files in
1 second is probably hard to beat for a Python program.

/ Jonas

Re: Attic vs Obnam

From:
Dan Christensen
Date:
2014-02-01 @ 14:19
Jonas Borgström <jonas@borgstrom.se> writes:

> What kind of packages are you interested in? There are Attic packages
> for Debian, Ubuntu and Arch Linux. See the homepage for more details.

Debian and Ubuntu are what I need.  The installation part of the
manual here

  https://pythonhosted.org/Attic/installation.html

doesn't mention packages, nor does the homepage there.  Is there
another homepage I should know about?

Also, I'm still running Ubuntu 12.04 LTS, and I don't think there
are packages for this version.  Maybe a PPA could be set up to provide
them? 

>> - repository format not documented
>
> Good point, I'll try to document that.

Great!  I'm also curious how locking works.  I just tested simultaneous
writing, and saw that one writer was delayed until the other was done,
which is good.  Maybe you can say a few more words about this?  What
about read locks?

Can you also say how the blocks are chosen?  Are there situations for
which the de-duplication is poor?  Does the algorithm for de-duplication
scale to very large backups?

>> How resilient are the repositories that Obnam and Attic use?  If a
>> single sector fails on the hard-drive, how much will be lost?  Do the
>> programs deal with read errors without crashing?
>
> Right now Attic will detect a checksum mismatch and abort. You will need
> to explicitly --exclude the damaged files to continue beyond that point.
> fsck/repair functionality is at the top of my list for 0.11. But since
> Attic repositories contain no redundant information even a single bad
> sector will result in some data loss. Depending on the actual location a
> single file might be destroyed but it could also kill an entire archive
> or even the entire repository if you are extremely unlucky.

Thanks.  That sounds quite reasonable.  I tested changing a single
byte in a data file, and it behaved as you said.  I think it would
be better if it continued after an error and extracted as many files
as possible.  If you are trying to recover from a damaged disk,
you want the operation to proceed as quickly as possible.  Having
to restart and re-extract files each time an error occurs would be
painful.

But what if there is a read error?  That's harder to test, but it
would also be good if attic would continue past it.  (Maybe controlled
by a flag.)

>> As another data point, I use unison to do bidirectional
>> synchronization of my home directory with an offsite server.  5GB,
>> 56000 files.  When there are no changes, it runs in 1 second (!), 
>> and when there are changes, it is also very efficient.  Can either
>> of these programs get close to that speed in the future?
>
> Attic is as fast as it is right now because it avoids having to read
> unmodified files by using a file cache. So as long as most/all of your
> 56000 files are unmodified it should be fairly quick but 56000 files in
> 1 second is probably hard to beat for a Python program.

Unison is written in ocaml, and it uses a file cache as well.  If the
file metadata hasn't changed, it assumes by default that the file hasn't
changed.  Is that how attic works?  If so, this should be documented,
since it can fail to detect changes.  (Maybe there could be an option
to select between metadata checking and full content checking?)
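The metadata shortcut being asked about can be sketched in a few lines of Python. This is a hypothetical in-memory cache for illustration only; Attic's real file cache is persisted and structured differently, and the function name is made up:

```python
import os

# Hypothetical cache: path -> (size, mtime_ns, inode).
cache = {}

def is_unchanged(path):
    """True if the file's stat metadata matches the cached entry.

    Like unison's default mode: unchanged metadata is taken to
    imply unchanged contents, so an edit that carefully restores
    size and mtime would go undetected.
    """
    st = os.stat(path)
    key = (st.st_size, st.st_mtime_ns, st.st_ino)
    if cache.get(path) == key:
        return True
    cache[path] = key      # record metadata for the next run
    return False
```

The trade-off is exactly the one raised above: this never reads file contents, which is why it is fast, and also why it can miss changes.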

Here is a list of documentation suggestions, some mentioned above and
some other things:

- Document the repository format.  Maybe even twice, once at a high
  level, but in another place in enough detail to extract data from a
  repo.  Include a discussion of how resilient it is and the failure modes.
- Document repo locking.
- Document how blocks are chosen and how de-duplication works.
- Document pruning in more detail (just another sentence or two in the
  Usage section).
- Document that packages are available.

Also, I think you could update the documentation to make Attic more
appealing to others:

- You could provide more examples of the speed of backups and how small
  they are, similar to my benchmark data, but also including cases where
  things change.

- In the "remote repositories" section of the Quick Start, and anywhere
  else that discusses this, you should mention that you *don't* need
  Attic installed on the remote end:  you can use sshfs.  But that it is
  faster if you do have Attic installed at the remote end.  Some people
  may not be able to install Attic on a backup server, so this might
  scare them away from using Attic.

Thanks for a great program!  I will keep testing it and so you
should expect more questions and feedback (and maybe even some patches
at some point).

Dan

Re: [attic] Re: Attic vs Obnam

From:
Jonas Borgström
Date:
2014-02-02 @ 12:58
On 2014-02-01 15:19, Dan Christensen wrote:
> Jonas Borgström <jonas@borgstrom.se> writes:
> 
>> What kind of packages are you interested in? There are Attic packages
>> for Debian, Ubuntu and Arch Linux. See the homepage for more details.
> 
> Debian and Ubuntu are what I need.  The installation part of the
> manual here
> 
>   https://pythonhosted.org/Attic/installation.html
> 
> doesn't mention packages, nor does the homepage there.  Is there
> another homepage I should know about?

No, it's just not documented clearly enough. If you search for Debian or
Ubuntu on https://pythonhosted.org/Attic/ you'll find some links. But
that should obviously also be mentioned on the installation page.

> Also, I'm still running Ubuntu 12.04 LTS, and I don't think there
> are packages for this version.  Maybe a PPA could be set up to provide
> them? 

Yeah, perhaps. I'm not an Ubuntu user myself but I guess that's possible.

The Debian and Ubuntu packages have been built by Clint Adams, but as
far as I can tell there are currently no packages for the stable
versions of Debian and Ubuntu. They are also still at version 0.8.

>>> - repository format not documented
>>
>> Good point, I'll try to document that.
> 
> Great!  I'm also curious how locking works.  I just tested simultaneous
> writing, and saw that one writer was delayed until the other was done,
> which is good.  Maybe you can say a few more words about this?  What
> about read locks?

Sure, a shared POSIX lock (lockf) is acquired as soon as a repository is
accessed and later upgraded to an exclusive lock when a transaction is
started.

> Can you also say how the blocks are chosen?  Are there situations for
> which the de-duplication is poor?  Does the algorithm for de-duplication
> scale to very large backups?

A rolling hash (buzhash) is used to identify "cut points" based on the
file contents. Splitting files into chunks at these cut points produces
a stable set of chunks that are on average 64KB in size. Attic then
calculates a sha256 checksum of each chunk, which is used to identify
already stored chunks.

Since a change as small as a single flipped byte will require at least
one new chunk to be stored, de-duplication efficiency will be better the
larger the changes are.
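The cut-point idea can be shown with a toy content-defined chunker. A plain sum over a sliding window stands in for Attic's buzhash, and the window/divisor values are tiny so the effect is visible on small inputs (Attic targets ~64KB average chunks):

```python
def chunk(data, window=16, divisor=64):
    """Split bytes at content-defined cut points.

    The "hash" is the sum of the last `window` bytes; a cut point
    is declared wherever it divides evenly by `divisor`, giving
    roughly divisor-byte average chunks. Because the hash depends
    only on a sliding window, cut points realign shortly after an
    insertion, so most chunks are shared between versions.
    """
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h += b
        if i >= window:
            h -= data[i - window]            # slide the window
        if h % divisor == 0 and i + 1 - start >= window:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Hashing each resulting chunk with sha256 then gives the identity used to detect already-stored chunks, which is why prepending one byte to the 33MB pdf cost Attic almost nothing while it doubled the fixed-block Obnam repo.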

Deduplication is fairly memory hungry, since information about all known
chunks needs to fit into memory. The current memory overhead is probably
around 100 bytes per chunk. So as long as you have enough RAM it should
work fairly well with large backups.
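Those two approximate figures (64KB average chunks, ~100 bytes of index state per chunk) give a quick capacity estimate; the 1 TB repository size is just an illustrative input:

```python
# Back-of-the-envelope chunk-index memory for 1 TB of unique data,
# using the approximate figures from the thread.
data_bytes = 1024**4               # 1 TB of unique data
avg_chunk = 64 * 1024              # ~64 KB average chunk size
overhead = 100                     # ~100 bytes of index state per chunk

chunks = data_bytes // avg_chunk   # 16,777,216 chunks
index_mem = chunks * overhead      # 1,677,721,600 bytes, ~1.6 GB of RAM
```

So on this rough estimate, roughly 1.6 GB of RAM per terabyte of unique data.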

>>> How resilient are the repositories that Obnam and Attic use?  If a
>>> single sector fails on the hard-drive, how much will be lost?  Do the
>>> programs deal with read errors without crashing?
>>
>> Right now Attic will detect a checksum mismatch and abort. You will need
>> to explicitly --exclude the damaged files to continue beyond that point.
>> fsck/repair functionality is at the top of my list for 0.11. But since
>> Attic repositories contain no redundant information even a single bad
>> sector will result in some data loss. Depending on the actual location a
>> single file might be destroyed but it could also kill an entire archive
>> or even the entire repository if you are extremely unlucky.
> 
> Thanks.  That sounds quite reasonable.  I tested changing a single
> byte in a data file, and it behaved as you said.  I think it would
> be better if it continued after an error and extracted as many files
> as possible.  If you are trying to recover from a damaged disk,
> you want the operation to proceed as quickly as possible.  Having
> to restart and re-extract files each time an error occurs would be
> painful.
> 
> But what if there is a read error?  That's harder to test, but it
> would also be good if attic would continue past it.  (Maybe controlled
> by a flag.)

Agreed, that will probably change in the future.

>>> As another data point, I use unison to do bidirectional
>>> synchronization of my home directory with an offsite server.  5GB,
>>> 56000 files.  When there are no changes, it runs in 1 second (!), 
>>> and when there are changes, it is also very efficient.  Can either
>>> of these programs get close to that speed in the future?
>>
>> Attic is as fast as it is right now because it avoids having to read
>> unmodified files by using a file cache. So as long as most/all of your
>> 56000 files are unmodified it should be fairly quick but 56000 files in
>> 1 second is probably hard to beat for a Python program.
> 
> Unison is written in ocaml, and it uses a file cache as well.  If the
> file metadata hasn't changed, it assumes by default that the file hasn't
> changed.  Is that how attic works?  If so, this should be documented,
> since it can fail to detect changes.  (Maybe there could be an option
> to select between metadata checking and full content checking?)
> 
> Here is a list of documentation suggestions, some mentioned above and
> some other things:
> 
> - Document the repository format.  Maybe even twice, once at a high
>   level, but in another place in enough detail to extract data from a
>   repo.  Include a discussion of how resilient it is and the failure modes.
> - Document repo locking.
> - Document how blocks are chosen and how de-duplication works.
> - Document pruning in more detail (just another sentence or two in the
>   Usage section).
> - Document that packages are available.
> 
> Also, I think you could update the documentation to make Attic more
> appealing to others:
> 
> - You could provide more examples of the speed of backups and how small
>   they are, similar to my benchmark data, but also including cases where
>   things change.
> 
> - In the "remote repositories" section of the Quick Start, and anywhere
>   else that discusses this, you should mention that you *don't* need
>   Attic installed on the remote end:  you can use sshfs.  But that it is
>   faster if you do have Attic installed at the remote end.  Some people
>   may not be able to install Attic on a backup server, so this might
>   scare them away from using Attic.
> 
> Thanks for a great program!  I will keep testing it and so you
> should expect more questions and feedback (and maybe even some patches
> at some point).

Thanks, these are great suggestions. The documentation is very basic
right now and needs to be improved.

/ Jonas

data files deleted when listing a corrupt repo

From:
Dan Christensen
Date:
2014-02-05 @ 03:00
Jonas Borgström <jonas@borgstrom.se> writes:

> On 2014-02-01 15:19, Dan Christensen wrote:
>
>> How resilient are the repositories that Obnam and Attic use?  If a
>> single sector fails on the hard-drive, how much will be lost?  Do the
>> programs deal with read errors without crashing?
>
> Right now Attic will detect a checksum mismatch and abort. You will need
> to explicitly --exclude the damaged files to continue beyond that point.
> fsck/repair functionality is at the top of my list for 0.11. But since
> Attic repositories contain no redundant information even a single bad
> sector will result in some data loss. Depending on the actual location a
> single file might be destroyed but it could also kill an entire archive
> or even the entire repository if you are extremely unlucky.

My latest tests have revealed some bad behaviour.  I did the following:

mkdir test-errors
cd test-errors; touch a b c d e f g; cd ..
attic init test-errors.attic
attic create test-errors.attic::1 test-errors
cp -a test-errors.attic corrupt.attic
emacs corrupt.attic/data/0/2  [change last byte to X]
attic list corrupt.attic

When I ran the last command, attic *removed* the data file data/0/2.
That's the only data file in this case, but when I did a similar
test for a large repo with many files, attic removed *all* of the
data files when just one byte was changed in one file.

The thing that really surprises me about this is that "attic list"
should be a read-only operation.  If there is some corruption, nothing
should be deleted unless the user explicitly initiates a repair of
some sort.  For example, the error could be transient, and you wouldn't
want an entire repo's data deleted because of this.

I have attached my repos, but this has happened almost every time I
have tried changing a byte in any repo, so it shouldn't be hard to
reproduce.

Dan

Re: [attic] data files deleted when listing a corrupt repo

From:
Jonas Borgström
Date:
2014-02-05 @ 13:11
On 2014-02-05 04:00, Dan Christensen wrote:
> Jonas Borgström <jonas@borgstrom.se> writes:
> 
>> On 2014-02-01 15:19, Dan Christensen wrote:
>>
>>> How resilient are the repositories that Obnam and Attic use?  If a
>>> single sector fails on the hard-drive, how much will be lost?  Do the
>>> programs deal with read errors without crashing?
>>
>> Right now Attic will detect a checksum mismatch and abort. You will need
>> to explicitly --exclude the damaged files to continue beyond that point.
>> fsck/repair functionality is at the top of my list for 0.11. But since
>> Attic repositories contain no redundant information even a single bad
>> sector will result in some data loss. Depending on the actual location a
>> single file might be destroyed but it could also kill an entire archive
>> or even the entire repository if you are extremely unlucky.
> 
> My latest tests have revealed some bad behaviour.  I did the following:
> 
> mkdir test-errors
> cd test-errors; touch a b c d e f g; cd ..
> attic init test-errors.attic
> attic create test-errors.attic::1 test-errors
> cp -a test-errors.attic corrupt.attic
> emacs corrupt.attic/data/0/2  [change last byte to X]
> attic list corrupt.attic
> 
> When I ran the last command, attic *removed* the data file data/0/2.
> That's the only data file in this case, but when I did a similar
> test for a large repo with many files, attic removed *all* of the
> data files when just one byte was changed in one file.
> 
> The thing that really surprises me about this is that "attic list"
> should be a read-only operation.  If there is some corruption, nothing
> should be deleted unless the user explicitly initiates a repair of
> some sort.  For example, the error could be transient, and you wouldn't
> want an entire repo's data deleted because of this.

It's a known issue. I actually added a comment in the code yesterday to
remind me to fix this while working on the repository check/repair code.

https://github.com/jborg/attic/blob/master/attic/repository.py#L345

An Attic repository works similar to a transactional log based
filesystem. When the repository is modified a transaction is opened and
a series of PUT and DELETE tags/operations are appended to one or more
segment files. The transaction is later committed by appending a COMMIT
tag at the end of the last segment file.

So whenever a repository is opened the code looks at the end of the last
segment file. If it does not end with a COMMIT tag it is assumed to be a
partial/aborted transaction. If so, the code continues to delete
segments until it finds a COMMIT tag.

This works well as long as the last bytes of the most recent segment
file are not corrupted or lost.

Ideally this "recovery" operation would be delayed until a read-write
transaction is requested. But that's not currently the case.

The fix is to delay the actual deletion until a COMMIT tag is found. If
no COMMIT tag is found the code should abort and tell the user the
repository is corrupted and needs to be repaired.
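Both the current behaviour and the proposed fix can be sketched over a toy log of tags (the tag values here are illustrative; Attic's real segment records are binary):

```python
PUT, DELETE, COMMIT = "PUT", "DELETE", "COMMIT"   # toy tags

def uncommitted_tail(tags):
    """Return the log entries recorded after the last COMMIT.

    Current behaviour on open: this tail is deleted as an aborted
    transaction, even by read-only commands like "attic list".
    Proposed fix: if the tail is non-empty (or no COMMIT exists
    at all), abort and tell the user to run a repair instead of
    deleting anything.
    """
    for i in range(len(tags) - 1, -1, -1):
        if tags[i] == COMMIT:
            return tags[i + 1:]
    return list(tags)       # no COMMIT found: the whole log is suspect
```

A fully committed log has an empty tail; a truncated or corrupted one does not, which is what should trigger the "repository needs repair" error rather than a silent delete.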

This issue also illustrates how important it is that the underlying
filesystem is reliable and why I'm a bit reluctant to recommend storing
repositories on potentially unreliable filesystems like nfs, samba and
sshfs.

Since I finished the first part of the repository check code yesterday I
will most likely fix this before starting to work on the repair code.

/ Jonas

Re: [attic] data files deleted when listing a corrupt repo

From:
Dan Christensen
Date:
2014-02-05 @ 15:02
Jonas Borgström <jonas@borgstrom.se> writes:

> So whenever a repository is opened the code looks at the end of the last
> segment file. If it does not end with a COMMIT tag it is assumed to be a
> partial/aborted transaction. If so, the code continues to delete
> segments until it finds a COMMIT tag.

Does one create operation (which may back up thousands of files) produce
just one COMMIT tag?  If so, that means that an entire backup can be
lost if just one byte is corrupted/unreadable.  Or is the checkpoint
feature exactly designed to do more frequent commits?  Is there any
other way to make the repo more resilient?  

> This works well as long as the last bytes of the most recent segment
> file are not corrupted or lost.

Does this explain the other behaviour I saw?  With a larger repository,
changing a single byte in one of the data files caused all of the data
files (about 15 of them) to be deleted when I tried to list the
repository.  Also, this deletion happened even if I changed a byte
in the middle of a data file, not at the beginning or end.

> This issue also illustrates how important it is that the underlying
> filesystem is reliable and why I'm a bit reluctant to recommend storing
> repositories on potentially unreliable filesystems like nfs, samba and
> sshfs.

I just read about another similar backup program, called bup, which
provides the option of using par2 to store parity data in the repo
which allows the data to be fully recovered even if a certain percentage
of the repo is unreadable.  It doesn't require storing that much extra
data.  I wonder if something like this could be built into attic?
The cost of having no redundancy in a repo is fragility, and for
backups, fragility is bad!

> Since I finished the first part of the repository check code yesterday I
> will most likely fix this before starting to work on the repair code.

Great!  Is it a read-only check?

One thought on the UI:  "attic verify /path/to/repo.attic::archive"
could verify just that archive, and "attic verify /path/to/repo.attic"
could verify the whole repo.  Easier to remember than two separate
commands.

Dan

Re: [attic] data files deleted when listing a corrupt repo

From:
Jonas Borgström
Date:
2014-02-05 @ 16:13
On 2014-02-05 16:02, Dan Christensen wrote:
> Jonas Borgström <jonas@borgstrom.se> writes:
> 
>> So whenever a repository is opened the code looks at the end of the last
>> segment file. If it does not end with a COMMIT tag it is assumed to be a
>> partial/aborted transaction. If so, the code continues to delete
>> segments until it finds a COMMIT tag.
>
> Does one create operation (which may back up thousands of files) produce 
> just one COMMIT tag?  If so, that means that an entire backup can be
> lost if just one byte is corrupted/unreadable.  Or is the checkpoint
> feature exactly designed to do more frequent commits?  Is there any
> other way to make the repo more resilient?  

The checkpoint feature will ensure that a partial archive is committed
every 5 minutes so that the user will not have to start over from
scratch if a large backup is interrupted for some reason.

Except for the bug you noticed (which is easily fixable), a single
corrupted byte should not mean that the entire repository is lost.
Any corruption does, however, mean that the repository will need to be
repaired before being usable again.

Since the repository repair code hasn't been written yet I can't know
for sure, but I hope to be able to keep the data loss as small as
possible. It's important to understand that all data corruption will
result in some data loss, but hopefully most minor corruption cases will
only result in the loss of a single file or part of a file.

>> This works well as long as the last bytes of the most recent segment
>> file are not corrupted or lost.
> 
> Does this explain the other behaviour I saw?  With a larger repository,
> changing a single byte in one of the data files caused all of the data
> files (about 15 of them) to be deleted when I tried to list the
> repository.  Also, this deletion happened even if I changed a byte
> in the middle of a data file, not at the beginning or end.

The problem you observed should only occur if the last 9 bytes of the
most recent segment file do not contain the following byte sequence:

40 f4 3c 25 09 00 00 00 02

My guess is that you managed to change the last 9 bytes by mistake.
Perhaps you used a text editor that added a new-line character at the
end of the file?
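That tail condition is easy to check by hand. A sketch, using the 9-byte sequence quoted above (it is specific to this repository format and version, so treat it as illustrative rather than a stable constant):

```python
# COMMIT byte sequence quoted in the thread: 40 f4 3c 25 09 00 00 00 02
COMMIT_TAIL = bytes.fromhex("40f43c250900000002")

def ends_with_commit(path):
    """True if the segment file's last 9 bytes are the COMMIT tag.

    An editor that appends a trailing newline shifts these bytes,
    which is exactly the stray-byte scenario suspected above and
    would trigger the delete-back-to-last-COMMIT recovery.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)                         # seek to end of file
        size = f.tell()
        if size < len(COMMIT_TAIL):
            return False
        f.seek(size - len(COMMIT_TAIL))
        return f.read() == COMMIT_TAIL
```

Running this against the most recent segment file before and after an edit would confirm whether the editor silently appended a newline.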

But if you're able to reproduce it, I would be interested in learning how.

>> This issue also illustrates how important it is that the underlying
>> filesystem is reliable and why I'm a bit reluctant to recommend storing
>> repositories on potentially unreliable filesystems like nfs, samba and
>> sshfs.
> 
> I just read about another similar backup program, called bup, which
> provides the option of using par2 to store parity data in the repo
> which allows the data to be fully recovered even if a certain percentage
> of the repo is unreadable.  It doesn't require storing that much extra
> data.  I wonder if something like this could be built into attic?
> The cost of having no redundancy in a repo is fragility, and for
> backups, fragility is bad!

I think that sounds like a layer violation. Attic goes to great lengths
to make sure the data is correctly written to disk; for example, at the
end of each segment os.fsync(fd) is called to force the OS to write the
data to disk before continuing.
After that it is the operating system's responsibility to make sure
the data is not lost or corrupted.
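The durability step mentioned above looks roughly like this in Python. This is the generic write-flush-fsync pattern, not Attic's actual code:

```python
import os

def write_segment(path, payload):
    """Write a segment and force it to stable storage.

    flush() drains Python's userspace buffer to the OS, and
    os.fsync() asks the OS to push its page cache to the device.
    Only after fsync returns is it safe to treat the data as
    durable and append a COMMIT tag that depends on it.
    """
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
```

Beyond that point, protecting the synced bytes from sector failure is the job of the storage layer (RAID, ZFS/BTRFS checksums), which is the division of responsibility argued for here.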

Without knowing things like physical sector size, the number of disks
and the ability to store data on individual disks, it's very difficult
to implement this in a way that actually works when disks fail or
silently start corrupting data.

IMHO you're much better off letting a RAID system, or better yet a
filesystem such as ZFS or BTRFS, worry about this.

If you worry about silent data corruption, one good solution is to
schedule a periodic "attic check" of the repository; if the check
succeeds, the repository can then be rsynced to an off-site location.

>> Since I finished the first part of the repository check code yesterday I
>> will most likely fix this before starting to work on the repair code.
> 
> Great!  Is it a read-only check?

Yes, and so far it only verifies the consistency of the repository and
not archive metadata itself.

Note: It's a bit buggy right now. It doesn't work for remote
repositories unless --no-progress is specified, and the verbose output
is inverted (it says "No errors found" if an error is found).

Since a repair is potentially destructive it will have to be explicitly
enabled. Probably something like this:

attic check --repair my-repo

> One thought on the UI:  "attic verify /path/to/repo.attic::archive"
> could verify just that archive, and "attic verify /path/to/repo.attic"
> could verify the whole repo.  Easier to remember than two separate
> commands.

Yeah, the plan is actually to drop the verify command since it's almost
identical to extract except no data is actually written.

https://github.com/jborg/attic/issues/25

So:

$ attic verify repo::archive some/file

would be replaced with:

$ attic extract --dry-run repo::archive some/file

That way we could add even more "modes" in the future, perhaps:

$ attic extract --compare repo::archive some/file

That would compare the stored data with the data on disk.

/ Jonas

Re: [attic] data files deleted when listing a corrupt repo

From:
Dan Christensen
Date:
2014-02-06 @ 14:34
Jonas Borgström <jonas@borgstrom.se> writes:

>>> Since I finished the first part of the repository check code yesterday I
>>> will most likely fix this before starting to work on the repair code.
>> 
>> Great!  Is it a read-only check?
>
> Yes, and so far it only verifies the consistency of the repository and
> not archive metadata itself.

I just tried your version

  4271ffa25fdb8e37fb55bcb6dbaaee82079fd18b

and it seems to work pretty well!  "attic list" no longer deletes data,
and "attic check" reports most errors.  Thanks!

A few minor issues:

For one corrupt repo, when I run "attic check", it says:

  attic: Error: Inconsistency detected. Please "run attic check corrupt.attic"

I guess it has trouble even before getting to the check stage,
so maybe there's no point telling the user to run attic check?
(Also, the quotes are misplaced.)

I have another corrupt archive, where all data files were deleted
by attic list in the past, and I get this funny behaviour:

$ attic list corrupt2.attic
attic: Error: Repository corrupt2.attic does not exist
$ attic check corrupt2.attic
Check complete, no errors found.

Finally, I'm still noticing that for a small repo with two archives,
changing just the last byte of the one data file makes both archives
inaccessible.  I just wanted to check that this is expected.

> $ attic verify repo::archive some/file
>
> would be replaced with:
>
> $ attic extract --dry-run repo::archive some/file

Will there be a way to get attic to test the consistency of all
of the archives?

Dan

Re: [attic] data files deleted when listing a corrupt repo

From:
Jonas Borgström
Date:
2014-02-06 @ 17:18
On 2014-02-06 15:34, Dan Christensen wrote:
> Jonas Borgström <jonas@borgstrom.se> writes:
> 
>>>> Since I finished the first part of the repository check code yesterday I
>>>> will most likely fix this before starting to work on the repair code.
>>>
>>> Great!  Is it a read-only check?
>>
>> Yes, and so far it only verifies the consistency of the repository and
>> not archive metadata itself.
> 
> I just tried your version
> 
>   4271ffa25fdb8e37fb55bcb6dbaaee82079fd18b
> 
> and it seems to work pretty well!  "attic list" no longer deletes data,
> and "attic check" reports most errors.  Thanks!
> 
> A few minor issues:
> 
> For one corrupt repo, when I run "attic check", it says:
> 
>   attic: Error: Inconsistency detected. Please "run attic check corrupt.attic"
> 
> I guess it has trouble even before getting to the check stage,
> so maybe there's no point telling the user to run attic check?
> (Also, the quotes are misplaced.)

Yeah, there's a catch-22 thing going on there right now, but I'll fix
that later.

> I have another corrupt archive, where all data files were deleted
> by attic list in the past, and I get this funny behaviour:
> 
> $ attic list corrupt2.attic
> attic: Error: Repository corrupt2.attic does not exist
> $ attic check corrupt2.attic
> Check complete, no errors found.

Heh, that's confusing :)

> Finally, I'm still noticing that for a small repo with two archives,
> changing just the last byte of the one data file makes both archives
> inaccessible.  I just wanted to check that this is expected.

Yeah, but hopefully the upcoming repair code will be able to recover it
without any data loss.

>> $ attic verify repo::archive some/file
>>
>> would be replaced with:
>>
>> $ attic extract --dry-run repo::archive some/file
> 
> Will there be a way to get attic to test the consistency of all
> of the archives?

Yeah,

I was thinking that "attic check repo" should do these checks by default:

1. Verify repository consistency (This is what it does today)
2. Verify archive metadata.
3. Verify file chunks.

Steps 2 and 3 should be optional since they are more expensive and
require the repository encryption keys.

/ Jonas

Re: [attic] data files deleted when listing a corrupt repo

From:
Dan Christensen
Date:
2014-02-05 @ 19:11
Jonas Borgström <jonas@borgstrom.se> writes:

>> Does this explain the other behaviour I saw?  With a larger repository,
>> changing a single byte in one of the data files caused all of the data
>> files (about 15 of them) to be deleted when I tried to list the
>> repository.  Also, this deletion happened even if I changed a byte
>> in the middle of a data file, not at the beginning or end.
>
> The problem you observed should only occur if the last 9 bytes of the
> most recent segment file does not contain the following byte sequence:
>
> 40 f4 3c 25 09 00 00 00 02
>
> My guess is that you managed to change the last 9 bytes by mistake.
> Perhaps you used a text editor that added a new-line character at the
> end of the file?

Good guess, that was what happened.  Still, I was surprised to find that
if I have an attic repo with two archives in it and about 15 data files,
and I add a single byte to the most recent data file, all of the data
files get deleted.  Since there should be at least two commits, I
thought only part of the data would be lost.
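For what it's worth, it's easy to check whether a segment file still ends with the 9-byte sequence quoted above. A sketch (the marker bytes are copied from your message; the check itself is mine, not Attic's actual validation code):

```python
import os

# End-of-segment marker, byte values copied from Jonas's message.
COMMIT_MARKER = bytes([0x40, 0xF4, 0x3C, 0x25, 0x09, 0x00, 0x00, 0x00, 0x02])

def has_commit_marker(path):
    """Return True if the file ends with the expected 9-byte marker."""
    if os.path.getsize(path) < len(COMMIT_MARKER):
        return False
    with open(path, "rb") as f:
        f.seek(-len(COMMIT_MARKER), os.SEEK_END)
        return f.read() == COMMIT_MARKER
```

A file edited with a text editor that appends a trailing newline would fail this check, which matches what I saw.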

>> I just read about another similar backup program, called bup, which
>> provides the option of using par2 to store parity data in the repo
>
> I think that sounds like a layer violation. 

You may be right.  But in the real world, one often can't choose the
file system of the remote servers one has access to (or the python
version...)

I realize that a lot of my comments aren't realistic, but I'm hoping
that by throwing various ideas out there, attic can be made as resilient
and reliable as possible, without sacrificing its ease of use and speed.

My personal strategy will be to back up each system to at least two
separate locations with at least one of them using RAID.  But it's still
good to keep in mind ideas that will reduce the chance of failure.

Dan

Re: [attic] Re: Attic vs Obnam

From:
Dan Christensen
Date:
2014-02-05 @ 04:06
Jonas Borgström <jonas@borgstrom.se> writes:

> On 2014-02-01 15:19, Dan Christensen wrote:
>
>> Great!  I'm also curious how locking works.  I just tested simultaneous
>> writing, and saw that one writer was delayed until the other was done,
>> which is good.  Maybe you can say a few more words about this?  What
>> about read locks?
>
> Sure, a shared POSIX lock (lockf) is acquired as soon as a repository is
> accessed and later upgraded to an exclusive lock when a transaction is
> started.
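In Python terms, the scheme described above looks roughly like this (a sketch with illustrative names, not Attic's actual code):

```python
import fcntl
import os

class RepoLock:
    """Sketch of shared-then-exclusive POSIX locking on a repository."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_CREAT | os.O_RDWR)
        fcntl.lockf(self.fd, fcntl.LOCK_SH)  # shared lock on access

    def upgrade(self):
        # Exclusive lock when a transaction starts; blocks until
        # all other shared/exclusive holders are gone.
        fcntl.lockf(self.fd, fcntl.LOCK_EX)

    def release(self):
        fcntl.lockf(self.fd, fcntl.LOCK_UN)
        os.close(self.fd)
```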

I just tested, and sshfs doesn't fully support POSIX locks.  If two
processes on the local machine both try to lock a remote file via the
same sshfs connection, it seems to work.  But if they use separate
sshfs connections, or if a remote process directly locks the remote
file, multiple locks aren't prevented.

Rats.  I have two remote servers that I was hoping to use that might be
awkward to install attic on (neither has python >= 3.2), so I was
hoping to use sshfs.  I guess I could wrap my backup scripts in "ssh
remotehost lockfile backup.lock" and "ssh remotehost rm -f backup.lock",
but it would be nicer and safer if this wasn't needed.

Can you think of any ways to make this work better?  Could attic take
out two different types of locks instead of just one?  Or could attic
natively support sshfs or sftp, and do what is needed to make it safe?
(Obnam supports sftp internally and documents sshfs as an option, so 
I'm guessing they have figured out how to safely lock these.)

A final idea would be to write a minimal python program that could run
at the remote end and provide the very basic server features that attic
needs (file I/O, locking, etc).  Ideally, this would be something that
can run under python2.7 with no dependencies.  But maybe this is not
realistic.

Oh, one more idea:  has anyone tried to make a standalone attic
executable with something like cx_freeze?  Probably tricky, since the
systems are running very different versions of linux.

Dan

Re: [attic] Re: Attic vs Obnam

From:
Jonas Borgström
Date:
2014-02-05 @ 13:13
On 2014-02-05 05:06 , Dan Christensen wrote:
> Jonas Borgström <jonas@borgstrom.se> writes:
> 
>> On 2014-02-01 15:19, Dan Christensen wrote:
>>
>>> Great!  I'm also curious how locking works.  I just tested simultaneous
>>> writing, and saw that one writer was delayed until the other was done,
>>> which is good.  Maybe you can say a few more words about this?  What
>>> about read locks?
>>
>> Sure, a shared POSIX lock (lockf) is acquired as soon as a repository is
>> accessed and later upgraded to an exclusive lock when a transaction is
>> started.
> 
> I just tested, and sshfs doesn't fully support POSIX locks.  If two
> processes on the local machine both try to lock a remote file via the
> same sshfs connection, it seems to work.  But if they use separate
> sshfs connections, or if a remote process directly locks the remote
> file, multiple locks aren't prevented.
> 
> Rats.  I have two remote servers that I was hoping to use that might be
> awkward to install attic on (neither has python >= 3.2), so I was
> hoping to use sshfs.  I guess I could wrap my backup scripts in "ssh
> remotehost lockfile backup.lock" and "ssh remotehost rm -f backup.lock",
> but it would be nicer and safer if this wasn't needed.
> 
> Can you think of any ways to make this work better?  Could attic take
> out two different types of locks instead of just one?  Or could attic
> natively support sshfs or sftp, and do what is needed to make it safe?
> (Obnam supports sftp internally and documents sshfs as an option, so 
> I'm guessing they have figured out how to safely lock these.)

I've never looked at the Obnam source code but the only other option I
can think of right now is to create a $repository/lock file with
open(O_CREAT|O_EXCL) which might work if that's an atomic operation with
sshfs. One major drawback with this approach is that the file would have
to be manually removed to unlock a repository after a network failure or
program crash.
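Such a lock would amount to something like this (a sketch; error handling and lock-file contents kept minimal):

```python
import os

def try_lock(lock_path):
    """Atomically create a lock file; return True if we got the lock.
    O_CREAT|O_EXCL fails if the file already exists, and the
    test-and-create happens atomically on a local filesystem --
    whether sshfs preserves that atomicity is exactly the open
    question here."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, ("%d\n" % os.getpid()).encode())  # record the holder
    os.close(fd)
    return True

def unlock(lock_path):
    os.remove(lock_path)
```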

> A final idea would be to write a minimal python program that could run
> at the remote end and provide the very basic server features that attic
> needs (file I/O, locking, etc).  Ideally, this would be something that
> can run under python2.7 with no dependencies.  But maybe this is not
> realistic.

That seems like a lot of work. And would be a bit awkward since sshfs
looks like a regular local filesystem to Attic and it has no way of
knowing how and where to log in to run that program.

> Oh, one more idea:  has anyone tried to make a standalone attic
> executable with something like cx_freeze?  Probably tricky, since the
> systems are running very different versions of linux.

Yeah, that would be handy. Twitter's pex (part of twitter.common.python)
looks pretty cool.

$ pex -r 'Attic' -e attic.archiver:main -p attic

would create a binary executable of attic that only depends on the
system's version of python (and libcrypto). In theory at least, I haven't
played with it much.

/ Jonas

Re: [attic] Re: Attic vs Obnam

From:
Dan Christensen
Date:
2014-02-05 @ 15:34
Jonas Borgström <jonas@borgstrom.se> writes:

> I've never looked at the Obnam source code but the only other option I
> can think of right now is to create a $repository/lock file with
> open(O_CREAT|O_EXCL) which might work if that's an atomic operation with
> sshfs. One major drawback with this approach is that the file would have
> to be manually removed to unlock a repository after a network failure or
> program crash.

I don't know whether that's atomic with sshfs, but it's worth
investigating.  When attic is not able to get the lock, it could report
that, and say to rerun the command with an option like "--steal-lock".

>> A final idea would be to write a minimal python program that could run
>> at the remote end and provide the very basic server features that attic
>> needs (file I/O, locking, etc).  Ideally, this would be something that
>> can run under python2.7 with no dependencies.  But maybe this is not
>> realistic.
>
> That seems like a lot of work. And would be a bit awkward since sshfs
> looks like a regular local filesystem to Attic and it has no way of
> knowing how and where to log in to run that program.

Actually, I meant this as a way to avoid using sshfs.  The program 
would be like "attic serve", but pared down to the minimum required
to act as a server.  But I agree that it sounds like a lot of work.
In another year, it will become pretty standard to have python >= 3.2
on more machines.  Still, in an ideal world, it would be great to
be able to run attic just using an ssh connection to a remote machine
(no software installed there), and to still have the current speed
and security.  Maybe sftp support is the solution?  Not sure how
it handles locking.

>> Oh, one more idea:  has anyone tried to make a standalone attic
>> executable with something like cx_freeze?  Probably tricky, since the
>> systems are running very different versions of linux.
>
> Yeah, that would be handy. Twitter's pex (part of twitter.common.python)
> looks pretty cool.
>
> $ pex -r 'Attic' -e attic.archiver:main -p attic
>
> would create a binary executable of attic that only depends on the
> system's version of python (and libcrypto).

I couldn't find good documentation on this, but if it requires that the
machine already has the right python installed, then it won't really
solve the problem.  I believe that programs like cx_freeze bundle
python into the executable they produce.  Yes, I just tested it.
I made a trivial python3.2 program cxtest.py

  print('hello world')

and I ran "cxfreeze cxtest.py" (under python3.2), and it produced
a directory dist.  I had to manually copy three libraries to this
folder:
  cp /lib/x86_64-linux-gnu/libssl.so.1.0.0 dist
  cp /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 dist
  cp /lib/x86_64-linux-gnu/libc.so.6 dist
Then I copied that folder to a machine that only has python2.6, and
the executable ran correctly.

I think it's tricky to do this with a large project like attic,
but if I can't get python 3.2 installed on my servers, I may try.

Dan

Re: [attic] Re: Attic vs Obnam

From:
Jonas Borgström
Date:
2014-02-05 @ 16:24
On 2014-02-05 16:34 , Dan Christensen wrote:
> Jonas Borgström <jonas@borgstrom.se> writes:
> 
>> I've never looked at the Obnam source code but the only other option I
>> can think of right now is to create a $repository/lock file with
>> open(O_CREAT|O_EXCL) which might work if that's an atomic operation with
>> sshfs. One major drawback with this approach is that the file would have
>> to be manually removed to unlock a repository after a network failure or
>> program crash.
> 
> I don't know whether that's atomic with sshfs, but it's worth
> investigating.  When attic is not able to get the lock, it could report
> that, and say to rerun the command with an option like "--steal-lock".

Perhaps, but it would suck if you find that your automated nightly
backup hasn't been running for the last 6 months because of a stale lock
file.

>>> A final idea would be to write a minimal python program that could run
>>> at the remote end and provide the very basic server features that attic
>>> needs (file I/O, locking, etc).  Ideally, this would be something that
>>> can run under python2.7 with no dependencies.  But maybe this is not
>>> realistic.
>>
>> That seems like a lot of work. And would be a bit awkward since sshfs
>> looks like a regular local filesystem to Attic and it has no way of
>> knowing how and where to log in to run that program.
> 
> Actually, I meant this as a way to avoid using sshfs.  The program 
> would be like "attic serve", but pared down to the minimum required
> to act as a server.  But I agree that it sounds like a lot of work.
> In another year, it will become pretty standard to have python >= 3.2
> on more machines.  Still, in an ideal world, it would be great to
> be able to run attic just using an ssh connection to a remote machine
> (no software installed there), and to still have the current speed
> and security.  Maybe sftp support is the solution?  Not sure how
> it handles locking.

Another option would be to implement proper flock support in sshfs.

>>> Oh, one more idea:  has anyone tried to make a standalone attic
>>> executable with something like cx_freeze?  Probably tricky, since the
>>> systems are running very different versions of linux.
>>
>> Yeah, that would be handy. Twitter's pex (part of twitter.common.python)
>> looks pretty cool.
>>
>> $ pex -r 'Attic' -e attic.archiver:main -p attic
>>
>> would create a binary executable of attic that only depends on the
>> system's version of python (and libcrypto).
> 
> I couldn't find good documentation on this, but if it requires that the
> machine already has the right python installed, then it won't really
> solve the problem.  I believe that programs like cx_freeze bundle
> python into the executable they produce.  Yes, I just tested it.
> I made a trivial python3.2 program cxtest.py
> 
>   print('hello world')
> 
> and I ran "cxfreeze cxtest.py" (under python3.2), and it produced
> a directory dist.  I had to manually copy three libraries to this
> folder:
>   cp /lib/x86_64-linux-gnu/libssl.so.1.0.0 dist
>   cp /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 dist
>   cp /lib/x86_64-linux-gnu/libc.so.6 dist
> Then I copied that folder to a machine that only has python2.6, and
> the executable ran correctly.
> 
> I think it's tricky to do this with a large project like attic,
> but if I can't get python 3.2 installed on my servers, I may try.

Compiling python3 is fairly easy; that might be an option if you're not
root but are able to compile stuff.

/ Jonas

Re: [attic] Re: Attic vs Obnam

From:
Dan Christensen
Date:
2014-02-05 @ 19:19
Jonas Borgström <jonas@borgstrom.se> writes:

> On 2014-02-05 16:34 , Dan Christensen wrote:
>
>> I don't know whether that's atomic with sshfs, but it's worth
>> investigating.  When attic is not able to get the lock, it could report
>> that, and say to rerun the command with an option like "--steal-lock".
>
> Perhaps, but it would suck if you find that your automated nightly
> backup hasn't been running for the last 6 months because of a stale lock
> file.

Well, attic could update the timestamp on the lock every 5 minutes,
and automatically steal a lock older than an hour.  But I agree that
this complexity might not be worth it.
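Concretely, the holder could touch the lock file periodically, and other clients could treat an old mtime as stale. A sketch of the idea (intervals are the ones suggested above):

```python
import os
import time

REFRESH_INTERVAL = 5 * 60   # holder touches the lock every 5 minutes
MAX_LOCK_AGE = 60 * 60      # others may steal a lock older than an hour

def refresh_lock(lock_path):
    """Called periodically by the lock holder to prove liveness."""
    os.utime(lock_path, None)

def lock_is_stale(lock_path, max_age=MAX_LOCK_AGE):
    """True if the holder hasn't refreshed the lock recently."""
    return time.time() - os.path.getmtime(lock_path) > max_age
```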

> Another option would be to implement proper flock support in sshfs.

Yes, I wonder why they haven't done so.  It seems like they should be
able to forward the request to the other end, try the lock, and return
the result.

In the meantime, could you add something like the following to
quickstart.rst?

  However, be aware that sshfs doesn't fully implement POSIX locks, so
  you must be sure to not have two processes trying to access the same
  repository at the same time.

(Or you could drop the sshfs stuff, but I think it is a big selling
point that you don't need to install attic remotely.)

Does the pythonhosted site get updated automatically?

> Compiling python3 is fairly easy; that might be an option if you're not
> root but are able to compile stuff.

Yes, I could do so.  But it'll help attract users if attic requires as
little as possible to get running.

Dan

Re: [attic] Re: Attic vs Obnam

From:
Jonas Borgström
Date:
2014-02-06 @ 17:10
On 2014-02-05 20:19, Dan Christensen wrote:
> Jonas Borgström <jonas@borgstrom.se> writes:
> 
>> On 2014-02-05 16:34 , Dan Christensen wrote:
>>
>>> I don't know whether that's atomic with sshfs, but it's worth
>>> investigating.  When attic is not able to get the lock, it could report
>>> that, and say to rerun the command with an option like "--steal-lock".
>>
>> Perhaps, but it would suck if you find that your automated nightly
>> backup hasn't been running for the last 6 months because of a stale lock
>> file.
> 
> Well, attic could update the timestamp on the lock every 5 minutes,
>> and automatically steal a lock older than an hour.  But I agree that
> this complexity might not be worth it.

Yeah, that might work. I'll have to look into the details of how other
programs deal with their lock files.
> 
>> Another option would be to implement proper flock support in sshfs.
> 
> Yes, I wonder why they haven't done so.  It seems like they should be
> able to forward the request to the other end, try the lock, and return
> the result.

Yeah, but the FUSE API might not expose that information.

> 
> In the meantime, could you add something like the following to
> quickstart.rst?
> 
>   However, be aware that sshfs doesn't fully implement POSIX locks, so
>   you must be sure to not have two processes trying to access the same
>   repository at the same time.
> 
> (Or you could drop the sshfs stuff, but I think it is a big selling
> point that you don't need to install attic remotely.)

I'll do some sshfs testing to see how well it works. But it sounds
promising.

> 
> Does the pythonhosted site get updated automatically?

No, it's updated manually. That reminds me. It might be time to register
a domain name. Do you have any good ideas? All the obvious ones are
already taken...

>> Compiling python3 is fairly easy; that might be an option if you're not
>> root but are able to compile stuff.
> 
> Yes, I could do so.  But it'll help attract users if attic requires as
> little as possible to get running.

Yeah, that's true.

/ Jonas

Re: [attic] Re: Attic vs Obnam

From:
Dan Christensen
Date:
2014-02-06 @ 19:54
Jonas Borgström <jonas@borgstrom.se> writes:

> On 2014-02-05 20:19, Dan Christensen wrote:
> 
>> Well, attic could update the timestamp on the lock every 5 minutes,
>> and automatically steal a lock older than an hour.  But I agree that
>> this complexity might not be worth it.
>
> Yeah, that might work. I'll have to look into the details of how other
> programs deal with their lock files.

Here's an excerpt from the locking documentation from the Obnam ondisk
information at http://liw.fi/obnam/ondisk/

  Locking is done only for writes. Reads are always allowed, even while
  write locks exist. This allows race conditions between readers and
  writers, but thanks to copy-on-write updates those are no different
  from files getting corrupted or deleted on the server by hardware
  failures, and can be treated the same.
  
  ...
  
  Locks are implemented as files, which are created atomically. Each
  lock file contains the name of the host that holds it (which might not
  be a backup client), and the process id on that client, and the time
  of creating the lock. If the time is very old, another client may
  decide to break the lock. ...
  
  To reduce lock congestion, each client attempts to keep a lock for as
  short a time as possible. For per-client data, this means keeping the
  lock for the duration of the backup. For shared forests, updates can
  be spooled: the shared forest is used read-only until the end of the
  backup run, or until a checkpoint, and updated then, as quickly as
  possible.

Notice that there are no read locks.  If attic needs read locks, that
might be harder to do with lock files and still allow sharing.  Not
sure.

One idea would be to let the user pass --use-lockfile to request this
kind of locking, accepting that they might lose simultaneous read
access.

>>> Another option would be to implement proper flock support in sshfs.
>> 
>> Yes, I wonder why they haven't done so.  It seems like they should be
>> able to forward the request to the other end, try the lock, and return
>> the result.
>
> Yeah, but the FUSE API might not expose that information.

I now see that sshfs uses sftp as the underlying channel, and I would
bet that sftp doesn't support POSIX locks.  But it probably supports
file-based locks as described above, since Obnam cares a lot about
locking, and they support sftp.

Yes, according to this discussion, sshfs and sftp support O_CREAT|O_EXCL
locks since 2006, and darcs now falls back to this kind of lock when
it needs to:  http://bugs.darcs.net/issue904

Incidentally, native sftp support might not be too hard to add to attic.
I've read that the paramiko library is good for this, and it supports
O_EXCL, but unfortunately they are still working on python 3 support.

> That reminds me. It might be time to register a domain name. Do you
> have any good ideas? All the obvious ones are already taken...

These seem to be available:

atticbackup.org/com/net
attic-backup.org/com/net
battic.org/net

Dan