librelist archives

Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-03-31 @ 18:30
Hi all

I have spent the past two months poking and prodding at various
de-duplicating backup systems. To cut a very long story short, this has
ultimately resulted in a comparison of Attic, Bup, and Obnam.

I started out using Bup, and it's pretty good. I set up some servers to do
their regular backups with it, and that worked a treat. So I started a
backup on our main fileserver, and it took ten days to back up 7TB of
data... okay, that's a fairly long time, but maybe acceptable. Then I tried
extracting some of that data, and realised I had a serious problem: it took
*ages*!

Hence I went looking for alternatives. I found Obnam, and it seemed very
promising, but it is terribly slow. I found some other tools that, for
various reasons, did not fit what I was looking for.

And I found Attic.

I ended up doing performance tests of Attic vs Bup vs Obnam, possibly with
larger data sets than anyone else has tried (certainly larger than I have
read about). Along the way, I have run into some interesting issues, which I
will write to the list about separately, or log as bugs in some cases.

The outcome, to summarise the details below, is that Attic wins hands-down.
That's great! But it is also currently unusable as a universal solution for
me, for two reasons: 1) data corruption on large repositories, and 2) lack
of sparse and special file handling. I can live with #2 and get around it,
but #1 is critical.

Anyway, this email is purely about the results of my performance testing,
which people may or may not find interesting. Hopefully it will be of some
use to others looking for similar information - I did find a thread about
Attic vs Obnam, but the data set was much smaller.

So without more ado:


TEST SYSTEM 1:

  - Purpose: desktop machine
  - OS: Ubuntu 14.04 LTS (Trusty Tahr) 64-bit
  - Quad-core Intel i7-950 @ 3.07GHz (hyperthreaded)
  - 12GB RAM
  - Main OS on solid-state array (RAID 0)
  - Data storage on solid-state array (RAID 0)
  - Tests took place using both arrays

TEST SYSTEM 2:

  - Purpose: fileserver
  - OS: Debian 8 (Jessie) 64-bit
  - Quad-core Intel i7-3770 @ 3.40GHz (hyperthreaded)
  - 32GB RAM
  - Main OS on solid-state array (RAID 0)
  - Data storage on various arrays over a total of 24 spinning drives
  - Tests took place on a RAID 6 array with 12x 2TB drives (7200rpm Seagate
    Barracudas)

Note: Both systems ended up using the same Attic version, the latest code
from master (I've created my own Debian packages because the generally
available ones are out of date and buggy).

I should also mention that all timing figures are subject to some small
degree of inaccuracy - although the systems were kept as quiet as possible
whilst the tests were occurring, the conditions were not perfect.


FIRST TEST
==========

My first test was fairly small: backing up my home directory, which
contains 15GB of files. I timed the initial backup (for Bup I added the time
taken for `bup index`), and then ran a subsequent backup with no changes. I
then restored everything in the repository to the same disk, and then to a
different disk, to try to factor out drive speed effects.

Obnam was tricky. It has a fair few options and tuning parameters that
affect things. I ultimately found no difference trying different tuning
parameters, but I did find a huge difference turning on compression using
deflate (I know, it's not default! Boo!).

Of particular concern is that Obnam has a theoretical collision potential,
in that if a block has the same MD5 hash as another block, it will assume
they are the same. This behaviour is the default, but can be mitigated by
using the verify option. I tried with and without, and interestingly did not
notice any speed difference (2 seconds, which is statistically
insignificant) and also did not encounter any bad data on restoration. So I
don't know why it's off by default.
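The difference between trusting a hash match and checking it can be shown with a toy content-addressed chunk store. This is a hypothetical sketch, not Obnam's implementation: the `ChunkStore` class, the "trust"/"verify" mode names and the pluggable `hash_fn` are illustrative only. In "trust" mode a hash match is assumed to mean identical data; in "verify" mode the stored bytes are also compared, so a collision is caught and both chunks are kept.

```python
import hashlib


class ChunkStore:
    """Toy de-duplicating chunk store (hypothetical, not Obnam's code).

    mode="trust":  a hash match is assumed to mean identical data.
    mode="verify": on a hash match, the stored bytes are compared as well.
    """

    def __init__(self, mode="trust", hash_fn=lambda b: hashlib.md5(b).hexdigest()):
        if mode not in ("trust", "verify"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.hash_fn = hash_fn
        self.buckets = {}  # hash -> list of distinct chunks sharing that hash

    def put(self, data):
        """Store data, reusing an existing chunk where possible; return its id."""
        h = self.hash_fn(data)
        bucket = self.buckets.setdefault(h, [])
        if self.mode == "trust":
            if not bucket:
                bucket.append(data)
            return (h, 0)  # any earlier chunk with this hash is reused blindly
        for i, existing in enumerate(bucket):
            if existing == data:  # byte-for-byte comparison
                return (h, i)
        bucket.append(data)  # genuine collision: keep both chunks
        return (h, len(bucket) - 1)

    def get(self, chunk_id):
        h, i = chunk_id
        return self.buckets[h][i]


if __name__ == "__main__":
    weak = lambda data: data[:1]  # deliberately collision-prone, demo only
    for mode in ("trust", "verify"):
        store = ChunkStore(mode, weak)
        store.put(b"apple")
        cid = store.put(b"avocado")
        print(mode, store.get(cid))
```

With the weak hash, "trust" mode hands back `b'apple'` for the second chunk (silent corruption), while "verify" mode returns `b'avocado'`. The extra cost of "verify" is one comparison per hash hit, which is cheap when the stored chunk is local but may mean re-reading it from the repository otherwise.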

It's worth noting that Obnam with default settings was faster to back up
than Attic and Bup, but did not result in any space saving at all. From what
I can tell, it performed zero de-duplication of my data. From what I have
read, this seems to be because of the chunking method used, compared to the
rolling hash of Attic and Bup. Of course, subsequent backups will save
space, but Obnam with default settings is in my view pretty much useless.
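Why a rolling hash matters for de-duplication can be illustrated with a toy comparison of fixed-size chunking and content-defined chunking. This is a sketch under stated assumptions, not the chunker of any of the three tools: it uses a plain Rabin-Karp-style polynomial rolling hash with made-up parameters (`WINDOW`, `MASK`, `min_size`, `max_size`), whereas Attic and Bup each use their own rolling-hash functions. It inserts a single byte at the front of some random data and measures what fraction of the edited data's chunks already exist in the original.

```python
import random

# Toy parameters (assumptions, not taken from Attic or Bup).
WINDOW = 16                       # bytes covered by the rolling hash
MASK = (1 << 8) - 1               # cut when the low 8 bits are all ones (~1 in 256)
BASE = 257
MOD = 1 << 32
POW = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window


def fixed_chunks(data, size=256):
    """Split data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data, min_size=64, max_size=4096):
    """Content-defined chunking: cut where the rolling hash of the last
    WINDOW bytes matches MASK, so boundaries follow the content itself."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        length = i + 1 - start
        if length >= min_size and ((h & MASK) == MASK or length >= max_size):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old_chunks, new_chunks):
    """Fraction of new chunks already stored from the old version."""
    seen = set(old_chunks)
    return sum(chunk in seen for chunk in new_chunks) / len(new_chunks)


if __name__ == "__main__":
    rng = random.Random(42)
    original = bytes(rng.getrandbits(8) for _ in range(1 << 16))
    edited = b"X" + original  # insert a single byte at the front
    print("fixed-size:", shared_fraction(fixed_chunks(original), fixed_chunks(edited)))
    print("rolling   :", shared_fraction(cdc_chunks(original), cdc_chunks(edited)))
```

With fixed-size blocks the one-byte shift changes every block, so nothing is shared. With content-defined chunks the cut points depend only on the bytes inside the window, so they realign shortly after the edit and almost every chunk is reused.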

Still, here are the results of Obnam alone, under four different
configurations:


1.  Backup home (initial) (from different disk) (Obnam only)
------------------------------------------------------------

    Number of entities:   26,785
    Number of files:      24,838
    Total data set size:    15GB

               Default   Verify  Deflate  Ver+Def
    ---------------------------------------------
    Time         08:42    08:40    11:55    11:54
    No. files   38,088   38,090   38,094   38,093
    Size          15GB     15GB    4.5GB    4.5GB

I'll spare you the rest of the data I collected about Obnam vs Obnam,
because it's not all that relevant. However, restoration took over 17
minutes if deflate was not used when backing up, and only 4:30 with it
enabled.

All of the subsequent Obnam results for comparison use the deflate and
verify options.

Here are the results for Bup vs Obnam vs Attic:


2.  Backup home (initial) (from different disk)
-----------------------------------------------

    Number of entities:   26,785
    Number of files:      24,838
    Total data set size:    15GB

                    cp    Rsync      Bup    Obnam    Attic
    ------------------------------------------------------
    Time         06:05    06:55    09:43    11:54    10:24
    No. files                         26   38,093      764
    Size                           3.8GB    4.5GB    3.8GB

I have also shown the times taken by cp and rsync, for comparison. Bup and
Attic are pretty close in terms of time, and near enough identical on disk
space used, but Obnam lags behind a bit and notably uses an extraordinary
number of files! It also suffers because it is only benefitting from
compression and not de-duplication.


3.  Backup home (subsequent) (no changes)
-----------------------------------------

    Number of entities:        0 (none changed)
    Number of files:           0
    Total data set size:       0

                    cp    Rsync      Bup    Obnam    Attic
    ------------------------------------------------------
    Time             -    00:01    00:03    00:08    00:04
    No. files                         28   38,164      764
    Size                           3.8GB    4.5GB    3.8GB

Not much to say about this, just here for completeness. Obnam trails behind.


4.  Restore home (to same/different disk)
-----------------------------------------

                    cp    Rsync      Bup    Obnam    Attic
    ------------------------------------------------------
    Time (same)  07:25    08:25    04:10    04:28    02:34
    Time (diff)  07:11    08:13    04:03    04:13    02:03

Here we can see that although restoring to a different disk did help ever so
slightly, this operation is not really particularly disk-bound, as Attic
shows with a quite amazing result. The standard cp and rsync commands lag
behind because they have to read and write the entire 15GB, but even so,
Attic blazes ahead of both Obnam and Bup.

There was one problem restoring, however: Attic failed to restore a socket
file. Ouch! The other two restored the file just fine.
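Telling a socket (or FIFO, symlink or device node) apart from a regular file, before deciding how to archive it, can be sketched with Python's standard `os` and `stat` modules. This only shows detection; `file_kind` is a hypothetical helper, and the sketch says nothing about how Attic, Bup or Obnam actually handle these files.

```python
import os
import stat


def file_kind(path):
    """Classify a directory entry the way a backup tool must before archiving it."""
    mode = os.lstat(path).st_mode  # lstat: report symlinks themselves, don't follow them
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "char-device"
    if stat.S_ISBLK(mode):
        return "block-device"
    return "unknown"


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        print(path, file_kind(path))
```

A socket file carries no data of its own, so restoring one only means recreating the filesystem entry (for example by binding an `AF_UNIX` socket to the path). Skipping that step is what loses the file.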

Next I tried adding a sparse file, to see how each program would handle it -
I created a 5GB sparse file and filled the first 1GB with noise from
/dev/urandom:


5.  Backup home (subsequent) (sparse file)
------------------------------------------

    Number of entities:        1 (one added)
    Number of files:           1
    Total data set size:     1GB

                    cp    Rsync      Bup    Obnam    Attic
    ------------------------------------------------------
    Time             -    00:36    09:30    01:18    09:24
    No. files                         33   39,262      966
    Size                           4.8GB    5.5GB    4.8GB

Obnam managed a major win here because it recognises sparse files and
handles them efficiently. Restoration on that one file was a similar story:


6.  Restore home (sparse file only)
-----------------------------------

                    cp    Rsync      Bup    Obnam    Attic
    ------------------------------------------------------
    Time             -        -    00:47    01:03    01:21
    Size                           5.0GB    1.0GB    5.0GB

This result was a bit of a surprise, in that Attic trailed behind Bup, just
in this one test. Unfortunately, only Obnam passed the test, correctly
restoring the file sparsely - Bup and Attic did not. This would be a huge
problem for me in backing up virtual machine images.
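Restoring a file sparsely essentially means not writing runs of zeros: the writer seeks past them so the filesystem leaves holes, and then sets the final file length. The following is a minimal sketch under stated assumptions (a local source file rather than a backup repository, and a fixed 4096-byte block size; `write_sparse` is a hypothetical name), not how Obnam actually does it.

```python
import os


def write_sparse(src_path, dst_path, block_size=4096):
    """Copy src_path to dst_path, leaving a hole wherever a whole block is zeros.

    block_size is an assumption; real tools usually match the filesystem
    block size (e.g. os.statvfs(path).f_bsize).
    """
    zero_block = bytes(block_size)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(block_size)
            if not buf:
                break
            if buf == zero_block[:len(buf)]:
                # Skip forward instead of writing zeros; the gap becomes a hole.
                dst.seek(len(buf), os.SEEK_CUR)
            else:
                dst.write(buf)
        # Seeking past EOF does not extend the file by itself, so set the
        # final length explicitly to cover any trailing hole.
        dst.truncate(dst.tell())
```

The restored file reads back byte-for-byte identical to the original; only its allocated size differs, which is what comparing `du` with `du --apparent-size` reveals.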

So far, the results are mixed. Attic seems to win for general restoration
speed, but doesn't handle sparse files and actually misses some files
altogether (sockets). Obnam is the most accurate, but loses out on
everything else - speed, disk space, number of files used, you name it. Bup
has a few extra features compared to Attic, but has a rather nasty two-step
workflow (index, then save) and nasty .bup directories everywhere (although
Attic creates its own, without telling you!). Obnam, somehow, needs no
cache at all.


SECOND TEST
===========

My next set of tests was with some Big Data. Part of the objective of this
whole exercise has been to find a tool suitable for regularly backing up
critical data to a de-duplicated repository, which itself will live on an
external array with BTRFS snapshots (there's never too much paranoia when it
comes to backups!). So I tested each program with this.


7.  Backup fileserver (initial)
-------------------------------

    Number of entities:   5,120,641
    Number of files:      4,385,287
    Total data set size:      7.2TB

                         Bup          Obnam          Attic
    ------------------------------------------------------
    Time           256:00:00  DNF: 41:36:44+     100:58:47
    No. files         12,136  DNF:2,700,000+     1,149,668
    Backup size        5.7TB  DNF:    689GB+         5.6TB
    Cache size          44GB              -          9.7GB

I had already run Bup back when I thought it would be the tool of choice, so
this time is approximate: I recorded that it took 10.5 days, but did not
note the minutes and seconds, so I have rounded to whole hours.

Attic is more than twice as quick as Bup on this dataset. It does however
suffer from using a large number of files (not a critical concern) and I did
encounter some problems along the way (my first attempt failed and I had to
use the latest code from master branch). It's also notable how much smaller
Attic's cache is when compared to Bup.

At this point Obnam falls by the wayside. After almost two days it had only
scanned 1.3TB of data, and hence was set to take longer than Bup. It also
uses a *colossal* number of files. There were no errors, but I stopped the
test because it was apparent that it is not a contender. The values shown
are therefore where it had got to when I stopped it, not final values, so
they cannot be used for comparison.

The next test was to try and restore some files. I had a particular
directory of around 6GB which I had tried to restore with Bup, and had been
horrified by the time taken (in my view, it is extremely important that
restorations should be as quick as possible). Time to try Attic:


8.  Restore fileserver (specific folder)
----------------------------------------

    Number of entities:        4198
    Number of files:           2933
    Total data set size:      6.2GB

                         Bup          Obnam          Attic
    ------------------------------------------------------
    Time            03:34:51              -          05:58

Wow. That's right - just under six *minutes*! That's impressive. What's even
more impressive is that it restored 99% of that data in less than one
minute, and presumably spent the other five checking stuff.

On large data sets, measured in terabytes, I cannot use Bup: restoration is
too slow to be usable, even if I were willing to put up with the slower
backup speed. I cannot even think about using Obnam. Attic wins, hands
down.
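The fileserver figures from tests 7 and 8 can be turned into average throughputs with a little arithmetic. This sketch assumes decimal units (1TB = 10^12 bytes, 1GB = 10^9 bytes) and takes the times straight from the tables; the Bup backup time is itself approximate, as noted above. `to_seconds` and `rate_mb_s` are illustrative helper names.

```python
def to_seconds(stamp):
    """Convert "H:MM:SS" or "MM:SS" (hours may exceed 24) to seconds."""
    total = 0
    for part in stamp.split(":"):
        total = total * 60 + int(part)
    return total


def rate_mb_s(n_bytes, stamp):
    """Average throughput in decimal megabytes per second."""
    return n_bytes / to_seconds(stamp) / 1e6


# Figures copied from tests 7 and 8; decimal TB/GB are an assumption.
BACKUP_BYTES = 7.2e12
RESTORE_BYTES = 6.2e9
backup = {"Bup": "256:00:00", "Attic": "100:58:47"}
restore = {"Bup": "03:34:51", "Attic": "05:58"}

if __name__ == "__main__":
    for label, size, times in (("backup", BACKUP_BYTES, backup),
                               ("restore", RESTORE_BYTES, restore)):
        for tool, stamp in times.items():
            print(f"{label:7} {tool:5} {rate_mb_s(size, stamp):7.2f} MB/s")
        speedup = to_seconds(times["Bup"]) / to_seconds(times["Attic"])
        print(f"{label:7} Attic/Bup speedup: {speedup:.1f}x")
```

That works out at roughly 7.8 MB/s (Bup) versus 19.8 MB/s (Attic) for the initial backup, about 2.5x, and roughly 0.48 MB/s versus 17.3 MB/s for the folder restore, about 36x.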


9.  Restore fileserver (everything, after sync)
-----------------------------------------------

The next step was meant to be doing a sync (i.e. running another backup to
pick up any changed files) and then restoring the entire repository and
running a diff on the result, to satisfy myself that everything works
correctly. Unfortunately, Attic failed this test. After performing the
second backup, I encountered errors (data corruption reported) during the
restoration. Running cleanup did not help. Deleting the backup fixed the
issue, but then when I ran another backup the problem returned. So I cannot
currently benchmark this scenario or verify data integrity.


COMMANDS USED
=============

For those who are interested, here is a list of commands used with the
various programs:

cp
    time cp /data/testdata . -aR

rsync
    time rsync -aAX /data/testdata/ rsync/

Bup
    mkdir bup
    BUP_DIR=$(pwd)/bup bup init
    BUP_DIR=$(pwd)/bup time bup index /data/testdata
    BUP_DIR=$(pwd)/bup time bup save -n test /data/testdata

    BUP_DIR=$(pwd)/bup time bup restore -C restored.bup test/latest

Obnam
    time obnam backup -r obnam /data/testdata
    time obnam backup --deduplicate=verify -r obnam.v /data/testdata
    time obnam backup --compress-with=deflate -r obnam.c /data/testdata
    time obnam backup --compress-with=deflate --deduplicate=verify \
        -r obnam.cv /data/testdata
    time obnam backup --compress-with=deflate --deduplicate=verify \
        --lru-size=1024 --upload-queue-size=8192 -r obnam.cvt /data/testdata

    time obnam restore -r obnam --to restored.obnam

Attic
    HOME=$(pwd)/attic.home attic init attic
    HOME=$(pwd)/attic.home time attic create attic::First /data/testdata

    mkdir restored.attic; cd restored.attic
    HOME=$(pwd)/../attic.home time attic extract ../attic::Second

Sparse file
    dd if=/dev/urandom of=sparse bs=1G count=1
    truncate -s 5G sparse
    du -sh sparse
    du -sh --apparent-size sparse
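The `du` comparison above can also be done programmatically. This is a sketch assuming Linux (where `st_blocks` is counted in 512-byte units and `os.SEEK_DATA`/`os.SEEK_HOLE` are available); `sparse_report` is a hypothetical helper, not part of any of the tools tested.

```python
import errno
import os


def sparse_report(path):
    """Return (apparent_size, allocated_bytes, data_extents) for a file.

    Similar in spirit to comparing `du --apparent-size` with plain `du`.
    data_extents lists (start, end) byte ranges that hold data, found with
    SEEK_DATA/SEEK_HOLE; filesystems without hole support report the whole
    file as a single extent.
    """
    st = os.stat(path)
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while offset < st.st_size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a trailing hole remains
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return st.st_size, st.st_blocks * 512, extents


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(p, sparse_report(p))
```

For the test file above, a sparse-aware copy should report a single data extent covering roughly the first 1GB and an allocated size far below the 5GB apparent size.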


Overall I am very impressed with Attic, and I think with a couple of bugs
fixed and perhaps some minor features added it will be the best tool of this
kind around.

Thanks, Jonas - and more recently, Thomas, too - for your hard work on
Attic!

Cheers

Dan


Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
SanskritFritz
Date:
2015-04-06 @ 17:46
On Tue, Mar 31, 2015 at 8:30 PM, Dan Williams <dan@dotfive.co.uk> wrote:

>
> COMMANDS USED
> =============
>
> For those that are interested, here is a list of commands used with the
> various programs:
>
> Obnam
>     time obnam backup -r obnam /data/testdata
>     time obnam backup --deduplicate=verify -r obnam.v /data/testdata
>     time obnam backup --compress-with=deflate -r obnam.c /data/testdata
>     time obnam backup --compress-with=deflate --deduplicate=verify \
>         -r obnam.cv /data/testdata
>     time obnam backup --compress-with=deflate --deduplicate=verify \
>         --lru-size=1024 --upload-queue-size=8192 -r obnam.cvt /data/testdata
>
>     time obnam restore -r obnam --to restored.obnam
>

Just one observation about Obnam usage. The --deduplicate switch causes
Obnam to verify hash collisions (very unlikely) by downloading the chunk
again and comparing it by contents. This may cause a huge speed reduction.
I think this could be the cause of the speed difference when you compared
subsequent backups. What do you think?

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-04-07 @ 14:43
I was expecting that to be the case – but in fact, there was no difference
at all (on my data) when using the --verify option. Strange huh?

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Thomas Waldmann
Date:
2015-04-01 @ 12:25
Hi Dan,

very interesting tests you did and issues you found.

I'm glad you did tests with large (like some terabytes) volumes, because
this is something I cannot easily do (I just don't have that much data,
nor that much storage space here).

If you find something in attic (original, master branch), you could also
check the "merge" repo, branches "merge" and "merge-all" (not the "master"
branch there, it is the same as the original master branch).

"merge" branch is rather conservative, there should be no breakage or
incompatibilities there, just fixes and improvements. If you look into the
main issue tracker, I usually add a note like "fixed, see PR #XXX" at the
end of quite a lot of issues I have fixed recently (but as long as the pull
request is not merged, they are not fixed in master - that's why I created
that "merge" branch, to have all the fixes in there).

"merge-all" branch has a lot more improvements (like way faster
compression, faster crypto, faster hashes, ...), but you should be careful
as there might be compatibility issues (esp. if mixing into same repo as
original attic, if backing up with merge-all code and trying to restore
with original code or even later versions of merge-all code). So, in short,
don't use "merge-all" code right now for production backups.

Why I am pointing this out is not just to get my changes tested, but also:
- so we avoid double work (and not hunt stuff that is already fixed, but
not merged into master yet)
- you get better error reporting (exception handling in 0.14/master is
problematic and not giving enough information - especially if remote
repositories come into play)

About issue reporting:

If you find something that happens (also) in original attic / master
branch, please file it into jborg/attic issue tracker.
If you find something that happens only(!) in merge/merge-all, please file
it into attic/merge repo's issue tracker.

In general, please file issues as separate entries in the issue tracker -
they are just easier to track (and fix and close) there compared to the
mailing list.

Thanks,

Thomas

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Christensen
Date:
2015-04-01 @ 13:16
[Different Dan here.]

Thomas Waldmann <thomas.j.waldmann@gmail.com> writes:

> "merge-all" branch has a lot more improvements (like way faster
> compression, faster crypto, faster hashes, ...), but you should be
> careful as there might be compatibility issues (esp. if mixing into
> same repo as original attic, if backing up with merge-all code and
> trying to restore with original code or even later versions of
> merge-all code). So, in short, don't use "merge-all" code right now
> for production backups.

I'd love to see Dan Williams repeat some of his benchmarks using
the faster code from the merge-all branch!

Dan

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-04-01 @ 13:35
:o) I will happily do that if Thomas thinks the code is ready...?

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-04-01 @ 13:08
Hi Thomas

I saw your email recently about the branches you are working on, but I have
been meaning to ask a question actually: what is the official position of
the Attic project on these branches? What I mean is, I know that Jonas has
less time at present (due to new baby etc.) but I want to know if his
support is behind what you are doing, or if it's just a hopeful endeavour.
The reason I ask is that I want to avoid using a different version of Attic
if Jonas comes back and goes in a different direction, or if there's a
community/project split, etc. So it seems sensible to ask the question!
Ideally he is supportive of your input, and these changes will become a part
of the official master branch before long...?

I did look at the summary of changes in the merge and merge-all branches
recently, but they did not seem to address any of my direct issues (although
I would like to benefit from the speed increases etc.). I did not want to
attempt tests on non-standard code - how robust is your work? I know nothing
about the Attic community or team other than what I have read on the mailing
list - other than Jonas, and recently you, I don't know if there are any
other core team members that contribute. Should I see your direction as safe
for me to use? Etc. E.g. merge-all being untrusted makes me wonder about
what automated tests are in place etc. A few words about this and how your
direction fits with official Attic and the verification procedures would be
great.

This is also part of the reason I have been reluctant to file anything
against the tracker before asking the mailing list - for all I know some of
what I am commenting on may be fixed or resolved in your branches, or may be
covered in a bug report I have not found. As the project is a little split
at present I figured it's best to raise these things here and then put them
onto the correct tracker if appropriate.

I'm also happy to use the facilities I have at my disposal to run
large-repository tests if needed - they will take a while, but if I can set
something in progress and collect results then I am very willing!

Do you think your Git repository will become the official Attic location? It
seems to have been created with this in mind, i.e. to be an
Attic-organisation project.

Cheers

Dan

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Wayne Scott
Date:
2015-04-01 @ 13:17
This thread will give you a pretty good summary of where Jonas stands on
all the input on his project at the moment.
https://github.com/jborg/attic/issues/217

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Thomas Waldmann
Date:
2015-04-01 @ 14:16
> This thread will give you a pretty good summary of where Jonas stands on
> all the input on his project at the moment.
>
> https://github.com/jborg/attic/issues/217

Exactly.

It's not quite how I would have liked it to be (and that's not just about
getting "my" stuff in, but also about the bus-factor consideration for
projects in general, especially if the software is not just some arbitrary
software, but backup software, where trust in maintenance and the project's
future is more important than for other projects).

Some people I asked for a 2nd opinion about issue 217 asked me why I don't
just fork the project right now.

My current POV on that is that I would not like to fork the project right
now just because Jonas maybe had sleepless nights with their babies or
other stress before writing that post, but rather I'd join a bus-factor++
attic github organisation together with him and all other contributors in
the near future (== in the next months).

He wrote an excellent piece of code with attic, so it would be a pity to
fork and lose him (and also make things more difficult for users, who would
then have even more backup options to test and compare...).

Well, and in the (IMHO sad) case that Jonas would not want that kind of
collaborative project, there is still the option of forking.

>> if it’s just a hopeful endeavour.

It currently is that.

I know Jonas has little time, so I am trying to make it easy for him to
accept my stuff: there are PRs against master for all the "just fixes" and
"just simple improvements" and also a "merge" branch that even did the work
of merging all that stuff, so he does not need to merge each single PR by
himself, but could also just pull from "merge" after having a look at the
stuff. The more critical (but still IMHO good) stuff is in "merge-all".

But as I personally want to use the improved code, I guess I won't throw it
away and wait an indefinite time for Jonas to re-do it all by himself, but
rather fork.


>> The reason I ask is that I want to avoid using a different version of
>> Attic if Jonas comes back and goes in a different direction, or if there’s
>> a community/project split, etc. So it seems sensible to ask the question!

Sure. Future of attic depends on Jonas. Future of that code in general
doesn't. The power of FOSS. :)

>> Ideally he is supportive of your input, and these changes will become a
>> part of the official master branch before long…?

I hope it will be so, but there is quite some doubt after ticket 217.


>> I did not want to attempt tests on non-standard code – how robust is your
>> work?

Well, as long as you do TESTs only, I don't see any risk there. The worst
thing that could happen is that you waste some time with some code that you
won't use later.

But maybe you enjoy the recent fixes and improvements and your tests
increase your trust...


>> I know nothing about the Attic community or team other than what I have
>> read on the mailing list – other than Jonas, and recently you, I don’t know
>> if there are any other core team members that contribute.

You can have a look at github repo history, pull requests (open and closed)
and tickets to learn about (potential) contributors.


>> Should I see your direction as safe for me to use?

Don't use "merge-all" for production.

"merge" is hopefully safe, but more testers would be helpful.


>> Etc. E.g. merge-all being untrusted makes me wonder about what automated
>> tests are in place etc.

I wrote some tests and adapted some existing tests. I can't say much about
coverage; AFAIK it is currently not measured.

In general, I noticed that if one makes a mistake somewhere, the
MACs tend to blow up with "integrity error". :)

>> A few words about this and how your direction fits with official Attic and
>> the verification procedures would be great.

I run "tox" to test on all pythons before committing/pushing.

I also do manual tests (but with a rather small data set compared to yours).

>> This is also part of the reason I have been reluctant to file anything
>> against the tracker before asking the mailing list – for all I know some of
>> what I am commenting on may be fixed or resolved in your branches,

>
Don't let that hold you back. The jborg/attic tracker is about THAT repo,
so you don't need to consider the other repo.

And if I see some new issue there, I might either add a comment saying
"fixed by <URL>" or just start to debug and fix it.

>> or may be covered in a bug report I have not found.

If you do a quick search on the issue tracker to avoid duplicates,
nobody will mind if you accidentally file a duplicate.
Sometimes duplicates are even useful when they add additional
information or just confirmation.

>> Do you think your Git repository will become the official Attic location? It
>> seems created with this in mind, i.e. to be an Attic-organisation project.

The "attic" name was taken by some inactive github user, but luckily github
has some "kill dusty unused accounts" policy.

So I triggered that to get it for us all, to have some common space to
collaborate and to increase bus factor.

I can't say, of course, whether Jonas will join us later; he was the
first who got an invitation there, and it was intended that he would be
one of the organisation administrators.

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-04-01 @ 13:47
Thanks!!! I did not know that discussion existed. I have read every
conversation on the mailing list (there are not all that many) and quite a
few issues, but I hadn't found that.

 

To me it is good and bad news. Good because I now have more of an
understanding of Thomas's background and ability, so I feel more comfortable
with considering the merge and merge-all branches. Bad because it is as I
feared - Jonas is very protective of his (code) baby and understandably
reluctant to give away too much control.

 

I write a lot of open-source software myself (although unfortunately none in
Python, and I do very little Python at all) so I can completely understand
Jonas's position on this. However, it *is* open-source, and therefore anyone
is able to take it in a different direction. That's what happens when you
release to the world, and unfortunately it sometimes leads to a
project/community split.

 

It seems that there are a few interested and able contributors to Attic, so
I hope things can be resolved with the tool and community intact. Otherwise
I guess time will go by and in a few months split-Attic will not resemble
original-Attic too much at all, meaning a lot of PRs to work through and
review, and it sounds like Jonas would prefer to simply go in a different
direction.

 

So I'm thinking "uh-oh" - but ultimately I want to use whichever tool
delivers first, because I want to use Attic now, and get issues solved now.

 

 

> This thread will give you a pretty good summary of where Jonas stands on all
> the input on his project at the moment.
>
> https://github.com/jborg/attic/issues/217

 

Sv: [attic] Comparison of Attic vs Bup vs Obnam

From:
Petter Gunnerud
Date:
2015-04-01 @ 10:17
Hi Dan

Thanks for your testing. It's been interesting reading. Just like you, I
started out using obnam, realized restore was too slow, and switched to
attic. For me attic does slower backups (due to large sparse files), but
restores a lot faster from the 5th backup. Also attic consumes more
storage than obnam for the initial backup, but it grows more slowly with
subsequent backups (probably due to the handling of sparse files). My backup
files total about 1TB of data. Counting the sparse files at full size it would
be about 2.5TB.

My data is mainly vmware esxi snapshots of the virtual disk files of various
windows servers, including exchange and mssql. Daily snapshots are
copied from esxi to a local gentoo server providing an nfs datastore. The
gentoo server runs attic over ssh to copy changes to an offsite backup server.
(The gentoo server is actually a vm on the esxi host.)

I haven't run into issues using attic yet, except that backup of sparse 
files takes a while. Your test is a reminder that I should do some restore
testing again.

PG
   

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-04-01 @ 10:47
Hi Peter

 

That's very interesting. I did not perform any tests of data sets of around
1TB - only 15GB and then 7TB. I find it surprising that Obnam produces a
*smaller* backup repository than Attic. I'm scratching my head about that!
What options were you using with Obnam? Did you alter chunk size or do any
tuning? In theory, Obnam should miss a load of matching areas due to using
block/chunk comparison instead of a rolling checksum. But maybe this has
something to do with the sparse files.

 

I agree that the poor performance you are seeing is likely due to the
handling of sparse files as ordinary files. If Attic can correct this, it
would be awesome.
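Handling a sparse file as an ordinary file means the backup reads (and chunks) every zero inside its holes. On Linux, a tool can instead ask the filesystem where the data actually is, using `lseek` with `SEEK_DATA`/`SEEK_HOLE`. The sketch below only illustrates that idea under those assumptions; it is not something Attic 0.13/0.14 does, `data_extents` is a hypothetical helper, and on filesystems without hole support the whole file is simply reported as data.

```python
import errno
import os
import tempfile


def data_extents(path):
    """Return (offset, length) pairs for the data regions of path, skipping holes.

    Relies on os.SEEK_DATA / os.SEEK_HOLE (available e.g. on Linux). Filesystems
    that do not track holes report the whole file as a single data region.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # no data after offset: the rest is a hole
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)  # end of file counts as a hole
            extents.append((start, end - start))
            offset = end
    finally:
        os.close(fd)
    return extents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "disk.img")
        with open(path, "wb") as f:
            f.write(b"x" * 4096)        # data at the start
            f.seek(8 * 1024 * 1024)     # leave an 8 MiB gap (a hole where supported)
            f.write(b"y" * 4096)        # data at the end
        total = sum(length for _, length in data_extents(path))
        print(f"apparent size {os.path.getsize(path)}, data regions {total} bytes")
```

A chunker fed only these extents would skip the zero-filled regions entirely, which is where most of the time goes for mostly-empty VM disk images.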

 

To me it's critical that I can trust the backup system to correctly store
and restore everything, hence my rigorous testing! I suspect that my
problems are somehow related to the size of my data set, so I would guess
you are safe, but it's always good to check.

 

Cheers

 

Dan

 

 


Sv: [attic] Comparison of Attic vs Bup vs Obnam

From:
Petter Gunnerud
Date:
2015-04-01 @ 15:26
I'm quite sure the smaller initial obnam repo was because of its handling
of sparse files. As I mentioned, my files are virtual disk files. That's
typically three 3KB text files, one 50GB sparse file (25GB on disk) and
one 300GB sparse file (120GB on disk) for each vm. Basically my backup set is
sparse files only.
I ran obnam with default parameters. I tried playing with a few
parameters, but it didn't make much difference. (Obnam accessed the repo
using nfs over ipsec.)

The issues I ran into with obnam were:
1) Drops in the network connection to the nfs share frequently caused obnam to
hang. (A better approach would be to resume, or quit with an error.)
2) The repo grows too much, typically 15GB/day. (Attic grows just
3-4GB/day.) (Keep in mind I'm using real data. Daily changes in the source data
depend on humans' inconsistent activities.)
3) Restore time increases with the number of backups in the repo. After 70
backups in the repo, restoring a 20GB sparse file (8GB on disk) took 11
hours! I concluded that reinstalling the vm from scratch would be faster
than restoring from backup, and ditched offsite backup of vm snapshots.

The downside of attic is that it takes more than double the time of obnam
to run. A daily backup takes 12-15 hours. However, its network load is
far lower, so it doesn't really matter that it's running while people are at
work.



Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-04-01 @ 16:27
It’s a good point; I tested with an “artificial” sparse file that had the
first 1GB used. But the point of sparse files is that they grow as used…
so even with a “real” one I would have expected the behaviour to be the
same (i.e. Attic would process the whole file, but after the data portion
the file just contains zeroes). Your point #2 is exactly what I
would expect in comparison, because Attic would recognise more of the data
as being the same (due to the rolling hash; Bup would be the same in this
regard), whereas Obnam has to match whole blocks and so would miss a lot.
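The rolling-hash versus whole-block difference can be seen in a small experiment. This is a toy sketch, assuming a simple polynomial rolling hash over a 16-byte window and cut points that give chunks on the order of 64 bytes; Attic's and Bup's real chunkers use different hash functions and much larger chunks, and the names `fixed_chunks`, `cdc_chunks` and `reuse` are made up for illustration. Inserting one byte at the front of a file shifts every fixed-size block, so almost nothing deduplicates, while content-defined boundaries re-synchronise after the first chunk.

```python
import random

WINDOW = 16              # bytes covered by the rolling hash (illustrative value)
MASK = (1 << 6) - 1      # cut where the low 6 bits are zero: chunks on the order of 64 bytes
BASE, MOD = 257, (1 << 31) - 1


def fixed_chunks(data, size=64):
    """Split data into fixed-size blocks, as a block-based deduplicator would."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data):
    """Content-defined chunking: cut where the hash of the last WINDOW bytes matches MASK."""
    chunks, start, h = [], 0, 0
    out_weight = pow(BASE, WINDOW - 1, MOD)   # weight of the byte about to leave the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * out_weight) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                          # add the newest byte
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def reuse(old_chunks, new_chunks):
    """Fraction of new chunks already present in the old backup (i.e. deduplicated)."""
    seen = set(old_chunks)
    return sum(chunk in seen for chunk in new_chunks) / len(new_chunks)


if __name__ == "__main__":
    rng = random.Random(42)
    original = bytes(rng.randrange(256) for _ in range(20000))
    edited = b"!" + original          # insert a single byte at the front
    print("fixed-size blocks reused:", reuse(fixed_chunks(original), fixed_chunks(edited)))
    print("content-defined chunks reused:", reuse(cdc_chunks(original), cdc_chunks(edited)))
```

Because a cut point depends only on the bytes inside the window, an insertion disturbs only the chunks around it, which is the property that lets rolling-hash tools find shared data in disk images whose contents shift between generations.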

 

Point #3 is one of the critical aspects to me: “how long will Critical
Thing be offline”. An ideal backup tool should be able to saturate I/O and
get close to (or better than) the speed of cp/rsync. Attic satisfies me in
that regard, although I have not had the opportunity to test it over many
generations of real data yet.

 

I do wonder why Attic takes so long to run subsequent backups. If rsync
can do it more quickly without using a cache, then something is wrong in
my opinion. Obnam also doesn’t use a local cache, so either the local
Attic cache should be ditched (it is a little annoying, though far less so
than with Bup) or it should present a tangible benefit. It’s an area that
I am very curious about. I guess the answer relies heavily upon the
chunker stuff that Thomas has been talking about – maybe his work there
would speed things up! (Although… I would still expect it to only apply to
changed/new files; unchanged files should surely be as quick as rsync?)
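For comparison, rsync's default quick check skips a file whose size and modification time are unchanged, without reading it. A backup tool can do the same against a local record of the previous run. The sketch below assumes a hypothetical `FilesCache` keyed by path that stores inode, size and mtime plus the chunk ids; Attic does keep a files cache in its local cache, but this is not its code, and hashing fixed 4 KiB blocks with SHA-256 stands in for real chunking.

```python
import hashlib
import os
import tempfile


def stat_key(path):
    """What a 'quick check' compares: inode, size and modification time."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns)


class FilesCache:
    """Hypothetical local cache: path -> (stat key, chunk ids) from the last run."""

    def __init__(self):
        self.entries = {}

    def lookup(self, path):
        entry = self.entries.get(path)
        if entry is not None and entry[0] == stat_key(path):
            return entry[1]          # unchanged since last run: reuse the chunk list
        return None

    def remember(self, path, chunk_ids):
        self.entries[path] = (stat_key(path), chunk_ids)


def backup_file(path, cache, block_size=4096):
    """Return (chunk_ids, was_read). Files that pass the quick check are never opened."""
    cached = cache.lookup(path)
    if cached is not None:
        return cached, False
    chunk_ids = []
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            chunk_ids.append(hashlib.sha256(block).hexdigest())
    cache.remember(path, chunk_ids)
    return chunk_ids, True


if __name__ == "__main__":
    cache = FilesCache()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(10000))
        print(backup_file(path, cache)[1])   # first run reads the file
        print(backup_file(path, cache)[1])   # second run skips it
```

A cache like this probably helps little with daily VM snapshot copies: every image gets a new mtime each day, fails the quick check, and has to be read and chunked in full.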

 

 

 


Sv: [attic] Comparison of Attic vs Bup vs Obnam

From:
Petter Gunnerud
Date:
2015-04-02 @ 23:17
Here are some numbers from my repo check / restore test, mainly sparse files.
I'm running attic 0.13.

First some facts about my setup:
The repo mainly consists of vmware disk files, i.e. sparse files. Backup is
performed like this:
1) esxi creates a snapshot
2) the snapshot files are copied to an onsite gentoo server
3) esxi deletes the snapshot
4) the onsite gentoo server starts an attic backup to the offsite gentoo server over ssh.
The repo is encrypted.
Total size of source data is about 1TB on disk. Without the use of sparse
files it would consume about 2.5TB of disk.
This backup has now run 70 times. No backup set has ever been deleted from
the repo. Repo size on disk is now 248GB.

Before I started using attic, I used obnam. After 60 runs I stopped using
obnam, mainly because of slow restore. I think it took about 12 hours to
restore an 8GB file. Obnam performed a backup in less than half the time
attic does. I think bandwidth was the bottleneck for obnam backup. Obnam
generated about 15GB of network traffic (nfs over ipsec) for each run. Attic
generates about 3-4GB of network traffic for each run.
The obnam repo was quite small after the initial backup compared to the attic
repo after its initial backup. I don't remember exactly, but I think the attic
repo was about 150GB on disk, while obnam's was about 110GB.

After 60 runs the obnam repo was large. I don't remember how large, but I had
to delete some backups to free disk space to start testing attic. Out of the 60
backups, I kept 14. After deleting, the obnam repo came down to 324GB.
To compare, attic has 70 runs, none deleted. Repo is 248GB.

So, to the subject of restore. I'm testing restore locally on the server
where the repo lives.
Obnam restore, as mentioned, managed to restore 8GB in 12 hours. That is
660MB/hour. I also started a restore of a 20GB file which seemed to run at
about the same speed, but there was a power outage before it managed to
finish. I didn't bother to restart it.

To test attic I first ran attic check - no errors found. It took nearly 6
hours to complete. Then I did a restore from the latest backup. I picked a
single sparse file: 28GB on disk, 80GB total. I kept an eye on the file as
it grew on disk. After 10 minutes: 6.6GB. After 20 minutes: 14GB. After 30
minutes: 21GB. After 40 minutes: 30GB. After 57 minutes: 80GB and finished!
That makes 750MB/minute for the first part, and 2940MB/minute for the last
part.
CPU usage is 30 to 40% during restore. Memory usage is 10% (according
to top). The gentoo vm where the repo lives, and where I'm testing
restore, has 4 cores and 4GB ram. The esxi host running the vm has an amd
fx-6300 cpu. The repo lives on a vmdk located on a single 4TB sata seagate
drive. I do restore tests to a thin provisioned vmdk located on a 500GB hgst
laptop sata drive.

Next test is a restore of a 100GB sparse file containing 65GB of data. The
restored file reached 65GB after 82 minutes. That makes 800MB/minute for the
data part.
77GB after 87 minutes, and 101GB after 94 minutes. That makes 3GB/minute
for the zero-fill part.
Restore finished after 96 minutes; the restored file is still 101GB.

What there is to learn:
- The restore process accelerates when it reaches the zero-fill of the no
longer sparse file.
- Better sparse file handling would probably make attic back up at the same
speed as obnam on this dataset.
- Attic dedup of sparse files works well. It just takes time.
- Dedup of vmdk files containing mostly windows servers works very well. It
would never have managed to compress the data by 90% (initial backup) if it
didn't recognize that a windows system is mostly the same data pattern on every
vm. Loads of this data is already compressed at the source (.cab files).
- Attic dedup also works great between generations. Each new generation
creates just 20-30% of the network traffic compared to obnam.
- Attic restore on a backup set of 60/70 generations is 72 times faster than
obnam restore!
Keep in mind I'm using real data. Changes between generations depend on
humans' inconsistent activities.
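The throughput figures above can be rechecked from the reported observations. This sketch uses only the numbers in the message and assumes 1 GB = 1000 MB, which is what the 750MB/minute figure implies (30GB in 40 minutes); `mb_per_minute` is just a helper name chosen here. With the unrounded values the data-part ratio comes out at about 71x; the 72x above comes from rounding to 800MB/minute and 660MB/hour (800 / 11 is about 72.7).

```python
def mb_per_minute(gb_from, gb_to, min_from, min_to):
    """Average write rate between two observations, taking 1 GB as 1000 MB."""
    return (gb_to - gb_from) * 1000 / (min_to - min_from)


# Observations reported above: GB written so far at a given minute.
test1_data = mb_per_minute(0, 30, 0, 40)      # 80GB file, data part
test1_zero = mb_per_minute(30, 80, 40, 57)    # 80GB file, zero-fill part
test2_data = mb_per_minute(0, 65, 0, 82)      # 100GB file, data part
test2_zero = mb_per_minute(65, 101, 82, 94)   # 100GB file, zero-fill part
obnam = mb_per_minute(0, 8, 0, 12 * 60)       # obnam: 8GB in 12 hours

for label, rate in [("test 1 data", test1_data), ("test 1 zero-fill", test1_zero),
                    ("test 2 data", test2_data), ("test 2 zero-fill", test2_zero),
                    ("obnam", obnam)]:
    print(f"{label:>16}: {rate:7.1f} MB/min")
print(f"attic vs obnam (data part, test 2): {test2_data / obnam:.0f}x")
```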

   

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-04-03 @ 09:49
I found that all highly interesting - I assume that Obnam is using better
compression, hence the lower initial figures, but Attic is definitely more
clever at de-duplication. I wonder how your repository will be affected if
you use the merge-all branch with the new compression code?

 

The Obnam restore is too slow to be useful. This is the primary reason I am
moving everything to Attic from Bup - I can live with slow backups, but if a
restore doesn't max out I/O then there's something wrong with the design.
Attic continually impresses me on that front.

 

I suspect Obnam would fare better over the generations if you were to mount
the images and backup the filesystems, because then it would have its chunks
aligned. But that is not practical. So for disk images I think your experience
shows clearly the benefits of the Attic & Bup approach over Obnam.

 

I would like to know, though, have you noticed any slow-down in using Attic
over the generations? I read a message on the list from a while back,
reporting this, but it seemed to be a unique experience?

 

 

 


Sv: [attic] Comparison of Attic vs Bup vs Obnam

From:
Petter Gunnerud
Date:
2015-04-03 @ 15:02
I'm thinking of switching to Thomas' code. I'm holding back because there
is no way back. I'm kind of waiting to see which path Jonas will follow.
As long as both versions handle sparse files the way they currently do, I
don't really have a major reason to switch.

Attic backup time has increased somewhat over time. In the early backup sets a
backup run took 9 to 12 hours. Lately each run takes 10 to 14 hours.
I suspect this is partly because the sparse files consume more space on
disk as time goes on, partly because of increased activity on the files being
backed up, and partly because of the growing attic cache.
Regarding restore time, my first test showed acceptable speed. I didn't
time it. I was curious whether attic would slow down after several runs, like
obnam did. The restore test I ran last night showed that restore speed
is still acceptable.

vmware managed to convert the restored files into sparse files. What I restored
was the system disk and data disk of a windows server 2008r2 with exchange
2010. The VM started up nicely after a warning about an incorrect shutdown, and
a 10 minute search for the other domain controller. Exchange has checked
its database - no errors found.

My only issue is that I don't have enough free space to do a full restore,
because files are not restored as sparse. My restore strategy has to be to
restore the most critical disk files first on the gentoo repo server, copy
them (while making them sparse) to esxi, then repeat the process. I will
have to copy to esxi anyway, because hosting virtual disks on an nfs
share served by a vm running on the same host would put too many layers over
the physical disk.
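The free-space problem comes from zero runs being written out as real data on restore. A restore tool could avoid that by seeking over all-zero blocks instead of writing them, so the filesystem leaves holes (the same effect GNU `cp --sparse=always` achieves). The sketch below is hypothetical: `write_sparse` and the 4 KiB `BLOCK` size are assumptions, and it is not how Attic 0.13 restores files.

```python
import os
import tempfile

BLOCK = 4096  # hypothetical hole granularity; real filesystems choose their own


def write_sparse(path, chunks):
    """Write restored chunks to path, seeking over all-zero blocks so they become holes.

    For simplicity zero detection works on BLOCK-sized pieces of each chunk rather
    than on file-aligned blocks, so some zero runs may still end up allocated.
    """
    zeros = bytes(BLOCK)
    with open(path, "wb") as f:
        for chunk in chunks:
            for i in range(0, len(chunk), BLOCK):
                piece = chunk[i:i + BLOCK]
                if piece == zeros[:len(piece)]:
                    f.seek(len(piece), os.SEEK_CUR)   # skip instead of writing zeros
                else:
                    f.write(piece)
        f.truncate()  # set the final size even if the file ends in a hole


if __name__ == "__main__":
    restored = [b"boot" * 1024, bytes(1024 * 1024), b"data" * 1024, bytes(64 * 1024)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "restored.vmdk")
        write_sparse(path, restored)
        st = os.stat(path)
        print(f"size {st.st_size} bytes, allocated {st.st_blocks * 512} bytes")
```

Reading the file back returns exactly the restored bytes, since holes read as zeros; only the on-disk allocation differs.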


      Fra: Dan Williams <dan@dotfive.co.uk>
 Til: attic@librelist.com 
 Sendt: Fredag, 3. april 2015 11.49
 Emne: RE: [attic] Comparison of Attic vs Bup vs Obnam
   
I found that all highly interesting – I assume that Obnam is using better
compression, hence the lower initial figures, but Attic is definitely
cleverer at de-duplication. I wonder how your repository will be affected
if you use the merge-all branch with the new compression code?

The Obnam restore is too slow to be useful. This is the primary reason I
am moving everything to Attic from Bup – I can live with slow backups, but
if a restore doesn't max out I/O then there's something wrong with the
design. Attic continually impresses me on that front.

I suspect Obnam would fare better over the generations if you were to
mount the images and back up the filesystems, because then it would have
its chunks aligned. But that is useless. So for disk images I think your
experience shows clearly the benefits of the Attic & Bup approach over
Obnam.

I would like to know, though: have you noticed any slow-down in using
Attic over the generations? I read a message on the list from a while back
reporting this, but it seemed to be a unique experience?

From: attic@librelist.com [mailto:attic@librelist.com] On Behalf Of Petter
Gunnerud
Sent: 03 April 2015 00:18
To: attic@librelist.com
Subject: Sv: [attic] Comparison of Attic vs Bup vs Obnam

Here are some numbers from my repo check / restore test, mainly sparse
files. I'm running attic 0.13.

First some facts about my setup:

  - The repo mainly consists of vmware disk files, which are sparse files.
  - Backup is performed like this:
      1. esxi creates a snapshot
      2. snapshot files are copied to an onsite gentoo server
      3. esxi deletes the snapshot
      4. the onsite gentoo server starts an attic backup to an offsite
         gentoo server over ssh
  - The repo is encrypted.
  - Total size of the source data is about 1TB on disk. Without the use of
    sparse files it would consume about 2.5TB of disk.
  - This backup has now run 70 times. No backup set has ever been deleted
    from the repo.
  - Repo size on disk is now 248GB.

Before I started using attic, I used obnam. After 60 runs I stopped using
obnam, mainly because of slow restore. I think it took about 12 hours to
restore an 8GB file. Obnam performed a backup in less than half the time
that attic does. I think bandwidth was the bottleneck for the obnam
backup: obnam generated about 15GB of network traffic (nfs over ipsec) for
each run. Attic creates about 3-4GB of network traffic for each run.

The obnam repo was quite small after the initial backup compared to the
attic repo after its initial backup. I don't remember exactly, but I think
the attic repo was about 150GB on disk, while obnam I think was about
110GB.

After 60 runs the obnam repo was large. I don't remember how large, but I
had to delete some backups to free disk space to start testing attic. Out
of the 60 backups, I kept 14. After deleting, the obnam repo came down to
324GB. To compare, Attic has 70 runs, none deleted. The repo is 248GB.

So, to the subject of restore. I'm testing restore locally on the server
where the repo lives. Obnam restore, as mentioned, managed to restore 8GB
in 12 hours. That is about 660MB/hour. I also started a restore of a 20GB
file, which seemed to run at about the same speed, but there was a power
outage before it managed to finish. I didn't bother to restart.

To test attic I first did attic check - no errors found. It took nearly 6
hours to complete. Then I did a restore from the latest backup. I picked a
single sparse file: 28GB on disk, 80GB total. I kept an eye on the file as
it grew on disk. After 10 minutes: 6.6GB. After 20 minutes: 14GB. After 30
minutes: 21GB. After 40 minutes: 30GB. After 57 minutes: 80GB and
finished! That makes 750MB/minute for the first part, and 2940MB/minute
for the last part. CPU usage is from 30 to 40% during restore. Memory
usage is 10% (according to top).

The gentoo vm where the repo lives, and where I'm testing restore, has 4
cores and 4GB RAM. The esxi hosting the vm has an AMD FX-6300 CPU. The repo
lives on a vmdk located on a single 4TB SATA Seagate drive. I do restore
tests to a thin-provisioned vmdk located on a 500GB HGST laptop SATA
drive.

Next test is a restore of a 100GB sparse file containing 65GB of data. The
restored file reached 65GB after 82 minutes. That makes about 800MB/minute
for the data part. 77GB after 87 minutes, and 101GB after 94 minutes. That
makes 3GB/minute for the zero-fill part. The restore finished after 96
minutes; the restored file is still 101GB.

What there is to learn:

  - The restore process accelerates when it comes to the zero-fill of the
    no-longer-sparse file.
  - Better sparse file handling would probably make attic back up at the
    same speed as obnam on this dataset.
  - Attic dedup of sparse files works well. It just takes time.
  - Dedup of vmdk files containing mostly Windows Server works very well.
    It would never have managed to compress the data by 90% (initial
    backup) if it didn't recognize that a Windows system is mostly the
    same data pattern on every vm. Loads of this data is already
    compressed at the source (.cab files).
  - Attic dedup also works great between generations. Each new generation
    creates just 20-30% of the network traffic compared to obnam.
  - Attic restore on a backup set of 60/70 generations is 72 times faster
    than obnam restore!

Keep in mind I'm using real data. Changes between generations depend on
humans' inconsistent activities.
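The throughput figures quoted above can be re-derived from the reported progress samples. This is a small sketch assuming 1GB = 1000MB (which is what the quoted 750MB/minute implies) and using the rounded Obnam rate of 660MB/hour from the message; `mb_per_minute` is an illustrative helper name.

```python
# Progress samples reported above: (minutes elapsed, GB of file on disk).
first_file = [(0, 0.0), (10, 6.6), (20, 14.0), (30, 21.0), (40, 30.0), (57, 80.0)]


def mb_per_minute(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Average rate between two (minutes, GB) samples, taking 1GB = 1000MB."""
    (t0, gb0), (t1, gb1) = a, b
    return (gb1 - gb0) * 1000 / (t1 - t0)


# First file (80GB apparent): first part 0-40 min, last part 40-57 min.
first_part = mb_per_minute(first_file[0], first_file[4])   # 750.0
last_part = mb_per_minute(first_file[4], first_file[5])    # about 2941

# Second file: 65GB of data after 82 min, 101GB (zero fill) after 94 min.
data_part = mb_per_minute((0, 0.0), (82, 65.0))            # about 793
zero_fill = mb_per_minute((82, 65.0), (94, 101.0))         # 3000.0

# Obnam: 8GB in 12 hours, rounded in the message to 660MB/hour.
obnam_mb_per_minute = 660 / 60                             # 11.0
speedup = data_part / obnam_mb_per_minute                  # about 72
```

The "72 times faster" figure corresponds to the data-part rate of the second attic restore compared with the Obnam rate; using the zero-fill rate instead would give a much larger ratio.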

  

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Thomas Waldmann
Date:
2015-04-04 @ 09:31
> My only issue is that I don't have enough free space to do a full restore
> because files are not restored as sparse.
>

While thinking about sparse VMs and "VM hosters" I realized precisely THAT,
and what a dealbreaker it can be for some if you are "overcommitting
space".

Thus, there is now a ticket about sparse file support, and I'll try to fix
this with high priority.

Sv: [attic] Comparison of Attic vs Bup vs Obnam

From:
Petter Gunnerud
Date:
2015-04-04 @ 12:58
Thomas, I'm looking forward to that function!
The missing function is not a dealbreaker though. Attic is still the best
tool I know of for the job. The issue is that when restoring to esxi, the
required space is double the restored data, as the restore must be done on
linux and then copied to esxi. And when doing restore testing, the space
consumed by the currently running system is also required. So that is 3
times the data. When one of these locations doesn't handle sparse files, I
run out of space.
I've now had the first backup run with your merge code. Backup used 14%
less time than the previous 6 runs (they varied by only 4 minutes from
slowest to fastest). 14% is within the normal variance.
Maybe I have to start with a new repo to make use of your faster code? Or
maybe my source of large files with small changes mostly spends time in a
part of the code you have not changed?
Is there a list available of the issues that you have addressed in your
merge and merge-all branches?



Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Thomas Waldmann
Date:
2015-04-04 @ 22:22
> So that is 3 times the data. When one of these locations doesn't handle
> sparse files, I run out of space.

Well, running out of space (maybe unexpectedly) on restore sounds like a
dealbreaker.

> I've now had the first backup run with your merge code. Backup used 14%
> less time than the previous 6 runs (They varied with only 4 minutes from
> slowest to fastest). 14% is within the normal variance.
> Maybe I have to start with a new repo to make use of your faster code?
>

If you use an existing repo, some compatibility code in merge-all attic
will initialize compression and crypto with parameters compatible with the
original attic, so most of the new stuff won't become active. It WILL
create new metadata headers though, so you won't be able to go back to the
original 0.14 for that repo once you have written to it.

You can set the new crypto / compression / mac options only at "attic
init" time (see attic init --help with the merge-all code), when creating a
new repo, and they will then stay fixed for as long as you use that repo.


> Or maybe my source of large files with small changes mostly spend time i a
> part of the code you have not changed?
>

You will see the biggest difference on the first backup you create, because
then all your files will be encrypted / MACed and maybe compressed.

You likely need to tune params to match your CPU and I/O capabilities, see
docs/tuning.rst.


> Is there a list available of the issues that you have addressed in your
> merge and merge-all branches?
>

There is a .txt file in the toplevel dir documenting most of the changes.

Sv: [attic] Comparison of Attic vs Bup vs Obnam

From:
Petter Gunnerud
Date:
2015-04-09 @ 23:02
Thanks for that info, Thomas.
I've had a few backup runs with Thomas' code over the last week. The backup
seems to take 15-20% less time than attic 0.13, using the same repo. I'll
create a new repo the next time I'm onsite.
   

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-04-03 @ 16:14
I have the same problem regarding the restoration of the images, but for 
me the time taken to do the second copy (to make the images sparse again) 
is more important than the disk space, which I usually have enough of.

 

I made my own Debian packages for the official Attic master branch, because
the officially-available copies are not up-to-date (and the Ubuntu ones
don’t work at all). When I test Thomas’s code I will again make Debian
packages. These get uploaded to our Apt repository, which you are welcome
to use if that makes life easier to test things.

I am encouraged by your experiences, as it sounds like there is no
appreciable slowdown over time, other than that caused by the natural
growth of the data being backed up.

 

 

 


Sv: [attic] Comparison of Attic vs Bup vs Obnam

From:
Petter Gunnerud
Date:
2015-04-04 @ 00:26
Why do you create debian packages to install attic?
I just made a clone of the system disk on the server holding the repo and
installed Thomas' code like this:

pip install https://github.com/ThomasWaldmann/attic/zipball/merge

It didn't pull dependencies, but a rather long error message listed the
missing dependencies, and they were easily installed using pip.
I'm currently waiting for the repo to copy over to the cloned backup server.
(I'd be surprised if pip is not a package for debian.)

My restore could have been faster (and finished without interaction) if
the restore could be done directly to esxi. But the only way to access the
esxi filesystem from another computer is a slow scp, which would be even
slower. Hence it must be a two-step process where esxi is the nfs (or
iscsi) client. (scp to the host is actually far slower than scp to a vm
running on the host!)


Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-04-04 @ 09:36
>   Why do you create debian packages to install attic?

 

Due to policy – we don’t put build tools or generally even Pip onto 
servers, and everything that is installed has to be from a Debian package.
There are already packages for Debian/Ubuntu so I created upgraded ones. 
This is preferable to the Pip route for us. I did however use the Pip 
route on a test machine before upgrading the packages (Pip is indeed a 
Debian package).

 

 


Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Yuri D'Elia
Date:
2015-03-31 @ 18:58
On 03/31/2015 08:30 PM, Dan Williams wrote:
> The outcome, to summarise the details below, is that Attic wins hands-down.
> That's great! But it is also currently unusable as a universal solution for
> me, for two reasons: 1) data corruption on large repositories, and 2) lack
> of sparse and special file handling. I can live with #2 and get around it,
> but #1 is critical.

Are you able to replicate the data corruption issue reliably?

Is the error reported by attic itself? Did you check the restore against
a known list of checksums?

It would help a lot if you could track down the issue as much as
possible. I performed a sequential migration from a large rdiffbackup
archive, and verified the restore at each archive without issues.
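Checking a restore against a known list of checksums, as suggested here, can be sketched as follows: record a SHA-256 digest for every regular file before the backup, compute the same manifest over the restored tree, and compare the two. This is an illustrative Python sketch, not an attic feature; the function names are hypothetical, and it skips symlinks and special files such as sockets, which would need separate checks. Running `sha256sum` over the output of `find` is a rough shell equivalent.

```python
import hashlib
import os


def build_manifest(root: str) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, FIFOs, device nodes
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in 1 MiB pieces so large images don't need to fit in RAM.
                for block in iter(lambda: f.read(1 << 20), b""):
                    digest.update(block)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def compare_manifests(before: dict[str, str], after: dict[str, str]):
    """Return (missing, changed, extra) lists of relative paths."""
    missing = sorted(set(before) - set(after))
    extra = sorted(set(after) - set(before))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return missing, changed, extra
```

Saving the "before" manifest alongside the backup lets every later restore be checked against the same reference, which is how a restore can be verified at each archive.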

[Is that maybe another case of the index size getting too large?]

Also, for the tests it would only make sense to use the latest attic
from the master branch.

Regarding your tests, I also tested bup in the past and I love the
archive format for many reasons, but without the ability to prune
archives there's no way I would be able to use it for basically anything.

duplicity is also another option if you want to play. I wrote a quick
patch years ago to speed up large file archiving and it was reasonably
quick. My main issue with duplicity is that it only supports a strict
forward-differential mode (there's no option to base the diff on a
different ancestor). That's kind of stupid, since this would be a 3-line
patch (which got dismissed, by the way). You are basically forced to
perform many full backups just to keep restore times manageable, which is
also unrealistic for space management reasons.

Using rdiffdir (also from duplicity) directly is another option to
overcome this issue [note: I tried and it works], but it cannot compete
with a block-based approach like attic/obnam in terms of speed.

rdiffbackup is a solid tool I've used for years. It's slow though, and
gets linearly slower with the number of file revisions.

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-03-31 @ 19:14
Hi Yuri

Yes - I can replicate the issue every time. I have captured a fair amount of
output and am currently preparing an email detailing the issue. As far as I
am aware (from looking through open bugs, and even closed ones) this is not
a known issue so far.

Yes - the error I get is an Attic one. I'm unsure what you mean about
checking against a list of checksums? My intention was to compare the
restored files against the originals - but I didn't get that far, due to the
error (when I encountered the error, I stopped, tried again, and then
decided to ask for help!).

I am unsure what to do to track it down but perhaps once I send all the
details someone will tell me something else to check... as my email about
the performance comparison was quite long, I figured I would separate them.
How large was your rdiffbackup?

How can I check index size - are there known issues about this? (I read
about one with large files and upgraded to use the latest code from master
which solved the first issue I encountered, which was when actually
performing the backup.)

I can confirm all tests were run under latest code from master (not the
other ones that Thomas has been working on).

I agree about the lack of pruning in Bup. It's a major problem. I am
currently using Bup for some data that will never be deleted, but I would
prefer to completely change over to Attic. I don't think there are many
things preventing that - this data corruption issue and some minor features
really.

Unfortunately Duplicity and other rdiff-style programs are not suitable for
what I am after... nor are tar-style ones like ddar or zbackup :o( Therefore
Attic is the holy grail in many ways!

Cheers

Dan



Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Thiago Coutinho
Date:
2015-03-31 @ 18:35
Hi Dan.

It would be nice if you could test zbackup too: http://zbackup.org/

2015-03-31 15:30 GMT-03:00 Dan Williams <dan@dotfive.co.uk>:

> Hi all
>
> I have spent the past two months poking and prodding at various
> de-duplicating backup systems. To cut a very long story short, this has
> ultimately resulted in comparison of Attic, Bup, and Obnam.
>
> I started out using Bup, and it's pretty good. I set up some servers to do
> their regular backups with it, and that worked a treat. So I started a
> backup on our main fileserver, and it took ten days to back up 7TB of
> data... okay, that's a fairly long time, but maybe acceptable. Then I tried
> extracting some of that data, and realised I had a serious problem: it took
> *ages*!
>
> Hence I went looking for alternatives. I found Obnam, and it seemed very
> promising, but is terribly slow. I found some other tools that for various
> reasons did not fit what I was looking for.
>
> And I found Attic.
>
> I ended up doing performance tests of Attic vs Bup vs Obnam, possibly with
> larger data sets than anyone else has tried (certainly larger than I have
> read about). Along the way, I have run into some interesting issues, which
> I will write to the list about separately, or log as bugs in some cases.
>
> The outcome, to summarise the details below, is that Attic wins hands-down.
> That's great! But it is also currently unusable as a universal solution for
> me, for two reasons: 1) data corruption on large repositories, and 2) lack
> of sparse and special file handling. I can live with #2 and get around it,
> but #1 is critical.
>
> Anyway, this email is purely about the results of my performance testing,
> which people may or may not find interesting. Hopefully it will be of some
> use to others looking for similar information - I did find a thread about
> Attic vs Obnam, but the data set was much smaller.
>
> So without more ado:
>
>
> TEST SYSTEM 1:
>
>   - Purpose: desktop machine
>   - OS: Ubuntu 14.04 LTS (Trusty Tahr) 64-bit
>   - Quad-core Intel i7-950 @ 3.07GHz (hyperthreaded)
>   - 12GB RAM
>   - Main OS on solid-state array (RAID 0)
>   - Data storage solid-state array (RAID 0)
>   - Tests took place using both arrays
>
> TEST SYSTEM 2:
>
>   - Purpose: fileserver
>   - OS: Debian 8 (Jessie) 64-bit
>   - Quad-core Intel i7-3770 @ 3.40GHz (hyperthreaded)
>   - 32GB RAM
>   - Main OS on solid-state array (RAID 0)
>   - Data storage on various arrays over a total of 24 spinning drives
>   - Tests took place on a RAID 6 array with 12x 2TB drives (7200rpm Seagate
> Barracudas)
>
> Note: Both systems ended up using the same Attic version, the latest code
> from master (I've created my own Debian packages because the ones
> generally-available are out-of-date and buggy).
>
> I should also mention that all timing figures are subject to some small
> degree of inaccuracy - although the systems were kept as quiet as possible
> whilst the tests were occurring, the conditions were not perfect.
>
>
> FIRST TEST
> ==========
>
> My first test was fairly small, and consisted of backing up my home
> directory, which contains 15GB of files. I timed the initial backup
> (for Bup I added the time taken for `bup index`), and then ran a subsequent
> backup with no changes. I then restored everything in the repository to the
> same disk, and then to a different disk, to try and factor out drive speed
> effects.
>
> Obnam was tricky. It has a fair few options and tuning parameters that
> affect things. I ultimately found no difference trying different tuning
> parameters, but I did find a huge difference turning on compression using
> deflate (I know, it's not default! Boo!).
>
> Of particular concern is that Obnam has a theoretical collision potential,
> in that if a block has the same MD5 hash as another block, it will assume
> they are the same. This behaviour is the default, but can be mitigated by
> using the verify option. I tried with and without, and interestingly did
> not notice any speed difference (2 seconds, which is statistically
> insignificant) and also did not encounter any bad data on restoration. So I
> don't know why it's off by default.
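The collision risk described here comes from treating "same digest" as "same data". A toy content-addressed chunk store makes the difference visible: without verification a colliding chunk is silently mapped to the stored one, while with verification the bytes are compared before de-duplicating. This is a hypothetical model (`ChunkStore` is an illustrative name), not Obnam's implementation; since genuine MD5 collisions are awkward to produce, exercising the verify path means planting mismatched bytes under an existing key by hand.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store keyed by MD5 digest (illustrative only)."""

    def __init__(self, verify: bool = False):
        self.verify = verify
        self.chunks: dict[bytes, bytes] = {}

    def put(self, data: bytes) -> bytes:
        """Store data (if new) and return the key used to reference it."""
        key = hashlib.md5(data).digest()
        existing = self.chunks.get(key)
        if existing is None:
            self.chunks[key] = data
        elif self.verify and existing != data:
            # Same digest but different bytes: a real collision. A real tool
            # would store the chunk separately; this sketch just refuses.
            raise ValueError("MD5 collision: refusing to de-duplicate")
        # Without verify, a colliding chunk silently maps to `existing`,
        # so a later restore would return the wrong bytes.
        return key

    def get(self, key: bytes) -> bytes:
        return self.chunks[key]
```

The extra cost of verification is one byte comparison per duplicate chunk, which is consistent with the negligible speed difference reported above.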
>
> It's worth noting that Obnam with default settings was faster to back up
> than Attic and Bup, but did not result in any space saving at all. From what I
> can tell, it performed zero de-duplication of my data. From what I have
> read, this seems to be because of the chunking method used, compared to the
> rolling hash of Attic and Bup. Of course, subsequent backups will save
> space, but Obnam with default settings is in my view pretty much useless.
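The chunking difference mentioned here can be sketched with a simplified content-defined chunker. This example uses a "gear" rolling hash of the kind used by FastCDC, chosen for brevity; Attic actually uses a buzhash-based chunker and Bup a rollsum, neither with these parameters, and fixed-size splitting stands in for the Obnam behaviour this comparison assumes. Inserting a single byte at the front shifts every fixed-size block, whereas content-defined boundaries re-synchronise after the first chunk.

```python
import random

# 256 pseudo-random 32-bit "gear" values, seeded so results are repeatable.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]

# Cut when the top 12 bits of the hash are zero (about 1 in 4096 positions).
# The top bits depend only on the last 32 input bytes, so boundaries are
# determined by local content rather than by absolute offset.
BOUNDARY_MASK = 0xFFF << 20
MIN_CHUNK, MAX_CHUNK = 1024, 16384


def cdc_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking with a simple gear rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shift out the oldest contribution and mix in the new byte.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    """Fixed-size chunking: boundaries depend only on absolute offset."""
    return [data[i:i + size] for i in range(0, len(data), size)]
```

With random input and one byte prepended, nearly all `cdc_chunks` output is shared with the original while essentially none of the `fixed_chunks` output is, which mirrors the zero de-duplication observed above.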
>
> Still, here are the results of Obnam alone, under four different
> configurations:
>
>
> 1.  Backup home (initial) (from different disk) (Obnam only)
> ------------------------------------------------------------
>
>     Number of entities:   26,785
>     Number of files:      24,838
>     Total data set size:    15GB
>
>                Default   Verify  Deflate  Ver+Def
>     ---------------------------------------------
>     Time         08:42    08:40    11:55    11:54
>     No. files   38,088   38,090   38,094   38,093
>     Size          15GB     15GB    4.5GB    4.5GB
>
> I'll spare you the rest of the data I collected about Obnam vs Obnam,
> because it's not all that relevant. However, restoration took over 17
> minutes without deflate enabled when backing up, and only 4:30 with it.
>
> All of the subsequent Obnam results for comparison use the deflate and
> verify options.
>
> Here are the results for Bup vs Obnam vs Attic:
>
>
> 2.  Backup home (initial) (from different disk)
> -----------------------------------------------
>
>     Number of entities:   26,785
>     Number of files:      24,838
>     Total data set size:    15GB
>
>                     cp    Rsync      Bup    Obnam    Attic
>     ------------------------------------------------------
>     Time         06:05    06:55    09:43    11:54    10:24
>     No. files                         26   38,093      764
>     Size                           3.8GB    4.5GB    3.8GB
>
> I have also shown the times taken by cp and rsync, for comparison. Bup and
> Attic are pretty close in terms of time, and near enough identical on disk
> space used, but Obnam lags behind a bit and notably uses an extraordinary
> number of files! It also suffers because it is only benefitting from
> compression and not de-duplication.
>
>
> 3.  Backup home (subsequent) (no changes)
> -----------------------------------------
>
>     Number of entities:        0 (none changed)
>     Number of files:           0
>     Total data set size:       0
>
>                     cp    Rsync      Bup    Obnam    Attic
>     ------------------------------------------------------
>     Time             -    00:01    00:03    00:08    00:04
>     No. files                         28   38,164      764
>     Size                           3.8GB    4.5GB    3.8GB
>
> Not much to say about this, just here for completeness. Obnam trails behind.
>
>
> 4.  Restore home (to same/different disk)
> -----------------------------------------
>
>                     cp    Rsync      Bup    Obnam    Attic
>     ------------------------------------------------------
>     Time (same)  07:25    08:25    04:10    04:28    02:34
>     Time (diff)  07:11    08:13    04:03    04:13    02:03
>
> Here we can see that although restoring to a different disk did help ever so
> slightly, this operation is not really particularly disk-bound, as Attic
> shows with a quite amazing result. The standard cp and rsync commands lag
> behind because they have to read and write the entire 15GB, but even so,
> Attic blazes ahead of both Obnam and Bup.
>
> There was one problem restoring, however: Attic failed to restore a socket
> file. Ouch! The other two restored the file just fine.
>
> Next I tried adding a sparse file, to see how each program would handle it -
> I created a 5GB sparse file and filled the first 1GB with noise from
> /dev/urandom:
>
>
> 5.  Backup home (subsequent) (sparse file)
> ------------------------------------------
>
>     Number of entities:        1 (one added)
>     Number of files:           1
>     Total data set size:     1GB
>
>                     cp    Rsync      Bup    Obnam    Attic
>     ------------------------------------------------------
>     Time             -    00:36    09:30    01:18    09:24
>     No. files                         33   39,262      966
>     Size                           4.8GB    5.5GB    4.8GB
>
> Obnam managed a major win here because it recognises sparse files and
> handles them efficiently. Restoration on that one file was a similar story:
>
>
> 6.  Restore home (sparse file only)
> -----------------------------------
>
>                     cp    Rsync      Bup    Obnam    Attic
>     ------------------------------------------------------
>     Time             -        -    00:47    01:03    01:21
>     Size                           5.0GB    1.0GB    5.0GB
>
> This result was a bit of a surprise, in that Attic trailed behind Bup, just
> in this one test. Unfortunately, only Obnam passed the test, correctly
> restoring the file sparsely - Bup and Attic did not. This would be a huge
> problem for me in backing up virtual machine images.
>
> So far, the results are mixed. Attic seems to win for general restoration
> speed, but doesn't handle sparse files and actually misses some files
> altogether (sockets). Obnam is the most accurate, but loses out on
> everything else - speed, disk space, number of files used, you name it. Bup
> has a few extra features compared to Attic, but has a rather nasty
> two-step process (index, then save) and nasty .bup directories everywhere
> (although Attic creates its own, without telling you!) - Obnam needs no
> cache at all, somehow.
>
>
> SECOND TEST
> ===========
>
> My next set of tests was with some Big Data. Part of the objective of this
> whole exercise has been to find a tool suitable for regularly backing up
> critical data to a de-duplicated repository, which itself will live on an
> external array with BTRFS snapshots (there's never too much paranoia when it
> comes to backups!). So I tested each program with this.
>
>
> 7.  Backup fileserver (initial)
> -------------------------------
>
>     Number of entities:   5,120,641
>     Number of files:      4,385,287
>     Total data set size:      7.2TB
>
>                          Bup          Obnam          Attic
>     ------------------------------------------------------
>     Time           256:00:00  DNF: 41:36:44+     100:58:47
>     No. files         12,136  DNF:2,700,000+     1,149,668
>     Backup size        5.7TB  DNF:    689GB+         5.6TB
>     Cache size          44GB              -          9.7GB
>
> I had already run Bup back when I thought it would be the tool of choice, so
> my time is approximate. I recorded that it took 10.5 days but did not note
> the minutes and seconds, so the figure is rounded to whole hours.
>
> Attic is more than twice as quick as Bup on this dataset. It does however
> suffer from using a large number of files (not a critical concern) and I did
> encounter some problems along the way (my first attempt failed and I had to
> use the latest code from master branch). It's also notable how much smaller
> Attic's cache is when compared to Bup.
>
> At this point Obnam falls by the wayside. After almost two days it had only
> scanned 1.3TB of data, and hence was set to take longer than Bup. It also
> uses a *colossal* number of files. There were no errors, but I stopped the
> test because it was apparent that it is not a contender. The values shown
> are therefore where it had got to when it was stopped, not final figures,
> and cannot be used for comparison.
>
> The next test was to try and restore some files. I had a particular
> directory of around 6GB which I had tried to restore with Bup, and had been
> horrified by the time taken (in my view, it is extremely important that
> restorations should be as quick as possible). Time to try Attic:
>
>
> 8.  Restore fileserver (specific folder)
> ----------------------------------------
>
>     Number of entities:        4198
>     Number of files:           2933
>     Total data set size:      6.2GB
>
>                          Bup          Obnam          Attic
>     ------------------------------------------------------
>     Time            03:34:51              -          05:58
>
> Wow. That's right - just under six *minutes*! That's impressive. What's even
> more impressive is that it restored 99% of that data in less than one
> minute, and presumably spent the other five checking stuff.
>
> On large data sets, where 'large' means terabytes, I cannot use Bup -
> restoration is too slow to be usable, even if I were willing to put up with
> the slower backup speed. I cannot even think about using Obnam. Attic wins,
> hands down.
>
>
> 9.  Restore fileserver (everything, after sync)
> -----------------------------------------------
>
> The next step was meant to be doing a sync (i.e. running another backup to
> pick up any changed files) and then restoring the entire repository and
> running a diff on the result, to satisfy myself that everything works
> correctly. Unfortunately, Attic failed this test. After performing the
> second backup, I encountered errors (data corruption reported) during the
> restoration. Running cleanup did not help. Deleting the backup fixed the
> issue, but then when I ran another backup the problem returned. So I cannot
> currently benchmark this scenario or verify data integrity.
>
>
> COMMANDS USED
> =============
>
> For those that are interested, here is a list of commands used with the
> various programs:
>
> cp
>     time cp /data/testdata . -aR
>
> rsync
>     time rsync -aAX /data/testdata/ rsync/
>
> Bup
>     mkdir bup
>     BUP_DIR=$(pwd)/bup bup init
>     BUP_DIR=$(pwd)/bup time bup index /data/testdata
>     BUP_DIR=$(pwd)/bup time bup save -n test /data/testdata
>
>     BUP_DIR=$(pwd)/bup time bup restore -C restored.bup test/latest
>
> Obnam
>     time obnam backup -r obnam /data/testdata
>     time obnam backup --deduplicate=verify -r obnam.v /data/testdata
>     time obnam backup --compress-with=deflate -r obnam.c /data/testdata
>     time obnam backup --compress-with=deflate --deduplicate=verify \
>         -r obnam.cv /data/testdata
>     time obnam backup --compress-with=deflate --deduplicate=verify \
>         --lru-size=1024 --upload-queue-size=8192 -r obnam.cvt /data/testdata
>
>     time obnam restore -r obnam --to restored.obnam
>
> Attic
>     HOME=$(pwd)/attic.home attic init attic
>     HOME=$(pwd)/attic.home time attic create attic::First /data/testdata
>
>     mkdir restored.attic; cd restored.attic
>     HOME=$(pwd)/../attic.home time attic extract ../attic::Second
>
> Sparse file
>     dd if=/dev/urandom of=sparse bs=1G count=1
>     truncate -s 5G sparse
>     du -sh sparse
>     du -sh --apparent-size sparse
>
>
> Overall I am very impressed with Attic, and I think with a couple of bugs
> fixed and perhaps some minor features added it will be the best tool of this
> kind around.
>
> Thanks, Jonas - and more recently, Thomas, too - for your hard work on
> Attic!
>
> Cheers
>
> Dan


-- 
Thiago Coutinho

"The people should not fear the government. The government should fear the people."
V for Vendetta

Re: [attic] Comparison of Attic vs Bup vs Obnam

From:
Dan Williams
Date:
2015-03-31 @ 18:45
Hey Thiago

Interestingly, ZBackup was one of the tools I assessed and ruled out. The
main reason for this was the tar-like nature of the tool:

“Right now the only modes supported are reading from standard input and
writing to standard output.”

“It's only possible to fully restore the backup in order to get to a
required file, without any option to quickly pick it out.”

This made me conclude that it wasn’t suitable for my purposes, plus there
were observations about it not being suitable for large repositories.

If I get time I might give it a try out of pure curiosity :o)

Cheers

Dan

From: attic@librelist.com [mailto:attic@librelist.com] On Behalf Of Thiago Coutinho
Sent: 31 March 2015 19:36
To: attic@librelist.com
Subject: Re: [attic] Comparison of Attic vs Bup vs Obnam

Hi Dan.

It would be nice if you could test zbackup too: http://zbackup.org/

2015-03-31 15:30 GMT-03:00 Dan Williams <dan@dotfive.co.uk>:

Hi all

I have spent the past two months poking and prodding at various
de-duplicating backup systems. To cut a very long story short, this has
ultimately resulted in comparison of Attic, Bup, and Obnam.

I started out using Bup, and it's pretty good. I set up some servers to do
their regular backups with it, and that worked a treat. So I started a
backup on our main fileserver, and it took ten days to backup 7TB of data...
okay, that's a fairly long time, but maybe acceptable. Then I tried
extracting some of that data, and realised I had a serious problem: it took
*ages*!

Hence I went looking for alternatives. I found Obnam, and it seemed very
promising, but is terribly slow. I found some other tools that for various
reasons did not fit what I was looking for.

And I found Attic.

I ended up doing performance tests of Attic vs Bup vs Obnam, possibly with
larger data sets than anyone else has tried (certainly larger than I have
read about). Along the way, I have run into some interesting issues, which I
will write to the list about separately, or log as bugs in some cases.

The outcome, to summarise the details below, is that Attic wins hands-down.
That's great! But it is also currently unusable as a universal solution for
me, for two reasons: 1) data corruption on large repositories, and 2) lack
of sparse and special file handling. I can live with #2 and get around it,
but #1 is critical.

Anyway, this email is purely about the results of my performance testing,
which people may or may not find interesting. Hopefully it will be of some
use to others looking for similar information - I did find a thread about
Attic vs Obnam, but the data set was much smaller.

So without more ado:


TEST SYSTEM 1:

  - Purpose: desktop machine
  - OS: Ubuntu 14.04 LTS (Trusty Tahr) 64-bit
  - Quad-core Intel i7-950 @ 3.07GHz (hyperthreaded)
  - 12GB RAM
  - Main OS on solid-state array (RAID 0)
  - Data storage solid-state array (RAID 0)
  - Tests took place using both arrays

TEST SYSTEM 2:

  - Purpose: fileserver
  - OS: Debian 8 (Jessie) 64-bit
  - Quad-core Intel i7-3770 @ 3.40GHz (hyperthreaded)
  - 32GB RAM
  - Main OS on solid-state array (RAID 0)
  - Data storage on various arrays over a total of 24 spinning drives
  - Tests took place on a RAID 6 array with 12x 2TB drives (7200rpm Seagate
Barracudas)

Note: Both systems ended up using the same Attic version, the latest code
from master (I've created my own Debian packages because the generally
available ones are out of date and buggy).

I should also mention that all timing figures are subject to some small
degree of inaccuracy - although the systems were kept as quiet as possible
whilst the tests were occurring, the conditions were not perfect.


FIRST TEST
==========

My first test was fairly small: backing up my home directory, which
contains 15GB of files. I timed the initial backup (for Bup I added the time
taken for `bup index`), and then ran a subsequent backup with no changes. I
then restored everything in the repository to the same disk, and then to a
different disk, to try to factor out drive speed effects.

Obnam was tricky. It has a fair few options and tuning parameters that
affect things. I ultimately found no difference trying different tuning
parameters, but I did find a huge difference turning on compression using
deflate (I know, it's not default! Boo!).

Of particular concern is that Obnam has a theoretical collision potential,
in that if a block has the same MD5 hash as another block, it will assume
they are the same. This behaviour is the default, but can be mitigated by
using the verify option. I tried with and without, and interestingly did not
notice any speed difference (2 seconds, which is statistically
insignificant) and also did not encounter any bad data on restoration. So I
don't know why it's off by default.
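
The difference between the two de-duplication policies can be sketched in a
few lines of shell. This is not Obnam's code: the function name
chunks_identical and the one-file-per-chunk layout are assumptions made
purely for illustration. In "hash" mode a matching MD5 digest is trusted; in
"verify" mode the stored chunk is also compared byte for byte before the new
chunk is treated as a duplicate.

```shell
# Hypothetical helper (not Obnam's code): may the new chunk in file $2 be
# stored as a reference to the existing chunk in file $3?
#   hash   - trust a matching MD5 digest, as in the default behaviour
#   verify - also compare the bytes, so an MD5 collision cannot slip through
chunks_identical() {
  local mode="$1" new="$2" old="$3" new_sum old_sum
  new_sum=$(md5sum < "$new" | cut -d' ' -f1)
  old_sum=$(md5sum < "$old" | cut -d' ' -f1)
  # Different digests always mean different data.
  [ "$new_sum" = "$old_sum" ] || return 1
  if [ "$mode" = verify ]; then
    # Equal digests: read both chunks and compare them byte for byte.
    cmp -s "$new" "$old"
  fi
}

# Example: two chunk files with identical contents.
d=$(mktemp -d)
printf 'block of data' > "$d/new"
printf 'block of data' > "$d/old"
chunks_identical verify "$d/new" "$d/old" && echo "duplicate: store a reference"
```

In this sketch the extra cost of verify mode only arises when the digests
already match, since a mismatch rejects the pair straight away.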

It's worth noting that Obnam with default settings was faster to back up than
Attic and Bup, but did not result in any space saving at all. From what I
can tell, it performed zero de-duplication of my data. From what I have
read, this seems to be because of the chunking method used, compared to the
rolling hash of Attic and Bup. Of course, subsequent backups will save
space, but Obnam with default settings is in my view pretty much useless.
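
To illustrate why the chunking method matters, the sketch below assumes
chunks cut at fixed byte offsets (every 4096 bytes, an arbitrary choice) and
counts how many chunks two files share. Inserting a single byte at the start
shifts every later boundary, so nothing matches any more; content-defined
chunking with a rolling hash, as used by Attic and Bup, picks boundaries from
the data itself and re-synchronises after the insertion. Whether this fully
explains the zero de-duplication seen here is an assumption, and
shared_fixed_chunks is a hypothetical helper.

```shell
# Hypothetical helper: count the fixed-size chunks of file $2 that also occur
# in file $1. Chunk size defaults to 4096 bytes (an arbitrary choice).
shared_fixed_chunks() {
  local a="$1" b="$2" size="${3:-4096}" dir
  dir=$(mktemp -d)
  # Cut both files at fixed byte offsets: 0, size, 2*size, ...
  split -b "$size" "$a" "$dir/a."
  split -b "$size" "$b" "$dir/b."
  # Hash every chunk, then count the digests the two files have in common.
  md5sum "$dir"/a.* | cut -d' ' -f1 | sort -u > "$dir/sums-a"
  md5sum "$dir"/b.* | cut -d' ' -f1 | sort -u > "$dir/sums-b"
  comm -12 "$dir/sums-a" "$dir/sums-b" | wc -l | tr -d ' '
  rm -rf "$dir"
}

work=$(mktemp -d)
seq 1 20000 > "$work/original"                                # non-repeating sample data
{ cat "$work/original"; seq 20001 20100; } > "$work/appended"  # new data only at the end
{ printf 'X'; cat "$work/original"; } > "$work/shifted"        # one byte inserted at the start
shared_fixed_chunks "$work/original" "$work/appended"  # almost every chunk still matches
shared_fixed_chunks "$work/original" "$work/shifted"   # prints 0: every boundary has moved
```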

Still, here are the results of Obnam alone, under four different
configurations:


1.  Backup home (initial) (from different disk) (Obnam only)
------------------------------------------------------------

    Number of entities:   26,785
    Number of files:      24,838
    Total data set size:    15GB

               Default   Verify  Deflate  Ver+Def
    ---------------------------------------------
    Time         08:42    08:40    11:55    11:54
    No. files   38,088   38,090   38,094   38,093
    Size          15GB     15GB    4.5GB    4.5GB

I'll spare you the rest of the data I collected about Obnam vs Obnam,
because it's not all that relevant. However, restoration took over 17 minutes
if the backup was made without deflate, and only 4:30 with it enabled.

All of the subsequent Obnam results for comparison use the deflate and
verify options.

Here are the results for Bup vs Obnam vs Attic:


2.  Backup home (initial) (from different disk)
-----------------------------------------------

    Number of entities:   26,785
    Number of files:      24,838
    Total data set size:    15GB

                    cp    Rsync      Bup    Obnam    Attic
    ------------------------------------------------------
    Time         06:05    06:55    09:43    11:54    10:24
    No. files                         26   38,093      764
    Size                           3.8GB    4.5GB    3.8GB

I have also shown the times taken by cp and rsync, for comparison. Bup and
Attic are pretty close in terms of time, and near enough identical on disk
space used, but Obnam lags behind a bit and notably uses an extraordinary
number of files! It also suffers because it is only benefitting from
compression and not de-duplication.
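
Expressed as a percentage of the 15GB source, the sizes in the table above
correspond to the savings below. A trivial awk helper (saving is just an
illustrative name); the inputs are the rounded figures from the table, so the
results are approximate.

```shell
# Illustrative helper: percentage of space saved, given source and backup
# sizes in the same unit.
saving() {
  awk -v src="$1" -v dst="$2" 'BEGIN { printf "%.1f\n", (1 - dst / src) * 100 }'
}

saving 15 3.8   # Bup and Attic: prints 74.7
saving 15 4.5   # Obnam with deflate and verify: prints 70.0
```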


3.  Backup home (subsequent) (no changes)
-----------------------------------------

    Number of entities:        0 (none changed)
    Number of files:           0
    Total data set size:       0

                    cp    Rsync      Bup    Obnam    Attic
    ------------------------------------------------------
    Time             -    00:01    00:03    00:08    00:04
    No. files                         28   38,164      764
    Size                           3.8GB    4.5GB    3.8GB

Not much to say about this, just here for completeness. Obnam trails behind.


4.  Restore home (to same/different disk)
-----------------------------------------

                    cp    Rsync      Bup    Obnam    Attic
    ------------------------------------------------------
    Time (same)  07:25    08:25    04:10    04:28    02:34
    Time (diff)  07:11    08:13    04:03    04:13    02:03

Here we can see that although restoring to a different disk did help ever so
slightly, this operation is not really particularly disk-bound, as Attic
shows with a quite amazing result. The standard cp and rsync commands lag
behind because they have to read and write the entire 15GB, but even so,
Attic blazes ahead of both Obnam and Bup.

There was one problem restoring, however: Attic failed to restore a socket
file. Ouch! The other two restored the file just fine.
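
Missing special files like this are easy to overlook. One way to catch them
after any restore is to compare the sockets, FIFOs and device nodes in the
source with those in the restored tree. A minimal sketch, assuming GNU find
(for -printf); missing_special_files is a hypothetical name, and file names
containing newlines are not handled.

```shell
# Hypothetical helper: print the path (relative to SRC) of every socket, FIFO
# or device node under SRC that is missing from DST or has a different type there.
missing_special_files() {
  local src="$1" dst="$2"
  # %y is the file type letter (s, p, b or c here); %P is the path relative to SRC.
  find "$src" \( -type s -o -type p -o -type b -o -type c \) -printf '%y %P\n' |
  while read -r type rel; do
    # Look up the same relative path in the restored tree and compare its type.
    if [ "$(find "$dst/$rel" -maxdepth 0 -printf '%y' 2>/dev/null)" != "$type" ]; then
      printf '%s\n' "$rel"
    fi
  done
}

# Usage sketch (paths are placeholders):
#   missing_special_files /data/testdata /path/to/restored/copy
```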

Next I tried adding a sparse file, to see how each program would handle it -
I created a 5GB sparse file and filled the first 1GB with noise from
/dev/urandom:


5.  Backup home (subsequent) (sparse file)
------------------------------------------

    Number of entities:        1 (one added)
    Number of files:           1
    Total data set size:     1GB

                    cp    Rsync      Bup    Obnam    Attic
    ------------------------------------------------------
    Time             -    00:36    09:30    01:18    09:24
    No. files                         33   39,262      966
    Size                           4.8GB    5.5GB    4.8GB

Obnam managed a major win here because it recognises sparse files and
handles them efficiently. Restoration on that one file was a similar story:


6.  Restore home (sparse file only)
-----------------------------------

                    cp    Rsync      Bup    Obnam    Attic
    ------------------------------------------------------
    Time             -        -    00:47    01:03    01:21
    Size                           5.0GB    1.0GB    5.0GB

This result was a bit of a surprise, in that Attic trailed behind Bup, just
in this one test. Unfortunately, only Obnam passed the test, correctly
restoring the file sparsely - Bup and Attic did not. This would be a huge
problem for me in backing up virtual machine images.
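
Whether a restored file came back sparse can be checked by comparing the
space allocated to it on disk with its apparent size, which is what the two
du commands in the command list at the end of this email compare. The same
check as a small shell function, assuming GNU stat; is_sparse is a
hypothetical name.

```shell
# Hypothetical helper: succeed if FILE has holes, i.e. fewer bytes are
# allocated on disk than its apparent size.
is_sparse() {
  # %b = allocated blocks, %B = size in bytes of each such block,
  # %s = apparent size; the unquoted $(...) deliberately splits the three numbers.
  set -- $(stat -c '%b %B %s' -- "$1")
  [ $(( $1 * $2 )) -lt "$3" ]
}

# Example: a 5MB file with no data written to it is entirely a hole.
f=$(mktemp)
truncate -s 5M "$f"
is_sparse "$f" && echo "sparse"
```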

So far, the results are mixed. Attic seems to win for general restoration
speed, but doesn't handle sparse files and actually misses some files
altogether (sockets). Obnam is the most accurate, but loses out on
everything else - speed, disk space, number of files used, you name it. Bup
has a few extra features compared to Attic, but has a rather nasty two-step
process (index, then save) and nasty .bup directories everywhere (although Attic
creates its own, without telling you!) - Obnam needs no cache at all,
somehow.
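
Until sparse restores are supported, one possible workaround is to re-create
the holes after restoring with Bup or Attic. GNU cp with --sparse=always turns
runs of zero bytes into holes in the copy (util-linux's fallocate --dig-holes
can do the same in place). A minimal sketch, assuming GNU coreutils;
resparsify is a hypothetical name. It should only be run on images that are
not in use, and any region that really was written as zeros will also become
a hole, which reads back identically but is not an exact copy of the original
allocation.

```shell
# Hypothetical helper: rewrite FILE so that runs of zero bytes become holes.
# A sparse copy is written next to it and then moved over the original.
resparsify() {
  local tmp="$1.sparse-tmp"
  # -p keeps mode, ownership and timestamps; --sparse=always creates holes for zero blocks.
  cp -p --sparse=always -- "$1" "$tmp" && mv -- "$tmp" "$1"
}

# Example: 64KB of data followed by 4MB of explicitly written zeros.
img=$(mktemp)
head -c 65536 /dev/urandom > "$img"
head -c 4194304 /dev/zero >> "$img"
du -k "$img"    # allocated size before
resparsify "$img"
du -k "$img"    # much less allocated, same apparent size
```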


SECOND TEST
===========

My next set of tests was with some Big Data. Part of the objective of this
whole exercise has been to find a tool suitable for regularly backing up
critical data to a de-duplicated repository, which itself will live on an
external array with BTRFS snapshots (there's never too much paranoia when it
comes to backups!). So I tested each program with this.


7.  Backup fileserver (initial)
-------------------------------

    Number of entities:   5,120,641
    Number of files:      4,385,287
    Total data set size:      7.2TB

                         Bup          Obnam          Attic
    ------------------------------------------------------
    Time           256:00:00  DNF: 41:36:44+     100:58:47
    No. files         12,136  DNF:2,700,000+     1,149,668
    Backup size        5.7TB  DNF:    689GB+         5.6TB
    Cache size          44GB              -          9.7GB

I had already run Bup back when I thought it would be the tool of choice, so
my time is approximate. I recorded that it took 10.5 days but did not note
the minutes and seconds, so the figure is rounded to whole hours.

Attic is more than twice as quick as Bup on this dataset. It does however
suffer from using a large number of files (not a critical concern) and I did
encounter some problems along the way (my first attempt failed and I had to
use the latest code from master branch). It's also notable how much smaller
Attic's cache is when compared to Bup.
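
The "more than twice as quick" figure is easy to check by converting both
durations to seconds. A small sketch (hms_to_seconds is an illustrative
name); bear in mind the Bup figure is itself rounded, so the ratio is
approximate.

```shell
# Illustrative helper: convert an H:MM:SS duration to seconds (hours may exceed 24).
hms_to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

bup=$(hms_to_seconds 256:00:00)      # approximate, rounded to whole hours
attic=$(hms_to_seconds 100:58:47)
# How many times faster the initial Attic backup was: prints 2.54
awk -v b="$bup" -v a="$attic" 'BEGIN { printf "%.2f\n", b / a }'
```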

At this point Obnam falls by the wayside. After almost two days it had only
scanned 1.3TB of data, and hence was set to take longer than Bup. It also
uses a *colossal* number of files. There were no errors, but I stopped the
test because it was apparent that it is not a contender. The values shown are
therefore where it had got to when it was stopped, not final figures, and
cannot be used for comparison.

The next test was to try and restore some files. I had a particular
directory of around 6GB which I had tried to restore with Bup, and had been
horrified by the time taken (in my view, it is extremely important that
restorations should be as quick as possible). Time to try Attic:


8.  Restore fileserver (specific folder)
----------------------------------------

    Number of entities:        4198
    Number of files:           2933
    Total data set size:      6.2GB

                         Bup          Obnam          Attic
    ------------------------------------------------------
    Time            03:34:51              -          05:58

Wow. That's right - just under six *minutes*! That's impressive. What's even
more impressive is that it restored 99% of that data in less than one
minute, and presumably spent the other five checking stuff.

On large data sets, where 'large' means terabytes, I cannot use Bup -
restoration is too slow to be usable, even if I were willing to put up with
the slower backup speed. I cannot even think about using Obnam. Attic wins,
hands down.


9.  Restore fileserver (everything, after sync)
-----------------------------------------------

The next step was meant to be doing a sync (i.e. running another backup to
pick up any changed files) and then restoring the entire repository and
running a diff on the result, to satisfy myself that everything works
correctly. Unfortunately, Attic failed this test. After performing the
second backup, I encountered errors (data corruption reported) during the
restoration. Running cleanup did not help. Deleting the backup fixed the
issue, but then when I ran another backup the problem returned. So I cannot
currently benchmark this scenario or verify data integrity.
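
For an integrity check like this on a live fileserver, one approach is to
record a checksum manifest of the source when the backup is taken and compare
it with a manifest of the restored tree, rather than diffing against source
files that may have changed since. A minimal sketch assuming GNU find, sort
and xargs; make_manifest is a hypothetical name, and only regular file
contents are compared (not permissions, ownership or special files).

```shell
# Hypothetical helper: print "sha256  ./relative/path" for every regular file
# under DIR, in a stable byte-wise order so two manifests can be diffed.
make_manifest() {
  ( cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum )
}

# Usage sketch (paths are placeholders):
#   make_manifest /data/testdata > before.sha256           # when the backup is made
#   make_manifest /path/to/restored/copy > after.sha256    # after restoring
#   diff before.sha256 after.sha256 && echo "restored contents match"
```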


COMMANDS USED
=============

For those that are interested, here is a list of commands used with the
various programs:

cp
    time cp /data/testdata . -aR

rsync
    time rsync -aAX /data/testdata/ rsync/

Bup
    mkdir bup
    BUP_DIR=$(pwd)/bup bup init
    BUP_DIR=$(pwd)/bup time bup index /data/testdata
    BUP_DIR=$(pwd)/bup time bup save -n test /data/testdata

    BUP_DIR=$(pwd)/bup time bup restore -C restored.bup test/latest

Obnam
    time obnam backup -r obnam /data/testdata
    time obnam backup --deduplicate=verify -r obnam.v /data/testdata
    time obnam backup --compress-with=deflate -r obnam.c /data/testdata
    time obnam backup --compress-with=deflate --deduplicate=verify \
        -r obnam.cv /data/testdata
    time obnam backup --compress-with=deflate --deduplicate=verify \
        --lru-size=1024 --upload-queue-size=8192 -r obnam.cvt /data/testdata

    time obnam restore -r obnam --to restored.obnam

Attic
    HOME=$(pwd)/attic.home attic init attic
    HOME=$(pwd)/attic.home time attic create attic::First /data/testdata

    mkdir restored.attic; cd restored.attic
    HOME=$(pwd)/../attic.home time attic extract ../attic::Second

Sparse file
    dd if=/dev/urandom of=sparse bs=1G count=1
    truncate -s 5G sparse
    du -sh sparse
    du -sh --apparent-size sparse


Overall I am very impressed with Attic, and I think with a couple of bugs
fixed and perhaps some minor features added it will be the best tool of this
kind around.

Thanks, Jonas - and more recently, Thomas, too - for your hard work on
Attic!

Cheers

Dan
