Surprising performance behaviour on root path change => 13x times slower incremental backup

From:
Laurent Guerby
Date:
2015-09-05 @ 07:52
Hi,

I'm testing attic to replace my rsync --link-dest archive scripts.

While testing how to best convert my rsync archive which has
one directory "/.../guerby-YYYYMMDDHHMMSS/" per day to an attic repo
I came to notice the following behaviour with Attic 0.13 packaged from
debian 8.1 :

1/ without renaming the top directory (653646 files, 249GB) :

root@pc2:/mnt/s3000a# attic init /mnt/s3000a/attic/guerby.attic
root@pc2:/mnt/s3000a# date;time attic create --stats /mnt/s3000a/attic/guerby.attic::guerby-20130410T115407 /mnt/s2000h/guerby/rsync-archive/guerby-20130410T115407;echo $?;date
Fri Sep  4 12:42:53 CEST 2015
Initializing cache...
------------------------------------------------------------------------------
Archive name: guerby-20130410T115407
Archive fingerprint: 9888a5b8f5dfcfc31d7f7c52bd5aa24fd989f1e41060c87c89b4836048dd5ae0
Start time: Fri Sep  4 12:42:54 2015
End time: Fri Sep  4 16:56:55 2015
Duration: 4 hours 14 minutes 1.09 seconds
Number of files: 630549

                       Original size      Compressed size    Deduplicated size
This archive:              249.04 GB            101.40 GB             89.85 GB
All archives:              249.04 GB            101.40 GB             89.85 GB
------------------------------------------------------------------------------

real        254m2.860s
user        149m50.792s
sys        6m28.532s
0
Fri Sep  4 16:56:56 CEST 2015
root@pc2:/mnt/s3000a# date;time attic create --stats /mnt/s3000a/attic/guerby.attic::guerby-20130412T033001 /mnt/s2000h/guerby/rsync-archive/guerby-20130412T033001;echo $?;date
Fri Sep  4 17:15:37 CEST 2015
------------------------------------------------------------------------------
Archive name: guerby-20130412T033001
Archive fingerprint: 00a46ebfe5edeffb4e398e709662ca4ccb566f5b959b27012209e6960be5758a
Start time: Fri Sep  4 17:15:39 2015
End time: Fri Sep  4 18:48:37 2015
Duration: 1 hours 32 minutes 58.59 seconds <=== quite long
Number of files: 630340

                       Original size      Compressed size    Deduplicated size
This archive:              249.23 GB            101.44 GB            291.64 MB
All archives:              498.26 GB            202.84 GB             90.15 GB
------------------------------------------------------------------------------

real        93m0.427s
user        40m58.552s
sys        3m8.336s
0
Fri Sep  4 18:48:38 CEST 2015

2/ renaming top directory to "guerby" before attic :

root@pc2:/mnt/s3000a# rm -rf /mnt/s3000a/attic/guerby.attic
root@pc2:/mnt/s3000a# attic init /mnt/s3000a/attic/guerby.attic
root@pc2:/mnt/s3000a# mv /mnt/s2000h/guerby/rsync-archive/guerby-20130410T115407 /mnt/s2000h/guerby/rsync-archive/guerby
root@pc2:/mnt/s3000a# date;time attic create --stats /mnt/s3000a/attic/guerby.attic::guerby-20130410T115407 /mnt/s2000h/guerby/rsync-archive/guerby;echo $?;date
Fri Sep  4 22:09:02 CEST 2015
Initializing cache...
------------------------------------------------------------------------------
Archive name: guerby-20130410T115407
Archive fingerprint: a156366179449d115561cb21f4924f6f1e4bfacce05feebef9bb1b3afdb93d9f
Start time: Fri Sep  4 22:09:02 2015
End time: Sat Sep  5 02:45:54 2015
Duration: 4 hours 36 minutes 52.86 seconds
Number of files: 630549

                       Original size      Compressed size    Deduplicated size
This archive:              249.03 GB            101.40 GB             89.85 GB
All archives:              249.03 GB            101.40 GB             89.85 GB
------------------------------------------------------------------------------

real        276m56.271s
user        159m53.308s
sys        6m45.368s
0
Sat Sep  5 02:45:58 CEST 2015
root@pc2:/mnt/s3000a# mv /mnt/s2000h/guerby/rsync-archive/guerby /mnt/s2000h/guerby/rsync-archive/guerby-20130410T115407
root@pc2:/mnt/s3000a# mv /mnt/s2000h/guerby/rsync-archive/guerby-20130412T033001 /mnt/s2000h/guerby/rsync-archive/guerby
root@pc2:/mnt/s3000a# date;time attic create --stats /mnt/s3000a/attic/guerby.attic::guerby-20130412T033001 /mnt/s2000h/guerby/rsync-archive/guerby;echo $?;date
Sat Sep  5 06:07:00 CEST 2015
------------------------------------------------------------------------------
Archive name: guerby-20130412T033001
Archive fingerprint: 2b6aceeab37ef77b2c01d21c745a256766fc87de843985dec5e9984619575956
Start time: Sat Sep  5 06:07:01 2015
End time: Sat Sep  5 06:14:07 2015
Duration: 7 minutes 6.43 seconds   <===== vs 1 hours 32 minutes 58.59 seconds
Number of files: 630340

                       Original size      Compressed size    Deduplicated size
This archive:              249.21 GB            101.44 GB            141.40 MB   <==== vs 291 MB
All archives:              498.24 GB            202.84 GB             90.00 GB
------------------------------------------------------------------------------

real        7m8.101s
user        3m18.680s
sys        0m18.576s
0
Sat Sep  5 06:14:08 CEST 2015
root@pc2:/mnt/s3000a# mv /mnt/s2000h/guerby/rsync-archive/guerby /mnt/s2000h/guerby/rsync-archive/guerby-20130412T033001

If the top directory path differs between the two runs, the incremental
backup takes 93 minutes. If I rename the top directory so that it is the
same for the first and the incremental backup, it runs in about 7
minutes, 13x faster. The resulting deduplicated size is also about half
as large (141 vs 291 MB, i.e. +229 bytes/file, the root path length
being 56 bytes).
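
The ratios quoted above can be re-derived from the two incremental runs.
The following is only a rough check, assuming that attic's "MB" means
10**6 bytes and that the per-file figure uses the on-disk file count from
the "1/" heading (653646) rather than the archived count (630340); both
assumptions are mine, not stated in the output.

```python
# Figures copied from the two incremental runs above.
slow_s = 93 * 60 + 0.427      # "real 93m0.427s", differing root path
fast_s = 7 * 60 + 8.101       # "real 7m8.101s", identical root path
speedup = slow_s / fast_s

slow_dedup_mb = 291.64        # deduplicated size, differing root path
fast_dedup_mb = 141.40        # deduplicated size, identical root path
# Assumption: 1 MB = 10**6 bytes in attic's size formatting.
extra_bytes = (slow_dedup_mb - fast_dedup_mb) * 1_000_000

files_on_disk = 653646        # count given in the "1/" heading
files_in_archive = 630340     # "Number of files" reported by attic

print(f"speedup: {speedup:.1f}x")
print(f"dedup size ratio: {slow_dedup_mb / fast_dedup_mb:.2f}x")
print(f"extra bytes/file (on-disk count): {extra_bytes / files_on_disk:.1f}")
print(f"extra bytes/file (archived count): {extra_bytes / files_in_archive:.1f}")
```

With the on-disk count the overhead comes out just under 230 bytes per
file, which matches the +229 figure when truncated; with the archived
count it is somewhat higher.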

I assume there is a fast execution path for the "same full path file
name in archive" situation, but I'm surprised by the magnitude.

Would a later attic version perform better in this particular case?

I couldn't find an attic option to substitute top directory names before
archiving, e.g. /mnt/s2000h/guerby/rsync-archive/guerby-20130412T033001
=> /home/guerby. Is there one?

Here the "real" path of the data is obviously /home/guerby, but I'm using
it on my machine, so it's not really practical to physically remount old
backups at the right place: I will probably have to use another machine
if I go down this path.

Also, I'm wondering: if one day I move a big data set from one directory
to another, will I pay the performance cost only on the next day's
backup, or on all following backups forever with attic?

Thanks for your time,

Sincerely,

Laurent

Re: [attic] Surprising performance behaviour on root path change => 13x times slower incremental backup

From:
Date:
2015-09-07 @ 05:21
> 
> I'm testing attic to replace my rsync --link-dest archive scripts.
> 

> 
> Would a later attic version perform better in this particular case?

That would mean disabling full path matching and instead matching on the
relative path and the inode/dev numbers (like tar does).
I already raised this problem on the list some time ago, and it is
something that probably won't happen.

> I couldn't find an attic option to substitute top directory names before
> archiving, eg: /mnt/s2000h/guerby/rsync-archive/guerby-20130412T033001
> => /home/guerby , is there one ? 

Only on extract: --strip-components
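
To illustrate what stripping leading components means, here is a small
sketch of tar-style semantics. It is not attic's code: the helper name
`strip_components` and the file name `docs/notes.txt` are made up for the
example, and I am assuming archived paths are stored without a leading
slash.

```python
from pathlib import PurePosixPath
from typing import Optional


def strip_components(path: str, count: int) -> Optional[str]:
    """Drop the first `count` path elements; return None if nothing is left.

    Hypothetical helper showing tar-style stripping, not attic's implementation.
    """
    parts = PurePosixPath(path).parts
    if parts and parts[0] == "/":
        parts = parts[1:]          # ignore a leading slash, if any
    if len(parts) <= count:
        return None                # the whole path was stripped away
    return str(PurePosixPath(*parts[count:]))


stored = "mnt/s2000h/guerby/rsync-archive/guerby-20130412T033001/docs/notes.txt"
print(strip_components(stored, 5))
```

Stripping five components removes the whole dated root, so the file
would be extracted relative to whatever directory you run the extract in.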

> 
> Here the "real" path of the data is obviously /home/guerby, but I'm using
> it on my machine, so it's not really practical to physically remount old
> backups at the right place: I will probably have to use another machine
> if I go down this path.
> 
> Also, I'm wondering: if one day I move a big data set from one directory
> to another, will I pay the performance cost only on the next day's
> backup, or on all following backups forever with attic?

I'd think only on the next backup (as the paths in the cache will be 
updated accordingly). Generally moving files around causes pretty much all 
backup schemes to at least rescan (rsnapshot will probably also re-copy 
and need an extra "hardlink" pass after that).
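
A toy model of a files cache keyed by the full path may make both points
concrete: a renamed root misses the cache for every file, and the penalty
is paid only once if the new path then stays the same. This is a
simplified sketch under my own assumptions, not attic's actual code; the
names `FilesCache`, `Stat`, `backup` and `tree` are invented for the
example.

```python
import hashlib
from typing import Dict, List, NamedTuple, Optional, Tuple


class Stat(NamedTuple):
    inode: int
    size: int
    mtime_ns: int


class FilesCache:
    """Toy cache keyed by a hash of the full path (hypothetical)."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, Tuple[Stat, List[str]]] = {}

    @staticmethod
    def _key(path: str) -> bytes:
        return hashlib.sha256(path.encode()).digest()

    def lookup(self, path: str, st: Stat) -> Optional[List[str]]:
        """Return remembered chunk ids only if path and stat data both match."""
        entry = self._entries.get(self._key(path))
        if entry is not None and entry[0] == st:
            return entry[1]
        return None

    def remember(self, path: str, st: Stat, chunk_ids: List[str]) -> None:
        self._entries[self._key(path)] = (st, chunk_ids)


def backup(cache: FilesCache, files: Dict[str, Stat]) -> int:
    """Back up `files`; return how many had to be read and chunked again."""
    reread = 0
    for path, st in files.items():
        if cache.lookup(path, st) is None:
            reread += 1  # slow path: the file content would be read and hashed
            cache.remember(path, st, [f"chunk-of-inode-{st.inode}"])
    return reread


def tree(root: str) -> Dict[str, Stat]:
    """The same three files (same inode/size/mtime) under a given root."""
    return {f"{root}/file{i}": Stat(100 + i, 4096, 1) for i in range(3)}


cache = FilesCache()
first = backup(cache, tree("/archive/guerby-20130410T115407"))
renamed = backup(cache, tree("/archive/guerby-20130412T033001"))
repeat = backup(cache, tree("/archive/guerby-20130412T033001"))
print(first, renamed, repeat)  # 3 3 0
```

In this model the rereads after a rename still deduplicate at the chunk
level, which would explain why the slow run costs time but little extra
space. With a dated root directory that changes on every run, every
backup looks like the "renamed" case.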

Best Regards
 Heiko