librelist archives


Question about non-updated files

From:
Date:
2014-02-21 @ 09:29
I already asked about it because I thought this was a missing feature (see 
https://github.com/jborg/attic/issues/44 ) - but it looks like it doesn't 
work on my machine.

Attic version is 0.10

I'm seeing very slow backup times on my VM backups with attic and I'm not 
sure if it is working like intended:

Here's a short log:
hel@helpc-13:~/vms$ time attic create -s /tmp/testrepo::test1 helvm-13/HELVM-13-s001.vmdk
Initializing cache...
----------------------------------------
Archive name: test1
Archive fingerprint: 48c90b2bbc3e4573e8bc16aa337fbe3078f56668cdc5ea38e9bf2c0c7341736f
Start time: Fri Feb 21 10:07:02 2014
End time: Fri Feb 21 10:08:19 2014
Duration: 1 minutes 17.78 seconds
Number of files: 1
Original size: 2035290989 (1.90 GB)
Compressed size: 908655500 (866.56 MB)
Unique data: 737945196 (703.76 MB)
----------------------------------------

real    1m17.886s
user    1m0.295s
sys     0m1.398s
hel@helpc-13:~/vms$ time attic create -s /tmp/testrepo::test2 helvm-13/HELVM-13-s001.vmdk
----------------------------------------
Archive name: test2
Archive fingerprint: 97d98b81f016091ffb372abcd67962eb6213908f5aa9b6e95312dcff367a51c4
Start time: Fri Feb 21 10:08:46 2014
End time: Fri Feb 21 10:08:59 2014
Duration: 13.22 seconds
Number of files: 1
Original size: 2035290989 (1.90 GB)
Compressed size: 908655500 (866.56 MB)
Unique data: 695 (695 B)
----------------------------------------

real    0m13.329s
user    0m12.855s
sys     0m0.344s
hel@helpc-13:~/vms$ time attic create -s /tmp/testrepo::test3 helvm-13/HELVM-13-s001.vmdk
----------------------------------------
Archive name: test3
Archive fingerprint: fb52ebf71e870dbe9cbd8c576397c25dcac87d2a188e235823e95d2879e436c6
Start time: Fri Feb 21 10:09:17 2014
End time: Fri Feb 21 10:09:30 2014
Duration: 13.29 seconds
Number of files: 1
Original size: 2035290989 (1.90 GB)
Compressed size: 908655499 (866.56 MB)
Unique data: 694 (694 B)
----------------------------------------

real    0m13.377s
user    0m12.908s
sys     0m0.321s

The initial run is OK, but the second and third runs are interesting.
Those should be significantly faster. It looks like attic is reading the
complete file (instead of skipping it as it should).
If I touch the vmdk (thus changing its mtime), the runtime doesn't change:

hel@helpc-13:~/vms$ touch helvm-13/HELVM-13-s001.vmdk 
hel@helpc-13:~/vms$ time attic create -s /tmp/testrepo::test5 helvm-13/HELVM-13-s001.vmdk
----------------------------------------
Archive name: test5
Archive fingerprint: 585ae13c82c0d26da4ff813d009bd3c3a020243b1bfefaab77aa2f58679d54bf
Start time: Fri Feb 21 10:23:53 2014
End time: Fri Feb 21 10:24:06 2014
Duration: 13.56 seconds
Number of files: 1
Original size: 2035290989 (1.90 GB)
Compressed size: 908655500 (866.56 MB)
Unique data: 196143 (191.55 kB)
----------------------------------------

real    0m13.655s
user    0m13.138s
sys     0m0.331s

How do I diagnose what metadata attic is comparing?

Best Regards
 Heiko Helmle

Re: [attic] Question about non-updated files

From:
Jonas Borgström
Date:
2014-02-21 @ 19:43
could not decode message

Re: [attic] Question about non-updated files

From:
Date:
2014-02-24 @ 09:43
Hello Jonas

> I've attached a patch that will make debugging this easier.
> 
> If you apply this patch and run create in verbose mode you will get the
> following output for each file:
> 
> - For unmodified files: "Found in file cache"
> - For modified files: "Does not match file cache: (inode, size, mtime)
> != (x, y, z)"
> - For new/unknown files "Not found in cache"
> 

Thanks for the patch - I applied it and tested the same commands again.

It always registers as "Not found in cache".

(attictest)hel@helpc-13:~/vms$ attic create -v /tmp/testrepo::test_4 helvm-13/HELVM-13-s001.vmdk
helvm-13/HELVM-13-s001.vmdk
Not found in cache
(attictest)hel@helpc-13:~/vms$ attic create -v /tmp/testrepo::test_5 helvm-13/HELVM-13-s001.vmdk
helvm-13/HELVM-13-s001.vmdk
Not found in cache
(attictest)hel@helpc-13:~/vms$ attic create -v /tmp/testrepo::test_6 helvm-13/HELVM-13-s001.vmdk
helvm-13/HELVM-13-s001.vmdk
Not found in cache
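
The three outcomes the debug patch reports can be modelled with a short Python sketch (Python being attic's implementation language). It is hypothetical, not the patch itself: the cache is a plain dict, `cache_key`, `stat_triple` and `check_file_cache` are illustrative names, and SHA-256 over the absolute path and a nanosecond mtime are assumed details.

```python
import hashlib
import os


def cache_key(path):
    # Hypothetical key: SHA-256 of the absolute path. The thread says attic
    # hashes the full filesystem path; the exact hash function is assumed.
    return hashlib.sha256(os.path.abspath(path).encode("utf-8")).hexdigest()


def stat_triple(path):
    # The three fields named by the debug patch: (inode, size, mtime).
    # Using nanosecond mtime is an assumption about the precision compared.
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns)


def check_file_cache(files_cache, path):
    """Return the debug message the patch would print for `path`.

    `files_cache` is a plain dict mapping cache_key(path) to a stored
    (inode, size, mtime) tuple; this models attic's cache, it is not it.
    """
    cached = files_cache.get(cache_key(path))
    if cached is None:
        # No entry under this path key at all.
        return "Not found in cache"
    current = stat_triple(path)
    if cached != current:
        # Entry exists, but inode, size or mtime changed since it was stored.
        return "Does not match file cache: %r != %r" % (current, cached)
    return "Found in file cache"
```

Calling `stat_triple` on the image before and after a run is a quick way to check whether the compared metadata changed. Since every run above reports "Not found in cache", the problem lies in the key lookup (or in the entry never being stored), not in the metadata comparison.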

Best Regards
 Heiko Helmle

Re: [attic] Question about non-updated files

From:
Jonas Borgström
Date:
2014-02-24 @ 19:55
On 2014-02-24 10:43, heiko.helmle@horiba.com wrote:
> Hello Jonas
> 
>> I've attached a patch that will make debugging this easier.
>>
>> If you apply this patch and run create in verbose mode you will get the
>> following output for each file:
>>
>> - For unmodified files: "Found in file cache"
>> - For modified files: "Does not match file cache: (inode, size, mtime)
>> != (x, y, z)"
>> - For new/unknown files "Not found in cache"
>>
> 
> Thanks for the patch - i applied and tested the same commands again.
> 
> It always registers as "Not found in cache".

I think I've finally figured out what's happening. Attic explicitly
ignores (does not save) the file cache entry with the newest mtime.
The exact reason for this is a bit hard to explain but it's done to
avoid a potential problem with detecting file modifications on files
that are being modified while a filesystem snapshot is created.

This is usually not a big deal since only a single unmodified file will
be unnecessarily re-read. But in your case it's less than ideal and I'll
have to figure out if there's a better way to deal with this.

But until then the following should work:

$ touch foo
$ attic create -v /tmp/testrepo::test_x foo helvm-13/HELVM-13-s001.vmdk
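
The rule Jonas describes above, and why the `touch foo` workaround helps, can be sketched as follows. This is a hypothetical model, not attic's code: `entries_to_save` is an illustrative name, the cache is assumed to be a dict of (inode, size, mtime) tuples, and whether attic drops only the newest mtime or also ties with it is an assumption.

```python
def entries_to_save(entries):
    """Return the file-cache entries that would be written back to disk.

    Hypothetical model: `entries` maps a path key to an (inode, size, mtime)
    tuple. Following Jonas's description, entries carrying the newest mtime
    seen in this run are discarded; everything older is kept.
    """
    if not entries:
        return {}
    newest = max(mtime for (_inode, _size, mtime) in entries.values())
    # Strict comparison: every entry sharing the newest mtime is dropped.
    return {key: meta for key, meta in entries.items() if meta[2] < newest}
```

With the VM image as the only input, its entry always carries the newest mtime and is dropped, so the next run reports "Not found in cache" again. A freshly touched `foo` takes over the newest mtime, so the image's entry survives into the next run.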

/ Jonas

Re: [attic] Question about non-updated files

From:
Date:
2014-02-25 @ 07:24
> > It always registers as "Not found in cache".
> 
> This is usually not a big deal since only a single unmodified file will
> be unnecessarily re-read. But in your case it's less than ideal and I'll
> have to figure out if there's a better way to deal with this.
> 
> But until then the following should work:
> 
> $ touch foo
> $ attic create -v /tmp/testrepo::test_x foo helvm-13/HELVM-13-s001.vmdk
> 

hmm... that means my testing was flawed. Unfortunately on my full system 
backups, every single file is "Not found in cache".

I suspect something - does attic store the full path in the cache or just 
the relative path (if given on the command line)?

Because if it stores the full filesystem path in the cache, then my backup 
script procedure might be responsible for all the cache misses... it goes 
pretty much like this:

Snapshot filesystems (btrfs or lvm)
mount/bind-mount to temporary filesystem (random path (mktemp -d))
cd to temporary filesystem
attic create <repo> .

Best Regards & Thanks for helping
 Heiko

Re: [attic] Question about non-updated files

From:
Jonas Borgström
Date:
2014-02-25 @ 11:46
On 2014-02-25 08:24, heiko.helmle@horiba.com wrote:
>> > It always registers as "Not found in cache".
>>
>> This is usually not a big deal since only a single unmodified file will
>> be unnecessarily re-read. But in your case it's less than ideal and I'll
>> have to figure out if there's a better way to deal with this.
>>
>> But until then the following should work:
>>
>> $ touch foo
>> $ attic create -v /tmp/testrepo::test_x foo helvm-13/HELVM-13-s001.vmdk
>>
> 
> hmm... that means my testing was flawed. Unfortunately on my full system
> backups, every single file is "Not found in cache".
> 
> I suspect something - does attic store the full path in the cache or
> just the relative path (if given on the command line)?
> 
> Because if it stores the full filesystem path in the cache, then my
> backup script procedure might be responsible for all the cache misses...
> it goes pretty much like this:
> 
> Snapshot filesystems (btrfs or lvm)
> mount/bind-mount to temporary filesystem (random path (mktemp -d))
> cd to temporary filesystem
> attic create <repo> .

Ah, that explains it. You're right, the cache uses a hash of the full
filesystem path. It could of course use the relative path instead but
that would open the door to false positives, which could be really bad...
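
Why a random `mktemp -d` mount point defeats a cache keyed on the full path, and why Jonas prefers that over a relative key, can be sketched as follows. The helper names, SHA-256, and the example paths are assumptions for illustration; this is not attic's code.

```python
import hashlib
import os


def full_path_key(path):
    # Illustrative stand-in for "a hash of the full filesystem path".
    return hashlib.sha256(os.path.abspath(path).encode("utf-8")).hexdigest()


def relative_path_key(root, path):
    # Hypothetical alternative: hash only the path below the backup root.
    rel = os.path.relpath(path, root)
    return hashlib.sha256(rel.encode("utf-8")).hexdigest()


# Two runs of the original script, each with a fresh `mktemp -d` mount point
# (the directory names are made up for illustration).
run1 = "/tmp/tmp.Ab12/etc/fstab"
run2 = "/tmp/tmp.Cd34/etc/fstab"
print(full_path_key(run1) == full_path_key(run2))  # False: every file misses
print(relative_path_key("/tmp/tmp.Ab12", run1) ==
      relative_path_key("/tmp/tmp.Cd34", run2))  # True: one key for both roots
```

Mounting the snapshot at one fixed path makes the full-path key identical from run to run, which is the fix adopted later in the thread. A relative key would also be stable, but it gives the same key to files under any two roots, so only the (inode, size, mtime) comparison would keep a cache hit from standing in for a different file.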

Anyway, hopefully it's possible to adapt your script to mount/bind your
snapshot to a constant path. If not, let me know and we'll figure
something out.

/ Jonas

Re: [attic] Question about non-updated files

From:
Date:
2014-02-25 @ 14:42
> > backup script procedure might be responsible for all the cache misses...
> > it goes pretty much like this:
> > 
> > Snapshot filesystems (btrfs or lvm)
> > mount/bind-mount to temporary filesystem (random path (mktemp -d))
> > cd to temporary filesystem
> > attic create <repo> .
> 
> Ah, that explains it. You're right, the cache uses a hash of the full
> filesystem path. It could of course use the relative path instead but
> that would open the door to false positives, which could be really bad...

hmm... false positives would mean inode AND times AND paths are equal... 

> Anyway, hopefully it's possible to adapt your script to mount/bind your
> snapshot to a constant path. If not, let me know and we'll figure
> something out.

The script is adapted, and backups are now down from 90 minutes to 5.
Thanks again for the help.

I'd propose making this cache optional though - I can still imagine one or 
two use cases where the user might want to force processing the files.

Best Regards
 Heiko

Re: [attic] Question about non-updated files

From:
Jonas Borgström
Date:
2014-02-26 @ 12:29
On 2014-02-25 15:42, heiko.helmle@horiba.com wrote:
>> > backup script procedure might be responsible for all the cache misses...
>> > it goes pretty much like this:
>> >
>> > Snapshot filesystems (btrfs or lvm)
>> > mount/bind-mount to temporary filesystem (random path (mktemp -d))
>> > cd to temporary filesystem
>> > attic create <repo> .
>>
>> Ah, that explains it. You're right, the cache uses a hash of the full
>> filesystem path. It could of course use the relative path instead but
>> that would open the door to false positives, which could be really bad...
> 
> hmm... false positives would mean inode AND times AND paths are equal...

Yes, extremely unlikely but might be exploitable by a malicious user for
certain setups...

>> Anyway, hopefully it's possible to adapt your script to mount/bind your
>> snapshot to a constant path. If not, let me know and we'll figure
>> something out.
> 
> The script is adapted, and backups are now down from 90 minutes to 5.
> Thanks again for the help.

Cool.

> 
> I'd propose making this cache optional though - I can still imagine one
> or two use cases where the user might want to force processing the files.

There's a ticket for that:
https://github.com/jborg/attic/issues/45

/ Jonas