librelist archives

Questions about hardening borg repositories

From:
Ed Blackman
Date:
2015-08-26 @ 16:16
I'm evaluating borg as a possible replacement for my current backup 
system, which uses duplicity.  Generally, I keep backups of system files 
for about 90 days, and of user files (with subdir-based exclusions) 
effectively forever.

However, borg's deduplication means that corruption within the 
repository would affect all of the archives.

My plan is to back up to a local disk, then do some scripting to create 
par2 parity files (Reed Solomon coding, 
https://github.com/Parchive/par2cmdline) for each segment, then sync the 
repository and parity files to Amazon S3, providing protection against 
individual file corruption and also failure of the local disk.
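
A minimal sketch of the per-segment parity step. It assumes the 
data/<dir>/<segment> layout mentioned later in this thread (data/0/532), 
par2cmdline's "create" command with -r<percent> redundancy, and a 
hypothetical helper name, protect_repository. It only builds the commands 
unless dry_run is False, and the S3 sync is left out.

```python
import re
import subprocess
from pathlib import Path

_NUMERIC = re.compile(r"[0-9]+")


def segment_files(repo):
    """Yield numeric segment files under repo/data/<dir>/ in numeric order.

    Approximates the layout described in this thread; not borg's own code.
    """
    data = Path(repo) / "data"
    dirs = [p for p in data.iterdir() if p.is_dir() and _NUMERIC.fullmatch(p.name)]
    for d in sorted(dirs, key=lambda p: int(p.name)):
        files = [p for p in d.iterdir() if p.is_file() and _NUMERIC.fullmatch(p.name)]
        yield from sorted(files, key=lambda p: int(p.name))


def par2_create_command(segment, redundancy=10):
    """Build a par2cmdline 'create' invocation for one segment file."""
    return ["par2", "create", f"-r{redundancy}", f"{segment}.par2", str(segment)]


def protect_repository(repo, redundancy=10, dry_run=True):
    """Create parity for segments that have no .par2 file yet.

    Assumes a segment never changes once written; the thread leaves that
    open for the last, not-full segment.
    """
    commands = []
    for seg in segment_files(repo):
        if Path(f"{seg}.par2").exists():
            continue  # parity already exists for this segment
        cmd = par2_create_command(seg, redundancy)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```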

Questions:
- Is it safe to create files with non-numeric parts in the segment 
directory (eg "532.par2" and "532.volxxx+nnn.par2" for segment file 
532)?  par2cmdline has a heritage as a 'post binaries to Usenet' 
utility, so it wants to operate on files in the same directory as the 
parity files.  borg check --repair creates %d.beforerecover, so I think 
the answer is yes except for a negligible chance that borg will want to 
use those extensions.

- Do segment files ever change content between being first written and 
being removed?

- Is there any value in generating parity for or syncing the index.%d 
and hints.%d files?  It looks like they are rewritten on most operations 
and can be trivially regenerated by borg check.

- Are there any plans to add par2 support to borg?  Duplicity offers a 
par2 backend wrapper that creates parity files when duplicity creates 
files, removes the parity files when duplicity removes the corresponding 
core duplicity files, etc.  I didn't see a GitHub issue for it, but 
maybe someone has thoughts along that line.

-- 
Ed Blackman

Re: [borgbackup] Questions about hardening borg repositories

From:
Thomas Waldmann
Date:
2015-08-26 @ 17:14
> However, borg's deduplication means that corruption within the 
> repository would affect all of the archives.

Could, yes (if the archive refers to the corrupt chunk).
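
To make that concrete, a toy model (hypothetical, not borg's data 
structures): each archive is a list of chunk IDs, and one corrupt chunk 
damages every archive that references it.

```python
def archives_affected(archives, corrupt_chunks):
    """Return names of archives that reference at least one corrupt chunk.

    archives: mapping of archive name -> iterable of chunk IDs.
    """
    corrupt = set(corrupt_chunks)
    return sorted(name for name, chunks in archives.items()
                  if corrupt.intersection(chunks))


archives = {
    "2015-08-01": ["a", "b", "c"],
    "2015-08-15": ["a", "b", "d"],  # shares chunks a and b with 08-01
    "2015-08-26": ["e", "f"],
}
print(archives_affected(archives, ["b"]))  # ['2015-08-01', '2015-08-15']
```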

> My plan is to back up to a local disk, then do some scripting to create 
> par2 parity files (Reed Solomon coding, 
> https://github.com/Parchive/par2cmdline) for each segment, then sync the 
> repository and parity files to Amazon S3, providing protection against 
> individual file corruption and also failure of the local disk.

In the FAQ I argued against adding redundancy in borg (see there).

> Questions:
> - Is it safe to create files with non-numeric parts in the segment 
> directory (eg "532.par2" and "532.volxxx+nnn.par2" for segment file 
> 532)?  par2cmdline has a heritage as a 'post binaries to Usenet' 
> utility, so it wants to operate on files in the same directory as the 
> parity files.

Putting stuff into the same directory is a bit unclean, of course.
Currently, the segment iterator only works on purely numerical
directories and files, so I'd guess it doesn't cause an issue now.
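
A simplified stand-in for that rule, assuming "purely numerical" means 
ASCII digits only; it is not borg's actual segment iterator. Under this 
rule, par2 output and %d.beforerecover files are skipped:

```python
import re

_NUMERIC = re.compile(r"[0-9]+")


def is_segment_name(name: str) -> bool:
    """True only for purely numeric names such as '532'."""
    return _NUMERIC.fullmatch(name) is not None


names = ["532", "532.par2", "532.vol000+001.par2", "532.beforerecover", "533"]
segments = sorted((n for n in names if is_segment_name(n)), key=int)
print(segments)  # ['532', '533']
```

This only describes the current behaviour; a future borg could change the 
naming rule.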

> - Do segment files ever change content between being first written and 
> being removed?

"create" won't touch full segment files again, just create new ones.
I am not totally sure about the very last ("not full" segment file),
maybe observe that yourself and tell us.

"check" might delete and add segments when repairing a repo/archive.

"delete" will delete and add segments.
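
One way to do that observation: hash every numeric segment before and 
after a borg operation and compare. The helper names snapshot and compare 
are hypothetical; the data/<dir>/<segment> layout is the one discussed in 
this thread.

```python
import hashlib
import re
from pathlib import Path

_NUMERIC = re.compile(r"[0-9]+")


def snapshot(repo):
    """Map 'dir/segment' -> SHA-256 hex digest of every numeric segment file."""
    result = {}
    for path in (Path(repo) / "data").glob("*/*"):
        if (path.is_file() and _NUMERIC.fullmatch(path.name)
                and _NUMERIC.fullmatch(path.parent.name)):
            h = hashlib.sha256()
            with path.open("rb") as f:
                for block in iter(lambda: f.read(1 << 20), b""):
                    h.update(block)  # stream so large segments need little memory
            result[f"{path.parent.name}/{path.name}"] = h.hexdigest()
    return result


def compare(before, after):
    """Classify segments as added, removed, or changed between snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys()
                          if before[k] != after[k]),
    }
```

Saving the first snapshot (e.g. with json.dump) before "borg create" and 
comparing afterwards would show whether the last, not-full segment is 
ever rewritten.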

> - Is there any value in generating parity for or syncing the index.%d 
> and hints.%d files?  It looks like they are rewritten on most operations 
> and can be trivially regenerated by borg check.

AFAIK "no".
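
If so, a sync/parity script can simply skip them. A hypothetical filter, 
assuming index.N and hints.N sit at the top level of the repository and 
that everything else (including config) should be kept:

```python
import re

# index.N / hints.N are rewritten often and can be rebuilt by "borg check",
# so this sketch excludes them from parity generation and syncing.
REGENERABLE = re.compile(r"(index|hints)\.[0-9]+")


def worth_protecting(relpath: str) -> bool:
    """Decide whether a repository-relative path gets parity and is synced."""
    parts = relpath.split("/")
    return not (len(parts) == 1 and REGENERABLE.fullmatch(parts[0]))
```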

> - Are there any plans to add par2 support to borg?

See FAQ.

If you can bring up good arguments for it that do not lead to "false
promises" and do not require information we do not have, I may
reconsider it.

Cheers,

Thomas

-- 

GPG ID: FAF7B393
GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393

Re: [borgbackup] Questions about hardening borg repositories

From:
Ed Blackman
Date:
2015-08-26 @ 21:35
On Wed, Aug 26, 2015 at 07:14:52PM +0200, Thomas Waldmann wrote:
>> My plan is to back up to a local disk, then do some scripting to create
>> par2 parity files (Reed Solomon coding,
>> https://github.com/Parchive/par2cmdline) for each segment, then sync the
>> repository and parity files to Amazon S3, providing protection against
>> individual file corruption and also failure of the local disk.
>
>In the FAQ I argued against adding redundancy in borg (see there).

I read the FAQ, but missed that or forgot it was there.  I understand 
your reasoning even if I wish it were otherwise.

>> Questions:
>> - Is it safe to create files with non-numeric parts in the segment
>> directory (eg "532.par2" and "532.volxxx+nnn.par2" for segment file
>> 532)?  par2cmdline has a heritage as a 'post binaries to Usenet'
>> utility, so it wants to operate on files in the same directory as the
>> parity files.
>
>Putting stuff into the same directory is a bit unclean, of course.
>Currently, the segment iterator only works on purely numerical
>directories and files, so I'd guess it doesn't cause an issue now.

Yeah, I'd prefer that the parity files live on a separate disk anyway, 
not just for cleanliness but for safety.

There might be some way to do it with par2cmdline that I haven't figured 
out.  In my experiments I could create the par2 files separately from the 
data, and with difficulty verify them separately from the data, but 
attempting a repair led to the repaired file being created in the CWD, 
not where the original file was.

>> - Do segment files ever change content between being first written and
>> being removed?
>
>"create" won't touch full segment files again, just create new ones.
>I am not totally sure about the very last ("not full" segment file),
>maybe observe that yourself and tell us.

Will do and report back.

>"check" might delete and add segments when repairing a repo/archive.
>
>"delete" will delete and add segments.

But not change?  That is, once data/0/532 is full, can it ever be 
changed or deleted and later recreated, or will it always have the same 
content until it's deleted?

>If you can bring up good arguments for it that do not lead to "false
>promises" and do not require information we do not have, I may
>reconsider it.

My understanding of your objection is that if a user has sectors go bad 
on the disk holding the repository, there's no way to prevent the bad 
sectors from corrupting the parity blocks as well.

Well, if the parity blocks could be kept in a different directory (set 
up at init time?), and that directory was mounted on a different disk, 
then the fact that sectors go bad in the repository wouldn't affect the 
parity blocks.
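
A sketch of that layout, with hypothetical names: parity for 
repo/data/D/N goes to a mirrored path under a separate parity root, plus 
a sanity check that the two roots are on different filesystems. Differing 
st_dev values do not prove separate physical disks, since two partitions 
of one disk also differ.

```python
import os
from pathlib import Path


def parity_path(repo, parity_root, segment):
    """Map repo/data/D/N to parity_root/data/D/N.par2 (hypothetical layout)."""
    rel = Path(segment).relative_to(repo)
    return Path(parity_root) / rel.parent / f"{rel.name}.par2"


def on_different_devices(repo, parity_root):
    """True if the two directories live on different filesystems."""
    return os.stat(repo).st_dev != os.stat(parity_root).st_dev
```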

Alternatively, are there plans to implement pluggable backends?  
Duplicity provides an easy interface for adding different backends, 
which has led to a large number of them.  See the backends directory, 
starting with the README:

http://bazaar.launchpad.net/~duplicity-team/duplicity/0.7-series/files/head:/duplicity/backends/

If borg separated the 'create segment file' logic from the 'store segment 
file' logic, with the latter in a backend, I could steal logic from 
duplicity's par2 meta-backend to do it myself, with considerably higher 
reliability than external scripting.
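
To illustrate the kind of split meant here, a hypothetical sketch (borg 
has no such backend API; Backend, ParityBackend and make_parity are 
invented names). A meta-backend wraps another backend, writes a parity 
file whenever a file is stored, removes it when the file is deleted, and 
keeps parity in a separate store. The parity function is a placeholder; 
real Reed-Solomon parity (e.g. via par2) is not implemented.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class Backend(ABC):
    """Hypothetical 'store segment file' interface."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, name: str) -> None: ...


class MemoryBackend(Backend):
    """Toy backend keeping files in a dict (stands in for local disk or S3)."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self.files[name] = data

    def delete(self, name: str) -> None:
        del self.files[name]


class ParityBackend(Backend):
    """Meta-backend in the spirit of duplicity's par2 wrapper."""

    def __init__(self, inner: Backend, make_parity: Callable[[bytes], bytes],
                 parity_store: Backend) -> None:
        self.inner = inner
        self.make_parity = make_parity
        self.parity_store = parity_store  # ideally on a different disk

    def put(self, name: str, data: bytes) -> None:
        self.inner.put(name, data)
        self.parity_store.put(name + ".par2", self.make_parity(data))

    def delete(self, name: str) -> None:
        self.inner.delete(name)
        self.parity_store.delete(name + ".par2")
```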

-- 
Ed Blackman