librelist archives

deduplication for backup of largely identical systems

From:
Marc Haber
Date:
2015-07-27 @ 15:16
Hi,

most of my machines are running Debian stable. It is therefore likely
that parts of the file system such as /usr will be largely identical
across most of my systems.

To take advantage of borg's deduplication feature at this scale, it
sounds enticing to have all backups run into the same repository (borg
init ssh://backuphost//repository once and borg create
ssh://backuphost//repository::localhost-date / for the actual backups).

Is that a recommended procedure? I guess that the risk of repository loss
is higher that way, only one backup can write to the repository at a
single time, and all machines would be able to read each other's
backups since it's the same repository with the same key.

Are there any other implications that I need to be aware of before
engaging in borg deduplicating backup at scale?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

Re: [borgbackup] deduplication for backup of largely identical systems

From:
Thomas Waldmann
Date:
2015-07-27 @ 16:02
Hi Marc,

> most of my machines are running Debian stable. It is therefore likely
> that parts of the file system such as /usr will be largely identical
> across most of my systems.

Yeah (plus some other parts, too).

> To take advantage of borg's deduplication feature at this scale, it
> sounds enticing to have all backups run into the same repository (borg
> init ssh://backuphost//repository once and borg create
> ssh://backuphost//repository::localhost-date / for the actual backups).
> 
> Is that a recommended procedure?

You can back up multiple machines to the same repo, but:
- be careful with prune (use the prefix option so it only touches one
machine's archives, and try it with dry-run first)
- there is an exclusive write lock, so the backups will run sequentially,
not in parallel
- the local cache on each machine will need a resync each time another
machine has updated the repo, to bring it back in sync with the repo state
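
The prune advice above could look roughly like the following sketch. It
assumes archives are named <host>-<date>; the helper names archive_name and
prune_cmd and the retention value (--keep-daily 7) are only illustrative and
not from this thread. The script prints the commands instead of running
them. Flag names can differ between borg versions, so check
"borg prune --help" before relying on them.

```shell
#!/bin/sh
# Sketch only: helper names and retention values are illustrative assumptions.

REPO="ssh://backuphost//repository"   # shared repository, as in the thread

# Archive name of the form <host>-<date>, so each machine owns one prefix.
archive_name() {
    printf '%s-%s\n' "$1" "$2"
}

# Build a prune command limited to one host's archives.
# The trailing "-" keeps the prefix "web1-" from matching "web10-...".
# --dry-run only lists what would be removed; drop it once the list looks right.
prune_cmd() {
    printf 'borg prune --dry-run --prefix %s- --keep-daily 7 %s\n' "$1" "$REPO"
}

host="$(uname -n)"
today="$(date +%Y-%m-%d)"

# Print the commands instead of running them, so they can be reviewed first.
echo "borg create ${REPO}::$(archive_name "$host" "$today") /"
prune_cmd "$host"
```

Without the prefix, prune would apply its retention rules to the archives
of all machines in the repo together, which is usually not what you want.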

> I guess that the risk of repository loss
> is higher that way, only one backup can write to the repository at a
> single time, and all machines would be able to read each other's
> backups since it's the same repository with the same key.

Exactly.

> Are there any other implications that I need to be aware of before
> engaging in borg deduplicating backup at scale?

Only what we have above. (AFAIK) ^^

Cheers,

Thomas