Questions and suggestions about inner working of Attic

From: Cyril Roussillon
Date: 2014-05-06 @ 15:55
Hi Jonas,

After a long search for a nice backup tool with robust encryption,
optimal bandwidth and disk usage, easy pruning, and easy accessibility
(mount), I think I've finally found the perfect tool :-). So first,
thank you for your great work!

However, I would like to know a bit more about the inner workings of
Attic before making it my default backup tool. I noticed a few open
issues about writing documentation, but I was too impatient and started
digging into the source code. I'd like to explain here some things I
have found, so that it helps other people and you can correct me if I'm
wrong, but I also have some questions about points I could not easily
clarify (I'm not very familiar with Python), and some suggestions.


-- Security

For encryption, AES is used in CTR mode (so no padding is needed). An
8-byte initialization vector is used, an HMAC-SHA256 is computed over
the encrypted chunk (including the nonce), and both are stored with the
chunk. The header of each chunk is actually: TYPE(1) + HMAC(32) +
NONCE(8). Encryption and HMAC use two different keys.
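
To make that layout concrete, here is a minimal sketch of packing and
unpacking such a chunk (my own illustration based on the description
above, not Attic's actual code; the endianness and helper names are
assumptions):

    import struct

    HEADER_FMT = ">B32s8s"                     # TYPE(1) + HMAC(32) + NONCE(8)
    HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 41 bytes

    def pack_chunk(type_byte, hmac_digest, nonce, ciphertext):
        # The HMAC is computed over nonce + ciphertext with a separate MAC key.
        return struct.pack(HEADER_FMT, type_byte, hmac_digest, nonce) + ciphertext

    def unpack_chunk(blob):
        type_byte, hmac_digest, nonce = struct.unpack(HEADER_FMT, blob[:HEADER_SIZE])
        return type_byte, hmac_digest, nonce, blob[HEADER_SIZE:]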

*Question*: I'm not sure I understand how the IV/nonce setup works. Are
you generating a random IV for each chunk, storing it as the nonce and
starting the counter at 0 for each chunk, or are you using the same IV
for all chunks and storing, as the nonce, the counter value at which
each chunk starts? According to the maximum repo size of 295 exabytes
that you mention in the class AESKeyBase, it seems it would be the
second solution. Then "the first 8 bytes are always zeros" means that
your IV is actually 0, and is not randomly generated?

Keyfile (config/keys) content: repository_id, enc_key, enc_hmac_key,
id_key, chunk_seed

When a keyfile is generated it is also possible to encrypt it with a
passphrase, which is not made very clear in the documentation. So you
can actually leave the keyfile on the backup server, so that you don't
risk losing it, yet you still need a passphrase to use it, and
importantly you can also change your passphrase (Attic's user interface
allows that).


-- Chunker

A rolling checksum using the Buzhash algorithm is computed over a
window of 4095 bytes, with a minimum chunk size of 1024 bytes, and a
chunk boundary is triggered when the last 16 bits of the checksum are
zero, producing chunks of 64kB on average. All these parameters are
fixed. The buzhash table is altered by XORing it with a seed that is
randomly generated once and stored encrypted in the keyfile.
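
The idea of content-defined chunking can be sketched like this (a toy
illustration, not Attic's chunker: the rolling hash here is a simple
placeholder rather than buzhash, and it does not maintain a true
4095-byte sliding window):

    MIN_CHUNK = 1024
    MASK = 0xFFFF     # boundary when the low 16 bits are zero -> ~64kB average

    def chunkify(data):
        chunks, start, h = [], 0, 0
        for i, byte in enumerate(data):
            h = ((h << 1) ^ byte) & 0xFFFFFFFF    # toy rolling hash
            if i - start + 1 >= MIN_CHUNK and (h & MASK) == 0:
                chunks.append(data[start:i + 1])  # cut a chunk at the boundary
                start = i + 1
        if start < len(data):
            chunks.append(data[start:])           # trailing (possibly short) chunk
        return chunks

Because boundaries depend only on the local content, inserting a few
bytes near the start of a file only changes the chunks around the
insertion point, which is what makes deduplication work across versions.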

*Question*: would it be possible to make the average chunk size a
parameter, to allow smaller or larger chunks and thus tune memory usage?
Doubling the chunk size would mean a bit more data transferred and
stored, but would almost halve memory usage.


-- Indexes and memory usage

I'm a bit worried about the memory usage of Attic with huge repos. In
issue 26 (https://github.com/jborg/attic/issues/26) you stated that the
memory usage is:

    Repository index: 40 bytes x N ~ 200MB (If a remote repository is
used this will be allocated on the remote side)
    Chunk lookup index: 44 bytes x N ~ 220MB
    File chunk cache: probably 80-100 bytes x N ~ 400MB

I have two remarks that could explain why people are experiencing
higher memory usage:
- 64kB is the average chunk size that the rolling checksum generates,
but files shorter than this, or whose last chunk is shorter, produce
shorter chunks, so the data volume divided by 64kB is actually a lower
bound on the chunk count, and in practice there are more chunks.
- The containers have (possibly a lot of) memory overhead.

The chunk lookup index (chunk hash -> reference count, size, ciphered
size; in file cache/chunks) and the repository index (chunk hash ->
segment, offset; in file repo/index.%d) are stored in a sort of hash
table, directly mapped into memory from the file content, with only one
slot per bucket, collisions spilling over into the following buckets. As
a consequence the hash is just a start position for a linear search, and
if the element is not in the table the index is scanned linearly until
an empty bucket is found. When the table is 90% full its size is
doubled, and when it is only 25% full its size is halved. So operations
on it have a variable complexity between constant and linear with a low
factor, and the memory overhead varies between 10% and 300%.
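
To illustrate the scheme (my own sketch of open addressing with linear
probing and a 90% grow threshold, not Attic's hashindex implementation;
deletion and the 25% shrink rule are omitted for brevity):

    class LinearProbeTable:
        """Toy open-addressing hash table, illustration only."""

        def __init__(self, capacity=16):
            self.slots = [None] * capacity
            self.used = 0

        def _find(self, key):
            # The hash only gives a start position; probe linearly from there.
            i = hash(key) % len(self.slots)
            while self.slots[i] is not None and self.slots[i][0] != key:
                i = (i + 1) % len(self.slots)
            return i

        def get(self, key):
            slot = self.slots[self._find(key)]
            return slot[1] if slot is not None else None

        def put(self, key, value):
            i = self._find(key)
            if self.slots[i] is None:
                self.used += 1
            self.slots[i] = (key, value)
            if self.used > 0.9 * len(self.slots):   # grow when 90% full
                self._resize(2 * len(self.slots))

        def _resize(self, capacity):
            entries = [s for s in self.slots if s is not None]
            self.slots, self.used = [None] * capacity, 0
            for k, v in entries:
                self.put(k, v)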

*Question*: wouldn't it be more interesting to use a low-overhead map
with logarithmic complexity for operations instead? (For instance I'm
currently developing one with 1 bit of overhead per element which is
only slightly slower than std::map, using hierarchically sorted
vectors; it takes around 15s to insert 8M elements and 10s to perform
8M lookups, for elements of 48 bytes.)

The file chunk cache (file path hash -> age, inode number, size,
mtime_ns, chunk hashes; in file cache/files) is stored as a Python
associative array of Python objects, which generates a lot of overhead.
I measured around 240 bytes per file without the chunk list, to be
compared with at most 64 bytes of real data (depending on data
alignment), and around 80 bytes per chunk hash (vs 32), with a minimum
of ~250 bytes even with only one chunk hash.
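
A rough way to reproduce this kind of measurement (my own sketch; the
exact numbers depend on the CPython version and platform, and this
ignores the dict's own per-slot overhead):

    import sys

    path_hash = b"\x00" * 32
    chunk_hash = b"\x11" * 32
    # age, inode, size, mtime_ns, list of chunk hashes
    entry = (0, 1234567, 4096, 1399388000000000000, [chunk_hash])

    # getsizeof() only counts each object itself, so sum the parts by hand.
    total = (sys.getsizeof(path_hash) + sys.getsizeof(entry)
             + sum(sys.getsizeof(x) for x in entry)
             + sys.getsizeof(chunk_hash))
    print(total, "bytes in Python objects for ~80 bytes of real data")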

*Same question*: why not switch to a C structure (hash table or
low-overhead map) that would save a lot of memory (176 bytes per file +
48 bytes per chunk, > 60%)? If it's because of the variable-length array
of chunk hashes, it could be loaded from and saved to a file with a map
as well, if we drop the mmap access.

If I were to provide a pull request to use a low-overhead map for the 3
indexes, would you merge it?

Other suggestions to save memory:
- Do you have a particular reason for using SHA256 over SHA1? No
accidental collision has ever been found with SHA1, and one is extremely
unlikely to happen (lots of tools such as git use it). It would save
4*12 bytes per chunk/file (around 20-25%).
- Is it mandatory to store st_ino to check whether a file has changed,
in addition to size and mtime? Is it required for hard link management
or is it only used to help detect changes?


-- Repository structure

"Filesystem based transactional key value store"

Objects referenced by a key (256-bit id/hash) are stored inline in
files (segments) of approximately 5MB in repo/data. Each entry contains:
header size, crc, size, tag, key, data. The tag is either put, delete,
or commit. Segments seem to be built locally and then uploaded.

*Question*: what do the tags mean?

The manifest is an object with an id of all zeros (32 bytes) that
references all the archives. It contains: version, list of archives,
timestamp, config. Each archive entry contains: name, id, time. The
manifest is the last object stored, in the last segment, and is
replaced each time.

An archive is an object that contains metadata: version, name, item
list, cmdline, hostname, username, time. Each item represents a file,
directory or symlink and contains: path, list of chunks, user, group,
uid, gid, mode (item type + permissions), source (for links), rdev (for
devices), mtime, xattrs, acl, bsdflags. Directories have no content and
therefore no chunks (their entries are known because paths are stored
as full paths).

*Question*: why not store and restore ctime? It's something we may want
to preserve.
*Question*: what happens if there are so many files that the object size
reaches MAX_OBJECT_SIZE = 20MB?

A chunk is an object as well, of course, and its id is the hash of its
(unencrypted and uncompressed) content.

Hints are stored in a file (repo/hints) and contain: version, list of
segments, compact (?)

Not all files seem to be listed in every archive, but some are listed
several times even though they did not change and no parent directory
changed, so it's neither incremental nor full; I don't really understand
(if I have 6 directories and I delete only one file in one directory,
then the content of 3 directories is listed again in the new archive and
the content of the other 3 is not)...

If it's incremental, then when mounting an archive in a FUSE
filesystem, attic has to download and parse all the archive objects to
find all the files and create the directory structure, but according to
what you said on the mailing list before, it is faster to load one
archive than all of them...


-- Network

One last question: what exactly is the difference between using attic
on the server and using an sshfs mount, in terms of what is downloaded
or uploaded in addition?


Thanks for your help!

-- 
Cyril Roussillon
http://crteknologies.fr/

Re: [attic] Questions and suggestions about inner working of Attic

From: Jonas Borgström
Date: 2014-05-06 @ 19:58
On 2014-05-06 17:55, Cyril Roussillon wrote:
> Hi Jonas,
> 
> After a long search for a nice backup tool with robust encryption,
> optimal bandwidth and disk usage, easy pruning, and easy accessibility
> (mount), I think I've eventually found the perfect tool :-). So first
> thank you for your great work !

Thanks!

> However I would like to know a bit more about the inner working of attic
> before making it my default backup tool. I noticed a few issues opened
> to create the documentation, but I was too impatient and I started
> digging in the source code. I'd like to explain here some things I have
> found so that it helps other people and you can correct me if I'm wrong,
> but I also have some questions about points that I could not clarify
> easily (I'm not very used to python), and some suggestions.

Ok, cool

> -- Security
> 
> About the encryption, AES is used with CTR mode of operation (so no need
> of padding). A 8 bytes initialization vector is used, a HMAC-SHA256 is
> computed on the encrypted chunk (including nonce) and both are stored in
> the chunk. The header of each chunk is actually : TYPE(1) + HMAC(32) +
> NONCE(8). Encryption and HMAC use two different keys.
> 
> *Question*: I'm not sure to understand how the IV/nonce setup works. Are
> you generating a random IV for each chunk, that you store as the nonce,
> and start the counter at 0 for each chunk, or are you using the same IV
> for all chunks and store the counter as the nonce to start for each
> chunk ? According to the max repo size of 295 exabytes that you mention
> in the class AESKeyBase, it seems that it would be the second solution.
> Then "the first 8 bytes are always zeros" means that your IV is actually
> 0, and is not randomly generated ?

In AES CTR mode you can think of the IV as the start value for the
counter. The counter itself is incremented by one after each 16 byte
block. The IV/counter is not required to be random but it must NEVER be
reused. So to accomplish this Attic initializes the encryption counter
to be higher than any previously used counter value before encrypting
new data.
To save some space the counter/nonce is stored as a 64 bit integer which
means the counter will wrap after 2**64 * 16 bytes instead of the
theoretical maximum 2**128 * 16 bytes.
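
As a quick sanity check on the 295 exabytes figure mentioned earlier (my
own arithmetic, each counter value encrypting one 16-byte AES block):

    max_bytes = 2**64 * 16        # 64-bit counter, 16 bytes per AES block
    print(max_bytes)              # 295147905179352825856
    print(max_bytes / 10**18)     # ~295.1, i.e. roughly 295 exabytes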


> Keyfile (config/keys) content : repository_id, enc_key, enc_hmac_key,
> id_key, chunk_seed
> 
> When a keyfile is generated it is also possible to encrypt it with a
> passphrase, which is not very clear in the documentation. So you can
> actually leave the keyfile on the backup server, so that you don't risk
> to lose it, but still need a passphrase and importantly as well can
> change your passphrase (Attic's user interface allows that).

Correct. You can also pass the passphrase to Attic using the
ATTIC_PASSPHRASE environment variable.

> -- Chunker
> 
> Rolling checksum with Buzhash algorithm, with window size of 4095 bytes,
> with a minimum of 1024, and triggers when the last 16 bits of the
> checksum are null, producing chunks of 64kB on average. All these
> parameters are fixed. The buzhash table is altered by xoring it with a
> seed randomly generated once for the archive, and stored encrypted in
> the keyfile.
> 
> *Question*: is it possible to make the average chunk size a parameter,
> to allow smaller or larger chunks, to tune memory usage ? Doubling the
> chunks size will yield a bit more data transferred and data stored, but
> will divide memory usage by almost 2.

Nothing stops us from supporting that in the future. But I generally
try to avoid making things configurable just because it's possible. It's
usually a sign that the solution itself isn't good enough, and it
increases the code complexity and makes the product more confusing to
use.

But on the other hand, if it can be shown that this is something that
would allow Attic to support other (common) workloads without running
into other limitations and bottlenecks it's definitely worth considering.

> -- Indexes and memory usage
> 
> I'm a bit worried about the memory usage of Attic with huge repos. In
> issue 26 (https://github.com/jborg/attic/issues/26) you stated that the
> memory usage is :
> 
>     Repository index: 40 bytes x N ~ 200MB (If a remote repository is
> used this will be allocated on the remote side)
>     Chunk lookup index: 44 bytes x N ~ 220MB
>     File chunk cache: probably 80-100 bytes x N ~ 400MB
>
> I have two remarks that could explain why people are experiencing more
> memory used :
> - 64kB is the average chunk size that the rolling checksum generates,
> but files shorter than this or ending shorter generate shorter chunks,
> so volume of data divided by 64kB is actually a minimum chunk count, and
> there are more.
> - The containers have (possibly a lot of) memory overhead
> 
> The chunk lookup index (chunk hash -> reference count, size, ciphered
> size ; in file cache/chunk) and the repository index (chunk hash ->
> segment, offset ; in file repo/index.%d) are stored in a sort of hash
> table, directly mapped in memory from the file content, with only one
> slot per bucket, but that spreads the collisions to the following
> buckets. As a consequence the hash is just a start position for a linear
> search, and if the element is not in the table the index is linearly
> crossed until an empty bucket is found. When the table is full at 90%
> its size is doubled, when it's empty at 25% its size is halfed. So
> operations on it have a variable complexity between constant and linear
> with low factor, and memory overhead varies between 10% and 300%.
> 
> *Question* : wouldn't it be more interesting to use a low overhead map
> with logarithmic complexity for operations instead ? (for instance I'm
> currently developping one with 1 bit of overhead per element and which
> is just slightly slower than std::map, using hierarchically sorted
> vectors - it takes around 15s to insert 8M elements and 10s to make 8M
> searches in it, for elements of 48 bytes).

It would definitely be interesting to benchmark the two solutions but
the current hashindex implementation is surprisingly fast and efficient.

> The file chunk cache (file path hash -> age, inode number, size,
> mtime_ns, chunks hashes ; in file cache/files) is stored as a python
> associative array storing python objects, which generate a lot of
> overhead. I benchmarked around 240 bytes per file without the chunk
> list, to be compared to at most 64 bytes of real data (depending on data
> alignment), and around 80 bytes per chunk hash (vs 32), with a minimum
> of ~250 bytes even if only one chunck hash.
> 
> *Same question* : why not changing for a C structure (hash table or low
> overhead map) that would save a lot of memory (176 bytes per file + 48
> bytes per chunk, > 60%) ? If it's because of the variable array of
> chunks hashes, it could be loaded from and saved to a file with a map as
> well, if we drop the mmap access.

The file chunk cache is far too memory hungry right now. Replacing it
with something more efficient is at the top of my todo list, but I've
not yet decided which approach to take.

> If I were to provide a pull request to use a low overhead map for the 3
> indexes, would you merge it ?

I would definitely be interested in seeing some benchmarks and examples
of the low-overhead map code, and how easy it is to integrate with
Python. Depending on the results we can figure out if, how, and what to
integrate.

> Other suggestions to save memory :
> - Do you have a particular reason for using SHA256 over SHA1 ? No
> accidental collision has ever been found with SHA1, and is extremely
> unlikely to happen (lot of tools such as git are using it). It would
> save 4*12 bytes per chunk/file (around 20-25%).

Mostly to have some extra security margin since it's non-trivial to
change the hash size later on.

> - Is it mandatory to store st_ino to check if the file changed, in
> addition to size and mtime ? Is it required for hard links management or
> is it only used to help detect changes ?

It's to uniquely identify files. In some setups the path might not be
unique between different archives.

> -- Repository structure
> 
> "Filesystem based transactional key value store"
> 
> Objects referenced by a key (256bits id/hash) are stored in line in
> files (segments) of size approx 5MB in repo/data. They contain : header
> size, crc, size, tag, key, data. Tag is either put, delete, or commit.
> Segments seem to be built locally, and then uploaded.
> 
> *Question*: what do the tags mean ?

A segment file is basically a transaction log where each repository
operation is appended to the file. So when an object is written to the
repository, a "PUT" tag is written to the file followed by the object id
and data. And when an object is deleted, a "DELETE" tag is appended
followed by the object id.
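
A minimal sketch of appending such log entries (my own illustration with
simplified field sizes and made-up tag values, not the actual segment
format):

    import struct, zlib

    TAG_PUT, TAG_DELETE, TAG_COMMIT = 0, 1, 2

    def append_entry(segment, tag, key=b"", data=b""):
        # Illustrative layout: crc32 | total size | tag | key | data
        payload = struct.pack(">IB", 9 + len(key) + len(data), tag) + key + data
        segment.write(struct.pack(">I", zlib.crc32(payload)) + payload)

    # e.g. append_entry(f, TAG_PUT, object_id, object_data)
    #      append_entry(f, TAG_DELETE, object_id)
    #      append_entry(f, TAG_COMMIT)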

> The manifest is an object with an id of only zeros (32 bytes), that
> references all the archives. It contains : version, list of archives,
> timestamp, config. Each archive contains: name, id, time. It is the last
> object stored, in the last segment, and is replaced each time.
> 
> An archive is an object that contain metadata : version, name, items
> list, cmdline, hostname, username, time. Each item represents a file or
> directory or symlink and contains: path, list of chunks, user, group,
> uid, gid, mode (item type + permissions), source (for links), rdev (for
> devices), mtime, xattrs, acl, bsdfiles. Directories have no content so
> no chunk (their entries are known because path are stored as full paths).

Not exactly: the archive metadata does not contain the file items
directly, only references to other objects that contain that data.

Each file/directory is represented by an "item" dictionary that contains
path, list of chunks, user, group, uid, gid, and other metadata.

All items are serialized using msgpack and the resulting byte stream is
fed into the same chunker used for regular file data and turned into
deduplicated chunks. The references to these chunks are then added to
the archive metadata.
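
The flow can be pictured roughly like this (my own sketch: msgpack is
the third-party msgpack library, and chunkify/store_chunk stand in for
the chunker and object store):

    import msgpack  # pip install msgpack

    def write_archive_items(items, chunkify, store_chunk):
        """items: iterable of item dicts (path, chunks, uid, gid, ...).
        Returns the list of metadata-chunk ids referenced by the archive."""
        stream = b"".join(msgpack.packb(item) for item in items)
        return [store_chunk(chunk) for chunk in chunkify(stream)]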


> *Question*: why not storing and restoring ctime ? It's something we may
> want to preserve.

There is no way to restore ctime (change time). ctime is updated every
time an inode's metadata is changed, and there's no API to set it.

> *Question*: what happens if there are so many files that the object size
> reaches MAX_OBJECT_SIZE = 20MB ?

As I explained above, the file items are stored in multiple chunks/objects.

> A chunk is an object as well, of course, and its id is the hash of its
> (unencrypted and uncompressed) content.
> 
> Hints are stored in a file (repo/hints) and contain: version, list of
> segments, compact (?)
> 
> Not all files seem to be listed in every archive, but some are listed
> several times even if they did not change and no parent directory
> changed, so it's not incremental and it's not full, I don't really
> understand (if I have 6 directories, I delete only one file in one
> directory, then the content of 3 directories is listed again in the new
> archive and the content of the 3 others is not)...

I'm not sure I follow you here. All archives are "full" backups. They
should include all files and directories that existed at the time the
archive was created.

> If it's incremental, when mounting an archive in a fuse filesystem,
> attic has to download and parse all the archive objects to find all
> files and create the directory structure, but according to what you said
> on the mailing list before it is faster to load one archive than all of
> them...

It's not incremental.

> -- Network
> 
> A last question : what's exactly the difference between using attic on
> the server and an sshfs mount, in term of things that are downloaded or
> uploaded in addition ?

A remote attic repository is much more efficient than using sshfs. When
using sshfs, the repository index files will also be accessed through
sshfs, which is a lot slower than running attic on the server, where the
index files are on the local filesystem.

"attic check --repository-only" is also a lot faster, since it's a
server-side operation when not using sshfs.

Thanks for looking at Attic! Let me know if there's anything more I can
clarify.

/ Jonas

Re: [attic] Questions and suggestions about inner working of Attic

From: Cyril Roussillon
Date: 2014-05-07 @ 08:47
On 2014/05/06 21:58, Jonas Borgström wrote:
> On 2014-05-06 17:55, Cyril Roussillon wrote:
> <snip>
>> -- Security
>>
>> About the encryption, AES is used with CTR mode of operation (so no need
>> of padding). A 8 bytes initialization vector is used, a HMAC-SHA256 is
>> computed on the encrypted chunk (including nonce) and both are stored in
>> the chunk. The header of each chunk is actually : TYPE(1) + HMAC(32) +
>> NONCE(8). Encryption and HMAC use two different keys.
>>
>> *Question*: I'm not sure to understand how the IV/nonce setup works. Are
>> you generating a random IV for each chunk, that you store as the nonce,
>> and start the counter at 0 for each chunk, or are you using the same IV
>> for all chunks and store the counter as the nonce to start for each
>> chunk ? According to the max repo size of 295 exabytes that you mention
>> in the class AESKeyBase, it seems that it would be the second solution.
>> Then "the first 8 bytes are always zeros" means that your IV is actually
>> 0, and is not randomly generated ?
> In AES CTR mode you can think of the IV as the start value for the
> counter. The counter itself is incremented by one after each 16 byte
> block. The IV/counter is not required to be random but it must NEVER be
> reused. So to accomplish this Attic initializes the encryption counter
> to be higher than any previously used counter value before encrypting
> new data.
> To save some space the counter/nonce is stored as a 64 bit integer which
> means the counter will wrap after 2**64 * 16 bytes instead of the
> theoretical maximum 2**128 * 16 bytes.

OK, it's indeed the second solution, but you're right, it's perfectly
fine. It felt strange to me to use a null IV, but an IV is useless when
only one message is ever encrypted with a given key, and by always
increasing the counter you're effectively simulating a single huge
message.

> <snip>
>> -- Chunker
>>
>> Rolling checksum with Buzhash algorithm, with window size of 4095 bytes,
>> with a minimum of 1024, and triggers when the last 16 bits of the
>> checksum are null, producing chunks of 64kB on average. All these
>> parameters are fixed. The buzhash table is altered by xoring it with a
>> seed randomly generated once for the archive, and stored encrypted in
>> the keyfile.
>>
>> *Question*: is it possible to make the average chunk size a parameter,
>> to allow smaller or larger chunks, to tune memory usage ? Doubling the
>> chunks size will yield a bit more data transferred and data stored, but
>> will divide memory usage by almost 2.
> Nothing stops us from supporting that in the future. But I generally try
> to avoid making this configurable just because it's possible. It's
> usually a sign that the solution itself isn't good enough and increases
> the code complexity and makes the product more confusing to use.
>
> But on the other hand, if it can be shown that this is something that
> would allow Attic to support other (common) workloads without running
> into other limitations and bottlenecks it's definitely worth considering.

I understand, I will see how it behaves with my whole repo.

>> -- Indexes and memory usage
>>
>> I'm a bit worried about the memory usage of Attic with huge repos. In
>> issue 26 (https://github.com/jborg/attic/issues/26) you stated that the
>> memory usage is :
>>
>>     Repository index: 40 bytes x N ~ 200MB (If a remote repository is
>> used this will be allocated on the remote side)
>>     Chunk lookup index: 44 bytes x N ~ 220MB
>>     File chunk cache: probably 80-100 bytes x N ~ 400MB
>>
>> I have two remarks that could explain why people are experiencing more
>> memory used :
>> - 64kB is the average chunk size that the rolling checksum generates,
>> but files shorter than this or ending shorter generate shorter chunks,
>> so volume of data divided by 64kB is actually a minimum chunk count, and
>> there are more.
>> - The containers have (possibly a lot of) memory overhead
>>
>> The chunk lookup index (chunk hash -> reference count, size, ciphered
>> size ; in file cache/chunk) and the repository index (chunk hash ->
>> segment, offset ; in file repo/index.%d) are stored in a sort of hash
>> table, directly mapped in memory from the file content, with only one
>> slot per bucket, but that spreads the collisions to the following
>> buckets. As a consequence the hash is just a start position for a linear
>> search, and if the element is not in the table the index is linearly
>> crossed until an empty bucket is found. When the table is full at 90%
>> its size is doubled, when it's empty at 25% its size is halfed. So
>> operations on it have a variable complexity between constant and linear
>> with low factor, and memory overhead varies between 10% and 300%.
>>
>> *Question* : wouldn't it be more interesting to use a low overhead map
>> with logarithmic complexity for operations instead ? (for instance I'm
>> currently developping one with 1 bit of overhead per element and which
>> is just slightly slower than std::map, using hierarchically sorted
>> vectors - it takes around 15s to insert 8M elements and 10s to make 8M
>> searches in it, for elements of 48 bytes).
> It would definitely be interesting to benchmark the two solutions but
> the current hashindex implementation is surprisingly fast and efficient.
>
>> The file chunk cache (file path hash -> age, inode number, size,
>> mtime_ns, chunks hashes ; in file cache/files) is stored as a python
>> associative array storing python objects, which generate a lot of
>> overhead. I benchmarked around 240 bytes per file without the chunk
>> list, to be compared to at most 64 bytes of real data (depending on data
>> alignment), and around 80 bytes per chunk hash (vs 32), with a minimum
>> of ~250 bytes even if only one chunck hash.
>>
>> *Same question* : why not changing for a C structure (hash table or low
>> overhead map) that would save a lot of memory (176 bytes per file + 48
>> bytes per chunk, > 60%) ? If it's because of the variable array of
>> chunks hashes, it could be loaded from and saved to a file with a map as
>> well, if we drop the mmap access.
> The file chunk cache is far too memory hungry right now. Replacing that
> with something more efficient is at the top of my todo list right now.
> But I've not yet decided which approach to take.
>
>> If I were to provide a pull request to use a low overhead map for the 3
>> indexes, would you merge it ?
> I would definitely be interested in seeing some benchmarks and examples
> of the low overhead map code and how easy it is to integrate with
> python. And depending on the results we can figure out if and how and
> what to integrate.

I already have a few benchmarks for insertion and lookup only (no erase
yet), and compared to std::map it's at worst 1.5 times slower for 8M
elements. As soon as I'm done with it I will make the source code
available and benchmark it with usage patterns similar to Attic's (I
will try to make Attic log all the operations it performs on the tables,
to have real examples), to compare with std::map and your hash index
implementation.

For now the interface mimics the std::map interface; it would be
possible to convert it to C code, but it would be better for
maintainability to wrap the C++ code directly, and I believe this is
possible with Cython.

> <snip>
>> - Is it mandatory to store st_ino to check if the file changed, in
>> addition to size and mtime ? Is it required for hard links management or
>> is it only used to help detect changes ?
> It's to uniquely identify files. In some setups the path might not be
> unique between different archives.

OK I see, for example if one of the subdirectories is a mount point,
with different things mounted in different archives?

>> -- Repository structure
>>
>> "Filesystem based transactional key value store"
>>
>> Objects referenced by a key (256bits id/hash) are stored in line in
>> files (segments) of size approx 5MB in repo/data. They contain : header
>> size, crc, size, tag, key, data. Tag is either put, delete, or commit.
>> Segments seem to be built locally, and then uploaded.
>>
>> *Question*: what do the tags mean ?
> A segment file is basically a transaction log where each repository
> operation is appended to the file. So if an object is written to the
> repository a "PUT"-tag is written to the file followed by the object id
> and data. And if an object is deleted a "DELETE" tag is appended
> followed by the object id.

OK, but what about the commit tag?

>> The manifest is an object with an id of only zeros (32 bytes), that
>> references all the archives. It contains : version, list of archives,
>> timestamp, config. Each archive contains: name, id, time. It is the last
>> object stored, in the last segment, and is replaced each time.
>>
>> An archive is an object that contain metadata : version, name, items
>> list, cmdline, hostname, username, time. Each item represents a file or
>> directory or symlink and contains: path, list of chunks, user, group,
>> uid, gid, mode (item type + permissions), source (for links), rdev (for
>> devices), mtime, xattrs, acl, bsdfiles. Directories have no content so
>> no chunk (their entries are known because path are stored as full paths).
> Not exactly, the archive metadata does not contain the file items
> directly. Only references to other objects that contain that data.
>
> Each file/directory is represented by an "item" dictionary that contains
> path, list of chunks, user, group, uid, gid, and other metadata.
>
> All items are serialized using msgpack and the resulting byte stream is
> fed into the same chunker used for regular file data and turned into
> deduplicated chunks. The reference to these chunks is then added to the
> archvive metadata.

Great, I understand better now; I had missed the fact that the archive
metadata points to chunks that contain the file items.
That actually answers several of my next questions: there is no problem
with the max object size; it is the reason why only some part of the
item list is repeated in a new archive (only the chunk that was modified
by the file removal has to be recreated); and it clarifies that all
archives are full and that you only have to download one archive's
metadata for the fuse mount.

However, I feel that deduplicating metadata is a bit risky.
Deduplication by nature increases the risk of data loss by removing
redundancy. So if a data chunk or segment is corrupted, only the few
files in all archives that refer to those chunks will be corrupted.
That's bad, but there is no choice if you want deduplication. However,
if a single metadata chunk is corrupted, you can lose a lot of files in
all archives. That's very bad, and as metadata represents a very small
amount of data compared to the whole repo, we could protect it better
without defeating the purpose of deduplication.
It is actually a good idea to deduplicate metadata, because if you have
n archives you probably don't want n copies of the metadata of old
files. But you may want 2 copies, and maybe a third one on another disk
or partition. So what about storing metadata in a different store, in
order to keep a copy of it and to allow users to back it up wherever
they want?

>> *Question*: why not storing and restoring ctime ? It's something we may
>> want to preserve.
> There is no way to restore ctime (change time). ctime is updated every
> time an inode's metadata is changed and there's no api to set it.

Oh good to know!

> <snip> 

--
Cyril Roussillon
http://crteknologies.fr/

Re: [attic] Questions and suggestions about inner working of Attic

From: Jonas Borgström
Date: 2014-05-07 @ 21:18
On 2014-05-07 10:47, Cyril Roussillon wrote:
> On 2014/05/06 21:58, Jonas Borgström wrote:
>> On 2014-05-06 17:55, Cyril Roussillon wrote:
>> <snip>
>>> -- Security
>>>
>>> About the encryption, AES is used with CTR mode of operation (so no need
>>> of padding). A 8 bytes initialization vector is used, a HMAC-SHA256 is
>>> computed on the encrypted chunk (including nonce) and both are stored in
>>> the chunk. The header of each chunk is actually : TYPE(1) + HMAC(32) +
>>> NONCE(8). Encryption and HMAC use two different keys.
>>>
>>> *Question*: I'm not sure to understand how the IV/nonce setup works. Are
>>> you generating a random IV for each chunk, that you store as the nonce,
>>> and start the counter at 0 for each chunk, or are you using the same IV
>>> for all chunks and store the counter as the nonce to start for each
>>> chunk ? According to the max repo size of 295 exabytes that you mention
>>> in the class AESKeyBase, it seems that it would be the second solution.
>>> Then "the first 8 bytes are always zeros" means that your IV is actually
>>> 0, and is not randomly generated ?
>> In AES CTR mode you can think of the IV as the start value for the
>> counter. The counter itself is incremented by one after each 16 byte
>> block. The IV/counter is not required to be random but it must NEVER be
>> reused. So to accomplish this Attic initializes the encryption counter
>> to be higher than any previously used counter value before encrypting
>> new data.
>> To save some space the counter/nonce is stored as a 64 bit integer which
>> means the counter will wrap after 2**64 * 16 bytes instead of the
>> theoretical maximum 2**128 * 16 bytes.
> 
> Ok it's indeed the second solution, but you are right it's perfectly
> fine. I was feeling strange to use a null IV, but IV is useless if
> encrypting only one message with the same key, and by always increasing
> the counter you're simulating a single huge message.
> 
>> <snip>
>>> -- Chunker
>>>
>>> Rolling checksum with Buzhash algorithm, with window size of 4095 bytes,
>>> with a minimum of 1024, and triggers when the last 16 bits of the
>>> checksum are null, producing chunks of 64kB on average. All these
>>> parameters are fixed. The buzhash table is altered by xoring it with a
>>> seed randomly generated once for the archive, and stored encrypted in
>>> the keyfile.
>>>
>>> *Question*: is it possible to make the average chunk size a parameter,
>>> to allow smaller or larger chunks, to tune memory usage ? Doubling the
>>> chunks size will yield a bit more data transferred and data stored, but
>>> will divide memory usage by almost 2.
>> Nothing stops us from supporting that in the future. But I generally try
>> to avoid making this configurable just because it's possible. It's
>> usually a sign that the solution itself isn't good enough and increases
>> the code complexity and makes the product more confusing to use.
>>
>> But on the other hand, if it can be shown that this is something that
>> would allow Attic to support other (common) workloads without running
>> into other limitations and bottlenecks it's definitely worth considering.
> 
> I understand, I will see how it behaves with my whole repo.

Can you give some more details about the use case you have in mind?
Like number of files, average file size and total uncompressed size.

>>> -- Indexes and memory usage
>>>
>>> I'm a bit worried about the memory usage of Attic with huge repos. In
>>> issue 26 (https://github.com/jborg/attic/issues/26) you stated that the
>>> memory usage is :
>>>
>>>     Repository index: 40 bytes x N ~ 200MB (If a remote repository is
>>> used this will be allocated on the remote side)
>>>     Chunk lookup index: 44 bytes x N ~ 220MB
>>>     File chunk cache: probably 80-100 bytes x N ~ 400MB
>>>
>>> I have two remarks that could explain why people are experiencing more
>>> memory used :
>>> - 64kB is the average chunk size that the rolling checksum generates,
>>> but files shorter than this or ending shorter generate shorter chunks,
>>> so volume of data divided by 64kB is actually a minimum chunk count, and
>>> there are more.
>>> - The containers have (possibly a lot of) memory overhead
>>>
>>> The chunk lookup index (chunk hash -> reference count, size, ciphered
>>> size ; in file cache/chunk) and the repository index (chunk hash ->
>>> segment, offset ; in file repo/index.%d) are stored in a sort of hash
>>> table, directly mapped in memory from the file content, with only one
>>> slot per bucket, but that spreads the collisions to the following
>>> buckets. As a consequence the hash is just a start position for a linear
>>> search, and if the element is not in the table the index is linearly
>>> crossed until an empty bucket is found. When the table is full at 90%
>>> its size is doubled, when it's empty at 25% its size is halfed. So
>>> operations on it have a variable complexity between constant and linear
>>> with low factor, and memory overhead varies between 10% and 300%.
>>>
>>> *Question* : wouldn't it be more interesting to use a low overhead map
>>> with logarithmic complexity for operations instead ? (for instance I'm
>>> currently developping one with 1 bit of overhead per element and which
>>> is just slightly slower than std::map, using hierarchically sorted
>>> vectors - it takes around 15s to insert 8M elements and 10s to make 8M
>>> searches in it, for elements of 48 bytes).
>> It would definitely be interesting to benchmark the two solutions but
>> the current hashindex implementation is surprisingly fast and efficient.
>>
>>> The file chunk cache (file path hash -> age, inode number, size,
>>> mtime_ns, chunks hashes ; in file cache/files) is stored as a python
>>> associative array storing python objects, which generate a lot of
>>> overhead. I benchmarked around 240 bytes per file without the chunk
>>> list, to be compared to at most 64 bytes of real data (depending on data
>>> alignment), and around 80 bytes per chunk hash (vs 32), with a minimum
>>> of ~250 bytes even if only one chunck hash.
>>>
>>> *Same question* : why not changing for a C structure (hash table or low
>>> overhead map) that would save a lot of memory (176 bytes per file + 48
>>> bytes per chunk, > 60%) ? If it's because of the variable array of
>>> chunks hashes, it could be loaded from and saved to a file with a map as
>>> well, if we drop the mmap access.
>> The file chunk cache is far too memory hungry right now. Replacing that
>> with something more efficient is at the top of my todo list right now.
>> But I've not yet decided which approach to take.
>>
>>> If I were to provide a pull request to use a low overhead map for the 3
>>> indexes, would you merge it ?
>> I would definitely be interested in seeing some benchmarks and examples
>> of the low overhead map code and how easy it is to integrate with
>> python. And depending on the results we can figure out if and how and
>> what to integrate.
> 
> I already have a few benchmarks for insertion and find only (no erase
> yet), and compared to std::map it's at worst 1.5 times slower for 8M
> elements. As soon as I am done with it I will make the source code
> available, and benchmark it with usages similar to Attic (I will try to
> make Attic log all the operations it is doing on the tables, to have
> real examples), to compare with std::map and your hash index implementation.
> 
> For now the interface mimics std::map interface, it would possible to
> convert it to C code but it would be better for maintainability to wrap
> directly the C++ code, and I believe this is possible with Cython.
> 
>> <snip>
>>> - Is it mandatory to store st_ino to check if the file changed, in
>>> addition to size and mtime ? Is it required for hard links management or
>>> is it only used to help detect changes ?
>> It's to uniquely identify files. In some setups the path might not be
>> unique between different archives.
> 
> Ok I see, like if one of the subdirectories is a mount point, with
> different things mounted between different archives ?
> 
>>> -- Repository structure
>>>
>>> "Filesystem based transactional key value store"
>>>
>>> Objects referenced by a key (256bits id/hash) are stored in line in
>>> files (segments) of size approx 5MB in repo/data. They contain : header
>>> size, crc, size, tag, key, data. Tag is either put, delete, or commit.
>>> Segments seem to be built locally, and then uploaded.
>>>
>>> *Question*: what do the tags mean ?
>> A segment file is basically a transaction log where each repository
>> operation is appended to the file. So if an object is written to the
>> repository a "PUT"-tag is written to the file followed by the object id
>> and data. And if an object is deleted a "DELETE" tag is appended
>> followed by the object id.
> 
> Ok, but what about the commit tag ?

The commit tag is written when a repository transaction is committed.
When a repository is opened, any put/delete operations not followed by a
commit tag are discarded, since they are part of a partial/uncommitted
transaction.
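
A sketch of that replay rule (my own illustration with made-up tag
values, not Attic's actual recovery code):

    TAG_PUT, TAG_DELETE, TAG_COMMIT = 0, 1, 2

    def replay(entries):
        """entries: iterable of (tag, key, data) tuples in log order."""
        index, pending = {}, []
        for tag, key, data in entries:
            if tag == TAG_COMMIT:
                for t, k, d in pending:        # apply the committed transaction
                    if t == TAG_PUT:
                        index[k] = d
                    elif t == TAG_DELETE:
                        index.pop(k, None)
                pending = []
            else:
                pending.append((tag, key, data))
        return index                           # trailing uncommitted entries ignored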

>>> The manifest is an object with an id of only zeros (32 bytes), that
>>> references all the archives. It contains : version, list of archives,
>>> timestamp, config. Each archive contains: name, id, time. It is the last
>>> object stored, in the last segment, and is replaced each time.
>>>
>>> An archive is an object that contain metadata : version, name, items
>>> list, cmdline, hostname, username, time. Each item represents a file or
>>> directory or symlink and contains: path, list of chunks, user, group,
>>> uid, gid, mode (item type + permissions), source (for links), rdev (for
>>> devices), mtime, xattrs, acl, bsdfiles. Directories have no content so
>>> no chunk (their entries are known because path are stored as full paths).
>> Not exactly, the archive metadata does not contain the file items
>> directly. Only references to other objects that contain that data.
>>
>> Each file/directory is represented by an "item" dictionary that contains
>> path, list of chunks, user, group, uid, gid, and other metadata.
>>
>> All items are serialized using msgpack and the resulting byte stream is
>> fed into the same chunker used for regular file data and turned into
>> deduplicated chunks. The reference to these chunks is then added to the
>> archvive metadata.
> 
> Great I understand better now, I missed the fact that it points to
> chunks that contain the file items.
> It actually answers several of my next interrogations : no problem with
> max object size ; this is the reason why only  some part of the item
> list is repeated for a new archive (only the chunk that was modified by
> the file removal has to be recreated) ; and it clarifies that all
> updates are full and that you only have to download one archive metadata
> for the fuse mount.
> 
> However I feel that deduplicating metadata is a bit risky. Deduplication
> by nature increases the risk of data loss by removing duplication. So if
> a data chunk or segment is corrupted, only a few files in all archives
> that refer to this or these chunks will be corrupted. That's bad but
> there is no choice if you want deduplication. However if a single
> metadata chunk is corrupted, then you can loose a lot of files in all
> archives. That's very bad, and as it represents a very small amount of
> data compared to the whole repo we can protect it better without
> defeiting the purpose of deduplication.
> It is actually a good idea to deduplicate metadata, because if you have
> n archives you probably don't want n copies of the metadata of old
> files. But you may want 2 copies, and maybe a third one on another disk
> or partition. So what about storing them in a different store, in order
> to keep a copy of it, and to allow the user to backup it wherever they
> want ?

The repository format is designed to be as compact and robust as
possible. But I've deliberately avoided trying to compensate for faulty
hardware by implementing things like erasure coding, since I think that
would be a layering violation and should be handled by the OS, RAID
system and/or filesystem.

Attic is, however, very good at detecting repository corruption or
other inconsistencies, so the following command can be used to make sure
the repository is in good working order before periodically rsyncing it
to a remote location:

attic check --repository-only foo.attic

/ Jonas

>>> *Question*: why not storing and restoring ctime ? It's something we may
>>> want to preserve.
>> There is no way to restore ctime (change time). ctime is updated every
>> time an inode's metadata is changed and there's no api to set it.
> 
> Oh good to know!
> 
>> <snip> 
> 
> --
> Cyril Roussillon
> http://crteknologies.fr/
> 
> 

Re: [attic] Questions and suggestions about inner working of Attic

From: Christian Neukirchen
Date: 2014-05-08 @ 10:46
"Jonas Borgström" <jonas@borgstrom.se> writes:

> Attic is however very good at detecting repository corruption or other
> inconsistencies so the following command can be used to make sure the
> repository is in good working order before periodically rsyncing it to a
> remote location:

BTW, how hard/costly would it be to modify the format in such a way that
files are only added but never rewritten?  This would make replication
even easier.

-- 
Christian Neukirchen  <chneukirchen@gmail.com>  http://chneukirchen.org

Re: [attic] Questions and suggestions about inner working of Attic

From: Jonas Borgström
Date: 2014-05-08 @ 15:34
On 08/05/14 12:46, Christian Neukirchen wrote:
> "Jonas Borgström" <jonas@borgstrom.se> writes:
> 
>> Attic is however very good at detecting repository corruption or other
>> inconsistencies so the following command can be used to make sure the
>> repository is in good working order before periodically rsyncing it to a
>> remote location:
> 
> BTW, how hard/costly would it be to modify the format in such a way that
> files are only added but never rewritten?  This would make replication
> even easier.

All essential files (the segment files in the "data" directory) are
strictly append-only and only modified once.
The index.X file is random access, but it can be recreated if damaged or
lost using "attic check --repair".

But as long as you make sure no one is modifying the repository while
running rsync you should be fine. Or even better, take a POSIX read lock
on "$REPOSITORY/config" before you start.





Re: [attic] Questions and suggestions about inner working of Attic

From: Cyril Roussillon
Date: 2014-05-08 @ 11:11
On 2014/05/07 23:18, Jonas Borgström wrote:
> On 2014-05-07 10:47, Cyril Roussillon wrote:
>> On 2014/05/06 21:58, Jonas Borgström wrote:
>>> On 2014-05-06 17:55, Cyril Roussillon wrote:
>>>> -- Chunker
>>>>
>>>> Rolling checksum with Buzhash algorithm, with window size of 4095 bytes,
>>>> with a minimum of 1024, and triggers when the last 16 bits of the
>>>> checksum are null, producing chunks of 64kB on average. All these
>>>> parameters are fixed. The buzhash table is altered by xoring it with a
>>>> seed randomly generated once for the archive, and stored encrypted in
>>>> the keyfile.
>>>>
>>>> *Question*: is it possible to make the average chunk size a parameter,
>>>> to allow smaller or larger chunks, to tune memory usage ? Doubling the
>>>> chunks size will yield a bit more data transferred and data stored, but
>>>> will divide memory usage by almost 2.
>>> Nothing stops us from supporting that in the future. But I generally try
>>> to avoid making this configurable just because it's possible. It's
>>> usually a sign that the solution itself isn't good enough and increases
>>> the code complexity and makes the product more confusing to use.
>>>
>>> But on the other hand, if it can be shown that this is something that
>>> would allow Attic to support other (common) workloads without running
>>> into other limitations and bottlenecks it's definitely worth considering.
>> I understand, I will see how it behaves with my whole repo.
> Can you give some more details about the use case you have in mind?
> Like number of files, average file size and total uncompressed size.

Yes, it is 1.6M files and 360GB of total uncompressed size, thus an
average file size of 236kB. However, there are quite a lot of tiny files
(< 20 bytes; a few hundred thousand of them).

>>>> The manifest is an object with an id of only zeros (32 bytes), that
>>>> references all the archives. It contains : version, list of archives,
>>>> timestamp, config. Each archive contains: name, id, time. It is the last
>>>> object stored, in the last segment, and is replaced each time.
>>>>
>>>> An archive is an object that contain metadata : version, name, items
>>>> list, cmdline, hostname, username, time. Each item represents a file or
>>>> directory or symlink and contains: path, list of chunks, user, group,
>>>> uid, gid, mode (item type + permissions), source (for links), rdev (for
>>>> devices), mtime, xattrs, acl, bsdfiles. Directories have no content so
>>>> no chunk (their entries are known because path are stored as full paths).
>>> Not exactly, the archive metadata does not contain the file items
>>> directly. Only references to other objects that contain that data.
>>>
>>> Each file/directory is represented by an "item" dictionary that contains
>>> path, list of chunks, user, group, uid, gid, and other metadata.
>>>
>>> All items are serialized using msgpack and the resulting byte stream is
>>> fed into the same chunker used for regular file data and turned into
>>> deduplicated chunks. The reference to these chunks is then added to the
>>> archvive metadata.
>> Great I understand better now, I missed the fact that it points to
>> chunks that contain the file items.
>> It actually answers several of my next interrogations : no problem with
>> max object size ; this is the reason why only  some part of the item
>> list is repeated for a new archive (only the chunk that was modified by
>> the file removal has to be recreated) ; and it clarifies that all
>> updates are full and that you only have to download one archive metadata
>> for the fuse mount.
>>
>> However I feel that deduplicating metadata is a bit risky. Deduplication
>> by nature increases the risk of data loss by removing duplication. So if
>> a data chunk or segment is corrupted, only a few files in all archives
>> that refer to this or these chunks will be corrupted. That's bad but
>> there is no choice if you want deduplication. However if a single
>> metadata chunk is corrupted, then you can loose a lot of files in all
>> archives. That's very bad, and as it represents a very small amount of
>> data compared to the whole repo we can protect it better without
>> defeiting the purpose of deduplication.
>> It is actually a good idea to deduplicate metadata, because if you have
>> n archives you probably don't want n copies of the metadata of old
>> files. But you may want 2 copies, and maybe a third one on another disk
>> or partition. So what about storing them in a different store, in order
>> to keep a copy of it, and to allow the user to backup it wherever they
>> want ?
> The repository format is designed to be as compact and robust as
> possible. But I've deliberately avoided trying to compensate for faulty
> hardware by implementing things like erasure coding since I think that
> would be a layer violation and should be handled by the OS, raid system
> and/or filesystem.
>
> Attic is however very good at detecting repository corruption or other
> inconsistencies so the following command can be used to make sure the
> repository is in good working order before periodically rsyncing it to a
> remote location:
>
> attic check --repository-only foo.attic

I see, but detecting corruption is not repairing it. Even if you detect
the corruption before actually needing the backup, you may still lose
some history that you no longer have in your working copy.

I understand that you don't want to implement something as complicated
as erasure coding, but I was suggesting separating the metadata store
because it is easy to implement and would allow users who don't have a
RAID system to take an extra precaution, since bad sectors and
filesystem corruption still happen.


Re: [attic] Questions and suggestions about inner working of Attic

From: Jonas Borgström
Date: 2014-05-08 @ 15:34
On 08/05/14 13:11, Cyril Roussillon wrote:
> 
> On 2014/05/07 23:18, Jonas Borgström wrote:
>> On 2014-05-07 10:47, Cyril Roussillon wrote:
>>> On 2014/05/06 21:58, Jonas Borgström wrote:
>>>> On 2014-05-06 17:55, Cyril Roussillon wrote:
>>>>> -- Chunker
>>>>>
>>>>> Rolling checksum with Buzhash algorithm, with window size of 4095 bytes,
>>>>> with a minimum of 1024, and triggers when the last 16 bits of the
>>>>> checksum are null, producing chunks of 64kB on average. All these
>>>>> parameters are fixed. The buzhash table is altered by xoring it with a
>>>>> seed randomly generated once for the archive, and stored encrypted in
>>>>> the keyfile.
>>>>>
>>>>> *Question*: is it possible to make the average chunk size a parameter,
>>>>> to allow smaller or larger chunks, to tune memory usage ? Doubling the
>>>>> chunks size will yield a bit more data transferred and data stored, but
>>>>> will divide memory usage by almost 2.
>>>> Nothing stops us from supporting that in the future. But I generally try
>>>> to avoid making this configurable just because it's possible. It's
>>>> usually a sign that the solution itself isn't good enough and increases
>>>> the code complexity and makes the product more confusing to use.
>>>>
>>>> But on the other hand, if it can be shown that this is something that
>>>> would allow Attic to support other (common) workloads without running
>>>> into other limitations and bottlenecks it's definitely worth considering.
>>> I understand, I will see how it behaves with my whole repo.
>> Can you give some more details about the use case you have in mind?
>> Like number of files, average file size and total uncompressed size.
> 
> Yes it is 1.6M files and 360GB of total uncompressed size, thus an
> average file size of 236kB. However there are quite a lot of tiny files
> (< 20 bytes, few hundreds of thousands).
> 
>>>>> The manifest is an object with an id of only zeros (32 bytes), that
>>>>> references all the archives. It contains : version, list of archives,
>>>>> timestamp, config. Each archive contains: name, id, time. It is the last
>>>>> object stored, in the last segment, and is replaced each time.
>>>>>
>>>>> An archive is an object that contain metadata : version, name, items
>>>>> list, cmdline, hostname, username, time. Each item represents a file or
>>>>> directory or symlink and contains: path, list of chunks, user, group,
>>>>> uid, gid, mode (item type + permissions), source (for links), rdev (for
>>>>> devices), mtime, xattrs, acl, bsdfiles. Directories have no content so
>>>>> no chunk (their entries are known because path are stored as full paths).
>>>> Not exactly, the archive metadata does not contain the file items
>>>> directly. Only references to other objects that contain that data.
>>>>
>>>> Each file/directory is represented by an "item" dictionary that contains
>>>> path, list of chunks, user, group, uid, gid, and other metadata.
>>>>
>>>> All items are serialized using msgpack and the resulting byte stream is
>>>> fed into the same chunker used for regular file data and turned into
>>>> deduplicated chunks. The reference to these chunks is then added to the
>>>> archvive metadata.
>>> Great I understand better now, I missed the fact that it points to
>>> chunks that contain the file items.
>>> It actually answers several of my next interrogations : no problem with
>>> max object size ; this is the reason why only  some part of the item
>>> list is repeated for a new archive (only the chunk that was modified by
>>> the file removal has to be recreated) ; and it clarifies that all
>>> updates are full and that you only have to download one archive metadata
>>> for the fuse mount.
>>>
>>> However I feel that deduplicating metadata is a bit risky. Deduplication
>>> by nature increases the risk of data loss by removing duplication. So if
>>> a data chunk or segment is corrupted, only a few files in all archives
>>> that refer to this or these chunks will be corrupted. That's bad but
>>> there is no choice if you want deduplication. However if a single
>>> metadata chunk is corrupted, then you can loose a lot of files in all
>>> archives. That's very bad, and as it represents a very small amount of
>>> data compared to the whole repo we can protect it better without
>>> defeiting the purpose of deduplication.
>>> It is actually a good idea to deduplicate metadata, because if you have
>>> n archives you probably don't want n copies of the metadata of old
>>> files. But you may want 2 copies, and maybe a third one on another disk
>>> or partition. So what about storing them in a different store, in order
>>> to keep a copy of it, and to allow the user to backup it wherever they
>>> want ?
>> The repository format is designed to be as compact and robust as
>> possible. But I've deliberately avoided trying to compensate for faulty
>> hardware by implementing things like erasure coding since I think that
>> would be a layer violation and should be handled by the OS, raid system
>> and/or filesystem.
>>
>> Attic is however very good at detecting repository corruption or other
>> inconsistencies so the following command can be used to make sure the
>> repository is in good working order before periodically rsyncing it to a
>> remote location:
>>
>> attic check --repository-only foo.attic
> 
> I see, but detecting a corruption is not repairing it. Even if you
> detect the corruption before actually needing the backup, you may still
> lose some history that you don't have anymore on your working copy.
> 
> I understand that you don't want to implement something as complicated
> as erasure coding, but I was suggesting to separate the metadata store
> because it is easy to implement, and would allow users that don't have a
> raid system to take an extra precaution, as bad sectors and filesystem
> corruptions still happen.

Disks and entire file systems can fail in so many ways that I'm not
convinced this is worth it. You should have multiple copies (in
different physical locations) of all data you care about.

/ Jonas

Re: [attic] Questions and suggestions about inner working of Attic

From: Jeremy Maitin-Shepard
Date: 2014-05-08 @ 20:28
On Thu, May 8, 2014 at 8:34 AM, Jonas Borgström <jonas@borgstrom.se> wrote:

> Disks and entire file systems can fail in so many ways. I'm not
> convinced that this is worth it. You should have multiple copies (on
> different physical locations) of all data you care about.
>
>

attic check could be extended, though, to operate on multiple copies of
the repository that are supposed to be identical. If an entry is
corrupted in one copy, it could be obtained from another copy.

Additionally, it seems it could be useful to store the metadata in what
would effectively be a separate repository:
 - this would allow for better data locality when accessing the
metadata, which would presumably be accessed more frequently and more
completely than the data
 - this would also allow the metadata to be replicated more times or
checked more frequently, which may be desirable since corruption of the
metadata could render a large amount of data difficult to access. Btrfs,
for instance, has an option to replicate the metadata more times than
the data. Furthermore, it is relatively inexpensive to keep extra copies
of the metadata, since it is much smaller than the data.