librelist archives

Data integrity error due to index mismatch for key

From:
Dan Williams
Date:
2015-04-01 @ 09:14
Hi all

I have been testing Attic's performance and behaviour before giving it the
green light for general usage in our organisation.

I have tested it with small repositories (under 20GB) and it works
perfectly. However, when using it on a much larger repository (over 7TB) I
have run into a "Data integrity error", which, upon further investigation,
is apparently caused by an "index mismatch".

I have not found any bug reports that describe this, but figured it would be
best to message the mailing list before creating a ticket.


STEPS TO REPRODUCE
==================

In short, the steps to reproduce it are:

    1.   Create a new repository from a large data set (I used 7.2TB, across
         4m files and 5m entities).
    2.   Change some files and run a second backup.

At this point, attempting a full restore fails with "Data integrity error".
Running `attic check` reports "Index mismatch for key". Running `attic check
--repair` does not fix the issue.

After removing the second backup, an `attic check` reports no errors.
However, upon creating another backup, the errors return exactly as above.
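
The reproduction cycle above can be written down as a short command plan. This is a minimal sketch only: it uses the `attic create --stats` and `attic check` invocations that appear later in this message, the function name `reproduction_commands` is hypothetical, and the `HOME=/mnt/lsi0/.attic` override and timing wrappers are left out.

```python
import shlex


def reproduction_commands(repo, first, second, sources):
    """Return the attic command lines for: first backup, second backup, check.

    Hypothetical helper for illustration; it only builds argument lists.
    """
    src = list(sources)
    return [
        ["attic", "create", "--stats", f"{repo}::{first}", *src],
        ["attic", "create", "--stats", f"{repo}::{second}", *src],
        ["attic", "check", repo],
    ]


if __name__ == "__main__":
    commands = reproduction_commands(
        "/mnt/lsi0/backup/local/tardis/attic",
        "tardis/2015-03-16",
        "tardis/2015-03-21",
        ["/etc", "/mnt/lsi0/git", "/mnt/lsi0/shares", "/root"],
    )
    for cmd in commands:
        # Printed rather than executed; hand each list to subprocess.run()
        # to actually run the cycle.
        print(shlex.join(cmd))
```

Keeping the plan as data makes it easy to rerun the same cycle against smaller data sets while hunting for the trigger.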


ENVIRONMENT DETAILS
===================

I am using the latest version of Attic from master branch of the Git
repository.

There is no real possibility of actual data corruption from the filesystem,
as the source data and Attic repository both reside on a RAID 6 array.

The system is Debian 8 (Jessie) 64-bit, if that helps at all.


DETAILED STEPS WITH OUTPUT
==========================

Here are the actual commands I ran, with their output.

1.  Initial backup
------------------

The initial backup was created with the following command - it was
successful. However, I did not capture all the output at that point.

# date; HOME=/mnt/lsi0/.attic time attic create --stats
/mnt/lsi0/backup/local/tardis/attic::tardis/$(date +%F) /etc /mnt/lsi0/git
/mnt/lsi0/shares /root; date 


2.  Extraction (partial)
------------------------

This worked fine - no errors.

# date; HOME=/mnt/lsi0/.attic time attic extract
/mnt/lsi0/backup/local/tardis/attic::tardis/2015-03-16 mnt/lsi0/git; date
Fri 20 Mar 18:21:08 GMT 2015
195.33user 11.05system 5:57.84elapsed 57%CPU (0avgtext+0avgdata
5519440maxresident)k
21718136inputs+12970368outputs (2major+1811562minor)pagefaults 0swaps
Fri 20 Mar 18:27:06 GMT 2015


3.  Second backup
-----------------

This second backup was to sync any changed files - there weren't many.

# date; HOME=/mnt/lsi0/.attic time attic create --stats
/mnt/lsi0/backup/local/tardis/attic::tardis/$(date +%F) /etc /mnt/lsi0/git
/mnt/lsi0/shares /root; date 
Sat 21 Mar 09:57:29 GMT 2015
------------------------------------------------------------------------------
Archive name: tardis/2015-03-21
Archive fingerprint: 1c900f1490f7fef3b2ce78eca2909c55e96fcc5de6293282cdf28f9fea8a2ca3
Start time: Sat Mar 21 09:57:29 2015
End time: Sat Mar 21 10:49:32 2015
Duration: 52 minutes 2.68 seconds
Number of files: 4384741

                       Original size      Compressed size    Deduplicated size
This archive:                7.81 TB              6.97 TB            114.22 MB
All archives:               15.61 TB             13.94 TB              6.10 TB
------------------------------------------------------------------------------
1017.30user 327.74system 52:06.24elapsed 43%CPU (0avgtext+0avgdata
16533176maxresident)k
178235032inputs+407023640outputs (175major+19663106minor)pagefaults 0swaps
Sat 21 Mar 10:49:36 GMT 2015


4.  List backup sets
--------------------

Both backup sets reported as existing and no problems seen.

# HOME=/mnt/lsi0/.attic time attic list /mnt/lsi0/backup/local/tardis/attic
tardis/2015-03-16                    Fri Mar 20 16:43:56 2015
tardis/2015-03-21                    Sat Mar 21 10:48:42 2015
0.20user 2.77system 0:05.74elapsed 51%CPU (0avgtext+0avgdata
5281780maxresident)k
5371536inputs+0outputs (36major+1322226minor)pagefaults 0swaps


5.  Extraction (full)
---------------------

(I should note that the current working directory was emptied between
restore commands.)

# date; HOME=/mnt/lsi0/.attic time attic extract
/mnt/lsi0/backup/local/tardis/attic::tardis/2015-03-21; date
Mon 23 Mar 13:58:12 GMT 2015
attic: Error: Data integrity error
Command exited with non-zero status 1
2962.62user 380.72system 1:12:41elapsed 76%CPU (0avgtext+0avgdata
5478780maxresident)k
358242808inputs+753630256outputs (41major+1632191minor)pagefaults 0swaps
Mon 23 Mar 15:10:54 GMT 2015


6.  Repository check
--------------------

Ouch!

# date; HOME=/mnt/lsi0/.attic time attic check
/mnt/lsi0/backup/local/tardis/attic
Mon 23 Mar 17:42:37 GMT 2015
Starting repository check...
Index mismatch for key b'\xa3*;y\xe2\x9c\xabw\x9f-f\x15\x8cG\xf1\xc3\xe6\xa8\x88X \x81z\xf5\x8e\xfb-\x12\x1b\xda\xafL'. (284324, 3431159) != (-1, -1)
Index mismatch for key b'\t\x9dsJ\xb4O\xb8\x1c`\xb6\xac\xe4\xb2\x91\xf4\xf0$(\x01\xa8\xca\xc2Mf\x97\xebB\xa1\x03G\xcd\x04'. (896973, 2792174) != (-1, -1)
Index mismatch for key b'\xd6\xaf\xd3\nI\xea\xa3Ju\xf11\xaaumB\x0e[E(\xeb\x95\xa8e\xa0<\x02\x8c\xbbf\xabH\x9d'. (64814, 3158785) != (60718, 3158785)
Index mismatch for key b'\xd6\x1d"\xfbf&\xb7\x9eg#\xe2\xd7\x1at3\xd2I\xf2\xcd\x80\x1b\'\x89\xd6O\xe3\'/./&\xb8'. (932547, 1968998) != (928451, 1968998)
Index mismatch for key b'\xd6Ae3\x82d\xd7?\x9a\xfe\xe5O\xe9\x9f\x15v\x19\xfc\xfcj\x8b\xf1\xe1\xee\xd2QD\xd2)I\xbc\xaa'. (151909, 424462) != (147813, 424462)
Index mismatch for key b'\xd6\xdd\xc1\xe3Y\x9f+\xce\xa1O\xe9{\x116\x0ex\x84\xf3\xa3:\x9a\xbb\xf4"\xd6\x0e\x13\x1c\xe3\x8b\x8c\x16'. (31049, 4234035) != (26953, 4234035)
Index mismatch for key b'\xa2Lye\x18\x0cE\x9c@\xa9=\xb8j\xfb\xcc\xd5H+A\xd7(\xda|j\xc1\x1e\x89\xb1\xd4\xa6L/'. (50865, 102435) != (-1, -1)
attic: Exiting with failure status due to previous errors
Command exited with non-zero status 1
9083.93user 2191.83system 6:05:05elapsed 51%CPU (0avgtext+0avgdata
13394448maxresident)k
11931422072inputs+0outputs (30major+4071002minor)pagefaults 0swaps
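
The number pairs in the check output can be compared mechanically. The sketch below is not part of Attic; it only parses the pairs as printed and subtracts the first numbers. What each number means is not stated in the output, so treating a right-hand `(-1, -1)` as "no entry found" is an assumption, and the function name `mismatch_deltas` is hypothetical.

```python
import re

# Matches "(a, b) != (c, d)", tolerating a line wrap around "!=".
PAIR = re.compile(r"\((-?\d+), (-?\d+)\)\s*!=\s*\((-?\d+), (-?\d+)\)")


def mismatch_deltas(check_output):
    """Return (left, right, delta) for every mismatch pair in the text.

    delta is the difference of the first numbers, or None when the right
    pair is (-1, -1) (assumed here to mean "no entry").
    """
    results = []
    for match in PAIR.finditer(check_output):
        a, b, c, d = map(int, match.groups())
        left, right = (a, b), (c, d)
        delta = None if right == (-1, -1) else a - c
        results.append((left, right, delta))
    return results


# The seven pairs from the check output above, keys omitted.
SAMPLE = """\
(284324, 3431159) != (-1, -1)
(896973, 2792174) != (-1, -1)
(64814, 3158785) != (60718, 3158785)
(932547, 1968998) != (928451, 1968998)
(151909, 424462) != (147813, 424462)
(31049, 4234035) != (26953, 4234035)
(50865, 102435) != (-1, -1)
"""

if __name__ == "__main__":
    for left, right, delta in mismatch_deltas(SAMPLE):
        print(left, right, delta)
```

For this run, the four pairs that have values on both sides agree in the second number and differ by exactly 4096 in the first. That is an observation about the printed numbers, not a diagnosis.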


7.  Repository repair
---------------------

I tried running `attic check --repair` to see if it helped at all - it
didn't. I still have all the original data, so running potentially-risky
operations is not a problem whilst trying to find the cause of the problem.

Note that the first backup appears to be fine in this check; the error seems
to be encountered in the second backup.

# date; HOME=/mnt/lsi0/.attic time attic check --repair
/mnt/lsi0/backup/local/tardis/attic
Tue 24 Mar 12:49:21 GMT 2015
attic: Warning: 'check --repair' is an experimental feature that might
result
in data loss.
  
Type "Yes I am sure" if you understand this and want to continue.
  
Do you want to continue? Yes I am sure
Starting repository check...
Repository check complete, no problems found.
Starting archive consistency check...
Analyzing archive tardis/2015-03-16 (1/2)
Analyzing archive tardis/2015-03-21 (2/2)
attic: Error: Data integrity error
Command exited with non-zero status 1
10177.83user 2157.50system 6:21:42elapsed 53%CPU (0avgtext+0avgdata
13378152maxresident)k
11953745528inputs+19954344outputs (35major+10261162minor)pagefaults 0swaps


8.  Remove second backup
------------------------

No errors reported...

# date; HOME=/mnt/lsi0/.attic time attic delete
/mnt/lsi0/backup/local/tardis/attic::tardis/2015-03-21; date
Thu 26 Mar 10:11:23 GMT 2015
318.83user 30.28system 9:19.93elapsed 62%CPU (0avgtext+0avgdata
11369940maxresident)k
41607640inputs+42205112outputs (34major+3335066minor)pagefaults 0swaps
Thu 26 Mar 10:20:43 GMT 2015


9.  List backup sets
--------------------

...and yup, it's gone.

# date; HOME=/mnt/lsi0/.attic time attic list
/mnt/lsi0/backup/local/tardis/attic
Thu 26 Mar 16:22:00 GMT 2015
tardis/2015-03-16                    Fri Mar 20 16:43:56 2015
0.18user 1.74system 0:02.37elapsed 81%CPU (0avgtext+0avgdata
5269672maxresident)k
10984inputs+0outputs (27major+1318418minor)pagefaults 0swaps


10. Repository check
--------------------

No problems this time, confirming that the problem occurred in the second
backup set.

# date; HOME=/mnt/lsi0/.attic time attic check
/mnt/lsi0/backup/local/tardis/attic
Thu 26 Mar 16:22:22 GMT 2015
Starting repository check...
Repository check complete, no problems found.
Starting archive consistency check...
Analyzing archive tardis/2015-03-16 (1/1)
Archive consistency check complete, no problems found.
9785.26user 2155.90system 6:16:57elapsed 52%CPU (0avgtext+0avgdata
13378892maxresident)k
11941453632inputs+8outputs (0major+8857085minor)pagefaults 0swaps


11. Second backup (again)
-------------------------

Once more, no problems reported. Not much difference in the changed files in
between the two occasions this was done.

# date; HOME=/mnt/lsi0/.attic time attic create --stats
/mnt/lsi0/backup/local/tardis/attic::tardis/$(date +%F) /etc /mnt/lsi0/git
/mnt/lsi0/shares /root; date
Fri 27 Mar 12:12:30 GMT 2015
------------------------------------------------------------------------------
Archive name: tardis/2015-03-27
Archive fingerprint: 99b56abb3413a7dd88a08796994cbb38cbd242ce08d6993785211caac59bb918
Start time: Fri Mar 27 12:12:31 2015
End time: Fri Mar 27 13:03:57 2015
Duration: 51 minutes 26.51 seconds
Number of files: 4384741

                       Original size      Compressed size    Deduplicated size
This archive:                7.81 TB              6.97 TB            114.95 MB
All archives:               15.61 TB             13.94 TB              6.10 TB
------------------------------------------------------------------------------
982.53user 330.76system 51:30.23elapsed 42%CPU (0avgtext+0avgdata
16568576maxresident)k
234467296inputs+407040208outputs (349major+19967580minor)pagefaults 0swaps
Fri 27 Mar 13:04:01 GMT 2015


12. Repository check
--------------------

...and the error's back. Interestingly, last time there were seven
mismatches on the backup set, and this time there is just one. Still,
something's going wrong!

# date; HOME=/mnt/lsi0/.attic time attic check
/mnt/lsi0/backup/local/tardis/attic
Fri 27 Mar 13:30:13 GMT 2015
Starting repository check...
Index mismatch for key b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'. (-4097, 0) != (1153907, 3864090)
attic: Exiting with failure status due to previous errors
Command exited with non-zero status 1
8989.32user 2143.78system 6:00:53elapsed 51%CPU (0avgtext+0avgdata
13395500maxresident)k
11931614400inputs+0outputs (32major+4071397minor)pagefaults 0swaps


That's all of the useful information I've got at present. Anything else I
should try? Any details I should investigate? I checked the branch that
Thomas has been merging things into, and there's nothing that would appear
to address this, so I have stuck with the official master branch.

Cheers

Dan


Re: [attic] Data integrity error due to index mismatch for key

From:
Thomas Waldmann
Date:
2015-04-01 @ 15:51
>   1.   Create a new repository from a large data set (I used 7.2TB,
>   across 4m files and 5m entities).

Hmm, testing with this data set must be really slow for you.

To ease the pain a little (in case you do not want to just use merge-all :D
), I could offer a trivial 1-line patch that modifies the compression in
attic 0.14 to be either zlib level 0 (== no compression, fast) or 1 (low
compression, quite fast). Would only affect backup, restore does
auto-detect the compression and uncompresses correctly no matter what level
it was.
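
The point that the compression level only matters on the writing side can be shown with Python's standard `zlib` module. This is a standalone illustration, not the patch mentioned above, and the helper name `roundtrip_sizes` is hypothetical.

```python
import zlib


def roundtrip_sizes(data, levels=(0, 1, 6)):
    """Compress data at each level, verify the round trip, return the sizes."""
    sizes = {}
    for level in levels:
        packed = zlib.compress(data, level)
        # decompress() takes no level argument: the stream describes itself,
        # so data written at any level is read back the same way.
        if zlib.decompress(packed) != data:
            raise ValueError(f"round trip failed at level {level}")
        sizes[level] = len(packed)
    return sizes


if __name__ == "__main__":
    sample = b"attic chunk data " * 4096
    print(len(sample), roundtrip_sizes(sample))
```

Level 0 stores the data uncompressed with a little framing overhead, so its output is slightly larger than the input; level 1 trades compression ratio for speed.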

It won't be as fast as the merge-all branch (which has high-speed
parallelized lz4 compression), but I think it could maybe double the speed
of the initial backup.

> 5.  Extraction (full)
> ---------------------
>
> # date; HOME=/mnt/lsi0/.attic time attic extract
> /mnt/lsi0/backup/local/tardis/attic::tardis/2015-03-21; date
> Mon 23 Mar 13:58:12 GMT 2015
> attic: Error: Data integrity error
> Command exited with non-zero status 1
>

One would need a lot more info here. Could provide a minimal patch that
gives a full traceback (if you don't want to use ...).


> 6.  Repository check
> --------------------
> # date; HOME=/mnt/lsi0/.attic time attic check
> /mnt/lsi0/backup/local/tardis/attic
> Mon 23 Mar 17:42:37 GMT 2015
> Starting repository check...
> Index mismatch for key
> b'\xa3*;y\xe2\x9c\xabw\x9f-f\x15\x8cG\xf1\xc3\xe6\xa8\x88X
> \x81z\xf5\x8e\xfb-\x12\x1b\xda\xafL'. (284324, 3431159) != (-1, -1)
>

Strange. :|

> 12. Repository check
> --------------------
> # date; HOME=/mnt/lsi0/.attic time attic check
> /mnt/lsi0/backup/local/tardis/attic
> Fri 27 Mar 13:30:13 GMT 2015
> Starting repository check...
> Index mismatch for key
>
> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x
> 00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'. (-4097, 0) !=
> (1153907, 3864090)
> attic: Exiting with failure status due to previous errors
>

That one is very strange. The all-zero key is special and means the
repository's manifest.
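
Since the keys are printed as Python bytes literals, they can be parsed back to check their length and content. A small sketch, with `parse_key` as a hypothetical helper; the 32-byte length is read off the quoted output rather than taken from Attic's source.

```python
import ast


def parse_key(literal):
    """Parse a key printed as a Python bytes literal, e.g. b'\\x00\\x00'."""
    key = ast.literal_eval(literal)
    if not isinstance(key, bytes):
        raise ValueError("not a bytes literal")
    return key


# The key from check #12: thirty-two "\x00" escapes.
ZERO_KEY = "b'" + r"\x00" * 32 + "'"

# The first key from check #6. The mail wrapped it right after "X"; joining
# the two halves with a single space gives 32 bytes like the other keys.
FIRST_KEY = (r"b'\xa3*;y\xe2\x9c\xabw\x9f-f\x15\x8cG\xf1\xc3\xe6\xa8\x88X"
             r" \x81z\xf5\x8e\xfb-\x12\x1b\xda\xafL'")

if __name__ == "__main__":
    print(len(parse_key(ZERO_KEY)), parse_key(ZERO_KEY) == bytes(32))
    print(len(parse_key(FIRST_KEY)))
```

Parsing the literal avoids counting escape sequences by eye, which is error-prone once the mail client has wrapped the line.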

Re: [attic] Data integrity error due to index mismatch for key

From:
Dan Williams
Date:
2015-04-01 @ 16:32
Thomas, if I use the merge-all branch, will that give additional 
trace-back information that will help you?

 

If I use it, will it be compatible with my existing repository? Or will I 
have to do a new backup? (I will do a new backup with it in any case, to 
compare performance, but I would like to use it against the repository 
that I am having trouble with.)

 

I will be able to test it in a day or two once the current extract attempt
has finished or failed. Due to the size of the data set, almost everything
I do with it involves starting a process (in screen!) and then coming back
to check on it every day or so!

 

 


Re: [attic] Data integrity error due to index mismatch for key

From:
Thomas Waldmann
Date:
2015-04-01 @ 21:17
> Thomas, if I use the merge-all branch, will that give additional
> trace-back information that will help you?

At some places yes (esp. if using a remote repo the remote side raises
exceptions), at others: no.



> If I use it, will it be compatible with my existing repository?
>

It should be able to read "legacy" attic repos, but as soon as you write to
them, "legacy" attic will no longer be able to work with them (merge-all
attic uses a new, more flexible chunk header format by default).

Here is the patch against master for showing tracebacks:


https://github.com/ThomasWaldmann/attic/commit/14d91a25fcfc68a16cfb29e8eb0ab203d4d236c0

Re: [attic] Data integrity error due to index mismatch for key

From:
Dan Christensen
Date:
2015-04-01 @ 17:03
Dan,

I wonder if it's worth trying to trigger the problem with a smaller data
set, e.g. 1, 2 or 4TB.  If you can reproduce it with a smaller data set,
that will reduce the time needed for debugging and testing proposed
fixes.  Also, since the merge-all branch is much faster (especially with
appropriate choices of compressor) that will help with the debug cycle
too, if the bug is still triggered there.

Dan


Re: [attic] Data integrity error due to index mismatch for key

From:
Dan Williams
Date:
2015-04-01 @ 17:32
Very true. Do we have any theories as to why the issue is occurring, which
could lead to a guess as to the data set size to try? Otherwise, it may take
longer hunting in the dark for the trigger size... even by bisecting until
found :o)

Worth a try though...

Re: [attic] Data integrity error due to index mismatch for key

From:
Dan Williams
Date:
2015-04-06 @ 10:53
An update on this: the extraction of backup set #1 succeeded with no errors.
Trying to extract backup set #2 again still fails, and running a repository
check also still fails. So it is definitely set #2 that is causing the
problem.

Unless there are any other ideas, I think the next thing to try will either
be a) Thomas's branch, or b) trying different repository sizes. Thomas's
branch will apparently give more helpful errors, but will make the
repository incompatible with official Attic. Whereas, trying different
repository sizes may take quite some time...

I am not sure which direction to go for?

Re: [attic] Data integrity error due to index mismatch for key

From:
Yuri D'Elia
Date:
2015-04-06 @ 11:45
On 04/06/2015 12:53 PM, Dan Williams wrote:
> Unless there are any other ideas, I think the next thing to try will either
> be a) Thomas's branch, or b) trying different repository sizes. Thomas's
> branch will apparently give more helpful errors, but will make the
> repository incompatible with official Attic. Whereas, trying different
> repository sizes may take quite some time...
> 
> I am not sure which direction to go for?

I'm following this on and off unfortunately, but I'd try to reduce the
repository size and keep changes to a minimum.

I don't know your setup, but I would just write a quick script to
perform both steps. If the results fail, move away 10% of the space into
a different tree, and try again until the backup/restore succeeds.
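
The shrink-and-retry loop described above might look like the following. It is only a sketch: `cycle_passes` is a hypothetical callback that would run backup, second backup and check on a data set of the given size, and taking each step as 10% of the original size is one reading of "move away 10% of the space".

```python
def shrink_until_ok(total_size, cycle_passes, step_fraction=0.10):
    """Shrink the data set step by step until the backup/check cycle passes.

    Returns (last_failing_size, first_passing_size). Either element is None
    if no failing or no passing size was seen.
    """
    step = max(1, int(total_size * step_fraction))
    size = total_size
    last_failing = None
    while size > 0:
        if cycle_passes(size):
            return last_failing, size
        last_failing = size
        size -= step  # move another slice of the data out of the tree
    return last_failing, None


if __name__ == "__main__":
    # Stand-in for the real cycle: pretend anything under 5000 GB works.
    print(shrink_until_ok(7200, lambda size_gb: size_gb < 5000))
```

The pair it returns brackets the trigger size, which could then be narrowed further by bisecting between the two values.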

I would have some servers where I could run this, but unfortunately they
don't have the space necessary.

Re: [attic] Data integrity error due to index mismatch for key

From:
Petter Gunnerud
Date:
2015-04-01 @ 20:00
I'm also curious if this is a total repo size issue or a one-run size issue.

A way to test this with your 7TB dataset could be:

1) figure out the order folders on the initial failed backup were scanned
2) move away all data in /dataset/* except for /dataset/firstfolder
3) backup /dataset
4) do minor changes to /dataset/firstfolder
5) second backup /dataset
6) check repo
7) move /tempdatasetlocation/secondfolder to /dataset/secondfolder
8) backup /dataset
9) do minor changes to /dataset/secondfolder
10) second backup /dataset
11) check repo
12) move /tempdatasetlocation/thirdfolder to /dataset/thirdfolder
....
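
The incremental procedure in this list amounts to adding one folder at a time and stopping at the first failing check. A rough sketch, where `check_passes` is a hypothetical callback standing in for the backup, minor change, second backup and check steps run on the folders currently present:

```python
def first_failing_folder(folders, check_passes):
    """Add folders one at a time, in scan order, until the check fails.

    Returns the folder whose addition first made the check fail, or None
    if every step passed.
    """
    present = []
    for folder in folders:
        present.append(folder)  # "move .../folder to /dataset/folder"
        if not check_passes(tuple(present)):
            return folder
    return None


if __name__ == "__main__":
    order = ["firstfolder", "secondfolder", "thirdfolder"]
    # Stand-in check: pretend the repo breaks once thirdfolder is included.
    print(first_failing_folder(order, lambda present: "thirdfolder" not in present))
```

On its own this cannot tell whether the returned folder's content or the total size reached at that point is the trigger; repeating the run with a different folder order would help separate the two.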

My wild guess on what's causing the issue is one (or more) of the special 
files. What are they? A test that would be interesting is to do the second
backup to currently failed repo excluding the special files.

(A question that came to my mind now was: What is the expected behavior if
attic is set to backup and restore /dev/urandom ?)

PG


    


Re: [attic] Data integrity error due to index mismatch for key

From:
Dan Williams
Date:
2015-04-01 @ 20:36
It may well be worth my building up the size incrementally. It could also
turn out to be rather time-consuming. I'll add it to the list :o)

 

I don't believe there are any special files in this data set, beyond sparse
files. I am only backing up data directories - it's not a full system backup
or anything. But doing the steps you mention could help to establish that.

 

Regarding your /dev/urandom question, I am under the impression that Attic
recognises dev-type files, i.e. actual Linux devices, although I haven't
personally tested it. It would seem odd if no-one had attempted to back up an
entire filesystem whilst testing during development? But I don't know what
it *does* with the files - I assume/expect that it does not, for instance,
read the contents of the device (based on my observation of behaviour so
far). But does it ignore it? Does it restore it? Hmmmm. I can test tomorrow,
but otherwise I'm sure Thomas knows. :o)

 

 

 




Re: [attic] Data integrity error due to index mismatch for key

From:
Thomas Waldmann
Date:
2015-04-01 @ 20:48
> Regarding your /dev/urandom question, I am under the impression that
> Attic recognises dev-type files, i.e. actual Linux devices,

Sure it does. The special files are metadata-only, there is no content that
is backed up (especially: it does not try to actually read from e.g.
/dev/random).
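
To illustrate what "metadata-only" means for a device node, the sketch below classifies a path from its stat result without ever opening it. This is not Attic's code; `file_kind` and `describe` are hypothetical names, and `os.major`/`os.minor` assume a Unix system.

```python
import os
import stat


def file_kind(mode):
    """Classify a file from its st_mode bits alone."""
    if stat.S_ISCHR(mode):
        return "char device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


def describe(path):
    """Collect the metadata a backup could record without reading content."""
    st = os.lstat(path)  # lstat only: the file is never opened or read
    return {
        "kind": file_kind(st.st_mode),
        "mode": stat.filemode(st.st_mode),
        "rdev": (os.major(st.st_rdev), os.minor(st.st_rdev)),
    }


if __name__ == "__main__":
    print(describe("/dev/urandom"))
```

A device node is fully described by its type, permissions, ownership and major/minor numbers, so recreating it on restore needs no content.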

> It would seem odd if no-one had attempted to backup an entire filesystem
> whilst testing during development?

I did full-system backups already, didn't see anything special regarding
"special files".

Re: [attic] Data integrity error due to index mismatch for key

From:
Dan Williams
Date:
2015-04-01 @ 17:27
Note: This issue is now on the GitHub issue list for "original" Attic as
issue #264.

https://github.com/jborg/attic/issues/264

Re: [attic] Data integrity error due to index mismatch for key

From:
Petter Gunnerud
Date:
2015-04-01 @ 10:14
Hi Dan
Are you able to restore from initial backup? Does it matter if you do the 
restore from initial backup before or after deleting the second backup?

I'm thinking, if initial backup has failed in a way that is not discovered
by the backup processing, nor the check processing, looking for issues in 
the second backup is going down the wrong path.
PG






   

Re: [attic] Data integrity error due to index mismatch for key

From:
Dan Williams
Date:
2015-04-01 @ 10:39
Hi Petter

 

That's a very good point. My workflow was backup -> another backup (to sync)
-> restore -> diff, because this would allow me to verify integrity. So when
I hit the problem with the second backup, I stopped, as there was no way for
me to verify integrity of the whole set any more.
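
The final "diff" step of that workflow could be done with Python's `filecmp` module. A minimal sketch, assuming a byte-for-byte content comparison is what is wanted; `differing_files` is a hypothetical helper, and it does not compare permissions, ownership, timestamps or symlink targets.

```python
import filecmp
import os


def differing_files(left, right):
    """Return sorted relative paths that differ or exist on one side only."""
    problems = []

    def walk(rel):
        # ignore=[] so that directories such as .git are compared as well
        # (dircmp skips them by default).
        cmp = filecmp.dircmp(os.path.join(left, rel),
                             os.path.join(right, rel), ignore=[])
        one_sided = cmp.left_only + cmp.right_only + cmp.common_funny
        problems.extend(os.path.join(rel, name) for name in one_sided)
        # shallow=False forces a content comparison instead of trusting
        # size and modification time.
        _, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False)
        problems.extend(os.path.join(rel, name) for name in mismatch + errors)
        for sub in cmp.common_dirs:
            walk(os.path.join(rel, sub))

    walk("")
    return sorted(problems)


if __name__ == "__main__":
    import sys
    for path in differing_files(sys.argv[1], sys.argv[2]):
        print(path)
```

An empty result means every file present in both trees has identical content and neither tree has extra entries.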

 

I did try extracting a subset of the first backup, and that was successful.
But if there is a problem area, that may have missed it.

 

I will now start an extract on the entirety of the first backup, just to see
if it completes successfully. That will however take a couple of days!

 

I will then create the second backup again, and once more extract the first
backup, to see if that has an error, as you have suggested.

 

Cheers

 

Dan

 

 

 


 

 

 

 

Re: [attic] Data integrity error due to index mismatch for key

From:
Petter Gunnerud
Date:
2015-04-01 @ 14:50
Another way to test if the subset extracted from initial backup covered
the problem area could be to extract the same subset from second backup.
Such a test would not cover the case of multiple problem areas, though.

If I remember right from your first mail, you got the restore error after
about two hours. If your new test runs for more than two hours, that could
be the first sign that initial backup is ok.

May I ask what kind of data you're testing on? My understanding is that
those 7.2TB include sparse files. Are those 7.2TB including all unused
parts of the sparse files or just their size on disk?

PG
  

Re: [attic] Data integrity error due to index mismatch for key

From:
Dan Williams
Date:
2015-04-01 @ 15:10
Hi Petter

 

That’s a good question.

 

The 7.2TB backup is a mish-mash of different types of data: Git 
repositories, large media files, documents, software, all kinds of stuff. 
I decided not to run any tests specifically against the array that holds 
the virtual machines, because if Attic doesn’t handle sparse files it is 
not suitable for me to use for that purpose. However, now you’ve asked the
question, I realise that there are some backup images in the filesystem I 
am testing, and they are sparse. Not many, but some, certainly.

 

The 7.2TB therefore represents the actual size used on disk of all the 
files. A quick poke around suggests that it may be around 7.8TB if the 
unused portions of the sparse files are included. I would have to spend a 
bit longer checking to explicitly verify that.
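
The two figures (size used on disk versus size including the holes in sparse files) can be measured directly from stat data. A sketch under the assumption of a Linux system, where `st_blocks` counts 512-byte units; `file_sizes` and `tree_sizes` are hypothetical helpers, and hard-linked files are counted once per link.

```python
import os


def file_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for one file."""
    st = os.lstat(path)
    # st_size includes holes; st_blocks counts what is actually allocated
    # (in 512-byte units on Linux, which is assumed here).
    return st.st_size, st.st_blocks * 512


def tree_sizes(root):
    """Sum apparent and allocated sizes over all files below root."""
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            size, used = file_sizes(os.path.join(dirpath, name))
            apparent += size
            allocated += used
    return apparent, allocated


if __name__ == "__main__":
    import sys
    apparent, allocated = tree_sizes(sys.argv[1])
    print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

A large gap between the two totals indicates how much of the tree consists of holes in sparse files.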

 

I am currently running an extract on the first backup set, without 
deleting the second. If all goes well this would show that it is just the 
second backup set causing the problem. If it fails, I will delete the 
second backup set and try again. So far it has been running for over three
hours – no problems yet, but the problem occurred at 1h12m the first time,
and 6h00m the second time, so it’s still too early to tell yet. I am 
unsure why there were seven mismatched keys the first time and only one 
the second.

 

Dan

 

 

 
