librelist archives

exclude patterns

From:
Andreas Olsson
Date:
2015-01-06 @ 12:42
Greetings

While I love most things about attic, its support for exclude patterns
does feel a bit limited.

For example, I can't seem to find a way to exclude '/home/<anyuser>/.cache'
without also excluding '/home/<anyuser>/foo/bar/.cache'.
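
To make the problem concrete, here is a minimal sketch. It assumes that
attic's exclude patterns behave like Python's fnmatch module, where '*'
also matches path separators (an assumption, not verified against attic's
source). The user name "alice" is made up for illustration.

```python
import fnmatch

# Assumption: the exclude pattern is matched the way fnmatch does it,
# i.e. '*' matches any run of characters, including '/'.
pattern = "/home/*/.cache"

paths = [
    "/home/alice/.cache",          # the directory we want to exclude
    "/home/alice/foo/bar/.cache",  # a nested directory we want to keep
]

# fnmatchcase avoids platform-dependent case normalisation.
matches = {path: fnmatch.fnmatchcase(path, pattern) for path in paths}

for path, hit in matches.items():
    print(path, "excluded" if hit else "kept")
```

Under that assumption both paths are excluded, which is exactly the
unwanted behaviour described above.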

Personally I kind of like the way some software makes the distinction
between '*' and '**', where the single '*' does stop for path separators
while the double '**' has no such limitation. Yet, right or wrong, I
doubt that it's desirable to change existing matching rules.
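
The '*' versus '**' distinction could be sketched roughly as follows. This
is only an illustration of the idea, not code from attic or any other
tool; the function name glob_to_regex is hypothetical, and '[]' character
selectors are deliberately left out to keep it short.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob where '*' stops at '/' and '**' does not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # '**' may cross path separators
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # '*' stays within one path component
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # '?' matches one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # \Z anchors the match at the end of the path.
    return re.compile("".join(parts) + r"\Z")


single = glob_to_regex("/home/*/.cache")
double = glob_to_regex("/home/**/.cache")
```

With these rules, single.match() accepts '/home/alice/.cache' but rejects
'/home/alice/foo/bar/.cache', while double.match() accepts both.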

How do we feel about having a new --exclude-regexp option, allowing for
the use of regular expressions to specify one's excludes? While my Python
skills are somewhat limited, I'd be willing to take a shot at
implementing such an option.
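
A rough sketch of what such an option might look like, using only the
standard library. Everything here is hypothetical: the program name, the
helper names regexp() and is_excluded(), and the choice of re.search()
over re.match() are assumptions for illustration, not attic's actual
command-line code.

```python
import argparse
import re


def regexp(text):
    """argparse type: compile a regexp, reporting bad ones as usage errors."""
    try:
        return re.compile(text)
    except re.error as exc:
        raise argparse.ArgumentTypeError(f"invalid regular expression: {exc}")


def build_parser():
    # Hypothetical stand-in for the real argument parser.
    parser = argparse.ArgumentParser(prog="backup-sketch")
    parser.add_argument(
        "--exclude-regexp",
        dest="exclude_regexps",
        action="append",     # the option may be given several times
        default=[],
        type=regexp,
        metavar="REGEXP",
        help="skip paths matching this regular expression",
    )
    return parser


def is_excluded(path, regexps):
    """True if any of the compiled regexps matches somewhere in path."""
    return any(rx.search(path) for rx in regexps)


# Exclude each user's top-level .cache directory and everything below it.
args = build_parser().parse_args(
    ["--exclude-regexp", r"^/home/[^/]+/\.cache(/|$)"]
)
```

With this pattern, '/home/alice/.cache' and anything below it is
excluded, while '/home/alice/foo/bar/.cache' is kept.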

Of course, what I'm really hoping for is that I have underestimated the
existing exclude patterns, and that someone can set me straight in
regards to my initial example.

// Andreas

Re: [attic] exclude patterns

From:
SanskritFritz
Date:
2015-01-07 @ 08:40
On Tue, Jan 6, 2015 at 1:42 PM, Andreas Olsson <andreas@arrakis.se> wrote:

>
> How do we feel about having a new --exclude-regexp option, allowing for
> the use of regular expressions to specify one's excludes? While my Python
> skills are somewhat limited, I'd be willing to take a shot at
> implementing such an option.
>

I'd vote for that. Clean and backwards compatible.

Re: [attic] exclude patterns

From:
SanskritFritz
Date:
2015-01-06 @ 13:16
On Tue, Jan 6, 2015 at 1:42 PM, Andreas Olsson <andreas@arrakis.se> wrote:

>
> For example, I can't seem to find a way to exclude '/home/<anyuser>/.cache'
> without also excluding '/home/<anyuser>/foo/bar/.cache'.
>

You can use the --exclude-caches option as a workaround. Of course you have
to mark those directories first. Yes, one by one or via script.
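
The "via script" part might look something like the sketch below. It
writes a CACHEDIR.TAG file, starting with the signature line from the
Cache Directory Tagging Specification, into each directory matching a
glob. The function name tag_cache_dirs is hypothetical. Python's glob
module is used because its '*' does not cross path separators.

```python
import glob
import os

# Signature line required by the Cache Directory Tagging Specification.
SIGNATURE = "Signature: 8a477f597d28d172789f06886806bc55\n"


def tag_cache_dirs(pattern):
    """Write a CACHEDIR.TAG file into every directory matching pattern."""
    tagged = []
    for directory in sorted(glob.glob(pattern)):
        if not os.path.isdir(directory):
            continue
        tag = os.path.join(directory, "CACHEDIR.TAG")
        if not os.path.exists(tag):   # leave existing tag files alone
            with open(tag, "w") as handle:
                handle.write(SIGNATURE)
        tagged.append(directory)
    return tagged
```

Calling tag_cache_dirs("/home/*/.cache") as a user with write access to
those directories would tag each user's top-level .cache directory, and
nothing deeper.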

Re: [attic] exclude patterns

From:
Andreas Olsson
Date:
2015-01-06 @ 13:39
tis 2015-01-06 klockan 14:16 +0100 skrev SanskritFritz:
> On Tue, Jan 6, 2015 at 1:42 PM, Andreas Olsson <andreas@arrakis.se> wrote:
> >
> > For example, I can't seem to find a way to exclude '/home/<anyuser>/.cache'
> > without also excluding '/home/<anyuser>/foo/bar/.cache'.
> >
> 
> You can use the --exclude-caches option as a workaround. Of course you have
> to mark those directories first. Yes, one by one or via script.

I had a bit of a discussion with myself about whether I should bring up
the --exclude-caches option in the initial mail or not :-)

While I can see the CACHEDIR.TAG file being useful, I'm not convinced of
it being enough of a general solution. Especially not when there are
other users involved, as well as additional paths.

// Andreas

Re: [attic] exclude patterns

From:
SanskritFritz
Date:
2015-01-06 @ 13:51
On Tue, Jan 6, 2015 at 2:39 PM, Andreas Olsson <andreas@arrakis.se> wrote:

> While I can see the CACHEDIR.TAG file being useful, I'm not convinced of
> it being enough of a general solution. Especially not when there are
> other users involved, as well as additional paths.
>

That is definitely true, I just proposed it as a workaround, because at
least that works. It would probably be better if the exclude patterns were
regular expressions.

Re: [attic] exclude patterns

From:
Stephen
Date:
2015-01-07 @ 13:49
I'd never actually tried using wildcards in the exclusion, however,
based on the problem you've described, I would love to see a regexp
solution implemented :)

On Tue, 06 Jan 2015, SanskritFritz wrote:

> On Tue, Jan 6, 2015 at 2:39 PM, Andreas Olsson <andreas@arrakis.se> wrote:
> 
> > While I can see the CACHEDIR.TAG file being useful, I'm not convinced of
> > it being enough of a general solution. Especially not when there are
> > other users involved, as well as additional paths.
> >
>
> That is definitely true, I just proposed it as a workaround, because at
> least that works. It would probably be better if the exclude patterns were
> regular expressions.

Re: [attic] exclude patterns

From:
Jools Wills
Date:
2015-01-11 @ 16:18
See

https://github.com/jborg/attic/issues/97#issuecomment-69495788

for two different patches to add ** / * to the pattern matching.

This changes existing functionality, but makes it more logical as I
think most people would expect * to work as it does in the shell / with
extended globbing and not match path separators also.

It also makes the functionality the same as other backup tools like
rdiff-backup that people may be migrating from.

Best Regards

Jools

Re: [attic] exclude patterns

From:
Andreas Olsson
Date:
2015-01-11 @ 18:09
sön 2015-01-11 klockan 16:18 +0000 skrev Jools Wills:
> https://github.com/jborg/attic/issues/97#issuecomment-69495788
> 
> for two different patches to add ** / * to the pattern matching.
> 
> This changes existing functionality, but makes it more logical as I
> think most people would expect * to work as it does in the shell / with
> extended globbing and not match path separators also.
> 
> It also makes the functionality the same as other backup tools like
> rdiff-backup that people may be migrating from.

I'd be more than happy with a solution based on * vs. **.

If nothing else because I'm one of those former rdiff-backup users :)

// Andreas

Re: [attic] exclude patterns

From:
Petros Moisiadis
Date:
2015-01-06 @ 18:17
On 01/06/2015 02:42 PM, Andreas Olsson wrote:
> Greetings
>
> While I love most things about attic, its support for exclude patterns
> does feel a bit limited.
>
> For example, I can't seem to find a way to exclude '/home/<anyuser>/.cache'
> without also excluding '/home/<anyuser>/foo/bar/.cache'.
>
> Personally I kind of like the way some software makes the distinction
> between '*' and '**', where the single '*' does stop for path separators
> while the double '**' has no such limitation. Yet, right or wrong, I
> doubt that it's desirable to change existing matching rules.
>
> How do we feel about having a new --exclude-regexp option, allowing for
> the use of regular expressions to specify one's excludes? While my Python
> skills are somewhat limited, I'd be willing to take a shot at
> implementing such an option.
>
> Of course, what I'm really hoping for is that I have underestimated the
> existing exclude patterns, and that someone can set me straight in
> regards to my initial example.
>
> // Andreas
>
>

Instead of having two different pattern formats, the current pattern
format could be extended a little to allow a '[]' character selector to
be repeated. '+' could be used for matching at least one occurrence of
the preceding '[]' character selector. For example, in your case you
would write: '/home/[!/]+/.cache'.
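
As an illustration of the proposal, here is a small sketch of a
translator that keeps the current shell-style semantics ('*' and '?'
match everything, including '/') and adds '+' as a repeat marker after a
'[]' selector. The function name translate is hypothetical, and this is
not attic's code.

```python
import re


def translate(pattern):
    """fnmatch-like translation plus a '+' repeat after a '[...]' selector."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        i += 1
        if ch == "*":
            out.append(".*")   # crosses '/', as in the current format
        elif ch == "?":
            out.append(".")
        elif ch == "[":
            # A ']' directly after '[' or '[!' is part of the selector.
            start = i + 1 if pattern.startswith("!", i) else i
            end = pattern.find("]", start + 1)
            if end == -1:
                out.append(re.escape(ch))   # unterminated '[' is literal
                continue
            body = pattern[i:end].replace("\\", "\\\\")
            if body.startswith("!"):
                body = "^" + body[1:]       # '[!...]' negates the selector
            elif body.startswith("^"):
                body = "\\" + body          # a leading '^' is literal
            out.append("[" + body + "]")
            i = end + 1
            if i < n and pattern[i] == "+":
                out.append("+")             # proposed: one or more repeats
                i += 1
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out) + r"\Z", re.DOTALL)


rx = translate("/home/[!/]+/.cache")
```

Here rx.match() accepts '/home/alice/.cache' but rejects
'/home/alice/foo/bar/.cache', while a plain '/home/*/.cache' pattern
still matches both, as it does today.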

Also, I doubt there is currently anyone using a pattern that looks like
the above, so I guess nobody would be hurt.

Re: [attic] exclude patterns

From:
Andreas Olsson
Date:
2015-01-06 @ 18:31
tis 2015-01-06 klockan 20:17 +0200 skrev Petros Moisiadis:
> Instead of having two different pattern formats, the current pattern
> format could be extended a little to allow a '[]' character selector to
> be repeated. '+' could be used for matching at least one occurrence of
> the preceding '[]' character selector. For example, in your case you
> would write: '/home/[!/]+/.cache'.

...and suddenly we end up having something which is kind of like regexps,
except that the * asterisk still performs simple pattern matching. Not
like that will confuse anyone? :-)

// Andreas

Re: [attic] exclude patterns

From:
Petros Moisiadis
Date:
2015-01-07 @ 07:21
On 01/06/2015 08:31 PM, Andreas Olsson wrote:
> tis 2015-01-06 klockan 20:17 +0200 skrev Petros Moisiadis:
>> Instead of having two different pattern formats, the current pattern
>> format could be extended a little to allow a '[]' character selector to
>> be repeated. '+' could be used for matching at least one occurrence of
>> the preceding '[]' character selector. For example, in your case you
>> would write: '/home/[!/]+/.cache'.
> ...and suddenly we end up having something which is kind of like regexps,
> except that the * asterisk still performs simple pattern matching. Not
> like that will confuse anyone? :-)
>
> // Andreas

The current pattern format is not like regexps. It's very limited in
comparison to them. It's actually shell-style wildcards for Unix
filename pattern matching, with '*' and '?' matching all characters,
even path separators. I think my suggestion still keeps this simple
enough, as it adds just one more special character, '+', to repeat a
preceding '[]' character selector one or more times. That would be
enough to catch your (popular) case, and possibly some other (corner)
cases.

Re: [attic] exclude patterns

From:
Heiko
Date:
2015-01-07 @ 06:59
> ...and suddenly we end up having something which is kind of like regexps,
> except that the * asterisk still performs simple pattern matching. Not
> like that will confuse anyone? :-)

*if* it is extended, how about adopting a syntax that's already there, 
understood and widely used?

there's a few that come to mind:
* bash globbing
* tar (which looks very similar to bash)
* rsync (widely used and very versatile (the ** comes from there))
* duplicity (again not much difference to rsync)

I'd prefer one of these over anything new, unless there's a good reason to
invent something new.

Regexps are fine too though.

Best Regards
 Heiko

Re: [attic] exclude patterns

From:
Petros Moisiadis
Date:
2015-01-07 @ 07:44
On 01/07/2015 08:59 AM, heiko.helmle@horiba.com wrote:
> > ...and suddenly we end up having something which is kind of like regexps,
> > except that the * asterisk still performs simple pattern matching. Not
> > like that will confuse anyone? :-)
>
> *if* it is extended, how about adopting a syntax that's already there,
> understood and widely used?
>
> there's a few that come to mind:
> * bash globbing
Bash globbing has the opposite limitation. '/home/*/.cache' matches only
one level. You can't match deeper paths recursively.

> * tar (which looks very similar to bash)
It does shell-style globbing and has an extra option to make wildcards
not match (or match) '/'. It does the job, but I don't think an extra
option is needed.

>
> * rsync (widely used and very versatile (the ** comes from there))
> * duplicity (again not much difference to rsync)
>
Rsync filtering rules and patterns are very powerful, but I guess they
are tricky to implement right. Also they would break current usage.
Maybe just borrowing the */** notation would be enough, but that would
break current usage too.

> I'd prefer one of these over anything new, unless there's a good
> reason to invent something new.
>
> Regexps are fine too though.
>
> Best Regards
>  Heiko