librelist archives

pruning

From: Dan Christensen
Date: 2014-02-02 @ 01:29
I was just looking at the prune_split code to understand how it works,
and I have a question:

def prune_split(archives, pattern, n, skip=[]):
    items = {}
    keep = []
    for a in archives:
        key = to_localtime(a.ts).strftime(pattern)
        items.setdefault(key, [])
        items[key].append(a)
    for key, values in sorted(items.items(), reverse=True):
        if n and values[0] not in skip:
            values.sort(key=attrgetter('ts'), reverse=True)
            keep.append(values[0])
            n -= 1
    return keep

I was wondering if the four lines inside the last loop should instead
read:

        if n:
            values.sort(key=attrgetter('ts'), reverse=True)
            if values[0] not in skip:
                keep.append(values[0])
                n -= 1

(In fact, it would be slightly more efficient to use:

        values.sort(key=attrgetter('ts'), reverse=True)
        if values[0] not in skip:
            keep.append(values[0])
            n -= 1
            if n == 0: break
)
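
To make the difference concrete, here is a minimal sketch (not Attic's actual code). The `Archive` namedtuple, `group_by_period` and the two function names are hypothetical, and naive datetimes stand in for `to_localtime(a.ts)`. The two orderings only disagree when the list passed in is not already sorted newest-first:

```python
from collections import namedtuple
from datetime import datetime
from operator import attrgetter

# Hypothetical stand-in for an Attic archive; only .ts matters here.
Archive = namedtuple('Archive', 'name ts')


def group_by_period(archives, pattern):
    """Bucket archives by the strftime() key of their timestamp."""
    items = {}
    for a in archives:
        items.setdefault(a.ts.strftime(pattern), []).append(a)
    return items


def prune_split_check_first(archives, pattern, n, skip=()):
    """Ordering in the current code: test values[0] against skip, then sort."""
    keep = []
    groups = group_by_period(archives, pattern)
    for key, values in sorted(groups.items(), reverse=True):
        if n and values[0] not in skip:
            values.sort(key=attrgetter('ts'), reverse=True)
            keep.append(values[0])
            n -= 1
    return keep


def prune_split_sort_first(archives, pattern, n, skip=()):
    """Proposed ordering: sort, then test the newest archive against skip."""
    keep = []
    groups = group_by_period(archives, pattern)
    for key, values in sorted(groups.items(), reverse=True):
        if n:
            values.sort(key=attrgetter('ts'), reverse=True)
            if values[0] not in skip:
                keep.append(values[0])
                n -= 1
    return keep


morning = Archive('morning', datetime(2014, 2, 1, 9, 0))
evening = Archive('evening', datetime(2014, 2, 1, 18, 0))
unsorted = [morning, evening]  # oldest first, i.e. NOT sorted newest-first

before = prune_split_check_first(unsorted, '%Y-%m-%d', 1, skip=[evening])
after = prune_split_sort_first(unsorted, '%Y-%m-%d', 1, skip=[evening])
print([a.name for a in before])  # ['evening']: kept again despite being in skip
print([a.name for a in after])   # []
```

With the input sorted newest-first, both functions return the same result, so the discrepancy is only visible on unsorted input.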

Also, based on my understanding, maybe the output of prune -h could
contain a bit more detail.  For example:

  Prune repository archives according to specified rules.  That is,
  delete all archives except those selected by the rules.  As an
  example, "-d 7" means to keep the latest backup on each day for 7
  days.  Days without backups do not count towards the total.  The rules
  are applied from hourly to yearly, and backups selected by previous
  rules do not count towards those of later rules.  Dates and times are
  interpreted in the local timezone, and weeks go from Monday to Sunday.
  If a prefix is set with -p, then only archives that start with the
  prefix are considered for deletion and only those archives count
  towards the totals specified by the rules.
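
The rules in the paragraph above can be illustrated with a small sketch. It assumes that the rules are chained by passing the archives kept so far as `skip` to the next call. That is my reading of the `skip` parameter, not a quote of archiver.py. The `Archive` namedtuple and the sample data are hypothetical, and naive datetimes stand in for `to_localtime(a.ts)`:

```python
from collections import namedtuple
from datetime import datetime
from operator import attrgetter

# Hypothetical stand-in for an Attic archive.
Archive = namedtuple('Archive', 'name ts')


def prune_split(archives, pattern, n, skip=()):
    """Keep the newest archive of each period, up to n periods."""
    done = set()
    keep = []
    for a in sorted(archives, key=attrgetter('ts'), reverse=True):
        period = a.ts.strftime(pattern)
        if period not in done:
            done.add(period)
            if a not in skip:
                keep.append(a)
                if len(keep) == n:
                    break
    return keep


archives = [
    Archive('dec20', datetime(2013, 12, 20, 12)),
    Archive('jan15', datetime(2014, 1, 15, 12)),
    Archive('jan31', datetime(2014, 1, 31, 12)),
    Archive('feb01-am', datetime(2014, 2, 1, 9)),
    Archive('feb01-pm', datetime(2014, 2, 1, 18)),
    Archive('feb03', datetime(2014, 2, 3, 12)),  # no backup on Feb 2
]

keep = []
keep += prune_split(archives, '%Y-%m-%d', 2, keep)  # like "-d 2"
daily = [a.name for a in keep]
keep += prune_split(archives, '%Y-%m', 2, keep)     # like "-m 2"
monthly = [a.name for a in keep[len(daily):]]
deleted = [a.name for a in archives if a not in keep]

print(daily)    # ['feb03', 'feb01-pm']: the empty Feb 2 does not count
print(monthly)  # ['jan31', 'dec20']: February is already covered by the daily rule
print(deleted)  # ['jan15', 'feb01-am']
```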

On second thought, that may be a bit long for the default -h output.
Maybe each command should have a brief -h output, which says to 
specify -h -v for more details, and the longer description could
be what's shown for each command under the title Description at 

  https://pythonhosted.org/Attic/usage.html

?

Dan

Re: pruning

From: Dan Christensen
Date: 2014-02-02 @ 02:30
Maybe this is even cleaner?

def prune_split(archives, pattern, n, skip=[]):
    done = set()
    keep = []
    for a in sorted(archives, key=attrgetter('ts'), reverse=True):
        period = to_localtime(a.ts).strftime(pattern)
        if period not in done:
            done.add(period)
            if a not in skip:
                keep.append(a)
                if len(keep) == n: break
    return keep

If this looks good, I can try to come up with some unit tests and a
proper pull request, but I may need a bit of help.

Dan

Re: [attic] Re: pruning

From: Jonas Borgström
Date: 2014-02-02 @ 13:02
On 2014-02-02 03:30, Dan Christensen wrote:
> Maybe this is even cleaner?
> 
> def prune_split(archives, pattern, n, skip=[]):
>     done = set()
>     keep = []
>     for a in sorted(archives, key=attrgetter('ts'), reverse=True):
>         period = to_localtime(a.ts).strftime(pattern)
>         if period not in done:
>             done.add(period)
>             if a not in skip:
>                 keep.append(a)
>                 if len(keep) == n: break
>     return keep
> 
> If this looks good, I can try to come up with some unit tests and a
> proper pull request, but I may need a bit of help.

Yes, this definitely looks a lot cleaner. I think the reason the current
code works is because the archives list is already sorted by archiver.py
so it doesn't matter that values is sorted too late.

Anyway I think it looks good, let me know if you need help with writing
the unit tests.

/ Jonas

Re: [attic] Re: pruning

From: Jonas Borgström
Date: 2014-02-02 @ 14:08
On 2014-02-02 14:54, Dan Christensen wrote:
> Jonas Borgström <jonas@borgstrom.se> writes:
> 
>> Yes, this definitely looks a lot cleaner. I think the reason the
>> current code works is because the archives list is already sorted by
>> archiver.py so it doesn't matter that values is sorted too late.
> 
> I suspected that must be the case.  We could change prune_split to
> assume that the archives are sorted.  But maybe playing it safe is best.
> Here's my next iteration, which handles n == 0 correctly (which
> shouldn't matter, but is probably good practice) and avoids using a set:

Agreed, the code shouldn't assume the archive list is always sorted.
But please write "if n == 0: return keep" as two lines. I think it's
more readable that way.

> 
> def prune_split(archives, pattern, n, skip=[]):
>     last = None
>     keep = []
>     if n == 0: return keep
>     for a in sorted(archives, key=attrgetter('ts'), reverse=True):
>         period = to_localtime(a.ts).strftime(pattern)
>         if period != last:
>             last = period
>             if a not in skip:
>                 keep.append(a)
>                 if len(keep) == n: break
>     return keep
> 
> One question:  Both the original code and this version treat negative
> numbers as infinity.  Maybe this should be documented as a feature?
> E.g. "-m -1" means to keep one per month forever?

Heh, I didn't even know about that :) But if it's useful it should be
documented.

> Regarding tests, I was thinking of using a fake archive class like
> 
> class dummy_archive(object):
>     def __init__(self, ts):
>         self.ts = ts
>     def __repr__(self):
>         return repr(self.ts)
> 
> Would that be reasonable?

Yeah, just remember to use CamelCase when naming classes. Perhaps
MockArchive?
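
A possible shape for such tests, as a sketch only. The test case name and the sample timestamps are made up, `prune_split` is copied from the quoted version above with the early return written as two lines, and naive datetimes stand in for `to_localtime(a.ts)` so that the results do not depend on the local timezone:

```python
import unittest
from datetime import datetime
from operator import attrgetter


class MockArchive(object):
    """Minimal fake archive: prune_split only looks at .ts."""

    def __init__(self, ts):
        self.ts = ts

    def __repr__(self):
        return repr(self.ts)


def prune_split(archives, pattern, n, skip=()):
    last = None
    keep = []
    if n == 0:
        return keep
    for a in sorted(archives, key=attrgetter('ts'), reverse=True):
        period = a.ts.strftime(pattern)  # stand-in for to_localtime(a.ts)
        if period != last:
            last = period
            if a not in skip:
                keep.append(a)
                if len(keep) == n:
                    break
    return keep


class PruneSplitTestCase(unittest.TestCase):

    def setUp(self):
        # Deliberately unsorted: days 3, 1, 2, each with an 08:00 and a 20:00 archive.
        self.archives = [MockArchive(datetime(2014, 1, d, h))
                         for d in (3, 1, 2) for h in (8, 20)]

    def test_keeps_latest_archive_per_day(self):
        kept = prune_split(self.archives, '%Y-%m-%d', 2)
        self.assertEqual([a.ts for a in kept],
                         [datetime(2014, 1, 3, 20), datetime(2014, 1, 2, 20)])

    def test_zero_keeps_nothing(self):
        self.assertEqual(prune_split(self.archives, '%Y-%m-%d', 0), [])

    def test_negative_means_no_limit(self):
        # len(keep) never equals a negative n, so every day is kept once.
        self.assertEqual(len(prune_split(self.archives, '%Y-%m-%d', -1)), 3)

    def test_skipped_archive_uses_up_its_period(self):
        latest = max(self.archives, key=attrgetter('ts'))
        kept = prune_split(self.archives, '%Y-%m-%d', 1, skip=[latest])
        self.assertEqual(kept[0].ts, datetime(2014, 1, 2, 20))
```

Saved as a module, this can be run with `python -m unittest`.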

One more thing, librelist only accepts TO headers and not CC, so your
original message probably didn't end up in the archive.

/ Jonas

Re: [attic] pruning

From: Dan Christensen
Date: 2014-02-03 @ 21:47
I was wondering where to mention that negative numbers mean infinity.
One way to do it would be:

        prune_epilog = '''Specifying a negative number of archives to
        keep means that there is no limit.'''
        subparser = subparsers.add_parser('prune', parents=[common_parser],
                                          description=self.do_prune.__doc__,
                                          epilog=prune_epilog)

which produces

    usage: archiver.py prune [-h] [-v] [-H HOURLY] [-d DAILY] [-w WEEKLY]
                             [-m MONTHLY] [-y YEARLY] [-p PREFIX]
                             REPOSITORY
    
    Prune repository archives according to specified rules
    
    positional arguments:
      REPOSITORY            repository to prune
    
    optional arguments:
      -h, --help            show this help message and exit
      -v, --verbose         verbose output
      -H HOURLY, --hourly HOURLY
                            number of hourly archives to keep
...
      -p PREFIX, --prefix PREFIX
                            only consider archive names starting with this prefix
    
    Specifying a negative number of archives to keep means that there is no limit.
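
For anyone who wants to try this outside of archiver.py, here is a self-contained sketch. The parser layout is a simplified assumption that only defines `-v`, `-d` and the positional argument, not the real archiver.py setup. It shows that argparse's default formatter re-wraps the epilog, which is why the line breaks and indentation of the triple-quoted string do not appear in the output, and that a negative value such as `-d -1` is parsed as a number rather than as an option:

```python
import argparse

parser = argparse.ArgumentParser(prog='archiver.py')
subparsers = parser.add_subparsers(title='Available subcommands')

# Options shared by all subcommands (only -v is sketched here).
common_parser = argparse.ArgumentParser(add_help=False)
common_parser.add_argument('-v', '--verbose', action='store_true',
                           help='verbose output')

prune_epilog = '''Specifying a negative number of archives to
        keep means that there is no limit.'''

subparser = subparsers.add_parser(
    'prune', parents=[common_parser],
    description='Prune repository archives according to specified rules',
    epilog=prune_epilog)
subparser.add_argument('-d', '--daily', dest='daily', type=int, default=0,
                       help='number of daily archives to keep')
subparser.add_argument('repository', metavar='REPOSITORY',
                       help='repository to prune')

# The default HelpFormatter collapses whitespace in the epilog and re-wraps it.
help_text = subparser.format_help()
print(help_text)

# No option in this parser looks like a negative number,
# so "-1" is read as the value of -d.
args = parser.parse_args(['prune', '-d', '-1', 'repo'])
print(args.daily)  # -1
```

To keep explicit line breaks in a longer epilog, `formatter_class=argparse.RawDescriptionHelpFormatter` could be passed to `add_parser`, at the cost of having to wrap the text by hand.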

Would it make sense to put the even longer explanation mentioned below
as the epilog?  Should the longer descriptions on usage.html also be
available from the command line?

Dan

Dan Christensen <jdc@uwo.ca> writes:

> Also, based on my understanding, maybe the output of prune -h could
> contain a bit more detail.  For example:
>
>   Prune repository archives according to specified rules.  That is,
>   delete all archives except those selected by the rules.  As an
>   example, "-d 7" means to keep the latest backup on each day for 7
>   days.  Days without backups do not count towards the total.  The rules
>   are applied from hourly to yearly, and backups selected by previous
>   rules do not count towards those of later rules.  Dates and times are
>   interpreted in the local timezone, and weeks go from Monday to Sunday.
>   If a prefix is set with -p, then only archives that start with the
>   prefix are considered for deletion and only those archives count
>   towards the totals specified by the rules.
>
> On second thought, that may be a bit long for the default -h output.
> Maybe each command should have a brief -h output, which says to 
> specify -h -v for more details, and the longer description could
> be what's shown for each command under the title Description at 
>
>   https://pythonhosted.org/Attic/usage.html
>
> ?

Re: [attic] pruning

From: Jonas Borgström
Date: 2014-02-03 @ 22:17
On 2014-02-03 22:47, Dan Christensen wrote:
> I was wondering where to mention that negative numbers mean infinity.
> One way to do it would be:
> 
>         prune_epilog = '''Specifying a negative number of archives to
>         keep means that there is no limit.'''
>         subparser = subparsers.add_parser('prune', parents=[common_parser],
>                                           description=self.do_prune.__doc__,
>                                           epilog=prune_epilog)
> 
> which produces
> 
>     usage: archiver.py prune [-h] [-v] [-H HOURLY] [-d DAILY] [-w WEEKLY]
>                              [-m MONTHLY] [-y YEARLY] [-p PREFIX]
>                              REPOSITORY
>     
>     Prune repository archives according to specified rules
>     
>     positional arguments:
>       REPOSITORY            repository to prune
>     
>     optional arguments:
>       -h, --help            show this help message and exit
>       -v, --verbose         verbose output
>       -H HOURLY, --hourly HOURLY
>                             number of hourly archives to keep
> ...
>       -p PREFIX, --prefix PREFIX
>                             only consider archive names starting with this prefix
>     
>     Specifying a negative number of archives to keep means that there is no limit.
> 
> Would it make sense to put the even longer explanation mentioned below
> as the epilog?  Should the longer descriptions on usage.html also be
> available from the command line?

Yes, I think including a more detailed description of each command in
the epilog sounds like a good idea.
Your description below is much better than mine and would probably work
as an epilog and as a replacement for what's already in usage.rst.

/ Jonas

> 
> Dan
> 
> Dan Christensen <jdc@uwo.ca> writes:
> 
>> Also, based on my understanding, maybe the output of prune -h could
>> contain a bit more detail.  For example:
>>
>>   Prune repository archives according to specified rules.  That is,
>>   delete all archives except those selected by the rules.  As an
>>   example, "-d 7" means to keep the latest backup on each day for 7
>>   days.  Days without backups do not count towards the total.  The rules
>>   are applied from hourly to yearly, and backups selected by previous
>>   rules do not count towards those of later rules.  Dates and times are
>>   interpreted in the local timezone, and weeks go from Monday to Sunday.
>>   If a prefix is set with -p, then only archives that start with the
>>   prefix are considered for deletion and only those archives count
>>   towards the totals specified by the rules.
>>
>> On second thought, that may be a bit long for the default -h output.
>> Maybe each command should have a brief -h output, which says to 
>> specify -h -v for more details, and the longer description could
>> be what's shown for each command under the title Description at 
>>
>>   https://pythonhosted.org/Attic/usage.html
>>
>> ?