librelist archives

« back to archive

Pygit2 tree entry lifetime management

Pygit2 tree entry lifetime management

From:
Han-Wen Nienhuys
Date:
2012-03-09 @ 13:27
I think the design of pygit2 for dealing with tree-entries needs to be
rethought.

In libgit, tree entries are owned by their respective tree(builders).
In pygit2, the tree entries are wrapped (const TreeEntry*) pointers,
whose meaning becomes useless after the builder/tree has died. This
leads to surprising behavior (see below).

There are 2 ways to fix this:

1. make a python reference in TreeEntry that points back to the owning object

2. on creation, copy data from the C git_tree_entry to the TreeEntry object.

I propose 2, since this model is simpler, and the data to copy is
small (24 bytes + length of name).

[hanwen@haring pygit2]$ cat build.py
import pygit2 as p
r = p.Repository('.')
t = r[r.lookup_reference('HEAD').resolve().hex].tree
entries = [e for e in t]
for e in entries:
    print e.name

r = None
t = None

print 'AFTER'
for e in entries:
    print e.name

[hanwen@haring pygit2]$ python build.py
.gitignore
COPYING
README.rst
TODO.txt
pygit2.c
setup.py
test
AFTER

�3c	@4[	@4[	@4[	�`	��b	|b	��b	��`	��`	x_	X8\	8��I8��I`<\	D[	H|b	H|b
    �b      �b	pM\	�][	��`	��`	h��Ih��Ip��Ip��I�4_	�R[	�_	�_	
�_	��`	p�b	p�b	���I���I���I���I���I���I��b	��b	
���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I�c	(�`
�3c	@4[	(3[	(3[	�`	��b	|b	��b	��`	��`	x_	X8\	8��I8��I`<\	D[	H|b	H|b
    �b      �b	pM\	�][	��`	��`	h��Ih��Ip��Ip��I�4_	�R[
D[	��`	
���I���I��b	��b	
���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I�c	(�`
x_	��I
�`	ȑb	�



-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

Re: [libgit2] Pygit2 tree entry lifetime management

From:
Vicent Marti
Date:
2012-03-09 @ 17:45
Hey Han-Wen,

this is a very valid point. The entries are indeed owned by the Tree
or Builder that generated them. Given the simplicity of the elements
in a Tree Entry, the approach we took on Rugged is creating a native
Ruby Hash object for each entry. These are small and fast.

I'd strongly encourage the same thing for Python, with a dictionary.
Our first attempt in Rugged to trick the garbage collector by adding
back-references worked nicely, but it was overly complex.

Let's see what jdavid says.

Cheers,
Vicent

On Fri, Mar 9, 2012 at 2:27 PM, Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> I think the design of pygit2 for dealing with tree-entries needs to be
> rethought.
>
> In libgit, tree entries are owned by their respective tree(builders).
> In pygit2, the tree entries are wrapped (const TreeEntry*) pointers,
> whose meaning becomes useless after the builder/tree has died. This
> leads to surprising behavior (see below).
>
> There are 2 ways to fix this:
>
> 1. make a python reference in TreeEntry that points back to the owning object
>
> 2. on creation, copy data from the C git_tree_entry to the TreeEntry object.
>
> I propose 2, since this model is simpler, and the data to copy is
> small (24 bytes + length of name).
>
> [hanwen@haring pygit2]$ cat build.py
> import pygit2 as p
> r = p.Repository('.')
> t = r[r.lookup_reference('HEAD').resolve().hex].tree
> entries = [e for e in t]
> for e in entries:
>    print e.name
>
> r = None
> t = None
>
> print 'AFTER'
> for e in entries:
>    print e.name
>
> [hanwen@haring pygit2]$ python build.py
> .gitignore
> COPYING
> README.rst
> TODO.txt
> pygit2.c
> setup.py
> test
> AFTER
>
> �3c     @4[     @4[     @4[      �`     ��b      |b     ��b     ��`     
��`      x_     X8\     8��I8��I`<\      D[     H|b     H|b
>    �b      �b  pM\     �][     ��`     ��`     h��Ih��Ip��Ip��I�4_     
�R[     �_      �_
> �_      ��`     p�b     p�b     ���I���I���I���I���I���I��b     ��b     
���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c     
(�`
> �3c     @4[     (3[     (3[      �`     ��b      |b     ��b     ��`     
��`      x_     X8\     8��I8��I`<\      D[     H|b     H|b
>    �b      �b  pM\     �][     ��`     ��`     h��Ih��Ip��Ip��I�4_     �R[
>  D[     ��`
> ���I���I��b     ��b     
���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c     
(�`
>  x_      ��I
>  �`     ȑb      �
>
>
>
> --
> Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

Re: [libgit2] Pygit2 tree entry lifetime management

From:
J. David Ibanez
Date:
2012-03-09 @ 19:41
Hello,

Actually, in pygit2 v0.16.0 a TreeEntry has a reference to the Tree
object. In commit 4cdb1a83b43 I removed this, a mistake I acknowledge.
This change was done in the context of issue #56, since tree-entries
can be owned by Tree or TreeBuilder objects (see comment by Carlos).

The solution I prefer now is for tree-entries to hold a reference not
to the tree or the tree-builder, but directly to the repo.

Regarding the size of a tree-entry, while it is small, there are a lot
in a repo. A lot of small things may be too much. So it is important
to keep them small.

Regarding the approach taking in Ruby, Vicent could you elaborate? I
don't get it.




On Fri, 9 Mar 2012 18:45:17 +0100
Vicent Marti <vicent@github.com> wrote:

> Hey Han-Wen,
> 
> this is a very valid point. The entries are indeed owned by the Tree
> or Builder that generated them. Given the simplicity of the elements
> in a Tree Entry, the approach we took on Rugged is creating a native
> Ruby Hash object for each entry. These are small and fast.
> 
> I'd strongly encourage the same thing for Python, with a dictionary.
> Our first attempt in Rugged to trick the garbage collector by adding
> back-references worked nicely, but it was overly complex.
> 
> Let's see what jdavid says.
> 
> Cheers,
> Vicent
> 
> On Fri, Mar 9, 2012 at 2:27 PM, Han-Wen Nienhuys <hanwenn@gmail.com>
> wrote:
> > I think the design of pygit2 for dealing with tree-entries needs to
> > be rethought.
> >
> > In libgit, tree entries are owned by their respective
> > tree(builders). In pygit2, the tree entries are wrapped (const
> > TreeEntry*) pointers, whose meaning becomes useless after the
> > builder/tree has died. This leads to surprising behavior (see
> > below).
> >
> > There are 2 ways to fix this:
> >
> > 1. make a python reference in TreeEntry that points back to the
> > owning object
> >
> > 2. on creation, copy data from the C git_tree_entry to the
> > TreeEntry object.
> >
> > I propose 2, since this model is simpler, and the data to copy is
> > small (24 bytes + length of name).
> >
> > [hanwen@haring pygit2]$ cat build.py
> > import pygit2 as p
> > r = p.Repository('.')
> > t = r[r.lookup_reference('HEAD').resolve().hex].tree
> > entries = [e for e in t]
> > for e in entries:
> >    print e.name
> >
> > r = None
> > t = None
> >
> > print 'AFTER'
> > for e in entries:
> >    print e.name
> >
> > [hanwen@haring pygit2]$ python build.py
> > .gitignore
> > COPYING
> > README.rst
> > TODO.txt
> > pygit2.c
> > setup.py
> > test
> > AFTER
> >
> > �3c     @4[     @4[     @4[      �`     ��b      |b     ��b     ��`
> >     ��`      x_     X8\     8��I8��I`<\      D[     H|b     H|b �b
> >      �b  pM\     �][     ��`     ��`     h��Ih��Ip��Ip��I�4_
> > �R[     �_      �_ �_      ��`     p�b     p�b
> > ���I���I���I���I���I���I��b     ��b
> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c
> >     (�` �3c     @4[     (3[     (3[      �`     ��b      |b     ��b
> >     ��`     ��`      x_     X8\     8��I8��I`<\      D[     H|b
> > H|b �b      �b  pM\     �][     ��`     ��`     h��Ih��Ip��Ip��I�4_
> >     �R[ D[     ��` ���I���I��b     ��b
> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c
> >     (�` x_      ��I �`     ȑb      �
> >
> >
> >
> > --
> > Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen



-- 
J. David Ibáñez
Itaapy <http://www.itaapy.com>         Tel +33 (0)1 42 23 67 45
9 rue Darwin, 75018 Paris              Fax +33 (0)1 53 28 27 88

Re: [libgit2] Pygit2 tree entry lifetime management

From:
Han-Wen Nienhuys
Date:
2012-03-09 @ 22:01
On Fri, Mar 9, 2012 at 4:41 PM, J. David Ibanez <jdavid@itaapy.com> wrote:
> Hello,
>
> Actually, in pygit2 v0.16.0 a TreeEntry has a reference to the Tree
> object. In commit 4cdb1a83b43 I removed this, a mistake I acknowledge.
> This change was done in the context of issue #56, since tree-entries
> can be owned by Tree or TreeBuilder objects (see comment by Carlos).
>
> The solution I prefer now is for tree-entries to hold a reference not
> to the tree or the tree-builder, but directly to the repo.

I see why that might work for trees, but the docs are not explicit
that the lifetime of the entry in a tree is tied to the repository.

Also, how would that work for the lifetime of the entries of the
TreeBuilder?  In libgit, the tree builder is not tied to the
repository.

If you want to go native, a tuple is more pythonic,

for (name, mode, sha) in tree:
  ..



> On Fri, 9 Mar 2012 18:45:17 +0100
> Vicent Marti <vicent@github.com> wrote:
>
>> Hey Han-Wen,
>>
>> this is a very valid point. The entries are indeed owned by the Tree
>> or Builder that generated them. Given the simplicity of the elements
>> in a Tree Entry, the approach we took on Rugged is creating a native
>> Ruby Hash object for each entry. These are small and fast.
>>
>> I'd strongly encourage the same thing for Python, with a dictionary.
>> Our first attempt in Rugged to trick the garbage collector by adding
>> back-references worked nicely, but it was overly complex.
>>
>> Let's see what jdavid says.
>>
>> Cheers,
>> Vicent
>>
>> On Fri, Mar 9, 2012 at 2:27 PM, Han-Wen Nienhuys <hanwenn@gmail.com>
>> wrote:
>> > I think the design of pygit2 for dealing with tree-entries needs to
>> > be rethought.
>> >
>> > In libgit, tree entries are owned by their respective
>> > tree(builders). In pygit2, the tree entries are wrapped (const
>> > TreeEntry*) pointers, whose meaning becomes useless after the
>> > builder/tree has died. This leads to surprising behavior (see
>> > below).
>> >
>> > There are 2 ways to fix this:
>> >
>> > 1. make a python reference in TreeEntry that points back to the
>> > owning object
>> >
>> > 2. on creation, copy data from the C git_tree_entry to the
>> > TreeEntry object.
>> >
>> > I propose 2, since this model is simpler, and the data to copy is
>> > small (24 bytes + length of name).
>> >
>> > [hanwen@haring pygit2]$ cat build.py
>> > import pygit2 as p
>> > r = p.Repository('.')
>> > t = r[r.lookup_reference('HEAD').resolve().hex].tree
>> > entries = [e for e in t]
>> > for e in entries:
>> >    print e.name
>> >
>> > r = None
>> > t = None
>> >
>> > print 'AFTER'
>> > for e in entries:
>> >    print e.name
>> >
>> > [hanwen@haring pygit2]$ python build.py
>> > .gitignore
>> > COPYING
>> > README.rst
>> > TODO.txt
>> > pygit2.c
>> > setup.py
>> > test
>> > AFTER
>> >
>> > �3c     @4[     @4[     @4[      �`     ��b      |b     ��b     ��`
>> >     ��`      x_     X8\     8��I8��I`<\      D[     H|b     H|b �b
>> >      �b  pM\     �][     ��`     ��`     h��Ih��Ip��Ip��I�4_
>> > �R[     �_      �_ �_      ��`     p�b     p�b
>> > ���I���I���I���I���I���I��b     ��b
>> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c
>> >     (�` �3c     @4[     (3[     (3[      �`     ��b      |b     ��b
>> >     ��`     ��`      x_     X8\     8��I8��I`<\      D[     H|b
>> > H|b �b      �b  pM\     �][     ��`     ��`     h��Ih��Ip��Ip��I�4_
>> >     �R[ D[     ��` ���I���I��b     ��b
>> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c
>> >     (�` x_      ��I �`     ȑb      �
>> >
>> >
>> >
>> > --
>> > Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
>
>
>
> --
> J. David Ibáñez
> Itaapy <http://www.itaapy.com>         Tel +33 (0)1 42 23 67 45
> 9 rue Darwin, 75018 Paris              Fax +33 (0)1 53 28 27 88



-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

Re: [libgit2] Pygit2 tree entry lifetime management

From:
Han-Wen Nienhuys
Date:
2012-03-09 @ 22:49
On Fri, Mar 9, 2012 at 7:01 PM, Han-Wen Nienhuys <hanwenn@gmail.com> wrote:

> Also, how would that work for the lifetime of the entries of the
> TreeBuilder?  In libgit, the tree builder is not tied to the
> repository.
>
> If you want to go native, a tuple is more pythonic,
>
> for (name, mode, sha) in tree:
>  ..

I should add that I vastly prefer a wrapped object with a precise set
of names, since it will less fragile.

I always forget the order in which arguments are returned from
functions like os.path.walk()

-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

Re: [libgit2] Pygit2 tree entry lifetime management

From:
Dave Borowitz
Date:
2012-03-10 @ 02:29
On Fri, Mar 9, 2012 at 14:49, Han-Wen Nienhuys <hanwenn@gmail.com> wrote:

> On Fri, Mar 9, 2012 at 7:01 PM, Han-Wen Nienhuys <hanwenn@gmail.com>
> wrote:
>
> > Also, how would that work for the lifetime of the entries of the
> > TreeBuilder?  In libgit, the tree builder is not tied to the
> > repository.
> >
> > If you want to go native, a tuple is more pythonic,
> >
> > for (name, mode, sha) in tree:
> >  ..
>
> I should add that I vastly prefer a wrapped object with a precise set
> of names, since it will less fragile.
>

May I add a horror story in this particular case that Dulwich's Tree type
has (or at least had) two different iteration methods that yield these same
tuples, with a different order of members. So, +1 to wrapped objects, or at
the very least namedtuples :)


> I always forget the order in which arguments are returned from
> functions like os.path.walk()
>
> --
> Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
>

Re: [libgit2] Pygit2 tree entry lifetime management

From:
J. David Ibanez
Date:
2012-03-10 @ 09:35
Me too I prefer an extension type over a tuple (or a dict or any other
standard type).

Regarding the lifecycle, in issue #56 we (Carlos + me) have taken the
same path for tree-builders, to tie them to the repo.

Probably, if there is a need to move tree-entries or some other object
across different repos, we could handle that, some way. Does somebody
have a use case like that?




On Fri, 9 Mar 2012 19:49:17 -0300
Han-Wen Nienhuys <hanwenn@gmail.com> wrote:

> On Fri, Mar 9, 2012 at 7:01 PM, Han-Wen Nienhuys <hanwenn@gmail.com>
> wrote:
> 
> > Also, how would that work for the lifetime of the entries of the
> > TreeBuilder?  In libgit, the tree builder is not tied to the
> > repository.
> >
> > If you want to go native, a tuple is more pythonic,
> >
> > for (name, mode, sha) in tree:
> >  ..
> 
> I should add that I vastly prefer a wrapped object with a precise set
> of names, since it will less fragile.
> 
> I always forget the order in which arguments are returned from
> functions like os.path.walk()
> 



-- 
J. David Ibáñez
Itaapy <http://www.itaapy.com>         Tel +33 (0)1 42 23 67 45
9 rue Darwin, 75018 Paris              Fax +33 (0)1 53 28 27 88

Re: [libgit2] Pygit2 tree entry lifetime management

From:
Han-Wen Nienhuys
Date:
2012-03-10 @ 14:44
On Sat, Mar 10, 2012 at 6:35 AM, J. David Ibanez <jdavid@itaapy.com> wrote:
> Me too I prefer an extension type over a tuple (or a dict or any other
> standard type).
>
> Regarding the lifecycle, in issue #56 we (Carlos + me) have taken the
> same path for tree-builders, to tie them to the repo.

But the problem is not the builder or tree, but rather the tree
entries.  The following code assumes the builder comes out of the
repository:

r = Repo()
e1 = r[head_sha].tree['filename']
e2 = r.NewBuilder().insert(0100644, 'filename',
  blob_sha1)
r = None

The repository and the builder are dead after 'r = None' so e1 and e2
contain garbage, unless e1 and e2 link back to their builders/trees.

Perhaps I am missing something?

--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

Re: [libgit2] Pygit2 tree entry lifetime management

From:
J. David Ibanez
Date:
2012-03-11 @ 21:57
This lifecycle issue should be fixed with the latest commit (a6a36ad80352e)?
Could you check?


On Sat, 10 Mar 2012 11:44:33 -0300
Han-Wen Nienhuys <hanwenn@gmail.com> wrote:

> On Sat, Mar 10, 2012 at 6:35 AM, J. David Ibanez <jdavid@itaapy.com>
> wrote:
> > Me too I prefer an extension type over a tuple (or a dict or any
> > other standard type).
> >
> > Regarding the lifecycle, in issue #56 we (Carlos + me) have taken
> > the same path for tree-builders, to tie them to the repo.
> 
> But the problem is not the builder or tree, but rather the tree
> entries.  The following code assumes the builder comes out of the
> repository:
> 
> r = Repo()
> e1 = r[head_sha].tree['filename']
> e2 = r.NewBuilder().insert(0100644, 'filename',
>   blob_sha1)
> r = None
> 
> The repository and the builder are dead after 'r = None' so e1 and e2
> contain garbage, unless e1 and e2 link back to their builders/trees.
> 
> Perhaps I am missing something?
> 
> --
> Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen



-- 
J. David Ibáñez
Itaapy <http://www.itaapy.com>         Tel +33 (0)1 42 23 67 45
9 rue Darwin, 75018 Paris              Fax +33 (0)1 53 28 27 88

Re: [libgit2] Pygit2 tree entry lifetime management

From:
Han-Wen Nienhuys
Date:
2012-03-12 @ 02:35
On Sun, Mar 11, 2012 at 6:57 PM, J. David Ibanez <jdavid@itaapy.com> wrote:
> This lifecycle issue should be fixed with the latest commit (a6a36ad80352e)?
> Could you check?

I assume it works.

How would you create new tree entries in this design?

b = pygit2.TreeBuilder()
# assuming this gets implemented...
e = tree_builder.insert_new_entry(...)

to which repository would e point?


-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

Re: [libgit2] Pygit2 tree entry lifetime management

From:
J. David Ibanez
Date:
2012-03-12 @ 21:39
oh, ok, I have merged Carlos branch "treebuilder", so now tree-builders
are tied to the repo, this should answer the question



On Sun, 11 Mar 2012 23:35:02 -0300
Han-Wen Nienhuys <hanwenn@gmail.com> wrote:

> On Sun, Mar 11, 2012 at 6:57 PM, J. David Ibanez <jdavid@itaapy.com>
> wrote:
> > This lifecycle issue should be fixed with the latest commit
> > (a6a36ad80352e)? Could you check?
> 
> I assume it works.
> 
> How would you create new tree entries in this design?
> 
> b = pygit2.TreeBuilder()
> # assuming this gets implemented...
> e = tree_builder.insert_new_entry(...)
> 
> to which repository would e point?
> 
> 



-- 
J. David Ibáñez
Itaapy <http://www.itaapy.com>         Tel +33 (0)1 42 23 67 45
9 rue Darwin, 75018 Paris              Fax +33 (0)1 53 28 27 88

Re: [libgit2] Pygit2 tree entry lifetime management

From:
J. David Ibanez
Date:
2012-03-12 @ 21:15
I see that you have raised about the same question in issue 56, so I
will follow there.


On Sun, 11 Mar 2012 23:35:02 -0300
Han-Wen Nienhuys <hanwenn@gmail.com> wrote:

> On Sun, Mar 11, 2012 at 6:57 PM, J. David Ibanez <jdavid@itaapy.com>
> wrote:
> > This lifecycle issue should be fixed with the latest commit
> > (a6a36ad80352e)? Could you check?
> 
> I assume it works.
> 
> How would you create new tree entries in this design?
> 
> b = pygit2.TreeBuilder()
> # assuming this gets implemented...
> e = tree_builder.insert_new_entry(...)
> 
> to which repository would e point?
> 
> 



-- 
J. David Ibáñez
Itaapy <http://www.itaapy.com>         Tel +33 (0)1 42 23 67 45
9 rue Darwin, 75018 Paris              Fax +33 (0)1 53 28 27 88

Re: [libgit2] Pygit2 tree entry lifetime management

From:
Vicent Marti
Date:
2012-03-09 @ 20:12
David: simply create a Python Dictionary and free the tree entry immediately.

{ "name" : "the_file.txt", "oid" : "deadbeef...", "attributes" : 040000}

On Fri, Mar 9, 2012 at 8:41 PM, J. David Ibanez <jdavid@itaapy.com> wrote:
> Hello,
>
> Actually, in pygit2 v0.16.0 a TreeEntry has a reference to the Tree
> object. In commit 4cdb1a83b43 I removed this, a mistake I acknowledge.
> This change was done in the context of issue #56, since tree-entries
> can be owned by Tree or TreeBuilder objects (see comment by Carlos).
>
> The solution I prefer now is for tree-entries to hold a reference not
> to the tree or the tree-builder, but directly to the repo.
>
> Regarding the size of a tree-entry, while it is small, there are a lot
> in a repo. A lot of small things may be too much. So it is important
> to keep them small.
>
> Regarding the approach taking in Ruby, Vicent could you elaborate? I
> don't get it.
>
>
>
>
> On Fri, 9 Mar 2012 18:45:17 +0100
> Vicent Marti <vicent@github.com> wrote:
>
>> Hey Han-Wen,
>>
>> this is a very valid point. The entries are indeed owned by the Tree
>> or Builder that generated them. Given the simplicity of the elements
>> in a Tree Entry, the approach we took on Rugged is creating a native
>> Ruby Hash object for each entry. These are small and fast.
>>
>> I'd strongly encourage the same thing for Python, with a dictionary.
>> Our first attempt in Rugged to trick the garbage collector by adding
>> back-references worked nicely, but it was overly complex.
>>
>> Let's see what jdavid says.
>>
>> Cheers,
>> Vicent
>>
>> On Fri, Mar 9, 2012 at 2:27 PM, Han-Wen Nienhuys <hanwenn@gmail.com>
>> wrote:
>> > I think the design of pygit2 for dealing with tree-entries needs to
>> > be rethought.
>> >
>> > In libgit, tree entries are owned by their respective
>> > tree(builders). In pygit2, the tree entries are wrapped (const
>> > TreeEntry*) pointers, whose meaning becomes useless after the
>> > builder/tree has died. This leads to surprising behavior (see
>> > below).
>> >
>> > There are 2 ways to fix this:
>> >
>> > 1. make a python reference in TreeEntry that points back to the
>> > owning object
>> >
>> > 2. on creation, copy data from the C git_tree_entry to the
>> > TreeEntry object.
>> >
>> > I propose 2, since this model is simpler, and the data to copy is
>> > small (24 bytes + length of name).
>> >
>> > [hanwen@haring pygit2]$ cat build.py
>> > import pygit2 as p
>> > r = p.Repository('.')
>> > t = r[r.lookup_reference('HEAD').resolve().hex].tree
>> > entries = [e for e in t]
>> > for e in entries:
>> >    print e.name
>> >
>> > r = None
>> > t = None
>> >
>> > print 'AFTER'
>> > for e in entries:
>> >    print e.name
>> >
>> > [hanwen@haring pygit2]$ python build.py
>> > .gitignore
>> > COPYING
>> > README.rst
>> > TODO.txt
>> > pygit2.c
>> > setup.py
>> > test
>> > AFTER
>> >
>> > �3c     @4[     @4[     @4[      �`     ��b      |b     ��b     ��`
>> >     ��`      x_     X8\     8��I8��I`<\      D[     H|b     H|b �b
>> >      �b  pM\     �][     ��`     ��`     h��Ih��Ip��Ip��I�4_
>> > �R[     �_      �_ �_      ��`     p�b     p�b
>> > ���I���I���I���I���I���I��b     ��b
>> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c
>> >     (�` �3c     @4[     (3[     (3[      �`     ��b      |b     ��b
>> >     ��`     ��`      x_     X8\     8��I8��I`<\      D[     H|b
>> > H|b �b      �b  pM\     �][     ��`     ��`     h��Ih��Ip��Ip��I�4_
>> >     �R[ D[     ��` ���I���I��b     ��b
>> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c
>> >     (�` x_      ��I �`     ȑb      �
>> >
>> >
>> >
>> > --
>> > Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
>
>
>
> --
> J. David Ibáñez
> Itaapy <http://www.itaapy.com>         Tel +33 (0)1 42 23 67 45
> 9 rue Darwin, 75018 Paris              Fax +33 (0)1 53 28 27 88

Re: [libgit2] Pygit2 tree entry lifetime management

From:
J. David Ibanez
Date:
2012-03-09 @ 21:43
Ok.

I don't know how Ruby handles memory, but having a back-reference
won't be much overhead in Python. On the other hand, dicts take a
lot of space:

>>> import sys
>>> sys.getsizeof({})
280

This will likely make no good to the perf.

Plus, having a reference to the repo will allow to get back the
'to_object' (sugar) method.




On Fri, 9 Mar 2012 21:12:37 +0100
Vicent Marti <vicent@github.com> wrote:

> David: simply create a Python Dictionary and free the tree entry
> immediately.
> 
> { "name" : "the_file.txt", "oid" : "deadbeef...", "attributes" :
> 040000}
> 
> On Fri, Mar 9, 2012 at 8:41 PM, J. David Ibanez <jdavid@itaapy.com>
> wrote:
> > Hello,
> >
> > Actually, in pygit2 v0.16.0 a TreeEntry has a reference to the Tree
> > object. In commit 4cdb1a83b43 I removed this, a mistake I
> > acknowledge. This change was done in the context of issue #56,
> > since tree-entries can be owned by Tree or TreeBuilder objects (see
> > comment by Carlos).
> >
> > The solution I prefer now is for tree-entries to hold a reference
> > not to the tree or the tree-builder, but directly to the repo.
> >
> > Regarding the size of a tree-entry, while it is small, there are a
> > lot in a repo. A lot of small things may be too much. So it is
> > important to keep them small.
> >
> > Regarding the approach taking in Ruby, Vicent could you elaborate? I
> > don't get it.
> >
> >
> >
> >
> > On Fri, 9 Mar 2012 18:45:17 +0100
> > Vicent Marti <vicent@github.com> wrote:
> >
> >> Hey Han-Wen,
> >>
> >> this is a very valid point. The entries are indeed owned by the
> >> Tree or Builder that generated them. Given the simplicity of the
> >> elements in a Tree Entry, the approach we took on Rugged is
> >> creating a native Ruby Hash object for each entry. These are small
> >> and fast.
> >>
> >> I'd strongly encourage the same thing for Python, with a
> >> dictionary. Our first attempt in Rugged to trick the garbage
> >> collector by adding back-references worked nicely, but it was
> >> overly complex.
> >>
> >> Let's see what jdavid says.
> >>
> >> Cheers,
> >> Vicent
> >>
> >> On Fri, Mar 9, 2012 at 2:27 PM, Han-Wen Nienhuys
> >> <hanwenn@gmail.com> wrote:
> >> > I think the design of pygit2 for dealing with tree-entries needs
> >> > to be rethought.
> >> >
> >> > In libgit, tree entries are owned by their respective
> >> > tree(builders). In pygit2, the tree entries are wrapped (const
> >> > TreeEntry*) pointers, whose meaning becomes useless after the
> >> > builder/tree has died. This leads to surprising behavior (see
> >> > below).
> >> >
> >> > There are 2 ways to fix this:
> >> >
> >> > 1. make a python reference in TreeEntry that points back to the
> >> > owning object
> >> >
> >> > 2. on creation, copy data from the C git_tree_entry to the
> >> > TreeEntry object.
> >> >
> >> > I propose 2, since this model is simpler, and the data to copy is
> >> > small (24 bytes + length of name).
> >> >
> >> > [hanwen@haring pygit2]$ cat build.py
> >> > import pygit2 as p
> >> > r = p.Repository('.')
> >> > t = r[r.lookup_reference('HEAD').resolve().hex].tree
> >> > entries = [e for e in t]
> >> > for e in entries:
> >> >    print e.name
> >> >
> >> > r = None
> >> > t = None
> >> >
> >> > print 'AFTER'
> >> > for e in entries:
> >> >    print e.name
> >> >
> >> > [hanwen@haring pygit2]$ python build.py
> >> > .gitignore
> >> > COPYING
> >> > README.rst
> >> > TODO.txt
> >> > pygit2.c
> >> > setup.py
> >> > test
> >> > AFTER
> >> >
> >> > �3c     @4[     @4[     @4[      �`     ��b      |b     ��b
> >> > ��` ��`      x_     X8\     8��I8��I`<\      D[     H|b     H|b
> >> > �b �b  pM\     �][     ��`     ��`     h��Ih��Ip��Ip��I�4_
> >> > �R[     �_      �_ �_      ��`     p�b     p�b
> >> > ���I���I���I���I���I���I��b     ��b
> >> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I�
> >> > c (�` �3c     @4[     (3[     (3[      �`     ��b      |b     ��b
> >> >     ��`     ��`      x_     X8\     8��I8��I`<\      D[     H|b
> >> > H|b �b      �b  pM\     �][     ��`     ��`
> >> > h��Ih��Ip��Ip��I�4_ �R[ D[     ��` ���I���I��b     ��b
> >> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I�
> >> > c (�` x_      ��I �`     ȑb      �
> >> >
> >> >
> >> >
> >> > --
> >> > Han-Wen Nienhuys - hanwen@xs4all.nl -
> >> > http://www.xs4all.nl/~hanwen
> >
> >
> >
> > --
> > J. David Ibáñez
> > Itaapy <http://www.itaapy.com>         Tel +33 (0)1 42 23 67 45
> > 9 rue Darwin, 75018 Paris              Fax +33 (0)1 53 28 27 88



-- 
J. David Ibáñez
Itaapy <http://www.itaapy.com>         Tel +33 (0)1 42 23 67 45
9 rue Darwin, 75018 Paris              Fax +33 (0)1 53 28 27 88