I think the design of pygit2 for dealing with tree-entries needs to be
rethought.
In libgit, tree entries are owned by their respective tree(builders).
In pygit2, the tree entries are wrapped (const TreeEntry*) pointers,
whose meaning becomes useless after the builder/tree has died. This
leads to surprising behavior (see below).
There are 2 ways to fix this:
1. make a python reference in TreeEntry that points back to the owning object
2. on creation, copy data from the C git_tree_entry to the TreeEntry object.
I propose 2, since this model is simpler, and the data to copy is
small (24 bytes + length of name).
[hanwen@haring pygit2]$ cat build.py
import pygit2 as p
r = p.Repository('.')
t = r[r.lookup_reference('HEAD').resolve().hex].tree
entries = [e for e in t]
for e in entries:
print e.name
r = None
t = None
print 'AFTER'
for e in entries:
print e.name
[hanwen@haring pygit2]$ python build.py
.gitignore
COPYING
README.rst
TODO.txt
pygit2.c
setup.py
test
AFTER
�3c @4[ @4[ @4[ �` ��b |b ��b ��` ��` x_ X8\ 8��I8��I`<\ D[ H|b H|b
�b �b pM\ �][ ��` ��` h��Ih��Ip��Ip��I�4_ �R[ �_ �_
�_ ��` p�b p�b ���I���I���I���I���I���I��b ��b
���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I�c (�`
�3c @4[ (3[ (3[ �` ��b |b ��b ��` ��` x_ X8\ 8��I8��I`<\ D[ H|b H|b
�b �b pM\ �][ ��` ��` h��Ih��Ip��Ip��I�4_ �R[
D[ ��`
���I���I��b ��b
���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I�c (�`
x_ ��I
�` ȑb �
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
Hey Han-Wen, this is a very valid point. The entries are indeed owned by the Tree or Builder that generated them. Given the simplicity of the elements in a Tree Entry, the approach we took on Rugged is creating a native Ruby Hash object for each entry. These are small and fast. I'd strongly encourage the same thing for Python, with a dictionary. Our first attempt in Rugged to trick the garbage collector by adding back-references worked nicely, but it was overly complex. Let's see what jdavid says. Cheers, Vicent On Fri, Mar 9, 2012 at 2:27 PM, Han-Wen Nienhuys <hanwenn@gmail.com> wrote: > I think the design of pygit2 for dealing with tree-entries needs to be > rethought. > > In libgit, tree entries are owned by their respective tree(builders). > In pygit2, the tree entries are wrapped (const TreeEntry*) pointers, > whose meaning becomes useless after the builder/tree has died. This > leads to surprising behavior (see below). > > There are 2 ways to fix this: > > 1. make a python reference in TreeEntry that points back to the owning object > > 2. on creation, copy data from the C git_tree_entry to the TreeEntry object. > > I propose 2, since this model is simpler, and the data to copy is > small (24 bytes + length of name). > > [hanwen@haring pygit2]$ cat build.py > import pygit2 as p > r = p.Repository('.') > t = r[r.lookup_reference('HEAD').resolve().hex].tree > entries = [e for e in t] > for e in entries: > print e.name > > r = None > t = None > > print 'AFTER' > for e in entries: > print e.name > > [hanwen@haring pygit2]$ python build.py > .gitignore > COPYING > README.rst > TODO.txt > pygit2.c > setup.py > test > AFTER > > �3c @4[ @4[ @4[ �` ��b |b ��b ��` ��` x_ X8\ 8��I8��I`<\ D[ H|b H|b > �b �b pM\ �][ ��` ��` h��Ih��Ip��Ip��I�4_ �R[ �_ �_ > �_ ��` p�b p�b ���I���I���I���I���I���I��b ��b ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c (�` > �3c @4[ (3[ (3[ �` ��b |b ��b ��` ��` x_ X8\ 8��I8��I`<\ D[ H|b H|b > �b �b pM\ �][ ��` ��` h��Ih��Ip��Ip��I�4_ �R[ > D[ ��` > ���I���I��b ��b ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c (�` > x_ ��I > �` ȑb � > > > > -- > Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
Hello, Actually, in pygit2 v0.16.0 a TreeEntry has a reference to the Tree object. In commit 4cdb1a83b43 I removed this, a mistake I acknowledge. This change was done in the context of issue #56, since tree-entries can be owned by Tree or TreeBuilder objects (see comment by Carlos). The solution I prefer now is for tree-entries to hold a reference not to the tree or the tree-builder, but directly to the repo. Regarding the size of a tree-entry, while it is small, there are a lot in a repo. A lot of small things may be too much. So it is important to keep them small. Regarding the approach taking in Ruby, Vicent could you elaborate? I don't get it. On Fri, 9 Mar 2012 18:45:17 +0100 Vicent Marti <vicent@github.com> wrote: > Hey Han-Wen, > > this is a very valid point. The entries are indeed owned by the Tree > or Builder that generated them. Given the simplicity of the elements > in a Tree Entry, the approach we took on Rugged is creating a native > Ruby Hash object for each entry. These are small and fast. > > I'd strongly encourage the same thing for Python, with a dictionary. > Our first attempt in Rugged to trick the garbage collector by adding > back-references worked nicely, but it was overly complex. > > Let's see what jdavid says. > > Cheers, > Vicent > > On Fri, Mar 9, 2012 at 2:27 PM, Han-Wen Nienhuys <hanwenn@gmail.com> > wrote: > > I think the design of pygit2 for dealing with tree-entries needs to > > be rethought. > > > > In libgit, tree entries are owned by their respective > > tree(builders). In pygit2, the tree entries are wrapped (const > > TreeEntry*) pointers, whose meaning becomes useless after the > > builder/tree has died. This leads to surprising behavior (see > > below). > > > > There are 2 ways to fix this: > > > > 1. make a python reference in TreeEntry that points back to the > > owning object > > > > 2. on creation, copy data from the C git_tree_entry to the > > TreeEntry object. > > > > I propose 2, since this model is simpler, and the data to copy is > > small (24 bytes + length of name). > > > > [hanwen@haring pygit2]$ cat build.py > > import pygit2 as p > > r = p.Repository('.') > > t = r[r.lookup_reference('HEAD').resolve().hex].tree > > entries = [e for e in t] > > for e in entries: > > print e.name > > > > r = None > > t = None > > > > print 'AFTER' > > for e in entries: > > print e.name > > > > [hanwen@haring pygit2]$ python build.py > > .gitignore > > COPYING > > README.rst > > TODO.txt > > pygit2.c > > setup.py > > test > > AFTER > > > > �3c @4[ @4[ @4[ �` ��b |b ��b ��` > > ��` x_ X8\ 8��I8��I`<\ D[ H|b H|b �b > > �b pM\ �][ ��` ��` h��Ih��Ip��Ip��I�4_ > > �R[ �_ �_ �_ ��` p�b p�b > > ���I���I���I���I���I���I��b ��b > > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c > > (�` �3c @4[ (3[ (3[ �` ��b |b ��b > > ��` ��` x_ X8\ 8��I8��I`<\ D[ H|b > > H|b �b �b pM\ �][ ��` ��` h��Ih��Ip��Ip��I�4_ > > �R[ D[ ��` ���I���I��b ��b > > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c > > (�` x_ ��I �` ȑb � > > > > > > > > -- > > Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen -- J. David Ibáñez Itaapy <http://www.itaapy.com> Tel +33 (0)1 42 23 67 45 9 rue Darwin, 75018 Paris Fax +33 (0)1 53 28 27 88
On Fri, Mar 9, 2012 at 4:41 PM, J. David Ibanez <jdavid@itaapy.com> wrote: > Hello, > > Actually, in pygit2 v0.16.0 a TreeEntry has a reference to the Tree > object. In commit 4cdb1a83b43 I removed this, a mistake I acknowledge. > This change was done in the context of issue #56, since tree-entries > can be owned by Tree or TreeBuilder objects (see comment by Carlos). > > The solution I prefer now is for tree-entries to hold a reference not > to the tree or the tree-builder, but directly to the repo. I see why that might work for trees, but the docs are not explicit that the lifetime of the entry in a tree is tied to the repository. Also, how would that work for the lifetime of the entries of the TreeBuilder? In libgit, the tree builder is not tied to the repository. If you want to go native, a tuple is more pythonic, for (name, mode, sha) in tree: .. > On Fri, 9 Mar 2012 18:45:17 +0100 > Vicent Marti <vicent@github.com> wrote: > >> Hey Han-Wen, >> >> this is a very valid point. The entries are indeed owned by the Tree >> or Builder that generated them. Given the simplicity of the elements >> in a Tree Entry, the approach we took on Rugged is creating a native >> Ruby Hash object for each entry. These are small and fast. >> >> I'd strongly encourage the same thing for Python, with a dictionary. >> Our first attempt in Rugged to trick the garbage collector by adding >> back-references worked nicely, but it was overly complex. >> >> Let's see what jdavid says. >> >> Cheers, >> Vicent >> >> On Fri, Mar 9, 2012 at 2:27 PM, Han-Wen Nienhuys <hanwenn@gmail.com> >> wrote: >> > I think the design of pygit2 for dealing with tree-entries needs to >> > be rethought. >> > >> > In libgit, tree entries are owned by their respective >> > tree(builders). In pygit2, the tree entries are wrapped (const >> > TreeEntry*) pointers, whose meaning becomes useless after the >> > builder/tree has died. This leads to surprising behavior (see >> > below). >> > >> > There are 2 ways to fix this: >> > >> > 1. make a python reference in TreeEntry that points back to the >> > owning object >> > >> > 2. on creation, copy data from the C git_tree_entry to the >> > TreeEntry object. >> > >> > I propose 2, since this model is simpler, and the data to copy is >> > small (24 bytes + length of name). >> > >> > [hanwen@haring pygit2]$ cat build.py >> > import pygit2 as p >> > r = p.Repository('.') >> > t = r[r.lookup_reference('HEAD').resolve().hex].tree >> > entries = [e for e in t] >> > for e in entries: >> > print e.name >> > >> > r = None >> > t = None >> > >> > print 'AFTER' >> > for e in entries: >> > print e.name >> > >> > [hanwen@haring pygit2]$ python build.py >> > .gitignore >> > COPYING >> > README.rst >> > TODO.txt >> > pygit2.c >> > setup.py >> > test >> > AFTER >> > >> > �3c @4[ @4[ @4[ �` ��b |b ��b ��` >> > ��` x_ X8\ 8��I8��I`<\ D[ H|b H|b �b >> > �b pM\ �][ ��` ��` h��Ih��Ip��Ip��I�4_ >> > �R[ �_ �_ �_ ��` p�b p�b >> > ���I���I���I���I���I���I��b ��b >> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c >> > (�` �3c @4[ (3[ (3[ �` ��b |b ��b >> > ��` ��` x_ X8\ 8��I8��I`<\ D[ H|b >> > H|b �b �b pM\ �][ ��` ��` h��Ih��Ip��Ip��I�4_ >> > �R[ D[ ��` ���I���I��b ��b >> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c >> > (�` x_ ��I �` ȑb � >> > >> > >> > >> > -- >> > Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen > > > > -- > J. David Ibáñez > Itaapy <http://www.itaapy.com> Tel +33 (0)1 42 23 67 45 > 9 rue Darwin, 75018 Paris Fax +33 (0)1 53 28 27 88 -- Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
On Fri, Mar 9, 2012 at 7:01 PM, Han-Wen Nienhuys <hanwenn@gmail.com> wrote: > Also, how would that work for the lifetime of the entries of the > TreeBuilder? In libgit, the tree builder is not tied to the > repository. > > If you want to go native, a tuple is more pythonic, > > for (name, mode, sha) in tree: > .. I should add that I vastly prefer a wrapped object with a precise set of names, since it will less fragile. I always forget the order in which arguments are returned from functions like os.path.walk() -- Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
On Fri, Mar 9, 2012 at 14:49, Han-Wen Nienhuys <hanwenn@gmail.com> wrote: > On Fri, Mar 9, 2012 at 7:01 PM, Han-Wen Nienhuys <hanwenn@gmail.com> > wrote: > > > Also, how would that work for the lifetime of the entries of the > > TreeBuilder? In libgit, the tree builder is not tied to the > > repository. > > > > If you want to go native, a tuple is more pythonic, > > > > for (name, mode, sha) in tree: > > .. > > I should add that I vastly prefer a wrapped object with a precise set > of names, since it will less fragile. > May I add a horror story in this particular case that Dulwich's Tree type has (or at least had) two different iteration methods that yield these same tuples, with a different order of members. So, +1 to wrapped objects, or at the very least namedtuples :) > I always forget the order in which arguments are returned from > functions like os.path.walk() > > -- > Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen >
Me too I prefer an extension type over a tuple (or a dict or any other standard type). Regarding the lifecycle, in issue #56 we (Carlos + me) have taken the same path for tree-builders, to tie them to the repo. Probably, if there is a need to move tree-entries or some other object across different repos, we could handle that, some way. Does somebody have a use case like that? On Fri, 9 Mar 2012 19:49:17 -0300 Han-Wen Nienhuys <hanwenn@gmail.com> wrote: > On Fri, Mar 9, 2012 at 7:01 PM, Han-Wen Nienhuys <hanwenn@gmail.com> > wrote: > > > Also, how would that work for the lifetime of the entries of the > > TreeBuilder? In libgit, the tree builder is not tied to the > > repository. > > > > If you want to go native, a tuple is more pythonic, > > > > for (name, mode, sha) in tree: > > .. > > I should add that I vastly prefer a wrapped object with a precise set > of names, since it will less fragile. > > I always forget the order in which arguments are returned from > functions like os.path.walk() > -- J. David Ibáñez Itaapy <http://www.itaapy.com> Tel +33 (0)1 42 23 67 45 9 rue Darwin, 75018 Paris Fax +33 (0)1 53 28 27 88
On Sat, Mar 10, 2012 at 6:35 AM, J. David Ibanez <jdavid@itaapy.com> wrote: > Me too I prefer an extension type over a tuple (or a dict or any other > standard type). > > Regarding the lifecycle, in issue #56 we (Carlos + me) have taken the > same path for tree-builders, to tie them to the repo. But the problem is not the builder or tree, but rather the tree entries. The following code assumes the builder comes out of the repository: r = Repo() e1 = r[head_sha].tree['filename'] e2 = r.NewBuilder().insert(0100644, 'filename', blob_sha1) r = None The repository and the builder are dead after 'r = None' so e1 and e2 contain garbage, unless e1 and e2 link back to their builders/trees. Perhaps I am missing something? -- Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
This lifecycle issue should be fixed with the latest commit (a6a36ad80352e)? Could you check? On Sat, 10 Mar 2012 11:44:33 -0300 Han-Wen Nienhuys <hanwenn@gmail.com> wrote: > On Sat, Mar 10, 2012 at 6:35 AM, J. David Ibanez <jdavid@itaapy.com> > wrote: > > Me too I prefer an extension type over a tuple (or a dict or any > > other standard type). > > > > Regarding the lifecycle, in issue #56 we (Carlos + me) have taken > > the same path for tree-builders, to tie them to the repo. > > But the problem is not the builder or tree, but rather the tree > entries. The following code assumes the builder comes out of the > repository: > > r = Repo() > e1 = r[head_sha].tree['filename'] > e2 = r.NewBuilder().insert(0100644, 'filename', > blob_sha1) > r = None > > The repository and the builder are dead after 'r = None' so e1 and e2 > contain garbage, unless e1 and e2 link back to their builders/trees. > > Perhaps I am missing something? > > -- > Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen -- J. David Ibáñez Itaapy <http://www.itaapy.com> Tel +33 (0)1 42 23 67 45 9 rue Darwin, 75018 Paris Fax +33 (0)1 53 28 27 88
On Sun, Mar 11, 2012 at 6:57 PM, J. David Ibanez <jdavid@itaapy.com> wrote: > This lifecycle issue should be fixed with the latest commit (a6a36ad80352e)? > Could you check? I assume it works. How would you create new tree entries in this design? b = pygit2.TreeBuilder() # assuming this gets implemented... e = tree_builder.insert_new_entry(...) to which repository would e point? -- Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
oh, ok, I have merged Carlos branch "treebuilder", so now tree-builders are tied to the repo, this should answer the question On Sun, 11 Mar 2012 23:35:02 -0300 Han-Wen Nienhuys <hanwenn@gmail.com> wrote: > On Sun, Mar 11, 2012 at 6:57 PM, J. David Ibanez <jdavid@itaapy.com> > wrote: > > This lifecycle issue should be fixed with the latest commit > > (a6a36ad80352e)? Could you check? > > I assume it works. > > How would you create new tree entries in this design? > > b = pygit2.TreeBuilder() > # assuming this gets implemented... > e = tree_builder.insert_new_entry(...) > > to which repository would e point? > > -- J. David Ibáñez Itaapy <http://www.itaapy.com> Tel +33 (0)1 42 23 67 45 9 rue Darwin, 75018 Paris Fax +33 (0)1 53 28 27 88
I see that you have raised about the same question in issue 56, so I will follow there. On Sun, 11 Mar 2012 23:35:02 -0300 Han-Wen Nienhuys <hanwenn@gmail.com> wrote: > On Sun, Mar 11, 2012 at 6:57 PM, J. David Ibanez <jdavid@itaapy.com> > wrote: > > This lifecycle issue should be fixed with the latest commit > > (a6a36ad80352e)? Could you check? > > I assume it works. > > How would you create new tree entries in this design? > > b = pygit2.TreeBuilder() > # assuming this gets implemented... > e = tree_builder.insert_new_entry(...) > > to which repository would e point? > > -- J. David Ibáñez Itaapy <http://www.itaapy.com> Tel +33 (0)1 42 23 67 45 9 rue Darwin, 75018 Paris Fax +33 (0)1 53 28 27 88
David: simply create a Python Dictionary and free the tree entry immediately.
{ "name" : "the_file.txt", "oid" : "deadbeef...", "attributes" : 040000}
On Fri, Mar 9, 2012 at 8:41 PM, J. David Ibanez <jdavid@itaapy.com> wrote:
> Hello,
>
> Actually, in pygit2 v0.16.0 a TreeEntry has a reference to the Tree
> object. In commit 4cdb1a83b43 I removed this, a mistake I acknowledge.
> This change was done in the context of issue #56, since tree-entries
> can be owned by Tree or TreeBuilder objects (see comment by Carlos).
>
> The solution I prefer now is for tree-entries to hold a reference not
> to the tree or the tree-builder, but directly to the repo.
>
> Regarding the size of a tree-entry, while it is small, there are a lot
> in a repo. A lot of small things may be too much. So it is important
> to keep them small.
>
> Regarding the approach taking in Ruby, Vicent could you elaborate? I
> don't get it.
>
>
>
>
> On Fri, 9 Mar 2012 18:45:17 +0100
> Vicent Marti <vicent@github.com> wrote:
>
>> Hey Han-Wen,
>>
>> this is a very valid point. The entries are indeed owned by the Tree
>> or Builder that generated them. Given the simplicity of the elements
>> in a Tree Entry, the approach we took on Rugged is creating a native
>> Ruby Hash object for each entry. These are small and fast.
>>
>> I'd strongly encourage the same thing for Python, with a dictionary.
>> Our first attempt in Rugged to trick the garbage collector by adding
>> back-references worked nicely, but it was overly complex.
>>
>> Let's see what jdavid says.
>>
>> Cheers,
>> Vicent
>>
>> On Fri, Mar 9, 2012 at 2:27 PM, Han-Wen Nienhuys <hanwenn@gmail.com>
>> wrote:
>> > I think the design of pygit2 for dealing with tree-entries needs to
>> > be rethought.
>> >
>> > In libgit, tree entries are owned by their respective
>> > tree(builders). In pygit2, the tree entries are wrapped (const
>> > TreeEntry*) pointers, whose meaning becomes useless after the
>> > builder/tree has died. This leads to surprising behavior (see
>> > below).
>> >
>> > There are 2 ways to fix this:
>> >
>> > 1. make a python reference in TreeEntry that points back to the
>> > owning object
>> >
>> > 2. on creation, copy data from the C git_tree_entry to the
>> > TreeEntry object.
>> >
>> > I propose 2, since this model is simpler, and the data to copy is
>> > small (24 bytes + length of name).
>> >
>> > [hanwen@haring pygit2]$ cat build.py
>> > import pygit2 as p
>> > r = p.Repository('.')
>> > t = r[r.lookup_reference('HEAD').resolve().hex].tree
>> > entries = [e for e in t]
>> > for e in entries:
>> > print e.name
>> >
>> > r = None
>> > t = None
>> >
>> > print 'AFTER'
>> > for e in entries:
>> > print e.name
>> >
>> > [hanwen@haring pygit2]$ python build.py
>> > .gitignore
>> > COPYING
>> > README.rst
>> > TODO.txt
>> > pygit2.c
>> > setup.py
>> > test
>> > AFTER
>> >
>> > �3c @4[ @4[ @4[ �` ��b |b ��b ��`
>> > ��` x_ X8\ 8��I8��I`<\ D[ H|b H|b �b
>> > �b pM\ �][ ��` ��` h��Ih��Ip��Ip��I�4_
>> > �R[ �_ �_ �_ ��` p�b p�b
>> > ���I���I���I���I���I���I��b ��b
>> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c
>> > (�` �3c @4[ (3[ (3[ �` ��b |b ��b
>> > ��` ��` x_ X8\ 8��I8��I`<\ D[ H|b
>> > H|b �b �b pM\ �][ ��` ��` h��Ih��Ip��Ip��I�4_
>> > �R[ D[ ��` ���I���I��b ��b
>> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� c
>> > (�` x_ ��I �` ȑb �
>> >
>> >
>> >
>> > --
>> > Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
>
>
>
> --
> J. David Ibáñez
> Itaapy <http://www.itaapy.com> Tel +33 (0)1 42 23 67 45
> 9 rue Darwin, 75018 Paris Fax +33 (0)1 53 28 27 88
Ok. I don't know how Ruby handles memory, but having a back-reference won't be much overhead in Python. On the other hand, dicts take a lot of space: >>> import sys >>> sys.getsizeof({}) 280 This will likely make no good to the perf. Plus, having a reference to the repo will allow to get back the 'to_object' (sugar) method. On Fri, 9 Mar 2012 21:12:37 +0100 Vicent Marti <vicent@github.com> wrote: > David: simply create a Python Dictionary and free the tree entry > immediately. > > { "name" : "the_file.txt", "oid" : "deadbeef...", "attributes" : > 040000} > > On Fri, Mar 9, 2012 at 8:41 PM, J. David Ibanez <jdavid@itaapy.com> > wrote: > > Hello, > > > > Actually, in pygit2 v0.16.0 a TreeEntry has a reference to the Tree > > object. In commit 4cdb1a83b43 I removed this, a mistake I > > acknowledge. This change was done in the context of issue #56, > > since tree-entries can be owned by Tree or TreeBuilder objects (see > > comment by Carlos). > > > > The solution I prefer now is for tree-entries to hold a reference > > not to the tree or the tree-builder, but directly to the repo. > > > > Regarding the size of a tree-entry, while it is small, there are a > > lot in a repo. A lot of small things may be too much. So it is > > important to keep them small. > > > > Regarding the approach taking in Ruby, Vicent could you elaborate? I > > don't get it. > > > > > > > > > > On Fri, 9 Mar 2012 18:45:17 +0100 > > Vicent Marti <vicent@github.com> wrote: > > > >> Hey Han-Wen, > >> > >> this is a very valid point. The entries are indeed owned by the > >> Tree or Builder that generated them. Given the simplicity of the > >> elements in a Tree Entry, the approach we took on Rugged is > >> creating a native Ruby Hash object for each entry. These are small > >> and fast. > >> > >> I'd strongly encourage the same thing for Python, with a > >> dictionary. Our first attempt in Rugged to trick the garbage > >> collector by adding back-references worked nicely, but it was > >> overly complex. > >> > >> Let's see what jdavid says. > >> > >> Cheers, > >> Vicent > >> > >> On Fri, Mar 9, 2012 at 2:27 PM, Han-Wen Nienhuys > >> <hanwenn@gmail.com> wrote: > >> > I think the design of pygit2 for dealing with tree-entries needs > >> > to be rethought. > >> > > >> > In libgit, tree entries are owned by their respective > >> > tree(builders). In pygit2, the tree entries are wrapped (const > >> > TreeEntry*) pointers, whose meaning becomes useless after the > >> > builder/tree has died. This leads to surprising behavior (see > >> > below). > >> > > >> > There are 2 ways to fix this: > >> > > >> > 1. make a python reference in TreeEntry that points back to the > >> > owning object > >> > > >> > 2. on creation, copy data from the C git_tree_entry to the > >> > TreeEntry object. > >> > > >> > I propose 2, since this model is simpler, and the data to copy is > >> > small (24 bytes + length of name). > >> > > >> > [hanwen@haring pygit2]$ cat build.py > >> > import pygit2 as p > >> > r = p.Repository('.') > >> > t = r[r.lookup_reference('HEAD').resolve().hex].tree > >> > entries = [e for e in t] > >> > for e in entries: > >> > print e.name > >> > > >> > r = None > >> > t = None > >> > > >> > print 'AFTER' > >> > for e in entries: > >> > print e.name > >> > > >> > [hanwen@haring pygit2]$ python build.py > >> > .gitignore > >> > COPYING > >> > README.rst > >> > TODO.txt > >> > pygit2.c > >> > setup.py > >> > test > >> > AFTER > >> > > >> > �3c @4[ @4[ @4[ �` ��b |b ��b > >> > ��` ��` x_ X8\ 8��I8��I`<\ D[ H|b H|b > >> > �b �b pM\ �][ ��` ��` h��Ih��Ip��Ip��I�4_ > >> > �R[ �_ �_ �_ ��` p�b p�b > >> > ���I���I���I���I���I���I��b ��b > >> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� > >> > c (�` �3c @4[ (3[ (3[ �` ��b |b ��b > >> > ��` ��` x_ X8\ 8��I8��I`<\ D[ H|b > >> > H|b �b �b pM\ �][ ��` ��` > >> > h��Ih��Ip��Ip��I�4_ �R[ D[ ��` ���I���I��b ��b > >> > ���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I���I� > >> > c (�` x_ ��I �` ȑb � > >> > > >> > > >> > > >> > -- > >> > Han-Wen Nienhuys - hanwen@xs4all.nl - > >> > http://www.xs4all.nl/~hanwen > > > > > > > > -- > > J. David Ibáñez > > Itaapy <http://www.itaapy.com> Tel +33 (0)1 42 23 67 45 > > 9 rue Darwin, 75018 Paris Fax +33 (0)1 53 28 27 88 -- J. David Ibáñez Itaapy <http://www.itaapy.com> Tel +33 (0)1 42 23 67 45 9 rue Darwin, 75018 Paris Fax +33 (0)1 53 28 27 88