Hi all, I'm sure you're sick of this question, but I was hoping to get a quick status update on pushing to a remote. I'd love to use libgit2 for my project, and this is the only missing piece for me right now... Thanks! -josh
Hey Josh, unfortunately we're still delayed on push. We're hoping that this year's Summer of Code student will finish it -- it's quite a bit of work, because it has pack-objects as a prerequisite. You're of course welcome to help ;) Cheers, Vicent On Sat, Mar 10, 2012 at 12:57 AM, Josh Bleecher Snyder <josharian@gmail.com> wrote: > Hi all, > > I'm sure you're sick of this question, but I was hoping to get a quick > status update on pushing to a remote. I'd love to use libgit2 for my > project, and this is the only missing piece for me right now... > > Thanks! > > -josh
Hi Vicent, Thanks for the update. I'd be interested in helping, but I would be coming in completely cold, both to this project and generally to git internals. Are there any relevant entry-level chunks I could tackle to get up to speed? Alternatively, is there a write-up anywhere of what needs to get done that I could use to orient myself? Feel free to reply by simply filing issues and pointing me at them. :) Cheers, Josh On Fri, Mar 9, 2012 at 5:44 PM, Vicent Marti <vicent@github.com> wrote: > Hey Josh, > > unfortunately we're still delayed on push. We're hoping that this > year's Summer of Code student will finish it -- it's quite a bit of > work, because it has pack-objects as a prerequisite. > > You're of course welcome to help ;) > > Cheers, > Vicent > > On Sat, Mar 10, 2012 at 12:57 AM, Josh Bleecher Snyder > <josharian@gmail.com> wrote: >> Hi all, >> >> I'm sure you're sick of this question, but I was hoping to get a quick >> status update on pushing to a remote. I'd love to use libgit2 for my >> project, and this is the only missing piece for me right now... >> >> Thanks! >> >> -josh
I was going to ask exactly the same question. Would be very willing, but probably not very able, to help by contributing any entry level parts of remote support. Matias On Sat, Mar 10, 2012 at 2:03 AM, Josh Bleecher Snyder <josharian@gmail.com>wrote: > Hi Vincent, > > Thanks for the update. I'd be interested in helping, but I would be > coming in completely cold, both to this project and generally to git > internals. > > Are there any relevant entry-level chunks I could tackle to get up to > speed? Alternatively, is there a write-up anywhere of what needs to > get done that I could use to orient myself? > > Feel free to reply by simply filing issues and pointing me at them. :) > > Cheers, > Josh > > > On Fri, Mar 9, 2012 at 5:44 PM, Vicent Marti <vicent@github.com> wrote: > > Hey Josh, > > > > unfortunately we're still delayed on push. We're hoping that this > > year's Summer of Code student will finish it -- it's quite a bit of > > work, because it has pack-objects as a prerequisite. > > > > You're of course welcome to help ;) > > > > Cheers, > > Vicent > > > > On Sat, Mar 10, 2012 at 12:57 AM, Josh Bleecher Snyder > > <josharian@gmail.com> wrote: > >> Hi all, > >> > >> I'm sure you're sick of this question, but I was hoping to get a quick > >> status update on pushing to a remote. I'd love to use libgit2 for my > >> project, and this is the only missing piece for me right now... > >> > >> Thanks! > >> > >> -josh >
Hi, On Fri, Mar 09, 2012 at 06:03:00PM -0800, Josh Bleecher Snyder wrote: > Are there any relevant entry-level chunks I could tackle to get up to > speed? Alternatively, is there a write-up anywhere of what needs to > get done that I could use to orient myself? These links helped me: * Git Concepts (http://schacon.github.com/git/user-manual.html#git-concepts) * Git magic - Secrets Revealed (http://www-cs-students.stanford.edu/~blynn/gitmagic/ch08.html) * Hacking Git (http://schacon.github.com/git/user-manual.html#hacking-git) I highly recommend what the "Hacking Git" link suggests: Check out the first version of Git and read through some of the code. You'll discover that the push / pull functionality for git started out as a script that would rsync the objects directory (https://github.com/gitster/git/blob/839a7a06f35bf8cd563a41d6db97f453ab108129/git-pull-script)
Sean M. Collins wrote: > I highly recommend what the "Hacking Git" link suggests: Check out the > first version of Git and read through some of the code. You'll discover > that the push / pull functionality for git started out as a script > that would rsync the objects directory Documentation/technical/pack-protocol.txt and early historical versions of pack-objects.c and unpack-objects.c found by git log -- pack-objects.c unpack-objects.c might be useful as well.
On Fri, Mar 09, 2012 at 11:27:57PM -0600, Jonathan Nieder wrote: > Documentation/technical/pack-protocol.txt and early historical > versions of pack-objects.c and unpack-objects.c found by > > git log -- pack-objects.c unpack-objects.c > > might be useful as well. Indeed. Commit c323ac7 in git has a pretty good commit message from Linus that describes the overall idea. -- Sean M. Collins
Thanks Sean and Josh for the pointers, most helpful! I've poked around a bit in the git and libgit2 source. Do I understand the high level of the problem with libgit2 correctly: the equivalent of pack-objects, which creates the packaged representations, is missing, but unpack-objects is already available? Similarly, the delta computing from git's delta.c that pack-objects requires is missing, but applying a delta is available? Matias (Please be gentle: this is all new to me.) On Sat, Mar 10, 2012 at 5:39 AM, Sean M. Collins <sean@coreitpro.com> wrote: > On Fri, Mar 09, 2012 at 11:27:57PM -0600, Jonathan Nieder wrote: > > Documentation/technical/pack-protocol.txt and early historical > > versions of pack-objects.c and unpack-objects.c found by > > > > git log -- pack-objects.c unpack-objects.c > > > > might be useful as well. > > Indeed. Commit c323ac7 in git has a pretty good commit message > from Linus that describes the overall idea. > > -- > Sean M. Collins >
On Sat, Mar 10, 2012 at 2:26 PM, Matias Piipari <matias.piipari@gmail.com> wrote: > Do I understand the high level of the problem with libgit2 > correctly: the equivalent of pack-objects, which creates the packaged > representations, is missing but unpack-objects is already available? > Similarly, delta computing from git's delta.c that pack-objects requires, is > missing, but applying a delta is available? Yep, that's exactly it! The code to actually build packfiles is currently missing from the library, and it'd be a great starting point to get Push up to speed. > (Please be gentle: this is all new to me.) I have unlimited patience for people willing to throw us a hand. Cheers, Vicent
>> Do I understand the high level of the problem with libgit2 >> correctly: the equivalent of pack-objects, which creates the packaged >> representations, is missing but unpack-objects is already available? >> Similarly, delta computing from git's delta.c that pack-objects requires, is >> missing, but applying a delta is available? > > Yep, that's exactly it! The code to actually build packfiles is > currently missing from the library, and it'd be a great starting point > to get Push up to speed. I've just worked through all the resources -- thanks to everyone who sent them along, very helpful. It looks like the patch-delta stuff is, strictly speaking, optional when writing a pack-objects implementation. For a first pass, all objects could be put into the pack file directly (without any of them in a delta representation), and the delta calculations could be added as an optimization. Does that sound right? (I'm also surprised that there aren't multiple delta strategies aimed at different file types.) Matias, where are you with this? I don't want to duplicate work or step on your toes. I could get started on this this week, most likely... Duke, same question. Let us know if/when you write any tests. Failing tests are a wonderful place to start. :) Perhaps we should start a libgit2 fork for coordinating work on push? Or is there a better way to coordinate? -josh
Hi Josh Re: packing full objects, it was my understanding from the documentation too that one could simply create packfiles consisting of the types other than the two delta representations to get started. That part seems fairly straightforward. The delta computing part looks more involved. diff-delta.c in git.git says it's 'greatly inspired by parts of LibXDiff' (an LGPL'd library available at http://www.xmailserver.org/xdiff-lib.html). Not sure from this wording whether one could get started by lifting the code from LibXDiff and using it to generate something that libgit2 is able to successfully unpack? Didn't really have the time to try this out. Can try, though, if there's some reason to think it would work, and if it would be legal to lift LGPL code from LibXDiff into libgit2, which is GPL with a linking exception? The other problem with the delta representations is of course choosing the right base object against which to compute each delta. For this I have some ideas for simple heuristics, from reading the source and the material that people have pointed out, and would be happy to contribute some time and code if the delta itself can be sorted. Re-implementing the delta itself from scratch does not sound appealing. It also came to my mind that alternative delta formats would be something interesting to look into as a little extension (not for a general purpose normal git repo of course), especially as the git packfile entry type field has two unused bit combinations :-) Have been experimenting a bit with binary delta compression with open-vcdiff and hacking together something based on it that creates tight binary deltas with a rather small memory footprint. Re: duplicating work & my toes, very happy to take your lead Josh on a joint fork. Let me know if you have further ideas of how to coordinate this, again very open to suggestions and can be contacted via mail or IM at this address too. 
I'm a noob with the delta compression algorithmic work and busy at least until after NSConference next week, so more than happy to be told what to do rather than try to start this from a blank slate. Can contribute some help and feedback with designing an API and of course some failing tests too. Also can put together Objective-C bindings to the object packing functionality into Objective-Git, although I suppose this last point is outside of the remit of this list. Matias On Mon, Mar 12, 2012 at 8:27 PM, Josh Bleecher Snyder <josharian@gmail.com>wrote: > >> Do I understand the high level of the problem with libgit2 > >> correctly: the equivalent of pack-objects, which creates the packaged > >> representations, is missing but unpack-objects is already available? > >> Similarly, delta computing from git's delta.c that pack-objects > requires, is > >> missing, but applying a delta is available? > > > > Yep, that's exactly it! The code to actually build packfiles is > > currently missing from the library, and it'd be a great starting point > > to get Push up to speed. > > I've just worked through all the resources -- thanks to everyone who > sent them along, very helpful. > > It looks like the patch-delta stuff is, strictly speaking, optional > when writing a pack-objects implementation. For a first pass, all > objects could be put into the pack file directly (without any of them > in a delta representation), and the delta calculations could be added > as an optimization. Does that sound right? (I'm also surprised that > there aren't multiple delta strategies aimed at different file types.) > > Matias, where are you with this? I don't want to duplicate work or > step on your toes. I could get started on this this week, most > likely... > > Duke, same question. Let us know if/when you write any tests. Failing > tests are a wonderful place to start. :) > > Perhaps we should start a libgit2 fork for coordinating work on push? > Or is there a better way to coordinate? 
> > -josh >
> Re: packing full objects, was my understanding from the documentation too > that one could simply create packfiles consisting of the types other than > the two delta representations to get started. That part seems fairly > straightforward. Yep. Let's start there. > The delta computing part looks more involved. > [...] > Re-implementing the delta itself from scratch does not sound > appealing. Indeed! Particularly when you read this: http://schacon.github.com/git/technical/pack-heuristics.txt Ugh. Some of that also (in theory) applies to just building the pack-file, but I think we can leave fine-tuning the object ordering as a todo in the short term. Thinking: (1) The initial purpose is push, not gc, so performant random object access isn't as critical -- the remote is just going to unpack our packfile anyway. (2) Picking the ideal object ordering should be pretty pluggable. We should just make sure we have enough information at hand to make the right decisions later, and put in hooks as we go. > Re: duplicating work & my toes, very happy to take your lead Josh on a joint fork. Ok, let's give that a try; and if it's not working, we'll retrench. I've made a fork at https://github.com/josharian/libgit2, with the primary branch being packfile (branched off development). I've added you (@mz2, right?) as a collaborator. > Let me know if you have further ideas of how to coordinate this, again > very open to suggestions and can be contacted via mail or IM at this address > too. Unless/until the others object, let's keep our communication on-list -- that'll give everyone a chance to chime in and keep us on the right track. :) If you do need to get ahold of me directly, mail and IM both work for this address. > I'm a noob with the delta compression algorithmic work and busy at > least until after NSConference next week, so more than happy to be told what > to do rather than try to start this from a blank slate. 
Can contribute some > help and feedback with designing an API and of course some failing tests > too. Agreed, first step is putting together an API, and some very basic failing tests. I have more background in Objective-C and Python than in C, so feedback will be most welcome. I'll spend a bit of time looking through the rest of libgit2's API and plan to send along an API proposal for feedback this week. > Also can put together Objective-C bindings to the object packing > functionality into Objective-Git, although I suppose this last point is > outside of the remit of this list. Let's tackle that one later. :) -josh > On Mon, Mar 12, 2012 at 8:27 PM, Josh Bleecher Snyder <josharian@gmail.com> > wrote: >> >> >> Do I understand the high level of the problem with libgit2 >> >> correctly: the equivalent of pack-objects, which creates the packaged >> >> representations, is missing but unpack-objects is already available? >> >> Similarly, delta computing from git's delta.c that pack-objects >> >> requires, is >> >> missing, but applying a delta is available? >> > >> > Yep, that's exactly it! The code to actually build packfiles is >> > currently missing from the library, and it'd be a great starting point >> > to get Push up to speed. >> >> I've just worked through all the resources -- thanks to everyone who >> sent them along, very helpful. >> >> It looks like the patch-delta stuff is, strictly speaking, optional >> when writing a pack-objects implementation. For a first pass, all >> objects could be put into the pack file directly (without any of them >> in a delta representation), and the delta calculations could be added >> as an optimization. Does that sound right? (I'm also surprised that >> there aren't multiple delta strategies aimed at different file types.) >> >> Matias, where are you with this? I don't want to duplicate work or >> step on your toes. I could get started on this this week, most >> likely... >> >> Duke, same question. 
Let us know if/when you write any tests. Failing >> tests are a wonderful place to start. :) >> >> Perhaps we should start a libgit2 fork for coordinating work on push? >> Or is there a better way to coordinate? >> >> -josh > >
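The first-pass plan agreed on above (every object stored whole, no delta entries) can be sketched roughly as follows. This is a minimal illustration in Python, not libgit2 code: the function names and the (type, data) tuple representation are made up for the example, and it assumes the version-2 pack layout described in git's pack-format documentation ("PACK" signature, version, object count, zlib-deflated entries, SHA-1 trailer).

```python
import hashlib
import struct
import zlib

# Object type codes from git's pack format (6 and 7 are the delta types, unused here).
OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG = 1, 2, 3, 4


def encode_entry_header(obj_type, size):
    """Encode the type and uncompressed size as a variable-length entry header."""
    byte = (obj_type << 4) | (size & 0x0F)  # 3 type bits, then the low 4 size bits
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # MSB set: more size bytes follow
        byte = size & 0x7F       # next 7 size bits, least significant group first
        size >>= 7
    out.append(byte)
    return bytes(out)


def write_undeltified_pack(objects):
    """Build a version-2 pack from (type, raw_bytes) pairs (hypothetical helper)."""
    body = bytearray(b"PACK")
    body += struct.pack(">II", 2, len(objects))  # version and object count, big-endian
    for obj_type, data in objects:
        body += encode_entry_header(obj_type, len(data))
        body += zlib.compress(data)
    body += hashlib.sha1(bytes(body)).digest()  # trailer: SHA-1 of everything above
    return bytes(body)


pack = write_undeltified_pack([(OBJ_BLOB, b"hello\n")])
```

Delta entries could later be added as a separate code path without changing the header or trailer handling, which is what makes them an optimization rather than a prerequisite.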
On Tue, Mar 13, 2012 at 4:19 AM, Josh Bleecher Snyder <josharian@gmail.com> wrote: > Some of that also (in theory) applies to just building the pack-file, > but I think we can leave fine-tuning the object ordering as a todo in > the short term. Thinking: (1) The initial purpose is push, not gc, so > performant random object access isn't as critical -- the remote is > just going to unpack our packfile anyway. (2) Picking the ideal object > ordering should be pretty pluggable. We should just make sure we have > enough information at hand to make the right decisions later, and put > in hooks as we go. I very much like this approach! Make sure to design the packing API so that the packing ordering heuristics can be eventually plugged in without rewriting much/any code. They are going to be critical, eventually. > >> Re: duplicating work & my toes, very happy to take your lead Josh on a joint fork. > > Ok, let's give that a try; and if it's not working, we'll retrench. > I've made a fork at https://github.com/josharian/libgit2, with the > primary branch being packfile (branched off development). I've added > you (@mz2, right?) as a collaborator. Brilliant! I'll keep an eye on this and see if I find any issues as you work. >> Let me know if you have further ideas of how to coordinate this, again >> very open to suggestions and can be contacted via mail or IM at this address >> too. As a reminder, using the code from core Git as an inspiration is always a good idea. There are no licensing problems, because we've explicitly asked for permission -- just be careful when copying & pasting given that the core Git code is not reentrant at all, and we're working on a library here. Anyway, can't wait to see what you guys come up with. Cheers, Vicent
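One way to read the "pluggable ordering" suggestion is to pass the ordering in as a function, so a smarter heuristic can be swapped in later without touching the writer. The sketch below is only an illustration in Python; every name in it is hypothetical, and the second strategy is loosely modeled on the type-then-size sorting discussed in pack-heuristics.txt rather than a faithful copy of what core Git does.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation: (type code, raw object data).
PackObject = Tuple[int, bytes]
OrderingStrategy = Callable[[Iterable[PackObject]], List[PackObject]]


def insertion_order(objects: Iterable[PackObject]) -> List[PackObject]:
    """Simplest strategy: keep objects in the order they were handed over."""
    return list(objects)


def type_then_size_desc(objects: Iterable[PackObject]) -> List[PackObject]:
    """Group by type code, largest objects first within each type."""
    return sorted(objects, key=lambda obj: (obj[0], -len(obj[1])))


def plan_pack(objects: Iterable[PackObject],
              order: OrderingStrategy = insertion_order) -> List[PackObject]:
    """Return the sequence in which objects would be written to the pack."""
    return order(objects)


objs = [(3, b"aa"), (1, b"c"), (3, b"bbbb")]
default_plan = plan_pack(objs)
sorted_plan = plan_pack(objs, order=type_then_size_desc)
```

A real heuristic would need more information than this (for example path names), so the object representation is the part most likely to change.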
Howdy fellow libgit2-hackers, > Yep, that's exactly it! The code to actually build packfiles is > currently missing from the library, and it'd be a great starting point > to get Push up to speed. This discussion really helped me understand the current status of libgit2 push. I am very interested in helping test the creation of packfiles. > I have unlimited patience for people willing to throw us a hand. Is it still the case that all new tests should be written using clay and that we are still transitioning the old test suite? I have been out of the loop for a bit. Duke -- Jonathan "Duke" Leto <jonathan@leto.net> Leto Labs LLC 209.691.DUKE // http://labs.leto.net NOTE: Personal email is only checked twice a day at 10am/2pm PST, please call/text for time-sensitive matters.
On Sun, Mar 11, 2012 at 6:16 PM, Jonathan "Duke" Leto <jonathan@leto.net> wrote: > Is it still the case that all new tests should be written using clay and that > we are still transitioning the old test suite? I have been out of the loop for > a bit. Yep. We haven't written any new tests for the old suite in a while.
Cool! I've been reading about the packfile format from the resources you guys pointed out and this book (http://book.git-scm.com/7_the_packfile.html), and poked around with a hex editor in some pack files from repos on my disk. I think I got the idea of how to put together the packfile header and all the undeltified pack entry types. A question based on reading: The diagram in the Git book (link above) has an example of working out the uncompressed data size for an entry, and it shows 0010010 0000 as the bits that encode the length information. I can see how that would be the case from the example given and the rules listed on the page and in Documentation/technical/pack-format.txt. However, the book suggests this means a 144 byte length. If I interpret this sequence of bits as a big-endian integer, I get 288, not 144. Shifting by one bit to the right would then obviously give 144 as noted in the book. I'm wondering if I'm misunderstanding something very simple here? On Sun, Mar 11, 2012 at 5:46 PM, Vicent Marti <vicent@github.com> wrote: > On Sun, Mar 11, 2012 at 6:16 PM, Jonathan "Duke" Leto <jonathan@leto.net> > wrote: > > Is it still the case that all new tests should be written using clay and > that > > we are still transitioning the old test suite? I have been out of the > loop for > > a bit. > > Yep. We haven't written any new tests for the old suite in a while. >
On Sun, Mar 11, 2012 at 8:56 PM, Matias Piipari <matias.piipari@gmail.com> wrote: > The diagram in the Git book (link above) has an example of working out the > uncompressed data size for an entry, and it shows 0010010 0000 as the bits > that encode the length information. I can see how that would be the case > from the example given and the rules listed on the page and in > Documentation/technical/package-format.txt, However, the book suggests this > means a 144 byte length. If I interpret this sequence of bits as a > big-endian integer, I get 288, not 144. Shifting by one bit to right then > would obviously give 144 as noted in the book. I'm wondering if I'm > misunderstanding something very simple here? I've done the math several times, and I'm also getting 288. It's probably a mistake Scott made in the picture. ^^
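The arithmetic can be checked with a small decoder for the entry header. This is a sketch in Python, assuming the encoding described in pack-format.txt: the first byte carries a continuation bit, three type bits and the low four size bits, and each following byte adds seven more size bits, least significant group first. The type bits in the example (001) are an arbitrary choice for illustration; only the size bits come from the discussion above.

```python
def decode_entry_header(buf):
    """Return (object_type, size, bytes_consumed) for a pack entry header."""
    byte = buf[0]
    obj_type = (byte >> 4) & 0x07  # bits 6..4 of the first byte
    size = byte & 0x0F             # low 4 bits of the size
    shift = 4
    used = 1
    while byte & 0x80:             # continuation bit set: read another byte
        byte = buf[used]
        size |= (byte & 0x7F) << shift
        shift += 7
        used += 1
    return obj_type, size, used


# First byte:  1 001 0000  (continuation, assumed type 001, size bits 0000)
# Second byte: 0 0010010   (last byte, size bits 0010010)
book_example = decode_entry_header(bytes([0b10010000, 0b00010010]))
print(book_example)  # (1, 288, 2)

# A size of 144 would need the bits 0001001 0000 instead.
shifted_example = decode_entry_header(bytes([0b10010000, 0b00001001]))
print(shifted_example)  # (1, 144, 2)
```

So the bit pattern 0010010 0000 does decode to 288, and 144 corresponds to the same pattern shifted right by one bit.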