Hi all,
When you use git-stash on a repo, it leaves behind a reference
refs/stash. Whilst running git_reference_listall on such a repo, current
behaviour is to return an error to the user. I think we can agree that
this isn't the best way to react.
One approach to solve this is what you can see in my "allow-stash"
branch[0] which explicitly allows refs/stash to exist. But git-stash is
just one known tool which leaves traces in refs/ and there may be more.
I believe it's more elegant and correct to skip "invalid" references. A
patch to do this is at the bottom of the mail (comments welcome and all
that).
There is also another question, which is related. Do we want to allow
custom tools to write references wherever they want? Most users won't
need this (and it would be a bug if they did), but if we want to get
git.git to use libgit2, git-stash needs to be implementable. Would it be
considered a good idea to add an override variable (and have the normal
one be a #define or an inline wrapper or whatever) in order only to
check for it being in refs/? There probably aren't that many tools, but
adding a special check every time a tool is written seems like the wrong
approach.
cmn
[0]
https://github.com/carlosmn/libgit2/commit/69ea1a535e0662c532812157ba1a46274894fb4f
---8<-----8<-----
diff --git a/src/refs.c b/src/refs.c
index 8e29655..1f08ece 100644
--- a/src/refs.c
+++ b/src/refs.c
@@ -494,6 +494,11 @@ static int packed_parse_oid(
refname[refname_len - 1] = 0;
error = reference_create((git_reference **)&ref, repo, refname,
GIT_REF_OID);
+ if (error == GIT_EINVALIDREFNAME){
+ /* Allow us to read the next one */
+ *buffer_out = refname_end + 1;
+ goto cleanup;
+ }
if (error < GIT_SUCCESS)
goto cleanup;
@@ -576,6 +581,8 @@ static int packed_load(git_repository *repo)
reference_oid *ref = NULL;
error = packed_parse_oid(&ref, repo, &buffer_start, buffer_end);
+ if (error == GIT_EINVALIDREFNAME)
+ continue;
if (error < GIT_SUCCESS)
goto cleanup;
On mié, 2011-03-23 at 10:57 +0100, Carlos Martín Nieto wrote: [...] > There is also another question, which is related. Do we want to allow > custom tools to write references wherever they want? Most users won't > need this (and it would be a bug if they did), but if we want to get > git.git to use libgit2, git-stash needs to be implementable. Would it be > considered a good idea to add an override variable (and have the normal > one be a #define or an inline wrapper or whatever) in order only to > check for it being in refs/? There probably aren't that many tools, but > adding a special check every time a tool is written seems like the wrong > approach. This problem doesn't just show up with odd tools. I'm trying to implement update-ref with libgit2, but I've come up against a problem with the way libgit2 thinks references should look like. If I want to transform a relative into an absolute refname, I have to delete it and then create it again. I can understand this up to a point, though I'd rather have some king of transform function. However, normalize_name assumes that no OID refs are allowed without at least a slash (and then checks for it to be inside of refs/{heads,tags,remotes} which makes one of the checks redundant unless it's done for performance purposes). In detached HEAD state (and possibly during a merge, but I'm not sure about that one) this is not true. Deleting the check at refs.c:1570 and adding HEAD to the whitelist would solve this particular problem, but I have a more general question, which is pretty much the same as the one in the original mail (though now I have a real example). Are these checks really necessary or could we just take out the last check in normalize_name? cmn
Hello,
> If I want to transform a relative into an absolute refname, I have to
delete it and then create it again.
I'm not sure I get a clear understanding of the use case. What would be a
relative refname or an absolute one?
Em.
On mié, 2011-03-23 at 13:58 +0100, Emeric Fermas wrote: > Hello, > > > If I want to transform a relative into an absolute refname, I have to > > delete it and then create it again. > I'm not sure I get a clear understanding of the use case. What would be a > relative refname or an absolute one? I mean symbolic and OID, sorry. My mind starts describing it as file paths if I'm not careful. cmn
>My mind starts describing it as file paths if I'm not careful. :) > I mean symbolic and OID, sorry. Wow. Why would one need such a transformation? Meaning, is there such a requirement in the git ecosystem? Or maybe are you refering to peeling a symref? Em. On Wed, Mar 23, 2011 at 2:01 PM, Carlos Martín Nieto <cmn@elego.de> wrote: > On mié, 2011-03-23 at 13:58 +0100, Emeric Fermas wrote: > > Hello, > > > > > If I want to transform a relative into an absolute refname, I have to > > > delete it and then create it again. > > I'm not sure I get a clear understanding of the use case. What would be a > > relative refname or an absolute one? > > I mean symbolic and OID, sorry. My mind starts describing it as file > paths if I'm not careful. > > cmn > > >
On mié, 2011-03-23 at 16:12 +0100, Emeric Fermas wrote: > >My mind starts describing it as file paths if I'm not careful. > :) > > > I mean symbolic and OID, sorry. > Wow. Why would one need such a transformation? Meaning, is there such a > requirement in the git ecosystem? Or maybe are you refering to peeling a > symref? Peeling a reference works fine with the git_reference_resolve function and friends, that's not the problem. There is indeed such a requirement, and quite a useful one as well. HEAD is usually a symbolic reference, but in the 'detached HEAD' state (say, when you run `git checkout origin/some-branch`), HEAD becomes an OID reference. In my case (implementing a libgit2 version of update-ref) this needs to be done whenever the user tells you to. In the end I solved the transformation problem by creating a new reference with the same name as the old one (a patch to fix the memory leak caused by this is in the works) but my original problem of normalize_path being too restrictive is still there. cmn
> There is indeed such a requirement, and quite a useful one as well. >HEAD is usually a symbolic reference, but in the 'detached HEAD' state >(say, when you run `git checkout origin/some-branch`), HEAD becomes an >OID reference. In my case (implementing a libgit2 version of update-ref) >this needs to be done whenever the user tells you to. Thanks a lot for this very clear explanation. > In the end I solved the transformation problem by creating a new >reference with the same name as the old one So in the "detached HEAD perspective", one would drop the HEAD symref, then create an HEAD oidref. >but my original problem of normalize_path being too restrictive is still there. We could add a special case allowing creation of a oidref named HEAD, just like you created an exception for the "stash". Indeed, "stash" and "HEAD" look like standard cgit ref requirement that libgit2 should be able to deal with. However, I'm rather in favor of the "whitelist" approach which allows to control that what is requested is "sane" from a git usage perspective. Few weeks ago, I asked this question <http://stackoverflow.com/q/4986000>. Indeed, cgit allows you to do pretty weird things which could result in turning your repo into a real mess. I really am afraid of what Junio C. Hamano described as "What the code do to them (without crashing) is not the design, but simply an undefined behaviour." and I'd really like libgit2 to provide some safety net, in accordance with cgit design, by trying to reduce the surface of exposure for "undefined behavior". Em.
Hi, Emeric Fermas wrote: > We could add a special case allowing creation of a oidref named HEAD, just > like you created an exception for the "stash". Indeed, "stash" and "HEAD" > look like standard cgit ref requirement that libgit2 should be able to deal > with. > However, I'm rather in favor of the "whitelist" approach which allows to > control that what is requested is "sane" from a git usage perspective. FWIW there are two relevant differences between HEAD and stash: - HEAD can be a symref, while stash is always an oidref; - HEAD is actually at .git/HEAD, while stash's full name is refs/stash. What the stash does isn't unusual at all. Refs with names like refs/attic/foo, refs/top-bases/foo, refs/yesterday, and so on are perfectly valid and definitely part of the design. Currently oid references like MERGE_HEAD tend to be created and updated by writing the file in .git directly. Since their names do not start with refs/, they are not packed by "git pack-refs" so this usage is safe, but it might make sense to fold that functionality into something like update-ref anyway. The rules for valid refnames are described in git-check-ref-format(1). > I really am afraid of what Junio C. Hamano described as "What the code do to > them (without crashing) is not the design, but simply an undefined > behaviour." and I'd really like libgit2 to provide some safety net, in > accordance with cgit design, by trying to reduce the surface of exposure for > "undefined behavior". Junio was talking about symrefs, and what you say makes perfect sense for them. Hope that helps, Jonathan
Hi Jonathan, Thanks a lot for all this information! This sheds some very appreciated light on the subject. I hope you won't mind giving me some more hints :) >What the stash does isn't unusual at all. Refs with names like refs/attic/foo, >refs/top-bases/foo, refs/yesterday, and so on are perfectly valid and definitely >part of the design. As stated in git-ref-format : "A reference is used in git to specify branches and tags. A branch head is stored under the $GIT_DIR/refs/heads directory, and a tag is stored under the $GIT_DIR/refs/tags directory (or, if refs are packed by git gc, as entries in the $GIT_DIR/packed-refs file).". However, cgit doesn't compel to this convention ("They must contain at least one /. This enforces the presence of a category like heads/, tags/ etc. but the actual names are not restricted."). I'm quite puzzled: Nowadays, from a git workflow perspective, what would lead one developer to create (or move) a ref into one of these refs/xxx structures instead of refs/heads or refs/tags? Or maybe the heads/tags classification was not there in the early releases of git? Would those refs/xxx/ tree structures hold both heads and tags refs, for instance? The doc also states: "<tag> a valid tag name (i.e. the contents of $GIT_DIR/refs/tags/<tag>)." and "<head> a valid head name (i.e. the contents of $GIT_DIR/refs/heads/<head>)." >Junio was talking about symrefs, and what you say makes perfect sense >for them. I know he was talking about symrefs, I'm the one who asked the question :) So, taking this into account, should libgit2 prevent one from storing a symref under the refs/ ? >The rules for valid refnames are described in git-check-ref-format(1). Yes, however, it doesn't make very clear if this should apply to ONLY oid ref. Note: I'm not trying to nitpick. I'm only trying to gather enough info to eventually come up with a better implementation proposal. Cheers, Em.
Hi Emeric, Emeric Fermas wrote: > I hope you won't mind giving me some more hints :) Sure, not at all. :) > I'm quite puzzled: Nowadays, from a git workflow perspective, what > would lead one developer to create (or move) a ref into one of these > refs/xxx structures instead of refs/heads or refs/tags? Or maybe the > heads/tags classification was not there in the early releases of git? > Would those refs/xxx/ tree structures hold both heads and tags refs, > for instance? There are two reasons: 1) refs indicate that some history should be preserved, so sometimes when wanting to make a branch invisible without deleting the corresponding history, people move it somewhere like refs/attic. 2) Various tools extending git claim their own parts of the refs hierarchy. For example, topgit uses refs/top-bases, and "git notes" uses refs/notes. The only branches that you can check out are under refs/heads; the rest of the refs behave roughly like tags and remote-tracking branches. > I know he was talking about symrefs, I'm the one who asked the question :) > So, taking this into account, should libgit2 prevent one from storing > a symref under the refs/ ? Probably yes, except for refs/remotes/foo/HEAD as he mentioned. >> The rules for valid refnames are described in git-check-ref-format(1). > > Yes, however, it doesn't make very clear if this should apply to ONLY oid ref. Generally "ref" means "object reference" (i.e., oid ref). I agree with you that the documentation of rules for symrefs could use some help. Hope that helps, Jonathan
Hi Jonathan, >Hope that helps, That helps a lot. Thanks! Is it safe to assume the following statements: -Git internally relies on "reserved" references (HEAD, refs/stash, refs/bisect, ORIG_HEAD, ...) -It would be dangerous to let a human Git consumer alter the life-cycle (creation/updation, ...) of those references Beside Git, would you know about some tooling which would require such a feature (altering "reserved" references)? Thanks in advance, Cheers Em. On Thu, Mar 24, 2011 at 10:36 PM, Jonathan Nieder <jrnieder@gmail.com> wrote: > Hi Emeric, > > Emeric Fermas wrote: > >> I hope you won't mind giving me some more hints :) > > Sure, not at all. :) > >> I'm quite puzzled: Nowadays, from a git workflow perspective, what >> would lead one developer to create (or move) a ref into one of these >> refs/xxx structures instead of refs/heads or refs/tags? Or maybe the >> heads/tags classification was not there in the early releases of git? >> Would those refs/xxx/ tree structures hold both heads and tags refs, >> for instance? > > There are two reasons: > > 1) refs indicate that some history should be preserved, so sometimes > when wanting to make a branch invisible without deleting the > corresponding history, people move it somewhere like refs/attic. > > 2) Various tools extending git claim their own parts of the refs > hierarchy. For example, topgit uses refs/top-bases, and "git > notes" uses refs/notes. > > The only branches that you can check out are under refs/heads; the > rest of the refs behave roughly like tags and remote-tracking > branches. > >> I know he was talking about symrefs, I'm the one who asked the question :) >> So, taking this into account, should libgit2 prevent one from storing >> a symref under the refs/ ? > > Probably yes, except for refs/remotes/foo/HEAD as he mentioned. > >>> The rules for valid refnames are described in git-check-ref-format(1). >> >> Yes, however, it doesn't make very clear if this should apply to ONLY oid ref. > > Generally "ref" means "object reference" (i.e., oid ref). I agree > with you that the documentation of rules for symrefs could use some > help. > > Hope that helps, > Jonathan >
Hi again. :) Emeric Fermas wrote: > Is it safe to assume the following statements: > -Git internally relies on "reserved" references (HEAD, refs/stash, > refs/bisect, ORIG_HEAD, ...) > -It would be dangerous to let a human Git consumer alter the > life-cycle (creation/updation, ...) of those references It depends on what you mean by Git and by human, I suppose. * refs/stash and refs/bisect/* are tool-specific --- the former is only used by "git stash" and the latter only by "git bisect". If I wanted to write a stash or bisect clone, I would want libgit2 to let me modify them, and otherwise it seems more reasonable for me to stake a claim on some other ref. Meanwhile spying on what stash and bisect do by reading those refs would seem less unusual. Now suppose I want to use refs to represent a mapping from Subversion revision numbers to git commits (I don't, but suppose I wanted to). It would be obnoxious to place these under refs/heads/* or refs/tags/*, since then they would clutter "git branch" and "git tag -l" output. So I should put them under refs/mytool/*. * Updating HEAD is a very common thing for low-level tools to do. It is how you change where the current branch points. * Before "git checkout --orphan" existed, some people emulated it by running git symbolic-ref -m "preparing new parentless commit" \ HEAD refs/heads/newbranch > Beside Git, would you know about some tooling which would require such > a feature (altering "reserved" references)? This formulation seems odd --- refs/stash is reserved for "git stash", but refs/orangutan is not reserved for anyone. The main use I know of for creating refs outside refs/notes, refs/heads, refs/remotes, refs/tags, refs/stash, refs/bisect, refs/top-bases, refs/replace, refs/changes (Gerrit), and refs/original is to hide a branch without making it a candidate for pruning, like so: cmit=$(git rev-parse --verify refs/heads/topic) && git update-ref -m retiring refs/attic/topic "$cmit" "" && git update-ref -m retiring -d refs/heads/topic "$cmit" Because of the reference to $cmit under refs/attic/, that piece of history is guaranteed not to be garbage collected. Meanwhile it does not clutter up the "git branch" output. So, to make a long story short: humans don't tend to deal with refs directly --- they work with abstractions like "branch", "tag", or "note collection". But the ability to invent new abstractions without changing the core of git (and without even telling anyone about it) is very important. Sorry for the ramble. Jonathan
Hi Jonathan, > Sorry for the ramble. Are you kidding me? It's not a ramble, it's a pure gem! Thanks a lot for sharing all this in such a clear way.I guess we have no other choice than loosening the tie. >This formulation seems odd Sorry for my badly worded question. Your answer extensively covered my question.. and beyond (the garbage collecting insight, for instance). I'm still a bit sad that there is no way, for instance, to prevent a user from making HEAD point to a tag. But I think I can live with this. :) Once again, thank you for having taken the time to exchange on this subject. Cheers, Em, part-time orangutan. :)
Emeric Fermas wrote: > I'm still a bit sad that there is no way, for instance, to prevent a > user from making HEAD point to a tag. But I think I can live with > this. :) Perhaps the symbolic-ref API could be made strict (only allowing a HEAD symref to point to refs/heads/* and refs/remotes/<remote>/HEAD to point to refs/remotes/<remote>/*)? My only worry would be from sequences like read HEAD change HEAD temporarily put HEAD back the way it was in case the user was crazy enough to make HEAD point to a tag separately, but in that case I think the user is crazy enough to suffer. :) Likewise I'd be happy with the update-ref API only allowing updates to refs that pass check-ref-format (meaning refs/stash would be okay but not HEAD) and treating updates to HEAD, CHERRY_PICK_HEAD, etc as a different operation. > Once again, thank you for having taken the time to exchange on this subject. No problem; thanks for your thoughtful work. :) Jonathan
On lun, 2011-03-28 at 16:55 -0500, Jonathan Nieder wrote: > Emeric Fermas wrote: > > > I'm still a bit sad that there is no way, for instance, to prevent a > > user from making HEAD point to a tag. But I think I can live with > > this. :) > > Perhaps the symbolic-ref API could be made strict (only allowing a > HEAD symref to point to refs/heads/* and refs/remotes/<remote>/HEAD to > point to refs/remotes/<remote>/*)? My only worry would be from > sequences like If we provide a branch/remotes API (which would eventually play together with the config), then the reference API could be left for users who are doing unusual things. > > read HEAD > change HEAD temporarily > put HEAD back the way it was > > in case the user was crazy enough to make HEAD point to a tag > separately, but in that case I think the user is crazy enough to > suffer. :) That'd be interesting to see. > > Likewise I'd be happy with the update-ref API only allowing updates to > refs that pass check-ref-format (meaning refs/stash would be okay but > not HEAD) and treating updates to HEAD, CHERRY_PICK_HEAD, etc as a > different operation. Are CHERRY_PICK_HEAD and co. object id or symbolic refs? The normalize_name function only allows object id references inside its whitelist (currently the three standard dirs under refs/). The pull request[0] that deals with this allows anything under refs/ and HEAD, so refs/stash is covered. With the rest of the tests, it should also be pretty much equivalent to check-ref-format. BTW, I've noticed check-ref-format will allow a ref called /this/that, and `update-ref /home/carlos/ref` HEAD will silentely fail to create the reference. This is probably a better discussion for git@vger. Maybe there could be an additional API if you want to deal with "shallow" refs (or whatever you'd like to call them). [0] https://github.com/libgit2/libgit2/pull/96 cmn
Carlos Martín Nieto wrote: > Are CHERRY_PICK_HEAD and co. object id or symbolic refs? They're ordinary (object id) refs. > The > normalize_name function only allows object id references inside its > whitelist (currently the three standard dirs under refs/). > > The pull request[0] that deals with this allows anything under refs/ > and HEAD, so refs/stash is covered. With the rest of the tests, it > should also be pretty much equivalent to check-ref-format. Excellent. That seems like the right behavior to me. > BTW, I've > noticed check-ref-format will allow a ref called /this/that, and > `update-ref /home/carlos/ref` HEAD will silentely fail to create the > reference. This is probably a better discussion for git@vger. > > Maybe there could be an additional API if you want to deal with > "shallow" refs (or whatever you'd like to call them). Yes, exactly.
On mié, 2011-03-23 at 14:16 -0500, Jonathan Nieder wrote: > Hi, > > Emeric Fermas wrote: > > > We could add a special case allowing creation of a oidref named HEAD, just > > like you created an exception for the "stash". Indeed, "stash" and "HEAD" > > look like standard cgit ref requirement that libgit2 should be able to deal > > with. > > However, I'm rather in favor of the "whitelist" approach which allows to > > control that what is requested is "sane" from a git usage perspective. > > FWIW there are two relevant differences between HEAD and stash: > > - HEAD can be a symref, while stash is always an oidref; > - HEAD is actually at .git/HEAD, while stash's full name is refs/stash. > > What the stash does isn't unusual at all. Refs with names like refs/attic/foo, > refs/top-bases/foo, refs/yesterday, and so on are perfectly valid and definitely > part of the design. So then the whitelist could just be HEAD and we'd change the check for refs/{tags,heads,remotes}/ at the end of normalize_name to just check for something under refs/. > > Currently oid references like MERGE_HEAD tend to be created and > updated by writing the file in .git directly. Since their names do > not start with refs/, they are not packed by "git pack-refs" so this > usage is safe, but it might make sense to fold that functionality into > something like update-ref anyway. Maybe it'd be useful to expose an API that allows the user to create non-standard references (which would then be "protected" by a switch in update-ref). > > The rules for valid refnames are described in git-check-ref-format(1). I've just tried it out and git-check-ref-format doesn't allow HEAD... but I guess that one is a special case. > > > I really am afraid of what Junio C. Hamano described as "What the code do to > > them (without crashing) is not the design, but simply an undefined > > behaviour." and I'd really like libgit2 to provide some safety net, in > > accordance with cgit design, by trying to reduce the surface of exposure for > > "undefined behavior". > > Junio was talking about symrefs, and what you say makes perfect sense > for them. > > Hope that helps, > Jonathan >
On mié, 2011-03-23 at 17:42 +0100, Emeric Fermas wrote: > > There is indeed such a requirement, and quite a useful one as well. > >HEAD is usually a symbolic reference, but in the 'detached HEAD' state > >(say, when you run `git checkout origin/some-branch`), HEAD becomes an > >OID reference. In my case (implementing a libgit2 version of update-ref) > >this needs to be done whenever the user tells you to. > Thanks a lot for this very clear explanation. > > > In the end I solved the transformation problem by creating a new > >reference with the same name as the old one > So in the "detached HEAD perspective", one would drop the HEAD symref, then > create an HEAD oidref. Yes. Either that or (with libgit2's current behaviour) create a HEAD oidref which replaces the one you had. This can cause a bit of a problem with memory[0] but it's a lot easier for the programmer. > > >but my original problem of normalize_path being too restrictive is still > there. > We could add a special case allowing creation of a oidref named HEAD, just > like you created an exception for the "stash". Indeed, "stash" and "HEAD" > look like standard cgit ref requirement that libgit2 should be able to deal > with. > However, I'm rather in favor of the "whitelist" approach which allows to > control that what is requested is "sane" from a git usage perspective. Well, if Junio says the rest is undefined, then that's fine. I'll make a patch for that tomorrow. [0] See issue #95 https://github.com/libgit2/libgit2/issues/95 on GitHub cmn