Low level remote API

From:
Stefan Buschmann
Date:
2013-01-05 @ 18:46
Hi all,

I started using libgit2 for a distributed synchronization project and 
have to say that this library is really great. The API is well 
documented and easy to use. Great work!

However, I'm now struggling for the first time as I try to implement 
the push/pull logic. I'm using the git repository only as a kind of 
database, so I have my own logic for how and what kind of objects are 
stored in it, which differs from that of a git repository. To be more 
precise, I currently save blobs only, no trees and no revisions. Now I 
want to be able to push and pull some of these objects to or from a remote.

Unfortunately, it seems that the remote-logic inherently assumes a 
git-repository layout, and I could not find a way around it. What I need 
would be a low-level remote interface that allows me to connect to a 
remote, ask what objects are there and push or pull an arbitrary list of 
objects/references, without any magic that tries to walk revisions or 
find a HEAD revision etc.

Do you think it would be possible to revise the remote API to offer such 
kind of low-level support? That would be great for users like me that 
want to use low-level git but not a real git-repository layout. All 
higher-level logic for the "real" git could be implemented on top of it, 
e.g., inside the push/pull-API.

Thanks,

Stefan

Re: [libgit2] Low level remote API

From:
Carlos Martin Nieto
Date:
2013-01-05 @ 20:31
Stefan Buschmann <sbusch@pixellight.org> writes:

> However, I'm now struggling for the first time as I try to implement 
> the push/pull logic. I'm using the git repository only as a kind of 
> database, so I have my own logic for how and what kind of objects are 
> stored in it, which differs from that of a git repository. To be more 
> precise, I currently save blobs only, no trees and no revisions. Now I 
> want to be able to push and pull some of these objects to or from a
> remote.

You can't. The protocol git speaks negotiates on the basis of commits, as
that's the point of the synchronization: to push or fetch history, which
is made out of commits.
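
To illustrate why the negotiation depends on commits, here is a simplified
Python sketch. It is not the real git wire protocol and not libgit2 code; the
`parents` mapping, the function name and the single-letter commit ids are
hypothetical. It only shows the underlying idea: the receiver names what it
wants and what it has, and the sender walks parent links to find what is
missing. Bare blobs have no parent links, so there is nothing to walk.

```python
from collections import deque


def commits_to_send(parents, wants, haves):
    """Return commits reachable from `wants` but not from `haves`.

    `parents` maps a commit id to a list of its parent ids. It stands in
    for a commit graph and is purely illustrative.
    """
    # Everything reachable from the receiver's commits is already known to it.
    known = set()
    queue = deque(haves)
    while queue:
        commit = queue.popleft()
        if commit in known:
            continue
        known.add(commit)
        queue.extend(parents.get(commit, []))

    # Walk back from the wanted tips and stop at commits the receiver knows.
    missing = []
    seen = set()
    queue = deque(wants)
    while queue:
        commit = queue.popleft()
        if commit in known or commit in seen:
            continue
        seen.add(commit)
        missing.append(commit)
        queue.extend(parents.get(commit, []))
    return missing


# Linear toy history: A <- B <- C <- D
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
print(commits_to_send(history, wants=["D"], haves=["B"]))  # ['D', 'C']
```

With a single "have", the sender can rule out a whole chain of history. A
store that holds only unrelated blobs offers no such shortcut.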

>
> Unfortunately, it seems that the remote-logic inherently assumes a 
> git-repository layout, and I could not find a way around it. What I need 
> would be a low-level remote interface that allows me to connect to a 
> remote, ask what objects are there and push or pull an arbitrary list of 
> objects/references, without any magic that tries to walk revisions or 
> find a HEAD revision etc.

This is not possible. If you want to do this, you'll have to implement
your own negotiation protocol. The library isn't going to help you here.

But I don't see how this gives you any advantage over something like
leveldb, which is a purpose-built key-value database. Why would libgit2
be a better database than that?

   cmn

Re: [libgit2] Low level remote API

From:
Stefan Buschmann
Date:
2013-01-05 @ 21:22
Hi!

Thanks for your answer.

On 05.01.2013 21:31, Carlos Martin Nieto wrote:
> You can't. The protocol git speaks negotiates on the basis of commits, 
> as that's the point of the synchronization, to push or fetch history, 
> which is made out of commits. 
I see.

> This is not possible. If you want to do this, you'll have to implement 
> your own negotiation protocol. The library isn't going to help you here.
I understand that. However, it would be nice if the library allowed 
me to do exactly that: implement my own negotiation logic on top of the 
low-level network code, which takes care of sending the pack file over, 
extracting the files in it etc. In the end, it's only a pack file that 
is sent and extracted, right? So if I could change only the negotiating 
part to decide what objects are sent, but still use the 
network/pack/unpack part of the library, I would not have to reimplement 
all the network stuff. This should be possible by strongly separating 
the git synchronization logic from the low-level network code. But I 
understand if that is not the purpose of libgit2.

> But I don't see how this gives you any advantage over something like
> leveldb, which is a purpose-built key-value database. Why would libgit2
> be a better database than that?
I am trying not to reinvent the wheel if possible, and libgit2 does 
exactly what I need for the most part: It can store strings and files 
into the database, compute a checksum over it and store the data with 
that key. This fits very nicely for storing and synchronizing data. 
Also, I thought I could reuse the network part, since git already has 
methods for sending and retrieving files over ssh and http(s). The only 
thing I would need to change would be what part of the file system is 
retrieved/sent.
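
The "compute a checksum and store the data under that key" idea can be
sketched in a few lines of Python. This is only an illustration, not libgit2
code: `git_blob_id` and the in-memory `BlobStore` class are hypothetical
names. The hashing scheme itself (SHA-1 over a `blob <size>\0` header followed
by the content) is how git derives blob ids.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the id git assigns to a blob with this content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class BlobStore:
    """Toy content-addressed store: the key is derived from the value."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        oid = git_blob_id(data)
        self._objects[oid] = data  # storing the same content twice is a no-op
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid]

    def ids(self):
        return set(self._objects)


store = BlobStore()
oid = store.put(b"hello world\n")
print(oid)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the key depends only on the content, two stores that hold the same
data agree on the key without any coordination.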

I have only just looked into LevelDB, thanks for the suggestion. 
However, I'm not sure it would be a good choice: I don't know whether it is 
meant to store binary data at all, and there seems to be no 
synchronization part.

So, most probably I will implement something on my own, which will be 
very similar to git/libgit2 storage regarding the file storing/hashing 
part and add my own network protocol using either ssh or http(s) on top 
of that.
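
One possible shape for such a custom negotiation over blobs only is a plain
set difference of object ids. The sketch below is an assumption about how this
could look, not a description of what was actually built. Both stores are
modeled as plain dicts mapping id to content, and `plan_sync` and `sync` are
hypothetical names. A real implementation would exchange the id lists over ssh
or http(s) instead of reading both dicts directly.

```python
def plan_sync(local_ids, remote_ids):
    """Decide which object ids to push and which to pull."""
    local_ids, remote_ids = set(local_ids), set(remote_ids)
    return {
        "push": sorted(local_ids - remote_ids),  # only we have these
        "pull": sorted(remote_ids - local_ids),  # only the remote has these
    }


def sync(local, remote):
    """Bring two id -> content dicts to the same set of objects."""
    plan = plan_sync(local.keys(), remote.keys())
    for oid in plan["push"]:
        remote[oid] = local[oid]
    for oid in plan["pull"]:
        local[oid] = remote[oid]
    return plan


local = {"a1": b"x", "b2": b"y"}
remote = {"b2": b"y", "c3": b"z"}
print(sync(local, remote))  # {'push': ['a1'], 'pull': ['c3']}
```

Without a history graph, each side has to list every id it holds, so the cost
of the exchange grows with the number of stored objects.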

Stefan

Re: [libgit2] Low level remote API

From:
Carlos Martin Nieto
Date:
2013-01-05 @ 23:23
Stefan Buschmann <sbusch@pixellight.org> writes:

> Hi!
>
> Thanks for your answer.
>
> On 05.01.2013 21:31, Carlos Martin Nieto wrote:
>> You can't. The protocol git speaks negotiates on the basis of commits, 
>> as that's the point of the synchronization, to push or fetch history, 
>> which is made out of commits. 
> I see.
>
>> This is not possible. If you want to do this, you'll have to implement 
>> your own negotiation protocol. The library isn't going to help you here.
> I understand that. However, it would be nice if the library allowed 
> me to do exactly that: implement my own negotiation logic on top of the 
> low-level network code, which takes care of sending the pack file over, 
> extracting the files in it etc. In the end, it's only a pack file that 
> is sent and extracted, right? So if I could change only the negotiating 
> part to decide what objects are sent, but still use the 
> network/pack/unpack part of the library, I would not have to reimplement 
> all the network stuff. This should be possible by strongly separating 
> the git synchronization logic from the low-level network code. But I 
> understand if that is not the purpose of libgit2.

So you want libgit2 to expose its abstraction layer for the network? You
can do that in the version you ship with your code, but that's not something
we can do, and you're bound to be better served by a library that does
multiplatform network operations, like libcurl.

The packing code is a public API and the push code uses that so there's
nothing to change there.

>
>> But I don't see how this gives you any advantage over something like
>> leveldb, which is a purpose-built key-value database. Why would libgit2
>> be a better database than that?
> I am trying not to reinvent the wheel if possible, and libgit2 does 
> exactly what I need for the most part: It can store strings and files 
> into the database, compute a checksum over it and store the data with 
> that key. This fits very nicely for storing and synchronizing data.

This is what key-value databases like leveldb do.
 
> Also, I thought I could reuse the network part, since git already has 
> methods for sending and retrieving files over ssh and http(s). The only 
> thing I would need to change would be what part of the file system is 
> retrieved/sent.

It doesn't. The library has an abstraction layer that tunnels
connections through raw TCP or HTTP in the way that the git protocol
needs it to be done. As you want to negotiate directly about which
objects each end has without the information that's derived from the
commits and their trees, it's unlikely it will be as useful as you seem to
think.

>
> I have only just looked into LevelDB, thanks for the suggestion. 
> However, I'm not sure it would be a good choice: I don't know whether it is 
> meant to store binary data at all, and there seems to be no 
> synchronization part.

Synchronization is generally application-specific; a
local database like leveldb won't have it, because it has no idea what
the semantics of your application are.

Are you aware of Riak? It's a key-value database with the built-in
ability to synchronize its contents across multiple nodes.

   cmn

Re: [libgit2] Low level remote API

From:
Stefan Buschmann
Date:
2013-01-05 @ 23:41
Thank you very much again!

After receiving your first reply, I already started implementing my 
synchronization layer using libcurl, which should be all I need. Thanks 
again for your explanations; they have been very helpful.

And keep up the good work on libgit2. I'm still excited about having a 
linkable git library :)

Stefan