librelist archives

Performance comparison question

From:
Stephen
Date:
2015-01-24 @ 10:39
Good afternoon list,

I'm just wondering if there's a difference in performance between
pushing/pulling backups. What I mean by this is, I have a Mini-ITX box
on my home connection which acts as my backup target (i.e. all attic
backups are sent here from my various servers on the internet). At the
moment, I've got each server out on the web backing up to an SSH
target, which points to my home server.

My theory is that, in doing this, all the deduplication is done
locally before being sent out over the web, thereby saving bandwidth +
time, etc. However, I was wondering if this is actually the case, or
if I would experience similar performance by just using my home server
to "pull" from each of my servers out on the internet. I.e. that
server SSHes into each of them and pulls the info that way. I'd much
rather do it this way if the performance is similar, as I'm much more
comfortable storing passwordless SSH auth keys in my home network than
I am keeping them on an Internet-exposed box.

I hope my question makes sense. Cheers guys!

-Stephen

Re: [attic] Performance comparison question

From:
Thiago Coutinho
Date:
2015-01-24 @ 10:43
That's a good question; I had the same thought some time ago. It
would be nice to have local deduplication in both cases.


-- 
Thiago Coutinho

"People should not be afraid of their governments. Governments should be afraid of their people."
V for Vendetta

Re: [attic] Performance comparison question

From:
heiko.helmle@horiba.com
Date:
2015-01-26 @ 12:23
> My theory is that, in doing this, all the deduplication is done
> locally before being sent out over the web, thereby saving bandwidth +
> time, etc. However, I was wondering if this is actually the case, or
> if I would experience similar performance by just using my home server
> to "pull" from each of my servers out on the internet. I.e. that
> server SSH's into each of them and pull's the info that way. I'd much
> rather do it this way if the performance is similar, as I'm much more
> comfortable storing passwordless SSH auth keys in my home network than
> I am keeping them on an Internet-exposed box.
> 
> I hope my question makes sense. Cheers guys!

It does, but without an agent on the machines that parses the filesystem
and checks the files, your performance won't be very good.

At the moment, "attic serve" is only able to manage repository data, not
perform the backup itself (unlike rsync, which can go both ways). That
means the "client" attic has to have direct access to the filesystem
being backed up.

Your workaround could be:
* Mount the remote filesystem with sshfs or something similar
* Back up that mountpoint with a local attic.

But this will probably have really bad performance, because:
* The complete directory tree has to be walked via the remote connection,
which is generally a lot slower than doing it locally.
* FUSE (which sshfs is based on), NFS and many other network
filesystems don't keep consistent inode numbers. If attic sees a
changed inode, it will assume the file has changed and read it completely
to look for changes. You might be able to avoid that by choosing your
mount options carefully.
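For reference, a minimal sketch of that sshfs workaround as a shell function on the home machine. It assumes sshfs and fusermount are installed; the function name, host, paths and archive prefix are hypothetical placeholders, not anything attic provides. The mountpoint is kept fixed so the archived paths stay the same from run to run.

```shell
#!/bin/sh
# Sketch of the sshfs "pull" workaround, run on the home backup machine.
# All host names, paths and the archive prefix are hypothetical.
pull_backup() {
    remote=$1   # e.g. root@web1.example.com
    src=$2      # directory on the remote server, e.g. /etc
    mnt=$3      # local mountpoint, e.g. /mnt/web1; keep it the same on every
                # run so archived paths don't change (changing paths would
                # probably also defeat attic's files cache)
    repo=$4     # local attic repository, e.g. /backup/web1.attic
    prefix=$5   # archive name prefix, e.g. web1

    mkdir -p "$mnt" || return 1
    # Mount the remote tree read-only; every directory walk and file read
    # now travels over the SSH connection.
    sshfs -o ro "$remote:$src" "$mnt" || return 1
    # The local attic just sees a directory and has no idea it is remote.
    attic create --stats "$repo::$prefix-$(date +%Y-%m-%d-%H-%M)" "$mnt"
    rc=$?
    # Unmount even if the backup failed.
    fusermount -u "$mnt"
    return "$rc"
}
```

Called as e.g. `pull_backup root@web1.example.com /etc /mnt/web1 /backup/web1.attic web1`. Done this way, every stat and read still goes over the network, so the caveats above still apply.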

I don't know whether, in the case of an updated file, attic sends the whole
file to attic serve or just the changes (like rsync does). I assume the
latter, because it's the right thing to do. In that case, your performance
loss from going through a remote filesystem will be even worse :)
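Rather than assuming, you could check how much new data each run actually adds. This sketch (the helper name and archive names are hypothetical) runs attic create twice with its --stats option; if attic only stores chunks the repository doesn't already have, the second run's deduplicated size should be much smaller than the first.

```shell
#!/bin/sh
# Hypothetical helper: back up the same source twice in a row and let attic
# print its statistics each time, so the two runs can be compared.
dedup_check() {
    repo=$1   # any attic repository location, local or remote
    src=$2    # directory to back up
    for run in 1 2; do
        echo "== run $run =="
        # Archive names must be unique, hence the run number and timestamp.
        attic create --stats "$repo::dedup-check-$run-$(date +%s)" "$src" || return 1
    done
}
```

For example `dedup_check /backup/test.attic /etc`, then compare the "Deduplicated size" figures attic reports for the two archives.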


Best Regards
 Heiko

Re: [attic] Performance comparison question

From:
Stephen
Date:
2015-01-27 @ 02:32
Thanks for the reply. Am I reading your message right, in that you
seem to suggest that if I had the attic server component installed on one
of my remote servers (i.e. listening), it would do the work
locally before sending just the changes, and would therefore have decent
performance? I understood the rest of your reply entirely, but just
wasn't clear on this section.

I appreciate the time you've taken :)


Re: [attic] Performance comparison question

From:
heiko.helmle@horiba.com
Date:
2015-01-27 @ 07:13
> 
> Thanks for the reply. Am I reading your message right, in that you
> seem to suggest if I had the attic server component installed on one
> of my remote servers (i.e. listening) that it would do the work
> locally, before sending just the changes, and therefore having decent
> performance? I understood the rest of your reply entirely, but just
> wasn't clear on this section.

No. I assume you use attic serve on the home machine (where the attic
repo is). attic calls attic serve automatically (over SSH) if you specify
a remote repository on the command line.

So in general you can't pull a backup with attic alone; you would need to
mount the filesystem remotely, which doesn't make sense performance-wise.

That said, I'm more a fan of pushing backups than pulling them. It feels
more secure.
For pulling backups you need remote root access (you can do some sudo
magic, but it's still root from a remote machine). Pushing backups means
local root (probably called from cron) and only a local, restricted user
account on the repo side.
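A rough sketch of the push side, assuming it runs as root from cron on each internet-facing server. The script name, repository location, source paths and schedule are hypothetical; attic itself starts attic serve on the repository host over SSH, so the home machine never needs credentials for this server.

```shell
#!/bin/sh
# Hypothetical push wrapper, run as root on each internet-facing server.
# Repository location and source paths are placeholders; override via env.
REPO=${REPO:-backup@home.example.net:/backup/$(uname -n).attic}
SRC=${SRC:-/etc /home /var/www}

push_backup() {
    # $SRC is deliberately unquoted so it splits into separate paths.
    attic create --stats "$REPO::$(uname -n)-$(date +%Y-%m-%d-%H-%M)" $SRC
}

# Only run the backup when called as "attic-push.sh run", e.g. from cron.
if [ "${1:-}" = run ]; then
    push_backup
fi
```

A matching root crontab entry could be `30 3 * * * /usr/local/sbin/attic-push.sh run` (path and time are placeholders).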

Best Regards
 Heiko

Re: [attic] Performance comparison question

From:
Stephen
Date:
2015-01-27 @ 10:21
Ah, thanks for clarifying. I do have attic on the home machine;
however, I've been using it as a direct SSHFS target (because home port
22 is already in use by a different machine, and I couldn't find a way
to specify a separate port while still using the attic software; if you
know a way to do this, by the way, it would be much appreciated).

However, I hadn't got to the point of thinking about filesystem
permissions yet; your point about sudo totally makes sense, and I
would've run into this eventually. Cheers for pointing this out :)
But if you are aware of a way to specify a separate port when using an
attic target (as opposed to being stuck with SSH + port), that'd be
fantastic! Thanks,

-Stephen

Re: [attic] Performance comparison question

From:
heiko.helmle@horiba.com
Date:
2015-01-27 @ 11:13
> 
> However, I hadn't got to the point of thinking about filesystem
> permissions yet - your point about sudo totally makes sense, and I
> would've encountered this eventually. Cheers for pointing this out :)
> But if you are aware of a way to specify a separate port when using
> attic target (as opposed to being stuck with SSH + port), that'd be
> fantastic! Thanks,


Normally you don't need a separate port. Just run your attic repo as a
dedicated user that only has rights to its own $HOME, then set up the
public keys of the clients connecting to it so that they can only run
attic serve:

The public keys in $HOME/.ssh/authorized_keys should start like this:
command="attic serve",no-agent-forwarding,no-port-forwarding,no-pty,no-X11-forwarding ssh-rsa AAAA...<rest of key>

In most cases this is enough, and you don't need a second SSH port
running. The account your remote clients have keys for can then only run
"attic serve" on the target server.
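A small sketch for the repository host, run as the dedicated backup user, that puts the restrictions quoted above in front of a client's public key. The helper names and file names are hypothetical; the option string is the one given above.

```shell
#!/bin/sh
# Hypothetical helpers for the repository host: prefix a client's public key
# with the forced command and restrictions, then append it to authorized_keys.
restrict_key() {
    # $1: a single public key line, e.g. "ssh-rsa AAAA... root@web1"
    printf 'command="attic serve",no-agent-forwarding,no-port-forwarding,no-pty,no-X11-forwarding %s\n' "$1"
}

add_backup_key() {
    pubkey_file=$1                              # the client's *.pub file
    auth_keys=${2:-$HOME/.ssh/authorized_keys}  # target file (default shown)
    ssh_dir=$(dirname "$auth_keys")
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    # Expects exactly one key in the file; $(cat) strips the trailing newline.
    restrict_key "$(cat "$pubkey_file")" >> "$auth_keys" || return 1
    chmod 600 "$auth_keys"
}
```

For example `add_backup_key web1_id_rsa.pub`, which appends to ~/.ssh/authorized_keys by default.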

Re: [attic] Performance comparison question

From:
Stephen
Date:
2015-01-27 @ 12:15
Sorry, what I meant was that my incoming port 22 at home points to a
different server. So, if I were to run "attic create users@host:/backup/location",
it would attempt to go to port 22, which goes to an embedded
device that can only run on port 22 (and needs to be remotely accessed
on port 22, etc.; long story). So I have to run attic by forwarding
external port 2222 to port 22 on the internal backup machine.

So, keeping that in mind, is there a way to still use the "attic
create user@host:/backup" format while specifying the remote
listening port, rather than having to do an "attic create
ssh://users@host:2222/backup/location" etc.? Thanks


Re: [attic] Performance comparison question

From:
heiko.helmle@horiba.com
Date:
2015-01-27 @ 12:18
> Sorry, what I meant was my incoming 22 at home points to a different
> server. So, if I were to run "attic create users@host:/backup/location
> etc, it would attempt to go to port 22, which goes to an embedded
> device that can only run on port 22 (and need to be remotely accessed
> on port 22, etc - long story). So, I have to run attic by forwarding
> port 2222 externally -> internal backup:22. 
> 

You can specify a host alias (including the port to use) in the client's
~/.ssh/config and use that alias as the host in the repository location.

See "man 5 ssh_config" for details.
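For example, a sketch that appends such an alias on the client. The alias, host name and user (homebackup, home.example.net, backup) are placeholders; Port 2222 matches the forwarding described in the quoted message.

```shell
#!/bin/sh
# Hypothetical example: append a host alias to the client's ssh config so that
# plain "alias:/path" repository locations go to port 2222.
add_ssh_alias() {
    cfg=${1:-$HOME/.ssh/config}
    mkdir -p "$(dirname "$cfg")" || return 1
    # The heredoc terminator must stay at column 0.
    cat >> "$cfg" <<'EOF'

Host homebackup
    HostName home.example.net
    Port 2222
    User backup
EOF
    chmod 600 "$cfg"
}
```

With that in place, something like `attic create homebackup:/backup/location::archive /src` should reach port 2222, because attic passes the host part of the location to ssh, which resolves the alias. ssh uses the first value it finds for each option, so if ~/.ssh/config already has a catch-all `Host *` block that sets Port, put the alias above it instead of appending. Recent OpenSSH versions can print the resolved settings with `ssh -G homebackup`.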

Re: [attic] Performance comparison question

From:
Stephen
Date:
2015-01-27 @ 12:28
I'll check that out. Much appreciated!


Sv: [attic] Performance comparison question

From:
Petter Gunnerud
Date:
2015-01-29 @ 11:18
I use the following syntax:
attic create -s -v ssh://user@baksrv:42233/backup/path/srcname.attic::bakname-`date +%Y-%m-%d-%H-%M` /src/folder
What took me a while to figure out was that you need to specify ssh://;
otherwise the port is ignored.
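The same location can be assembled by a small helper. This is a hypothetical convenience, not part of attic; the user, host, port and path are the placeholders from the command above.

```shell
#!/bin/sh
# Hypothetical helper: build an explicit ssh:// location (so the port is
# honoured) plus an archive name stamped with the current date and time.
repo_location() {
    user=$1 host=$2 port=$3 repo_path=$4 prefix=$5   # repo_path starts with "/"
    printf 'ssh://%s@%s:%s%s::%s-%s\n' \
        "$user" "$host" "$port" "$repo_path" "$prefix" "$(date +%Y-%m-%d-%H-%M)"
}
```

Usage: `attic create -s -v "$(repo_location user baksrv 42233 /backup/path/srcname.attic bakname)" /src/folder`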


Re: Sv: [attic] Performance comparison question

From:
Stephen
Date:
2015-01-29 @ 12:45
Yeah, this is precisely what I'm doing at the moment, but I was just
wondering whether there is any downside to it, i.e. does the fact that
we're specifying an ssh filesystem decrease performance compared with
attic's native communication/transport? Cheers :)

Sv: Sv: [attic] Performance comparison question

From:
Petter Gunnerud
Date:
2015-01-29 @ 13:59
In my case the backup server is only available through ssh. If specifying
ssh makes any difference at all, it would only make it faster, as it would
say "don't waste time trying something else".
As I understand attic, you are not talking to a remote filesystem but to an
agent running on the remote server. If you want the local attic program to
work directly on a remote filesystem, you'll need to mount that remote
filesystem locally. As long as a filesystem is mounted locally, attic
doesn't care what kind of mount it is.
