librelist archives

Compression News

From:
Thomas Waldmann
Date:
2015-03-28 @ 07:51
A while ago, I added gzip (levels other than 6) and lzma compression to the
code in the merge-all branch (mostly because both are easy to get from the
Python stdlib, and zlib level 1 is even reasonably fast).

(see https://github.com/attic/merge/tree/merge-all for that branch)
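
For illustration only, here is a minimal sketch of what "easy to get from the
Python stdlib" means in practice. It is not the code from the merge-all branch;
the function name and the sample data are made up for this example.

```python
import lzma
import zlib


def compare_stdlib_compressors(data: bytes) -> dict:
    """Return the compressed size in bytes for a few stdlib compressor settings."""
    sizes = {}
    for level in (1, 6, 9):
        # zlib levels range from 1 (fastest) to 9 (smallest output)
        sizes[f"zlib level {level}"] = len(zlib.compress(data, level))
    # for lzma, the "preset" argument plays the role of the compression level
    sizes["lzma level 1"] = len(lzma.compress(data, preset=1))
    return sizes


if __name__ == "__main__":
    # hypothetical sample data, far more compressible than a real backup
    sample = b"attic backup chunk " * 50_000
    for name, size in compare_stdlib_compressors(sample).items():
        print(f"{name}: {len(sample)} -> {size} bytes")
```

Real backup data (like the MP3-heavy set in the results further down) will
compress far less than this repetitive sample.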

But somehow, I wasn't really satisfied with compression: it's often slow,
maxing out one CPU core while leaving all other cores idle.

So I searched for better ways and found blosc, see http://www.blosc.org/ .

And they don't overpromise: it really is fast and was easy to
integrate into attic.
Internally it uses multithreaded workers, so using multiple cores (for
compression) is possible now.
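
To illustrate the general idea of spreading compression over several cores,
here is a small stdlib-only sketch. It is not how blosc works internally and
not what attic does; it only relies on CPython's zlib module releasing the GIL
while it compresses, so independent chunks can be processed by a thread pool.
The function name and the chunking are assumptions made for this example.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks_parallel(chunks, level=1, workers=None):
    """Compress each chunk independently on a thread pool, preserving order."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input chunks
        return list(pool.map(lambda chunk: zlib.compress(chunk, level), chunks))


if __name__ == "__main__":
    # hypothetical input: eight 1 MiB chunks of random (incompressible) data
    chunks = [os.urandom(1 << 20) for _ in range(8)]
    compressed = compress_chunks_parallel(chunks)
    print(sum(len(c) for c in compressed), "bytes after compression")
```

Each chunk is compressed on its own, so every chunk can also be decompressed
on its own with zlib.decompress().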

Here are some results:

compression        encryption     time [m:s]  orig    comprd  dedupd
------------------------------------------------------------------------
00 no compression  -              1:05        4.31GB  4.31GB  3.52GB
29 *lz4 level 9    -              1:05        4.31GB  3.79GB  3.17GB
29 *lz4 level 9    aes-gcm/ghash  1:02        4.31GB  3.79GB  3.17GB
69 *blosc zlib l 9 -              1:52        4.31GB  3.75GB  3.13GB
------------------------------------------------------------------------
01 zlib level 1    -              2:28        4.31GB  3.76GB  3.13GB
06 zlib l6 [~0.14] -              2:38        4.31GB  3.74GB  3.12GB
09 zlib level 9    -              3:06        4.31GB  3.74GB  3.12GB
11 lzma level 1    -              11:01       4.31GB  3.75GB  3.13GB
21 *lz4 level 1    -              1:09        4.31GB  3.79GB  3.17GB
25 *lz4 level 5    -              1:10        4.31GB  3.79GB  3.17GB
31 *lz4hc level 1  -              1:23        4.31GB  3.78GB  3.15GB
39 *lz4hc level 9  -              1:29        4.31GB  3.77GB  3.15GB
49 *blosclz lvl 9  -              1:12        4.31GB  3.84GB  3.21GB
59 *snappy level 9 -              1:15        4.31GB  4.27GB  3.49GB
------------------------------------------------------------------------

Notes:
- Timing tolerances might be a few secs due to caching / multitasking.
- Most impressive results are in first section.
- * = new blosc-library based stuff, multi-threaded, highly optimized.
- Laptop, SSD src, SSD dst, Intel i5-4200U, 8GB RAM, AES-NI, PCLMUL.
- Data wasn't very compressible, likely due to MP3 audio books.

The 3rd entry is especially impressive: it basically means that the
influence of compression and encryption was negligible (a few secs, inside
timing tolerance) in this setup.

attic 0.14 only offers zlib level 6 compression (by default), so it looks
like we just got a roughly 2.4x speedup compared to that (2:38 vs. 1:05)
while still getting reasonable compression from lz4 level 9. \o/
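
The speedup can be checked from the table with a little arithmetic. The helper
names below are made up for this sketch; the numbers are the "zlib l6" and
"lz4 level 9" rows from the results above.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:s' timing from the table into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster the candidate run was than the baseline run."""
    return to_seconds(baseline) / to_seconds(candidate)


if __name__ == "__main__":
    # zlib level 6 (2:38) vs. lz4 level 9 (1:05)
    print(round(speedup("2:38", "1:05"), 2))  # 2.43
    # compressed / original size for lz4 level 9
    print(round(3.79 / 4.31, 3))              # 0.879
```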

Cheers,

Thomas

Re: [attic] Compression News

From:
Yuri D'Elia
Date:
2015-03-30 @ 10:18
On 03/28/2015 08:51 AM, Thomas Waldmann wrote:
> Here are some results:
> 
> compression        encryption     time [m:s]  orig    comprd  dedupd
> ------------------------------------------------------------------------
> 00 no compression  -              1:05        4.31GB  4.31GB  3.52GB
> 29 *lz4 level 9    -              1:05        4.31GB  3.79GB  3.17GB
> 29 *lz4 level 9    aes-gcm/ghash  1:02        4.31GB  3.79GB  3.17GB
> 69 *blosc zlib l 9 -              1:52        4.31GB  3.75GB  3.13GB

It would be nice to try using lz4 directly
(https://pypi.python.org/pypi/lz4/) which depends only on liblz4 which
is also multi-core friendly and much more likely to be already installed.

lz4 is known to be one of the fastest reasonable compressors around. I
actually migrated my projects from lzo to it a year ago.

Re: [attic] Compression News

From:
Thomas Waldmann
Date:
2015-03-31 @ 02:10
Hi Yuri,

> It would be nice to try using lz4 directly
> (https://pypi.python.org/pypi/lz4/) which depends only on liblz4
> which is also multi-core friendly

What does "multi-core friendly" mean and where is that documented?

> and much more likely to be already installed.

Yeah, that might be true. I never heard of blosc before.

But I liked its speed, that it gives some different compressor options, and
that it automatically dispatches to internal worker threads.

Cheers, Thomas