I have added the second set of benchmarks that demonstrate the effect of the different pre-processing options on compression ratio and speed. The results are available here: http://moinakg.github.io/pcompress/results2.html
All of these results have Global Dedupe enabled. These results also compare the effect of various compression algorithms on two completely different datasets. One is a set of VMDK files and another purely textual data. Some observations below:
- In virtually all the cases using ‘-L’ and ‘-P’ switches results in the smallest file. Only in case of LZMA these options marginally deteriorate the compression ratio indicating that the reduction of redundancy is hurting LZMA. To identify which of the two hurts more I repeated the command (see the terminology in results page) with lzmaMt algo and only option ‘-L’ at compression level 6 on the CentOS vmdk tarball. The resultant size came to: 472314917. The size got from running with only option ‘-P’ is available in the results page: 469153825. Thus it is the LZP preprocessing that unsettles LZMA the most along with segment size of 64MB. Delta2 actually helps. Running the command with segment size of 256MB we see the following results – ‘-L’ and ‘-P’: 467946789, ‘-P’ only: 466076733, ‘-L’ only: . Once again Delta2 helps. At higher compression however, Delta2 is marginally worse as well.
- There is some interesting behavior with respect to the PPMD algorithm. The time graph (red line) shows a relative spike for the CentOS graphs as compared to the Linux source tarball graphs. PPMD is an algorithm primarily suited for textual data so using it on non-textual data provides good compression but takes more time.
- Both Libbsc and PPMD are especially good on the textual Linux source tar and are comparable to LZMA results while only taking a fraction of the time taken by LZMA. Especially Libbsc really rocks by producing better compression and being much faster as compared to LZMA. However i have seen decompression time with Libbsc to be quite high as compared to PPMD.
- Updated Compression Benchmarks (moinakg.wordpress.com)