I spent the last few days running benchmarks to see how Pcompress, in its current incarnation, fares in terms of compression performance and effectiveness. This was needed given the extent of changes that have happened. I have tested only compression performance here; I still need to check how decompression and crypto perform.
My test system is my laptop, so the absolute numbers do not mean much; it is the relative numbers comparing different utilities and algorithms that are interesting. My laptop config: Intel Core i5 M430 @ 2.27 GHz, 8GB RAM, 5400 RPM HDD (Hitachi HTS725050A9A364). It is an HP Pavilion Dv6 laptop dating back to Oct 2011. My desktop is a 2006-model Athlon 64 X2 with 4GB RAM, so I decided to test on the more capable laptop. I used 4 different datasets: the Linux 3.6 RC2 tarball, the enwik9 corpus, the Glibc 2.12.1 tarball, and a tarball containing CentOS 6.0 and 6.2 VMDK images. Except for the VMDK images these are mostly different kinds of text, since this is a traditional compression test.
Except where mentioned, all the Pcompress tests used a 256MB compression chunk size. Files bigger than 256MB were compressed using multiple threads. Only the Glibc tarball, at 102MB, used a single Pcompress thread: Pcompress detects when the segment size exceeds the file size and shrinks it so the whole file fits in a single segment. The actual commands that were executed are listed in the leftmost graph column. The magenta bars show original and compressed sizes in megabytes while the pink lines denote time taken in seconds.
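The segment-sizing behavior can be sketched roughly like this (a minimal illustration with hypothetical function names, not Pcompress's actual code):

```python
def effective_segment_size(file_size, segment_size):
    """If the configured segment is larger than the file, shrink it
    so the whole file fits in a single segment (one thread)."""
    return min(segment_size, file_size)

def segment_count(file_size, segment_size):
    """Number of segments, and hence potential compression threads."""
    seg = effective_segment_size(file_size, segment_size)
    return -(-file_size // seg)  # ceiling division

MB = 1024 * 1024
print(segment_count(102 * MB, 256 * MB))   # Glibc tarball case -> 1
print(segment_count(1024 * MB, 256 * MB))  # 1GB file -> 4 parallel segments
```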
The Pcompress version used was 0.9.1, standard build. The standard utilities were used as distributed with Fedora 16. Some may have reservations that the same compiler flags were not used to rebuild everything; in that case take these numbers with a grain of salt. I will rerun the standard utility tests with custom builds when I get time. I have also left out other parallel tools like Pigz and Pbzip2; I will post those comparisons shortly for the Linux tarball.
Finally, except where mentioned, Pcompress was always run with chunk-level Deduplication (-D) and LZ-Prediction (-L) enabled. Identity deduplication always adds value. The VMDK image dataset includes a comparison of compression ratios with and without Deduplication.
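For readers unfamiliar with LZ-Prediction: LZP predicts upcoming data from the bytes that last followed the same context, and then only needs to record whether the prediction was right. Here is a toy single-byte-prediction variant in Python to show the idea (real LZP predicts whole match runs, and Pcompress's implementation differs):

```python
def _ctx_hash(c1, c2, c3):
    # Hash an order-3 byte context into a 16-bit table index.
    return ((c1 << 8) ^ (c2 << 4) ^ c3) & 0xFFFF

def lzp_encode(data):
    table = [0] * 65536            # context -> last byte seen after it
    out = bytearray(data[:3])      # first 3 bytes pass through verbatim
    i = 3
    while i < len(data):
        flags, literals = 0, bytearray()
        for bit in range(8):       # one flag byte covers 8 positions
            if i >= len(data):
                break
            h = _ctx_hash(data[i - 3], data[i - 2], data[i - 1])
            if table[h] == data[i]:
                flags |= 1 << bit  # correct prediction: one bit, no byte
            else:
                table[h] = data[i]
                literals.append(data[i])
            i += 1
        out.append(flags)
        out += literals
    return bytes(out)

def lzp_decode(blob, n):
    # n = original length, carried out of band in this sketch
    table = [0] * 65536
    out = bytearray(blob[:3])
    j = 3
    while len(out) < n:
        flags = blob[j]; j += 1
        for bit in range(8):
            if len(out) >= n:
                break
            h = _ctx_hash(out[-3], out[-2], out[-1])
            if flags & (1 << bit):
                out.append(table[h])          # prediction was right
            else:
                table[h] = blob[j]
                out.append(blob[j]); j += 1
    return bytes(out)

text = b"the quick quick brown fox jumps over the lazy dog. " * 50
packed = lzp_encode(text)
assert lzp_decode(packed, len(text)) == text
print(len(text), len(packed))   # repetitive text shrinks considerably
```

Because the encoder and decoder update identical tables, hash collisions only cost mispredictions, never correctness.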
Linux 3.6 Git clone Tarball
Of course the Lzma algorithm implementations give the best compression ratio, but they also take more time. As usual the Xz utility performs quite badly, and I keep wondering why anyone would use it in practice. I wonder if the XZ-Utils folks are considering performance aspects at all. The parallelized Pxz is a lot better, though I missed running Pxz in extreme mode in this test. I will fill this gap shortly.
What is most interesting, however, is the performance and compression ratio achieved by Pcompress. Even with extreme settings, parallel Pcompress using lzmaMt (multi-threaded lzma) is faster than non-extreme parallel Pxz. Part of the speed comes from the low-level optimizations (like SSE) I have done to the lzma implementation. The original 7-Zip utility performs quite badly with extreme settings (-mx9 -m0fb=128 -m0d=256M) since it is not fully parallelized. The PPMd algorithm in Pcompress is by far the rock star in this test: it gives the best compression ratio while still having awesome speed. PPMd works extremely well on text. The LZFX algorithm is blindingly fast with a decent compression ratio.
Pcompress was run with Deduplication (-D), both with and without Delta Compression (-E). Delta compression at this scale does not seem to add much value.
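To illustrate what delta compression buys when it does help: if two chunks are similar, encoding one as a difference against the other leaves mostly zero bytes, which the back-end compressor squeezes very well. A toy XOR-based sketch (Pcompress's actual similarity detection and delta encoding are more sophisticated):

```python
import zlib

def xor_delta(ref, buf):
    """Toy delta for equal-length buffers: XOR against a reference.
    XOR is its own inverse, so applying the delta again restores buf."""
    return bytes(a ^ b for a, b in zip(ref, buf))

ref = bytes(range(256)) * 64            # a 16KB "reference" chunk
similar = bytearray(ref)
similar[1000:1010] = b"0123456789"      # a nearly identical chunk

plain = len(zlib.compress(bytes(similar)))
delta = len(zlib.compress(xor_delta(ref, bytes(similar))))
print(plain, delta)   # the mostly-zero delta compresses far smaller

# reversing the delta restores the original chunk
assert xor_delta(ref, xor_delta(ref, bytes(similar))) == bytes(similar)
```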
Glibc 2.12.1 Release Tarball
This test mostly shows single-threaded behavior. Since the glibc tarball is about 102MB in size, it fits into a single chunk given the Pcompress compression chunk size of 256MB. So for the Pcompress tests only lzmaMt is multithreaded, and that actually uses multiple threads for the match-finder. As you will notice, extreme-level Pcompress with lzmaMt is significantly faster than both Pxz and 7-Zip with extreme settings (-mx9 -m0fb=128 -m0d=256M).
Once again Delta compression at this scale did not add much value.
The Enwik9 Corpus
This test uses enwik9 (the first billion bytes of an English Wikipedia dump) from http://mattmahoney.net/dc/textdata.html. This is XML text data. Interestingly, with a 256MB chunk size Libbsc (BWT with Quantized Local Frequency Coding and dual LZ-Prediction) gives the best results on XML data, beating even Lzma and PPMd.
So I wanted to see how much better libbsc would do if I put the entire enwik9 file into a single 1GB chunk, and ran 'pcompress -D -c libbsc -l 14 -L -s 1g' to check. Indeed this produces a smaller 156.3MB (163870946 bytes) file, taking 233.8 seconds. For reference, the older enwik8 corpus compresses to 20785844 bytes using the same command. This puts us at the 15th position on Matt Mahoney's Large Text Compression Benchmark, slightly above the original BSC. I have not made any changes to libbsc; Pcompress just links to it as an external dependency. However, I have copied the LZP portion and tweaked it a bit. I believe this result can be improved by tweaking the rolling hash function to get actual dedupe out of XML data. At the moment deduplication (-D) has no impact on enwik9 as it is not finding any duplicates. You can build Pcompress from source configured with the '--enable-debug-stats' option to check.
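To show where the rolling hash sits in the dedupe pipeline: it scans the data byte by byte and declares a chunk boundary whenever the low bits of the hash hit a fixed value, so boundaries depend on content rather than position, and identical content produces identical chunks. A toy content-defined chunker (the hash and parameters here are simplified stand-ins, not what Pcompress actually uses):

```python
import hashlib

def cdc_chunks(data, mask=0x3FF, min_len=256):
    """Toy content-defined chunking: cut whenever the low bits of a
    cheap rolling-style hash are all zero (average chunk ~ mask+1 bytes).
    Pcompress's rabin-style chunker is considerably more careful."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF   # old bytes shift out of the hash
        if i - start >= min_len and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    chunks.append(data[start:])           # whatever remains is the last chunk
    return [c for c in chunks if c]

# dedupe then simply hashes each chunk and drops repeats
data = b"<page><title>x</title><text>lorem ipsum dolor</text></page>\n" * 300
chunks = cdc_chunks(data)
unique = {hashlib.sha256(c).digest() for c in chunks}
print(len(chunks), len(unique))   # repeats in the input may collapse
```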
Tarball of CentOS 6 and CentOS 6.2 VMDK Images
For this test I did not try all the variations, and specifically left out Lzma. For such large multi-gigabyte datasets there are very few practical use cases where one will spend hours running slow high-compression algorithms for lower ROI. The objective rather is to balance compression effectiveness with practical speed.
The distinct benefits of parallelism are clearly visible in these results. With multi-gigabyte data, using multiple threads far outstrips the traditional utilities. Deduplication also ensures that we get smaller compressed files: compare the standalone Bzip2 result with the Pcompress Bzip2 result with Deduplication. Since this is in-chunk dedupe and not global dedupe, the effect of using larger chunks is also shown.
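The in-chunk versus global distinction is easy to demonstrate: duplicates are only eliminated when both copies land in the same compression chunk, so larger chunks catch more of them. A small sketch (hypothetical helper, fixed-size blocks for simplicity; Pcompress uses content-defined blocks):

```python
import hashlib

def in_chunk_dedup_ratio(blocks, blocks_per_chunk):
    """Fraction of blocks eliminated when dedupe only looks within
    each chunk of `blocks_per_chunk` consecutive blocks."""
    removed = 0
    for s in range(0, len(blocks), blocks_per_chunk):
        seen = set()
        for b in blocks[s:s + blocks_per_chunk]:
            d = hashlib.sha256(b).digest()
            if d in seen:
                removed += 1
            seen.add(d)
    return removed / len(blocks)

# a stream where the same 4 distinct blocks repeat 4 times each
blocks = [bytes([i % 4]) * 4096 for i in range(16)]
print(in_chunk_dedup_ratio(blocks, 16))  # one big chunk sees all repeats: 0.75
print(in_chunk_dedup_ratio(blocks, 2))   # tiny chunks miss them all: 0.0
```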
These results are quite interesting and put Pcompress at the top of the heap of general data compression utilities, alongside Pigz, Pbzip2 etc. In addition it provides stronger data integrity verification, using SKEIN 256/512 and SHA 256/512 checksums, while still maintaining performance.