Pcompress 2.1 released with fixes and performance enhancements

I just uploaded a new release of Pcompress with a load of fixes and performance tweaks. You can see the download and some details of the changes here: https://code.google.com/p/pcompress/downloads/detail?name=pcompress-2.1.tar.bz2&can=2&q=

A couple of the key changes are improved Global Dedupe accuracy and the ability to set the dedupe block hash independently of the data verification hash. From a conservative viewpoint the default block hash is set to the proven SHA256. This can however be changed via an environment variable called ‘PCOMPRESS_CHUNK_HASH_GLOBAL’. SKEIN is one of the alternatives supported for this. SKEIN is a solid NIST SHA3 finalist that has received a good amount of cryptanalysis with no practical weakness found, and it is also faster than SHA256. These choices give a massive margin of safety against random hash collisions and unexpected data corruption, considering that other commercial and open-source dedupe offerings tend to use weaker options like SHA1 (a collision attack has been found, see below), Tiger24, or even the non-cryptographic Murmur3-128, all for the sake of performance. Admittedly, some of them did not have too many choices at the time development started on those products. In addition, even with a collision attack it is still impractical to craft a working exploit against a dedupe storage engine that uses SHA1, like, say, Data Domain, and corrupt stored data.
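For illustration, a run that selects SKEIN as the global dedupe block hash could look like the line below. Treat this as a sketch: the ‘-G’ global dedupe flag and the exact value string ‘SKEIN256’ are my assumptions here, so check the utility's help output for the precise spellings in your release.

PCOMPRESS_CHUNK_HASH_GLOBAL=SKEIN256 ./pcompress -G -c lz4 -l1 -s200m file.tar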

The Segmented Global Dedupe algorithm used for scalability now gives around 95% of the data reduction efficiency of simple full chunk index based dedupe.

Pcompress 1.2 Released

Pcompress 1.2 is now available and can be grabbed from here: https://code.google.com/p/pcompress/downloads/list. This is a major release containing a large number of bug fixes and improvements. There are performance and stability improvements, code cleanup, resolution of corner cases, etc. The biggest new additions in this release are the new Delta2 Encoding and support for the Keccak message digest. Keccak has been announced as the NIST SHA3 standard secure hash. The SIMD (SSE) optimized variant of Keccak runs faster than SHA256 on x86 platforms. However it is still slower than SKEIN, so SKEIN remains the default hash algorithm for data integrity verification. In addition, Deduplication is now significantly faster.

Delta2 Encoding, as I had mentioned in a previous post, probes for embedded tables of numeric sequences in the data and encodes them by collapsing each arithmetic sequence into its parameters: starting value, increment/decrement, and number of terms. This generally provides benefits across different kinds of data and can be combined with LZP preprocessing to enable the final compression algorithm to achieve a compression ratio beyond what it can normally achieve. The encoding runs very fast and still manages to detect a good number of numeric sequences if they are present in the data.
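To make the idea concrete, here is a minimal C sketch of the probing step. This is not the actual Pcompress code: it assumes a single fixed 8-byte little-endian stride and stride-aligned scanning, whereas the real implementation probes multiple integer widths and emits a proper encoded stream with headers.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define STRIDE    8   /* assume 64-bit values; the real probe tries several widths */
#define MIN_TERMS 4   /* shorter runs are not worth the per-run header cost */

static uint64_t get_le64(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof(v));  /* assumes a little-endian host, for brevity */
    return v;
}

/* Report arithmetic runs; a real encoder would emit (start, delta, count)
 * into the output and copy non-run bytes through verbatim. */
static void delta2_probe(const unsigned char *buf, size_t len) {
    size_t i = 0;
    while (i + 2 * STRIDE <= len) {
        uint64_t a = get_le64(buf + i);
        uint64_t delta = get_le64(buf + i + STRIDE) - a;  /* wraps harmlessly */
        size_t terms = 2, j = i + 2 * STRIDE;
        while (j + STRIDE <= len &&
               get_le64(buf + j) - get_le64(buf + j - STRIDE) == delta) {
            terms++;
            j += STRIDE;
        }
        if (terms >= MIN_TERMS) {
            printf("run at %zu: start=%llu delta=%lld terms=%zu\n",
                   i, (unsigned long long)a, (long long)delta, terms);
            i = j;        /* skip past the collapsed run */
        } else {
            i += STRIDE;  /* no run here; slide forward */
        }
    }
}

Decoding simply regenerates start + k * delta for each term, which is why the decode throughput figures further down are so high.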

I have extended the statistics mode to display additional data including throughput figures. Here is an output from a compression run of the silesia corpus test data set:

## CPU: Core i5 430M
## RAM: 8GB
## Compiler: Gcc 4.7.2
##
## Compression
##
./pcompress -D -c lz4 -l1 -L -P -s200m silesia.tar 
Scaling to 1 thread
Checksum computed at 241.321 MB/s
Original size: 206612480, blknum: 46913
Number of maxlen blocks: 0
Total Hashtable bucket collisions: 17225
Merge count: 46750
Deduped size: 206197375, blknum: 242, delta_calls: 0, delta_fails: 0
Chunking speed 112.189 MB/s, Overall Dedupe speed 100.880 MB/s
LZP: Insize: 206196371, Outsize: 192127556
LZP: Processed at 55.839 MB/s
DELTA2: srclen: 192127556, dstlen: 191899643
DELTA2: header overhead: 50800
DELTA2: Processed at 382.530 MB/s
Chunk compression speed 207.908 MB/s
##
## Decompression
##
./pcompress -d silesia.tar.pz silesia.tar.1 
Scaling to 4 threads
Chunk decompression speed 383.488 MB/s
DELTA2: Decoded at 3030.724 MB/s
Checksum computed at 244.235 MB/s

ls -l silesia.tar.pz 
-rw-rw-r--. 1 testuser testuser 99115899 Jan  5 21:36 silesia.tar.pz

Note that these are single-threaded performance figures; the entire file is being compressed in a single chunk. The default checksum is SKEIN. Look at the decoding speed of the Delta2 implementation: it is close to a 3 GB/s rate. Next let's check the performance of SSE-optimized Keccak:

./pcompress -D -c lz4 -l1 -P -S KECCAK256 -s200m silesia.tar 
Scaling to 1 thread
Checksum computed at 143.904 MB/s
Original size: 206612480, blknum: 46913
Number of maxlen blocks: 0
Total Hashtable bucket collisions: 17225
Merge count: 46750
Deduped size: 206197375, blknum: 242, delta_calls: 0, delta_fails: 0
Chunking speed 111.601 MB/s, Overall Dedupe speed 100.352 MB/s
DELTA2: srclen: 206196371, dstlen: 201217172
DELTA2: header overhead: 570448
DELTA2: Processed at 360.383 MB/s
Chunk compression speed 213.226 MB/s

ls -l silesia.tar.pz 
-rw-rw-r--. 1 testuser testuser 100204566 Jan  5 21:34 silesia.tar.pz

This time I left out LZP to show the reduction using Delta2 alone. As you can see, combining LZP and Delta2 gives the greatest reduction. Also see how much slower Keccak is compared to SKEIN. Note that I am using an optimized 64-bit assembly implementation of SKEIN that does not use SSE, whereas Keccak does use SSE.

Next let's have a look at a dataset that has lots of embedded numeric data. I used Global Topographic Elevation map data from the USGS:

#
# Compression
#
./pcompress -c lz4 -l1 -P -s100m e020n40.tar 
Scaling to 1 thread
Checksum computed at 237.584 MB/s
DELTA2: srclen: 86599680, dstlen: 43707440
DELTA2: header overhead: 2024320
DELTA2: Processed at 279.484 MB/s
Chunk compression speed 211.112 MB/s

ls -l e020n40.tar.pz 
-rw-rw-r--. 1 testuser testuser 35360062 Jan  5 21:46 e020n40.tar.pz
#
# Decompression
#
./pcompress -d e020n40.tar.pz e020n40.tar.1 
Scaling to 4 threads
Chunk decompression speed 622.394 MB/s
DELTA2: Decoded at 1971.282 MB/s
Checksum computed at 246.015 MB/s

This time I left out dedupe and LZP. The Delta2 benefits are much larger since there is a lot of numeric data. Also, because there are many more Delta spans in the encoded dataset, the decoding speed is lower; however, it still decodes at 1.9 GB/s.

As can be seen, Delta2 performance is on par with LZ4, so it can be used to improve LZ4 results with very little overhead.

Pcompress updated to 1.1

I just put out a new release that implements proper HMAC computation as described in my previous post. I added a bunch of new tests that exercise error conditions and error handling for out-of-range parameters and corrupted archives. These tests caused problems to show up that had gone unnoticed earlier, mostly improper error handling situations. As you can see, testing may be boring but it is absolutely necessary.
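For reference, the construction is the standard HMAC scheme from RFC 2104. The sketch below shows it with SHA-256 via OpenSSL's low-level API purely for illustration; Pcompress applies the same scheme with its own configured digests.

#include <openssl/sha.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 64  /* SHA-256 block size in bytes */

/* HMAC(K, m) = H((K' ^ opad) || H((K' ^ ipad) || m)) per RFC 2104 */
void hmac_sha256(const uint8_t *key, size_t klen,
                 const uint8_t *msg, size_t mlen, uint8_t out[32]) {
    uint8_t k[BLOCK] = {0}, ipad[BLOCK], opad[BLOCK], inner[32];
    SHA256_CTX ctx;

    if (klen > BLOCK)
        SHA256(key, klen, k);   /* over-long keys are first hashed */
    else
        memcpy(k, key, klen);   /* short keys are zero-padded (k is zeroed) */

    for (int i = 0; i < BLOCK; i++) {
        ipad[i] = k[i] ^ 0x36;
        opad[i] = k[i] ^ 0x5c;
    }

    SHA256_Init(&ctx);                    /* inner hash */
    SHA256_Update(&ctx, ipad, BLOCK);
    SHA256_Update(&ctx, msg, mlen);
    SHA256_Final(inner, &ctx);

    SHA256_Init(&ctx);                    /* outer hash */
    SHA256_Update(&ctx, opad, BLOCK);
    SHA256_Update(&ctx, inner, 32);
    SHA256_Final(out, &ctx);
}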

I also optimized the use of Zlib by switching to raw Zlib streams. A normal Zlib stream carries a header and computes an Adler32 checksum among other things. However, Pcompress adds its own headers and computes cryptographic checksums like SHA and SKEIN, which are light years ahead of the simple Adler32 in terms of data integrity verification. So the Adler32 computation overhead was unnecessary.
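In zlib API terms the switch is small: passing a negative windowBits to deflateInit2() selects a raw deflate stream with no zlib header and no Adler32 trailer. Here is a minimal single-shot sketch, simplified from what a real archiver would do:

#include <string.h>
#include <zlib.h>

/* Compress in[0..inlen) into out using a raw deflate stream.
 * Returns 0 on success and stores the compressed size in *outlen. */
int raw_deflate(const unsigned char *in, size_t inlen,
                unsigned char *out, size_t *outlen, int level) {
    z_stream zs;
    memset(&zs, 0, sizeof(zs));
    /* windowBits = -15: raw deflate, no header, no Adler32 trailer */
    if (deflateInit2(&zs, level, Z_DEFLATED, -15, 8,
                     Z_DEFAULT_STRATEGY) != Z_OK)
        return -1;
    zs.next_in = (Bytef *)in;
    zs.avail_in = (uInt)inlen;
    zs.next_out = out;
    zs.avail_out = (uInt)*outlen;
    if (deflate(&zs, Z_FINISH) != Z_STREAM_END) {  /* output buffer too small? */
        deflateEnd(&zs);
        return -1;
    }
    *outlen = zs.total_out;
    deflateEnd(&zs);
    return 0;
}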

I also tweaked the rolling hash used for deduplication while doing the chunking analysis. This produces better results and improves identity dedupe reduction. Finally, I added header checksums in plain non-crypto mode.
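For readers unfamiliar with how a rolling hash drives chunking, the toy C sketch below shows the general content-defined chunking technique with a Karp-Rabin style hash. The actual hash function, window size, and cut criteria in Pcompress differ; this only illustrates the mechanism.

#include <stdint.h>
#include <stdio.h>

#define WINDOW 16                    /* bytes covered by the rolling hash */
#define BASE   257ULL
#define MASK   ((1ULL << 12) - 1)    /* cut pattern: ~4 KB average chunks */

/* Slide a WINDOW-byte hash across buf; declare a chunk boundary whenever
 * the low bits of the hash match a fixed pattern. */
static void chunk(const unsigned char *buf, size_t len) {
    uint64_t h = 0, pow_w = 1;
    size_t last = 0;

    for (int k = 0; k < WINDOW; k++)
        pow_w *= BASE;               /* BASE^WINDOW (mod 2^64) */

    for (size_t i = 0; i < len; i++) {
        h = h * BASE + buf[i];                 /* add the new byte */
        if (i >= WINDOW)
            h -= pow_w * buf[i - WINDOW];      /* drop the oldest byte */
        if (i + 1 - last >= WINDOW && (h & MASK) == MASK) {
            printf("chunk: [%zu, %zu)\n", last, i + 1);
            last = i + 1;
        }
    }
    if (last < len)
        printf("chunk: [%zu, %zu)\n", last, len);
}

Because the hash depends only on the last WINDOW bytes, an insertion or deletion shifts chunk boundaries only locally, which is what lets identical regions dedupe even when the surrounding data moves.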