I just uploaded a new release of Pcompress with a load of fixes and performance tweaks. You can see the download and some details of the changes here: https://code.google.com/p/pcompress/downloads/detail?name=pcompress-2.1.tar.bz2&can=2&q=
A couple of the key things are improvement in Global Dedupe accuracy and ability to set the dedupe block hash independent of the data verification hash. From a conservative viewpoint the default block hash is set to the proven SHA256. This however can be changed via an environment variable called ‘PCOMPRESS_CHUNK_HASH_GLOBAL’. SKEIN is one of the alternatives supported for this. SKEIN is a solid NIST SHA3 finalist with good amount of cryptanalysis done and no practical weakness found. It is also faster than SHA256. These choices give a massive margin of safety against random hash collisions and unexpected data corruptions considering that other commercial and open-source dedupe offerings tend to use weaker options like SHA1(Collision attack found, see below), Tiger24 or even the non-cryptographic Murmur3-128! All this for the sake of performance. Albeit some of them did not have too many choices at the time development started on those products. In addition even with a collision attack it is still impractical to get a working exploit for a dedupe storage engine that uses SHA1 like say Data Domain, and corrupt stored data.
The Segmented Global Dedupe algorithm used for scalability now gives around 95% of the data reduction efficiency of simple full chunk index based dedupe.
- Chinese researchers crack major U.S. government algorithm used in digital signatures (rinadhopeshwarkar59.wordpress.com)