However the hash is good for many use cases as compared to the moth-eaten MD5 and even SHA1. The users section needs to have more entries. The Gluster project has been looking at bit rot detection which requires computing fast cryptographic hashes for files/fragments of files. BLAKE2 is ideal for the purpose and I submitted this bugzilla entry a while back. SHA256 is just too slow for that task. While Intel’s optimized SHA256 code is fast, BLAKE2-256 is a lot faster.
Pcompress provides 2 cryptographic hashes from the SHA2 family, namely SHA512-256 and SHA512. The core SHA512 block function is the implementation from Intel: http://edc.intel.com/Download.aspx?id=6548
Intel’s implementation provides two heavily optimized versions. One for SSE4 and one for AVX. Intel only provided the core compression function. I had to add supporting code from other sources to get a complete hash implementation including padding and IVs for 512-bit and the 256-bit truncation. Since I am bothered only with 64-bit CPUs using SHA512-256 is the most optimal choice. SHA512 is much faster that native SHA256 on 64-bit CPUs. I did some benchmarks using the SMHasher suite to check how this implementation fares. SMHasher is primarily designed to test various qualities of non-cryptographic hash functions, however it’s benchmarking implementation is good and I used just that part. I had to modify SMhasher to add all the various SHA2 implementations and a tweak to support 512-bit hashes.
I only tested the SSE4 version since i do not currently have an AVX capable CPU. SMHasher shows bytes/cycle and I just did the reciprocal to get cycles/byte. All this was done on my laptop which has a Core i5 430M, 2.27 GHz CPU (non sandy bridge). The OpenSSL version used is 1.0.1c from Linux Mint 14. I also used GCC 4.7.2 with -O3 and -ftree-vectorize flags. The results are shown below.
Clearly Intel’s SSE4 optimized version is superior than the rest on x64 platforms with OpenSSL not too far behind. The AVX version should give even better results. The other result shows cycles/hash for tiny 31-byte buffers.