I decided to put out another release of Pcompress, primarily because of some bug fixes that went in. One of them fixes a build issue on Debian 6 and non-SSE4 processors, and the others fix a couple of crashes on invalid input.
In addition to fixing bugs, I rewrote the Min-Heap code and took out all the Python-derived parts. It is now much simpler and much faster than before. While doing this rewrite I found and fixed a problem with the earlier Min-Heap approach. As a result, Delta Differencing is now faster and more accurate than before.
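For readers who have not worked with one, here is roughly what a plain array-based Min-Heap looks like in C. This is only a minimal sketch and not the actual Pcompress code: the `min_heap` type and the `heap_init`, `heap_push` and `heap_pop` names are made up for illustration, and the fixed-capacity, caller-provided storage is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-capacity min-heap; not Pcompress's actual structure. */
typedef struct {
    uint64_t *data; /* caller-provided storage */
    size_t len;     /* number of elements currently stored */
    size_t cap;     /* capacity of data[] */
} min_heap;

static void heap_init(min_heap *h, uint64_t *storage, size_t cap)
{
    h->data = storage;
    h->len = 0;
    h->cap = cap;
}

static void swap_u64(uint64_t *a, uint64_t *b)
{
    uint64_t t = *a;
    *a = *b;
    *b = t;
}

/* Insert a value, sifting it up until its parent is no larger. */
static void heap_push(min_heap *h, uint64_t v)
{
    assert(h->len < h->cap);
    size_t i = h->len++;
    h->data[i] = v;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->data[parent] <= h->data[i])
            break;
        swap_u64(&h->data[parent], &h->data[i]);
        i = parent;
    }
}

/* Remove and return the smallest value, sifting the moved element down. */
static uint64_t heap_pop(min_heap *h)
{
    assert(h->len > 0);
    uint64_t top = h->data[0];
    h->data[0] = h->data[--h->len];
    size_t i = 0;
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, smallest = i;
        if (left < h->len && h->data[left] < h->data[smallest])
            smallest = left;
        if (right < h->len && h->data[right] < h->data[smallest])
            smallest = right;
        if (smallest == i)
            break;
        swap_u64(&h->data[i], &h->data[smallest]);
        i = smallest;
    }
    return top;
}
```

Pushing a handful of values and then popping repeatedly returns them in ascending order, which is the property that makes the structure handy for picking out the smallest values from a stream.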
I also improved the scalable Segmented Global Dedupe, and it now works with greater than 95% efficiency in finding duplicate chunks. It appears that using larger segments for larger dedupe block sizes results in better accuracy. If you think about it, this is also logical, since one would want faster processing with smaller indexes when using larger and larger dedupe blocks. Correspondingly larger segments enable just that.
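To make the index-size argument concrete, here is a back-of-the-envelope sketch in C. It assumes, purely for illustration, that the index holds a fixed number of entries per segment; the `segment_count` and `index_entries` helpers and the `entries_per_segment` parameter are hypothetical and do not correspond to anything in the Pcompress source.

```c
#include <assert.h>
#include <stdint.h>

/* Number of segments needed to cover data_bytes, rounding up. */
static uint64_t segment_count(uint64_t data_bytes, uint64_t segment_bytes)
{
    assert(segment_bytes > 0);
    return (data_bytes + segment_bytes - 1) / segment_bytes;
}

/*
 * Rough index entry count, ASSUMING a fixed number of index entries
 * per segment (an illustrative assumption, not Pcompress's real layout).
 */
static uint64_t index_entries(uint64_t data_bytes, uint64_t segment_bytes,
                              uint64_t entries_per_segment)
{
    return segment_count(data_bytes, segment_bytes) * entries_per_segment;
}
```

Under that assumption, quadrupling the segment size cuts the number of index entries for the same dataset to a quarter, so there is less to search per lookup.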