I just completed adding support for AES encryption in Pcompress – whew! On the surface it is simple, just encrypt and decrypt using a password provided by the user. However there are a myriad of security pieces around this that make actual implementations lengthy and involved. First and foremost password based encryption requires a symmetric encryption key to be generated from the user password. This step is fraught with problems. We need to make dictionary attacks reasonably hard even if the user provides a weak password. There is a NIST standard for this called PBKDF2 – Password Based Key Derivation Function 2. However given modern distributed computing techniques, botnets, GPGPUs etc it is still possible to do brute force dictionary attacks practically. The online cloud back service Tarsnap provides a unique algorithm called Scrypt that attempts to make this hard enough to be impractical due to high resource requirements for the key derivation process.
Using Scrypt is just one step in the process. One also needs to generate a random salt value as input to the key derivation function. One ideally needs to get high-quality random bytes from the operating system’s entropy pool. This can be done using the RAND_bytes() function in OpenSSL. However entropy may not always be available immediately. So if OpenSSL returns a not ready status then we need to use other alternatives. So second good quality option is to use “/dev/urandom“. This will be available on Linux and other Unixes. If for some reason this fails as well then we need another lower quality but reasonable fallback than the simple pseudorandom rand() function in the standard library. I looked around and picked up ideas from the approach used by PHP. The PHP idea also uses a Mersenne Twister which I will add in the future. All this results in a unique key being generated every time Pcompress is invoked even when using the same password. Decryption of course recovers the key used to encrypt the file.
Inputting passwords is another piece. OpenSSL has some old UI compat functions to do this but they seem to be artifacts retained only for backward compatibility. So I decided to roll my own based on what is being done by Python’s getpass module. Passwords can also be input via a file. This needs to be a writable temp file since Pcompress zeroes out the file after reading the password from it. The next thing is to generate an nonce value. Tarsnap uses sequential nonces that appear to start at 0. I wanted to use sequential nonces but starting at something other than 0. The nonce is 64-bit of which the top 32 bits can be selected at runtime and bottom 32 bits can be zero, incrementing by chunk number. I am using the salt value, monotonic clock value passed through PBKDF2 to get a 256-bit quantity. This I then pass through CRC64 to get a 64-bit quantity from which the top 32 bits are extracted.
Finally the salt and starting nonce are stored in the compressed file in clear. This means that the file format has changed a bit and is now version 4. Encryption is performed on the compressed data as compression takes out redundancies and makes for stronger encryption. It is also faster. In addition temporary values on the stack and in memory are cleared out with zeroes at every stage.
However after all these I do not yet have a per-chunk HMAC to verify the encrypted data has not been tampered with (In addition to the plaintext message digest for integrity). I plan to add it in the next few days.