id0-rsa

Breaking PDF Passwords

Fast hashes are easy to brute force, let's apply this to an actual scheme

Discuss The Problem

Intro

So far you may have encountered a couple of problems (such as here and here) that deal with brute forcing hashed passwords or even hashed passwords with salt. You may have thought to yourself "sure these problems demonstrate that digest algorithms like the SHA and MD family are not suited for storing passwords securely, but they seem contrived". Unfortunately you would be wrong, in fact very widely deployed formats and systems also use these algorithms to secure passwords, which makes them perfect candidates for the types of attacks we mounted in previous hashed password attacks. Let's take a look at one, namely PDF.

PDF Cliff Notes

PDF gives the user the option to encrypt a file with a password, which follows the fairly common procedure of deriving a symmetric key from the password to encrypt the file. The symmetric key derivation goes as follows:

Using a set padding string (28BF4E5E4E758A4164004E56FFFA01082E2E00B6D0683E802F0CA9FE6453697A) pad out the password to 32 bytes. If the password is longer than 32 bytes, truncate the password to 32 bytes.
Concatenate the following values and pass the result to the MD5 hash function:
- The 32 byte padded password
- The 32 byte owner hash
- The 4 byte permissions
- The 16 byte document ID
Do the following 50 times: Take the output of the previous MD5 call and pass it as input to MD5. Take the output of the final MD5 call. Truncate it down to be the same length as the value specified by key length. This is the symmetric key.

When a user enters a password to decrypt a file two things happen. First, an algorithm computes a value known as the user hash, and then (if the derived user hash matches the user hash in the document), the symmetric key is derived from the password and the PDF is decrypted. This might sound fairly standard but we have two advantages. First, the hash derivation uses MD5 and RC4, which are both very fast meaning we can try a lot of passwords in a short amount of time. Second, since PDFs are by nature offline, once we have a PDF there is no throttling mechanism to limit how many passwords we try, hence we are bottlenecked only by our available computing power. Anyway, back to the user hash, here is how to derive it:

Derive the symmetric key from the user password (as described above).
Concatenate the following values and pass the result to the MD5 hash function:
- A 32 byte padding string (as defined above)
- The 16 byte document ID
Encrypt the output of the MD5 call via RC4 with the symmetric key from step 1.
Do the following for i = 1 to 19:
- Create a new RC4 key by XORing every byte of the symmetric key from step 1 with i.
- Take the output of the previous RC4 call and encrypt it under the new RC4 key.
Append 16 bytes of arbitrary padding to the output of the last RC4 call. This is the user hash. (It’s not clear why the user hash is padded at all, since the comparison to validate the user password throws out the padding bytes of the user hash and just looks at the first 16 bytes. For the our purposes just ignore this padding step.)

Problem Details

I'll spare you the pain of parsing PDF metadata (it's not terribly fun) and instead just give you the metadata of the PDF here:

user hash: 0xfe97c8762da0f0201eb463df6e181288
key length: 128
revision: 3
permissions: 0xfcffffff
owner hash: 0x79e64c5fd1d84a4ae2bc07e82636b34b5969c4fe6bcce36886db40dece66f0e2
document id: 0xbfb6e1530b109ae5d10cf52562e8bd3a

The password is once again one of the passwords in the rockyou password list. Your job is to recover the password based on the metadata given above. Submit the answer in ASCII. Feel free to verify your answer against this PDF from which the metadata above was parsed. Kinda scary how easy that was huh?

Test Vector

Given the metadata above, the string "password" should yield symmetric key 0xe7ddc8badf2d9135a038861f44e7d692 and a user hash 0xdd14c7f46c7648d9881105740a001d07.

Extra Credit

If you had to improve this scheme, what would you change?