Intro to Hashing

Hashing is a vital concept in cryptography.


Discuss The Problem

A hash function takes any amount of input and gives a fixed-size output. Hash functions have many uses in computing, but here we're going to focus on cryptographic hash functions. In addition to giving a fixed-size output for any input, cryptographic hash functions have a couple of other important qualities. One is that they're difficult to reverse: given the output of a hash function, it's hard to find an input that would result in that output. Another is that they're collision resistant, which means that it's hard to find two different inputs that result in the same output.

Because of these qualities, cryptographic hash functions are useful for things like validating passwords. Instead of storing passwords, authentication databases can store values that are derived from passwords using hash functions. This means that anybody who recovers the database will have a hard time figuring out the users' passwords (because the function is difficult to reverse). They're also useful for verifying the integrity of data. A large amount of data may be difficult to inspect for tampering, but the fixed-size output of a hash function is short enough to compare, and since the functions are collision resistant, it would be hard to tamper with the data in a way that results in the same hash value.

Hash values are also used in many places on this site, especially for submitting solutions. Instead of submitting hundreds or thousands of characters of plain text to prove that you have it, we generally have you run the data through a hash function and submit the output value. Let's go over how to hash data in Linux and in Python. The hash functions we'll use are md5 and sha256.
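As a quick illustration of the fixed-size property described above, here is a minimal sketch using Python's standard hashlib module (covered in the Python section below). The three inputs and the helper name sha256_hex are made up for this example and are not part of the tutorial.

```python
import hashlib

# Illustrative inputs of very different lengths (assumed for this sketch).
inputs = [b"", b"a", b"a" * 1_000_000]


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


digests = [sha256_hex(data) for data in inputs]

for data, digest in zip(inputs, digests):
    # Each digest is 64 hex characters (256 bits), whatever the input size.
    print(len(data), len(digest), digest[:16] + "...")
```

The digests all have the same length, yet each one differs from the others, which is what makes a hash value a compact stand-in for the data.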

Linux

In Linux, there are programs called md5sum and sha256sum which work well for our purposes. Note that on macOS (formerly OS X) these commands are named slightly differently, namely md5 and shasum -a 256 respectively. Let's say we want to hash the string "Hello, world!".
$ printf "Hello, world!" | md5sum
6cd3556deb0da54bca060b4c39479839  -
Notice the use of the command printf as opposed to echo. This prevents a newline from being appended to our data, so the string is piped to md5sum exactly as is. The output line shows 6cd3556deb0da54bca060b4c39479839, which is the hexadecimal string representation of the 128-bit hash output. The dash after the hex value indicates that the input was stdin; it isn't part of the output of the hash function. Similarly, to compute the sha256 hash of the value:
$ printf "Hello, world!" | sha256sum
315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3  -
Notice that the hash output is twice as long, as sha256 outputs 256 bits and md5 outputs 128 bits.

Python

Luckily, Python comes with implementations of common hash functions in a module called hashlib. Using the same string as before, "Hello, world!", here's how to compute the md5 and sha256 hashes in Python using the ipython shell. The input is written as a bytes literal (b'...') because in Python 3 these functions accept bytes, not str.
In [1]: import hashlib

In [2]: hashlib.md5(b'Hello, world!').hexdigest()
Out[2]: '6cd3556deb0da54bca060b4c39479839'

In [3]: hashlib.sha256(b'Hello, world!').hexdigest()
Out[3]: '315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3'
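The two points made in the Linux section (a trailing newline changes the hash, and md5 and sha256 produce 128 and 256 bits) can also be checked from Python. This is a small sketch rather than part of the original tutorial; the variable names are chosen for illustration only.

```python
import hashlib

message = b"Hello, world!"

# echo would append a newline to the data; printf does not.
message_with_newline = message + b"\n"

md5_plain = hashlib.md5(message).hexdigest()
md5_newline = hashlib.md5(message_with_newline).hexdigest()

print(md5_plain)    # matches the md5sum output shown in the Linux section
print(md5_newline)  # a completely different value

# digest_size is in bytes, so multiply by 8 to get bits.
md5_bits = hashlib.md5().digest_size * 8
sha256_bits = hashlib.sha256().digest_size * 8
print(md5_bits, sha256_bits)
```

A single extra byte in the input yields an unrelated digest, which is why it matters whether the data you hash ends in a newline.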

Review

To prove you understood this tutorial, show that you can use these hash functions. Hash the string id0-rsa.pub with sha256, then hash the resulting lowercase hex string (the text of the hex digits, not the raw bytes they represent) with md5, and submit the lowercase hex result.

Test Vector

Take the string testing. Its sha256 hex hash value is cf80cd8aed482d5d1527d7dc72fceff84e6326592848447d2dc0b0e87dfc9a90. Hashing this hex string with md5 gives the result ea52a276e74e31d07c6f82af2f3c192a, so this would be the solution.
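One way to chain the two hashes in Python is sketched below. It assumes that "hashing the hex string" means hashing the ASCII text of the lowercase hex digits, with no trailing newline; the function name sha256_then_md5 is hypothetical. If that reading is right, running it on the string testing should reproduce the test vector above.

```python
import hashlib


def sha256_then_md5(text: str) -> str:
    """Hash text with sha256, then hash the resulting hex string with md5.

    Assumption: the md5 input is the ASCII text of the lowercase sha256
    hex digest, not the raw 32 digest bytes.
    """
    sha_hex = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return hashlib.md5(sha_hex.encode("ascii")).hexdigest()


# Check against the test vector before trying the real challenge string.
print(sha256_then_md5("testing"))
```

Checking against the test vector first is a cheap way to catch mistakes such as a stray newline or hashing the raw digest bytes instead of the hex text.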