
Section 2.11 Lab: Hash it Out

A hash is a one-way cryptographic function that produces a set of characters that is, ideally, unique to a given message. In a perfect world, given a hash you should not be able to determine what the original message was, but given a hash and the original message you can check that the hash matches the message. Before we dive into the uses of a hash, let’s try to further understand it by looking at a simple and consequently poor hashing algorithm.

Subsection 2.11.1 Anagram Hash

Note 2.11.1.

The following algorithm is so poor that it may be a stretch even to call it a hashing algorithm. That being said, it is being used as a tool to explain what hashes are.
Let’s assume we wanted to hash the message "Hello from Karl" so that we can have a string of characters that uniquely represents that phrase. One way to do it would be to strip all the spaces and punctuation from the message, make everything lowercase, and then arrange all the letters alphabetically. "Hello from Karl" becomes "aefhklllmoorr". You can think of it like saying, "There is one ’a’ in the message, one ’e’ in the message, one ’f’ in the message, one ’h’ in the message, one ’k’ in the message, three ’l’s in the message…" Now our hash, "aefhklllmoorr", can be used to identify the phrase.
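The steps just described can be sketched in a few lines of Python. This is only an illustration of the anagram idea; the function name anagram_hash is made up for this example and is not part of the lab container.

```python
def anagram_hash(message: str) -> str:
    """Toy 'hash': keep only the letters, lowercase them, and sort them."""
    letters = [ch.lower() for ch in message if ch.isalpha()]
    return "".join(sorted(letters))


if __name__ == "__main__":
    print(anagram_hash("Hello from Karl"))  # aefhklllmoorr
```

Notice that the spaces disappear along with any punctuation, because only alphabetic characters are kept.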
Now assume Karl wants to send us a message but he can’t trust the person delivering the message. He could use the untrusted party to send us the message and then put the hash someplace public like on a website. We could use the hash to check that the message we received is the one Karl wrote, and if anyone else got the hash they would not be able to discern the message because a hash is a one-way function. "aefhklllmoorr" reveals very little about the message, but it can be used to check its accuracy.
Hopefully this is beginning to show the power of hashes. Now let’s examine another very common use case and find out exactly why this is a terrible algorithm.
Assume you run a website where a user uses a password to log in. You want to make sure users are using their password when they log in, but you do not want to store the password on your website. This is quite common. If your website was breached, you don’t want to leak a bunch of people’s passwords. What do you do? What you could do is store a hash of their password, hash the password when they try to log in, and compare the hashes. For example, if our password was "password", using our basic hash algorithm the hash would be "adoprssw". We could store "adoprssw" in our database, use it for comparison during login, and if someone were to ever steal the data in our database they wouldn’t know that the original password is "password". This may prevent an attacker from exploiting the fact that many people use the same password on multiple sites.
The problem is that there are many things that hash to "adoprssw" including "wordpass", "drowsaps", or even the hash we’re storing: "adoprssw". When multiple messages have the same hash it is referred to as a collision and this particular algorithm is useless because it generates so many of them.
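To see the collision problem in action, here is a small sketch of the login check described above. The names STORED_HASH and login_ok are hypothetical and exist only for this illustration.

```python
def anagram_hash(message: str) -> str:
    """Toy 'hash': keep only the letters, lowercase them, and sort them."""
    return "".join(sorted(ch.lower() for ch in message if ch.isalpha()))


# What the website would keep in its database instead of the password.
STORED_HASH = anagram_hash("password")  # "adoprssw"


def login_ok(attempt: str) -> bool:
    """Accept the attempt if its hash matches the stored hash."""
    return anagram_hash(attempt) == STORED_HASH


if __name__ == "__main__":
    for attempt in ["password", "wordpass", "drowsaps", "adoprssw", "letmein"]:
        print(attempt, login_ok(attempt))
```

Every attempt except "letmein" is accepted, even though only one of them is the real password. An attacker who steals "adoprssw" from the database can simply type it in as the password.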

Checkpoint 2.11.2.

What would the anagram hash of "AlwaysDancing" be?
Now that we understand what hashes do and to some extent how they are possible, let’s look at a much more useful hash function.

Subsection 2.11.2 MD5

Many of the labs in this book need to run in an isolated environment inside a Docker container. A Docker container image is a standalone, executable package of software that includes everything needed to run the application: code, runtime, system tools, system libraries and settings.
For this lab and for all of those that require Docker containers, you can choose to use a GitHub codespace or to use a local Docker installation. There are advantages to each. Each GitHub codespace you create is hosted by GitHub in a Docker container, running on a virtual machine. GitHub codespaces are easy to set up and use, but you may be charged for the time you spend using them if you use too much time, so you need to remember to shut them down or delete them when you have finished with them. Local Docker installations are cost-free, but they require space on your hard drive as well as more setup and configuration.
Please choose one of the following options to run this lab. Complete that setup and then skip down to Subsection 2.11.3 to continue with the lab.

Subsubsection 2.11.2.1 MD5 in a GitHub Codespace

Go to github.com/pearcej/security-hash. Then:
  1. Fork this repository into your own GitHub account.
  2. Navigate to your repository on GitHub.
  3. Click the green Code button and select Codespaces.
  4. Click "Create codespace on main".
  5. Wait patiently for the codespace to be created. It can take a bit longer than you might expect, but once it is created, you will be able to run the lab in the codespace.
Be sure to either stop or delete this codespace when you are done by clicking the "Stop" button or the "Delete" button in the Codespaces tab of your repository.
Next, jump down to follow the lab directions in Subsection 2.11.3.

Subsubsection 2.11.2.2 MD5 in a Local Docker Container

For a local installation, you need Docker and a terminal. Please follow the directions at docs.docker.com/get-docker/ for installing Docker. For Windows you can use the Windows Terminal app (www.microsoft.com/en-us/p/windows-terminal/9n0dx20hk701) and in macOS you can use the preinstalled Terminal app. Boxes show the commands to be typed into the terminal with typical output where possible. Your prompt (the part shown before the command) may differ depending on your OS.
Start a bash shell on a custom Linux container by typing docker run -it ryantolboom/hash into the terminal. You should see your command followed by output similar to the following:
ryan@R90VJ3MK:/windir/c/Users/rxt1077/it230/docs$ docker run -it ryantolboom/hash
root@vm-name:/#
Note 2.11.3.
Here we are using the Docker run command interactively (-it) as this container runs bash by default.
Note 2.11.4.
Notice the new prompt in the terminal showing that we are root on this container.

Subsection 2.11.3 MD5 Hash

MD5 is a message-digest algorithm that produces significantly better hashes than our Anagram algorithm. Most Linux distributions include a simple utility for creating an MD5 hash based on a file’s contents. This command is named md5sum. Typically this is used to detect if a file has been tampered with. A website may provide links to download software as well as an MD5 hash of the files so that you know what you’ve downloaded is correct. Similarly a security system may keep md5sums (MD5 hashes) of certain critical files to determine if they have been tampered with by malware. Let’s practice taking the md5sum of the /etc/passwd file. Note that root@vm-name:/# is the prompt in the terminal, indicating that we are running as root on a container named vm-name. Yours will appear differently.
Type the following command into the terminal: md5sum /etc/passwd. You should see your command followed by output similar to the following:
root@vm-name:/# md5sum /etc/passwd
9911b793a6ca29ad14ab9cb40671c5d7  /etc/passwd

Note 2.11.5.

The first line above is the prompt followed by the command md5sum /etc/passwd. The second line is the output of the command, which is the MD5 hash of the contents of the file /etc/passwd. The output is in two parts separated by whitespace. The first part of the output line, namely 9911b793a6ca29ad14ab9cb40671c5d7, is the MD5 hash; the second part, namely /etc/passwd, is the file name.
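For readers curious about what a tool like md5sum is doing, the following is a rough Python sketch that hashes a file’s bytes with the standard hashlib module and prints a line in the same hash-then-filename layout. It is an approximation for illustration, not the source of md5sum, and the function names md5_of_file and md5sum_line are made up for this example.

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 digest of a file's contents as a hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5sum_line(path: str) -> str:
    """Format the result as the hash, two spaces, then the file name."""
    return f"{md5_of_file(path)}  {path}"


if __name__ == "__main__":
    # Usage (assumed file name): python3 md5_sketch.py /etc/passwd
    for path in sys.argv[1:]:
        print(md5sum_line(path))
```

Because the hash depends only on the bytes in the file, changing even a single character of the file produces a completely different hash, which is what makes it useful for detecting tampering.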
Now we’ll make a file with your first name in it and store it in /tmp/name.txt by typing echo "<your_name>" >> /tmp/name.txt which will appear as follows:
root@vm-name:/# echo "<your_name>" >> /tmp/name.txt

Note 2.11.6.

Be sure to substitute your actual first name for <your_name>, and keep the quotation marks.
The cat command in Linux is used to display the contents of files, concatenate multiple files, and create new files, so you can see the contents of the new file by running: cat /tmp/name.txt.

Question 2.11.7.

What is the md5sum of your first name which is stored in /tmp/name.txt? (You can run the command md5sum /tmp/name.txt to find out.)
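One detail is worth keeping in mind here: echo writes a newline character after your name, so the file holds your name plus a newline, and md5sum hashes all of those bytes. The short Python sketch below illustrates the difference using the placeholder name "Karl"; the helper md5_hex is a made-up name for this example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of some bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


name = "Karl"  # placeholder; substitute your own first name

# What you get by hashing only the letters of the name.
print(md5_hex(name.encode("utf-8")))

# What md5sum reports for a file created with: echo "Karl" >> /tmp/name.txt
# (assuming the file did not exist before), because echo adds "\n".
print(md5_hex((name + "\n").encode("utf-8")))
```

The two digests are different. For the same reason, running the echo command a second time appends another line to the file and changes its md5sum again.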
For our final lab activity, let’s take a look at some of the weaknesses of hashes.
Hash Cracking
Passwords in a Linux system are hashed and stored in the /etc/shadow file. Let’s print out the contents of that file to see how it looks. Type cat /etc/shadow, and you should see your command followed by output similar to the following:
root@vm-name:/# cat /etc/shadow
root:*:19219:0:99999:7:::
daemon:*:19219:0:99999:7:::
bin:*:19219:0:99999:7:::
sys:*:19219:0:99999:7:::
sync:*:19219:0:99999:7:::
games:*:19219:0:99999:7:::
man:*:19219:0:99999:7:::
lp:*:19219:0:99999:7:::
mail:*:19219:0:99999:7:::
news:*:19219:0:99999:7:::
uucp:*:19219:0:99999:7:::
proxy:*:19219:0:99999:7:::
www-data:*:19219:0:99999:7:::
backup:*:19219:0:99999:7:::
list:*:19219:0:99999:7:::
irc:*:19219:0:99999:7:::
gnats:*:19219:0:99999:7:::
nobody:*:19219:0:99999:7:::
_apt:*:19219:0:99999:7:::
karl:$y$j9T$oR2ZofMTuH3dpEGbw6c/y.$TwfvHgCl4sIp0b28YTepJ3YVvl/3UyWKeLCmDV1tAd9:19255:0:99999:7:::

Note 2.11.8.

As you can see here the karl user has a long hash immediately after their username.
One of the problems with hashes is that if people choose simple passwords, they can be easily cracked by a program that takes a wordlist of common passwords, generates their hashes, and then checks to see if any of those hashes is the same as the stored one. While a hash may be a one-way function, it is still subject to this type of attack. We will use a program called John the Ripper (www.openwall.com/john/) to do exactly that.
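The idea behind a wordlist attack can be sketched in a few lines of Python. This is a simplified illustration and not how John the Ripper is implemented: it uses unsalted SHA-256 from hashlib as a stand-in hash, and the names toy_hash and dictionary_attack, as well as the tiny wordlist, are assumptions made for the example.

```python
import hashlib


def toy_hash(password: str) -> str:
    """Stand-in hash for the sketch: unsalted SHA-256 as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, wordlist):
    """Hash each candidate word and return the first one that matches."""
    for candidate in wordlist:
        if toy_hash(candidate) == target_hash:
            return candidate
    return None  # the password was not in the wordlist


if __name__ == "__main__":
    wordlist = ["123456", "password", "letmein", "qwerty"]
    stolen = toy_hash("letmein")  # pretend this came from a breached database
    print(dictionary_attack(stolen, wordlist))  # letmein
    print(dictionary_attack(toy_hash("c0rrect-h0rse"), wordlist))  # None
```

The attacker never reverses the hash. They only guess, hash the guess, and compare, which is why a password that appears in a common wordlist offers so little protection.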
John the Ripper (aka John) is already installed on this container along with a simple wordlist. To use John the Ripper, you need to provide it with a password file containing hashed passwords and then run the command john <passwordfile> in your terminal. You can also customize its behavior with options like --wordlist to specify a custom wordlist for cracking. We will tell it to use the default wordlist to try and determine what the password is that matches karl’s hash in /etc/shadow by running the command john --format=crypt --wordlist=/usr/share/john/password.lst /etc/shadow. The --format=crypt option tells John the Ripper to use the crypt format, which is the format used by the hashes in the shadow file. The --wordlist option tells John to use the specified wordlist file, which is a list of common passwords. The last argument is the file containing the hashes, in this case /etc/shadow. You should see your command followed by output similar to the following:
root@vm-name:/# john --format=crypt --wordlist=/usr/share/john/password.lst /etc/shadow
Loaded 1 password hash (crypt, generic crypt(3) [?/64])
Press 'q' or Ctrl-C to abort, almost any other key for status
<karl's password>             (karl)
1g 0:00:00:01 100% 0.6211g/s 178.8p/s 178.8c/s 178.8C/s lacrosse..pumpkin
Use the "--show" option to display all of the cracked passwords reliably
Session completed

Note 2.11.9.

Once John has cracked a password it will not show it if you run it again. To show the passwords that have already been cracked you must use the --show option with the file: john --show /etc/shadow.

Checkpoint 2.11.10.

What is Karl’s password?
Given that the password is in the included common password wordlist, /usr/share/john/password.lst, you will quickly find that John the Ripper figures out karl’s password. John the Ripper can also run incrementally through all the possible character combinations, but it takes much longer. To help make these types of attacks more difficult, every hash in /etc/shadow is built off of a random number. This number is called a salt and is stored with the hash. Because the salt is stored next to the hash, a cracker does not have to guess it, but two users with the same password end up with different hashes, so the cracker must hash every word in the wordlist separately for each salt instead of reusing one computed hash against every account. Salts also defeat tables of hashes computed in advance. For unsalted hashes, hash crackers may use rainbow tables (en.wikipedia.org/wiki/Rainbow_table) in which a very large number of hashes have already been computed. These tables may take up terabytes of disk space, but can make cracking even complicated unsalted hashes much simpler.
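The effect of a salt can be illustrated with a short Python sketch. It assumes PBKDF2 with SHA-256 from the standard library as a stand-in for yescrypt, which is not available in hashlib; the names hash_password and verify_password and the iteration count are choices made for this example only.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest); a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare the results."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    salt_a, hash_a = hash_password("hunter2")
    salt_b, hash_b = hash_password("hunter2")
    print(hash_a == hash_b)                            # False: different salts
    print(verify_password("hunter2", salt_a, hash_a))  # True
    print(verify_password("hunter3", salt_a, hash_a))  # False
```

Since the same password hashes differently under each salt, a table of hashes computed ahead of time without the salt is of no use, and work spent cracking one account does not carry over to another.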
Let’s use a custom utility named crypt to show that we have the actual password. This utility is already installed on your container. We will start by printing out just the line in /etc/shadow that has karl’s info. The Linux grep command is a powerful search tool. (The name is an acronym from Globally search for a Regular Expression and Print matches.) We will use the grep command to limit our output to lines that have karl in them by typing cat /etc/shadow | grep karl. You should see your command followed by output similar to the following:
root@vm-name:/# cat /etc/shadow | grep karl
karl:$y$j9T$oR2ZofMTuH3dpEGbw6c/y.$TwfvHgCl4sIp0b28YTepJ3YVvl/3UyWKeLCmDV1tAd9:19255:0:99999:7:::
  • Colons, :, are used as separators in the shadow file. The first part of the shadow line is the username, karl.
  • The next part of the shadow line, immediately following the first colon, is the hash information, namely $y$j9T$oR2ZofMTuH3dpEGbw6c/y.$TwfvHgCl4sIp0b28YTepJ3YVvl/3UyWKeLCmDV1tAd9.
  • The characters in between the first pair of dollar signs, $, indicate the version of the hashing algorithm being used, y for yescrypt in our case.
  • The characters in between the second pair of dollar signs are the parameters passed to yescrypt, which will always be j9T for us.
  • The characters oR2ZofMTuH3dpEGbw6c/y. in between the third pair of dollar signs are the salt.
  • Finally, the characters TwfvHgCl4sIp0b28YTepJ3YVvl/3UyWKeLCmDV1tAd9 in between the fourth $ and the : are the hash itself.
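The breakdown above can be expressed as a small Python sketch that splits a shadow line on colons and then splits the hash information on dollar signs. The function name parse_shadow_line and the dictionary keys are made up for this example, and the sketch only understands the four-part layout shown above; other layouts are reported without the detailed fields.

```python
def parse_shadow_line(line: str) -> dict:
    """Split one /etc/shadow line into the pieces described above."""
    fields = line.strip().split(":")
    username, hash_info = fields[0], fields[1]
    result = {"username": username, "hash_info": hash_info,
              "algorithm": None, "params": None, "salt": None, "hash": None}
    # "$y$j9T$salt$hash".split("$") gives ["", "y", "j9T", "salt", "hash"];
    # entries such as "*" (no usable password) do not have this shape.
    parts = hash_info.split("$")
    if len(parts) == 5 and parts[0] == "":
        result["algorithm"], result["params"] = parts[1], parts[2]
        result["salt"], result["hash"] = parts[3], parts[4]
    return result


if __name__ == "__main__":
    line = ("karl:$y$j9T$oR2ZofMTuH3dpEGbw6c/y."
            "$TwfvHgCl4sIp0b28YTepJ3YVvl/3UyWKeLCmDV1tAd9:19255:0:99999:7:::")
    info = parse_shadow_line(line)
    print(info["algorithm"])  # y
    print(info["params"])     # j9T
    print(info["salt"])       # oR2ZofMTuH3dpEGbw6c/y.
    print(info["hash"])       # TwfvHgCl4sIp0b28YTepJ3YVvl/3UyWKeLCmDV1tAd9
```

Joining the first three pieces back together with dollar signs gives $y$j9T$oR2ZofMTuH3dpEGbw6c/y., which is everything in the hash information except the hash itself.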
The crypt utility calls the system crypt function (man7.org/linux/man-pages/man3/crypt.3.html) and prints the output. Let’s run this utility with the password we’ve cracked and the first three parts of the hash information from /etc/shadow. If everything goes well, you should see hash output that matches what is in /etc/shadow. To try this, type crypt <karl's password> '$y$j9T$oR2ZofMTuH3dpEGbw6c/y.' into the terminal, replacing <karl's password> with the actual password you cracked. You should see your command followed by output similar to the following:
root@vm-name:/# crypt <karl's password> '$y$j9T$oR2ZofMTuH3dpEGbw6c/y.'
$y$j9T$oR2ZofMTuH3dpEGbw6c/y.$TwfvHgCl4sIp0b28YTepJ3YVvl/3UyWKeLCmDV1tAd9

Note 2.11.11.

Don’t forget to use the actual password you cracked and to put the hash info in single quotes. Remember that the output line was described above as having the first part being the version of the hashing algorithm, the second part being the parameters passed to the algorithm, the third part being the salt, and the fourth part being the hash itself.

Checkpoint 2.11.12.

True or False: The output of the crypt command exactly matches the hash in /etc/shadow.
  • True
  • Right! They should be identical.
  • False
  • Hmm... They should be identical. Try again.

Note 2.11.13.

Your instructor might ask you to submit a screenshot with your lab showing that the output of the crypt command matches the hash in /etc/shadow.

Note 2.11.14.

If you chose to use a GitHub codespace, don’t forget to stop or delete the codespace by clicking the "Stop" button or the "Delete" button in the Codespaces tab of your repository.