Learn all about Python SHA256, including its meaning, its purpose and how to implement it using Python’s hashlib
module. Being able to hash values to encrypt them is a useful skill, especially when working with sensitive data. It can also help when working with web applications and having to store a user’s password in a secure manner.
The Quick Answer: Use Pyhton’s hashlib.sha256()
Table of Contents
What is SHA256 Hashing?
Before we dive into how to implement a SHA256 algorithm in Python, let’s take a few moment to understand what it is. The acronym SHA stands for Secure Hash Algorithm
, which represent cryptographic hash functions. These functions are have excellent uses in protecting sensitive information such as passwords, personal identifiers such as identification items.
What makes the SHA256 algorithm interesting is that:
- It is a one-way algorithm, meaning that under current technologies, the algorithm cannot be returned to its original value, and
- Two different input values will practically never yield the same result, allowing us to maintain integrity and uniqueness of data.
Because of this, we can identify overlap in records, say, to identify same birthdates, social security numbers, etc. This allows us to use unique identifiers, even when their data is obfuscated.
In the next section, you’ll learn how to use the Python hashlib
library to implement the SHA256 encryption algorithm.
Want to learn about other ways of working with files in Python? Check out my entire learning path on file handling, including how to open, read, and move files.
Using Python hashlib to Implement SHA256
Python has a built-in library, hashlib
, that is designed to provide a common interface to different secure hashing algorithms. The module provides constructor methods for each type of hash. For example, the .sha256()
constructor is used to create a SHA256 hash.
The sha256
constructor takes a byte-like input, returning a hashed value. The function has a number of associated with hashing values, which are especially useful given that normal strings can’t easily be processed:
encode()
which is used to convert a string to bytes, meaning that the string can be passed into thesha256
functionhexdigest()
which is used to convert our data into hexadecimal format
Python SHA256 for a Single String
Let’s take a look at how we can pass in a single string to return its hashed value using Python and hashlib
:
# Hash a single string with hashlib.sha256
import hashlib
a_string = 'this string holds important and private information'
hashed_string = hashlib.sha256(a_string.encode('utf-8')).hexdigest()
print(hashed_string)
# Returns:
# 4e7d696bce894548dded72f6eeb04e8d625cc7f2afd08845824a4a8378b428d1
In the above example, we first encoded the string and grabbed its hexadecimal value, passing it into the hash function.
Python SHA256 for an Entire File
Now, let’s take a look at how we can implement SHA256 for an entire file. You may be familiar with opening files using a context manager, such as the one shown below:
with open(file, 'r') as f:
...
The problem with this is that when we open a file using the 'r'
method, Python implicitly asks Python to decode the bytes in the string to a default encoding, such as utf-8
.
In order to change this behaviour, we can change our context manager to the following:
with open(file, 'rb') as f:
...
Now, if we were to try to hash the file using simply the 'r'
method, it would raise a TypeError
.
Using the 'rb'
method, we can write the following code, which would successfully hash a line:
# Hash a file with hashlib.sha256
import hashlib
with open('/Users/datagy/Desktop/sample.txt', 'rb') as f:
for line in f:
hashed_line = hashlib.sha256(line.rstrip()).hexdigest()
print(hashed_line)
In the code block above, we used a Python for loop to encrypt each line in our file. This allows us to ensure that we’re hashing our text in a memory-efficient manner.
In the next section, we’ll explicitly cover off how to use Python SHA256 for unicode strings.
Python SHA256 with Unicode Strings
In the above section, you learned how to use Python to implement a SHA256 hashing method. In this section, you’ll learn explicitly how to convert a unicode string to a SHA256 hash value.
Because the Python hashlib
library cannot hash unicode encoded strings, such as those in utf-8, we need to first convert the string to bytes. We can do this using the .encode()
and .hexdigest()
methods.
Let’s see how we can take a unicode-encoded string and return its HSA256 hash value using Python:
# Hash a single string with hashlib.sha256
import hashlib
a_string = 'this string holds important and private information'
hashed_string = hashlib.sha256(a_string.encode('utf-8')).hexdigest()
print(hashed_string)
# Returns:
# 4e7d696bce894548dded72f6eeb04e8d625cc7f2afd08845824a4a8378b428d1
In many cases, you won’t just be working with a single string. Let’s say that you have a list of strings that you need to hash, we can do this using either a for loop or a list comprehension.
Let’s take a look at what the for loop method looks like first:
# Hash a list of strings with hashlib.sha256
import hashlib
strings = ['this string holds important and private information', 'this is another string', 'and yet another secret string']
hashed_strings = []
for string in strings:
hashed_strings.append(hashlib.sha256(string.encode('utf-8')).hexdigest())
print(hashed_strings)
# Returns:
# ['4e7d696bce894548dded72f6eeb04e8d625cc7f2afd08845824a4a8378b428d1', 'adc93c2130934bbb28f652d77980e94d65e4b9b44e6100484ca2e77ab432caa1', 'd16bdaf20806fd89f9a08eec1c9002a12c1c5978157b25f40e79e42a76013a31']
We can see here that we apply the same method as for a single string to each string in our list. We then append the string to a new list to hold our hashed values.
Now, let’s take a look at how we can accomplish this using a list comprehension. This method is a bit more Pythonic and allows us to skip the step of initializing an empty list. Let’s see how we can do this:
# Hash a list of strings with hashlib.sha256
import hashlib
strings = ['this string holds important and private information', 'this is another string', 'and yet another secret string']
hashed_strings = [hashlib.sha256(string.encode('utf-8')).hexdigest() for string in strings]
print(hashed_strings)
# Returns:
# ['4e7d696bce894548dded72f6eeb04e8d625cc7f2afd08845824a4a8378b428d1', 'adc93c2130934bbb28f652d77980e94d65e4b9b44e6100484ca2e77ab432caa1', 'd16bdaf20806fd89f9a08eec1c9002a12c1c5978157b25f40e79e42a76013a31']
Here, we used a Python list comprehension to hash each string in a list using the SHA256 hashing method. We first decode the unicode string into bytes, which are then passed into the sha256
function.
In the next section, you’ll learn how to hash Pandas Dataframe columns using SHA256 in Python.
Want to learn more about Python list comprehensions? Check out this in-depth tutorial that covers off everything you need to know, with hands-on examples. More of a visual learner, check out my YouTube tutorial here.
Python SHA256 a Pandas Column
There may be many times when you’re working with tabular data in a Pandas Dataframe and need to hash an entire column. For example, you could be pulling data from one system and need to protect the data before it’s uploaded to another system.
In order to accomplish this, we need to iterate over each row for a given column and apply the hash value function. Let’s see how we can do this and then explain how it works. We’ll begin by creating a Pandas Dataframe:
# Hashing a Pandas Dataframe column with SHA256
import hashlib
import pandas as pd
df = pd.DataFrame.from_dict({
'ID': [1,2,3,4,5,6,7],
'Password': ['cookies', 'ponies', 'iloveyou123', 'p4ssw0rd', 'passwordsaresilly!', 'ilikecoffee987', 'spaceagents444']
})
print(df.head())
# Returns
# ID Password
# 0 1 cookies
# 1 2 ponies
# 2 3 iloveyou123
# 3 4 p4ssw0rd
# 4 5 passwordsaresilly!
Now that we have our dataframe, we can see that we have a column that stores a user’s password. Let’s see how we can use the hashlib
module to hash an entire dataframe column:
def hash_unicode(a_string):
return hashlib.sha256(a_string.encode('utf-8')).hexdigest()
df['Hashed Password'] = df['Password'].apply(hash_unicode)
print(df.head())
# Returns:
# ID Password Hashed Password
# 0 1 cookies 2f0030c535193fc164e4e2b5371e8e676510cf0a64b268...
# 1 2 ponies ebe85ca5dd0d48cd1fb0f96853830536283aba62425dd8...
# 2 3 iloveyou123 dce7a17f9bff4aa7efebf32c3b3a082033be7e5f103143...
# 3 4 p4ssw0rd 48d2a5bbcf422ccd1b69e2a82fb90bafb52384953e77e3...
# 4 5 passwordsaresilly! 62e8c65bda8914ab8f76dd5d94136d056d19f04861ce18...
We can see here that we apply the function to each row in the dataframe. Of course, we would want to delete the original column when importing it. Because of this, we could simply re-assign the column to itself to overwrite it.
Be aware, this process may take a little bit of time. Since this isn’t a vectorized application to your dataframe, we need to apply the function to each record in that column. Depending on the size of your dataframe, this can take significant time.
Want to learn how to use the Python zip()
function to iterate over two lists? This tutorial teaches you exactly what the zip()
function does and shows you some creative ways to use the function.
Conclusion
In this tutorial, you learned how to use the Python hashlib
library to implement a secure SHA256 hashing algorithm. You learned what the algorithm is and how it is often used. You also learned how to hash a single string, a list of strings, and a Pandas Dataframe column. Being able to work with hashing strings in an effective manner can make your data much more secure. As more and more data goes online, this can be an important safeguard to how your data is stored.
Being able to combine this with other important workflows, such as unzipping files in Python or moving files can be an important way of ensuring your data is safe.
To learn more about the Python hashlib
module, check out the official documentation here.