In this tutorial, you’ll learn about the Python defaultdict object, which is part of the collections module. The object overrides the regular dictionary behavior of raising a KeyError when you try to access a key that doesn’t exist.
You’ll get a complete overview of the defaultdict and how it differs from the regular Python dictionary. You’ll also learn how to handle missing keys in Python more generally. Finally, you’ll learn how to use the defaultdict object to count, group, and accumulate data.
How Do Python Dictionaries Handle Missing Keys?
Python dictionaries are important data structures that store data in key:value pairs. Dictionaries are also known as associative arrays because each key is associated with a value. We can access a key’s value directly by placing the key in square brackets after the dictionary’s name.
Let’s take a look at how to access a key’s value from a Python dictionary:
# Accessing a Key's Value from a Python Dictionary
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}
print(data['Name'])
# Returns: Nik
Now, let’s take a look at what happens when we try to access a key that doesn’t exist:
# Accessing a Missing Key's Value from a Python Dictionary
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}
print(data['Hobbies'])
# Raises: KeyError: 'Hobbies'
We can see that when we try to access a key that doesn’t exist, Python raises a KeyError. This will cause your program to crash unless the error is handled.
Python provides a number of ways to avoid this – let’s explore them here.
Using .get() to Handle Missing Dictionary Keys
We can use the dictionary .get() method to prevent a KeyError from being raised when a dictionary key doesn’t exist. If we use the .get() method to access a key that doesn’t exist, the method simply returns None. Let’s see what this looks like:
# Using .get() to Prevent a KeyError
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}
print(data.get('Hobbies'))
# Returns: None
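The .get() method also accepts an optional second argument that is returned in place of None when the key is missing. The following is a minimal sketch; the 'Unknown' fallback is an arbitrary placeholder chosen for illustration:

```python
# Passing a fallback value as the second argument to .get()
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}

# 'Unknown' is a placeholder chosen for this example
hobbies = data.get('Hobbies', 'Unknown')
print(hobbies)
# Returns: Unknown

# .get() only reads from the dictionary; the missing key is not added
print('Hobbies' in data)
# Returns: False
```

Note that the fallback is only returned, never stored, so the dictionary itself is left unchanged.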
Using try-except to Handle Missing Dictionary Keys
Similar to the example above, we can wrap accessing our dictionary values in a try-except block. This allows us to directly handle the KeyError that is raised when a key doesn’t exist. Let’s see what this looks like:
# Using try-except to Handle Missing Keys
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}
try:
    print(data['Hobbies'])
except KeyError:
    # Ignore the missing key instead of crashing
    pass
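Silently passing is rarely what we want in practice. As a hedged sketch of one alternative, the except block can report the missing key and assign a fallback value instead. The empty list used as the fallback here is an assumption made for this example:

```python
# Reporting the missing key and falling back to a default value
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}

try:
    hobbies = data['Hobbies']
except KeyError as error:
    # error.args[0] holds the key that could not be found
    print(f'Missing key: {error.args[0]}')
    hobbies = []  # Fallback chosen for this example

print(hobbies)
# Returns:
# Missing key: Hobbies
# []
```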
Using .setdefault() to Handle Missing Dictionary Keys
Python dictionaries also provide a method, .setdefault(), which allows us to set, well, a default value for a key. If the key doesn’t exist, the method inserts it with the default value and returns that value. If the key already exists, its current value is returned unchanged. Let’s see how we can use that:
# Using .setdefault() to Set a Default Value for a Missing Key
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}
data.setdefault('Hobbies', None)
print(data['Hobbies'])
# Returns: None
The problem with this approach is that we need to know, in advance, each key for which we want to create a default value. This still requires a decent amount of planning and does not cover keys we didn’t anticipate.
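To make the behavior of .setdefault() concrete, here is a small sketch showing that it leaves existing keys alone and returns the stored value for new ones. The sample values ('Unknown', 'Chess') are placeholders for illustration:

```python
# .setdefault() only inserts when the key is missing
data = {'Name': 'Nik'}

# Existing key: the stored value is returned and left untouched
name = data.setdefault('Name', 'Unknown')
print(name)
# Returns: Nik

# Missing key: the default is inserted and the same object is returned
hobbies = data.setdefault('Hobbies', [])
hobbies.append('Chess')
print(data)
# Returns: {'Name': 'Nik', 'Hobbies': ['Chess']}
```

Because the returned list is the same object that was stored, appending to it updates the dictionary directly.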
What are the alternatives to these methods?
After reading these options, you may be wondering whether you need anything different, or what suitable alternatives to these methods exist. This is where the defaultdict comes in. It provides a clean, safe way of providing default values for any missing key without needing to create it first. Let’s dive into learning about the defaultdict and its many benefits.
What is the Python defaultdict?
The Python defaultdict is a class belonging to the collections module, which is part of the Python standard library. This means that you don’t need to install anything to make use of it.
We can make use of the class by importing it in our program:
# Importing the defaultdict Class
from collections import defaultdict
The defaultdict class is a subclass of the Python dict class. This means it shares the methods and attributes of the dictionary class, but also modifies and extends some of its functionality.
The main changes to the dictionary class are:
- defaultdict overrides the __missing__() method, meaning that no KeyError is raised when a key doesn’t exist, as long as a default factory has been set
- It adds an attribute, .default_factory, which is set from the first argument passed when the object is created and defaults to None
What does this mean for you? In short, the defaultdict supplies default values when a key doesn’t exist. The .default_factory attribute can hold any callable that takes no arguments (such as list, int, etc.) or None. When a missing key is accessed, the callable is called to create the default value for that key. If .default_factory is None, a KeyError is raised, just as with a regular dictionary.
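To illustrate the __missing__() hook mentioned above, here is a rough sketch of a dict subclass that behaves a little like a defaultdict with int as its factory. The ZeroDict class is hypothetical and written only for illustration; it is not how defaultdict is actually implemented:

```python
# A hypothetical dict subclass that mimics defaultdict(int)
class ZeroDict(dict):
    def __missing__(self, key):
        # dict calls this method when square-bracket access finds no key
        self[key] = 0
        return self[key]

counts = ZeroDict()
counts['apples'] += 1
print(counts)
# Returns: {'apples': 1}
```

The defaultdict class generalizes this idea by calling whatever callable is stored in .default_factory instead of hard-coding 0.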
In this section, you learned what the Python defaultdict class is and how it’s different from a regular Python dictionary. In the next section, you’ll begin working with these objects to better handle missing data and extend your ability to work with dictionaries.
How to Create a Python defaultdict
In this section, you’ll learn how to create a defaultdict object using the collections module in Python. As shown above, we can import the class without installing anything, because the module is part of the standard library and comes installed with Python.
Let’s take a look at how we can create a defaultdict:
# Creating our first defaultdict
from collections import defaultdict
default = defaultdict(int)
Here, we created a defaultdict called default. To get default values, we need to pass in a valid callable when creating the dictionary. Callables are objects that can be called to create values, such as int and list. It’s important to pass in the callable itself (int), rather than the result of calling it (int()):
# Passing the result of a call rather than the callable itself
from collections import defaultdict
default = defaultdict(int())
# Raises: TypeError: first argument must be callable or None
We can see that when we pass in int(), which evaluates to the integer 0 rather than a callable, the program raises a TypeError.
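Any callable that takes no arguments can be used, not only built-in types. As a minimal sketch, a lambda can supply a custom default value. The 'Unknown' string is an arbitrary choice for this example:

```python
# Using a lambda to provide a custom default value
from collections import defaultdict

# The lambda takes no arguments and returns the placeholder 'Unknown'
default = defaultdict(lambda: 'Unknown')
default['Name'] = 'Nik'

print(default['Name'])
# Returns: Nik
print(default['Location'])
# Returns: Unknown
```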
We can now use the defaultdict in many of the same ways as we would a normal Python dictionary. For example, we can assign a new key-value pair to the dictionary:
# Assigning a value to a provided key
from collections import defaultdict
default = defaultdict()
default['Name'] = 'Nik'
print(default)
# Returns: defaultdict(None, {'Name': 'Nik'})
In this section, you learned how to create a defaultdict in Python and how to add key-value pairs to it. In the next section, you’ll learn how to use these dictionaries to handle missing keys.
Handle Missing Keys with Python defaultdict
In this section, you’ll learn how to handle missing keys with the Python defaultdict. Previously, you learned that when you create a new defaultdict, you pass in a callable that produces the default value. This allows us to handle missing data when attempting to access a missing key. Let’s see what this looks like:
# Handling Missing Data with defaultdict
from collections import defaultdict
default = defaultdict(int)
print(default['Name'])
# Returns: 0
We can see from the example above that we were able to access a key that didn’t exist and the value of 0 was returned, because calling int() with no arguments returns 0. We can pass in other callables as well, such as list, float, and set, which produce an empty list, 0.0, and an empty set, respectively. In the next section, you’ll learn how to use the defaultdict object to count items in a list.
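One detail worth knowing before moving on: the default factory is only triggered by square-bracket access, and that access also stores the new key in the dictionary. The short sketch below illustrates this behavior:

```python
# Square-bracket access inserts the missing key; .get() and `in` do not
from collections import defaultdict

default = defaultdict(int)

# Membership tests and .get() do not call the default factory
print('Name' in default)
# Returns: False
print(default.get('Name'))
# Returns: None

# Square-bracket access calls int() and stores the result under the key
print(default['Name'])
# Returns: 0
print(dict(default))
# Returns: {'Name': 0}
```

This means that simply reading a missing key with square brackets grows the dictionary, which is worth keeping in mind when checking whether a key exists.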
Count Items in a List Using Python defaultdict
One of the creative uses for the defaultdict object is the ability to effectively count items in an iterable and return the counts in a dictionary. Before we dive into how to implement this with the defaultdict object, let’s take a look at how we can implement this with a regular dictionary.
# Counting Items in a List
names = ['Nik', 'Kate', 'Evan', 'Kyra', 'John', 'Nik', 'Kate', 'Nik']
counts = {}
for name in names:
    if name in counts:
        counts[name] += 1
    else:
        counts[name] = 1
print(counts)
# Returns: {'Nik': 3, 'Kate': 2, 'Evan': 1, 'Kyra': 1, 'John': 1}
In the code above, we need to include the if-else statement in order to account for the case that the key doesn’t already exist in our dictionary.
We can simplify this tremendously by making use of the defaultdict object. We can use int as the default factory, so every missing key starts at 0 and we can safely increment values, even if their keys didn’t previously exist. Let’s take a look at what this looks like:
# Counting Items in a List with defaultdict
from collections import defaultdict
names = ['Nik', 'Kate', 'Evan', 'Kyra', 'John', 'Nik', 'Kate', 'Nik']
counts = defaultdict(int)
for name in names:
    counts[name] += 1
print(counts)
# Returns: defaultdict(<class 'int'>, {'Nik': 3, 'Kate': 2, 'Evan': 1, 'Kyra': 1, 'John': 1})
What we did here was instantiate a defaultdict object, passing in int as the callable. This means that if a key doesn’t exist, it is created with the value 0. Because of this, when we encounter a key that doesn’t exist, we can safely increment its value by 1. This code is significantly cleaner than our naive implementation using the regular dictionary.
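The same pattern works for any iterable, not just a list of names. As a small sketch, the example below counts words in a sentence, converts the result to a plain dictionary, and finds the most frequent word. The sentence is made up for this example:

```python
# Counting words in a sentence and finding the most frequent one
from collections import defaultdict

# Sample sentence invented for this example
sentence = 'the quick brown fox jumps over the lazy dog the end'

counts = defaultdict(int)
for word in sentence.split():
    counts[word] += 1

# Convert to a regular dictionary once counting is done
plain = dict(counts)

# Pick the key with the highest count
most_common = max(plain, key=plain.get)
print(most_common, plain[most_common])
# Returns: the 3
```

Converting to a regular dictionary at the end can be useful if later code should raise a KeyError for missing keys again.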
Group Data with Python defaultdict
We can use the defaultdict object to group data based on other data structures. With this, we can iterate over some object, such as a list of tuples, another dictionary, or a set of lists, to group data in meaningful ways.
Let’s take a look at an example. If we had a dictionary of people’s names and their hometowns, we could create a dictionary that contains the locations as keys and a list of the people from each location as values. This allows us to take, say, a table like the one below and convert it to a dictionary of locations and the people who live there.
Person | Hometown |
---|---|
Nik | Toronto |
Kate | Toronto |
Evan | London |
Kyra | New York |
Jane | New York |
We can turn this table into the following:
Hometown | People |
---|---|
Toronto | ['Nik', 'Kate'] |
London | ['Evan'] |
New York | ['Kyra', 'Jane'] |
Before we take a look at how to implement this with the defaultdict object, let’s take a look at how to do this with a regular Python dictionary.
# Grouping Items with Dictionaries
people = {'Nik': 'Toronto', 'Kate': 'Toronto', 'Evan': 'London', 'Kyra': 'New York', 'Jane': 'New York'}
locations = {}
for person, location in people.items():
    if location in locations:
        locations[location].append(person)
    else:
        locations[location] = [person]
print(locations)
# Returns: {'Toronto': ['Nik', 'Kate'], 'London': ['Evan'], 'New York': ['Kyra', 'Jane']}
We can see that this is a bit more complex than our previous example! Thankfully, defaultdict makes this significantly easier. Because we can set the default value to be an empty list, we can remove the if-else statement. Let’s take a look at how this works:
# Grouping Items with Dictionaries with defaultdict
from collections import defaultdict
people = {'Nik': 'Toronto', 'Kate': 'Toronto', 'Evan': 'London', 'Kyra': 'New York', 'Jane': 'New York'}
locations = defaultdict(list)
for person, location in people.items():
    locations[location].append(person)
print(locations)
# Returns: defaultdict(<class 'list'>, {'Toronto': ['Nik', 'Kate'], 'London': ['Evan'], 'New York': ['Kyra', 'Jane']})
We can see how much easier this example is to understand because we do not need to worry about the error handling. Our defaultdict will instantiate an empty list every time a key doesn’t exist. Because of this, we can easily append to the key’s value without needing to worry about a KeyError being raised!
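Grouping is not limited to lists. As a hedged sketch, passing set as the default factory groups values while dropping duplicates. The visits data below is hypothetical and exists only to illustrate the idea:

```python
# Grouping a list of tuples into sets with defaultdict
from collections import defaultdict

# Hypothetical (person, city) pairs, including one repeated entry
visits = [('Nik', 'Toronto'), ('Kate', 'London'), ('Nik', 'London'),
          ('Nik', 'Toronto'), ('Kate', 'Paris')]

cities_by_person = defaultdict(set)
for person, city in visits:
    # A missing person starts with an empty set
    cities_by_person[person].add(city)

# Sets are unordered, so we sort them for a predictable display
print(sorted(cities_by_person['Nik']))
# Returns: ['London', 'Toronto']
print(sorted(cities_by_person['Kate']))
# Returns: ['London', 'Paris']
```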
Accumulate Data with Python defaultdict
In this final section, you’ll learn how to accumulate data with the Python defaultdict. This example is quite similar to the counting example, except we’ll use float as the default factory, so each missing key starts at 0.0. Imagine that you’re keeping track of your spending in different categories and want to add up the total spent across these categories.
Your data is stored in a list of tuples, where the first value is the category and the second is the amount spent. Let’s take a look at how we can make this work:
# Accumulating Data with defaultdict
from collections import defaultdict
data = [('Groceries', 12.34), ('Entertainment', 5.40), ('Groceries', 53.45),
('Video Games', 65.32), ('Groceries', 33.12), ('Entertainment', 15.44),
('Groceries', 34.45), ('Video Games', 32.22)]
accumulated = defaultdict(float)
for category, amount in data:
    accumulated[category] += amount
print(accumulated)
# Returns: defaultdict(<class 'float'>, {'Groceries': 133.36, 'Entertainment': 20.84, 'Video Games': 97.54})
# (the exact digits may differ slightly due to floating-point rounding)
We can see this approach closely resembles our counting example, except that we add each item’s amount to the running total instead of simply incrementing by 1.
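Because floating-point sums can pick up small rounding errors, one possible variation is to accumulate whole cents with int as the default factory and only convert to dollars for display. This is a sketch under the assumption that amounts are available in cents; the data below is invented for this example:

```python
# Accumulating integer cents to avoid floating-point rounding
from collections import defaultdict

# Hypothetical amounts, stored in cents rather than dollars
data = [('Groceries', 1234), ('Entertainment', 540),
        ('Groceries', 5345), ('Entertainment', 1544)]

totals = defaultdict(int)
for category, cents in data:
    totals[category] += cents

# Convert to dollars only when displaying the totals
for category, cents in totals.items():
    print(f'{category}: {cents / 100:.2f}')
# Returns:
# Groceries: 65.79
# Entertainment: 20.84
```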
Conclusion
In this tutorial, you learned about the Python defaultdict object. You started off by learning how Python dictionaries respond to missing keys and how you can handle these missing keys without crashing your program. You then learned how to use the Python defaultdict to provide default values. Finally, you worked through three practical examples of how to use the defaultdict object to count, group, and accumulate data.
To learn more about the defaultdict object, check out the official Python documentation for the collections module.