
Python Defaultdict: Overview and Examples


In this tutorial, you’ll learn about the Python defaultdict object, which is part of the collections module. The object changes the default dictionary behavior of raising a KeyError when you try to access a key that doesn’t exist.

You’ll get a complete overview of the defaultdict and how it’s different from the regular Python dictionary. You’ll also learn how to handle missing keys in Python dictionaries in general. Finally, you’ll learn how to use the defaultdict object to count data, group data, and accumulate data.

How Do Python Dictionaries Handle Missing Keys?

Python dictionaries are important data structures that store data in key:value pairs. Dictionaries are sometimes referred to as associative arrays because each key is associated with a value. We can access a key’s value directly by placing the key in square brackets after the dictionary’s name.

Let’s take a look at how to access a key’s value from a Python dictionary:

# Accessing a Key's Value from a Python Dictionary
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}
print(data['Name'])

# Returns: Nik

Now, let’s take a look at what happens when we try to access a key that doesn’t exist:

# Accessing a Missing Key's Value from a Python Dictionary
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}
print(data['Hobbies'])

# Raises: KeyError: 'Hobbies'

We can see that when we try to access a key that doesn’t exist, Python raises a KeyError. This will cause your program to crash unless the error is directly handled.

Python provides a number of ways to avoid this – let’s explore them here.

Using .get() to Handle Missing Dictionary Keys

We can use the dictionary .get() method to prevent a KeyError from being raised when a dictionary key doesn’t exist. If we try to access a key’s value that doesn’t exist using the .get() method, the method simply returns the None value. Let’s see what this looks like:

# Using .get() to Prevent a KeyError
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}
print(data.get('Hobbies'))

# Returns: None
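The .get() method also accepts an optional second argument, which is returned instead of None when the key is missing. Here’s a minimal sketch of that, where the fallback string 'Unknown' is just an illustrative choice:

```python
# Passing a fallback value to .get()
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}

# 'Unknown' is an illustrative fallback, not a required value
hobbies = data.get('Hobbies', 'Unknown')
print(hobbies)

# Returns: Unknown

# .get() does not add the missing key to the dictionary
print('Hobbies' in data)

# Returns: False
```

Note that the dictionary itself is left unchanged, so the fallback has to be supplied on every lookup.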

Using try-except to Handle Missing Dictionary Keys

Similar to the example above, we can wrap accessing our dictionary values in a try-except block. This allows us to directly handle the KeyError that is raised when a key doesn’t exist. Let’s see what this looks like:

# Using try-except to Handle Missing Keys
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}
try:
    print(data['Hobbies'])
except KeyError:
    pass
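Silently passing means the missing key goes unnoticed. One possible variation is to assign a fallback value in the except branch instead. In the sketch below, the empty list is an assumed fallback chosen purely for illustration:

```python
# Assigning a fallback value inside the except branch
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}

try:
    hobbies = data['Hobbies']
except KeyError:
    # The empty list is an assumed fallback chosen for this sketch
    hobbies = []

print(hobbies)

# Returns: []
```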

Using .setdefault() to Handle Missing Dictionary Keys

Python dictionaries also provide a method, .setdefault(), which allows us to set, well, a default value for a key. If the key doesn’t exist, the method inserts it with the default value and returns that value. If the key already exists, its current value is returned unchanged. Let’s see how we can use that:

# Using .setdefault() to Set a Default Value for a Missing Key
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}
data.setdefault('Hobbies', None)

print(data['Hobbies'])

# Returns: None

The problem with this approach is that we need to know the key for which we want to create a default value. This still requires a decent amount of planning and does not account for fringe cases.
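To make the behavior of .setdefault() a little more concrete, here’s a short sketch showing both cases: a missing key and an existing key. The value 'Chess' and the fallback 'Unknown' are illustrative only:

```python
# Seeing what .setdefault() returns for missing and existing keys
data = {'Name': 'Nik', 'Location': 'Toronto', 'Age': 33}

# The key is missing, so it is created with the default, which is returned
hobbies = data.setdefault('Hobbies', [])
hobbies.append('Chess')  # 'Chess' is an illustrative value

# The key already exists, so the existing value is kept and returned
name = data.setdefault('Name', 'Unknown')

print(data['Hobbies'])

# Returns: ['Chess']

print(name)

# Returns: Nik
```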

What are the alternatives to these methods?

After reading through these options, you may be wondering whether you need anything else at all, or whether there is a more convenient alternative. This is where the defaultdict comes in. It provides a clean, safe way of providing default values for any missing key without needing to create the key first. Let’s dive into learning about the defaultdict and its many benefits.

What is the Python defaultdict?

The Python defaultdict is a class belonging to the collections library, which is part of the standard Python library. This means that you don’t need to actually install anything to make use of it.

We can make use of the class by importing it in our program:

# Importing the defaultdict Class
from collections import defaultdict

The defaultdict class is a subclass of the Python dict class. This means it shares many of the same attributes of the dictionary class, but also modifies and extends some of its functionality.

The main changes to the dictionary class are:

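  1. defaultdict implements the __missing__() method, so accessing a missing key with square brackets calls the default factory and stores its result instead of raising a KeyError
  2. It adds an optional first argument, default_factory, which defaults to None (in which case a KeyError is still raised for missing keys)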

What does this mean for you? In short, the defaultdict gives you a way to provide default values for keys that don’t exist. The .default_factory attribute can be any callable that takes no arguments (such as list or int) or None. When a missing key is accessed, the callable is called and its return value is stored as that key’s value.
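To get a feel for what the __missing__() hook does, here’s a rough sketch of a plain dict subclass that imitates defaultdict(int). The class name ZeroDict is hypothetical, and this is not how defaultdict is actually implemented; it only illustrates the method that dict calls when square-bracket access doesn’t find a key:

```python
# A rough imitation of defaultdict(int) using __missing__()
class ZeroDict(dict):
    """Hypothetical dict subclass, for illustration only."""

    def __missing__(self, key):
        # Called by square-bracket access only when the key is absent
        self[key] = 0
        return self[key]

counts = ZeroDict()
print(counts['apples'])

# Returns: 0

print(counts)

# Returns: {'apples': 0}
```

Notice that the missing key is stored after the first access, which is also what defaultdict does.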

In this section, you learned what the Python defaultdict class is and how it’s different from a regular Python dictionary. In the next section, you’ll begin working with these objects to better handle missing data and extend your ability to work with dictionaries.

How to Create a Python defaultdict

In this section, you’ll learn how to create a defaultdict object using the collections module in Python. As shown above, we can easily import the class without needing to worry about installing it. This is because the module is part of the standard library and will come installed with Python.

Let’s take a look at how we can create a defaultdict:

# Creating our first defaultdict
from collections import defaultdict

default = defaultdict(int)

Here, we created a defaultdict called default and passed in int as its default factory. The factory must be a callable (or None). Callables such as int and list are ways of creating objects. It’s important to pass in the callable itself (int), rather than the result of calling it (int()):

# Passing the result of calling int() rather than int itself
from collections import defaultdict

default = defaultdict(int())

# Raises: TypeError: first argument must be callable or None

We can see that when we pass in int(), which evaluates to the integer 0 rather than a callable, the program raises a TypeError.
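If you’re ever unsure whether something can be used as a default factory, one quick check is Python’s built-in callable() function. A small sketch:

```python
# Checking what is (and isn't) callable
print(callable(int))

# Returns: True

# int() has already been called, so what's left is just the integer 0
print(int())

# Returns: 0

print(callable(int()))

# Returns: False
```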

We can now use the defaultdict in many of the same ways as we would a normal Python dictionary. For example, we can assign a new key-value pair to the dictionary:

# Assigning a value to a provided key
from collections import defaultdict
default = defaultdict()

default['Name'] = 'Nik'
print(default)

# Returns: defaultdict(None, {'Name': 'Nik'})
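Notice that we didn’t pass in a default factory this time, so it was set to None. In that case, the defaultdict behaves like a regular dictionary when a key is missing. The following sketch shows this; the key 'Hobbies' is just an example of a key that was never added:

```python
# A defaultdict without a default factory still raises a KeyError
from collections import defaultdict

default = defaultdict()
default['Name'] = 'Nik'

print(default.default_factory)

# Returns: None

try:
    default['Hobbies']
except KeyError as error:
    print(f'Missing key: {error}')

# Returns: Missing key: 'Hobbies'
```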

In this section, you learned how to create a defaultdict in Python and how to add key-value pairs to it. In the next section, you’ll learn how to use these dictionaries to handle missing keys.

Handle Missing Keys with Python defaultdict

In this section, you’ll learn how to handle missing keys with the Python defaultdict. Previously, you learned that you can pass in a default factory when you create a new defaultdict. This factory is what allows us to handle an attempt to access a missing key. Let’s see what this looks like:

# Handling Missing Data with defaultdict
from collections import defaultdict
default = defaultdict(int)

print(default['Name'])

# Returns: 0

We can see from the example above that we were able to access a key that didn’t exist and the value of 0 was returned. The key was also added to the dictionary with that value. We can pass in any callable that takes no arguments, such as list, int, float, or set. In the next section, you’ll learn how to use the defaultdict object to count items in a list.
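Two details are worth seeing in code. First, a lambda function is one way to supply a custom default value. Second, only square-bracket access triggers the default factory; the .get() method does not. In this sketch, the default string 'Unknown' is an illustrative choice:

```python
# Custom defaults, and how .get() differs from square-bracket access
from collections import defaultdict

# A lambda is one way to supply a custom default; 'Unknown' is illustrative
default = defaultdict(lambda: 'Unknown')

# .get() does not call the factory and does not add the key
print(default.get('Location'))

# Returns: None

# Square-bracket access calls the factory and stores the result
print(default['Name'])

# Returns: Unknown

print(dict(default))

# Returns: {'Name': 'Unknown'}
```

Because simply reading a missing key adds it to the dictionary, it can be worth using .get() or the in operator when you only want to check whether a key exists.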

Count Items in a List Using Python defaultdict

One of the creative uses for the defaultdict object is the ability to effectively count items in an iterable and return the counts in a dictionary. Before we dive into how to implement this with the defaultdict object, let’s take a look at how we can implement this with a regular dictionary.

# Counting Items in a List
names = ['Nik', 'Kate', 'Evan', 'Kyra', 'John', 'Nik', 'Kate', 'Nik']

counts = {}
for name in names:
    if name in counts:
        counts[name] += 1
    else:
        counts[name] = 1

print(counts)

# Returns: {'Nik': 3, 'Kate': 2, 'Evan': 1, 'Kyra': 1, 'John': 1}

In the code above, we need to include the if-else statement in order to account for the case that the key doesn’t already exist in our dictionary.

We can simplify this tremendously by making use of the defaultdict object. We can use int as our default factory, which lets us safely increment the value of a key even if the key didn’t previously exist. Let’s take a look at what this looks like:

# Counting Items in a List with defaultdict
from collections import defaultdict
names = ['Nik', 'Kate', 'Evan', 'Kyra', 'John', 'Nik', 'Kate', 'Nik']

counts = defaultdict(int)
for name in names:
    counts[name] += 1

print(counts)

# Returns: defaultdict(<class 'int'>, {'Nik': 3, 'Kate': 2, 'Evan': 1, 'Kyra': 1, 'John': 1})

What we did here was instantiate a defaultdict object, passing in the callable for an integer value. This means that if a key doesn’t exist, then the value will be assigned 0. Because of this, when we encounter a key that doesn’t exist, we can safely increment its value by 1. This code is significantly cleaner than our naive implementation using the regular dictionary.
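Once the counting is done, you may want a regular dictionary again, so that later lookups of missing keys raise a KeyError rather than quietly adding new entries. Here’s one way that could look. The variable names counts_dict and most_common are introduced just for this sketch:

```python
# Converting the counts back to a regular dictionary
from collections import defaultdict

names = ['Nik', 'Kate', 'Evan', 'Kyra', 'John', 'Nik', 'Kate', 'Nik']

counts = defaultdict(int)
for name in names:
    counts[name] += 1

# dict() copies the key-value pairs and drops the default factory
counts_dict = dict(counts)
print(counts_dict)

# Returns: {'Nik': 3, 'Kate': 2, 'Evan': 1, 'Kyra': 1, 'John': 1}

# Find the most frequent name by comparing keys on their counts
most_common = max(counts_dict, key=counts_dict.get)
print(most_common)

# Returns: Nik
```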

Group Data with Python defaultdict

We can use the defaultdict object to group data based on other data structures. With this, we can iterate over some object, such as a list of tuples, another dictionary, or a set of lists to group data in meaningful ways.

Let’s take a look at an example. If we had a dictionary of people’s names and their hometowns, we could create a dictionary that contains the locations as keys and a list of the people from each location as values. This allows us to take, say, a table like the one below and convert it to a dictionary of locations and the people that live there.

Person    Hometown
Nik       Toronto
Kate      Toronto
Evan      London
Kyra      New York
Jane      New York

Our original table of values

We can turn this table into the following:

Hometown    People
Toronto     ['Nik', 'Kate']
London      ['Evan']
New York    ['Kyra', 'Jane']

The resulting data structure

Before we take a look at how to implement this with the defaultdict object, let’s take a look at how to do this with a regular Python dictionary.

# Grouping Items with Dictionaries
people = {'Nik': 'Toronto', 'Kate': 'Toronto', 'Evan': 'London', 'Kyra': 'New York', 'Jane': 'New York'}
locations = {}

for person, location in people.items():
    if location in locations:
        locations[location].append(person)
    else:
        locations[location] = [person]

print(locations)

# Returns: {'Toronto': ['Nik', 'Kate'], 'London': ['Evan'], 'New York': ['Kyra', 'Jane']}

We can see that this is a bit more complex than our previous example! Thankfully, defaultdict makes this significantly easier. Because we can set a default value to be an empty list, we can remove the if-else statement. Let’s take a look at how this works:

# Grouping Items with Dictionaries with defaultdict
from collections import defaultdict
people = {'Nik': 'Toronto', 'Kate': 'Toronto', 'Evan': 'London', 'Kyra': 'New York', 'Jane': 'New York'}
locations = defaultdict(list)

for person, location in people.items():
    locations[location].append(person)

print(locations)

# Returns: defaultdict(<class 'list'>, {'Toronto': ['Nik', 'Kate'], 'London': ['Evan'], 'New York': ['Kyra', 'Jane']})

We can see how much easier this example is to understand because we do not need to check whether each key exists. Our defaultdict will instantiate an empty list every time a key doesn’t exist. Because of this, we can easily append to the key’s value without needing to worry about a KeyError being raised!
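The same pattern works with other containers and other inputs. As a sketch, suppose we had a list of tuples where the same pair can appear more than once. Using set as the default factory keeps each person only once per city. The visits data below is hypothetical and made up for this example:

```python
# Grouping unique values from a list of tuples with defaultdict(set)
from collections import defaultdict

# Hypothetical visit log as (person, city) tuples; note the repeated pair
visits = [('Nik', 'Toronto'), ('Kate', 'Toronto'), ('Nik', 'Toronto'),
          ('Evan', 'London'), ('Nik', 'London')]

visitors = defaultdict(set)
for person, city in visits:
    visitors[city].add(person)

# Sort each set for display, since sets have no guaranteed order
for city, people in visitors.items():
    print(city, sorted(people))

# Returns:
# Toronto ['Kate', 'Nik']
# London ['Evan', 'Nik']
```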

Accumulate Data with Python defaultdict

In this final section, you’ll learn how to accumulate data with the Python defaultdict. This example is quite similar to the counting example, except we’ll use float as the default factory. Imagine that you’re keeping track of your spending in different categories and want to add up the total spent in each category.

Your data is stored in a list of tuples, where the first value is the category and the second is the amount spent. Let’s take a look at how we can make this work:

# Accumulating Data with defaultdict
from collections import defaultdict
data = [('Groceries', 12.34), ('Entertainment', 5.40), ('Groceries', 53.45), 
        ('Video Games', 65.32), ('Groceries', 33.12), ('Entertainment', 15.44), 
        ('Groceries', 34.45), ('Video Games', 32.22)]

accumulated = defaultdict(float)
for category, amount in data:
    accumulated[category] += amount

print(accumulated)

# Returns (totals shown rounded to two decimals; floating-point addition may print extra digits):
# defaultdict(<class 'float'>, {'Groceries': 133.36, 'Entertainment': 20.84, 'Video Games': 97.54})

We can see this approach closely resembles our counting example, except that we are adding each item’s amount to a running total instead of incrementing a count by 1.
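Because floating-point addition can leave tiny rounding artifacts in the totals, one option is to round each total before reporting it. The sketch below repeats the accumulation and then builds a regular dictionary of rounded totals. The name totals is introduced here for illustration:

```python
# Rounding accumulated totals for reporting
from collections import defaultdict

data = [('Groceries', 12.34), ('Entertainment', 5.40), ('Groceries', 53.45),
        ('Video Games', 65.32), ('Groceries', 33.12), ('Entertainment', 15.44),
        ('Groceries', 34.45), ('Video Games', 32.22)]

accumulated = defaultdict(float)
for category, amount in data:
    accumulated[category] += amount

# Round each total to two decimals and store it in a regular dict
totals = {category: round(total, 2) for category, total in accumulated.items()}
print(totals)

# Returns: {'Groceries': 133.36, 'Entertainment': 20.84, 'Video Games': 97.54}
```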

Conclusion

In this tutorial, you learned about the Python defaultdict object. You started off by learning how Python dictionaries respond to missing keys and how you can handle these missing keys without crashing your program. You then learned how to use the Python defaultdict to provide default values. Finally, you worked through three practical examples of how to use the defaultdict object to count, group, and accumulate data.

To learn more about the defaultdict object, check out the official documentation here.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.