The Python collections module builds on the container data types that are built into Python. It provides efficient, intuitive classes and functions that extend lists, tuples, dictionaries, and strings for specific use cases.
By diving into the collections module, you can make your code cleaner and more maintainable. By the end of this guide, you’ll have learned how to use the collections module to:
- Write more readable tuples using namedtuples
- Manage the ordering of lists with deques and of dictionaries with OrderedDicts
- Count object values quickly and efficiently using Counter
- Handle missing dictionary keys using factory functions
- Subclass dictionaries, lists, and strings
Understanding the Python collections Module
The Python collections module offers a ton of different functionality. While the module has been around for many years, not every feature was available right away. Let’s take a look at the different data types that the Python collections module has to offer:
Class / Function | Description | Use Cases | Python Version |
---|---|---|---|
namedtuple() | A factory function for creating tuple subclasses that have named fields for easier access. | Accessing fixed values by name, rather than by position | 2.6 |
deque | List-like containers that allow you to easily append and pop items from either end of the container. | Efficiently handling queues of data, even when they are large | 2.4 |
ChainMap | A dictionary-like container for creating views of multiple dictionary mappings. | Treating multiple mappings as a single dictionary | 3.3 |
Counter | A dictionary-like container for counting different objects. | Counting large containers of data and finding the most and least frequent items | 2.7, 3.1 |
OrderedDict | A dictionary-like container that keeps track of the order of items. | Keeping track of the order in which items were added to a dictionary | 2.7, 3.1 |
defaultdict | A dictionary subclass that supplies missing values. | Creating dictionaries that override missing-key behavior | 2.5 |
UserDict | A dictionary wrapper for easier subclassing of dictionaries. | Making subclassing of dictionaries easier | 2.2 |
UserList | A list wrapper for easier subclassing of lists. | Making subclassing of lists easier | 2.2 |
UserString | A string wrapper for easier subclassing of strings. | Making subclassing of strings easier | 2.2 |
Now, let’s actually dive into these classes and functions in order to see how they work.
Using Collections Counter to Count Items in Python
The Python Counter class is an integral part of the collections module. The class provides intuitive and Pythonic methods to count items in an iterable, such as a list, tuple, or string. This allows you to count the frequency of items within that iterable, including finding the most common item.
Let’s create our first Python Counter object. We can pass in a string and the Counter object will return the counts of all the characters in that string.
In its simplest form, the class takes a single argument: the iterable whose items we want to count. Let’s see how we can use it:
# Creating Our First Counter
from collections import Counter
a_string = 'hello! welcome to datagy'
counter = Counter(a_string)
print(counter)
# Returns:
# Counter({'e': 3, 'l': 3, 'o': 3, ' ': 3, 't': 2, 'a': 2, 'h': 1, '!': 1, 'w': 1, 'c': 1, 'm': 1, 'd': 1, 'g': 1, 'y': 1})
By printing out our counter, we’re able to see that it returns a dictionary-like object. The printed items are sorted from most to least frequent. In this case, we can see that the letter 'e' appears three times in our string.
The Counter class makes it easy to find the most common item in a given object. This can be done by applying the .most_common() method to the object. Let’s see how we can find the most common item in our object:
# Accessing the Most Common Item
from collections import Counter
a_string = 'hello! welcome to datagy'
counter = Counter(a_string)
print(counter.most_common()[0])
# Returns: ('e', 3)
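The example above slices the full list returned by .most_common(). As a minimal sketch of a few related behaviors (the variable names here are illustrative and not part of the original example), you can also ask for only the top n items, look up items that were never counted, and add more counts later:

```python
from collections import Counter

counter = Counter('hello! welcome to datagy')

# Ask for only the top two items instead of slicing the full list
top_two = counter.most_common(2)
print(top_two)
# [('e', 3), ('l', 3)]

# Looking up an item that was never counted returns 0 instead of raising a KeyError
missing_count = counter['z']
print(missing_count)
# 0

# .update() adds the counts from another iterable to the existing counts
counter.update('eee')
print(counter['e'])
# 6
```

Items with equal counts are returned in the order in which they were first encountered, which is why 'e' comes before 'l' here.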
To learn more about how to use this great tool, check out this complete guide to the collections Counter class in Python.
Using namedtuple to Make Tuples More Pythonic
The namedtuple() function available in the Python collections module is a factory function for generating special tuple classes. In particular, these classes are subclasses of the normal Python tuple, but allow you to give each position a named field. Similar to accessing an object’s attribute, you can use dot notation to access a named value in a named tuple.
Let’s see what the namedtuple() function looks like:
# Understanding the namedtuple() Function
from collections import namedtuple
namedtuple(
    typename,       # The name of the new tuple subclass
    field_names,    # The field names, as a sequence of strings or a single space- or comma-separated string
    rename=False,   # Whether to automatically replace invalid field names with positional names such as _1
    defaults=None,  # Default values, applied to the rightmost fields
    module=None     # A custom value for the __module__ attribute of the new class
)
Now that you have an understanding of the parameters available in the namedtuple() function, let’s see how you can create your first named tuple.
# Creating your first namedtuple
from collections import namedtuple
Person = namedtuple('Person', ['name', 'age', 'location', 'profession'])
nik = Person('nik', 33, 'Toronto', 'datagy')
print(nik.profession)
# Returns: datagy
Because named tuples are still tuples, they can also be indexed. For example, you could access the profession value using its index, 3:
# Indexing a named tuple
print(nik[3])
# Returns: datagy
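The defaults parameter described earlier, along with two helper methods that named tuples provide, can be sketched as follows. This is an illustrative example that assumes Python 3.7 or later (where defaults is available); the person and values are made up:

```python
from collections import namedtuple

# defaults are applied to the rightmost fields, so 'profession' becomes optional
Person = namedtuple(
    'Person',
    ['name', 'age', 'location', 'profession'],
    defaults=['unknown']
)

kate = Person('kate', 31, 'Toronto')
print(kate.profession)
# unknown

# Named tuples are immutable, so _replace() returns a modified copy
older_kate = kate._replace(age=32)
print(kate.age, older_kate.age)
# 31 32

# _asdict() converts the named tuple into a dictionary keyed by field name
as_dict = older_kate._asdict()
print(as_dict['location'])
# Toronto
```

Because the original tuple is never changed, kate still holds the age 31 after calling _replace().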
To learn more about namedtuples, check out this complete guide to namedtuples.
Handle Missing Keys in Dictionaries with defaultdict
The defaultdict class is a subclass of the Python dict class. This means it shares the same methods and attributes as the dictionary class, but also modifies and extends some of its functionality.
The main changes to the dictionary class are:
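- defaultdict overrides the __missing__() method, meaning that no KeyError is raised when a key doesn’t exist (as long as a default factory is set)
- It adds an instance attribute, .default_factory, which is set from the first argument passed when creating the defaultdict and defaults to None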
What does this mean for you? In short, the defaultdict provides a way of supplying default values for a dictionary when a key doesn’t exist. The .default_factory attribute accepts any callable that can be called with no arguments (such as list or int) or None. The callable is invoked to create the default value when a key is missing. If it is None, a missing key raises a KeyError, just as it does in a regular dictionary.
Let’s take a look at how we can create a defaultdict:
# Creating our first defaultdict
from collections import defaultdict
default = defaultdict(int)
Now we can use the defaultdict to easily count items, without first needing to instantiate a key-value pair. Let’s see what this looks like:
# Counting Items in a List with defaultdict
from collections import defaultdict
names = ['Nik', 'Kate', 'Evan', 'Kyra', 'John', 'Nik', 'Kate', 'Nik']
counts = defaultdict(int)
for name in names:
    counts[name] += 1
print(counts)
# Returns: defaultdict(<class 'int'>, {'Nik': 3, 'Kate': 2, 'Evan': 1, 'Kyra': 1, 'John': 1})
Because the defaultdict calls int() to create a default value of 0 whenever a missing key is accessed, we’re able to use the += operator to add 1 to a value, even if the key-value pair doesn’t yet exist.
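The same idea works with other factories. As a small sketch (the grouping task and variable names are illustrative assumptions, not part of the original example), passing in list lets you group items without first checking whether a key exists:

```python
from collections import defaultdict

names = ['Nik', 'Kate', 'Evan', 'Kyra', 'John']

# Each missing key starts out as an empty list
by_initial = defaultdict(list)
for name in names:
    by_initial[name[0]].append(name)

print(dict(by_initial))
# {'N': ['Nik'], 'K': ['Kate', 'Kyra'], 'E': ['Evan'], 'J': ['John']}

# Simply looking up a missing key with square brackets inserts it
_ = by_initial['Z']
print('Z' in by_initial)
# True

# .get() does not call the factory, so nothing is inserted
print(by_initial.get('Q'))
# None
```

One thing to watch for is that square-bracket lookups create keys as a side effect. If you only want to check for a value, use the in operator or the .get() method.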
To dive more into this topic, check out this complete guide to defaultdicts in Python.
Create Ordered Dictionaries with OrderedDict in Python
OrderedDicts, as the name implies, are dictionaries that maintain the order in which their items were inserted. While regular Python dictionaries have preserved insertion order since Python 3.7 (and, as an implementation detail, since CPython 3.6), OrderedDicts continue to be an important part of Python.
OrderedDicts are also a subclass of the normal Python dictionary, which means they have access to a lot of the functionality that normal Python dictionaries have. For example, OrderedDicts consist of items or rather key-value pairs.
What makes OrderedDicts unique is that they provide tools for working with that order. In particular, the OrderedDict class provides two order-aware methods:
- .popitem(), which removes and returns an item from either the front or the back of the OrderedDict
- .move_to_end(), which moves an item to either end of the OrderedDict
We can load a few initial items, delete an item and add it back in. This will allow us to see how the order is maintained.
# Adding and Deleting Items
from collections import OrderedDict
ordered = OrderedDict({1: 1, 2: 2, 3: 3, 4: 4})
# Print OrderedDict
print('Before deleting item: ', ordered)
# Delete an Item
del ordered[3]
# Print Ordered Dict
print('After deleting item: ', ordered)
# Add a New Item
ordered[3] = 3
# Print Ordered Dict
print('After inserting item: ', ordered)
# Returns:
# Before deleting item: OrderedDict([(1, 1), (2, 2), (3, 3), (4, 4)])
# After deleting item: OrderedDict([(1, 1), (2, 2), (4, 4)])
# After inserting item: OrderedDict([(1, 1), (2, 2), (4, 4), (3, 3)])
Let’s break down what we’re doing in the code block above:
- We create a new OrderedDict and print it
- We then delete an item using the del keyword and then print the OrderedDict
- Finally, we add the item again and print the OrderedDict
Let’s see how we can use the .move_to_end() method to move one of our items to the last position and another to the front:
# Move an Item to the End of an OrderedDict
from collections import OrderedDict
ordered = OrderedDict({1: 1, 2: 2, 3: 3, 4: 4})
ordered.move_to_end(1)
ordered.move_to_end(4, last=False)
print(ordered)
# Returns:
# OrderedDict([(4, 4), (2, 2), (3, 3), (1, 1)])
We can see that we used the method twice. The first method call moved the item with key 1 to the last position. The second method call, which passed in last=False, moved the item with key 4 to the front of the dictionary.
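The .popitem() method mentioned earlier follows the same pattern. Here is a minimal sketch, reusing the same sample data, that also shows how OrderedDicts take order into account when they are compared:

```python
from collections import OrderedDict

ordered = OrderedDict({1: 1, 2: 2, 3: 3, 4: 4})

# By default, .popitem() removes and returns the last item
last_item = ordered.popitem()
# Passing last=False removes and returns the first item instead
first_item = ordered.popitem(last=False)

print(last_item, first_item)
# (4, 4) (1, 1)
print(list(ordered.items()))
# [(2, 2), (3, 3)]

# Two OrderedDicts are only equal if their items are in the same order
a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])
print(a == b)
# False
print(dict(a) == dict(b))
# True
```

The order-sensitive comparison is one of the remaining differences between an OrderedDict and a regular dictionary.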
To learn more about all you can do with this data type, check out an in-depth guide on OrderedDicts in Python.
Create Queues and Stacks with deques in Python
Python lists provide helpful methods for adding an item or multiple items to the end of a list. However, adding items to the front of a list is more costly. When working with lists, adding an item to the front of the list involves shifting every existing element back by one position. This takes time proportional to the length of the list, which makes it slow for large lists. This is where deques come in, which provide fast ways to add and remove items from both ends of a list-like structure.
To create a deque, we can instantiate the deque class. Because the collections module is built into Python, we don’t need to install anything. Let’s see how we can instantiate a deque:
# Instantiating a Deque with Items
from collections import deque
queue = deque([1, 2, 3, 4, 5])
print(queue)
# Returns: deque([1, 2, 3, 4, 5])
To add items to the left side (the front) of a deque in Python, you can use either the .appendleft() or .extendleft() methods. Similar to the .append() and .extend() list methods, these methods add either a single item or multiple items, but to the deque’s front.
Let’s see how we can use the deque we previously created to prepend a single item using the .appendleft() method:
# Prepending an Item to a Deque in Python
from collections import deque
queue = deque([1, 2, 3, 4, 5])
queue.appendleft(0)
print(queue)
# Returns:
# deque([0, 1, 2, 3, 4, 5])
If we want to add multiple items to the front of a deque, we can use the .extendleft() method and pass in another collection, such as a list of items:
# Prepending Multiple Items to a Deque in Python
from collections import deque
queue = deque([1, 2, 3, 4, 5])
queue.extendleft([6, 7, 8])
print(queue)
# Returns:
# deque([8, 7, 6, 1, 2, 3, 4, 5])
We can see that by using the .extendleft() method, items are added to the left of the deque. It’s important to note that the items are prepended one at a time, in the order in which they appear in the list. This means that the first item, 6, is added to the front first, and each following item is then placed in front of it. As a result, the items end up in reverse order in the deque.
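Adding items is only half of what queues and stacks need. The following sketch (with illustrative variable names) shows how the .popleft() and .pop() methods remove items from either end, and how the optional maxlen argument keeps a deque at a fixed size:

```python
from collections import deque

# A queue is first-in, first-out: add on the right, remove from the left
queue = deque([1, 2, 3])
queue.append(4)
first_in_line = queue.popleft()
print(first_in_line, queue)
# 1 deque([2, 3, 4])

# A stack is last-in, first-out: add and remove on the same side
stack = deque([1, 2, 3])
stack.append(4)
top_of_stack = stack.pop()
print(top_of_stack, stack)
# 4 deque([1, 2, 3])

# With maxlen set, the oldest items are dropped as new ones arrive
recent = deque(maxlen=3)
for number in range(5):
    recent.append(number)
print(recent)
# deque([2, 3, 4], maxlen=3)
```

A bounded deque like recent is a simple way to keep only the latest few items, such as the most recent log entries.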
To learn more about how to use this data type, check out this complete guide to deques in Python’s collections module.
Chain Dictionaries with ChainMap in Python
A ChainMap is used to group multiple dictionaries into a single view that provides dictionary-like behavior, as though the dictionaries were a single dictionary. What’s more, the ChainMap also allows you to define priority both in terms of looking up values and updating values.
In order to create a ChainMap in Python, you can import the class directly from the collections module. The class was added in Python version 3.3 and has had a number of different improvements in the following versions. Let’s see how we can chain and map together multiple dictionaries into a single ChainMap:
# Creating Your First ChainMap
from collections import ChainMap
netflix = {'Harry Potter': '2022-01-01', 'Lord of the Rings': '2021-12-23'}
hulu = {'The Office': '2023-03-01', 'Harry Potter': '2022-03-01'}
amazon_prime = {'Lord of the Rings': '2020-01-01', '30 Rock': '2023-01-01'}
all_services = ChainMap(netflix, hulu, amazon_prime)
print(all_services)
# Returns:
# ChainMap({'Harry Potter': '2022-01-01', 'Lord of the Rings': '2021-12-23'}, {'The Office': '2023-03-01', 'Harry Potter': '2022-03-01'}, {'Lord of the Rings': '2020-01-01', '30 Rock': '2023-01-01'})
In the example above, we created our first ChainMap. In order to do this, we followed the steps below:
- We imported the ChainMap class from the collections module
- We then created three dictionaries, each representing a streaming service and the movies or TV shows that it offers (as well as the date they were added)
- We then created a ChainMap by passing all three services into the ChainMap constructor
- Finally, we printed the ChainMap to see what it looks like
Let’s see how we can look for an item in the ChainMap:
# Accessing an Item in a ChainMap
from collections import ChainMap
netflix = {'Harry Potter': '2022-01-01', 'Lord of the Rings': '2021-12-23'}
hulu = {'The Office': '2023-03-01', 'Harry Potter': '2022-03-01'}
amazon_prime = {'Lord of the Rings': '2020-01-01', '30 Rock': '2023-01-01'}
all_services = ChainMap(netflix, hulu, amazon_prime)
print(all_services.get('Harry Potter'))
# Returns:
# 2022-01-01
We can see that when we use the .get() method to find an item in the ChainMap, it returns the value. Remember, this is the standard, safe way of accessing an item in a dictionary. What’s interesting here is that the method returned only a single value, even though the key exists in two dictionaries.
ChainMaps prioritize dictionary values based on the order in which the dictionaries are passed into the ChainMap. We can see how this works in that netflix is passed in before hulu. Because of this, the value for 'Harry Potter' is taken from the netflix dictionary.
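Priority also applies when updating values. As a minimal sketch using a trimmed-down version of the dictionaries above (the new dates and the extra show are made up for illustration), writes always go to the first mapping, while the .new_child() method lets you layer a new dictionary on top without touching the others:

```python
from collections import ChainMap

netflix = {'Harry Potter': '2022-01-01'}
hulu = {'The Office': '2023-03-01', 'Harry Potter': '2022-03-01'}
all_services = ChainMap(netflix, hulu)

# Writes go to the first mapping only, even if the key exists further down
all_services['The Office'] = '2024-01-01'
print(netflix)
# {'Harry Potter': '2022-01-01', 'The Office': '2024-01-01'}
print(hulu['The Office'])
# 2023-03-01

# The ChainMap holds references, so changes to a source dictionary show up immediately
hulu['Parks and Recreation'] = '2023-05-01'
print(all_services['Parks and Recreation'])
# 2023-05-01

# .new_child() returns a new ChainMap with an extra, highest-priority mapping
with_overrides = all_services.new_child({'Harry Potter': '2025-01-01'})
print(with_overrides['Harry Potter'], all_services['Harry Potter'])
# 2025-01-01 2022-01-01
```

This layering is what makes a ChainMap useful for things like settings, where a small set of overrides sits in front of a larger set of defaults.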
To learn more about this class, check out the complete guide to ChainMap in Python’s collections module.
Subclass Strings, Dictionaries, and Lists with Python Collections
The collections module also makes it easy to subclass strings, dictionaries, and lists. This allows you to create objects that meet specific criteria, such as strings that need to be upper-case. This includes creating custom types using the following:
- UserString for custom strings,
- UserDict for custom dictionaries, and
- UserList for custom lists
Let’s explore each of these in a bit more detail:
Python Collections UserString for Custom Strings
We can use the UserString class to create string-like classes that enhance or modify the default behavior of strings. In particular, the class is meant to be subclassed, rather than instantiated directly.
The class wraps a regular Python string and emulates its behavior. It stores the contents of the underlying string in a .data attribute, which allows you to build custom methods and behavior that have access to the underlying data.
By default, Python strings let you change their case in different ways. For example, you can use the .upper() and .lower() methods to return values in uppercase and lowercase. We’ll create a custom string class that lets you print in alternating case. This means that a string like 'datagy' will be printed as 'DaTaGy'.
Let’s see how we can use the UserString class to create a custom Python string:
# Create a Custom String Class
from collections import UserString
import re
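class FunnyString(UserString):
    def __init__(self, sequence):
        # Strip punctuation before storing the underlying string
        self.data = re.sub(r'[^\w\s]', '', sequence)

    def funnify(self):
        # Upper-case the characters at even positions and lower-case the rest
        funny = ""
        for idx in range(len(self.data)):
            if not idx % 2:
                funny += self.data[idx].upper()
            else:
                funny += self.data[idx].lower()
        print(funny)

text = FunnyString('Hello! Welcome to datagy.io!')
text.funnify()
# Returns: HeLlO WeLcOmE To dAtAgYiO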
In this case, we’re using the __init__() method to take the original string and remove any punctuation from it. We also create a method that lets us print the string in alternating capitalization.
Check out this complete guide to the collections UserString class to continue walking through these examples.
Python Collections UserList for Custom Lists
The UserList class has been included in Python since version 1.6; however, it was moved to the collections module in Python 3. We can use the UserList class to create list-like classes that enhance or modify the default behavior of lists. In particular, the class is meant to be subclassed, rather than instantiated directly.
The class was made available before it was possible to simply inherit from the list class itself. However, it still provides some helpful benefits. For example, you can access the underlying data using the .data attribute, rather than relying on calling the super() function.
The .data attribute stores the contents of the underlying list. This allows you to build custom methods and behavior that have access to the underlying data.
By default, Python lists are heterogeneous, meaning that they can hold different types of data. For example, we can create a list that looks like this: ['datagy', 1, 2, 3]. In this sample list, we have a string as well as integers. We’ll create a custom list that only accepts numbers.
Let’s see how we can use the UserList class to create a custom Python list:
# Creating a Custom List
from collections import UserList
class NumberList(UserList):
    def __init__(self, iterable):
        super().__init__(
            item for item in iterable if type(item) in [int, float])

    def __setitem__(self, index, item):
        if type(item) in [int, float]:
            self.data[index] = item
        else:
            raise TypeError('Item must be a number.')

    def append(self, item):
        if type(item) in [int, float]:
            self.data.append(item)
        else:
            raise TypeError('Item must be a number.')

    def square_values(self):
        for item in self.data:
            print(item ** 2)

custom_list = NumberList([1, 2, 3, 'datagy'])
print(custom_list)
# Returns: [1, 2, 3]
In the code block above, we created a UserList subclass. There’s a lot going on there! We can see that when we create the list, 'datagy' isn’t included, despite being passed in.
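To see the validation in action, here is a compact sketch of the same idea. The class below is a trimmed-down copy of the one above, written only for illustration, and the values are made up. It shows that .append() rejects non-numbers, and also that methods we didn’t override still accept anything:

```python
from collections import UserList

class NumberList(UserList):
    """A trimmed-down, illustrative copy of the class above."""

    def __init__(self, iterable=()):
        super().__init__(
            item for item in iterable if type(item) in [int, float])

    def append(self, item):
        if type(item) not in [int, float]:
            raise TypeError('Item must be a number.')
        self.data.append(item)

numbers = NumberList([1, 2, 3, 'datagy'])
numbers.append(4.5)

# Appending a string raises the TypeError defined in the class
try:
    numbers.append('five')
    message = None
except TypeError as error:
    message = str(error)
print(message)
# Item must be a number.
print(numbers)
# [1, 2, 3, 4.5]

# .extend() was not overridden, so it skips the check entirely
numbers.extend(['six'])
print(numbers)
# [1, 2, 3, 4.5, 'six']
```

If the list must only ever hold numbers, every method that adds items (such as .extend() and .insert()) would need the same check.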
Check out this complete guide to the collections UserList class to continue walking through these examples.
Python Collections UserDict for Custom Dictionaries
The UserDict class has been included in Python since version 1.6; however, it was moved to the collections module in Python 3. We can use the UserDict class to create dictionary-like classes that enhance or modify the default behavior of dictionaries. In particular, the class is meant to be subclassed, rather than instantiated directly.
The class wraps a regular Python dictionary and emulates its behavior. It stores the contents of the underlying dictionary in a .data attribute, which allows you to build custom methods and behavior that have access to the underlying data.
By default, Python dictionaries have unique keys, which are case-sensitive. Looking up the key 'age' is different than looking up the key 'AGE'. If you’re not sure what capitalization your key has, this can lead to some unexpected behavior.
Let’s see how we can use the UserDict class to create a custom Python dictionary:
# Creating a Case-Insensitive Lookup in a Custom Dictionary
from collections import UserDict
class CaseInsensitiveDict(UserDict):
    def __setitem__(self, key, value):
        key = str(key).lower()
        self.data[key] = value

    def __getitem__(self, key):
        key = str(key).lower()
        return self.data[key]

custom_dict = CaseInsensitiveDict({'Name': 'Nik', 'Age': 33, 3: 3})
print(custom_dict['name'])
# Returns:
# Nik
In the example above, we have included two methods in the class.
When we create a class that inherits from UserDict, the class wraps a regular dictionary rather than inheriting from dict. That underlying dictionary is accessible using the .data attribute. Because of this, our methods can read and modify self.data directly, without calling super().
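One practical reason to reach for UserDict instead of subclassing dict directly can be sketched as follows. The two class names are hypothetical and exist only for this comparison, and the behavior shown is that of CPython, where the built-in dict constructor doesn’t call an overridden __setitem__():

```python
from collections import UserDict

class LowerKeyDict(dict):
    # Hypothetical class: subclasses the built-in dict directly
    def __setitem__(self, key, value):
        super().__setitem__(str(key).lower(), value)

class LowerKeyUserDict(UserDict):
    # Hypothetical class: same idea, but built on UserDict
    def __setitem__(self, key, value):
        self.data[str(key).lower()] = value

plain = LowerKeyDict({'Name': 'Nik'})
wrapped = LowerKeyUserDict({'Name': 'Nik'})

# The dict subclass stored the key untouched when it was created
print(list(plain))
# ['Name']

# UserDict routes the initial data through __setitem__()
print(list(wrapped))
# ['name']
```

Because UserDict passes its initial data through __setitem__(), the custom logic only has to be written once to cover both creating and updating the dictionary.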
Check out this complete guide to the collections UserDict class to continue walking through these examples.
Conclusion
In this guide, you learned all about the collections module in Python. The module provides specialized container data types that make working with different data types easier. For example, the module provides helpful containers, such as Counter, that allow you to easily count values. Similarly, the defaultdict allows you to supply default values for missing keys in a dictionary.
The module opens up a ton of possibilities in terms of working with different data types, without needing to do much work. The module builds on top of dictionaries, lists, and strings to meet specific needs.
In this tutorial, you learned how to:
- Use collections Counter to easily count items
- Use namedtuple to make accessing items in tuples simpler
- Handle missing keys in dictionaries with defaultdict
- Create ordered dictionaries with OrderedDict
- Create queues and stacks with deque
- Chain dictionaries together with ChainMap
- Subclass strings, dictionaries, and lists with UserString, UserDict, and UserList
You can also find the official documentation for the collections module here.