In this complete guide to the Python itertools
library, you’ll dive into every single function available with easy-to-follow and practical examples. The itertools
library is a hidden gem that comes bundled with Python and continues to grow.
On the surface, many of the functions seem simple. Their power, however, is deepened when working with large sets of data or continuous streams of data. We’ll start off by exploring why you may want to (or even need to) make use of the library and then dive into the functions themselves.
By the end of this guide, you’ll have learned:
- The power of using generators from
itertools
- How to use infinite iterators to generate continuous sequences
- How to use terminating iterators to iterate over a sequence until different conditions are met
- How to use combinatronic iterators to combine different iterators in meaningful ways
Let’s get started!
Table of Contents
Function Types in Python’s Itertools
Python’s itertools module breaks its many functions into three main sections based on the function that they perform:
- Infinite iterators: These keep going forever, like counting numbers endlessly or endlessly cycling through a sequence.
- Terminating iterators: These stop when they hit a certain condition, such as taking elements until a condition is no longer met or skipping elements until a condition is met.
- Combinatronic iterators: These help you create different combinations or arrangements of elements from a list, allowing you to explore all the possible ways they can be organized.
We’ll use these broad groupings to walk through each of the functions. Where it’s helpful, I have included examples that illustrate how you might use the function. Some of them feel a little obtuse, until you remember why iterators might be more powerful than using naive Python.
Let’s explore this just a little further in the next section.
Why Use Iterators at all in Python?
Some of the functions in itertools
may feel redundant when working with small sample data. While Python isn’t the fastest language, it’s also not slow. So, when you’re working through tutorials like this one that aim to show you the mechanics of a function, it can feel like you’re not getting much out of the itertools
module.
Let’s dedicate a little bit of time to understanding the memory and time savings of using iterators over naive Python.
We’ll start with a simple example: creating permutations of values. A permutation is basically a rearrangement of a set of elements. With values 1, 2, and 3, here are all the permutations:
- (1, 2, 3)
- (1, 3, 2)
- (2, 1, 3)
- (2, 3, 1)
- (3, 1, 2)
- (3, 2, 1)
In Python, without using any additional tools, we can calculate permutations naively using the following function:
# Naive method to generate permutations
def generate_permutations_naive(data):
if len(data) == 0:
return [[]]
permutations = []
for i in range(len(data)):
remaining_elements = data[:i] + data[i+1:]
for permutation in generate_permutations_naive(remaining_elements):
permutations.append([data[i]] + permutation)
return permutations
In the function above, we use recursion to iterate over all our data points. Now, the code above is already quite complex (especially since many beginner programmers struggle with recursion).
Thankfully, itertools
provides a helpful function for calculating permutations. Let’s take a look at how we can accomplish the same thing with itertools
:
# Using itertools to generate permutations
import itertools
def generate_permutations_itertools(data):
return list(itertools.permutations(data))
We can already see how much more explicit and brief our code is! Now, what’s magical here is that as the name implies, itertools
uses iterators to speed up your code.
We can feed in a relatively small set of values, say the range(10)
, covering ten unique values. We’ll time how long each approach takes to generate a list of permutations:
# Sample data
data = list(range(10))
# Using itertools
start_time = time.time()
perms_itertools = generate_permutations_itertools(data)
end_time = time.time()
print("Time taken using itertools:", end_time - start_time)
# Using naive method
start_time = time.time()
perms_naive = generate_permutations_naive(data)
end_time = time.time()
print("Time taken using naive method:", end_time - start_time)
When we run this code, we get the following results:
- Time taken using itertools: 0.39 seconds
- Time taken using naive method: 10.09 seconds
We can see that even with a relatively manageable list of values, itertools
is over 25 times faster!
Now that I have made a good case for using itertools
, let’s dive into exploring all the functions the library has to offer.
Infinite Iterators in itertools
In this section, we’ll explore the three infinite iterators that are available in the itertools
module. These iterators have the potential for iterating, well, infinitely. This can be helpful if you don’t know how long a sequence should run for or you need persistent values being generated.
In particular, we’ll take a look at the following functions:
count()
, which allows you to generate an iterator that counts from a specific start with a specific stepcycle()
, which allows you to cycle through an iterator infinitely, andrepeat()
, which repeats an element either an infinite or a set amount of times
Let’s dive into each of these functions now.
Continuous Counting with Itertools’ count() Function
The itertools.count()
function is used to return an iterator that counts from a specific value using a specific value continuously. This is helpful when you need to have some way of keeping track of items, but don’t know for how long.
Let’s take a look at the most simple example, by calling the function without only its default parameters:
# Calling count()
counts = itertools.count()
print(counts)
# Returns: count(0)
Because we’re working with iterators, we need to iterate over our object in order to make use of it. We can do this in one of two ways:
- Calling the
next()
function, which grabs the next element in the iterator, or - Looping over the iterator itself
Because our iterator is an infinite iterator, we need to be mindful about some stopping condition. Let’s take a look at how we can grab the first few elements of our iterator:
# Understanding Default Behavior
for i in itertools.count():
if i >= 5:
break
print(i)
# Returns:
# 0
# 1
# 2
# 3
# 4
We can see that by using an if
statement, we were able to stop our iterator at a certain point.
Customizing Itertool’s count() Function
So far, we have explored how to use the itertools.count()
function using its default arguments. However, we can customize the behaviour by passing in two different arguments:
start=
defines what number to start at, andstep=
defines how to increment the numbers
Let’s take a look at how this works:
# Changing the Step Value
for i in itertools.count(start=1, step=2):
if i >= 5:
break
print(i)
# Returns:
# 1
# 3
We can see in our above example that our iterator started at 1 (rather than 0) and incremented by 2 (rather than by 1).
What’s great about this is that we can even use negative values and floating point values to make our counting even more custom:
# Using Negative Values
for i in itertools.count(-1.0, -0.5):
if i <= -2.5:
break
print(i)
# Returns:
# -1.0
# -1.5
# -2.0
We can see above that we implemented both negative values and floating point values to be more precise. If you’re following along in your own code editor, make sure to change the break condition!
Let’s now take a look at a practical example of how this function can be implemented.
Replicating Enumerate with itertools.count()
One of the most intuitive uses for the itertools.count()
function is to replicate the behaviour of the Python enumerate()
function. The enumerate()
function is used to iterate over an iterable and provide counters along the way.
Let’s take a look at how we can recreate the function using itertools
:
# Defining a Custom Enumerate Function
def custom_enumerate(iterable, start=0):
for item in iterable:
yield start, item
start += 1
for i, item in custom_enumerate(['welcome', 'to', 'datagy.io']):
print(i, item)
# Returns:
# 0 welcome
# 1 to
# 2 datagy.io
In the example above, we returned a generator object to lazily return items by using the yield
keyword.
Now, in practice, you’d never do this and you’d use enumerate()
instead. However, this example was meant to get your creativity flowing and to think about how else you might use the function.
Endless Cycling with Itertool’s cycle() Function
The itertools.cycle()
function is used to cycle, or repeat, over an iterable. For example, say you have a set of instructions and you want to cycle over them, you can use the cycle()
function.
Say we have a list of actions to go about our day and we want to repeat these, day after day.
# Understanding the cycle function
pattern = ['Wake up', 'Read datagy.io', 'Go to work', 'Enjoy life', 'Go to Sleep']
i = 0
for item in itertools.cycle(pattern):
if i >= 8:
break
i += 1
print(item)
# Returns:
# Wake up
# Read datagy.io
# Go to work
# Enjoy life
# Go to Sleep
# Wake up
# Read datagy.io
# Go to work
In the example above, we first defined a sequence that contained the items that we wanted to cycle over. In this case, we have five different items defining what a day might look like. We then pass this sequence into the itertools.cycle()
function as its only parameter.
From there, we can cycle over the list. Once all the items are depleted, we start the cycle again.
itertools.cycle() in Action: Turn-Based Games
The itertools.cycle()
function can be used in many different ways. One great way is to loop over players in a turn-based game.
Say you’re developing the next big Nintendo game and you want an efficient way to create a loop of instructions of who should play next. You could accomplish this using the following code:
# Looping over entities in a game
entities = ['Player1', 'Player2', 'Enemy1', 'Enemy2']
game_loop = itertools.cycle(entities)
for _ in range(6):
print(next(game_loop))
# Returns:
# Player1
# Player2
# Enemy1
# Enemy2
# Player1
# Player2
In the example above, we defined a list of game entities and passed them into the cycle()
function. We then used the Python range()
function to call a the next()
function six times.
We can see that this looped back to the Players after exhausting the full list.
itertools.cycle() in Action: Data Visualization
We can also use itertools.cycle()
to create cyclical data visualizations. The benefit of this approach is that it reduces the complexity of the code that we need to write.
Let’s take a look at an example, where we repeat a cycle of y-values:
# Create a plot using itertools.cycle()
import matplotlib.pyplot as plt
x = list(range(22))
cycle = itertools.cycle([0, 2, 1, 4])
y = [next(cycle) for _ in x]
plt.plot(x, y)
plt.show()
In the code above, we created our x-variables first. We then defined a cycle that we wanted to repeat. We used a list comprehension to define our y-values by cycling over the sequence for the length of x.
This returns the following image:
Now, let’s dive into the final infinite iterator, repeat()
.
Repeating Values with Itertool’s repeat() Function
The itertools.repeat()
function is used to, well, repeat a given value and infinite number of times. We’ll keep this section a little shorter since I find the use cases for this to be quite limited.
The function offers two parameters:
object=
which is the object to repeat, andtimes=
, an optional parameter, that defines how many times to repeat an item. If left blank, the iterator will repeat an infinite number of times
Let’s take a look at an example of how to use itertools.repeat()
:
# Repeating a Set Number of Times
reps = itertools.repeat(object=2, times=3)
print(list(reps))
# Returns:
# [2, 2, 2]
We can see that by passing in the object=2
and repeat=3
, we repeated the value 2, three times.
Combining itertools’ count() with map()
The Python documentation itself recommends combining this with the Python map()
function by providing a constant stream of values. Let’s take a look at how we can use this to map a function to a set of iterators.
# Combining with map()
vals = [1, 2, 3, 4]
squares = list(map(pow, vals, itertools.repeat(2)))
print(squares)
# Returns:
# [1, 4, 9, 16]
In the code block above, we used the repeat()
function to supply a consistent stream of values. In this case, we supplied the value 2, meaning that each value in our list vals
is squared.
Combinatoric Iterators in itertools
In this section, we’ll take a look at the different combinatoric iterators available in itertools
. The functions included in this section allow you to easily generate and manipulate combinations, permutations, and Cartesian products without needing explicit loops or extensive custom cost.
In particular, we’ll take a look at the following functions:
product()
, which returns the cartesian product, similar to a nested for-loop,permutations()
, which returns all possible orderings with no repeated elements,combinations()
, which returns all combinations in sorted order, with no repeated elements, andcombinations_with_replacement()
, which returns all combinations in sorted order, with repeated elements.
Let’s dive into each of these functions now.
Exploring Cartesian Products: Itertools’ product() Function
The itertools.product()
function is used to return the cartesian product of the input iterables. This is similar to created nested for-loops to iterate over multiple iterables at one time.
Let’s take a look at how this works:
# Using the product() Function in Itertools
letters = ['a', 'b', 'c']
values = [1, 2]
prod = list(itertools.product(letters, values))
print(prod)
# Returns:
# [('a', 1), ('a', 2), ('b', 1), ('b', 2), ('c', 1), ('c', 2)]
In the code block above, we used the product()
function to return the cartesian product of our two lists. We can see that it returns a set of tuples containing the items.
This works similar to a nested for loop, where the loop iterates from the right inward. Let’s see how we could replicate the function using a nested for loop, to see how much code and logic we’re saving:
# Replicating itertools.product() with Nested For Loops
letters = ['a', 'b', 'c']
values = [1, 2]
prod = []
for letter in letters:
for value in values:
prod.append((letter, value))
print(prod)
# Returns:
# [('a', 1), ('a', 2), ('b', 1), ('b', 2), ('c', 1), ('c', 2)]
We can see that even with just two inputs, we’re already simplifying our code quite a bit by using the product()
function.
The function also offers a named argument, repeat=
, which is used to specify that a value can be repeated in the cartesian product a number of times. For example, using a list of [1, 2]
and a repeat value of 3, the value 1 could be repeated three times, such as (1, 1, 1)
.
Let’s take a look at what this looks like:
# Using the repeat parameter in product()
values = [1, 2]
prod = list(itertools.product(values, repeat=3))
print(prod)
# Returns:
# [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (2, 1, 1), (2, 1, 2), (2, 2, 1), (2, 2, 2)]
Let’s now take a look at how this compares a SQL cross join using the itertools.product()
function.
Replicating a SQL Cross Join Using itertools product()
We can use the itertools.product()
function to replicate a SQL cross join. We can combine the function with the great Pandas library to create a DataFrame from a two lists.
Take a look at the code below:
SELECT *
FROM table1
CROSS JOIN table2
We can replicate this in Python and Pandas using the code below:
# Emulation a SQL Cross Join
import pandas as pd
table1 = [1, 2, 3]
table2 = ['a', 'b', 'c']
result = pd.DataFrame(itertools.product(table1, table2), columns=['table1', 'table2'])
print(result)
# Returns:
# table1 table2
# 0 1 a
# 1 1 b
# 2 1 c
# 3 2 a
# 4 2 b
# 5 2 c
# 6 3 a
# 7 3 b
# 8 3 c
We can see how using the product()
function has a ton of different possibilities. Using it in different contexts can save you a ton of time in terms of making your code more explicit.
Unraveling Permutations: Leveraging Itertools’ permutations() Function
We can use the itertools.permutations()
function to calculate permutations of a given iterable. A permutation is an arrangement of items in a specific order.
For example, if we had a list of values of [1, 2, 3]
, a single permutation of this might be [1, 3, 2]
. It’s important to note that permutations look at the number of elements, not their uniqueness. This means that a list containing [1, 1, 2]
, would have the same number of permutations as our original list.
The number of permutations available is always equal to the factorial of the number of items. For example, in our previous example, this would be 3!
. If you’re unfamiliar with factorials, this simply means 3*2*1
, or 6.
Let’s take a look at how we can implement permutations in Python:
# Implementing Permutations in Python
vals = [1, 2, 3]
print(list(itertools.permutations(vals)))
# Returns:
# [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
We can see from the example above that our code does, in fact, return six different permutations of our original list!
Modifying Permutation Length
We can also modify the length of each permutation by making use of the option r=
parameter. By default, this is set to None
, which will use the length of the iterable as its length.
Let’s see how we can modify our code to return permutations of length 2:
# Modifying Permutation Length
vals = [1, 2, 3]
print(list(itertools.permutations(vals, 2)))
# Returns:
# [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
We can see that the length of each tuple that is returned is now length two.
Let’s now dive into an applied example: calculating the number of medal winners.
Applied Permutations: Calculating Medal Winners
In this section, we’ll explore an applied example of permutations. Say that we have an eight players competing in an Olympic sport and we want to know how many different permutations we might have for players winning either Gold, Silver, or Bronze medals.
What’s fun about this example is that only the first three items in our permutation matters! For example, say we have [1, 2, 3, 4, 5, 6, 7, 8]
, this means that players [1, 2, 3]
won Gold, Silver, and Bronze respectively. In another permutation, [1, 2, 3, 4, 5, 6, 8, 7]
, the same three players win the same medals. In this case, we’d be treating these as the same.
To solve this problem, we have three different options involving permutations: a long option, a short option, and an even shorter option! Let’s dive into the long option first:
# Count options for winning medals
def count_options(num_players):
options = itertools.permutations(range(num_players))
winner_options = set()
for option in options:
if option[:3] not in winner_options:
winner_options.add(option[:3])
return len(winner_options)
print(count_options(8))
# Returns: 336
In the example above, we created our list of permutations based on all players. We then created an empty set and iterated over each option. If the winning players (the first three) aren’t in the set, we add it to the set. We then return the length of that set.
Using the r=
parameter, we can actually simplify this code dramatically! Because we only care about the available options of three players, we can simply pass r=3
into the function call:
# Simplifying our code
options = len(list(itertools.permutations(range(8), 3)))
print(options)
# Returns: 336
We can see how much simpler, cleaner and faster that code is!
Now, if you’re mathematically inclined, you might already know there’s an easier way to do this. Recall my earlier comment that the number of permutations will always be equal to the factorial of the length of the iterable. So in this case, we would have 8!
permutations.
But, we only care about the first three elements. Well, we know that the remaining elements will be of length five, meaning that there are 5!
permutations. Using some math rules, the number of permutations of winners is equal to:
8! / 5!
When we divide factorial, what’s left is the 8*7*6
(where we remove the numbers equal to and less than the factorial in the denominator). When we do the math here, we see that we get the same result as before: 336!
Navigating Combinations: Harnessing Itertools’ combinations() Function
Now that we have covered permutations, let’s explore another function: itertools.combinations()
. Combinations refers to a subset of a collection of items, similar to a permutation, but without considering the ordering.
For example, let’s explore a simple list: [1, 2, 3]
:
- Permutations of length two would be:
[(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
- Combinations of length two would be:
[(1, 2), (1, 3), (2, 3)]
While permutations treat items with the same values as distinct (as long as their order is distinct), combinations ignore the order in the resulting item.
What is uniqueness?
It’s important here to note that uniqueness doesn’t refer to whether an item is unique from another, but simply that it is an item itself. For example, a list containing [1, 1, 1]
, would have the same number of combinations as a list containing [1, 2, 3]
.
Now that you have a good understanding of how combinations work, let’s take a look at an example. In the code block below, we first create a list containing three values. We then create a list of all the length 2 combinations from that list:
vals = [1, 2, 3]
combinations = list(itertools.combinations(vals, 2))
print(combinations)
# Returns:
# [(1, 2), (1, 3), (2, 3)]
We can see in the example above that we were able to return three different combinations. This, again, highlights the difference between permutations, which would have returned six!
Calculating Team Combinations
A very practical example for the itertools.combinations()
function is to count the number of combinations of teams you can have. Say you have a group of five people and you want to place them into teams of three. How many combinations can you have?
This is where the combinations()
function makes easy work of this. Let’s take a look at how we can accomplish this:
players = ['Alice', 'Bob', 'Charlie', 'David', 'Emily']
team_size = 3
team_combinations = itertools.combinations(players, team_size)
print(len(list(team_combinations)))
# Returns: 10
We pass our list of names and the team size into the combinations()
function to generate all possible combinations. By converting it to a list and getting its length, we can see that there are ten different combinations possible.
Empowering with Combinations: Utilizing Itertools’ combinations_with_replacement() Function
itertools
provides one final combinatronic function: combinations_with_replacement()
. As the name suggests, this function generates combinations with replacement.
Let’s take a look at how the function works:
# Exploring combinations_with_replacement()
vals = [1, 2, 3]
combinations = list(itertools.combinations_with_replacement(vals, r=2))
print(combinations)
# Returns:
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
In the code block above, we used a familiar example. We wanted to explore the length 2 combinations of the values [1, 2, 3]
– this time, however, with replacement. This means that any value can be used again the combination!
Similarly to the regular combinations()
function, this function takes two parameters:
iterable=
, which is the iterable that we want to draw combinations from, andr=
, which represents the length of the resulting combination
Let’s take a look at a practical example of using the function.
Counting Dice Roll Combinations
In this example, we’ll explore how many combinations we could have in rolling two dice. To keep things simple, we’ll use a six-sided die, which can be represented using the range(1, 7)
object. We want to roll our dice twice. Logically, each number can be picked again.
Let’s see how we would count the number of options resulting from this:
# Rolling two dice
dice_outcomes = range(1, 7)
roll_count = 2
combos = itertools.combinations_with_replacement(dice_outcomes, roll_count)
print(len(list(combos)))
# Returns: 21
By converting our resulting iterable to a list and getting its length, we can see that there are 21 different combinations available by rolling our dice!
Terminating Iterators in Itertools
In this section, we’ll take a look at the different terminating iterators available in itertools
. The functions included in this section allow you to work with sequences by consuming them and returning a modified product.
In particular, we’ll take a look at the following functions:
accumulate()
, which returns an iterator that includes accumulated results, such as sumsbatched()
, which returns batches of a specified size from an iteratorchain()
, which creates an endless loop from an iterator by starting a sequence over once it’s exhaustedcompress()
, which filters an iterator to only include values that map toTrue
dropwhile()
, which drops elements from the iterable as long as the predicate is truefilterfalse()
, which keeps only items where the mapped function returnsFalse
groupby()
, which returns groups of objects as it iterators over an iterableislice()
, which returns an iterator of selected elements from an iterablepairwise()
, which returns successive overlapping pairs taken from the inputstarmap()
, which computes the function using arguments obtained from the iterable and is a good alternative tomap()
when items are pre-zippedtakewhile()
, which keeps elements from the iterable as long as the predicate is truetee()
, which returns n number of independent iterators from a single iteratorzip_longest()
, which allows you to zip iterators of unequal length, keeping the longest sequence
There’s quite a lot to cover here, so let’s dive right in!
accumulate(): Accumulating Results in an Iterator
The itertools.accumulate()
function allows you to easily accumulate results over an iterable. To put this more plainly, we can think of this as generating cumulative results, such as cumulative sums.
Let’s take a look at very simple example first by finding the cumulative sum of a list of values:
# Calculating Cumulative Sums of a List
vals = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8]
accumulated = list(itertools.accumulate(vals))
print(accumulated)
# Returns:
# [3, 7, 13, 15, 16, 25, 25, 32, 37, 45]
We can see in the example above that we passed our list of values into the accumulate()
function. By default, the function will add the values together in a pair-wise manner over the sequence of the iterator.
We can also specify a different function by using the func=
parameter. Let’s see how this changes when we want to calculate the running maximum:
# Using Built-In Functions for accumulate()
vals = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8]
accumulated = list(itertools.accumulate(vals, max))
print(accumulated)
# Returns:
# [3, 4, 6, 6, 6, 9, 9, 9, 9, 9]
We can even specify our own functions to accumulate. The main restriction here is that the function needs to be a binary function, meaning that it takes two inputs and returns a single input. Let’s take a look at how we can define a function to generate a cumulative product:
# Calculating Cumulative Products with accumulate()
vals = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8]
def mul(x, y):
return x * y
accumulated = list(itertools.accumulate(vals, mul))
print(accumulated)
# Returns:
# [3, 12, 72, 144, 144, 1296, 0, 0, 0, 0]
We can see here that we defined a custom function that multiplies two values together. We then pass this function into the accumulate()
function.
Calculating Bank Balances with Itertools Accumulate
In this example, we’ll take a look at a practical example that puts all of the parameters of the itertools.accumulate()
function to use.
There’s a third parameters that we haven’t yet covered off: initial=
. The parameter allows you to pass in an initial value to accumulate from. Think of this as prepending an item to an iterable before passing it into the function.
The initial parameter was introduced in Python 3.8
The initial=
parameter wasn’t always available. It’s only been around since Python 3.8, meaning that some of your code may not be backwards compatible. Keep this in mind for when you’re writing code.
Let’s take a look at a practical example. Image we want to keep track of our checking account balance after every transaction. We know what our transactions were and we’ve been dutifully storing them in a Python list. We also know our starting balance.
With these two items in mind, we can create a list that keeps track of our running balance after each transaction:
# List of expenses (negative values) and incomes (positive values) over a month
expenses_and_incomes = [500, -200, -100, 300, -150, 600, -250, -200]
initial_balance = 5000
# Calculate the balance after each transaction
balance = list(itertools.accumulate(expenses_and_incomes, initial=initial_balance))
print(balance)
# Returns:
# [5000, 5500, 5300, 5200, 5500, 5350, 5950, 5700, 5500]
In the example above, we passed in our initial balance. Since we didn’t pass in a function, Python simply adds the values to one another.
batched(): Generating Batches from an Iterator
The itertools.batched()
was introduced in Python 3.12 and represents a true hidden gem of the library. The function allows you to create batches of a certain size from any iterable. Previously, you had to rely on less elegant ways of splitting a list into different chunks.
Let’s take a look at how the function works:
vals = [1, 2, 3, 4, 5, 6]
batches = list(itertools.batched(vals, 3))
print(batches)
# Returns:
# [(1, 2, 3), (4, 5, 6)]
In our example above, we passed a list with six items into the function. We asked it to split the list into sets of length 3, which returned two tuples.
But what happens when a list cannot be split fully into equal-sized chunks? For example, what would happen if we added a seventh item to our list? Let’s take a look to find out:
# Splitting an Iterable With Leftover Values
vals = [1, 2, 3, 4, 5, 6, 7]
batches = list(itertools.batched(vals, 3))
print(batches)
# Returns:
# [(1, 2, 3), (4, 5, 6), (7,)]
We can see that when pass in a list that can’t be evenly divided, then the final item contains only the leftover values.
chain(): Creating an Endless Loop from an Iterator
The itertools.chain()
function is used to chain a set of iterables together into a single iterable. Once the first iterator is exhausted, the function moves seamlessly onto its next.
Let’s take a look at an example of how this works:
vals1 = [1, 2, 3]
vals2 = [4, 5, 6]
chained = list(itertools.chain(vals1, vals2))
print(chained)
# Returns:
# [1, 2, 3, 4, 5, 6]
We can see that in the example above that we were able to combine the two lists together.
But what happens if you have a big list, such as [[1, 2], [3], [4, 5], [6]]
. Passing in this single list into the function won’t work because it’s a single iterator. How, then, can we combine the lists into a single list?
One option is to use the unpacking operator *
to access all of the items, as shown below:
# Using * with itertools.chain
vals = [[1, 2], [3], [4, 5], [6]]
chained = list(itertools.chain(*vals))
print(chained)
# Returns:
# [1, 2, 3, 4, 5, 6]
While this approach works, there’s actually a simpler way to do this. We can call the itertools.chain.from_iterable()
function, which takes a single iterable and unpacks it.
Let’s see what this alternative approach looks like:
# Chaining Iterators Together
vals = [[1, 2], [3], [4, 5], [6]]
chain = itertools.chain.from_iterable(vals)
print(list(chain))
# Returns:
# [1, 2, 3, 4, 5, 6]
This approach feels a bit cleaner. For smaller iterators there’s likely very little difference in performance, but for larger iterators this second approach will work faster.
compress(): Filtering Values in an Iterator
The itertools.compress()
function is used to create an iterator that contains only items that have a corresponding truth-y selector.
The function expects you to pass in two parameters:
data=
, an iterable containing the data you want to compress, andselectors=
, an iterable containing the indices of items you want to keep
Because the function works on truthy-ness, you can use any type of data in selectors=
. For example, a list containing [True, False, False, True]
would create the same selection as [1, 0, 'apple', '']
.
Let’s take a look at a simple example:
# Exploring itertools.compress()
vals = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
filters = itertools.cycle([1, 0])
compressed = list(itertools.compress(vals, filters))
print(compressed)
# Returns:
# [0, 2, 4, 6, 8, 10]
In the example above, we use a list containing the values zero through ten. We then used the magic of itertools.cycle()
to select every second element. We can see that this selects only items that are equal to 1.
Let’s now take a look at a practical example.
Filtering Shipping Orders with Itertools.Compress
Imagine we’re working for a company that needs to ship electronics. Unfortunately, our data are stored in different ways: our orders are stored as dictionaries, but their statuses are stored as a list of booleans.
We can use the compress()
function to filter our values to only show orders that have shipped:
# Sample data: List of orders and corresponding list of shipping statuses
orders = [
{'order_id': 101, 'product': 'Laptop', 'customer': 'John Doe'},
{'order_id': 102, 'product': 'Phone', 'customer': 'Alice Smith'},
{'order_id': 103, 'product': 'Tablet', 'customer': 'Bob Johnson'},
{'order_id': 104, 'product': 'Headphones', 'customer': 'Emily Brown'}
]
shipping_status = [True, False, True, False] # True if shipped, False if pending
# Filter out orders that have been shipped
pending_orders = list(itertools.compress(orders, shipping_status))
# Generate a report of pending orders
print("Pending Orders:")
for order in pending_orders:
print(f"Order ID: {order['order_id']}, Product: {order['product']}, Customer: {order['customer']}")
# Returns:
# Pending Orders:
# Order ID: 101, Product: Laptop, Customer: John Doe
# Order ID: 103, Product: Tablet, Customer: Bob Johnson
Using the code above, we can see that only two items have shipped so far. Personally, I often struggle to find a good use for the function when data are set up properly. However, when working with disparate data, this can be quite useful.
dropwhile(): Dropping Elements from an Iterable
The itertools.dropwhile()
function is used to drop items in an iterable until a condition is met. The function takes two arguments:
predicate=
is the function to evaluate against, anditerable=
is the iterable to drop values from.
Let’s take a look at a simple example:
# Dropping Values Until a Condition is Met
vals = [1, 2, 3, 4, 5]
kept_vals = list(itertools.dropwhile(lambda x: x < 3, vals))
kept_vals
# Returns:
# [3, 4, 5]
In the example above, we first created a list of values containing the numbers one through five. We then passed this into the dropwhile()
function with a predicate that evaluates whether an item is less than three.
What this function call does is drop any items while each item is less than 3. Once the first item is equal to three or more, it keeps all items.
Be mindful of how your data is sorted!
In the example we walked through above, the data are sorted in increasing order. However, if the data were sorted differently, our results might look differently. This is because the dropwhile()
function doesn’t look at all items in a list, but rather only their positions.
Making note of the flag above, let’s take a look at an example that illustrates this:
# Dropping Values Until a Condition is Met
vals = [1, 4, 3, 2, 1]
kept_vals = list(itertools.dropwhile(lambda x: x < 3, vals))
kept_vals
# Returns:
# [4, 3, 2, 1]
In the example above, we switched the original list of values. In our returned list, we actually have values that are below three! This is because the iterator that is created only starts creating values when the first condition is met.
filterfalse(): Filtering False Values in an Iterator
The itertools.filterfalse()
is used to return a generator that contains only items for which a predicate function returns False
.
The function takes two parameters:
predicate=
is the function to evaluate against, anditerable=
is the iterable to drop values from.
This is best example using an example:
# Understanding Filter False
vals = [1, 2, 3, 2, 1]
filtered = list(itertools.filterfalse(lambda x: x <= 1, vals))
print(filtered)
# Returns:
# [2, 3, 2]
In the example above, we passed in a function that evaluates whether an item is than or equal to 1. For any item where this condition is True
, the items are not returned.
groupby(): Grouping Objects in an Iterator
The itertools.groupby()
function is a powerful way to group items based on a consecutive key from an iterable. The function returns both keys and groups.
The function expects two parameters:
iterable=
is the iterable to group values from, andkey=
is a function that computes a key value for each element and defaults toNone
(returning the identity function)
Let’s take a look at a simple example:
# Getting Group Values
vals = [1, 1, 1, 2, 2, 3, 3, 3]
for key, group in itertools.groupby(vals):
print(key, list(group))
# Returns:
# 1 [1, 1, 1]
# 2 [2, 2]
# 3 [3, 3, 3]
In the example above, we grouped our list of values by each item itself (meaning that consecutive 1s are grouped together and so on). The function returns tuples in each generated value, containing the key and the group of values.
While this seems like it’s very similar to the SQL GROUPBY
function, it actually behaves quite differently. The itertools.groupby()
function groups values based on their position (and key, of course). This can yield some unexpected results if your data is not sorted in a meaningful way.
Let’s explore this is in another example:
# Groupby Groups Only Sequences on Repeating Values
vals = [1, 1, 1, 2, 2, 1, 1, 3, 3, 3]
for key, group in itertools.groupby(vals):
print(key, list(group))
# Returns:
# 1 [1, 1, 1]
# 2 [2, 2]
# 1 [1, 1]
# 3 [3, 3, 3]
We can see in the example above that the function actually returns two groups containing a key of 1
! We can work around this by first wrapping our data in the sorted()
function:
# Presorting Values for Different Behaviour
vals = [1, 1, 1, 2, 2, 1, 1, 3, 3, 3]
for key, group in itertools.groupby(sorted(vals)):
print(key, list(group))
# Returns:
# 1 [1, 1, 1, 1, 1]
# 2 [2, 2]
# 3 [3, 3, 3]
So far, we haven’t made use of the key=
parameter at all aside from using its default value of None
. We can make creative use of this function by passing in a different key.
Let’s take a look at how we can group items based on their length by passing in the len
function:
# Groupby With a Key
vals = ['hi', 'hi', 'hey', 'hi', 'bye', 'hey']
for key, group in itertools.groupby(vals, len):
print(key, list(group))
# Returns:
# 2 ['hi', 'hi']
# 3 ['hey']
# 2 ['hi']
# 3 ['bye', 'hey']
In the example above, we passed the len
function object into our groupby()
function call. It grouped consecutive items based on their length. But we can take this one step further and first sort our data:
# Groupby With a Key with Sorted Values
vals = ['hi', 'hi', 'hey', 'hi', 'bye', 'hey']
for key, group in itertools.groupby(sorted(vals, key=len), len):
print(key, list(group))
# Returns:
# 2 ['hi', 'hi', 'hi']
# 3 ['hey', 'bye', 'hey']
Now that you understand how the function works, let’s take a look at a practical example.
Grouping Users by Age
Imagine we have a database that contains a list of dictionaries where each dictionary has a game’s user name and age. We want to group our users by their age.
We can do this first by ensuring that our data are sorted, similar to our example above. We then use a predicate that grabs the user’s age using a lambda function.
Let’s see what this looks like:
# Working with Different Data Structures
data = [
{'Name': 'Nik', 'Age': 35},
{'Name': 'Katie', 'Age': 35},
{'Name': 'Evan', 'Age': 33},
{'Name': 'Kyra', 'Age': 36},
{'Name': 'Steve', 'Age': 33}
]
groups = itertools.groupby(sorted(data, key=lambda x: x['Age']), lambda x: x['Age'])
for key, group in groups:
print(f'Age {key}: {[user["Name"] for user in group]}')
# Returns:
# Age 33: ['Evan', 'Steve']
# Age 35: ['Nik', 'Katie']
# Age 36: ['Kyra']
In the code block above, we are able to return an easily readable print statement where our users are grouped by their age.
islice(): Selecting Elements from an Iterable
The itertools.islice()
function is used to return searched elements from an iterable, without first needing to create a new iterable. This becomes very powerful when working with very large iterables or infinite iterables we explored earlier.
The function has two different call types, depending on if you want start and step parameters added:
# Option 1
itertools.islice(iterable, stop)
# Option 2
itertools.islice(iterable, start, stop[, step])
Let’s take a look at a simple example here using only a stop argument:
vals = [0, 1, 2, 3, 4, 5]
sliced = itertools.islice(vals, 3)
print(list(sliced))
# Returns:
# [0, 1, 2]
On the surface, this doesn’t seem much different from simply indexing our list of values. In fact, with this simple example, we could have simply written vals[:3]
and received the same list back.
Where this becomes much more powerful is when we work with infinite generators, such as itertools.count()
. Let’s see what happens when we apply regular indexing to this scenario:
# Indexing a count() object
counts = itertools.count()
sliced = counts[:3]
# Raises:
# TypeError: 'itertools.count' object is not subscriptable
We can see that this fails as the generator the count()
function returns cannot be indexed. This is where we must use islice()
, if we don’t want to first create a list out of our count object (which might use significant memory).
# Using islice() to slice infinite sequences
counts = itertools.count()
sliced = itertools.islice(counts, 3)
print(list(sliced))
# Returns:
# [0, 1, 2]
We can see that this is able to slice the infinite sequence (without first causing memory-intensive list generations).
pairwise(): Generating Successive Overlapping Pairs
The itertools.pairwise()
function was introduced in Python 3.10 and returns successive overlapping pairs from an iterable. The function will always return one less iterable than there are items in the sequence.
This function is a fantastic addition that makes otherwise ugly code quite readable and intuitive. Let’s take a look at how you may have previously implemented a similar function:
# Implementing a pairwise list of items without itertools
def pairwise(iterable):
return [(iterable[i], iterable[i + 1]) for i in range(len(iterable) - 1)]
While this implementation is short, it isn’t exactly readable. It also doesn’t handle edge-cases, such as when there are less than two items.
Let’s take a look at how the pairwise()
function works:
text = 'datagy.io'
pairs = itertools.pairwise(text)
print(list(pairs))
# Returns:
# [('d', 'a'), ('a', 't'), ('t', 'a'), ('a', 'g'), ('g', 'y'), ('y', '.'), ('.', 'i'), ('i', 'o')]
We can see that the function creates tuples of successive overlapping pairs.
Calculating Percentage Differences Using Pairwise
We can use this function to easily calculate the percentage difference between successive pairs of values. Let’s take a look at how we can do this elegantly:
# Calculating Percentage Differences
stock_prices = [100, 110, 120, 115, 125]
percentage_changes = [(b - a) / a for a, b in itertools.pairwise(stock_prices)]
print([f'{val:.2%}' for val in percentage_changes])
# Returns:
# ['10.00%', '9.09%', '-4.17%', '8.70%']
In the code block above, we used a list comprehension to iterate over the returned pairwise set of pairs. This allows us to easily see how much each successive item has changed.
starmap(): Computing Function Results with Iterable Arguments
The itertools.starmap()
function is used to map a function using arguments obtained from an iterable. What this means is that you might map a function at a list containing [(1, 2), (2, 3), ... ]
.
This is what differentiates it from the regular Python map()
function, in that it assumes that the values are “pre-zipped” already.
Let’s take a look at an example of how this works:
# Understanding the starmap() Function
values = [(1, 2), (2, 3), (3, 4)]
mapped = list(itertools.starmap(pow, values))
print(mapped)
# Returns:
# [1, 8, 81]
We can see that in the example above, the first value in each tuple is used as the base while the second value is used as the power argument. This means that the function would return the following 12, 23, 34.
Starmap Applied: Calculating Areas of Triangles
In this section, we’ll explore how to use the starmap()
function to calculate the areas of different triangles. Imagine that we have a list of tuples that contains the length of the base and the height of different triangles.
We can define a custom function that contains the formula 0.5 * base * height
. We then pass this function in as our predicate.
# Calculating Area of Triangles Using Starmap
def calculate_triangle_area(base, height):
"""Calculate the area of a triangle using base and height."""
return 0.5 * base * height
shapes = [
(3, 4),
(6, 8),
(5, 12)
]
# Apply the calculate_triangle_area function to each tuple in shapes using itertools.starmap
areas = itertools.starmap(calculate_triangle_area, shapes)
# Convert the generator to a list and print the result
print(list(areas))
# Returns:
# [6.0, 24.0, 30.0]
We can see how simply Python allows us to calculate the area of each triangle by making use of the successive pairs.
takewhile(): Taking Elements from an Iterable While a Condition Holds True
The itertools.takewhile()
function works in the opposite way to the dropwhile()
function in that it keeps items until a given condition is met.
It’s important to keep in mind that the function will run until a condition is met, but not filter the list based on the values of the items. Let’s take a look at an example to see how this works:
# Filtering Grades
grades = [90, 85, 88, 70, 65, 75, 80]
passing_grades = list(itertools.takewhile(lambda x: x >= 70, grades))
print(passing_grades)
# Returns:
# [90, 85, 88, 70]
We can see here that the function works by getting items until a condition is met. If we want to filter the values more explicitly, we first need to sort the list, as shown below:
# Working with sorted data
grades = [90, 85, 88, 70, 65, 75, 80]
passing_grades = list(itertools.takewhile(lambda x: x >= 70, sorted(grades, reverse=True)))
print(passing_grades)
# Returns:
# [90, 88, 85, 80, 75, 70]
In the example above, we first sorted our values and then used the takewhile()
function to get all the passing grades.
tee(): Creating Independent Iterators from a Single Iterator
The itertools.tee()
function works by duplicating an iterator, creating multiple independent iterators from a single source iterable. These new iterators can be used separately and concurrently to iterate over the same elements of the original iterable.
Because the iterators are independent, it’s important to note that even if the original iterator is modified, the independent ones will remain unchanged.
Let’s take a look at how this function works:
# Understanding itertools.tee()
vals = [1, 2, 3]
a, b = itertools.tee(vals, 2)
print(list(a))
print(list(b))
# Returns:
# [1, 2, 3]
# [1, 2, 3]
We can see that by using the tee()
function that the function returns two different iterables.
At the surface, the benefit isn’t immediately clear. But the way the function works is by letting you use up values in one iterator while maintaining the original values in another. Let’s see what this looks like:
# Exhausting one iterator with itertools.tee()
vals = [1, 2, 3]
a, b = itertools.tee(vals, 2)
print(list(a))
print(list(a))
print(list(b))
# Returns:
# [1, 2, 3]
# []
# [1, 2, 3]
In the example above, we converted the iterator an
into a list twice. However, the second time it returns an empty list. This is because we exhausted the iterator the first time. However, the second iterator remained unchanged!
zip_longest(): Zipping Iterators of Unequal Length, Handling Longest Sequence
The itertools.zip_longest()
function works in a similar way to the powerful built-in zip function, but allows you to zip the longest iterable, rather than the shortest.
Let’s take a look at how the normal zip()
function works first:
# Using the regular zip function
nums = [1, 2, 3, 4]
letters = ['a', 'b', 'c']
for n, l in zip(nums, letters):
print(n, l)
# Returns:
# 1 a
# 2 b
# 3 c
We can see that in this case the function stops zipping items when the shortest sequence is exhausted.
In some cases, however, you might want to keep zipping items based on the values in the longest sequence. In this case, you can use the zip_longest()
function, as shown below:
nums = [1, 2, 3, 4]
letters = ['a', 'b', 'c']
for n, l in itertools.zip_longest(nums, letters):
print(n, l)
# Returns:
# 1 a
# 2 b
# 3 c
# 4 None
The example above highlights that the function uses a fill value of None
, when nothing is provided. However, you can also specify your own fill value using the fillvalue=
parameter. Let’s include a -99, as is commonly done in social sciences:
nums = [1, 2, 3, 4]
letters = ['a', 'b', 'c']
for n, l in itertools.zip_longest(nums, letters, fillvalue=-99):
print(n, l)
# Returns:
# 1 a
# 2 b
# 3 c
# 4 -99
We can see in the example above that the function now returns out missing values using a provided fill value.
Conclusion
In this comprehensive guide we covered every single function available in the powerful Python itertools
library. We used illustrative examples to highlight how to use some of the functions, hopefully sparking some creativity!
To learn more about itertools, check out the official documentation.