In this tutorial, you’ll learn how to find and work with duplicates in a Python list. Being able to work efficiently with Python lists is an important skill, given how widely used lists are. Because Python lists allow us to store duplicate values, being able to identify, remove, and understand duplicate values is a useful skill to master.
By the end of this tutorial, you’ll have learned how to:
- Find duplicates in a list, as well as how to count them
- Remove duplicates in Python lists
- Remove duplicates from a list of dictionaries and a list of lists
Let’s get started!
How to Find Duplicates in a List in Python
Let’s start this tutorial by looking at how to find duplicates in a list in Python. We can do this by making use of both the set() function and the list.count() method.
The .count() method takes a single argument, the item you want to count, and returns the number of times that item appears in a list. Because of this, we can create a list comprehension that only returns items that appear more than once. Let’s see how this works and then break it down a bit further:
# Finding Duplicate Items in a Python List
numbers = [1, 2, 3, 2, 5, 3, 3, 5, 6, 3, 4, 5, 7]
duplicates = [number for number in numbers if numbers.count(number) > 1]
unique_duplicates = list(set(duplicates))
print(unique_duplicates)
# Returns: [2, 3, 5]
Let’s break down what we did here:
- We used a list comprehension to include any item that existed more than once in the list
- We then converted this to a set to remove any duplicates from the filtered list
- Finally, we converted the set back to a list
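Because .count() rescans the whole list for every item, the comprehension above does quadratic work on long lists. As a minimal single-pass sketch, assuming the items are hashable, we can track what has already been seen in a set. The helper name find_duplicates is our own and purely illustrative:

```python
# Single-pass duplicate finder (a sketch; assumes the items are hashable)
def find_duplicates(items):
    seen = set()       # every item encountered so far
    reported = set()   # duplicates already added to the result
    duplicates = []    # each duplicated item, in order of its first repeat
    for item in items:
        if item in seen and item not in reported:
            duplicates.append(item)
            reported.add(item)
        seen.add(item)
    return duplicates

numbers = [1, 2, 3, 2, 5, 3, 3, 5, 6, 3, 4, 5, 7]
print(find_duplicates(numbers))
# Returns: [2, 3, 5]
```

Unlike the set() conversion, this version returns the duplicates in the order in which they first repeat.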
In the next section, you’ll learn how to find duplicates in a Python list and count how often they occur.
How to Find Duplicates in a List and Count Them in Python
In this section, you’ll learn how to count duplicate items in Python lists. This allows you to turn a list of items into a dictionary where the key is the list item and the corresponding value is the number of times the item appears in the list.
In order to accomplish this, we’ll make use of the Counter class from the collections module. We’ll then filter our resulting dictionary using a dictionary comprehension. Let’s take a look at the code and then we’ll break down the steps line by line:
# Finding Duplicate Items in a Python List and Counting Them
from collections import Counter
numbers = [1, 2, 3, 2, 5, 3, 3, 5, 6, 3, 4, 5, 7]
counts = dict(Counter(numbers))
duplicates = {key:value for key, value in counts.items() if value > 1}
print(duplicates)
# Returns: {2: 2, 3: 4, 5: 3}
Let’s break this code down, as it’s a little more complex:
- We import the Counter class from the collections module
- We load our list of numbers
- We then create a Counter object of our list and convert it to a dictionary
- We then filter our dictionary to remove any key:value pairs where the value is 1, meaning the item appears only a single time
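A Counter is already a dictionary subclass, so the dict() conversion is optional. As a hedged variation on the steps above, the sketch below filters the Counter directly and uses its most_common() method to list the duplicates from most to least frequent:

```python
from collections import Counter

numbers = [1, 2, 3, 2, 5, 3, 3, 5, 6, 3, 4, 5, 7]
counts = Counter(numbers)

# most_common() yields (item, count) pairs sorted by descending count
duplicates = [(item, count) for item, count in counts.most_common() if count > 1]
print(duplicates)
# Returns: [(3, 4), (5, 3), (2, 2)]
```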
In the next section, you’ll learn how to remove duplicates from a Python list.
How to Remove Duplicates from a List in Python
Removing duplicates in a Python list is made easy by using the set() function. Because sets in Python cannot have duplicate items, when we convert a list to a set, it removes any duplicates in that list. We can then turn the set back into a list, using the list() function.
Let’s see how we can do this in Python:
# Remove Duplicates from a List in Python
numbers = [1, 2, 3, 2, 5, 3, 3, 5, 6, 3, 4, 5, 7]
unique = list(set(numbers))
print(unique)
# Returns: [1, 2, 3, 4, 5, 6, 7]
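Sets are unordered, so the result above is not guaranteed to keep the items in their original order. One common alternative, sketched here on the assumption that you are running Python 3.7 or later (where dictionaries preserve insertion order), is dict.fromkeys():

```python
numbers = [1, 2, 3, 2, 5, 3, 3, 5, 6, 3, 4, 5, 7]

# Dictionary keys are unique and keep insertion order (Python 3.7+)
unique_in_order = list(dict.fromkeys(numbers))
print(unique_in_order)
# Returns: [1, 2, 3, 5, 6, 4, 7]
```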
To learn about other ways you can remove duplicates from a list in Python, check out this tutorial covering many different ways to accomplish this! In the next section, you’ll learn how to remove duplicates from a list of dictionaries.
How to Remove Duplicates in a List of Dictionaries in Python
Let’s take a look at how we can remove duplicates from a list of dictionaries in Python. You’ll often encounter data from the web in formats that resemble lists of dictionaries. Being able to remove the duplicates from these lists is an important skill to simplify your data.
Let’s see how we can do this in Python by using a for loop:
# Remove Duplicates from a List of Dictionaries
items = [{'name':'Nik'}, {'name': 'Kate'}, {'name':'James'}, {'name':'Nik'}, {'name': 'Kate'}]
unique_items = []
for item in items:
    if item not in unique_items:
        unique_items.append(item)
print(unique_items)
# Returns: [{'name': 'Nik'}, {'name': 'Kate'}, {'name': 'James'}]
This method only treats complete matches as duplicates. This means that if a dictionary had, say, an extra key-value pair, it would be kept as a separate item.
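To make this behavior concrete, the sketch below adds a hypothetical age key to one record, and also shows one way to deduplicate on a single key instead. The age value and the choice of name as the identifying key are our own assumptions for illustration:

```python
# Illustrative data: the 'age' key is an assumption added for this sketch
items = [{'name': 'Nik'}, {'name': 'Nik', 'age': 30}, {'name': 'Kate'}, {'name': 'Nik'}]

# Exact-match comparison: the record with the extra key is kept
unique_items = []
for item in items:
    if item not in unique_items:
        unique_items.append(item)
print(unique_items)
# Returns: [{'name': 'Nik'}, {'name': 'Nik', 'age': 30}, {'name': 'Kate'}]

# Deduplicate on 'name' only: the first record seen for each name wins
seen_names = set()
unique_by_name = []
for item in items:
    if item['name'] not in seen_names:
        seen_names.add(item['name'])
        unique_by_name.append(item)
print(unique_by_name)
# Returns: [{'name': 'Nik'}, {'name': 'Kate'}]
```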
How to Remove Duplicates in a List of Lists in Python
We can use the same approach to remove duplicates from a list of lists in Python. Again, this approach requires the lists to be completely the same to be considered duplicates. In this case, even the same elements in a different order will be treated as unique.
Let’s take a look at what this looks like:
# Remove Duplicates from a List of Lists in Python
list_of_lists = [[1,2,3], [1,2], [2,3], [1,2,3], [2,3], [1,2,3,4]]
unique = []
for sublist in list_of_lists:
    if sublist not in unique:
        unique.append(sublist)
print(unique)
# Returns: [[1, 2, 3], [1, 2], [2, 3], [1, 2, 3, 4]]
What we do here is loop over each sublist in our list of lists and assess whether the item exists in our unique list. If it doesn’t already exist (i.e., it’s unique so far), then it’s added to our list. This ensures that an item is only added a single time to our list.
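If two sublists with the same elements in a different order should count as duplicates, one option is to compare a sorted copy of each sublist. This is only a sketch, and it assumes the elements are sortable and hashable; the sample data is our own:

```python
list_of_lists = [[1, 2, 3], [3, 2, 1], [1, 2], [2, 1], [2, 3]]

seen = set()   # order-insensitive keys already encountered
unique = []
for sublist in list_of_lists:
    key = tuple(sorted(sublist))   # lists are unhashable, tuples are not
    if key not in seen:
        seen.add(key)
        unique.append(sublist)
print(unique)
# Returns: [[1, 2, 3], [1, 2], [2, 3]]
```

The first sublist seen for each set of elements is the one that is kept.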
Conclusion
In this tutorial, you learned how to work with duplicate items in Python lists. First, you learned how to identify duplicate elements and how to count how often they occur. You then learned how to remove duplicate elements from a list using the set() function. From there, you learned how to remove duplicate items from a list of dictionaries as well as a list of lists in Python.
Being able to work with lists greatly improves your Python programming skills. Because these data structures are incredibly common, being able to work with them makes you a much more confident and capable developer.
To learn more about the Counter class from the collections library, check out the official documentation here.
There are much more efficient ways of finding duplicates in a list in Python (yours is O(n^2)), and it’s probably just better to use the NumPy library to do it:
import numpy as np
u, c = np.unique(numbers, return_counts=True)
duplicate_elements = u[c > 1]
On a list of 40k elements, it’s the difference between 13s and 8ms.
Thanks for the tip 🙂 I’ll update the post soon!