In this tutorial, you’ll learn how to use Python to count unique values in a list, and which way of doing so is fastest. You’ll learn how to accomplish this using a naive, brute-force method, the collections module, the set() function, and numpy. We’ll close off the tutorial by exploring which of these methods is the fastest, to make sure you’re getting the best performance out of your script.
The Quick Answer: Use Python Sets
# Using sets to count unique values in a list
a_list = ['apple', 'orage', 'apple', 'banana', 'apple', 'apple', 'orange', 'grape', 'grape', 'apple']
num_values = len(set(a_list))
print(num_values)
# Returns 5
Why count unique values?
Python lists are a useful built-in data structure! One of the perks that they offer is the ability to have duplicate items within them.
There may be many times when you want to count the unique values contained within a list. For example, if you receive data in a list that tracks every login to a site, you could determine how many unique people actually logged in.
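As a minimal sketch of the login scenario, assuming a hypothetical list named login_events that holds one username per login (the data is made up for illustration):

```python
# Hypothetical login log: one username per login event (made-up data)
login_events = ['nik', 'kate', 'nik', 'evan', 'kate', 'nik']

total_logins = len(login_events)       # every login, duplicates included
unique_users = len(set(login_events))  # each user counted only once

print(total_logins)  # 6
print(unique_users)  # 3
```

The gap between the two numbers is exactly the information that counting unique values gives you.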
Using Collections to Count Unique Values in a List
The collections module from the standard library can be used to count unique values in a list. The module provides a class called Counter that returns a dictionary-like object with the unique values as keys and the number of occurrences as values.
Because of this, we can count the number of keys to get the number of unique values.
Tip! Want to learn more about the Python collections module and its Counter class? Check out my in-depth tutorial here, where you’ll learn how to count occurrences of a substring in a string.
Let’s see how we can use the Counter object to count unique values in a Python list:
# Use Counter from collections to count unique values in a Python list
a_list = ['apple', 'orage', 'apple', 'banana', 'apple', 'apple', 'orange', 'grape', 'grape', 'apple']
from collections import Counter
counter_object = Counter(a_list)
keys = counter_object.keys()
num_values = len(keys)
print(num_values)
# Returns 5
Let’s see what we’ve done here:
- We passed our list into Counter to create an object whose keys are the unique values
- We got the keys using the .keys() method
- Finally, we got the length of that keys object
We can make this much easier to write by simply chaining the process together, as shown below.
# Using Counter from collections to count unique values in a list
a_list = ['apple', 'orage', 'apple', 'banana', 'apple', 'apple', 'orange', 'grape', 'grape', 'apple']
from collections import Counter
num_values = len(Counter(a_list).keys())
print(num_values)
# Returns 5
This process returns the same thing, but is much quicker to write!
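Because Counter is a subclass of dict, calling len() on it directly should give the same number as going through .keys(). The sketch below, using a shortened made-up list, also shows what Counter offers beyond the unique count:

```python
from collections import Counter

# Shortened, made-up list for illustration
a_list = ['apple', 'orange', 'apple', 'banana', 'apple', 'grape', 'grape']
counts = Counter(a_list)

# A Counter is a dict subclass, so len() already counts its keys
num_unique = len(counts)

# The same object also records how often each value occurred
apple_count = counts['apple']
most_common_item = counts.most_common(1)

print(num_unique)        # 4
print(apple_count)       # 3
print(most_common_item)  # [('apple', 3)]
```

If you only need the number of unique values, the per-item tallies are extra work; they pay off when you also want the frequencies.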
Using Sets to Count Unique Values in a Python List
Another of Python’s built-in data structures is the set. One of the things that separates sets from lists is that they can only contain unique values.
Python comes with a built-in set() function that creates a set from whatever iterable is passed into it. When we pass a list into the function, it turns the list into a set, thereby stripping out duplicate values.
Now, let’s see how we can use sets to count unique values in a list:
# Using sets to count unique values in a list
a_list = ['apple', 'orage', 'apple', 'banana', 'apple', 'apple', 'orange', 'grape', 'grape', 'apple']
unique_set = set(a_list)  # avoid naming the variable "set", which would shadow the built-in
num_values = len(unique_set)
print(num_values)
# Returns: 5
What we’ve done here is:
- Turned our list into a set using the built-in set() function
- Returned the number of values by counting the length of the set, using the len() function
We can also make this a little more concise by nesting the function calls, as demonstrated below:
# Using sets to count unique values in a list
a_list = ['apple', 'orage', 'apple', 'banana', 'apple', 'apple', 'orange', 'grape', 'grape', 'apple']
num_values = len(set(a_list))
print(num_values)
# Returns 5
This returns the same value but is a little faster to write out.
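One caveat worth sketching: set() only accepts hashable items, so a list whose items are themselves lists raises a TypeError. The example below uses a hypothetical records list and works around the problem by converting each inner list to a tuple first:

```python
# Hypothetical data: each record is a list, which is unhashable
records = [['apple', 1], ['pear', 2], ['apple', 1]]

try:
    len(set(records))
except TypeError as error:
    # Lists cannot be placed in a set directly
    print(f"set() failed: {error}")

# Converting each inner list to a tuple makes it hashable
num_unique = len(set(tuple(record) for record in records))
print(num_unique)  # 2
```

This only works when the inner items are hashable too; deeper nesting would need a different conversion.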
Use Numpy to Count Unique Values in a Python List
You can also use numpy to count unique values in a list. Numpy uses a data structure called a numpy array, which behaves similarly to a list but also has many other helpful functions associated with it, such as the ability to remove duplicates.
Let’s see how we can use numpy to count unique values in a list:
# Use numpy in Python to count unique values in a list
a_list = ['apple', 'orage', 'apple', 'banana', 'apple', 'apple', 'orange', 'grape', 'grape', 'apple']
import numpy as np
array = np.array(a_list)
unique = np.unique(array)
num_values = len(unique)
print(num_values)
Let’s see what we’ve done here:
- We imported numpy as np and created an array using the array() function
- We used the unique() function from numpy to remove any duplicates
- Finally, we calculated the length of that array
We can also write this out more concisely by nesting the function calls. Let’s see how this can be done:
# Use numpy in Python to count unique values in a list
a_list = ['apple', 'orage', 'apple', 'banana', 'apple', 'apple', 'orange', 'grape', 'grape', 'apple']
import numpy as np
num_values = len(np.unique(np.array(a_list)))
print(num_values)
# Returns 5
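This returns the same result as before. Note that np.unique() sorts the values to find duplicates, whereas the Pandas unique method uses a hash table and returns values in order of appearance, so the two do not work the same way under the hood.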
Use a For Loop in Python to Count Unique Values in a List
Finally, let’s take a look at a more naive method to count unique items in a list. For this, we’ll use a Python for loop to iterate over a list and count its unique items.
a_list = ['apple', 'orage', 'apple', 'banana', 'apple', 'apple', 'orange', 'grape', 'grape', 'apple']
unique_list = list()
unique_items = 0
for item in a_list:
    if item not in unique_list:
        unique_list.append(item)
        unique_items += 1
print(unique_items)
Let’s see what we’ve done here:
- We create a new list called unique_list and an integer of 0 called unique_items
- We then loop over our original list and see if the current item is in unique_list
- If it isn’t, then we append it to the list and add 1 to our counter unique_items
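The loop above checks membership against a list, which means scanning that list on every iteration. A possible variation, sketched here with the hypothetical names seen and unique_in_order, tracks seen items in a set while still keeping the order of first appearance:

```python
a_list = ['apple', 'orage', 'apple', 'banana', 'apple', 'apple', 'orange', 'grape', 'grape', 'apple']

seen = set()          # membership checks on a set are O(1) on average
unique_in_order = []  # keeps the order of first appearance

for item in a_list:
    if item not in seen:
        seen.add(item)
        unique_in_order.append(item)

print(len(unique_in_order))  # 5
print(unique_in_order)       # ['apple', 'orage', 'banana', 'orange', 'grape']
```

This is only worth the extra variable when you need the unique items in their original order; for a plain count, len(set(a_list)) is simpler.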
What Method is Fastest to Count Unique Values in a Python List?
Now that you’ve learned four different ways of counting unique values in a Python list, let’s take a look at which method is fastest.
What we’ll do is create a Python decorator to time each method. We’ll create a function that executes each method and decorate it to identify how long its execution takes.
For our sample list, we’ll use the first few paragraphs of A Christmas Carol, where each word is a list item, and multiply that list by 10,000 to make it a bit of a challenge:
import time

def time_it(func):
    """Print the runtime of a decorated function."""
    def wrapper_time_it(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        run_time = end_time - start_time
        print(f"Finished {func.__name__!r} in {run_time:.10f} seconds")
        return value
    return wrapper_time_it

@time_it
def counter_method(a_list):
    from collections import Counter
    return len(Counter(a_list).keys())

@time_it
def set_method(a_list):
    return len(set(a_list))

@time_it
def numpy_method(a_list):
    import numpy as np
    # Pass a_list here, not the built-in list type
    return len(np.unique(np.array(a_list)))

@time_it
def for_loop_method(a_list):
    unique_list = list()
    unique_items = 0
    for item in a_list:
        if item not in unique_list:
            unique_list.append(item)
            unique_items += 1
    return unique_items

sample_list = ['Marley', 'was', 'dead:', 'to', 'begin', 'with.', 'There', 'is', 'no', 'doubt', 'whatever', 'about', 'that.', 'The', 'register', 'of', 'his', 'burial', 'was', 'signed', 'by', 'the', 'clergyman,', 'the', 'clerk,', 'the', 'undertaker,', 'and', 'the', 'chief', 'mourner.', 'Scrooge', 'signed', 'it:', 'and', 'Scrooge’s', 'name', 'was', 'good', 'upon', '’Change,', 'for', 'anything', 'he', 'chose', 'to', 'put', 'his', 'hand', 'to.', 'Old', 'Marley', 'was', 'as', 'dead', 'as', 'a', 'door-nail.', 'Mind!', 'I', 'don’t', 'mean', 'to', 'say', 'that', 'I', 'know,', 'of', 'my', 'own', 'knowledge,', 'what', 'there', 'is', 'particularly', 'dead', 'about', 'a', 'door-nail.', 'I', 'might', 'have', 'been', 'inclined,', 'myself,', 'to', 'regard', 'a', 'coffin-nail', 'as', 'the', 'deadest', 'piece', 'of', 'ironmongery', 'in', 'the', 'trade.', 'But', 'the', 'wisdom', 'of', 'our', 'ancestors', 'is', 'in', 'the', 'simile;', 'and', 'my', 'unhallowed', 'hands', 'shall', 'not', 'disturb', 'it,', 'or', 'the', 'Country’s', 'done', 'for.', 'You', 'will', 'therefore', 'permit', 'me', 'to', 'repeat,', 'emphatically,', 'that', 'Marley', 'was', 'as', 'dead', 'as', 'a', 'door-nail.', 'Scrooge', 'knew', 'he', 'was', 'dead?', 'Of', 'course', 'he', 'did.', 'How', 'could', 'it', 'be', 'otherwise?', 'Scrooge', 'and', 'he', 'were', 'partners', 'for', 'I', 'don’t', 'know', 'how', 'many', 'years.', 'Scrooge', 'was', 'his', 'sole', 'executor,', 'his', 'sole', 'administrator,', 'his', 'sole', 'assign,', 'his', 'sole', 'residuary', 'legatee,', 'his', 'sole', 'friend,', 'and', 'sole', 'mourner.']
sample_list *= 10000
counter_method(sample_list)
set_method(sample_list)
numpy_method(sample_list)
for_loop_method(sample_list)
# Returns
# Finished 'counter_method' in 0.2321387500 seconds
# Finished 'set_method' in 0.0463015000 seconds
# Finished 'numpy_method' in 0.2570261250 seconds
# Finished 'for_loop_method' in 7.1416198340 seconds
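From this, we can see that while the Counter and numpy methods are reasonably fast, the set method is the fastest of the bunch! This is likely because set() only has to hash each item once, whereas Counter also tallies every occurrence and numpy first has to convert the list to an array and then sort it. The timings above also include the import that happens inside the Counter and numpy functions. The for loop is by far the slowest, because every "not in" check scans unique_list from the start.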
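A single timed run can be noisy. As a rough cross-check, and not the method used for the numbers above, the standard library’s timeit module can repeat each call many times. This sketch assumes a much smaller made-up sample so that it finishes quickly, and stores its results in a hypothetical best_times dictionary:

```python
import timeit
from collections import Counter

# Small made-up sample so the sketch runs quickly
sample = ['Marley', 'was', 'dead', 'to', 'begin', 'with'] * 1000

def set_method(a_list):
    return len(set(a_list))

def counter_method(a_list):
    return len(Counter(a_list))

best_times = {}
for func in (set_method, counter_method):
    # 5 repeats of 100 calls each; the minimum is the least noisy figure
    runs = timeit.repeat(lambda: func(sample), number=100, repeat=5)
    best_times[func.__name__] = min(runs)
    print(f"{func.__name__}: {min(runs):.6f} seconds per 100 calls")
```

The absolute numbers will vary by machine and Python version, so treat them as a comparison between methods rather than as fixed benchmarks.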
Conclusion
In this post, you learned how to count unique values in a Python list. You learned how to do this using built-in sets, using the collections module, using numpy, and finally using a for-loop. You then learned which of these methods is the fastest to execute, to ensure you’re not bogging down your script unnecessarily.
To learn more about the Counter object in the collections module, you can check out the official documentation here.