Relative Frequencies and Absolute Frequencies in Python and Pandas

In this post, you’ll learn how to calculate relative frequencies and absolute frequencies using pure Python, as well as the popular data science library, Pandas.

A relative frequency measures how often a certain value occurs in a dataset, relative to the total number of values in that dataset.

An absolute frequency, meanwhile, simply measures how often a certain value occurs.

The Quick Answer: Calculating Absolute and Relative Frequencies in Pandas

If you’re not interested in the mechanics of doing this, simply use the Pandas .value_counts() method. This returns a Pandas Series of absolute frequencies, sorted from most to least frequent. If you want relative frequencies, pass the normalize=True argument:

import pandas as pd
df = pd.DataFrame(data = ['apple', 'apple', 'banana', 'orange', 'apple', 'apple', 'banana', 'banana', 'orange', 'banana', 'apple'], columns=['Fruit'])

absolute_frequencies = df['Fruit'].value_counts()
relative_frequencies = df['Fruit'].value_counts(normalize=True)

To learn about the .value_counts() method in detail, check out my other post here.

Loading our data

Let’s begin by loading in some data. Since you will be learning how to calculate relative and absolute frequencies in both pure Python as well as in Pandas, we’ll be loading data as both a simple list as well as a Pandas dataframe.

Let’s load it in now:

# Create a list of data
fruits = ['apple', 'apple', 'banana', 'orange', 'apple', 'apple', 'banana', 'banana', 'orange', 'banana', 'apple']

# If using Pandas, create a dataframe called df
import pandas as pd
df = pd.DataFrame(data=fruits, columns=['Fruit'])

print(df.head())
# Returns
#     Fruit
# 0   apple
# 1   apple
# 2  banana
# 3  orange
# 4   apple

So, we have a list that contains a number of different strings, as well as a dataframe with a single column, named Fruit.

How to calculate absolute frequencies with Python

An absolute frequency simply measures how often a certain value occurs.

Calculate Absolute Frequencies with a Dictionary

The easiest way to do this is by using a dictionary and looping over your list:

  1. Initialize a dictionary,
  2. Loop over the list,
  3. If the list item exists in the dictionary, add one,
  4. If not, set the value to one.
fruits = ['apple', 'apple', 'banana', 'orange', 'apple', 'apple', 'banana', 'banana', 'orange', 'banana', 'apple']

absolute_frequencies = dict()
for fruit in fruits:
    if fruit in absolute_frequencies.keys():
        absolute_frequencies[fruit] += 1
    else:
        absolute_frequencies[fruit] = 1

print(absolute_frequencies)

This returns:

{'apple': 5, 'banana': 4, 'orange': 2}

Similarly, you could use a dictionary comprehension and write:

absolute_frequencies_dict = {fruit:fruits.count(fruit) for fruit in fruits}

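This returns the exact same result, in a much shorter syntax (that’s, unfortunately, a little harder to read). Keep in mind that it calls .count() once for every item in the list, including repeats, so it does more work than the loop on long lists. If you want to dive deeper into dictionary comprehensions, check out my detailed post with a lot of examples here.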
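
If you would rather not write the loop yourself, the standard library offers another route. The snippet below is a minimal sketch using collections.Counter, which is not one of the methods covered in this post; the variable name absolute_frequencies_counter is just an illustrative choice.

```python
from collections import Counter

fruits = ['apple', 'apple', 'banana', 'orange', 'apple', 'apple', 'banana', 'banana', 'orange', 'banana', 'apple']

# Counter tallies every item in a single pass over the list
absolute_frequencies_counter = Counter(fruits)

print(absolute_frequencies_counter)
# Counter({'apple': 5, 'banana': 4, 'orange': 2})

# most_common(1) returns the most frequent item and its count
print(absolute_frequencies_counter.most_common(1))
# [('apple', 5)]
```

A Counter behaves like a dictionary, so you can look up absolute_frequencies_counter['apple'] just as you would with the loop-based result.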

Calculate Absolute Frequencies with a List Comprehension

Similar to our example above, we can use a list comprehension to generate a list of tuples, each pairing a value with its frequency. If you want to learn more about list comprehensions, check out my post here, or watch my YouTube video on them.

Let’s see how we can create a list comprehension for our list of absolute frequencies.

In order to do this, we need to find items to iterate over, so we’ll create a set of items, as that will filter our list to unique items only. Then we’ll take the count of each item of the set in the list and add that to a tuple:

fruits = ['apple', 'apple', 'banana', 'orange', 'apple', 'apple', 'banana', 'banana', 'orange', 'banana', 'apple']

absolute_frequencies_list_comprehension = [(fruit, fruits.count(fruit)) for fruit in set(fruits)]
print(absolute_frequencies_list_comprehension)

This returns (the order of the tuples may differ on your machine, since sets are unordered):

[('banana', 4), ('orange', 2), ('apple', 5)]
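
Because a set has no guaranteed order, the tuples can come back in a different order from one run to the next. If you need a reproducible result, one option is to sort the output. The sketch below is one way to do that, assuming you want the most frequent items first; the name absolute_frequencies_sorted is only illustrative.

```python
fruits = ['apple', 'apple', 'banana', 'orange', 'apple', 'apple', 'banana', 'banana', 'orange', 'banana', 'apple']

# Sort the unique items alphabetically first so ties resolve predictably
unique_fruits = sorted(set(fruits))

# Then sort the (item, count) tuples by count, largest first
absolute_frequencies_sorted = sorted(
    [(fruit, fruits.count(fruit)) for fruit in unique_fruits],
    key=lambda pair: pair[1],
    reverse=True,
)

print(absolute_frequencies_sorted)
# [('apple', 5), ('banana', 4), ('orange', 2)]
```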

How to calculate relative frequencies with Python

A relative frequency measures how often a certain value occurs in a dataset, relative to the total number of values in that dataset.

In order to calculate the relative frequencies, we’ll need to divide each absolute frequency by the total number of values in the list. Let’s see how we’ll do this with each of the methods above.

We can calculate the number of items in the list by using the built-in len() function:

num_items = len(fruits)

print(num_items)

# Returns
# 11

Let’s now implement relative frequencies into each of our methods from above:

Using a Loop and a Dictionary to Calculate Relative Frequencies

# Loop and Dictionary
relative_frequencies_dict_loop = dict()
for fruit in fruits:
    if fruit in relative_frequencies_dict_loop.keys():
        relative_frequencies_dict_loop[fruit] += 1
    else:
        relative_frequencies_dict_loop[fruit] = 1

for fruit in relative_frequencies_dict_loop.keys():
    relative_frequencies_dict_loop[fruit] /= len(fruits)

print(relative_frequencies_dict_loop)

# Returns
# {'apple': 0.45454545454545453, 'banana': 0.36363636363636365, 'orange': 0.18181818181818182}

This tells us that the string ‘apple’ represents roughly 45% of the values in the list (5 of the 11 items).
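
Two quick sanity checks are often useful at this point: relative frequencies should add up to 1, and they tend to be easier to read as percentages. The snippet below is a small sketch of both, rebuilding the dictionary with the same loop approach; the names relative_frequencies and total are illustrative, and math.isclose() is used because floating-point sums are not always exactly 1.0.

```python
import math

fruits = ['apple', 'apple', 'banana', 'orange', 'apple', 'apple', 'banana', 'banana', 'orange', 'banana', 'apple']

# Count each item, then divide every count by the list length
relative_frequencies = {}
for fruit in fruits:
    relative_frequencies[fruit] = relative_frequencies.get(fruit, 0) + 1
for fruit in relative_frequencies:
    relative_frequencies[fruit] /= len(fruits)

# The relative frequencies should sum to 1 (within floating-point tolerance)
total = sum(relative_frequencies.values())
print(math.isclose(total, 1.0))
# True

# Format each share as a percentage with one decimal place
for fruit, share in relative_frequencies.items():
    print(f"{fruit}: {share:.1%}")
# apple: 45.5%
# banana: 36.4%
# orange: 18.2%
```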

Using a Dictionary Comprehension to Calculate Relative Frequencies

Let’s see how to do this with a dictionary comprehension:

relative_frequencies_dict_comprehension = {fruit:fruits.count(fruit)/len(fruits) for fruit in fruits}

print(relative_frequencies_dict_comprehension)

# Returns
# {'apple': 0.45454545454545453, 'banana': 0.36363636363636365, 'orange': 0.18181818181818182}
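
The comprehension above loops over all eleven items, so each fruit is counted once per occurrence. A possible refinement, sketched below, is to loop over the unique items only. It uses dict.fromkeys(fruits) to drop duplicates while keeping first-seen order; the name relative_frequencies_unique is just for illustration.

```python
fruits = ['apple', 'apple', 'banana', 'orange', 'apple', 'apple', 'banana', 'banana', 'orange', 'banana', 'apple']

# dict.fromkeys() keeps one key per unique item, in order of first appearance
unique_fruits = dict.fromkeys(fruits)

# Count and divide once per unique item instead of once per list element
relative_frequencies_unique = {
    fruit: fruits.count(fruit) / len(fruits) for fruit in unique_fruits
}

print(relative_frequencies_unique)
# {'apple': 0.45454545454545453, 'banana': 0.36363636363636365, 'orange': 0.18181818181818182}
```

The result matches the original comprehension; the difference only starts to matter as the list grows.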

Using a List Comprehension to Calculate Relative Frequencies

Finally, let’s take a look at how to calculate relative frequencies using a list comprehension:

relative_frequencies_list_comprehension = [(fruit, fruits.count(fruit) / len(fruits)) for fruit in set(fruits)]

print(relative_frequencies_list_comprehension)

# Returns
# [('orange', 0.18181818181818182), ('banana', 0.36363636363636365), ('apple', 0.45454545454545453)]

How to calculate absolute frequencies with Pandas

Pandas makes it very easy to calculate absolute frequencies by using the .value_counts() method. Let’s see how this works in practice:

import pandas as pd
fruits = ['apple', 'apple', 'banana', 'orange', 'apple', 'apple', 'banana', 'banana', 'orange', 'banana', 'apple']

df = pd.DataFrame(data=fruits, columns=['Fruit'])
print(df['Fruit'].value_counts())

This returns:

apple     5
banana    4
orange    2
Name: Fruit, dtype: int64

How to calculate relative frequencies with Pandas

In order to calculate relative frequencies with Pandas, you can use the .value_counts() method and apply the normalize=True parameter:

import pandas as pd
fruits = ['apple', 'apple', 'banana', 'orange', 'apple', 'apple', 'banana', 'banana', 'orange', 'banana', 'apple']

df = pd.DataFrame(data=fruits, columns=['Fruit'])
print(df['Fruit'].value_counts(normalize=True))

This returns:

apple     0.454545
banana    0.363636
orange    0.181818
Name: Fruit, dtype: float64
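
If you want both measures side by side, you can place the two results in a single dataframe. The snippet below is a minimal sketch of that idea; the column labels absolute and relative and the name frequency_table are my own choices, not anything Pandas requires.

```python
import pandas as pd

fruits = ['apple', 'apple', 'banana', 'orange', 'apple', 'apple', 'banana', 'banana', 'orange', 'banana', 'apple']
df = pd.DataFrame(data=fruits, columns=['Fruit'])

# Both Series are indexed by fruit name, so Pandas aligns them row by row
frequency_table = pd.DataFrame({
    'absolute': df['Fruit'].value_counts(),
    'relative': df['Fruit'].value_counts(normalize=True),
})

print(frequency_table)
```

Each row of frequency_table is one fruit, with its count in the first column and its share of the total in the second.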

Conclusion

In this post, you learned how to calculate both absolute and relative frequencies using pure Python as well as Pandas. In particular, you used dictionaries and list comprehensions, as well as the Pandas .value_counts() method, to calculate frequencies.

To learn more about the .value_counts() method, check out the official documentation.
