
Pandas IQR: Calculate the Interquartile Range in Python

In this post, you’ll learn how to calculate the interquartile range in Pandas with Python. When working with data, it’s important to understand the variability of your dataset. The IQR represents the spread of the middle 50% of the data, allowing you to get a good sense of the variability of data.

You’ll see how to calculate the IQR in Pandas for a single column as well as for an entire DataFrame. You’ll also learn what the IQR is, how to interpret it, and how to visualize it using the popular Seaborn library.

The Quick Answer: Use Pandas quantile()

To calculate the interquartile range for a Pandas column, you can use the Pandas .quantile() method to calculate the 25th and 75th percentiles. Because the IQR is the difference between these two values, you can then subtract the 25th percentile from the 75th.

Take a look at what this looks like below:

# Calculating the IQR of a Pandas Column
# (df is the sample DataFrame loaded later in this post)
quartiles = df['English'].quantile([0.25, 0.75])
iqr = quartiles[0.75] - quartiles[0.25]
print(iqr)

# Returns: 16.5

What Is the Interquartile Range?

Definition of the Interquartile Range

The interquartile range (IQR, for short) is a measure of statistical dispersion, which represents the spread of the data. The interquartile range is also referred to as the midspread, the middle 50%, or the H-spread.

Mathematically, it represents the difference between the 75th and 25th percentiles of the data. The interquartile range is often used to find outliers in data. Outliers here are defined as observations that fall below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.

Calculation of the Interquartile Range

To calculate the IQR, the sorted dataset is divided into four equal parts. The cut points between these parts are the quartiles, denoted by Q1 (the lower quartile), Q2 (the median), and Q3 (the upper quartile). Because the lower quartile corresponds with the 25th percentile and the upper quartile corresponds with the 75th percentile, the IQR is calculated as:

IQR = Q3 - Q1
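To make the formula concrete, here is a minimal sketch of the calculation in plain Python. It assumes linear interpolation between neighbouring data points, which is the default behaviour of the Pandas quantile method. The helper name quantile_linear and the list of values are made up for illustration and are not part of Pandas.

```python
# A minimal sketch of the IQR calculation without Pandas.
# `quantile_linear` is a hypothetical helper written for this example.
import math


def quantile_linear(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)  # fractional index into the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    # Interpolate between the two neighbouring data points
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


scores = [1, 3, 5, 7, 9]  # illustrative values only
q1 = quantile_linear(scores, 0.25)
q3 = quantile_linear(scores, 0.75)
iqr = q3 - q1
print(q1, q3, iqr)

# Returns: 3.0 7.0 4.0
```

Here the middle 50% of the values spans from 3 to 7, so the IQR is 4.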

Loading a Sample Pandas DataFrame

To follow along with the tutorial, I have created a sample Pandas DataFrame that includes the scores of different students in various courses. Feel free to copy and paste the code block below into your favorite code editor to follow along:

# Loading a Sample Pandas DataFrame
import pandas as pd
df = pd.DataFrame.from_dict({
    'Student': ['Nik', 'Kate', 'Kevin', 'Evan', 'Jane', 'Kyra', 'Melissa'],
    'English': [90, 95, 75, 93, 60, 85, 75],
    'Chemistry': [95, 95, 75, 65, 50, 85, 100],
    'Math': [100, 95, 50, 75, 90, 50, 80]
}).set_index('Student')
print(df)

# Returns:
#          English  Chemistry  Math
# Student                          
# Nik           90         95   100
# Kate          95         95    95
# Kevin         75         75    50
# Evan          93         65    75
# Jane          60         50    90
# Kyra          85         85    50
# Melissa       75        100    80

Let’s now dive into how to calculate the interquartile range with Pandas for a single column.

Calculating the Interquartile Range with Pandas for a Single Column

In order to calculate the interquartile range (IQR) for a Pandas DataFrame column, you can use the Pandas quantile method. The method can calculate any quantile you ask for. In this case, we’ll use it to calculate the 25th and 75th percentiles.

Let’s see what this looks like:

# Calculating Percentiles of a Column
print(df['English'].quantile([0.25, 0.75]))

# Returns:
# 0.25    75.0
# 0.75    91.5
# Name: English, dtype: float64

We can see that this returns a Pandas Series containing the 25th and 75th percentiles, indexed by the quantile values 0.25 and 0.75. We can now subtract these two values to get the interquartile range:

# Calculating the IQR of a Pandas Column
quartiles = df['English'].quantile([0.25, 0.75])
iqr = quartiles[0.75] - quartiles[0.25]
print(iqr)

# Returns: 16.5

In the code block above, we assigned the quantiles to a variable, quartiles. We then calculated the IQR by indexing the 25th and 75th percentiles. Finally, we printed it out to get the value of 16.5.
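If you need the IQR for several columns one at a time, the same steps can be wrapped in a small function. The sketch below is one possible way to do that. The name column_iqr is hypothetical, and the NumPy call is only there as a cross-check of the Pandas result.

```python
# A reusable sketch of the single-column calculation.
# `column_iqr` is a hypothetical helper name, not a Pandas function.
import numpy as np
import pandas as pd


def column_iqr(series: pd.Series) -> float:
    """Return the 75th percentile minus the 25th percentile of a Series."""
    q1, q3 = series.quantile([0.25, 0.75])
    return float(q3 - q1)


english = pd.Series([90, 95, 75, 93, 60, 85, 75], name='English')
iqr_pandas = column_iqr(english)

# Cross-check with NumPy, which also interpolates linearly by default
q1_np, q3_np = np.percentile(english, [25, 75])
iqr_numpy = float(q3_np - q1_np)

print(iqr_pandas, iqr_numpy)

# Returns: 16.5 16.5
```

Both results agree because both libraries interpolate linearly by default. Passing a different interpolation argument to .quantile() (for example 'lower' or 'higher') can change the quartiles, and therefore the IQR, on small datasets.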

Let’s now take a look at how we can calculate the interquartile range with Pandas for an entire DataFrame.

Calculating the Interquartile Range with Pandas for a DataFrame

In order to calculate the interquartile range (IQR) for an entire Pandas DataFrame, we can apply the quantile method to get the 75th and 25th percentiles and subtract the two.

This method works in a similar way as the previous example. We first calculate the 25th and 75th percentiles for every column. Then, we subtract one result from the other.

Note: rather than passing a list of quantiles to the quantile method in a single call, we call the method twice, once per quantile. This is because each call with a single value returns a Series (one value per column), and two Series can be subtracted directly. Passing a list would return a DataFrame instead.

Let’s see what this looks like:

# Calculating the IQR for a DataFrame
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1

print(IQR)

# Returns:
# English      16.5
# Chemistry    25.0
# Math         30.0
# dtype: float64

In the code block above, we created two Series objects, representing the quantiles for the DataFrame. We then subtracted the two in order to return a Series that includes the IQR for each of the columns in the DataFrame.
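As an alternative sketch, you can also make a single quantile call with a list and then select the two rows of the resulting DataFrame with .loc. The example rebuilds the sample DataFrame so that it runs on its own. The second part assumes a DataFrame that still contains a non-numeric column and uses the numeric_only parameter to skip it.

```python
# Calculating the IQR for a DataFrame with a single quantile call
import pandas as pd

df = pd.DataFrame({
    'Student': ['Nik', 'Kate', 'Kevin', 'Evan', 'Jane', 'Kyra', 'Melissa'],
    'English': [90, 95, 75, 93, 60, 85, 75],
    'Chemistry': [95, 95, 75, 65, 50, 85, 100],
    'Math': [100, 95, 50, 75, 90, 50, 80]
}).set_index('Student')

# Passing a list returns a DataFrame indexed by the quantiles 0.25 and 0.75
quartiles = df.quantile([0.25, 0.75])
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]
print(iqr)

# If a text column such as Student is still a regular column,
# numeric_only=True restricts the calculation to the numeric columns
quartiles_numeric = df.reset_index().quantile([0.25, 0.75], numeric_only=True)
iqr_numeric = quartiles_numeric.loc[0.75] - quartiles_numeric.loc[0.25]
```

Both approaches give the same per-column values as the two-call version above (16.5, 25.0, and 30.0). Which one to use is mostly a matter of taste.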

Interpreting the Interquartile Range

The IQR is a measure of variability, which allows us to identify the spread of a dataset. In general, a larger IQR indicates greater variability in the data, while a smaller IQR indicates less variability. However, the interpretation of the IQR depends on the distribution of the data.

For example, if the data is roughly normally distributed, we can use the IQR to identify outliers, which are data points that fall more than 1.5 times the IQR below Q1 or above Q3.
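The 1.5 × IQR rule can be sketched in a few lines of Pandas. None of the scores in the sample DataFrame fall outside these bounds, so the example below adds one made-up low score of 10 to the English scores purely for illustration. The names lower_fence and upper_fence are also just illustrative.

```python
# Flagging outliers with the 1.5 x IQR rule
import pandas as pd

# English scores plus one hypothetical low score (10) added for illustration
scores = pd.Series([90, 95, 75, 93, 60, 85, 75, 10], name='English')

q1 = scores.quantile(0.25)
q3 = scores.quantile(0.75)
iqr = q3 - q1

# Anything below the lower fence or above the upper fence counts as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = scores[(scores < lower_fence) | (scores > upper_fence)]
print(lower_fence, upper_fence)
print(outliers.tolist())

# Returns:
# 42.0 120.0
# [10]
```

Note that adding the extra value also shifts the quartiles themselves, so the bounds are recalculated from the data that includes the suspected outlier.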

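Because the IQR only depends on the middle 50% of the data, it is not affected by a few extreme values. If the data is skewed or contains outliers, the IQR is therefore often a more reliable measure of variability than the standard deviation, which is sensitive to extreme values.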

In addition to identifying outliers and comparing variability, the quartiles can also give hints about the shape of the distribution. For example, if the median sits roughly midway between Q1 and Q3 and is close to the mean, the data is likely to be roughly symmetric. On the other hand, if the median sits much closer to one quartile than the other, or is far from the mean, the data is likely to be skewed.

Visualizing the Interquartile Range with Boxplots

Boxplots are helpful charts that illustrate the distribution of a dataset by visualizing its range, its quartiles, and its extreme values. Because a boxplot shows five different summary statistics at once, it helps you understand the data in a much clearer way than a single summary statistic does.

In this section, we’ll explore how to create boxplots in Seaborn.

Specifically, boxplots show a five-number summary that includes:

  • the minimum,
  • the first quartile (25th percentile),
  • the median,
  • the third quartile (75th percentile),
  • the maximum
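Before plotting, it can help to look at these five numbers directly. One possible way, sketched below, is to take the relevant rows from the output of the Pandas describe method. The example rebuilds the sample DataFrame so that it runs on its own.

```python
# Getting the five-number summary for each column
import pandas as pd

df = pd.DataFrame({
    'Student': ['Nik', 'Kate', 'Kevin', 'Evan', 'Jane', 'Kyra', 'Melissa'],
    'English': [90, 95, 75, 93, 60, 85, 75],
    'Chemistry': [95, 95, 75, 65, 50, 85, 100],
    'Math': [100, 95, 50, 75, 90, 50, 80]
}).set_index('Student')

# describe() also returns count, mean, and std, so keep only the five rows we need
summary = df.describe().loc[['min', '25%', '50%', '75%', 'max']]
print(summary)
```

Keep in mind that Seaborn, by default, draws the whiskers to the most extreme points within 1.5 times the IQR of the box and plots anything beyond that as individual points. The whisker ends therefore match the minimum and maximum only when no points fall outside that range.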

Creating a boxplot in Seaborn is made easy by using the sns.boxplot() function. Our sample DataFrame stores each course in its own column, so we first reshape it into a long format with one Class column and one Grade column. We can then create a boxplot that breaks the data out by the Class column on the x-axis and shows the Grade column on the y-axis. Let’s see how we’d do this in Python:

# Creating a Boxplot in Python
import seaborn as sns
import matplotlib.pyplot as plt

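# Unpivot the wide DataFrame into a long one: one row per student and class.
# A new name (df_long) is used so the original df stays intact.
df_long = df.reset_index().melt(
    id_vars='Student',
    var_name='Class',
    value_name='Grade'
)
sns.boxplot(data=df_long, x='Class', y='Grade')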
plt.show()

In the code block above, we first reset our DataFrame’s index. We then used the melt method to unpivot our DataFrame, turning it into a long dataset. Finally, we created a boxplot with the sns.boxplot() function. This returned the image below:

Boxplot in Python

We can see that the English class has the least variability, while the Math class has the highest. This matches the IQR values we calculated earlier: 16.5 for English, 25.0 for Chemistry, and 30.0 for Math.

Conclusion

The interquartile range (IQR) is a measure of statistical dispersion that represents the spread of the data. It is a useful tool for identifying outliers, comparing variability across datasets, and identifying the shape of the distribution. In this blog post, we explored how to calculate the IQR for both a single column and an entire Pandas DataFrame using the quantile() method. We also discussed how to interpret the IQR and how to visualize it using boxplots in Seaborn. By using these techniques, you can gain insights into the distribution of your data and make more informed decisions in your data analysis.

To learn more about the Pandas quantile method, check out the official documentation here.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
