Seaborn Boxplot – How to create box and whisker plots


In this tutorial, you’ll learn all you need to know about the Seaborn boxplot, using the sns.boxplot() function. You’ll learn what boxplots show, and we’ll provide examples of how to customize your boxplot.

Check out the sections below if you’re interested in something specific. If you want to learn more about Seaborn, check out my other Seaborn tutorials, like the bar chart tutorial or line chart tutorial.

What is a boxplot?

A boxplot is a helpful data visualization that illustrates five different summary statistics for your data. It helps you understand the data in a much clearer way than just seeing a single summary statistic.

Specifically, boxplots show a five number summary that includes:

  • the minimum,
  • the first quartile (25th percentile),
  • the median,
  • the third quartile (75th percentile),
  • the maximum

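Additionally, boxplots identify any outliers in the data. By convention, a point is treated as an outlier if it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. When outliers are present, they are drawn as individual points and the whiskers stop at the most extreme values that are not outliers, rather than at the true minimum and maximum.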
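To make these numbers concrete, here is a minimal sketch that computes the five number summary and the 1.5 times IQR cut-offs by hand with NumPy. The sample values are made up for illustration, and the quartiles use NumPy's default linear interpolation, which may differ slightly from other tools.

```python
import numpy as np

# Made-up sample values, used only for illustration
values = np.array([2, 4, 4, 5, 6, 7, 8, 9, 30])

# Quartiles and the interquartile range (IQR)
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
is_outlier = (values < lower_fence) | (values > upper_fence)
outliers = values[is_outlier]

# Whiskers stop at the most extreme values that are not outliers
whisker_low = values[~is_outlier].min()
whisker_high = values[~is_outlier].max()

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence}), outliers={outliers.tolist()}")
print(f"whiskers=({whisker_low}, {whisker_high})")
```

In this sample, 30 lies above the upper fence of 14, so a boxplot would draw it as an individual point and stop the upper whisker at 9.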

Understanding boxplots

The median line can be very descriptive as well. If the line is higher in the interquartile range (the box), the data is said to be negatively skewed. Inversely, if the median line is lower in the box, the data is said to be positively skewed.
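As a rough illustration of this rule of thumb, the sketch below measures where the median sits inside the box, on a scale from 0 (at the first quartile) to 1 (at the third quartile). The helper name median_position_in_box and the sample values are hypothetical, and the helper assumes the two quartiles differ so that the box has a non-zero height.

```python
import numpy as np

def median_position_in_box(values):
    """Return where the median sits in the box: 0 = at Q1, 1 = at Q3."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return (median - q1) / (q3 - q1)

# Made-up data with a long tail of large values (positive skew)
right_tailed = np.array([1, 2, 2, 3, 3, 4, 6, 9, 15])
# Mirroring the data gives a long tail of small values (negative skew)
left_tailed = -right_tailed

right_position = median_position_in_box(right_tailed)
left_position = median_position_in_box(left_tailed)

print(right_position)  # below 0.5: the median sits low in the box
print(left_position)   # above 0.5: the median sits high in the box
```

This is only a quick visual heuristic. A formal skewness statistic may be a better choice when the distinction matters.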

How to create a Seaborn boxplot?

Seaborn has an aptly named sns.boxplot() function that is used to create, well, boxplots.

To demonstrate the sns.boxplot() function, let’s import the libraries we’ll need as well as load a sample dataframe that comes bundled with Seaborn:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('tips')

print(df.head())

This returns the following dataframe:

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

Let’s start by creating a boxplot that breaks the data out by day on the x-axis and shows the total bill on the y-axis. Let’s see how we’d do this in Python:

sns.boxplot(data=df, x='day', y='total_bill')
plt.show()

This returns the following image:

simple Seaborn boxplot
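If you’d like to check the numbers behind each box, one option is to summarize the same column with pandas. The sketch below uses a small, made-up stand-in for the tips data (not the real dataset) so that it runs on its own. With the real dataframe, the same groupby call should apply.

```python
import pandas as pd

# Small made-up stand-in for the tips data (not the real dataset)
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Thur', 'Fri', 'Fri', 'Fri', 'Sat', 'Sat', 'Sat'],
    'total_bill': [10.0, 14.0, 18.0, 12.0, 16.0, 26.0, 15.0, 20.0, 40.0],
})

# One row per day: count, mean, std, min, quartiles and max
summary = df.groupby('day')['total_bill'].describe()

# Keep only the columns that correspond to the five number summary
print(summary[['min', '25%', '50%', '75%', 'max']])
```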

Styling a Seaborn boxplot

The default boxplot generated by Seaborn is not the prettiest. Let’s learn how we can apply some style and a different colour palette to the Seaborn boxplot.

We can use the sns.set_style() function and the sns.set_palette() function to apply both a style and a palette. You can learn more about the style function by checking out the official documentation.

Let’s apply the darkgrid style and the Set2 palette:

sns.set_style('darkgrid')
sns.set_palette('Set2')

sns.boxplot(data=df, x='day', y='total_bill')
plt.show()

Now our boxplot looks much nicer!

styling a Seaborn boxplot
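Note that sns.set_style() changes the defaults for every plot that follows. If you only want to style a single chart, one alternative is to use sns.axes_style() as a context manager. The sketch below assumes a small, made-up dataframe in place of the tips data so that it runs on its own.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Small made-up stand-in for the tips data (not the real dataset)
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Thur', 'Fri', 'Fri', 'Fri', 'Sat', 'Sat', 'Sat'],
    'total_bill': [10.0, 14.0, 18.0, 12.0, 16.0, 26.0, 15.0, 20.0, 40.0],
})

# The style only applies to axes created inside the with-block
with sns.axes_style('darkgrid'):
    ax = sns.boxplot(data=df, x='day', y='total_bill')

plt.show()
```

Plots created after the with-block go back to whatever style was active before it.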

Adding titles and axis labels to Seaborn boxplots

We can also use Matplotlib to add some descriptive titles and axis labels to our plot to help guide the interpretation of the data even further.

To do this, we use the pyplot module from matplotlib.

By default, Seaborn uses the column names as the axis labels.

Let’s now add a descriptive title and some axis labels that aren’t based on the dataset.

sns.boxplot(data=df, x='day', y='total_bill')

plt.title('Tips by Day')
plt.xlabel('Day of Week')
plt.ylabel('Total Bill Amount ($)')

plt.show()

This returns our chart, with a helpful title and some axis labels added:

adding a title to Seaborn boxplot
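As an alternative to the pyplot functions, sns.boxplot() returns a Matplotlib Axes object, so the title and labels can be set on it directly. The sketch below shows this approach on a small, made-up dataframe that stands in for the tips data.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Small made-up stand-in for the tips data (not the real dataset)
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Thur', 'Fri', 'Fri', 'Fri', 'Sat', 'Sat', 'Sat'],
    'total_bill': [10.0, 14.0, 18.0, 12.0, 16.0, 26.0, 15.0, 20.0, 40.0],
})

# Keep a reference to the Axes that Seaborn draws on
ax = sns.boxplot(data=df, x='day', y='total_bill')

# Set the title and both axis labels in one call
ax.set(title='Tips by Day', xlabel='Day of Week', ylabel='Total Bill Amount ($)')

plt.show()
```

Working with the Axes object can be handy when a figure holds several subplots, since each one can be labelled independently.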

Ordering Seaborn boxplots

There may be times when you want to sort your data in different ways. Currently, Seaborn is inferring the order for us, but we can also specify a particular order.

To do this, we use the order= parameter. For example, if we wanted to place the weekend days first, we could write:

sns.boxplot(data=df, x='day', y='total_bill', order=['Sat', 'Sun', 'Thur', 'Fri'])

plt.title('Tips by Day')
plt.xlabel('Day of Week')
plt.ylabel('Total Bill Amount ($)')

plt.show()

This returns our updated boxplot:

ordering Seaborn boxplots
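The order doesn’t need to be typed by hand. As a sketch, the list passed to order= can also be computed from the data, for example by sorting the days from the highest to the lowest median bill. The dataframe below is a made-up stand-in for the tips data.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Small made-up stand-in for the tips data (not the real dataset)
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Thur', 'Fri', 'Fri', 'Fri', 'Sat', 'Sat', 'Sat'],
    'total_bill': [10.0, 14.0, 18.0, 12.0, 16.0, 26.0, 15.0, 20.0, 40.0],
})

# Sort the days by their median total bill, highest first
medians = df.groupby('day')['total_bill'].median()
order = medians.sort_values(ascending=False).index.tolist()
print(order)

ax = sns.boxplot(data=df, x='day', y='total_bill', order=order)
plt.show()
```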

Rotating your Seaborn boxplot

Sometimes, you may want to rotate your boxplot if the data is easier to explore in a horizontal format.

By default, Seaborn will infer the orientation of your boxplot based on the data that exists in the dataset.

Seaborn provides two different methods to do this. If both your variables are numerical (or if you’re using a wide-form dataset), you can specify orient='h' to display your data in a horizontal format.

Alternatively, if you’re not plotting two numerical variables, you can simply flip the x= and y= parameters. Let’s see how we can do this in Python:

sns.boxplot(data=df, y='day', x='total_bill')

plt.title('Tips by Day')
plt.xlabel('Total Bill Amount ($)')
plt.ylabel('Day of Week')

plt.show()

This returns the following boxplot:

horizontal Seaborn boxplot

Changing Whisker Length in Seaborn Boxplot

By default, Seaborn boxplots use a whisker length of 1.5. This means that values more than 1.5 times the interquartile range below the lower bound of the box or above its upper bound are drawn as outliers, and the whiskers extend to the most extreme values inside that range.

Seaborn provides two different methods for changing the whisker length:

  1. Changing the proportion that determines outliers, and
  2. Setting upper and lower percentile bounds to capture data

Setting Interquartile Range Proportion in Seaborn Boxplots

Say we wanted the whiskers to include data points that fall within two times the interquartile range of the box. To do this, we can specify the whis= parameter.

Let’s try this in Python:

sns.boxplot(data=df, x='day', y='total_bill', whis=2)

plt.title('Tips by Day')
plt.xlabel('Day of Week')
plt.ylabel('Total Bill Amount ($)')

plt.show()

This returns the following boxplot:

Adjusting Seaborn boxplot whisker length
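To see what the whis= value changes numerically, the sketch below uses Matplotlib’s cbook.boxplot_stats() helper on a made-up sample. It assumes that Seaborn’s boxes follow the same rules as Matplotlib’s, which is the library Seaborn draws with, so treat it as an approximation of what the chart shows rather than Seaborn’s own code.

```python
import numpy as np
from matplotlib import cbook

# Made-up sample values, used only for illustration
values = np.array([2, 4, 4, 5, 6, 7, 8, 9, 15])

# Compute the box statistics once per whisker setting
results = {whis: cbook.boxplot_stats(values, whis=whis)[0] for whis in (1.5, 2)}

for whis, stats in results.items():
    print(f"whis={whis}: whiskers=({stats['whislo']}, {stats['whishi']}), "
          f"outliers={stats['fliers'].tolist()}")
```

With the default of 1.5, the value 15 falls outside the whiskers and is flagged as an outlier. With whis=2, the upper whisker reaches 15 and no outliers remain.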

Setting Percentile Limits on Seaborn Boxplot Whiskers

There may be times when you want to set upper and lower limits on the percentages of data points to include.

For example, if you wanted to include everything except for the bottom and top 5% of your data within the box and whiskers, you could pass the 5th and 95th percentiles. Note that the percentiles are given on a scale of 0 to 100, not as fractions:

sns.boxplot(data=df, x='day', y='total_bill', whis=[5, 95])

plt.title('Tips by Day')
plt.xlabel('Day of Week')
plt.ylabel('Total Bill Amount ($)')

plt.show()

This returns the following boxplot:

Seaborn whisker length
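As a quick check of how percentile whiskers behave, the sketch below runs Matplotlib’s cbook.boxplot_stats() helper on made-up data with the whiskers set to the 5th and 95th percentiles. The percentiles are written on a scale of 0 to 100. This assumes Seaborn’s boxes follow the same rules as Matplotlib’s, so it is an illustration rather than Seaborn’s own code.

```python
import numpy as np
from matplotlib import cbook

# Made-up data: the integers 0 to 100
values = np.arange(101)

# Whiskers at the 5th and 95th percentiles (0 to 100 scale)
stats = cbook.boxplot_stats(values, whis=(5, 95))[0]

print(f"whiskers=({stats['whislo']}, {stats['whishi']})")
print(f"number of points drawn as outliers: {len(stats['fliers'])}")
```

Here the whiskers stop at 5 and 95, and the five values on either side of them are drawn as individual points.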

Create grouped Seaborn boxplots

There may be times when you want to add another dimension to the data. In the example we have been using, it may be helpful to also split the data by gender to see how the total bill differs between groups.

We can do this, similar to other Seaborn plots, using the hue= parameter.

Let’s add this to our plot to see how this changes the plot:

sns.boxplot(data=df, x='day', y='total_bill', hue='sex')

plt.title('Tips by Day')
plt.xlabel('Day of Week')
plt.ylabel('Total Bill Amount ($)')

plt.show()

This returns the following image:

Grouped Seaborn boxplot

This allows us to see how the spread of data differs, not only by the day of week but also by gender.
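To put numbers next to a grouped boxplot, one option is to compute the median for every combination of the two grouping columns with pandas. The sketch below uses a small, made-up dataframe in place of the tips data.

```python
import pandas as pd

# Small made-up stand-in for the tips data (not the real dataset)
df = pd.DataFrame({
    'day': ['Sat', 'Sat', 'Sat', 'Sat', 'Sun', 'Sun', 'Sun', 'Sun'],
    'sex': ['Female', 'Female', 'Male', 'Male', 'Female', 'Female', 'Male', 'Male'],
    'total_bill': [12.0, 18.0, 20.0, 30.0, 14.0, 16.0, 22.0, 26.0],
})

# Median bill per day and gender: days as rows, genders as columns
medians = df.groupby(['day', 'sex'])['total_bill'].median().unstack('sex')
print(medians)
```

Each cell of the resulting table corresponds to the median line of one box in the grouped chart.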

Conclusion

In this post, you learned what a boxplot is and how to create a boxplot in Seaborn. Specifically, you learned how to customize the plot using styles and palettes, how to add a title and axis labels to the chart, and how to modify different data elements within the chart. Finally, you learned how to plot a second dimension to create a grouped boxplot.