In this tutorial, you’ll learn all you need to know about the Seaborn boxplot, using the `sns.boxplot()` function. You’ll learn what boxplots are, and we’ll walk through examples of how to customize your boxplot.

Check out the sections below if you’re interested in something specific. If you want to learn more about Seaborn, check out my other Seaborn tutorials, like the bar chart tutorial or line chart tutorial.


## What is a boxplot?

A boxplot is a helpful data visualization that illustrates five different summary statistics for your data. It helps you understand the data much more clearly than a single summary statistic would.

Specifically, boxplots show a five number summary that includes:

- the minimum,
- the first quartile (25th percentile),
- the median,
- the third quartile (75th percentile),
- the maximum

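Additionally, boxplots identify any outliers that exist in the data. By default, a point is classified as an outlier if it sits more than 1.5 times the interquartile range below the first quartile or above the third quartile. When outliers are present, the whiskers end at the most extreme data points that are not outliers.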

The position of the median line can be very descriptive as well. If the line sits higher in the box (the interquartile range), the data is likely negatively skewed. Conversely, if the median line sits lower in the box, the data is likely positively skewed.
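To make these numbers concrete, here is a minimal sketch that computes the five number summary and the 1.5 × IQR outlier limits by hand with NumPy. The `values` array is made-up data used only for illustration, and the quartiles use NumPy’s default linear interpolation, which may differ slightly from other quartile conventions.

```python
import numpy as np

# Made-up sample values, used only for illustration.
values = np.array([3, 5, 7, 8, 9, 10, 12, 13, 14, 40])

# Quartiles and median (NumPy's default linear interpolation).
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond these limits are treated as outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = values[(values < lower_limit) | (values > upper_limit)]

print('min:', values.min(), 'Q1:', q1, 'median:', median, 'Q3:', q3, 'max:', values.max())
print('outlier limits:', lower_limit, upper_limit)
print('outliers:', outliers)
```

In this sample, only the value 40 falls outside the limits, so it is the one point a boxplot would draw separately from the whiskers.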

## How to create a Seaborn boxplot?

Seaborn has an aptly named `sns.boxplot()` function that is used to create, well, boxplots.

To demonstrate the `sns.boxplot()` function, let’s import the libraries we’ll need as well as load a sample dataframe that comes bundled with Seaborn:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = sns.load_dataset('tips')
    print(df.head())

This returns the following dataframe:

       total_bill   tip     sex smoker  day    time  size
    0       16.99  1.01  Female     No  Sun  Dinner     2
    1       10.34  1.66    Male     No  Sun  Dinner     3
    2       21.01  3.50    Male     No  Sun  Dinner     3
    3       23.68  3.31    Male     No  Sun  Dinner     2
    4       24.59  3.61  Female     No  Sun  Dinner     4

Let’s start by creating a boxplot that breaks the data out by day on the x-axis and shows the total bill on the y-axis. Let’s see how we’d do this in Python:

    sns.boxplot(data=df, x='day', y='total_bill')
    plt.show()

This returns the following image:

## Styling a Seaborn boxplot

The default boxplot generated by Seaborn is not the prettiest. Let’s learn how we can apply some style and a different colour palette to the Seaborn boxplot.

We can use the `sns.set_style()` function and the `sns.set_palette()` function to apply both a style and a palette. You can learn more about the style function by checking out the official documentation.

Let’s apply the `darkgrid` style and the `Set2` palette:

    sns.set_style('darkgrid')
    sns.set_palette('Set2')

    sns.boxplot(data=df, x='day', y='total_bill')
    plt.show()

Now our boxplot looks much nicer!

## Adding titles and axis labels to Seaborn boxplots

We can also use Matplotlib to add some descriptive titles and axis labels to our plot to help guide the interpretation of the data even further.

To do this, we use the `pyplot` module from Matplotlib.

By default, Seaborn uses the column names as the axis labels.

Let’s now add a descriptive title and some axis labels that aren’t based on the dataset.

    sns.boxplot(data=df, x='day', y='total_bill')
    plt.title('Tips by Day')
    plt.xlabel('Day of Week')
    plt.ylabel('Total Bill Amount ($)')
    plt.show()

This returns our chart, with a helpful title and some axis labels added:
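As an alternative sketch, the same labels can be set on the Matplotlib `Axes` object that `sns.boxplot()` returns, which can be handy when a figure holds more than one plot. The `demo` dataframe below is a small made-up stand-in for the tips dataset, so the block runs without downloading anything.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Small made-up frame standing in for the tips dataset.
demo = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat', 'Sat', 'Sun', 'Sun'],
    'total_bill': [10.5, 14.2, 20.1, 18.4, 25.0, 30.3, 22.7, 27.9],
})

# sns.boxplot() returns the Matplotlib Axes it drew on.
ax = sns.boxplot(data=demo, x='day', y='total_bill')

# Set the title and both axis labels in a single call.
ax.set(title='Tips by Day', xlabel='Day of Week', ylabel='Total Bill Amount ($)')
plt.show()
```

Both approaches should produce the same chart; the `plt.*` functions simply act on whichever axes is currently active.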

## Ordering Seaborn boxplots

There may be times when you want to sort your data in different ways. Currently, Seaborn is inferring the order for us, but we can also specify a particular order.

To do this, we use the `order=` parameter. For example, if we wanted to place the weekend days first, we could write:

    sns.boxplot(data=df, x='day', y='total_bill', order=['Sat', 'Sun', 'Thur', 'Fri'])
    plt.title('Tips by Day')
    plt.xlabel('Day of Week')
    plt.ylabel('Total Bill Amount ($)')
    plt.show()

This returns our updated boxplot:
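The order does not have to be typed out by hand. As a rough sketch, you could compute it from the data, for example by sorting the categories by their median. The `demo` dataframe and the `median_order` name are made up for this illustration.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up data: three bills per day.
demo = pd.DataFrame({
    'day': ['Thur'] * 3 + ['Fri'] * 3 + ['Sat'] * 3 + ['Sun'] * 3,
    'total_bill': [10, 12, 14, 20, 22, 24, 15, 16, 17, 30, 31, 35],
})

# Sort the days from highest to lowest median bill.
median_order = (
    demo.groupby('day')['total_bill']
    .median()
    .sort_values(ascending=False)
    .index
    .tolist()
)
print(median_order)

# Pass the computed list to order= just like a hand-written one.
sns.boxplot(data=demo, x='day', y='total_bill', order=median_order)
plt.show()
```

Because the list is derived from the data, the boxes stay sorted even if the underlying values change.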

## Rotating your Seaborn boxplot

Sometimes, you may want to rotate your data, if it’s easier to explore in a horizontal format.

By default, Seaborn will infer the orientation of your boxplot based on the data that exists in the dataset.

Seaborn provides two different methods to do this. If both your variables are numerical (or if you’re using a wide-form dataset), you can specify `orient='h'` to display your data in a horizontal format.

Alternatively, if you’re not plotting two numerical variables, you can simply flip the `x=` and `y=` parameters. Remember to swap the axis labels as well. Let’s see how we can do this in Python:

    sns.boxplot(data=df, y='day', x='total_bill')
    plt.title('Tips by Day')
    plt.xlabel('Total Bill Amount ($)')
    plt.ylabel('Day of Week')
    plt.show()

This returns the following boxplot:

## Changing Whisker Length in Seaborn Boxplot

By default, Seaborn boxplots use a whisker length of 1.5. This means that values sitting more than 1.5 times the interquartile range below the lower bound or above the upper bound of the box are drawn as outliers rather than being covered by the whiskers.

Seaborn provides two different methods for changing the whisker length:

- Changing the proportion that determines outliers, and
- Setting upper and lower percentile bounds to capture data

### Setting Interquartile Range Proportion in Seaborn Boxplots

Say we wanted the whiskers to include data points that sit within two times the interquartile range. To do this, we can specify the `whis=` parameter.

Let’s try this in Python:

    sns.boxplot(data=df, x='day', y='total_bill', whis=2)
    plt.title('Tips by Day')
    plt.xlabel('Day of Week')
    plt.ylabel('Total Bill Amount ($)')
    plt.show()

This returns the following boxplot:

### Setting Percentile Limits on Seaborn Boxplot Whiskers

There may be times when you want to set upper and lower limits on the percentages of data points to include.

For example, if you wanted to include everything except for the bottom and top 5% of your data within the box and whiskers, you could pass a pair of percentiles. Note that the pair is given on a 0 to 100 scale, not as fractions:

    sns.boxplot(data=df, x='day', y='total_bill', whis=[5, 95])
    plt.title('Tips by Day')
    plt.xlabel('Day of Week')
    plt.ylabel('Total Bill Amount ($)')
    plt.show()

This returns the following boxplot:
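To see what the two kinds of `whis=` values do numerically, here is a rough sketch of the arithmetic. It assumes the rule that Matplotlib documents for whiskers: they extend to the most extreme data points that still fall inside the limits. The `whisker_ends()` helper and the `values` array are made up for this illustration and are not part of Seaborn.

```python
import numpy as np

# Made-up sample values, used only for illustration.
values = np.array([3, 5, 7, 8, 9, 10, 12, 13, 14, 24, 40])

def whisker_ends(data, whis):
    """Return the (low, high) whisker ends for a scalar or a percentile pair."""
    q1, q3 = np.percentile(data, [25, 75])
    if np.isscalar(whis):
        # Scalar: limits sit whis * IQR beyond the edges of the box.
        iqr = q3 - q1
        low_limit, high_limit = q1 - whis * iqr, q3 + whis * iqr
    else:
        # Pair: limits are percentiles on a 0 to 100 scale.
        low_limit, high_limit = np.percentile(data, whis)
    # Whiskers stop at the most extreme points inside the limits.
    inside = data[(data >= low_limit) & (data <= high_limit)]
    return inside.min(), inside.max()

print(whisker_ends(values, 1.5))      # default proportion
print(whisker_ends(values, 2))        # wider proportion
print(whisker_ends(values, (5, 95)))  # percentile limits
```

In this sample, moving from `whis=1.5` to `whis=2` pulls the value 24 inside the upper whisker, while 40 remains an outlier under all three settings.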

## Create grouped Seaborn boxplots

There may be times when you want to add another dimension to the data. In the example we have been using, it may be helpful to also split the data by gender (the `sex` column) to see how the bills differ between groups.

We can do this, similar to other Seaborn plots, using the `hue=` parameter.

Let’s add this to our plot to see how this changes the plot:

    sns.boxplot(data=df, x='day', y='total_bill', hue='sex')
    plt.title('Tips by Day')
    plt.xlabel('Day of Week')
    plt.ylabel('Total Bill Amount ($)')
    plt.show()

This returns the following image:

This allows us to see how the spread of data differs, not only by the day of week but also by gender.
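If you want to check the grouped plot against actual numbers, one option is a pandas `groupby` over the same two columns. This is only a sketch on a small made-up `demo` dataframe; the medians it prints correspond to the centre lines of the grouped boxes.

```python
import pandas as pd

# Made-up data with the same column names as the tips dataset.
demo = pd.DataFrame({
    'day': ['Sat', 'Sat', 'Sat', 'Sat', 'Sun', 'Sun', 'Sun', 'Sun'],
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'total_bill': [20.0, 18.0, 30.0, 22.0, 25.0, 21.0, 35.0, 19.0],
})

# Median bill for every day/sex combination, with one column per sex.
medians = demo.groupby(['day', 'sex'])['total_bill'].median().unstack('sex')
print(medians)
```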

## Conclusion

In this post, you learned what a boxplot is and how to create a boxplot in Seaborn. Specifically, you learned how to customize the plot by applying styles and palettes, adding a title and axis labels, and modifying different data elements within the chart. Finally, you learned how to plot a second dimension to create a grouped boxplot.
