
Seaborn Violin Plots in Python: Complete Guide


In this tutorial, you’ll learn how to create Seaborn violin plots using the sns.violinplot() function. A violin plot is similar to a box and whisker plot in that it gives a visual summary of the distribution of the data. However, the violin plot reveals much more information by also displaying the shape of the data distribution. Violin plots are particularly useful when you want to compare the distributions of multiple datasets and identify outliers.

By the end of this tutorial, you’ll have learned the following:

  • What violin plots are and when you’ll want to use them
  • How the sns.violinplot() function works
  • How to create simple violin plots in Seaborn
  • How to customize violin plots in Seaborn by splitting by color to add additional variables
  • How to create advanced violin plots in Seaborn by trimming, adding detail lines, and changing how the width of a violin plot is calculated

Understanding Violin Plots

A violin plot is very similar to a box and whisker plot, which you can also easily create in Seaborn. The plot allows you to see the distribution of quantitative data, split by one or more categorical variables. Unlike a box plot, however, the outline of each violin is a kernel density estimate (KDE) of the underlying data: a smoothed curve that is wider where observations are more concentrated.
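To make the idea of a kernel density estimate concrete, here is a minimal NumPy sketch of a Gaussian KDE. It is not Seaborn’s internal code: it assumes a Gaussian kernel and Scott’s rule for the bandwidth, and the helper name gaussian_kde_1d and the synthetic data are hypothetical. Mirroring the resulting curve around a center line gives the outline of a violin.

```python
import numpy as np

def gaussian_kde_1d(data, grid):
    """Hypothetical helper: Gaussian KDE of 1-D data evaluated on `grid`."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Scott's rule of thumb: bandwidth = n ** (-1/5) * sample standard deviation
    bw = n ** (-1 / 5) * data.std(ddof=1)
    # Center a normal curve on every observation, then average the curves
    z = (grid[:, None] - data[None, :]) / bw
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bw)

rng = np.random.default_rng(0)
tips = rng.gamma(shape=4.0, scale=0.75, size=200)  # synthetic stand-in for tip amounts
grid = np.linspace(tips.min() - 3, tips.max() + 3, 500)
density = gaussian_kde_1d(tips, grid)

# A vertical violin's outline is this curve mirrored around its center line
left_edge, right_edge = -density, density

# The density is a probability curve, so its area should be close to 1
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```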

Let’s take a look at what a violin plot can look like:

A sample violin plot created in Seaborn

Let’s break down some of the key components of the violin plot:

  1. The white dot in the center of the plot shows the median of the distribution
  2. The thicker black bar shows the interquartile range (IQR) of the data, from the 25th to the 75th percentile
  3. The thinner black lines (the whiskers) extend to the most extreme data points that lie within 1.5 times the IQR of the box
  4. The wider the violin is at a given value, the more observations fall near that value
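The statistics behind the inner box can be computed directly. The sketch below follows the conventional 1.5 × IQR whisker rule that box plots use; the helper name box_components and the sample values are hypothetical, and this is an illustration of the convention rather than Seaborn’s source code.

```python
import numpy as np

def box_components(data):
    """Hypothetical helper: the statistics behind a violin's inner box."""
    data = np.asarray(data, dtype=float)
    q25, q50, q75 = np.percentile(data, [25, 50, 75])
    iqr = q75 - q25
    # Whiskers stop at the most extreme observations within 1.5 * IQR of the box
    low = data[data >= q25 - 1.5 * iqr].min()
    high = data[data <= q75 + 1.5 * iqr].max()
    return {"median": q50, "q25": q25, "q75": q75,
            "whisker_low": low, "whisker_high": high}

# Made-up values; 10.0 lies beyond the upper whisker, so it is an outlier
sample = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 10.0]
print(box_components(sample))
```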

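We can see that the data looks quite smooth. This is because the violin shows a kernel density estimate rather than the raw observations. For small samples, this smoothing can suggest structure that isn’t really there; for larger sample sizes, however, it can be a very accurate representation of how data are distributed.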

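While a box plot summarizes a distribution with a handful of statistics, violin plots take this even further by showing the full shape of the distribution, such as whether it has more than one peak!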

Understanding the Seaborn violinplot Function

Seaborn uses the sns.violinplot() function to generate violin plots. The function has more than 20 parameters. While this may sound intimidating, you’ll learn about the important ones in this guide. In fact, you don’t need many to generate meaningful violin plots, since Seaborn helps abstract away much of the complexity.

The table below breaks down the parameters of the sns.violinplot() function, as well as their default arguments and accepted values:

Parameter | Default Argument | Description | Accepted Values
data= | None | The dataset to plot. If x and y are not included, the dataset is interpreted as wide-form. | DataFrame, array, list of arrays
x=, y=, hue= | None | Inputs for plotting long-form data | String, vector
order=, hue_order= | None | The order to plot categorical levels in. If left blank, inferred from the data. | List of strings
bw= | 'scott' | The reference rule or the scale factor to use when calculating the kernel bandwidth | {'scott', 'silverman', float}
cut= | 2 | How far to extend the density past the extreme data points, in units of bandwidth | Float
scale= | 'area' | The method used to scale the width of each violin | {'area', 'count', 'width'}
scale_hue= | True | When violins are nested using hue, whether scaling is computed within each level of the major grouping variable (True) or across all violins on the plot (False) | Boolean
gridsize= | 100 | Number of points in the discrete grid used to compute the kernel density estimate | Integer
width= | 0.8 | Width of a full element when not using hue nesting | Float
inner= | 'box' | The representation of the data points in the violin interior | {'box', 'quartile', 'point', 'stick', None}
split= | False | When using hue nesting with a variable that takes two levels, setting split to True draws half a violin for each level | Boolean
dodge= | True | When nesting by hue, whether the elements should be shifted along the categorical axis | Boolean
orient= | None | How to orient the plot | 'v', 'h'
linewidth= | None | The width of the gray lines that frame the plot | Float
color= | None | Single color for all the elements of the plot | Matplotlib color
palette= | None | Colors to use for the different levels of the hue variable | Palette name, list, dict
saturation= | 0.75 | Proportion of the original saturation to draw colors at. Setting it to 1 uses the full saturation. | Float
ax= | None | Axes object to draw the plot on | Matplotlib Axes
The parameters and default arguments of the sns.violinplot() function
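The bw= row names two reference rules for the kernel bandwidth. As a sketch, assuming the conventions of scipy.stats.gaussian_kde for one-dimensional data (a rule or a float produces a scale factor, which is then multiplied by the sample standard deviation), the bandwidth can be computed like this. The helper name kde_bandwidth is hypothetical.

```python
import numpy as np

def kde_bandwidth(data, bw="scott"):
    """Hypothetical helper: bandwidth for 1-D data from a rule name or a float."""
    data = np.asarray(data, dtype=float)
    n = data.size
    if bw == "scott":
        factor = n ** (-1 / 5)
    elif bw == "silverman":
        factor = (n * 3 / 4) ** (-1 / 5)
    else:
        factor = float(bw)  # a float is used directly as the scale factor
    # The factor scales the sample standard deviation
    return factor * data.std(ddof=1)

sample = np.arange(1.0, 33.0)  # 32 evenly spaced values
for rule in ("scott", "silverman", 0.5):
    print(rule, round(kde_bandwidth(sample, rule), 3))
```

A larger bandwidth produces a smoother, wider violin; a smaller one follows the data more closely.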

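As you can see from the table above, the function offers a lot of parameters to help you customize the violin plots that you create. Don’t be intimidated, however: you only need a few of them to get started. Note that the table reflects seaborn versions before 0.13. In seaborn 0.13, scale= and scale_hue= were deprecated in favor of density_norm= and common_norm=, and bw= in favor of bw_method= and bw_adjust=, so the older names may raise a FutureWarning. Let’s dive in.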

Loading a Sample Dataset

For this tutorial, we’ll use the 'tips' dataset that comes bundled with Seaborn. The dataset breaks down bill and tip amounts for a number of different transactions. The dataset also provides information on the time and day and additional information about the transaction.

# Loading a Sample Dataset
import seaborn as sns
df = sns.load_dataset('tips')
df.head()

# Returns:
#    total_bill   tip     sex smoker  day    time  size
# 0       16.99  1.01  Female     No  Sun  Dinner     2
# 1       10.34  1.66    Male     No  Sun  Dinner     3
# 2       21.01  3.50    Male     No  Sun  Dinner     3
# 3       23.68  3.31    Male     No  Sun  Dinner     2
# 4       24.59  3.61  Female     No  Sun  Dinner     4
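sns.load_dataset() fetches the data from Seaborn’s online example-data repository (and caches it locally), so the first call needs an internet connection. If you’re working offline, any long-form DataFrame with the same columns works with the examples in this guide. Here is a minimal stand-in built by hand from the five rows shown above; the name df_sample is hypothetical.

```python
import pandas as pd

# The five rows shown above, typed in by hand as a stand-in for the full dataset
df_sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No"] * 5,
    "day": ["Sun"] * 5,
    "time": ["Dinner"] * 5,
    "size": [2, 3, 3, 2, 4],
})
print(df_sample.dtypes)
```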

Now that you have a dataset loaded, let’s start building violin plots with Seaborn.

How to Create Python Seaborn Violin Plots

Let’s see how we can create a simple Seaborn violin plot using the sns.violinplot() function. The function makes it easy to build either single or multiple violin plots. Let’s first explore how we can create a single violin plot using Seaborn.

How to Plot a Single Violin Plot in Seaborn

The Seaborn violinplot() function uses a similar format to all plotting functions in the library. This means that we can pass in a dataset in the form of a Pandas DataFrame and then plot data using familiar x= and y= parameters. In order to create a single violin plot in Seaborn, simply pass the DataFrame into the data= parameter and a column header into the y= parameter.

Let’s see what this looks like in Seaborn and Python:

# Creating a Violin Plot with Seaborn
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('tips')

sns.violinplot(data=df, y='tip')
plt.show()

In the code block above, we passed our DataFrame, df, into the data= parameter. We also passed the column header for 'tip' into the y= parameter. With this, we’re telling Seaborn we want to plot the distribution of the 'tip' column. This returns the visualization below:

Creating a Violin Plot in Seaborn

The visualization shows the distribution of tip amounts. Note that only a single axis is populated: the y-axis, showing the tip amount. The width of the violin (the kernel density estimate) shows how densely the observations cluster around each tip amount.

We can learn quite a lot from the visualization here. For example, the median tip is around $3, and the middle half of the data falls between roughly $2 and $3.50. Let’s see how we can break this dataset down further by adding multiple violin plots.
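A quick way to check values read off a violin plot is to compute them with pandas. This sketch uses only the five tip values from the df.head() output as a stand-in; with the full df, the same df['tip'].quantile() call returns the quartiles for the whole dataset.

```python
import pandas as pd

# Stand-in: the five tip values from the df.head() output
tips = pd.Series([1.01, 1.66, 3.50, 3.31, 3.61], name="tip")

# 25th percentile, median, and 75th percentile
quartiles = tips.quantile([0.25, 0.5, 0.75])
print(quartiles)
```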

How to Plot Multiple Violin Plots in Seaborn

In order to plot multiple violin plots in Seaborn, you can pass an additional column label into the x= parameter. This splits the data by a categorical variable and draws a separate violin for each category along the x-axis. Let’s see how we can add multiple violin plots, broken out by the day of the week:

# Creating Multiple Violin Plots with Seaborn
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('tips')

sns.violinplot(data=df, x='day', y='tip')
plt.show()

In the example above, we added one additional argument: x='day'. This instructs Seaborn to split the visualization by the categorical day variable and creates one violin plot per unique day of the week. This returns the following image:

Creating Multiple Violin Plots in Seaborn

We can see that because the dataset has data for four different days, four violin plots are created. This allows you to better visualize the distribution of tips for each of these four days. For example, we can see that while the median tip is higher for Fridays than Thursdays, there are higher outliers for Thursdays.
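To back up a visual comparison like this with numbers, you can group by the same column you passed to x=. The small DataFrame below is made-up stand-in data (not the real tips values); with the real df, the same groupby() call works unchanged and returns one row per violin.

```python
import pandas as pd

# Made-up stand-in data with the same long-form layout as the tips dataset
df_demo = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Fri"],
    "tip": [1.50, 2.00, 6.00, 2.50, 3.00, 3.50],
})

# One row per day: the median tip and the largest tip
summary = df_demo.groupby("day")["tip"].agg(["median", "max"])
print(summary)
```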

How to Add Color to Seaborn Violin Plots with Hue

We can further split out Seaborn violin plots by splitting categorical variables into subcategories. For example, while we have split our data by day, we can further split each day by sex. In order to split violin plots by color, you can use the hue= parameter. This draws a separate violin, in its own color, for each level of the hue variable within every category. Let’s see what this looks like:

# Adding Additional Variables to Seaborn Violin Plots with hue
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('tips')

sns.violinplot(data=df, x='day', y='tip', hue='sex')
plt.show()

We can see from the example above that we added an additional argument, hue='sex'. This instructs Seaborn to split each day’s data further by the sex column. Because that column contains two values, Male and Female, each day now gets two side-by-side violins, as shown in the following image:

Splitting Variables by Color Using Hue in Seaborn Violin Plots

In the visualization above, we added an additional split into our violin plot. Note also that Seaborn automatically added a legend to the visualization, indicating which color refers to which sex. Seaborn also lets you show the two groups by splitting each violin in half, rather than drawing additional violins.

How to Split Seaborn Violin Plots into Variables

Rather than creating separate violin plots for each sub-variable, we can split each violin in half. This allows you to better visualize the differences between each category. In order to do this, we can pass in the split=True argument.

# Splitting Additional Variables with Hue
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('tips')
sns.violinplot(data=df, x='day', y='tip', hue='sex', split=True)
plt.show()

Keep in mind that we’re still splitting our data by using the hue= parameter. However, by passing in split=True, each violin is split in half. This returns the image below:

Splitting a Variable by Color in Seaborn Violin Plots

While this shows the same data as before, it’s now much easier to understand the differences between each gender.
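Conceptually, split=True draws the density of one hue level on the left of the center line and the other level on the right. The sketch below reproduces that idea with NumPy and Matplotlib’s fill_betweenx(); the kde() helper, the synthetic data, the shared peak scaling, and the 0.4 half-width are assumptions for illustration, not Seaborn’s implementation.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line when working interactively
import matplotlib.pyplot as plt

def kde(data, grid):
    """Hypothetical helper: Gaussian KDE using Scott's rule for the bandwidth."""
    data = np.asarray(data, dtype=float)
    bw = data.size ** (-1 / 5) * data.std(ddof=1)
    z = (grid[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (data.size * bw * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
male = rng.normal(3.0, 1.0, 150)    # synthetic stand-ins for the two hue levels
female = rng.normal(2.6, 0.8, 120)

grid = np.linspace(0, 6, 200)
d_left, d_right = kde(male, grid), kde(female, grid)
peak = max(d_left.max(), d_right.max())  # shared scale so the halves are comparable

fig, ax = plt.subplots()
center, half_width = 0.0, 0.4
# Left half: first hue level; right half: second hue level
ax.fill_betweenx(grid, center - half_width * d_left / peak, center, label="Male")
ax.fill_betweenx(grid, center, center + half_width * d_right / peak, label="Female")
ax.legend()
```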

In the following section, you’ll learn how to rotate violin plots horizontally.

How to Rotate Seaborn Violin Plots to Horizontal

In some cases, you may want to rotate your Seaborn violin plots so that the shapes are horizontal, rather than vertical. This can make it easier to see the spread of some types of distributions. In order to do this, swap the columns passed to the x= and y= parameters. In most cases, Seaborn will infer the orientation. However, if you want to be more explicit, you can pass in orient='h'. Let’s see what this looks like:

# Rotating a Seaborn Violin Plot
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('tips')
sns.violinplot(data=df, x='tip', y='day')
# sns.violinplot(data=df, x='tip', y='day', orient='h')
plt.show()

By rotating our violin plot, we return the image shown below:

Rotating a Seaborn Violin Plot to Horizontal

In the following section, you’ll learn how to plot violin plots to subplots.

How to Add Seaborn Violin Plots to Subplots

In some cases, you’ll want to create multiple Seaborn plots using subplots. This allows you to show different distributions in the same figure. For example, we can visualize how the distribution for the tip and total_bill columns differ. Let’s see how this can be done:

# Using Subplots with Seaborn Violin Plots
import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('tips')
fig = plt.figure(figsize=(10, 8))
grid = fig.add_gridspec(1, 2)

ax = fig.add_subplot(grid[0, 0])
sns.violinplot(data=df, y='tip', ax=ax)  # draw on the left cell

ax = fig.add_subplot(grid[0, 1])
sns.violinplot(data=df, y='total_bill', ax=ax)  # draw on the right cell

fig.tight_layout()
plt.show()

In the example above, we used subplots to add two different distributions to the same figure. We did this by creating a grid spec with one row and two columns. From there, we added an axes object to each cell of the grid spec using the add_subplot() method. This returns the following image:

Adding Violin Plots to Subplots in Matplotlib
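A slightly more compact alternative is to create both axes up front with plt.subplots() and hand each one to sns.violinplot() through its ax= parameter (listed in the parameter table earlier in this guide). To keep the sketch runnable without downloading the tips dataset, it uses synthetic data with the same column names; the titles are illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line when working interactively
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(2)
# Synthetic stand-in with the same column names as the tips dataset
df = pd.DataFrame({
    "tip": rng.gamma(4.0, 0.75, 200),
    "total_bill": rng.gamma(5.0, 4.0, 200),
})

# One row, two columns; each violin goes onto its own axes via ax=
fig, (ax_tip, ax_bill) = plt.subplots(1, 2, figsize=(10, 8))
sns.violinplot(data=df, y="tip", ax=ax_tip)
sns.violinplot(data=df, y="total_bill", ax=ax_bill)
ax_tip.set_title("tip")
ax_bill.set_title("total_bill")
fig.tight_layout()
```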

In the following sections, we’ll explore how to show different data elements in violin plots.

How to Show Data in Seaborn Violin Plots

Seaborn provides several options to show the underlying data in violin plots. For example, you can show each observation as a point by layering a strip plot on top. Alternatively, you can show each observation as a short line across the violin.

Let’s dive into how this can be done in Seaborn!

How to Show Data as Points in Seaborn Violin Plots

To show data as points within the Seaborn violin plot, we can layer in an additional plot, the strip plot. Because both functions draw onto the current Matplotlib axes by default, calling one after the other places the points on top of the violins. Let’s see how this works:

# Show Data as Points in Seaborn Violin Plots
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('tips')
sns.violinplot(data=df, x='day', y='tip')
sns.stripplot(data=df, x='day', y='tip', color='black', alpha=0.5)
plt.show()

In the code above, we used the same arguments for data=, x=, and y= in both calls. For the strip plot, we also set color='black' so the points stand out against the violins, and alpha=0.5 so overlapping points remain visible. This returns the image below:

Adding Data as Points to Violin Plots in Seaborn

In the image above, the distribution is shown as both a violin plot and as data points, relative to their distribution. We can also show our data as lines, rather than points, which is what you’ll learn in the following section.
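The layering works because each Seaborn axes-level function draws on the current Matplotlib axes and returns it. The sketch below makes that explicit by capturing the axes returned by sns.violinplot() and passing it to sns.stripplot() through ax=. The synthetic data is an assumption so the example runs without downloading the tips dataset, and inner=None is an optional choice that removes the inner box so it doesn’t compete with the points.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line when working interactively
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(3)
# Synthetic stand-in: 50 made-up tips for each of four days
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 50),
    "tip": rng.gamma(4.0, 0.75, 200),
})

# Draw the violins without an inner box, then put the points on the same axes
ax = sns.violinplot(data=df, x="day", y="tip", inner=None)
same_ax = sns.stripplot(data=df, x="day", y="tip", color="black", alpha=0.5, ax=ax)
```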

How to Show Data as Lines in Seaborn Violin Plots

The Seaborn violinplot() function also allows you to show data as lines using the inner= parameter. By passing 'stick' into the inner= parameter, the function draws one horizontal line for each observation. Each line spans the width of the violin at that value, so regions with many observations show up as dense clusters of lines.
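The geometry behind inner='stick' can be sketched in a few lines of NumPy: each observation gets one horizontal segment at its own value, and the segment’s length follows the density at that value. The Gaussian KDE with Scott’s rule, the 0.4 half-width, and scaling so the densest observation reaches that half-width are assumptions; the helper stick_segments is hypothetical and not Seaborn’s code.

```python
import numpy as np

def stick_segments(data, center=0.0, half_width=0.4):
    """Hypothetical helper: one (x_left, x_right, y) segment per observation."""
    data = np.asarray(data, dtype=float)
    bw = data.size ** (-1 / 5) * data.std(ddof=1)  # Scott's rule bandwidth
    # KDE evaluated at each observation's own value
    z = (data[:, None] - data[None, :]) / bw
    dens = np.exp(-0.5 * z ** 2).sum(axis=1) / (data.size * bw * np.sqrt(2 * np.pi))
    # Scale so the densest observation reaches half_width on each side
    widths = half_width * dens / dens.max()
    return [(center - w, center + w, y) for w, y in zip(widths, data)]

segments = stick_segments([1.0, 2.0, 2.1, 2.2, 3.0, 5.0])
for x0, x1, y in segments:
    print(f"y={y:.1f}: x from {x0:.3f} to {x1:.3f}")
```

The values clustered around 2 produce long sticks, while the isolated value at 5.0 produces a short one.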

# Show Data as Lines in Seaborn Violin Plots
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('tips')
sns.violinplot(data=df, x='day', y='tip', inner='stick')
plt.show()

The inner= parameter accepts several different options. 'stick' and 'point' draw a line or a point for each observation, respectively. 'quartile' draws lines at the 25th percentile, the median, and the 75th percentile, and 'box' draws a small inner box plot (and is the default argument). Passing None leaves the interior of the violins empty. By passing in 'stick', we return the image below:

Showing Data as Lines in Seaborn Violin Plots

In the following section, you’ll learn how to control how far Seaborn violin plots extend beyond the data.

How to Limit Seaborn Violin Plots to the Range of the Data

By default, Seaborn extends each violin past the most extreme data points by two bandwidths of the kernel density estimate (cut=2). This gives the violins smooth, tapered ends, but it also means a violin can show density where there is no data, such as below the smallest observed tip. Outliers are never removed by this setting; the cut= parameter only controls how far each violin extends beyond the observed data.

For example, if we want each violin to cover exactly the range of the observed data, we can use cut=0. If we changed the value to 1.75, each violin would extend 1.75 bandwidths past the smallest and largest values.
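The effect of cut= on the range a violin covers can be sketched with NumPy, following the parameter table (cut is measured in bandwidths) and the default gridsize=100. Scott’s rule for the bandwidth, the made-up values, and the helper name violin_support are assumptions for illustration.

```python
import numpy as np

def violin_support(data, cut=2.0, gridsize=100):
    """Hypothetical helper: the values a violin's density is evaluated on."""
    data = np.asarray(data, dtype=float)
    bw = data.size ** (-1 / 5) * data.std(ddof=1)  # Scott's rule bandwidth
    # Extend past the smallest and largest observations by `cut` bandwidths
    return np.linspace(data.min() - cut * bw, data.max() + cut * bw, gridsize)

tips = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 10.0])  # made-up values
default = violin_support(tips)          # cut=2: reaches below the smallest tip
trimmed = violin_support(tips, cut=0)   # cut=0: stops exactly at the data
print(default.min(), default.max())
print(trimmed.min(), trimmed.max())
```

Note that the outlier at 10.0 stays inside the violin in both cases; only the extra tails change.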

Let’s see how we can limit each violin to the observed range of our distribution:

# Include Outliers in Seaborn Violin Plots
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('tips')
sns.violinplot(data=df, x='day', y='tip', cut=0)
plt.show()

In the image below, it may look like we’re cutting our graph off early. However, we’re only removing the extension that Seaborn adds past the data by default. With cut=0, each violin starts at that day’s smallest tip and ends at its largest tip, so every observation, including any outliers, still falls inside the violin.

Limiting Seaborn Violin Plots to the Range of the Data

In the following section, you’ll learn how to modify how Seaborn calculates the width of each violin plot.

How to Change the Scaling Rule in Seaborn Violin Plots

Seaborn allows you to modify how the violins are shaped using the scale= parameter. In fact, it provides three different options for this parameter:

  • 'width' indicates that each violin should have the same width,
  • 'area' indicates that each violin should have the same area (and is the default parameter), and
  • 'count' indicates that the width of each violin should be scaled by the number of observations in that category
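To see how the three rules differ, the sketch below rescales two made-up density curves following the descriptions in the list. It assumes each violin’s density has already been computed on a shared grid; the helper scale_violins and its arithmetic illustrate the rules and are not Seaborn’s source code.

```python
import numpy as np

def scale_violins(densities, counts, rule="area"):
    """Hypothetical helper: rescale densities so 1.0 means full violin width."""
    densities = [np.asarray(d, dtype=float) for d in densities]
    if rule == "area":
        # One shared divisor keeps the areas equal (each KDE integrates to 1)
        peak = max(d.max() for d in densities)
        return [d / peak for d in densities]
    if rule == "width":
        # Each violin reaches full width at its own peak
        return [d / d.max() for d in densities]
    if rule == "count":
        # Full width for the largest group, proportionally narrower for the others
        biggest = max(counts)
        return [d / d.max() * n / biggest for d, n in zip(densities, counts)]
    raise ValueError(f"unknown rule: {rule!r}")

grid = np.linspace(-4, 4, 401)
narrow = np.exp(-0.5 * (grid / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))  # tall peak
wide = np.exp(-0.5 * (grid / 1.5) ** 2) / (1.5 * np.sqrt(2 * np.pi))    # flat peak
for rule in ("area", "width", "count"):
    scaled = scale_violins([narrow, wide], counts=[20, 80], rule=rule)
    print(rule, [round(s.max(), 2) for s in scaled])
```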

Let’s take a look at how we can modify the parameter to scale each violin to be the same width:

# Modify the Scaling Rule for Seaborn Violin Plots
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('tips')
sns.violinplot(data=df, x='day', y='tip', scale='width')
plt.show()

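By setting this parameter, each violin plot will now have the same width. Because each violin is scaled to its own peak density, this makes it easier to compare the shapes of the distributions, even when the categories contain very different numbers of observations. Keep in mind that the widths can no longer be compared across violins.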

Changing how Violin Plots are Scaled in Seaborn

In the final sections, you’ll learn how to customize your violin plots. First, we’ll take a look at how to add titles and labels to the plots and then how to modify the color palette of your visualization.

How to Add Titles and Labels to Seaborn Violin Plots

Seaborn makes adding titles and axis labels to your visualizations simple and intuitive. Because sns.violinplot() returns the Matplotlib Axes object it draws on, we can use Axes methods to set these customizations. For example, we can pass the following parameters to the .set() method:

Parameter | To Set
title= | The title of our visualization
xlabel= | The x-axis label of our visualization
ylabel= | The y-axis label of our visualization
Parameters of the ax.set() method to customize titles and labels

Let’s see what this looks like in Python:

# Add Titles to Seaborn Violin Plots
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('tips')
ax = sns.violinplot(data=df, x='day', y='tip')
ax.set(
    title='A Violin Plot Made in Seaborn', 
    xlabel='Weekday', 
    ylabel='Tip Amount'
    )
plt.show()

In the code block above, we customized the title and axis labels of our violin plot. This returned the following visualization:

Adding Titles and Axis Labels to Seaborn Violin Plots

In the section below, you’ll learn how to use Seaborn’s built-in color palettes to customize the coloring of the charts.

How to Change the Color Palette in Seaborn Violin Plots

Seaborn provides a number of different color palettes, which are covered in detail in a separate guide. However, to keep things simple and actionable, you can pass a palette name into the palette= parameter.

# Change the Color Palette in Seaborn Violin Plots
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('tips')
sns.violinplot(data=df, x='day', y='tip', palette='pastel')
plt.show()

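In the code block above, we passed in palette='pastel', which indicates that we want to use that specific palette. In seaborn 0.13 and later, passing palette= without hue= raises a FutureWarning; also passing hue='day' and legend=False gives the same result without the warning. This returns the following image: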

Modifying the Color Palette in Seaborn Violin Plots

Seaborn provides a lot of flexibility in terms of customizing how we want to style our data visualizations. Using palettes is a simple option to easily add style to your data.
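If you want to preview the colors behind a palette name, or reuse them elsewhere, sns.color_palette() returns them as a list of RGB tuples. The sketch below requests four pastel colors, one per day in the tips data; since the parameter table lists a list as an accepted value for palette=, this list can be passed back in directly.

```python
import seaborn as sns

# Four RGB tuples with channel values between 0 and 1
pastel = sns.color_palette("pastel", 4)
print(pastel.as_hex())
```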

Conclusion

In this guide, you learned how to use the Seaborn violinplot() function to create informative violin plots in Seaborn. You first learned what violin plots are and when you may want to use them. From there, you learned about the sns.violinplot() function and its various parameters.

Then, you began to walk through hands-on examples. We first created a number of different simple violin plots. From there, we customized the visualizations by adding additional details using color and inner data representations. Finally, we modified the axes objects by customizing titles, axis labels, and the color palette.


Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
