Seaborn catplot - Categorical Data Visualizations in Python • datagy

In this tutorial, you’ll learn how to create Seaborn categorical plots using the sns.catplot() function. Categorical plots show the relationship between a numerical variable and one or more categorical variables. Seaborn provides many different categorical data visualization functions, covering categorical scatterplots, categorical distribution plots, and categorical estimate plots.

Seaborn provides dedicated functions for each of these visualizations. So, why would you want to use the catplot() function? The Seaborn catplot() function is a figure-level function, rather than an axes-level function. This opens up different possibilities in terms of how you put together your visualizations, such as splitting your data across multiple subplots.

By the end of this tutorial, you’ll have learned the following:

What the Seaborn catplot() function is
When to use the Seaborn catplot() function instead of the dedicated functions
How to plot multiple plots using the sns.catplot() figure-level function
How to customize titles, colors, and more


Understanding the Seaborn catplot() Function

The Seaborn catplot() function is used to create figure-level categorical plots on a Seaborn FacetGrid. You can customize the type of visualization that is created by using the kind= parameter.

Because catplot() is figure-level, it maps your plots onto a whole figure rather than onto a single axes object, which opens up many more possibilities.

Let’s take a look at how the function is written:

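# Understanding the Seaborn catplot() Function
# Signature shown for reference only; it isn't meant to be run as written,
# and newer Seaborn versions may add parameters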
import seaborn as sns
sns.catplot(data=None, *, x=None, y=None, hue=None, row=None, col=None, col_wrap=None, estimator='mean', errorbar=('ci', 95), n_boot=1000, units=None, seed=None, order=None, hue_order=None, row_order=None, col_order=None, height=5, aspect=1, kind='strip', native_scale=False, formatter=None, orient=None, color=None, palette=None, hue_norm=None, legend='auto', legend_out=True, sharex=True, sharey=True, margin_titles=False, facet_kws=None, ci='deprecated', **kwargs)

The function allows you to plot the following visualization types, modified by the kind= parameter:

Categorical Chart Type	Plot Type	Function Argument	Dedicated Function
Categorical Scatter Plot	Strip Plot	`kind='strip'`	`sns.stripplot()`
Categorical Scatter Plot	Swarm Plot	`kind='swarm'`	`sns.swarmplot()`
Categorical Distribution Plot	Box Plot	`kind='box'`	`sns.boxplot()`
Categorical Distribution Plot	Violin Plot	`kind='violin'`	`sns.violinplot()`
Categorical Distribution Plot	Boxen Plot	`kind='boxen'`	`sns.boxenplot()`
Categorical Estimate Plot	Point Plot	`kind='point'`	`sns.pointplot()`
Categorical Estimate Plot	Bar Plot	`kind='bar'`	`sns.barplot()`
Categorical Estimate Plot	Count Plot	`kind='count'`	`sns.countplot()`

The different visualizations you can make with the Seaborn catplot() function

Some of these visualizations are a little bit more specific and niche. The image below shows what a similar distribution looks like using different plots:

Different Categorical Plots Made with Seaborn catplots

The function has a very similar interface to the other categorical plotting functions. Let’s take a look at some of the key options:

data= provides the data to plot via a Pandas DataFrame
x= and y= provide the variables to plot on the x- and y-axis respectively
hue= adds an additional variable to plot via a color mapping

Additionally, the function offers some extra parameters that the axes-level categorical functions don’t provide. Let’s explore these:

kind= determines what type of chart to create. By default, it will create a strip plot, using the keyword argument 'strip'
row= allows you to split your dataset into additional rows of visualizations
col= allows you to split your dataset into additional columns of visualizations
height= and aspect= control the size of your data visualization

Now that you have a strong understanding of what’s possible, let’s dive into how we can use the function to create useful data visualizations.

Loading a Sample Dataset

To follow along with this tutorial, let’s use a dataset provided by the Seaborn library. We’ll use the popular Penguins dataset, which I cover in detail in my K-Nearest Neighbor tutorial, if you’d like to learn more about the dataset.

Let’s see how we can read the dataset and explore its first record:

# Exploring the Sample Dataset
import seaborn as sns
df = sns.load_dataset('penguins')

print(df.iloc[0])

# Returns:
# species                 Adelie
# island               Torgersen
# bill_length_mm            39.1
# bill_depth_mm             18.7
# flipper_length_mm        181.0
# body_mass_g             3750.0
# sex                       Male
# Name: 0, dtype: object

We printed out the first record of the dataset using the iloc accessor. We can see that we have a variety of variables available to us, including some categorical ones as well as some continuous ones.
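If you want to check programmatically which columns are categorical and which are numeric, one option is pandas’ select_dtypes(). This sketch rebuilds a one-row DataFrame from the record printed above so it runs offline; the helper split_columns() is hypothetical, and with the full dataset you would pass in df instead.

```python
import pandas as pd

def split_columns(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Split column labels into (non-numeric, numeric) based on their dtypes."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    non_numeric = [col for col in df.columns if col not in numeric]
    return non_numeric, numeric

# A one-row DataFrame rebuilt from the record printed above
record = pd.DataFrame([{
    "species": "Adelie", "island": "Torgersen", "bill_length_mm": 39.1,
    "bill_depth_mm": 18.7, "flipper_length_mm": 181.0,
    "body_mass_g": 3750.0, "sex": "Male",
}])

categorical, numeric = split_columns(record)
print("Categorical:", categorical)
print("Numeric:", numeric)
```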

Creating a Basic catplot with Seaborn

By default, the Seaborn catplot() function will create a categorical scatterplot (a strip plot). In order to create the most basic visualization, we can simply pass in the following parameters:

data= to pass in our DataFrame
x= and y= to pass in the column labels that we want to explore in a scatterplot

Let’s see what this code looks like:

# By default Seaborn will return a Strip Plot
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')

sns.catplot(data=df, x='island', y='bill_length_mm')
plt.show()

In the code block above, we passed in our DataFrame df as well as the 'island' and 'bill_length_mm' column labels. This returned the following visualization:

Because we didn’t pass in a kind= argument, Seaborn created a strip plot for us. The plot shows every individual observation, so we can see how bill lengths are distributed within each island. Note that one variable is categorical and the other is continuous.

Creating a Bar Chart with Seaborn catplot

While the Seaborn catplot() function defaults to creating strip plots, we can also create bar charts by passing in kind='bar'. This is especially useful when you want to aggregate data to a single measure per category, such as the mean.

# Creating a Bar Chart with Seaborn's catplot()
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')

sns.catplot(data=df, x='island', y='bill_length_mm', kind='bar')
plt.show()

In the code block above, we added one additional keyword argument: kind=. This allowed us to create an entirely different data visualization, as shown below:

Creating a Bar Chart with Seaborn Catplot

Because the catplot() function uses the barplot() function under the hood, the behavior is the same. By default, the function aggregates the data to a single value per category. Because each island has many penguins, Seaborn plots the mean bill length for each island.
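To make the aggregation concrete, here’s a pure-Python sketch of what the bar heights represent with the default estimator='mean'. The helper mean_by_category() and the measurements are hypothetical, not taken from the penguins dataset.

```python
from collections import defaultdict
from statistics import mean

def mean_by_category(categories, values):
    """Group values by category and return each group's mean."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    return {category: mean(vals) for category, vals in groups.items()}

# Hypothetical bill lengths (mm), not from the real dataset
islands = ["Biscoe", "Dream", "Biscoe", "Torgersen", "Dream", "Dream"]
lengths = [40.0, 38.0, 42.0, 39.0, 44.0, 41.0]

print(mean_by_category(islands, lengths))
# {'Biscoe': 41.0, 'Dream': 41.0, 'Torgersen': 39.0}
```

With pandas, a comparable calculation on the real data would be df.groupby('island')['bill_length_mm'].mean().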

Aggregating like this hides how much the underlying data vary, which is why Seaborn adds error bars to its visualization. Let’s explore these error bars a little further.

Understanding Error Bars in Seaborn catplot

When data are aggregated in Seaborn catplots, Seaborn will add an error bar to the visualization. In the example above, we created a bar plot, which returned the mean value for each category.

By default, Seaborn uses a process called bootstrapping to draw a 95% confidence interval around each estimate (here, the mean).

This means that Seaborn resamples the data with replacement, calculates the mean of each resample, and repeats this process many times. By default, this is repeated 1,000 times (n_boot=1000) for each category on the x-axis.

The resulting error bar is a 95% confidence interval for the mean: it describes the uncertainty in the estimate, not the range in which new individual values will fall. However, you also have the option to modify both the confidence level and the number of bootstrap iterations Seaborn performs, using the n_boot= parameter for the latter.
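Here’s a simplified, pure-Python sketch of a percentile bootstrap for the mean, to illustrate the idea behind the default errorbar=('ci', 95). It is not Seaborn’s implementation, which differs in details; the function bootstrap_ci() and the sample values are hypothetical.

```python
import random
from statistics import mean

def bootstrap_ci(values, level=95, n_boot=1000, seed=None):
    """Percentile bootstrap confidence interval for the mean (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and record each resample's mean
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    # Keep the central `level` percent of the bootstrap means
    tail = (100 - level) / 200
    lower = boot_means[int(tail * (n_boot - 1))]
    upper = boot_means[round((1 - tail) * (n_boot - 1))]
    return lower, upper

# Hypothetical sample of bill lengths (mm)
sample = [36.7, 38.9, 39.1, 39.5, 40.3, 41.1, 42.0, 37.8, 38.6, 39.2]

ci95 = bootstrap_ci(sample, level=95, seed=1)
ci99 = bootstrap_ci(sample, level=99, seed=1)
print("95%:", ci95)
print("99%:", ci99)
```

Because both calls use the same seed, they share the same bootstrap means, so the 99% interval always contains the 95% one: a higher confidence level requires a wider interval.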

We can modify the confidence level by passing in a tuple of the form ('ci', n), where n represents the percentage we want to use. Let’s modify our error bars to show a 99% confidence interval:

# Modifying Error Bars in Seaborn catplot
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')

sns.catplot(data=df, x='island', y='bill_length_mm', kind='bar', errorbar=('ci', 99))
plt.show()

This returns the following visualization. Note that the error bars are now wider, since a 99% confidence interval has to cover a larger range of plausible values for the mean than a 95% interval does.

The errorbar= parameter is part of the catplot() signature shown earlier and is passed on to the barplot() function. Beyond the parameters in its signature, catplot() forwards any additional keyword arguments to the underlying axes-level function, so you can use options that only exist in barplot() when you pass in kind='bar'.

What if we want to change the type of error calculation? Seaborn makes this easy as well!

Seaborn accepts the following error bar calculations: 'ci', 'pi', 'se', or 'sd', which represent the following calculations:

'ci': confidence interval, which calculates the non-parametric uncertainty
'pi': percentile interval, which calculates the non-parametric spread
'se': standard error, which calculates the parametric uncertainty
'sd': standard deviation, which calculates the parametric spread
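To make the percentile and parametric options concrete, here’s a pure-Python sketch of the intervals that 'pi', 'se', and 'sd' describe for a small list of numbers. The helper names are hypothetical and the sketch follows the descriptions above rather than Seaborn’s source; 'ci' requires bootstrapping and is left out. It assumes a 95% width for 'pi' and a scale of 1 for 'se' and 'sd', and uses the sample standard deviation (n - 1 denominator).

```python
from statistics import mean, stdev

def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def error_interval(values, method, level=None):
    """Return (low, high) for the 'pi', 'se', or 'sd' error bar methods."""
    if method == "pi":
        width = 95 if level is None else level  # percentage of the data covered
        s = sorted(values)
        edge = (100 - width) / 2
        return percentile(s, edge), percentile(s, 100 - edge)
    scale = 1 if level is None else level  # multiplier on the spread
    m = mean(values)
    if method == "sd":
        half = scale * stdev(values)
    elif method == "se":
        half = scale * stdev(values) / len(values) ** 0.5
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return m - half, m + half

values = [1.0, 2.0, 3.0, 4.0, 5.0]
for method in ("pi", "se", "sd"):
    print(method, error_interval(values, method))
```

Note that the 'pi' interval is built from the data itself, so it isn’t necessarily centered on the mean, while 'se' and 'sd' are symmetric around it.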

Let’s now dive back into customizing our categorical plot by adding color.

Modifying Seaborn catplot with Color

We can add additional detail to our Seaborn graphs by using color. This allows you to add additional dimensions (or columns of data) to your visualization: while the graph itself remains two-dimensional, color lets it show a third variable.

We can add these using the hue= parameter, which maps an additional variable to color. Let’s explore how we can add additional levels of detail using color.

Adding Color to Seaborn Catplot

To add an additional variable into your Seaborn catplot(), you can use the hue= parameter to pass in a DataFrame column that will break the data into multiple colors.

Seaborn will assign one color to each unique value in that column.

Adding Color Styles versus Adding Color Dimensions

In this case, we’ll be adding color to represent a different dimension of data. If, instead, you only wanted to control which colors are used (the styling of your plot), you could use the palette= parameter.

Let’s see how we can use Seaborn to add more detail to our plot using the hue= parameter:

# Adding Color with Hue
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')

sns.catplot(data=df, x='island', y='bill_length_mm', kind='violin', hue='sex')
plt.show()

In the code block above, we switched to a violin plot (kind='violin') and passed in hue='sex'. This means that we want to color the violins differently based on the sex of the penguin. This returns the following image:

Adding Color with hue in a Seaborn catplot

We can see that splitting each violin by sex makes the differences between male and female penguins much easier to spot.
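If you also want to control the colors themselves, palette= and hue_order= can be combined with hue=. Here’s a minimal sketch using a randomly generated stand-in for the penguins columns (the values are hypothetical); palette='Set2' is one of the named palettes Seaborn accepts and is chosen here purely for styling.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical, randomly generated stand-in for the penguins columns
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "island": np.repeat(["Biscoe", "Dream"], 40),
    "sex": np.tile(["Male", "Female"], 40),
    "bill_length_mm": rng.normal(loc=42, scale=4, size=80),
})

g = sns.catplot(
    data=demo, x="island", y="bill_length_mm", kind="violin",
    hue="sex",                     # data dimension: one color per unique value
    hue_order=["Female", "Male"],  # fixes the order in which levels get colors
    palette="Set2",                # styling only: which colors are used
)
plt.close(g.figure)
```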

Creating Subsets of Plots with Rows and Columns

Seaborn provides significant flexibility in creating subsets of plots (or, subplots) by spreading data across rows and columns. This allows you to generate “small multiples” of plots.

Rather than splitting a visualization using color or style (though you can do this, too), Seaborn will split the visualization into multiple subplots. However, rather than needing to explicitly define the subplots, Seaborn will plot them onto a FacetGrid for you.

Let’s now explore how we can add columns of data visualizations first.

Adding Columns to Seaborn Catplot

In order to create columns of subplots, we can use the col= parameter. The parameter accepts either a Pandas DataFrame column label or an array of data. Let’s split our data visualization into columns based on the sex of each penguin:

# Adding Columns of Data
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')

sns.catplot(data=df, x='island', y='bill_length_mm', kind='violin', col='sex')
plt.show()

In the code block above, we instructed Seaborn to create columns of small multiples with the 'sex' column. This means that Seaborn will create an individual subplot in the broader FacetGrid for each unique value in the 'sex' column.

Adding Columns of Data with Seaborn catplot
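Under the hood, each facet is its own Matplotlib Axes. A quick way to confirm how many were created is to inspect the FacetGrid’s axes array and its axes_dict, which maps each facet’s value to its Axes. This sketch uses hypothetical, randomly generated data so it runs offline.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical, randomly generated stand-in for the penguins columns
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "island": np.tile(["Biscoe", "Dream", "Torgersen"], 20),
    "sex": np.repeat(["Male", "Female"], 30),
    "bill_length_mm": rng.normal(loc=42, scale=4, size=60),
})

g = sns.catplot(data=demo, x="island", y="bill_length_mm", kind="violin", col="sex")
print(g.axes.shape)         # (rows, columns): one column per unique value of "sex"
print(list(g.axes_dict))    # facet values that map to their Axes
plt.close(g.figure)
```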

But, what happens when we have a lot of unique values? Seaborn will add one column per unique value, which can make the figure very wide.

Because of this, we can wrap the columns using the col_wrap= parameter. The parameter accepts an integer representing how many columns we should have before the charts are wrapped down to another row.

# Adding Column Wrap to Charts
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')

sns.catplot(data=df, x='sex', y='bill_length_mm', kind='violin', col='island', col_wrap=2)
plt.show()

This returns the following data visualization, where our small multiples have been wrapped after the second column, so the third island’s plot moves down to a new row:

Wrapping Columns of Small Multiples in Seaborn catplot
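The layout arithmetic behind col_wrap= is simple: facets fill the grid left to right, then top to bottom, and the number of rows is the number of facets divided by col_wrap, rounded up. The helpers below are hypothetical and only illustrate that arithmetic for three facets (one per island).

```python
import math

def wrapped_grid_shape(n_facets: int, col_wrap: int) -> tuple[int, int]:
    """(rows, columns) of a grid whose facets wrap after col_wrap columns."""
    return math.ceil(n_facets / col_wrap), col_wrap

def facet_position(index: int, col_wrap: int) -> tuple[int, int]:
    """(row, column) of the index-th facet, filled left to right, then top to bottom."""
    return divmod(index, col_wrap)

n_islands = 3
print(wrapped_grid_shape(n_islands, col_wrap=2))
for i in range(n_islands):
    print("facet", i, "->", facet_position(i, col_wrap=2))
```

With three facets and col_wrap=2, the grid has two rows and two columns, and the last cell stays empty.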

In the following section, you’ll learn how to also add additional rows of visualizations.

Adding Rows to Seaborn Catplot

Seaborn also allows you to pass in rows of small multiples. This works in the same way as adding columns. However, you can also combine the row= parameter with the col= parameter to create rows and columns of small multiples.

Let’s see what this looks like:

# Adding Columns and Rows of Data
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')

sns.catplot(data=df, x='island', y='bill_length_mm', kind='violin', row='sex', col='species')
plt.show()

In the code block above, we passed in row='sex' and col='species' to split the small multiples based on both of these columns. This returns the following data visualization:

Adding Rows and Columns of Small Multiples in Seaborn Catplot
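When you pass in both row= and col=, the FacetGrid’s axes_dict is keyed by (row value, column value) tuples, which makes it easy to customize individual subplots. Here’s a sketch that labels each facet in its corner, using hypothetical, randomly generated data in place of the penguins dataset.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical, randomly generated stand-in for the penguins columns
rng = np.random.default_rng(3)
demo = pd.DataFrame({
    "island": np.tile(["Biscoe", "Dream"], 60),
    "species": np.repeat(["Adelie", "Chinstrap", "Gentoo"], 40),
    "sex": np.tile(["Male", "Male", "Female", "Female"], 30),
    "bill_length_mm": rng.normal(loc=44, scale=5, size=120),
})

g = sns.catplot(data=demo, x="island", y="bill_length_mm", kind="violin",
                row="sex", col="species")

# With both row= and col=, keys are (row value, column value) tuples
for (sex, species), ax in g.axes_dict.items():
    # Write the facet's identity in its top-left corner (axes coordinates)
    ax.text(0.02, 0.95, f"{species} / {sex}", transform=ax.transAxes, va="top")

print(g.axes.shape)  # (number of rows, number of columns)
plt.close(g.figure)
```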

Let’s now take a look at how we can customize the data visualizations by adding titles and axis labels in our charts.

Changing Titles and Axis Labels in Seaborn Catplot

Adding titles and descriptive axis labels is a great way to make your data visualization more communicative. In many cases, your readers will want to know specifically what a data point and graph represent. Because of this, it’s important to understand how to customize these in Seaborn.

Adding a Title to a Seaborn Catplot

To add a title to a Seaborn catplot(), we can use the fig.suptitle() method available in Matplotlib. In order to do this, we’ll need to first adjust the spacing of our figure object. This process can be a bit heuristic and require some trial and error.

Take a look at the code block below:

# Adding Titles to Seaborn Catplot
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')

cat = sns.catplot(data=df, x='island', y='bill_length_mm', kind='violin', row='sex', col='species')
cat.fig.subplots_adjust(top=0.92)
cat.fig.suptitle('Comparing Penguins')

plt.show()

In the code block above, we made a number of important changes:

We assigned the catplot to a variable, cat
We then adjusted the top margin using fig.subplots_adjust()
Then, we passed in a suptitle() onto the figure object

This returned the following data visualization:

Similarly, we can customize the titles of each of the subplots that we create. Let’s take a look at that next.

Adding Titles to Rows and Columns in Seaborn Catplot

Seaborn provides incredibly flexible formatting options for styling small multiples created with the col= and row= parameters.

# Adding Row and Column Titles
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')

cat = sns.catplot(data=df, x='island', y='bill_length_mm', kind='violin', row='sex', col='species')
cat.fig.subplots_adjust(top=0.92)
cat.fig.suptitle('Comparing Penguins')
cat.set_titles(row_template='Penguin Sex: {row_name}', col_template='Penguin Species: {col_name}')

plt.show()

In the code block above, we used the .set_titles() method, which is available to FacetGrid objects. Its row_template= and col_template= parameters accept template strings in which the {row_name} and {col_name} placeholders are filled in for each subplot, similar to Python’s str.format().

This returns the data visualization below:

Adding Titles to Rows and Columns of Small Multiples in Seaborn catplots
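To check what .set_titles() produced, you can read a subplot’s title back with Matplotlib’s get_title(). Here’s a sketch on hypothetical, randomly generated data with the same row= and col= variables as above.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical, randomly generated stand-in for the penguins columns
rng = np.random.default_rng(4)
demo = pd.DataFrame({
    "island": np.tile(["Biscoe", "Dream"], 60),
    "species": np.repeat(["Adelie", "Chinstrap", "Gentoo"], 40),
    "sex": np.tile(["Male", "Male", "Female", "Female"], 30),
    "bill_length_mm": rng.normal(loc=44, scale=5, size=120),
})

g = sns.catplot(data=demo, x="island", y="bill_length_mm", kind="violin",
                row="sex", col="species")
g.set_titles(row_template="Penguin Sex: {row_name}",
             col_template="Penguin Species: {col_name}")

# Read back the title that was written on one facet
title = g.axes_dict[("Male", "Adelie")].get_title()
print(title)
plt.close(g.figure)
```

When both row= and col= are used, each subplot’s title combines the filled-in row and column templates.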

In the following section, you’ll learn how to customize the axis labels in a Seaborn catplot.

Changing Axis Labels in Seaborn Catplot

By default, Seaborn will use the column labels as the axis labels in the visualization. In many cases, however, these aren’t very descriptive labels. Because the catplot() function returns a FacetGrid object, we can use helper methods to solve this, including:

.set_xlabels() which sets the x-axis label
.set_ylabels() which sets the y-axis label
.set_axis_labels() which sets both the x- and y-axis labels at once
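As a minimal sketch of .set_axis_labels(), here’s a single call that sets both labels at once. It uses hypothetical, randomly generated data. By default, FacetGrid writes x-axis labels on the bottom row of subplots and y-axis labels on the left column, so the example reads them back from the bottom-left subplot.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical, randomly generated stand-in for the penguins columns
rng = np.random.default_rng(5)
demo = pd.DataFrame({
    "island": np.tile(["Biscoe", "Dream", "Torgersen"], 20),
    "sex": np.repeat(["Male", "Female"], 30),
    "bill_length_mm": rng.normal(loc=42, scale=4, size=60),
})

g = sns.catplot(data=demo, x="island", y="bill_length_mm", kind="violin", col="sex")
g.set_axis_labels("Island Name", "Bill Length (mm)")  # x label first, then y label

# x labels live on the bottom row, y labels on the left column
bottom_left = g.axes[-1, 0]
print(bottom_left.get_xlabel(), "|", bottom_left.get_ylabel())
plt.close(g.figure)
```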

Let’s see what this looks like in Seaborn:

# Modifying Axis Labels in Seaborn
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')

cat = sns.catplot(data=df, x='island', y='bill_length_mm', kind='violin', row='sex', col='species')
cat.fig.subplots_adjust(top=0.92)
cat.fig.suptitle('Comparing Penguins')
cat.set_titles(row_template='Penguin Sex: {row_name}', col_template='Penguin Species: {col_name}')

cat.set_xlabels('Island Name')       # x= is the 'island' column
cat.set_ylabels('Bill Length (mm)')  # y= is the 'bill_length_mm' column

plt.show()

In the code block above, we added two additional lines of code toward the end to customize the axis labels of our data visualization. This returns the following data visualization:

In the section below, you’ll learn how to change the size of a Seaborn catplot.

Changing the Size of a Seaborn catplot

Because the Seaborn catplot() function returns a FacetGrid object, we can easily modify the size of the figure object that is returned. In order to do this, we can use the two following parameters:

height= which determines the height in inches of each facet
aspect= which determines the aspect ratio, so that the width is height * aspect

Let’s see how we can change the size of a simpler data visualization in Seaborn:

# Changing the Size of a Seaborn catplot()
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')

cat = sns.catplot(data=df, x='island', y='bill_length_mm', kind='violin', height=5, aspect=1.6)

plt.show()

In the code block above, we passed in height=5, aspect=1.6. This means that the height of the facet will be 5 inches, while the width will be 8 inches (5 * 1.6). This returns the following data visualization:

Changing a Figure Size in a Seaborn catplot
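To confirm the resulting size, you can read it back from the underlying Matplotlib figure with get_size_inches(). This sketch uses a single facet and hypothetical, randomly generated data; with col= or row=, each additional facet makes the figure grow by roughly height * aspect in width (or height in height).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical, randomly generated stand-in for the penguins columns
rng = np.random.default_rng(6)
demo = pd.DataFrame({
    "island": np.tile(["Biscoe", "Dream", "Torgersen"], 20),
    "bill_length_mm": rng.normal(loc=42, scale=4, size=60),
})

g = sns.catplot(data=demo, x="island", y="bill_length_mm", kind="violin",
                height=5, aspect=1.6)
width, height = g.figure.get_size_inches()
print(f"{width:.1f} x {height:.1f} inches")  # width = height * aspect
plt.close(g.figure)
```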

It’s incredibly simple to modify the size of your visualization. This can be very useful when your data are spread out horizontally or vertically, or when you want to reduce whitespace.

Conclusion

In this tutorial, you learned how to use the Seaborn catplot() function to create figure-level categorical visualizations. The function allows you to easily create categorical plots, including bar plots and box plots, while providing a familiar and consistent interface.

You first learned how to create simple figure-level objects, then worked through to more complex examples by adding additional detail using color. From there, you learned how to create small multiples by adding rows and columns of charts. Finally, you learned how to customize the visualizations by modifying titles, axis labels, and the size of the visual.

Additional Resources

To learn more about related topics, check out the resources below:


Nik Piepenbreier
