In this tutorial, you’ll learn how to create a wide variety of different plots using Seaborn in Python, as well as how to apply different styling options to these plots.
If you’ve used Matplotlib in the past, you’ll probably be familiar with writing tons of lines of code to produce a decent looking visualization. This is where Seaborn comes in – it allows you to create visually pleasing plots with very few lines of code. In addition, it’s built on top of Matplotlib, allowing you to access its massive customization API.
In this tutorial, you’ll learn:
- What Seaborn is
- How to install and load Seaborn
- How to customize Seaborn visualizations
- How to create relational charts
- How to create categorical charts
- How to create distribution charts
- How to plot multiple charts in Seaborn
What Is Seaborn in Python?
In short, Seaborn provides an API over Matplotlib that offers high-level functions for statistical plots, integrates with Pandas dataframes, and provides beautiful color and plot style defaults.
Matplotlib has been around for decades and provides low-level plotting functionality. While this is great, it also means writing a lot of boilerplate code to develop statistical plots. While Seaborn uses Matplotlib under its hood, it ensures that visualizations can be developed in much less code.
Matplotlib predates the development of Pandas and, while it has made some strides towards compatibility with Pandas dataframes, it does not intuitively support them. Seaborn comes with built-in support for the ever-popular data science library.
Seaborn also offers built-in color palettes that meet specific purposes. It includes color palettes that are designed for qualitative data, sequential data, and diverging data representations.
Installing and Loading Seaborn in Python
Installing Seaborn can be done using either pip or conda. Depending on your preference, type one of the following commands to install Seaborn:
$ pip install seaborn
$ conda install seaborn
You have now installed Seaborn using either pip or conda.
Throughout this tutorial, you’ll learn how to use Seaborn with Pandas dataframes. Because of this, you may need to install Pandas as well.
Similar to Seaborn, Pandas can be installed with either pip or conda:
$ pip install pandas
$ conda install pandas
You’ve now successfully installed Pandas!
Loading Seaborn, Pandas, and Your Dataset
Now that you have both Seaborn and Pandas installed, let’s load these into your Python environment. After, you’ll load a dataframe to follow along with this tutorial.
Throughout this tutorial, you’ll be using a dataset provided by Five Thirty Eight. Information about the dataset and the dataset itself can be found here: https://github.com/fivethirtyeight/WNBA-stats. Specifically, you’ll be using the Player Stats dataset.
Let’s load the libraries first:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
Using the code above, we have imported Seaborn, Pandas, and Matplotlib’s pyplot module. We assign each of these an alias to make calling their functions easier: seaborn is assigned the alias sns, pandas is assigned the alias pd, and pyplot is assigned the alias plt.
Now let’s load our dataset. We can do this by using the Pandas `read_csv()` function. Let’s assign the dataframe to the variable df:
df = pd.read_csv("https://github.com/fivethirtyeight/WNBA-stats/raw/master/wnba-player-stats.csv")
Let’s take a quick look at the dataset. We can explore the first five records using the Pandas .head() method:
    player_ID              Player  ...  Composite_Rating  Wins_Generated
0  montgre01w    Renee Montgomery  ...              -2.4            1.22
1  williel01w  Elizabeth Williams  ...               0.6            2.51
2  sykesbr01w      Brittney Sykes  ...              -3.4            0.70
3  hayesti01w       Tiffany Hayes  ...              -1.5            1.45
4  brelaje01w     Jessica Breland  ...              -0.8            1.62

[5 rows x 28 columns]
The .head() method returns the first five rows of the dataset, as shown above.
Five Thirty Eight provides an overview of all of the different columns on their GitHub page (linked above), which is worth exploring.
Creating Your First Seaborn Plot
Let’s create your first Seaborn plot!
For this first plot, you’ll create a scatter plot. Seaborn makes this easy by using the lmplot() function. The function takes x and y parameters that integrate nicely with the Pandas dataframe you created earlier:
sns.lmplot(data=df, x="G", y="MP")
If you’re running this in a Jupyter environment, the plot will show immediately. However, if you’re using the shell, you’ll need to call `plt.show()` to display your plot. Running this generates the image below:
There are a few things we can take note of right off the bat:
- The integration with Pandas makes generating visualizations very easy. For example, the column names used in the x and y parameters are referred to as strings, rather than having to tie back to the Pandas dataframe.
- The visualization has a much more modern aesthetic, compared to vanilla Matplotlib.
- The plot includes a regression line by default.
You’ll learn about the regression plot in more detail later on, including some of the other parameters that the function accepts.
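If you only want the scatter points, lmplot() accepts a fit_reg parameter. The snippet below is a minimal sketch on a tiny made-up dataframe (the values are invented for illustration; only the column names G and MP follow the dataset above):

```python
import pandas as pd
import seaborn as sns

# Made-up stand-in data; only the column names mirror the tutorial dataset.
demo = pd.DataFrame({
    "G": [5, 12, 20, 28, 34],
    "MP": [40, 180, 410, 700, 990],
})

# fit_reg=False keeps the scatter points but skips the regression line.
g = sns.lmplot(data=demo, x="G", y="MP", fit_reg=False)
```

With the default fit_reg=True, the same call would draw the regression line and its shaded confidence band on top of the points.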
Styling and Customizing Seaborn in Python
In this section, you’ll learn more about how to style your Seaborn plots. This is a huge benefit of Seaborn, where many aesthetically-pleasing styles have been built in. Specifically, you’ll learn how to use built-in themes, how to use color palettes, and how to add titles and labels to plots.
Styles for Seaborn in Python
One of the benefits of Seaborn is that controlling aesthetics is much simpler than in Matplotlib.
Seaborn has five built-in themes:
- Darkgrid,
- Whitegrid,
- Dark,
- White, and
- Ticks.
Seaborn’s default theme uses the darkgrid style.
Let’s apply the whitegrid style:
sns.set_style("whitegrid")
This piece of code tells Seaborn to use the whitegrid style. Now you can redraw the plot we previously made to see what this style looks like:
sns.lmplot(data=df, x="G", y="MP")
Running this code generates the image below:
Here you have generated a plot with the `whitegrid` style.
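If you would rather not change the style globally, Seaborn also provides axes_style(), which returns the settings behind a style and can be used as a context manager. The following is a small sketch of that idea; the plotted numbers are arbitrary:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# The five built-in style names.
styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# axes_style() returns the Matplotlib settings behind a style as a dict.
grid_enabled = {name: sns.axes_style(name)["axes.grid"] for name in styles}
print(grid_enabled)

# Used as a context manager, the style applies only to plots created inside the block.
with sns.axes_style("whitegrid"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])  # arbitrary demo values
```

This can be handy when a single figure needs a different look from the rest of a notebook.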
Using Color Palettes
Seaborn also gives you great flexibility to customize color palettes. Seaborn comes with a number of built-in color palettes that can be used for different purposes, depending on the type of data you’re visualizing. These fall into three groups:
- Qualitative Color Palettes,
- Sequential Color Palettes, and
- Diverging Color Palettes.
To see a color palette, Seaborn has a built-in function, palplot(), to display a particular palette.
One of the built-in palettes is the pastel palette. Let’s build a palplot with the pastel palette:
palette = sns.color_palette("pastel")
sns.palplot(palette)
With the code above, you first assign the pastel color palette to a variable named palette, and then pass it into a palplot to generate the plot.
Qualitative Color Palettes
Qualitative color palettes are used to show discrete types of data that don’t have any inherent ordering. The colors will be different enough to easily discern categories and without implying any inherent ordering.
Following suit with our dataset, a good example of this would be a player’s position – each position belongs to a basketball team, each is different, and each position is equally important.
By default this palette will have ten colors (take a look below to see how to adjust this). The built-in options are: deep, muted, pastel, bright, dark, and colorblind.
Let’s build another palette plot with a qualitative color palette:
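A minimal sketch of such a palette plot, following the same pattern as the pastel example and assuming the muted palette from the list above:

```python
import seaborn as sns

# "muted" is one of the built-in qualitative palettes listed above.
palette = sns.color_palette("muted")
sns.palplot(palette)
```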
Running this code generates the following palplot:
Here, you have assigned the muted color palette!
If you wanted to adjust the number of colors, simply follow the name of the palette with an integer. For example, to change the number of colors to eight, you could write:
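One way to write this is sketched below, where the integer is passed as the second argument (n_colors) of color_palette():

```python
import seaborn as sns

# The second positional argument is n_colors, the number of colors to return.
palette = sns.color_palette("muted", 8)
sns.palplot(palette)
```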
Running this code generates the following:
You can see here that you’ve now generated a palplot with a different number of colors!
Sequential Color Palettes
Sequential color palettes, as the name implies, show colors in a sequential pattern, going from lighter to darker.
This type of color palette is useful when there is logical ordering in discrete variables (such as shoe size) or in continuous pieces of data (such as height).
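As a short sketch, a sequential palette can be requested by name in the same way as the qualitative palettes; here the name "Blues" is used:

```python
import seaborn as sns

# "Blues" is a sequential palette that runs from light to dark blue.
palette = sns.color_palette("Blues")
sns.palplot(palette)
```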
This code generates the following plot, where we use the Blues palette:
Sequential color palettes have a number of pre-built options named after their dominant colors.
Diverging Color Palettes
Diverging color palettes are useful when both extremely high and extremely low values are of interest. There is also typically a meaningful midpoint.
For example, point differential (the difference between points scored and points conceded) could be useful to show in this way.
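A brief sketch of a diverging palette follows. The choice of "RdBu" (red to blue) and of seven colors is an assumption made for illustration, not necessarily the palette used in the original figure:

```python
import seaborn as sns

# "RdBu" is a diverging palette; an odd number of colors keeps a neutral midpoint.
# Both the palette name and the count of 7 are illustrative choices.
palette = sns.color_palette("RdBu", 7)
sns.palplot(palette)
```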
This generates the following plot:
Learn more about color palettes in the official Seaborn documentation.
Adding Titles and Labels to Seaborn in Python
To add titles and axis labels to a Seaborn visualization, you can call Matplotlib’s pyplot functions after creating the plot. You can also assign the visualization to an object and set titles and axis labels on that object, similar to Matplotlib.
Let’s add a title and labels to the visualization you made earlier:
sns.lmplot(data=df, x="G", y="MP")
plt.title("My first Seaborn visualization")
plt.ylabel("Minutes Played")
plt.xlabel("Games Played")
This generates the image below:
Now you’ve added a title and axis labels to your lmplot!
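The object-based approach can be sketched as follows. lmplot() returns a FacetGrid, which has its own methods for labels and titles. The dataframe here is made up for illustration:

```python
import pandas as pd
import seaborn as sns

# Made-up stand-in data; only the column names mirror the tutorial dataset.
demo = pd.DataFrame({
    "G": [5, 12, 20, 28, 34],
    "MP": [40, 180, 410, 700, 990],
})

# lmplot() returns a FacetGrid, which can be stored and modified afterwards.
g = sns.lmplot(data=demo, x="G", y="MP")
g.set_axis_labels("Games Played", "Minutes Played")
g.set(title="My first Seaborn visualization")
```

Keeping a reference to the grid avoids relying on whichever figure pyplot currently considers active.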
Creating Relational Plots With Seaborn
Relational plots allow you to identify relationships between two variables, such as a potential correlation. The two plots we’ll cover in this tutorial are scatter plots and line plots.
Creating Scatter Plots With Seaborn
The scatter plot is one of the most important visualizations. It uses a scattering of points to visualize the distribution of two variables, where each point depicts an observation in a dataset.
Let’s create a scatterplot that illustrates the relationship between the Games Played (G) and Minutes Played (MP) variables.
Seaborn uses the relplot() function to plot out a scatter plot (or relationship plot) between two variables.
sns.relplot(data=df, x="G", y="MP")
This generates the following image:
This is very messy – let’s limit the dataframe to only the Atlanta team. We can do this directly in the plotting function:
sns.relplot(data=df[df["Tm"] == "ATL"], x="G", y="MP")
We’ve filtered the Pandas dataframe to only show players belonging to Atlanta, where the abbreviation ‘ATL’ is used.
Now we can see even more clearly that an increase in games played is correlated with an increase in minutes played.
You can add further detail by adding a hue to the plot. This will allow you to see an additional layer of detail to help identify patterns. Let’s set the hue to the year_ID variable.
sns.relplot(data=df[df["Tm"] == "ATL"], x="G", y="MP", hue="year_ID")
Similar to above, we’ve narrowed the dataframe to only show Atlanta (‘ATL’).
This generates the following image:
You can also change the size of each dot by using the size argument to represent another variable. We can accomplish this by using:
sns.relplot(
    data=df[df["Tm"] == "ATL"],
    x="G",
    y="MP",
    hue="year_ID",
    size="Wins_Generated",
)
This generates the following plot:
By adding in the size variable, we can see that the players who generate more wins also played more games and minutes.
Creating Line Plots With Seaborn
Line plots are a wonderful tool for illustrating how a variable changes along a continuous axis (such as time).
We can plot across the different seasons. Let’s create a line plot that illustrates the change in Player Efficiency Rating (PER) year-over-year for Atlanta players:
sns.relplot(data=df[df["Tm"] == "ATL"], x="year_ID", y="PER", kind="line")
This generates the following plot:
We can also plot multiple lines on the same chart. This can be accomplished using the hue argument. Let’s create a new chart and include an additional team:
data = df[df["Tm"].isin(["ATL", "CHI"])]
sns.relplot(data=data, x="year_ID", y="PER", kind="line", hue="Tm")
This generates the following:
There’s now a bit more going on in this plot. Seaborn adds confidence intervals by default. This can be useful, but computing them can also slow down plotting for large datasets. You can disable this by using the ci=None argument:
data = df[df["Tm"].isin(["ATL", "CHI"])]
sns.relplot(data=data, x="year_ID", y="PER", kind="line", hue="Tm", ci=None)
This plots the following:
Finally, let’s add a different style and some different colors to this plot to make it a little easier to read:
sns.set_style("darkgrid")
sns.set()
data = df[df["Tm"].isin(["ATL", "CHI"])]
sns.relplot(data=data, x="year_ID", y="PER", kind="line", hue="Tm", ci=None)
The first line sets a new style and the second line restores Seaborn’s default theme settings, including the original color palette. The third line filters the Pandas dataframe down to only Atlanta and Chicago. Finally, the fourth line creates the .relplot() of the data.
Here you’ve created a narrowed down visualization for just two teams!
Creating Categorical Plots With Seaborn in Python
Categorical plots are useful plots when viewing data that naturally falls into different categories (such as teams, ages, etc.).
Seaborn uses a high-level interface to generate categorical plots, using the .catplot() function, and passing the plot type in as the kind= argument. It also lets you generate individual plots using a dedicated function for each plot type, such as barplot() and boxplot().
Let’s explore a few of these!
Creating Bar Plots in Seaborn in Python
We’ll begin by creating a barplot that shows the average number of games played by players broken out by team.
We’ll do this using the .catplot() function, with kind set to "bar":
teams = ["ATL", "CHI", "CON", "DAL", "IND", "LAS", "LVA"]
data = df[df["Tm"].isin(teams)]
sns.catplot(data=data, x="Tm", y="G", kind="bar")
This plot shows us the mean number of games played per player for each team, with an error bar superimposed on each bar.
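To see that each bar really is a per-category mean, you can compare the bar heights with a Pandas groupby. This is a small sketch on made-up numbers (only the column names Tm and G follow the dataset above):

```python
import pandas as pd
import seaborn as sns

# Made-up games-played values for three teams.
demo = pd.DataFrame({
    "Tm": ["ATL", "ATL", "ATL", "CHI", "CHI", "CON", "CON", "CON"],
    "G": [10, 20, 30, 8, 16, 5, 25, 33],
})

g = sns.catplot(data=demo, x="Tm", y="G", kind="bar")

# Each bar's height should match the per-team mean computed by Pandas.
team_means = demo.groupby("Tm")["G"].mean()
bar_heights = [bar.get_height() for bar in g.ax.patches]
print(team_means)
print(bar_heights)
```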
Another useful function of a bar plot is to show the number of observations in a category. This can be done by setting kind to "count".
Let’s see how many players on a number of teams have played five or fewer games.
teams = ["ATL", "CHI", "CON", "DAL", "IND", "LAS", "LVA"]
data = df[df["Tm"].isin(teams) & (df["G"] <= 5)]
sns.catplot(data=data, x="Tm", kind="count")
This produces the following:
We can see here how we have changed the aggregation method with the kind argument.
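The count version can be checked against Pandas in a similar way. Below is a minimal sketch on made-up data, where the bar heights should agree with value_counts():

```python
import pandas as pd
import seaborn as sns

# Made-up games-played values; only the column names mirror the tutorial dataset.
demo = pd.DataFrame({
    "Tm": ["ATL", "ATL", "ATL", "CHI", "CHI", "CHI", "CON", "CON", "CON"],
    "G": [2, 5, 30, 1, 4, 3, 20, 5, 33],
})

# Keep only rows with five or fewer games, then count rows per team.
low_games = demo[demo["G"] <= 5]
g = sns.catplot(data=low_games, x="Tm", kind="count")

counts = low_games["Tm"].value_counts()
bar_heights = [bar.get_height() for bar in g.ax.patches]
print(counts)
print(bar_heights)
```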
Creating Box Plots With Seaborn
Box plots are another very useful plot for showing distributions within a range of categories. Boxplots break your data into four quartiles and show outliers in the data (at both extremes).
To demonstrate this, let’s create a boxplot. Similar to other categorical plots, Seaborn uses the .catplot() function and passes box as an argument to the kind parameter. We’ll use the same subset of teams as above and use the age variable as our y-axis.
teams = ["ATL", "CHI", "CON", "DAL", "IND", "LAS", "LVA"]
data = df[df["Tm"].isin(teams)]
sns.catplot(data=data, x="Tm", y="Age", kind="box")
Boxplots allow us to visualize the spread of our values. The bottom whisker (“line”) extends down to the smallest value (excluding outliers). The bottom 25% of the data falls below the bottom of the box. The box is divided by a line at the median, which separates the two middle quartiles. Finally, the top whisker extends to the largest value (excluding outliers).
We can see from the visualization that Dallas has the smallest range of ages, while Los Angeles has the largest range.
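The pieces of a box plot can also be computed by hand, which may help when reading one. The sketch below uses Pandas on a made-up list of ages and assumes the default rule that points more than 1.5 times the interquartile range away from the box are treated as outliers:

```python
import pandas as pd

# Made-up ages; 41 is deliberately far from the rest.
ages = pd.Series([20, 22, 23, 24, 25, 26, 27, 28, 30, 41], name="Age")

# The box spans the first to third quartile, with a line at the median.
q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as outliers (assumed default rule).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = ages[(ages >= lower_fence) & (ages <= upper_fence)]
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]

print(f"box: {q1} to {q3}, median line at {median}")
print(f"whiskers: {inside.min()} to {inside.max()}")
print(f"outliers: {outliers.tolist()}")
```

The whiskers stop at the most extreme values that are still inside the fences, not at the fences themselves.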
Creating Violin Plots With Seaborn in Python
Violin plots take boxplots one step further by showing the kernel density distribution within each category.
To demonstrate this, let’s create the same plot as above, but as a violin plot:
teams = ["ATL", "CHI", "CON", "DAL", "IND", "LAS", "LVA"]
data = df[df["Tm"].isin(teams)]
sns.catplot(data=data, x="Tm", y="Age", kind="violin")
There are a couple of important pieces to note here:
- The white dot represents the median value,
- The thick dark line represents the interquartile range,
- The thin line represents the rest of the range (excluding outliers), and
- The width of the violin represents the relative number of values that fall into that y-range.
Creating Distribution Plots With Seaborn in Python
Distribution plots are useful for, well, determining the distribution of variables.
This type of plot includes the histogram and the kernel density plot.
Creating Histograms in Seaborn
The most common of these is the histogram, which forms bins to show groups of data and their frequencies within a dataset. For example, age or games played may be grouped into buckets of different sizes.
Let’s create a histogram of the age variable, across all teams:
sns.distplot(df["Age"])
There are a couple of things to note here:
- Seaborn did not create any bins, as each age is represented by its own bar. To generate your own bins, you can use the bins parameter to specify how many bins you want. You can either pass in a number to specify the number of bins, or a list of values that explicitly define the bin edges.
- Seaborn automatically adds a kernel density line to the graph, which helps visualize the distribution of these variables. This can be turned off using the kde parameter and setting it to False.
Let’s recreate the histogram with bins in 5-year increments and turn off the kernel density line.
The bins can be defined by typing out the full list (i.e., [0,5,10,15,20,25,30,35,40,45,50,55,60]), or by using the range function, which generates this for us. To save time, let’s use the range function:
sns.distplot(df["Age"], bins=range(0, 61, 5), kde=False)
By setting kde to False, the y-axis also changes to show the count (rather than the density) of instances.
Creating Kernel Density Plots in Seaborn
Kernel density plots are similar to histograms in that they plot out the distributions. In fact, it’s the same line that is on by default in the histogram shown above. To plot only the kernel density estimation, simply set the hist parameter to False:
sns.distplot(df["Age"], hist=False)
You’ve created a kernel density plot!
Plotting Multiple Charts With Seaborn
It may be useful to generate multiple charts at the same time to better be able to explore relationships across a number of variables.
Seaborn in Python makes this relatively straightforward. We can pass in column (col) and row (row) parameters in order to create a grid of plots.
For example, let’s create a grid of plots where we map out different teams as columns and different years as rows. In order to keep this manageable, let’s filter down to three teams and three years.
teams = ["ATL", "CHI", "CON"]
years = [2017, 2018, 2019]
data = df[(df["Tm"].isin(teams)) & (df["year_ID"].isin(years))]
sns.relplot(data=data, x='G', y='MP', col="Tm", row="year_ID")
This code first filters down the teams and years using the .isin() method, using lists to define the teams and years we want to filter to. Finally, line 4 uses the .relplot() function to create scatter plots, where we set the teams and years as columns and rows, respectively.
Using the code above, you’ve produced a grid of plots that are filtered down to different sets of data.
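The returned grid can be inspected to confirm how rows and columns were laid out. The sketch below uses a made-up dataframe with two teams and three years (the values are invented for illustration):

```python
import pandas as pd
import seaborn as sns

# Made-up data: two teams across three seasons, two players each.
demo = pd.DataFrame({
    "Tm": ["ATL", "CHI"] * 6,
    "year_ID": [2017] * 4 + [2018] * 4 + [2019] * 4,
    "G": [10, 12, 30, 28, 15, 9, 33, 31, 20, 22, 34, 18],
    "MP": [90, 150, 800, 760, 200, 80, 950, 900, 400, 520, 1000, 300],
})

g = sns.relplot(data=demo, x="G", y="MP", col="Tm", row="year_ID")

# g.axes is a 2D array of subplots: one row per year, one column per team.
print(g.axes.shape)
print(g.row_names, g.col_names)
```

Each subplot only receives the rows of the dataframe that match its team and year.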
In this tutorial, you learned the basics of creating and customizing plots with Seaborn in Python. You now know how to change default styles and colors, as well as how to create relational, categorical, and distribution plots with Seaborn.