Seaborn in Python for Data Visualization

In this tutorial, you’ll learn how to create a wide variety of different plots using Seaborn in Python, as well as how to apply different styling options to these plots.

If you’ve used Matplotlib in the past, you’ll probably be familiar with writing tons of lines of code to produce a decent looking visualization. This is where Seaborn comes in – it allows you to create visually pleasing plots with very few lines of code. In addition, it’s built on top of Matplotlib, allowing you to access its massive customization API.

What Is Seaborn in Python?

In short, Seaborn provides an API over Matplotlib that offers high-level functions for statistical plots, integrates with Pandas dataframes, and provides beautiful color and plot style defaults.

Matplotlib has been around for about two decades and provides low-level plotting functionality. This is powerful, but it also means writing a lot of boilerplate code to develop statistical plots. Seaborn uses Matplotlib under the hood and lets you build the same visualizations in much less code.

Matplotlib predates the development of Pandas and, while it has made some strides towards compatibility with Pandas dataframes, it does not intuitively support them. Seaborn comes with built-in support for the ever-popular data science library.

Seaborn also offers built-in color palettes that meet specific purposes. It includes color palettes that are designed for qualitative, sequential, and diverging data.

Installing and Loading Seaborn in Python

Installing Seaborn can be done using either pip or conda. Depending on your preference, type one of the following commands to install Seaborn:

$ pip install seaborn
$ conda install seaborn

You have now installed seaborn using either pip or conda!

Installing Pandas

Throughout this tutorial, you’ll learn how to use Seaborn with Pandas dataframes. Because of this, you may need to install Pandas as well.

Similar to Seaborn, Pandas can be installed with either pip or conda:

$ pip install pandas
$ conda install pandas

You’ve now successfully installed Pandas!

Loading Seaborn, Pandas, and Your Dataset

Now that you have both Seaborn and Pandas installed, let’s load them into your Python environment. After that, you’ll load a dataframe to follow along with this tutorial.

Throughout this tutorial, you’ll be using a dataset provided by Five Thirty Eight. Information about the dataset and the dataset itself can be found here: https://github.com/fivethirtyeight/WNBA-stats. Specifically, you’ll be using the Player Stats dataset.

Let’s load the libraries first:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

Using the code above, we have imported Seaborn, Pandas, and Matplotlib’s pyplot module. We assign each of them an alias to make calling their functions easier.

seaborn is assigned the alias sns, pandas is assigned the alias pd, and pyplot is assigned the alias plt.

Now let’s load our dataset. We can do this by using the Pandas `read_csv()` function. Let’s assign the dataframe to the variable df:

df = pd.read_csv("https://github.com/fivethirtyeight/WNBA-stats/raw/master/wnba-player-stats.csv")

Let’s take a quick look at the dataset. We can explore the first five records using the Pandas .head() method:

df.head()
    player_ID              Player  ...  Composite_Rating  Wins_Generated
0  montgre01w    Renee Montgomery  ...              -2.4            1.22
1  williel01w  Elizabeth Williams  ...               0.6            2.51
2  sykesbr01w      Brittney Sykes  ...              -3.4            0.70
3  hayesti01w       Tiffany Hayes  ...              -1.5            1.45
4  brelaje01w     Jessica Breland  ...              -0.8            1.62

[5 rows x 28 columns]

The .head() method returns the first five rows of the dataset, as shown above.

Five Thirty Eight provides an overview of all of the different columns on their GitHub page (linked above), which is well worth exploring before you start plotting.

Creating Your First Seaborn Plot

Let’s create your first Seaborn plot!

For this first plot, you’ll create a scatter plot. Seaborn makes this easy with the lmplot() function. The function takes x and y parameters that integrate nicely with the Pandas dataframe you created earlier:

sns.lmplot(data=df, x="G", y="MP")

If you’re running this in a Jupyter environment, the plot will show immediately. However, if you’re running it from a script or the shell, you’ll need to call `plt.show()` to display your plot. Running this generates the image below:

Your First Seaborn Plot
Your first Seaborn plot!

There are a few things we can take note of right off the bat:

  • The integration with Pandas makes generating visualizations very easy. For example, the column names used in the x and y parameters are passed as strings, rather than as columns selected from the Pandas dataframe.
  • The visualization has a much more modern aesthetic, compared to vanilla Matplotlib.
  • The plot includes a regression line by default.

You’ll learn about the regression plot in more detail later on, including some of the other parameters that the function accepts.
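
If you want the scatter points without the fitted line, lmplot() accepts a fit_reg parameter. The sketch below is only an illustration: it uses a small made-up dataframe (the values are hypothetical, not taken from the WNBA dataset) so that it runs without downloading anything.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import pandas as pd
import seaborn as sns

# Hypothetical values standing in for games (G) and minutes played (MP)
toy = pd.DataFrame({
    "G": [5, 10, 15, 20, 25, 30],
    "MP": [60, 150, 240, 390, 480, 610],
})

# Default behaviour: scatter points plus a fitted regression line
with_line = sns.lmplot(data=toy, x="G", y="MP")

# fit_reg=False keeps the points and drops the regression line
points_only = sns.lmplot(data=toy, x="G", y="MP", fit_reg=False)
```

Both calls return a FacetGrid object, which is why the result can be stored in a variable and customized later.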

Styling and Customizing Seaborn in Python

In this section, you’ll learn more about how to style your Seaborn plots. This is a huge benefit of Seaborn, where many aesthetically-pleasing styles have been built in. Specifically, you’ll learn how to use built-in themes, how to use color palettes, and how to add titles and labels to plots.

Styles for Seaborn in Python

One of the benefits of Seaborn is that controlling aesthetics is much simpler than in Matplotlib.

Seaborn has five built-in themes:

  • darkgrid,
  • whitegrid,
  • dark,
  • white, and
  • ticks

When you call sns.set() (or sns.set_theme() in newer versions) without arguments, Seaborn applies the darkgrid style.

Let’s apply the whitegrid style:

sns.set_style("whitegrid")

This piece of code tells Seaborn to use the whitegrid style. Now you can redraw the plot we previously made to see what this style looks like:

sns.lmplot(data=df, x="G", y="MP")

Running this code generates the image below:

Seaborn whitegrid style
Applying the Seaborn whitegrid style

Here you have generated a plot with the `whitegrid` style.
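
To see what a style actually changes, or to apply one temporarily, you can use sns.axes_style(). The following is a minimal sketch; the plotted numbers are placeholders chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt
import seaborn as sns

# axes_style() returns the Matplotlib settings behind a named style
whitegrid = sns.axes_style("whitegrid")
white = sns.axes_style("white")
print(whitegrid["axes.grid"], white["axes.grid"])

# Used as a context manager, the style applies only inside the block
with sns.axes_style("ticks"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3])  # placeholder data
```

Unlike sns.set_style(), the context-manager form leaves the style of later plots untouched.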

Using Color Palettes

Seaborn also gives you great flexibility to customize color palettes. It comes with a number of built-in color palettes that serve different purposes, depending on the type of data you’re visualizing.

These include:

  • Qualitative Color Palettes,
  • Sequential Color Palettes, and
  • Diverging Color Palettes.

To preview a color palette, you can use Seaborn’s built-in palplot() function, which displays the colors of a particular palette.

One of the built-in palettes is the pastel palette. Let’s build a palplot with the pastel palette:

palette = sns.color_palette("pastel")
sns.palplot(palette)

With the code above, you first assign the pastel color palette to a variable named palette, and then pass it into a palplot to generate the plot.

Seaborn Pastel Color Palette
Applying a Seaborn pastel color palette

Qualitative Color Palettes

Qualitative color palettes are used to show discrete types of data that don’t have any inherent ordering. The colors are different enough to make categories easy to tell apart, without implying that one category ranks above another.

Following suit with our dataset, a good example of this would be a player’s position – each position belongs to a basketball team, each is different, and each position is equally important.

By default this palette will have ten colors (take a look below to see how to adjust this). The built-in options are: deep, muted, pastel, bright, dark, and colorblind.

Let’s build another palette plot with a qualitative color palette:

sns.palplot(sns.color_palette("muted"))

Running this code generates the following palplot:

Muted color palette
Applying the default muted color palette

Here, you have displayed the muted color palette!

If you want to adjust the number of colors, pass an integer as the second argument, after the name of the palette. For example, to change the number of colors to eight, you could write:

sns.palplot(sns.color_palette("muted", 8))

Running this code generates the following:

Muted Colour Palette 8

You can see here that you’ve now generated a palplot with a different number of colors!
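
If you would rather inspect a palette in code than look at a palplot, the object returned by color_palette() behaves like a list of RGB tuples. A short sketch:

```python
import seaborn as sns

# color_palette() returns a list-like object of RGB tuples
eight = sns.color_palette("muted", 8)
print(len(eight))  # the number of colors that was requested

# as_hex() converts the same colors to hex strings,
# which is handy for reusing them outside of Seaborn
hex_codes = eight.as_hex()
print(hex_codes)
```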

Sequential Color Palettes

Sequential color palettes, as the name implies, show colors in a sequential pattern, going from lighter to darker.

This type of color palette is useful when there is logical ordering in discrete variables (such as shoe size) or in continuous pieces of data (such as height).

sns.palplot(sns.color_palette("Blues"))

This code generates the following plot, where we use the Blues palette:

Blues Color Palette

Sequential color palettes have a number of pre-built options named after their dominant colors.

Diverging Color Palettes

Diverging color palettes are useful when both extremely high and extremely low values are of interest. There is also typically a meaningful midpoint.

For example, point differential (the difference between points scored and points conceded) could be useful to show in this way.

sns.palplot(sns.color_palette("RdBu_r", 7))

This generates the following plot:

Seaborn diverging color palette

You can learn more about color palettes in the official Seaborn documentation.
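
The palettes above become useful once they are applied to a plot. The sketch below shows two common ways to do that, using a small made-up dataframe (hypothetical values, not the WNBA data): the palette parameter of a plotting function, and sns.set_palette() for changing the default.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import pandas as pd
import seaborn as sns

# Hypothetical rows standing in for the WNBA data
toy = pd.DataFrame({
    "Tm": ["ATL", "ATL", "CHI", "CHI"],
    "G": [10, 30, 12, 28],
    "MP": [150, 600, 200, 540],
})

# palette= picks the colors used for the hue levels in this one plot
g = sns.relplot(data=toy, x="G", y="MP", hue="Tm", palette="muted")

# set_palette() changes the default colors for plots drawn afterwards
sns.set_palette("pastel")
```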

Adding Titles and Labels to Seaborn in Python

To add titles and axis labels to Seaborn visualizations, you can call Matplotlib’s pyplot functions right after creating the plot. Because Seaborn draws onto Matplotlib figures, functions such as plt.title(), plt.xlabel(), and plt.ylabel() apply to the plot you just made.

Let’s add a title and labels to the visualization you made earlier:

sns.lmplot(data=df, x="G", y="MP")
plt.title("My first Seaborn visualization")
plt.ylabel("Minutes Played")
plt.xlabel("Games Played")

This generates the image below:

Seaborn visualization
Your first Seaborn visualization!

Now you’ve added a title and axis labels to your lmplot!
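
Figure-level functions such as lmplot() also return a FacetGrid object that has its own labelling methods. The sketch below is one possible alternative to the pyplot calls, again using made-up values rather than the WNBA dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import pandas as pd
import seaborn as sns

# Hypothetical values for games (G) and minutes played (MP)
toy = pd.DataFrame({
    "G": [5, 10, 15, 20, 25, 30],
    "MP": [60, 150, 240, 390, 480, 610],
})

# Keep a reference to the FacetGrid that lmplot() returns
g = sns.lmplot(data=toy, x="G", y="MP")

# Label the axes through the grid, and title its single Axes
g.set_axis_labels("Games Played", "Minutes Played")
g.ax.set_title("My first Seaborn visualization")
```

Working through the returned object keeps the labels tied to a specific plot, which helps once several figures are open at the same time.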

Creating Relational Plots With Seaborn

Relational plots allow you to identify relationships between two variables, such as a potential correlation. The two plots we’ll cover in this tutorial are scatter plots and line plots.

Creating Scatter Plots With Seaborn

The scatter plot is one of the most important visualizations. It uses a scattering of points to visualize the distribution of two variables, where each point depicts an observation in a dataset.

Let’s create a scatter plot that illustrates the relationship between the Games Played (G) and Minutes Played (MP) variables.

Seaborn uses the relplot() function to plot out a scatter plot (or relationship plot) between two variables.

sns.relplot(data=df, x="G", y="MP")

This generates the following image:

First scatter plot

This is very messy – let’s limit the dataframe to only the Atlanta team. We can do this directly in the plotting function:

sns.relplot(data=df[df["Tm"] == "ATL"], x="G", y="MP")

First scatter plot filtered
Filtering your Seaborn scatter plot

We’ve filtered the Pandas dataframe to only show players belonging to Atlanta, where the abbreviation ‘ATL’ is used.

Now we can see even more clearly that an increase in games played is correlated with an increase in minutes played.

You can add further detail by adding a hue to the plot. This colors each point by the value of another variable, which helps you identify patterns. Let’s set the hue to the year_ID variable.

sns.relplot(data=df[df["Tm"] == "ATL"], x="G", y="MP", hue="year_ID")

Similar to above, we’ve narrowed the dataframe to only show Atlanta (‘ATL’).

This generates the following image:

Seaborn first scatter plot hue
Applying hues to your Seaborn scatter plot data

You can also change the size of each dot by using the size argument to represent another variable. We can accomplish this by using:

sns.relplot(
    data=df[df["Tm"] == "ATL"],
    x="G",
    y="MP",
    hue="year_ID",
    size="Wins_Generated",
)

This generates the following plot:

Seaborn first scatter plot hue and size
Applying hue and size to your scatter plot data

By adding in the size variable, we can see that the players who generate more wins also played more games and minutes.
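
When the default dot sizes are hard to tell apart, the sizes parameter lets you set the smallest and largest marker size. This is a minimal sketch with hypothetical rows; the column names mirror the dataset, but the values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import pandas as pd
import seaborn as sns

# Hypothetical rows; values are made up for illustration only
toy = pd.DataFrame({
    "G": [8, 20, 30, 12, 25, 34],
    "MP": [90, 400, 820, 180, 560, 1010],
    "year_ID": [2018, 2018, 2018, 2019, 2019, 2019],
    "Wins_Generated": [0.1, 1.2, 3.4, 0.4, 2.0, 4.8],
})

# sizes=(min, max) sets the range of marker areas used for the size variable
g = sns.relplot(
    data=toy,
    x="G",
    y="MP",
    hue="year_ID",
    size="Wins_Generated",
    sizes=(20, 200),
)
```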

Creating Line Plots With Seaborn

Line plots are a wonderful tool for illustrating how one variable changes along a continuous axis (such as time).

We can plot across the different seasons. Let’s create a line plot that illustrates the change in Player Efficiency Rating (PER) year-over-year for Atlanta players:

sns.relplot(data=df[df["Tm"] == "ATL"], 
            x="year_ID", 
            y="PER", 
            kind="line")

This generates the following plot:

Relplot lineplot

We can also plot multiple lines on the same chart. This can be accomplished using the hue argument. Let’s create a new chart and include an additional team:

data = df[df["Tm"].isin(["ATL", "CHI"])]
sns.relplot(data=data, 
            x="year_ID", 
            y="PER", 
            kind="line", 
            hue="Tm")

This generates the following:

Relplot lineplot multiple lines

There’s now a bit more going on in this plot. Seaborn adds confidence intervals by default. This can be useful, but computing them can also slow down plotting for large datasets. You can disable this by using the ci argument (in Seaborn 0.12 and later, ci is deprecated in favor of errorbar=None):

data = df[df["Tm"].isin(["ATL", "CHI"])]
sns.relplot(data=data, 
            x="year_ID", 
            y="PER", 
            kind="line", 
            hue="Tm", 
            ci=None)

This plots the following:

Seaborn Line Plot with no error bars
Your Seaborn Line Plot with no error bars

Finally, let’s add a different style and some different colors to this plot to make it a little easier to read:

sns.set_style("darkgrid")
sns.set() 
data = df[df["Tm"].isin(["ATL", "CHI"])]
sns.relplot(data=data, 
            x="year_ID", 
            y="PER", 
            kind="line", 
            hue="Tm", 
            ci=None)

The first line sets a new style and the second line restores Seaborn’s default theme, including its default color palette. The third line filters the Pandas dataframe down to only Atlanta and Chicago. Finally, the fourth statement creates the relplot() of the data.

This generates:

Seaborn relplot or Lineplot
Your Seaborn line plot with the darkgrid style

Here you’ve created a narrowed down visualization for just two teams!

Creating Categorical Plots With Seaborn in Python

Categorical plots are useful plots when viewing data that naturally falls into different categories (such as teams, ages, etc.).

Seaborn provides a high-level interface for categorical plots through the catplot() function, where you pass the plot type in as the kind= argument. It also lets you generate individual plots using a dedicated function for each plot type, such as boxplot() and barplot().

Let’s explore a few of these!

Creating Bar Plots in Seaborn in Python

We’ll begin by creating a barplot that shows the average number of games played by players broken out by team.

We’ll do this using the .catplot() function:

teams = ["ATL", "CHI", "CON", "DAL", "IND", "LAS", "LVA"]
data = df[df["Tm"].isin(teams)]
sns.catplot(data=data, x="Tm", y="G", kind="bar")

This generates:

Seaborn Bargraph Catplot

This plot shows us the mean number of games played per player for each team, with an error bar superimposed on each bar.

Another useful function of a bar plot is to show the number of observations in a category. This can be done by using the kind="count" argument.

Let’s see how many players on each of these teams have played five or fewer games.

teams = ["ATL", "CHI", "CON", "DAL", "IND", "LAS", "LVA"]
data = df[df["Tm"].isin(teams) & (df["G"] <= 5)]
sns.catplot(data=data, x="Tm", kind="count")

This produces the following:

Generating a Bargraph Catplot
Generating a Seaborn Bargraph

We can see here how we have changed the aggregation method with the kind= argument.
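
To make the two aggregations concrete, the sketch below computes the same numbers with Pandas and then draws them with the axes-level barplot() and countplot() functions. The dataframe is made up for illustration and does not come from the WNBA dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical games-played values for three teams
toy = pd.DataFrame({
    "Tm": ["ATL", "ATL", "CHI", "CHI", "CON", "CON"],
    "G": [4, 30, 2, 34, 5, 3],
})
teams = ["ATL", "CHI", "CON"]

# What kind="bar" shows: the mean of y within each category
mean_games = toy.groupby("Tm")["G"].mean()

# What kind="count" shows: the number of rows in each category
few_games = toy[toy["G"] <= 5]
counts = few_games["Tm"].value_counts()

fig, (ax_bar, ax_count) = plt.subplots(1, 2)
sns.barplot(data=toy, x="Tm", y="G", order=teams, ax=ax_bar)
sns.countplot(data=few_games, x="Tm", order=teams, ax=ax_count)
```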

Creating Box Plots With Seaborn

Box plots are another very useful plot for showing distributions within a range of categories. Box plots break your data into quartiles and show outliers in the data (at both extremes).

To demonstrate this, let’s create a boxplot. Similar to other categorical plots, Seaborn uses the .catplot() function and passes box as an argument to the kind parameter. We’ll use the same subset of teams as above and use the age variable as our y-axis.

teams = ["ATL", "CHI", "CON", "DAL", "IND", "LAS", "LVA"]
data = df[df["Tm"].isin(teams)]
sns.catplot(data=data, x="Tm", y="Age", kind="box")

This generates:

Generating a Seaborn Box Plot
Generating a Seaborn Box Plot

Box plots allow us to visualize the spread of our values. The bottom whisker (“line”) extends down to the smallest value (excluding outliers). The bottom 25% of the data ends at the bottom of the box. The box is divided by a line at the median, which separates the two middle quartiles. Finally, the top whisker extends to the largest value (excluding outliers).

We can see from the visualization that Dallas has the smallest range of ages, while Los Angeles has the largest range.
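
The numbers behind a box plot can be reproduced with Pandas. The sketch below assumes the common default rule, which Seaborn’s box plot also uses unless you change its whis parameter: points more than 1.5 times the interquartile range beyond the box are treated as outliers. The ages are made up for illustration.

```python
import pandas as pd

# Hypothetical ages for one team
ages = pd.Series([22, 24, 25, 27, 30, 31, 35, 45])

# The box: first quartile, median, and third quartile
q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Values beyond these fences are drawn as individual outlier points
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]

# The whiskers end at the most extreme values that are not outliers
whisker_low = ages[ages >= lower_fence].min()
whisker_high = ages[ages <= upper_fence].max()

print(q1, median, q3)
print(whisker_low, whisker_high, list(outliers))
```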

Creating Violin Plots With Seaborn in Python

Violin plots take boxplots one step further by showing the kernel density distribution within each category.

To demonstrate this, let’s create the same plot as above, but as a violin plot:

teams = ["ATL", "CHI", "CON", "DAL", "IND", "LAS", "LVA"]
data = df[df["Tm"].isin(teams)]
sns.catplot(data=data, x="Tm", y="Age", kind="violin")

This generates:

Seaborn Violin Plot
Generating a Seaborn Violin Plot

There are a few important pieces to note here:

  • The white dot represents the median value,
  • The thick dark line represents the interquartile range,
  • The thin line represents the rest of the range (excluding outliers),
  • The width of the violin represents the density of values that fall into that y-range.

Creating Distribution Plots With Seaborn in Python

Distribution plots are useful for, well, determining the distribution of variables.

This type of plot includes the histogram and the kernel density plot.

Creating Histograms in Seaborn

The most common of these is the histogram, which forms bins to show groups of data and their frequencies within a dataset. For example, age or games played may be grouped into buckets of different sizes.

Let’s create a histogram of the age variable, across all teams.

sns.distplot(df["Age"])

This generates:

Seaborn historgram with density
Creating a Seaborn histogram with a kernel density line

There are a couple of things to note here:

  • Seaborn did not create any bins, as each age is represented by its own bar. To generate your own bins, you can use the bins parameter to specify how many bins you want. You can either pass in a number to specify the number of bins, or a list of values that explicitly define the bin edges.
  • Seaborn automatically adds a kernel density line to the graph, which helps visualize the distribution of these variables. This can be turned off using the kde parameter and setting it to False.

Let’s recreate the histogram with bins in 5-year increments and turn off the kernel density line.

The bin edges can be defined by typing out the full list (i.e., [0,5,10,15,20,25,30,35,40,45,50,55]), or by using the range function, which generates this for us. To save time, let’s use the range function:

sns.distplot(df["Age"], bins=range(0,60, 5), kde=False)

This generates:

Filtered Seaborn histogram
Filtering your Seaborn histogram

By setting kde to False, the y-axis also changes to show the count of instances (rather than the density).

Creating Kernel Density Plots in Seaborn

Kernel density plots are similar to histograms in that they plot out the distributions. In fact, it’s the same line that is on by default in the histogram shown above. To plot only the kernel density estimation, simply set the hist parameter to False:

sns.distplot(df["Age"], hist=False)

This generates:

Seaborn distplot
Generating a density Seaborn plot

You’ve created a kernel density plot!

Plotting Multiple Charts With Seaborn

It may be useful to generate multiple charts at the same time to better be able to explore relationships across a number of variables.

Seaborn in Python makes this relatively straightforward. We can pass in column (col) and row (row) parameters in order to create a grid of plots.

For example, let’s create a grid of plots where we map out different teams as columns and different years as rows. In order to keep this manageable, let’s filter down to three teams and three years.

teams = ["ATL", "CHI", "CON"]
years = [2017, 2018, 2019]
data = df[(df["Tm"].isin(teams)) & (df["year_ID"].isin(years))]
sns.relplot(data=data, x='G', y='MP', col="Tm", row="year_ID")

This code first filters down the teams and years using the .isin() method, using lists to define the teams and years we want to keep. Finally, line 4 uses the relplot() function to create scatter plots, where we set the teams as columns and the years as rows.

This generates:

Multiple Seaborn Charts
Applying multiple Seaborn charts

Using the code above, you’ve produced a grid of plots that are filtered down to different sets of data.
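
The object returned by relplot() gives you access to every plot in the grid. The sketch below, built on a small made-up dataframe (hypothetical values), shows how the grid is laid out and how the panel titles can be shortened.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import pandas as pd
import seaborn as sns

# Hypothetical rows: two teams across two seasons
toy = pd.DataFrame({
    "Tm": ["ATL", "ATL", "CHI", "CHI", "ATL", "ATL", "CHI", "CHI"],
    "year_ID": [2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019],
    "G": [10, 30, 12, 28, 8, 33, 15, 31],
    "MP": [150, 600, 200, 540, 90, 700, 260, 640],
})

g = sns.relplot(data=toy, x="G", y="MP", col="Tm", row="year_ID")

# g.axes is a 2D array of Axes: one row per year, one column per team
grid_shape = g.axes.shape
print(grid_shape)

# Shorten the default panel titles to "year | team"
g.set_titles("{row_name} | {col_name}")
```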

Conclusion

In this tutorial, you learned the basics of creating and customizing plots with Seaborn in Python. You now know how to change default styles and colors, as well as how to create relational, categorical, and distribution plots with Seaborn.