
Seaborn histplot – Creating Histograms in Seaborn


In this guide, you’ll learn how to use the Seaborn histplot() function to create histograms that visualize the distribution of a dataset. Histograms show how the values of a variable are spread across bins, giving you strong insight into your data. You’ll also learn about the different parameters and options of the Seaborn histplot function.

By the end of this tutorial, you’ll have learned the following:

  • How the Seaborn histplot() function works
  • How to customize your Seaborn histograms using color, kernel density estimates, and different bins
  • How to visualize two continuous variables using the Seaborn histplot function

If you want to learn how to check if a distribution is normal, check out my guide on using Python to test for normality.

Understanding the Seaborn histplot Function

Before diving into creating histograms in Seaborn, let’s explore the sns.histplot() function. Take a look at the code block below to see the various parameters the function has to offer:

# Understanding the Seaborn histplot() Function
seaborn.histplot(data=None, *, x=None, y=None, hue=None, weights=None, stat='count', bins='auto', binwidth=None, binrange=None, discrete=None, cumulative=False, common_bins=True, common_norm=True, multiple='layer', element='bars', fill=True, shrink=1, kde=False, kde_kws=None, line_kws=None, thresh=0, pthresh=None, pmax=None, cbar=False, cbar_ax=None, cbar_kws=None, palette=None, hue_order=None, hue_norm=None, color=None, log_scale=None, legend=True, ax=None, **kwargs)

We can see that the function offers a huge variety of parameters. While this guide won’t cover all of them, you’ll learn about the most important ones, including:

  • data= provides the data to plot via a Pandas DataFrame
  • x= and y= provide the variables to plot on the x- and y-axis respectively
  • hue= adds an additional variable to plot via a color mapping
  • binwidth= and bins= determine how wide and how many bins should be plotted, respectively

Now that you have a good understanding of the parameters the sns.histplot() function offers, let’s dive into creating histograms.

Creating a Seaborn Histogram with histplot

In order to create a histogram in Seaborn using a Pandas DataFrame, you only need to use two parameters:

  1. data= refers to the DataFrame you want to plot, and
  2. x= refers to the column label that you want to create a histogram of

Let’s see what this looks like:

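# Creating a Simple Histogram
import seaborn as sns
import matplotlib.pyplot as plt

# Assumption: df is Seaborn's built-in 'diamonds' sample dataset, which has
# the 'price', 'cut', and 'carat' columns used throughout this guide
df = sns.load_dataset('diamonds')

sns.histplot(data=df, x='price')
plt.show()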

In the code block above, we imported both Seaborn and Matplotlib. We then created a histogram using the sns.histplot() function, returning the visual below:

Creating a Simple Histogram

We can see that Seaborn automatically created bins for us. This is something that you’ll learn how to customize shortly. However, for now, let’s focus on rotating the data to create a horizontal histogram.

Creating a Horizontal Histogram in Seaborn

In order to rotate the histogram and create a horizontal histogram in Seaborn, you can simply map the column label you used to the y= parameter. The bars then extend horizontally, so each bar’s length, rather than its height, represents the count.

Let’s see what this looks like in Python:

# Creating a Horizontal Histogram
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data=df, y='price')
plt.show()

In the code block above, we simply passed the column label into the y= parameter instead of the x= parameter. This allowed us to plot the graph horizontally, rather than vertically.

Creating a Horizontal Histogram

Now, let’s dive into some of the more exciting pieces of the function and explore how we can modify the bin width in histograms.

Modifying Bin Width in Seaborn Histograms

In order to modify bin widths in a Seaborn histogram, you can use the binwidth= parameter. The parameter accepts a number that represents how wide each bin should be.

By default, Seaborn will attempt to find an appropriate bin width. Setting the width yourself lets you control how coarse or fine the histogram is. Let’s see how we can use the histplot() function to create bins that are 500 wide.

# Modifying Bin Width in a Seaborn Histogram
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data=df, x='price', binwidth=500)
plt.show()

In the code block above, we only modified the binwidth= parameter, passing in 500 as the argument. This returned the image below:

Modifying Bin Width in a Histogram

In the image above, we customized the bin width of our histogram. We can see that this changed some of the details of our visualization, while the overall trend and skew remained the same.

Modifying Bin Counts in Seaborn Histograms

Similarly, we can modify the number of bins, rather than just their width. This can be incredibly helpful when you want to create a set number of bins. In order to modify the number of bins in Seaborn histograms, you can pass the bin number into the bins= parameter.

Let’s see how we can instruct Seaborn to create ten bins. Because we set the number of bins rather than their width, Seaborn divides the range of the data into ten equal-width bins.

# Customizing Bin Count in a Seaborn Histogram
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data=df, x='price', bins=10)
plt.show()

In the code block above, we instructed Seaborn to create a histogram with only ten bins. While we now have a set number of bins, the bin edges may no longer fall on round values, as shown below:

Modifying Bin Count in a Histogram

In the following section, you’ll learn how to add a kernel density estimate to the visualization.

Adding a Kernel Density Estimate in Seaborn Histograms

Seaborn allows you to easily draw a kernel density estimate on top of a histogram. This allows you to get a sense of how the data are distributed, which can be helpful for more complicated histograms.

In order to draw a kernel density estimate onto Seaborn histograms, you can set the kde= parameter to True. By default the argument is set to False, meaning that the estimate isn’t drawn.

# Adding a Kernel Density Estimate to the Histogram
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data=df, x='price', kde=True)
plt.show()

Let’s see what Seaborn returns when we ask it to draw the kernel density estimate:

Adding a Kernel Density Estimate to the Seaborn Histogram

In the following section, you’ll learn how to add additional data using color to Seaborn histograms.

Adding Additional Data with Color in Seaborn Histograms

We can add additional detail to a Seaborn histogram by plotting another variable using color. While the underlying distribution won’t change, the detail that is represented will give you greater insight into the data.

In order to add an additional variable using color, you can pass the column label of the column into the hue= parameter.

# Adding Additional Detail with Hue
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data=df, x='price', bins=10, hue='cut')
plt.show()

In the code block above, we added the 'cut' column to our visualization. This returns the visualization below.

Adding Additional Variables by Adding Color

We can see that the categories have been added, but the bars overlap, which makes the individual distributions difficult to discern. Let’s see how we can modify this approach in the following section.

Stacking Additional Colors in Seaborn Histograms

In order to stack the bars for each category, we can use the multiple='stack' argument. Rather than drawing overlapping bars, this instructs Seaborn to stack each category’s bars on top of one another.

Let’s see what this looks like:

# Stacking Additional Detail with Hue
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data=df, x='price', bins=10, hue='cut', multiple='stack')
plt.show()

When each category is stacked, the overall shape of the histogram matches the one created without hue=. However, each bar is now split across the different categories, as shown below:

Stacking Additional Variables Using Color

In the following section, you’ll learn how to show percentages rather than counts.

Showing Percentages Rather than Counts in Seaborn Histograms

By default, Seaborn histograms show counts on the y-axis. However, we can modify this behavior to show percentages instead. In order to show percentages rather than counts in Seaborn histograms, we can pass in stat='percent'.

Let’s see what this code looks like:

# Showing Percentages Rather Than Counts
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data=df, x='price', bins=10, stat='percent')
plt.show()

In the code block above, we modified the behavior to calculate percentages rather than counts, returning the image below:

Show Percentages Rather Than Counts in a Seaborn Histogram

Seaborn allows you to calculate a number of different statistics. You can use the following arguments in the stat= parameter:

  • count: show the number of observations in each bin
  • frequency: show the number of observations divided by the bin width
  • probability or proportion: normalize such that bar heights sum to 1
  • percent: normalize such that bar heights sum to 100
  • density: normalize such that the total area of the histogram equals 1

In the following section, you’ll learn how to use a log scale in a histogram.

Using a Log Scale in a Seaborn Histogram

In some datasets, it will make more sense to plot values on a log scale. This allows you to better understand the distribution without needing to create very stretched visualizations.

In order to plot a histogram using the log scale, you can pass in log_scale=True. This will modify the scale of the axis to be along a log scale instead:

# Using a Log Scale Rather Than an Absolute Scale
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data=df, x='price', log_scale=True)
plt.show()

In the code block above, we added instructions to plot the data on a log scale instead. This returned the image below:

Using a Log Scale in a Seaborn Histogram

We can see that this returned a tidy visualization on a log scale.

Create a Cumulative Histogram in Seaborn

So far, you have learned how to create histograms that show the distribution at each bin. However, we can also create a cumulative histogram, in which each bin shows the total number of observations in that bin and in all bins before it.

In order to create a cumulative histogram in Seaborn, you can pass cumulative=True into the sns.histplot() function. Each bar then adds its own count to the running total of the bars before it.

# Showing Cumulative Distributions
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data=df, x='price', cumulative=True)
plt.show()

By creating cumulative histograms, you’re able to more easily see how a distribution changes over the course of the bins. For example, you’re able to easily see large jumps in data, in a more intuitive way, as shown below:

Showing the Cumulative Distribution in a Seaborn Histogram

Seaborn can also plot two continuous variables in a single histogram. Let’s take a look at what this looks like in the following section.

Creating a Seaborn Heat Map Histogram

When you pass two continuous variables into the Seaborn histplot() function, Seaborn creates a heat map rather than a traditional histogram. In order to accomplish this, we pass one column label into the x= parameter and another into the y= parameter.

Let’s see what this looks like:

# Showing 2 Continuous Values Creates a Heat Map
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data=df, x='carat', y='price')
plt.show()

In the code block above, we passed the 'carat' column into the x= parameter and the 'price' column into the y= parameter. This returned the heatmap below:

Showing Two Continuous Variables in a Seaborn Histogram

In the visualization above, the darker elements are more frequent, while lighter elements are less frequent. This allows you to see a histogram on two dimensions, giving you a sense of where the clustering of two variables happens.

We can also extend this by adding another variable using the color semantic. In order to do this, we can use the hue= parameter.

# Showing 2 Continuous Values Creates a Heat Map with Color
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data=df, x='carat', y='price', hue='cut')
plt.show()

By doing this, we’ve added a categorical variable. This means that individual categories in the 'cut' column are shown as separate colors. This allows you to see where the different cut types cluster.

Adding Color to a Seaborn Histogram Heat Map

At this point, we’ve added a lot of information to our data visualization. Keep in mind that a data visualization is only as effective as its ability to actually communicate data.

Conclusion

In this guide, you learned how to use the Seaborn histplot() function to create informative histograms in Seaborn. Histograms allow you to get a strong understanding of the distribution of data. While similar to the Seaborn countplot, they provide significant flexibility in terms of customizing your data visualization.

You first learned what the Seaborn histplot function offers in terms of parameters and default arguments. Then, you learned how to create simple histograms. From there, you built on what you learned to create more complex and informative histograms by adding colors, changing scales, and more. Finally, you learned how to create two-dimensional histograms in the form of heatmaps.


Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
