
NumPy Histogram: Understanding the np.histogram Function


In this tutorial, you’ll learn how to use the NumPy histogram function to calculate a histogram of a given dataset. A histogram shows the frequency of numerical data in bins of grouped ranges. By using NumPy to calculate histograms, you can easily calculate and access the frequencies (relative or absolute) of different values.

By the end of this tutorial, you’ll have learned:

  • How the NumPy histogram function works
  • How to customize the number and range of bins in the resulting histogram
  • How to return either absolute counts or probability density values for each bin

If you want to learn how to check if a distribution is normal, check out my guide on using Python to test for normality.

Understanding the NumPy Histogram Function

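In this section, you’ll learn about the np.histogram() function and the various parameters and default arguments the function provides. The signature shown below has six parameters, only one of which (a) is required. Note that normed= was deprecated in favor of density= and removed in NumPy 1.24, so recent versions of NumPy accept only the other five. Let’s take a look at what the function looks like:

# Understanding the np.histogram() Function
# (signature from older NumPy releases; normed= was removed in NumPy 1.24)
import numpy as np
np.histogram(a, bins=10, range=None, normed=None, weights=None, density=None)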

We can see that the function provides a number of different parameters. The table below breaks down the parameters and their default arguments:

| Parameter | Description | Default Argument | Accepted Values |
|---|---|---|---|
| a= | The input data, where the histogram is calculated over the flattened array | N/A | array-like |
| bins= | The number of equal-width bins or a sequence of bin edges | 10 | int or sequence |
| range= | The lower and upper range of the bins; values outside this range are ignored | None | (float, float) |
| normed= | Equivalent to the density argument (deprecated since 1.6.0, removed in NumPy 1.24). Produces incorrect results if bins are unequal. | None | bool |
| weights= | An array of weights of the same shape as a, where each value contributes only its associated weight | None | array-like |
| density= | If False, returns the number of samples in each bin. If True, returns the value of the probability density function in each bin. | None | bool |
The parameters and default arguments of the np.histogram() function
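The weights= parameter is not used in the examples later in this tutorial, so here is a minimal sketch of its effect. The sample values and weights are made up for illustration. With weights, each value adds its weight (rather than 1) to the bin it falls into:

```python
import numpy as np

# Hypothetical sample data and weights, chosen only for illustration
values = np.array([1, 1, 2, 3])
weights = np.array([0.5, 0.5, 2.0, 1.0])

# Three equal-width bins over the range 1 to 4: [1, 2), [2, 3), [3, 4]
counts, edges = np.histogram(values, bins=3, range=(1, 4))
weighted, _ = np.histogram(values, bins=3, range=(1, 4), weights=weights)

print(counts)    # number of values in each bin: [2 1 1]
print(weighted)  # sum of the weights in each bin: [1. 2. 1.]
```

The first bin holds two values, but because each carries a weight of 0.5, its weighted total is 1.0.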

Now that you have a strong understanding of how the function works, let’s take a look at how it can be used.

Creating a Histogram with NumPy in Python

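In this section, you’ll learn how to create a basic histogram with the NumPy histogram function. In order to do this, let’s create an array of 100 random integers between 0 and 100, using the np.random.randint() function. Because the upper bound of np.random.randint() is exclusive, we pass in 101 so that 100 can be drawn: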

# Generating a NumPy Array
import numpy as np
np.random.seed(100)

arr = np.array(np.random.randint(0, 101, 100))
print(arr)

# Returns:
# [  8  24  67  87  79  48  10  94  52  98  53  66  98  14  34  24  15 100
#   60  58  16   9  93  86   2  27   4  31   1  13  83 100   4  91  59  67
#    7  49  47  65  61  14  55  71  80   2  94  19  98  63  53  27  56  30
#   48  47  39  38  44  18  64  56  34  53  74  17  72  13  30  17  53  68
#   50  91  91  83  53  78   0  13  57  76   3  70   3  84  79  10  87  60
#    3  48  52  43  36   5  71  38  86  94]

We set a seed with np.random.seed() before generating the array, which makes the results reproducible. Now that we have our array, let’s pass this into the np.histogram() function with its default arguments. This will allow us to better understand how the function works:

# Creating a Histogram with np.histogram()
import numpy as np
np.random.seed(100)
arr = np.array(np.random.randint(0, 101, 100))

print(np.histogram(arr))

# Returns:
# (array([13, 13,  4,  9,  8, 14, 10,  9,  8, 12]), array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.]))

Let’s break down what the code above is doing:

  1. We load our array in the same way as we did previously. Note that our values contain the minimum value of 0 and the maximum value of 100.
  2. We then pass this array into the np.histogram() function and print the results

The function returns two arrays: (1) the number of values that fall into each bin and (2) the bin edges. Every bin is half-open except the last one, which is closed on both sides. This means that the first bin goes from 0 inclusive up to 10 exclusive, and so on, while the last bin includes both 90 and 100.

From the results, we can see that 13 values fall into the first bin, meaning that 13 values are in the interval [0, 10). Because the default argument is bins=10, NumPy divides the range between the minimum value (0) and the maximum value (100) into ten equal-width bins, each 10 wide.
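To make the edge calculation concrete, here is a small sketch that rebuilds the default bin edges by hand with np.linspace(). The array is a hypothetical stand-in that shares the tutorial array’s minimum (0) and maximum (100):

```python
import numpy as np

# Hypothetical data spanning 0 to 100, like the tutorial's array
data = np.array([0, 5, 10, 55, 99, 100])

hist, bin_edges = np.histogram(data)  # bins=10 by default

# Ten bins need eleven evenly spaced edges between the minimum and maximum
manual_edges = np.linspace(data.min(), data.max(), 11)

print(np.allclose(bin_edges, manual_edges))  # True
print(hist)  # [2 1 0 0 0 1 0 0 0 2]
```

The value 10 sits exactly on an edge and is counted in the second bin, [10, 20), while 100 is counted in the last bin because that bin is closed on the right.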

Since the function returns two values, we can assign both of the results to their own variables, as shown below:

# Returning Values and Bins with np.histogram()
import numpy as np
np.random.seed(100)
arr = np.array(np.random.randint(0, 101, 100))

hist, bin_edges = np.histogram(arr)

print(f'{hist=}')
print(f'{bin_edges=}')

# Returns:
# hist=array([13, 13,  4,  9,  8, 14, 10,  9,  8, 12])
# bin_edges=array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])

In the code above, we used Python f-strings to print the variables neatly (the = specifier used here is available only in Python 3.8+). Now that you’ve seen what the function produces with its default arguments, let’s see how you can customize the function by modifying the bins= parameter.

Customizing the Bins in NumPy Histograms

In this section, you’ll learn how to customize the bins generated in NumPy histograms. By default, the NumPy histogram function uses bins=10. This means that NumPy will split the range of values into ten equal-width bins.

Customizing the Number of Bins in NumPy Histograms

We can modify the number of bins in a NumPy histogram by passing an integer into the bins= argument. As mentioned earlier, NumPy will generate 10 bins by default. Let’s see how we can modify the function to generate five bins, instead of ten:

# Customizing the Number of Bins in NumPy Histograms
import numpy as np
np.random.seed(100)
arr = np.array(np.random.randint(0, 101, 100))

hist, bin_edges = np.histogram(arr, bins=5)

print(f'{hist=}')
print(f'{bin_edges=}')

# Returns:
# hist=array([26, 13, 22, 19, 20])
# bin_edges=array([  0.,  20.,  40.,  60.,  80., 100.])

In the following section, you’ll learn how to customize the ranges of bins.

Customizing the Ranges of Bins in NumPy Histograms

The NumPy histogram function also allows you to manually define the edges of the bins. The benefit of this is that it allows you to create unevenly sized bins. This can be particularly helpful if you want to group numeric data into meaningful categories, such as age groups.

Let’s see how we can define some logical bins for our NumPy histogram that emulate age groups:

# Customizing the Bins of NumPy Histograms
import numpy as np
np.random.seed(100)
arr = np.array(np.random.randint(0, 101, 100))

hist, bin_edges = np.histogram(arr, bins=[arr.min(), 18, 65, arr.max()])

print(f'{hist=}')
print(f'{bin_edges=}')

# Returns:
# hist=array([24, 42, 34])
# bin_edges=array([  0,  18,  65, 100])

NumPy will define the bins as left inclusive and right exclusive, except for the last bin, which also includes its right edge. This means that the bins above are [0, 18), [18, 65), and [65, 100], so a value of 18 is counted in the second bin and a value of 100 is counted in the third.
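A quick way to check how values that sit on a bin edge are assigned is to build a tiny array by hand. The ages below are hypothetical and chosen so that each one lands on or next to an edge:

```python
import numpy as np

# Hypothetical ages sitting on or next to the bin edges
ages = np.array([17, 18, 64, 65, 100])

hist, bin_edges = np.histogram(ages, bins=[0, 18, 65, 100])

# Pair each bin's left and right edge with its count
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], hist):
    print(f'{left} to {right}: {count}')

# 17 falls in the first bin, 18 and 64 in the second, 65 and 100 in the third
```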

Returning a Probability Density Function with NumPy Histograms

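NumPy also allows us to return the value of the probability density function in each bin. This means that the values are normalized so that the integral over the full range equals 1. Each value is the proportion of samples in the bin divided by the bin’s width, so the values themselves only sum to 1 when every bin has a width of 1. In the example below, each bin is 10 wide, so multiplying each density by 10 gives the proportion of values in that bin.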

Let’s see how we can return the probability density function in NumPy histograms:

# Calculating a Probability Density Function with NumPy Histograms
import numpy as np
np.random.seed(100)
arr = np.array(np.random.randint(0, 101, 100))

hist, bin_edges = np.histogram(arr, density=True)

print(f'{hist=}')
print(f'{bin_edges=}')

# Returns:
# hist=array([0.013, 0.013, 0.004, 0.009, 0.008, 0.014, 0.01 , 0.009, 0.008, 0.012])
# bin_edges=array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])
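The relationship between counts, densities, and bin widths can be checked directly. The sketch below uses a small, made-up array with unequal bins and assumes nothing beyond np.histogram() and np.diff(). It recomputes the density by hand and converts it back into proportions:

```python
import numpy as np

# Hypothetical data and deliberately unequal bins: [0, 5) and [5, 15]
data = np.array([0, 1, 2, 3, 4, 10])
edges = [0, 5, 15]

counts, bin_edges = np.histogram(data, bins=edges)
density, _ = np.histogram(data, bins=edges, density=True)

widths = np.diff(bin_edges)  # width of each bin: [5, 10]

# Density is the share of samples in a bin divided by that bin's width
manual_density = counts / (counts.sum() * widths)

# Multiplying by the widths recovers the proportion of values in each bin
proportions = density * widths

print(np.allclose(density, manual_density))  # True
print(proportions.sum())  # sums to 1 (up to floating-point rounding)
```

Summing the density values alone would not give 1 here, because the bins are wider than 1.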

In the following section, you’ll learn how to modify the range of values that a NumPy histogram covers.

Modifying the Range of Values with NumPy Histograms

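By default, NumPy will include the entire range of values, from the minimum to the maximum of the array, in the histograms generated by the np.histogram() function. You can override this behavior by passing a tuple of (lower, upper) values to the range= parameter. Values outside this range are ignored, and the bins are spread evenly across the new range.

Let’s see how we can modify the function’s behavior to only count values between 0 and 50: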

# Modifying the Range of Values in NumPy Histograms
import numpy as np
np.random.seed(100)
arr = np.array(np.random.randint(0, 101, 100))

hist, bin_edges = np.histogram(arr, range=(0.0, 50.0))

print(f'{hist=}')
print(f'{bin_edges=}')

# Returns:
# hist=array([9, 4, 7, 6, 2, 2, 5, 4, 2, 7])
# bin_edges=array([ 0.,  5., 10., 15., 20., 25., 30., 35., 40., 45., 50.])
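Because values outside the range are dropped, the counts no longer add up to the length of the array. Here is a minimal sketch with a small, hypothetical array that makes this easy to verify by eye:

```python
import numpy as np

# Hypothetical data; 51 and 99 lie outside the requested range
data = np.array([1, 5, 20, 49, 50, 51, 99])

hist, bin_edges = np.histogram(data, bins=5, range=(0, 50))

print(hist)       # [2 0 1 0 2]
print(bin_edges)  # five bins of width 10 between 0 and 50

# Only five of the seven values were counted
print(hist.sum(), len(data))  # 5 7
```

The value 50 is still counted, since the last bin includes its right edge.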

Conclusion

In this tutorial, you learned how to use the np.histogram() function to calculate histograms in NumPy. You first learned how the function works by understanding its parameters and default arguments. Then, you learned how to use the function to create histograms. Following that, you learned how to customize the number and ranges of bins. You also learned how to calculate the probability density function and how to modify the overall range of the values.


Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
