How to Calculate Percentiles in NumPy with np.percentile

In this tutorial, you’ll learn how to calculate percentiles in NumPy using the np.percentile() function. A percentile is the value below which a given percentage of the observations in a group fall. For example, the 50th percentile is the value below which half of the observations fall. Knowing how to calculate percentiles in NumPy allows you to better approach statistics and machine learning projects.

By the end of this tutorial, you’ll have learned:

  • How the np.percentile() function works
  • How to calculate percentiles of NumPy arrays, including in multiple dimensions
  • Practical examples of the np.percentile() function

Understanding the np.percentile() Function

Before we dive into how to use the np.percentile() function, it’s crucial to understand how the function works. Let’s take a look at what the function looks like and break down the different parameters that it has to offer:

# Understanding the np.percentile() Function
np.percentile(
   a, 
   q, 
   axis=None, 
   out=None, 
   overwrite_input=False, 
   method='linear', 
   keepdims=False, 
   *, 
   interpolation=None
)

You can see from the code block above that the function offers a large number of parameters. The table below breaks down these parameters, explaining their default and expected values, as well as their behavior:

Parameter | Default Value | Accepted Values | Description
--- | --- | --- | ---
a= | N/A | array-like | The input array or an object that can be converted to an array
q= | N/A | float or array-like of float | The percentile or a sequence of percentiles, between 0 and 100 inclusive
axis= | None | {int, tuple of int, None} | The axis or axes along which to compute
out= | None | ndarray | Alternative output array in which to place the result
overwrite_input= | False | bool | If True, then allow the input array a to be modified by intermediate calculations, to save memory
method= | 'linear' | See below. Formerly known as interpolation= | Specifies the method to use for estimating the percentile
keepdims= | False | bool | If this is set to True, the axes which are reduced are left in the result as dimensions with size one
The parameters of the np.percentile() function

There are many different parameters available that allow you to customize the function. However, the most important parameters are the a and q parameters, which allow you to pass in the array and percentile you want to calculate.

The options available for the method= parameter are listed below:

  • ‘lower’
  • ‘higher’
  • ‘midpoint’
  • ‘nearest’
  • ‘inverted_cdf’
  • ‘averaged_inverted_cdf’
  • ‘closest_observation’
  • ‘interpolated_inverted_cdf’
  • ‘hazen’
  • ‘weibull’
  • ‘linear’ (default)
  • ‘median_unbiased’
  • ‘normal_unbiased’
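To make the difference between a few of these options concrete, here is a minimal sketch that applies five of them to the same small array. The array values and the 40th percentile are illustrative choices rather than part of the original examples, and the method= keyword assumes NumPy 1.22 or later.

```python
import numpy as np

# Illustrative data: five sorted values, so the 40th percentile
# sits at fractional position 0.4 * (5 - 1) = 1.6 (between 20 and 30)
arr = np.array([10, 20, 30, 40, 50])

results = {}
for method in ['linear', 'lower', 'higher', 'midpoint', 'nearest']:
    # method= requires NumPy 1.22+ (older versions used interpolation=)
    results[method] = float(np.percentile(arr, 40, method=method))

for method, value in results.items():
    print(f"{method:>8}: {value}")

# linear   -> about 26 (interpolates 60% of the way from 20 to 30)
# lower    -> 20.0 (the neighbor below)
# higher   -> 30.0 (the neighbor above)
# midpoint -> 25.0 (the average of the two neighbors)
# nearest  -> 30.0 (position 1.6 is closer to index 2)
```

The 'lower', 'higher', and 'nearest' options always return a value that exists in the array, while 'linear' and 'midpoint' can return a value in between.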

Now that you have a strong understanding of what the function can do, let’s start looking at some practical examples of how to use the function.

Calculate a Percentile with np.percentile() on a 1-D NumPy Array

In this section, you’ll learn how to use the np.percentile() function to calculate a percentile of a NumPy array. We’ll first take a look at passing in a single value into the q= parameter, followed by looking at how to pass in multiple percentile values.

By passing a single value into the q= parameter, we can return the value below which that percentage of the values in a fall. Let’s take a look at what this means by using a very straightforward example:

# Calculating the 50th Percentile using the np.percentile() Function
import numpy as np

arr = np.arange(11)
perc = np.percentile(arr, 50)

print(perc)

# Returns: 5.0

In the example above, we first create an array, arr, that contains the values from 0-10. We then calculate the 50th percentile of the array. We used a simple example containing the values from 0-10 to make the calculation clear. Let’s take a look at calculating the 25th percentile:

# Calculating the 25th Percentile using the np.percentile() Function
import numpy as np

arr = np.arange(11)
perc = np.percentile(arr, 25)

print(perc)

# Returns: 2.5

We can see in the example above that the function returns a value that doesn’t exist in the original array. This is because the 25th percentile falls between the values of 2 and 3 and the default argument for method= is set to 'linear'. This means it uses linear interpolation between those two values to calculate the value.
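The linear interpolation described above can be reproduced by hand. The sketch below is one way to do the arithmetic for the default 'linear' method, not NumPy’s internal implementation; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(11)
q = 25

# Sort the data, then find the fractional (0-based) position of the percentile
sorted_arr = np.sort(arr)
position = q / 100 * (len(sorted_arr) - 1)  # 0.25 * 10 = 2.5

# The two neighboring indices and how far between them the position falls
lower_idx = int(np.floor(position))  # 2
upper_idx = int(np.ceil(position))   # 3
fraction = position - lower_idx      # 0.5

# Linear interpolation between the two neighboring values
manual = sorted_arr[lower_idx] + fraction * (
    sorted_arr[upper_idx] - sorted_arr[lower_idx]
)

print(manual)
# Returns: 2.5

# The manual result matches the function's default behavior
print(np.isclose(manual, np.percentile(arr, q)))
# Returns: True
```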

Let’s modify the value to return the 'lower' value of the two:

# Returning the Lower Value when Calculating a Percentile
import numpy as np

arr = np.arange(11)
perc = np.percentile(arr, 25, method='lower')

print(perc)

# Returns: 2

By using the method='lower' argument, we’re able to ensure that the function returns a value that exists in the array.

Calculating Multiple Percentiles in a 1-D NumPy Array

In this section, we’ll take a look at calculating multiple percentiles using the np.percentile() function. By passing a list of percentiles into the q= parameter, the function returns an array containing one value for each percentile we pass in. Let’s take a look at how we can replicate our earlier examples of calculating the 25th and 50th percentiles of an array:

# Calculating Multiple Percentiles with np.percentile()
import numpy as np

arr = np.arange(11)
perc = np.percentile(arr, [25, 50], method='lower')

print(perc)

# Returns: [2 5]

In the example above, we return the 25th and 50th percentiles of an array.
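Because the returned values come back in the same order as the percentiles that were passed in, the two can be paired up for easier reading. The following is a small sketch of that idea; the summary dictionary is an illustrative name, not something the function provides.

```python
import numpy as np

arr = np.arange(11)
percentiles = [25, 50, 75]

# One result per requested percentile, in the same order
values = np.percentile(arr, percentiles)

# Pair each percentile with its value (converted to a plain Python float)
summary = {q: float(v) for q, v in zip(percentiles, values)}

for q, v in summary.items():
    print(f"{q}th percentile: {v}")

# Returns:
# 25th percentile: 2.5
# 50th percentile: 5.0
# 75th percentile: 7.5
```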

Calculate a Percentile with np.percentile() on a 2-D NumPy Array

In this section, you’ll learn how to calculate the percentile of a 2-dimensional array. There are different ways of making this work, using the axis= parameter.

Let’s create a 3×3 array as shown below:

# Creating a 3x3 NumPy Array
import numpy as np

arr = np.arange(9).reshape(3,3)
print(arr)

# Returns:
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

Let’s take a look at when we leave the axis= parameter set to its default value, None.

# Finding the 50th Percentile for a 2-D Array
import numpy as np

arr = np.arange(9).reshape(3,3)
perc = np.percentile(arr, 50)

print(perc)

# Returns: 4.0

In the example above, we calculated the 50th percentile. What’s interesting is that the percentile is calculated across all of the values in the array, regardless of which dimension they’re in.
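One way to confirm this behavior is to compare the result against the same calculation on a flattened copy of the array. This is a quick sketch for verification only:

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

# With axis=None (the default), all nine values are treated as one group
whole = np.percentile(arr, 50)

# Flattening the array to 1-D first gives the same answer
flat = np.percentile(arr.ravel(), 50)

print(whole, flat)
# Returns: 4.0 4.0
```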

In some cases, this behavior is not what you’re hoping for. You can customize it using the axis= parameter. Let’s see how we can calculate the percentile across the 0th axis, which calculates the percentile across the “columns” of the array:

# Calculate the Percentile Across "Columns"
import numpy as np

arr = np.arange(9).reshape(3,3)
perc = np.percentile(arr, 50, axis=0)

print(perc)

# Returns: [3. 4. 5.]

Similarly, we can pass in axis=1 to calculate the percentile across the “rows” of the matrix:

# Calculate the Percentile Across "Rows"
import numpy as np

arr = np.arange(9).reshape(3,3)
perc = np.percentile(arr, 50, axis=1)

print(perc)

# Returns: [1. 4. 7.]
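The axis= parameter can also be combined with multiple percentiles and with keepdims=. The sketch below shows how the shape of the result changes in each case; the choice of the 25th and 75th percentiles is illustrative.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

# Multiple percentiles along axis=0: one row of results per percentile
quartiles = np.percentile(arr, [25, 75], axis=0)
print(quartiles.shape)
# Returns: (2, 3)
print(quartiles)
# Returns:
# [[1.5 2.5 3.5]
#  [4.5 5.5 6.5]]

# keepdims=True leaves the reduced axis in place with size one,
# so the result has shape (3, 1) instead of (3,)
row_medians = np.percentile(arr, 50, axis=1, keepdims=True)
print(row_medians.shape)
# Returns: (3, 1)
```

Keeping the reduced dimension can be useful when the result needs to broadcast against the original array, for example arr - row_medians.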

In the final section below, we’ll take a look at a practical example of calculating the percentiles of an array: calculating percentiles of student grades.

Practical Example of np.percentile()

In this section, we’ll take a look at how to use the np.percentile() function to calculate different percentiles of student grades. This can be helpful when using NumPy to evaluate performance in a classroom.

Let’s generate an array of random values representing student grades and then pass in the percentiles we want to calculate.

# Calculating Percentiles of Student Grades
import numpy as np

arr = np.random.randint(low=30, high=100, size=30)
perc = np.percentile(arr, [50, 80, 90])

print(perc)

# Returns: [61.  82.8 90.3]

In the example above, we generated 30 random integers ranging from 30 to 99 (the high= value of 100 is exclusive). We then calculated the 50th, 80th, and 90th percentiles of the values. Because the values are random, your results will differ from the ones shown.
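If you want the grades to be the same on every run, one option is to use a seeded random generator. The sketch below assumes NumPy’s np.random.default_rng() interface; the seed of 42 and the endpoint=True argument (which makes a grade of 100 possible) are illustrative choices, not part of the original example.

```python
import numpy as np

# A seeded generator produces the same "random" grades on every run
rng = np.random.default_rng(42)  # 42 is an arbitrary, illustrative seed

# endpoint=True makes the upper bound inclusive, so grades span 30 to 100
grades = rng.integers(low=30, high=100, size=30, endpoint=True)

perc = np.percentile(grades, [50, 80, 90])

for q, value in zip([50, 80, 90], perc):
    print(f"{q}th percentile: {value}")
```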

Conclusion

In this tutorial, you learned how to use the NumPy percentile() function to calculate percentiles. Being able to calculate percentiles gives you good insight into the distribution of your data and serves many practical purposes.

You first learned how to understand the many parameters the function offers. Then, you learned how to use the function on one-dimensional arrays, including customizing how to deal with interpolating values. Then, you learned how to work with multi-dimensional arrays, customizing the way in which percentiles are calculated by using the axis parameter.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
