How to Sort Data in a Pandas Dataframe (with Examples)

  • by
Pandas sort_values function cover image
  • Save

Sorting data is an essential method to better understand your data. In this post, you’ll learn how to sort data in a Pandas dataframe using the Pandas .sort_values() function, in ascending and descending order, as well as sorting by multiple columns. Specifically, you’ll learn how to use the by=, ascending=, inplace=, and na_position= parameters.

Video Tutorial

Table of Contents

Loading the dataset and required libraries

To start, let’s load the Pandas library and a dataset created for this tutorial. We’ll print out the first five rows, using the .head() method and take a quick look at the dataset:

import pandas as pd
df = pd.read_excel('https://github.com/datagy/mediumdata/raw/master/sort_values.xlsx')
print(df.head())

In the code above, you first imported the Pandas library, then used the .read_excel() method to load a dataset. Finally, you printed the first five rows of the dataset using the .head() method.

Let’s take a quick look at what the dataset looks like:

 	Date 	        Name 	Score
0 	2020-07-01 	John 	10
1 	2020-07-01 	Jane 	35
2 	2020-07-01 	Matt 	23
3 	2020-07-02 	John 	19
4 	2020-07-02 	Jane 	38

The dataset contains three columns: (1) Date, (2), Name, and (3) Score. Specifically, these columns are made up of datetime, string, and integer datatypes, meaning we have a large variety of sorting options!

Exploring the Pandas Sort_Values() Function

Now that you’ve loaded the Pandas library and assigned a dataset to the dataframe df, let’s take a look at some of the key parameters available in the Pandas .sort_values() function:

DataFrame.sort_values(by, ascending=True, inplace=False, na_position='last')

The .sort_value() function is applied directly to a DataFrame object and take more arguments than listed above, but these are the key ones found in most applications.

Let’s a take a deeper look at these:

  • The by= argument identifies, which column or columns to use to sort your data,
  • The ascending= argument defaults to True and setting it to False will sort your data in descending order,
  • The inplace= argument will modify the DataFrame object when set to False, without having to reassign it,
  • The na_position= defaults to ‘last’, but changing to ‘first’ will place those values first in sorting order.

Now let’s dive into actually sorting your data.

Sort Data in a Pandas Dataframe Column

The most important parameter in the .sort_values() function is the by= parameter, as it tells Pandas which column(s) to sort by. Let’s say you wanted to sort the DataFrame df you created earlier in the tutorial by the Name column. You could then write:

df.sort_values(by='Name')

Here, you’ve applied the .sort_values() method to the DataFrame object, df. You’ve also applied the by='Name' parameter and argument.

This returns the following printout, which I’ve truncated to five records to save space:

 	Date 	        Name 	Score
1 	2020-07-01 	Jane 	35
4 	2020-07-02 	Jane 	38
7 	2020-07-03 	Jane 	34
0 	2020-07-01 	John 	10
3 	2020-07-02 	John 	19

With this, you’ve sorted your dataset by the Name column in ascending order.

Sort Data in Multiple Pandas Dataframe Columns

In the example above, you sorted your dataframe by a single column. You can sort your data by multiple columns by passing in a list of column items into the by= parameter. Let’s try this again by sorting by both the Name and Score columns:

df.sort_values(by=['Name', 'Score'])

Again, let’s take a look at what this looks like when it’s returned:

        Date 	        Name 	Score
7 	2020-07-03 	Jane 	34
1 	2020-07-01 	Jane 	35
4 	2020-07-02 	Jane 	38
0 	2020-07-01 	John 	10
3 	2020-07-02 	John 	19

You can see here that the dataframe is first sorted by the Name column (meaning Jane precedes John, and John precedes Matt), then for each unique item in the Name column, the values in the Score column are further sorted in ascending order.

Change Sort Order in Pandas

By default, the .sort_values() method will sort values in ascending order – but you may wish to change the sort order to descending.

Let’s take a look at how to do this. We’ll sort the dataframe again first by the Name and Score columns, but this time add in the ascending=False argument:

df.sort_values(by=['Name', 'Score'], ascending=False)

Here, you’re sorting the data by the Name and Score columns, but in descending order:

        Date 	        Name 	Score
2 	2020-07-01 	Matt 	23
8 	2020-07-03 	Matt 	8
6 	2020-07-03 	John 	36
3 	2020-07-02 	John 	19
0 	2020-07-01 	John 	10

This is really handy, but say you wanted to sort columns in different orders. To do this, you would simply pass a list of orders into the ascending= argument.

Let’s try this by sorting the Name column in ascending order and Score column in descending order:

df.sort_values(by=['Name', 'Score'], ascending=[True, False])

This returns the following dataframe, with the Name column sorted in ascending order and the Score column sorted in descending order:

        Date 	        Name 	Score
4 	2020-07-02 	Jane 	38
1 	2020-07-01 	Jane 	35
7 	2020-07-03 	Jane 	34
6 	2020-07-03 	John 	36
3 	2020-07-02 	John 	19

Now let’s take a look at how to change the sort order of missing values.

Pandas Sort_Values Na_Position Parameter

By default, Pandas will sort any missing values to the last position. In order to change this behavior, you can use the na_position='first' argument.

Let’s try this out by sorting the Name column and placing missing values first:

df.sort_values(by='Name', na_position='first')

By applying this code, you’re generating the following dataframe:

 	Date 	        Name 	Score
5 	2020-07-02 	NaN 	34
1 	2020-07-01 	Jane 	35
4 	2020-07-02 	Jane 	38
7 	2020-07-03 	Jane 	34
0 	2020-07-01 	John 	10

Finally, let’s see how to apply the change in sort order in place.

Check out some other Python tutorials on datagy, including our complete guide to styling Pandas and our comprehensive overview of Pivot Tables in Pandas!

Changing Sort Order In Place in Pandas Sort_Values

All of the examples you’ve learned above haven’t actually been applied to the dataframe itself, meaning that the dataframe object hasn’t actually been modified.

You could reassign the dataframe (such as, to itself), or you can modify the dataframe directly by using the inplace= argument.

Let’s change the sort order and apply the changes in place:

df.sort_values(by='Name', inplace=True)

This has now modified the dataframe, meaning that if you now print the head of the dataframe using the .head() method, you’d receive the following:

print(df.head())
Date 	Name 	Score
1 	2020-07-01 	Jane 	35
4 	2020-07-02 	Jane 	38
7 	2020-07-03 	Jane 	34
0 	2020-07-01 	John 	10
3 	2020-07-02 	John 	19

Conclusion

In this post, you learned how to use the Pandas sort_values() function to sort data in a Pandas dataframe. Specifically, you learned how to sort by a single or by multiple columns, how to change the sort order, how to place missing values at the tail or the head, and how to change the sort order in place.

To learn more about the function, check out the official documentation here.

Cover of Introduction to Python for Data Science
  • Save

Want to learn Python for Data Science? Check out my ebook for as little as $10!