In this post, you’ll learn how to calculate the Pandas mean (average) for one column, a row, multiple columns, or an entire dataframe. You’ll also learn how to skip missing (NaN) values or include them in your calculation.
Loading a Sample Dataframe
If you want a sample dataframe to follow along with, load the sample dataframe below. The data represents people’s salaries over a period of four years:
import pandas as pd

df = pd.DataFrame.from_dict(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300]
    }
).set_index('Year')

print(df)
This returns the following dataframe:
      Carl  Jane  Melissa
Year
2018  1000  1500    800.0
2019  2300  1700   2300.0
2020  1900  1300      NaN
2021  3400   800   2300.0
Pandas Mean on a Single Column
It’s very easy to calculate a mean for a single column. We can simply call the .mean() method on a single column and it returns the mean of that column.
For example, let’s calculate the average salary Carl had over the years:
>>> carl = df['Carl'].mean()
>>> print(carl)
2150.0
We can see here that Carl’s average salary over the four years has been 2150.
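As a quick sanity check, the mean is simply the sum of the values divided by how many there are. The sketch below rebuilds the sample dataframe so it can run on its own; the names manual_mean and builtin_mean are only illustrative.

```python
import pandas as pd

# Same sample data as in this post
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# Sum of Carl's salaries divided by the number of non-missing entries
manual_mean = df['Carl'].sum() / df['Carl'].count()

# The built-in method should give the same result
builtin_mean = df['Carl'].mean()

print(manual_mean, builtin_mean)  # 2150.0 2150.0
```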
Pandas Mean on a Row
Now, say you wanted to calculate the average for a dataframe row. We can do this by selecting the row and calling the .mean() method on it.
Let’s say we wanted to return the average of everyone’s salaries for the year 2018. We can access the 2018 row data by using .loc (which you can learn more about by checking out my tutorial here).
YOUTUBE: https://www.youtube.com/watch?v=VIa1ETYnFuc
>>> year_2018 = df.loc[2018,:].mean()
>>> print(year_2018)
1100.0
Now, alternatively, you could return the mean for every row. You can do this by not including the row selection and passing axis=1 to the .mean() method.
Let’s give this a shot:
row_averages = df.mean(axis=1)
print(row_averages)
This returns the following series:
Year
2018 1100.000000
2019 2100.000000
2020 1600.000000
2021 2166.666667
dtype: float64
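To make the role of the axis= parameter concrete, here is a small sketch comparing the two directions on the same sample data. Note that the 2020 row contains a missing value for Melissa, so only the two available salaries are averaged for that year. The variable names are illustrative.

```python
import pandas as pd

# Same sample data as in this post
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# axis=0 (the default) collapses the rows: one mean per column
column_means = df.mean(axis=0)

# axis=1 collapses the columns: one mean per row
row_means = df.mean(axis=1)

# 2020: (1900 + 1300) / 2, because Melissa's NaN is skipped
mean_2020 = row_means.loc[2020]
print(mean_2020)  # 1600.0
```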
Pandas Average on Multiple Columns
If you wanted to calculate the average of multiple columns, you can simply select those columns and call the .mean() method on the selection.
In the example below, we return the average salaries for Carl and Jane. Note that you need to use double square brackets in order to properly select the data:
averages = df[['Carl', 'Jane']].mean()
print(averages)
This returns the following:
Carl 2150.0
Jane 1325.0
dtype: float64
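The same column selection can also be averaged across each row, which gives the combined average of Carl and Jane for every year. This is a minimal sketch using the sample data from this post; carl_jane_by_year is an illustrative name.

```python
import pandas as pd

# Same sample data as in this post
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# Select two columns, then average across them for each year
carl_jane_by_year = df[['Carl', 'Jane']].mean(axis=1)

print(carl_jane_by_year.loc[2018])  # 1250.0
```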
Pandas Mean on Entire Dataframe
Finally, if you wanted to return the mean for every column in a Pandas dataframe, you can simply apply the .mean() method to the entire dataframe.
Let’s give this a shot by writing the code below:
>>> entire_dataframe = df.mean()
>>> print(entire_dataframe)
Carl 2150.0
Jane 1325.0
Melissa 1800.0
dtype: float64
Now you’re able to calculate the mean of every column in the dataframe.
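If you instead want a single number summarizing every value in the dataframe, one possible approach is to divide the total of all values by the count of non-missing values. The sketch below also shows that averaging the column means gives a slightly different answer here, because Melissa’s column has fewer values than the others. The variable names are illustrative.

```python
import pandas as pd

# Same sample data as in this post
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# Mean of the three column means: each column weighs the same
mean_of_column_means = df.mean().mean()

# Mean of all 11 non-missing values: each value weighs the same
overall_mean = df.sum().sum() / df.count().sum()

print(round(mean_of_column_means, 2))  # 1758.33
print(round(overall_mean, 2))          # 1754.55
```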
Include NAs in Calculating Pandas Mean
One important thing to note is that, by default, missing values are excluded when calculating means. Pandas treats such a value as missing and leaves it out of the calculation, rather than counting it as 0.
If you wanted to include missing values in the calculation, you could first fill them with a value using the Pandas .fillna() method. Check out my tutorial here to learn more.
Let’s calculate the mean both excluding the missing value in Melissa’s column and including it as 0:
>>> print(df['Melissa'].mean())
1800.0
>>> print(df['Melissa'].fillna(0).mean())
1350.0
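The .mean() method also accepts a skipna= parameter, which is True by default. Setting it to False makes the result NaN whenever a missing value is present, which can be a handy way to spot incomplete data. A minimal sketch on the sample data, with illustrative variable names:

```python
import pandas as pd

# Same sample data as in this post
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# Default behavior: the missing value is skipped
excluding_nan = df['Melissa'].mean()

# skipna=False: the missing value propagates into the result
including_nan = df['Melissa'].mean(skipna=False)

print(excluding_nan)  # 1800.0
print(including_nan)  # nan
```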
Use Pandas Describe to Calculate Means
Finally, let’s use the Pandas .describe() method to calculate the mean (as well as some other helpful statistics). To learn more about the Pandas .describe() method, check out my tutorial here.
Let’s see how we can get the mean and some other helpful statistics:
>>> print(df.describe())
              Carl         Jane      Melissa
count     4.000000     4.000000     3.000000
mean   2150.000000  1325.000000  1800.000000
std     994.987437   386.221008   866.025404
min    1000.000000   800.000000   800.000000
25%    1675.000000  1175.000000  1550.000000
50%    2100.000000  1400.000000  2300.000000
75%    2575.000000  1550.000000  2300.000000
max    3400.000000  1700.000000  2300.000000
If you only wanted to return the mean, you could simply use the .loc accessor to access the data:
>>> print(df.describe().loc['mean'])
Carl 2150.0
Jane 1325.0
Melissa 1800.0
Name: mean, dtype: float64
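If you only need a few statistics rather than the full .describe() table, one alternative is the .agg() method with a list of statistic names. The sketch below is just one way to do it, using the sample data from this post; summary is an illustrative name.

```python
import pandas as pd

# Same sample data as in this post
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# One row per requested statistic, one column per person
summary = df.agg(['mean', 'median'])

print(summary.loc['mean', 'Carl'])  # 2150.0
```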
Conclusion
In this post, you learned how to calculate the Pandas mean, using the .mean() method. You learned how to calculate a mean based on a column, a row, multiple columns, and the entire dataframe. Additionally, you learned how missing values are skipped by default and how to include them in the calculation.
To learn more about the Pandas .mean() method, check out the official documentation here.