In this tutorial, you’ll learn how to use the Pandas describe method to generate descriptive statistics. One of the beautiful things about Python is the ease with which you can generate useful information from a given dataset. In this example, we’ll use Pandas to summarize a small sample DataFrame.
By the end of this tutorial, you’ll have learned:
- How the Pandas describe method works
- How to generate descriptive statistics with Pandas
Loading a Sample Pandas DataFrame
Let’s import Pandas and assign it the alias pd, as is convention:
import pandas as pd
With Pandas imported, we can create a simple DataFrame to use for analysis. First, we’ll create a dictionary:
my_dict = {
    'name': ["Kevin", "George", "Jane", "Mel", "Thomas", "Erica", "Lisa"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'salary': [30000, 45000, 55000, 78000, 28000, 32000, 70000],
    'gender': ['m', 'm', 'f', 'f', 'm', 'f', 'f']
}
Using Pandas, we can turn this regular Python dictionary into a DataFrame by passing it to the DataFrame constructor:
data = pd.DataFrame(my_dict)
To make sure that the conversion happened correctly, we can use the .head method to print out the first few rows.
print(data.head())
This prints out the first five records of our data:
| | name | age | salary | gender |
|---|---|---|---|---|
| 0 | Kevin | 20 | 30000 | m |
| 1 | George | 27 | 45000 | m |
| 2 | Jane | 35 | 55000 | f |
| 3 | Mel | 55 | 78000 | f |
| 4 | Thomas | 18 | 28000 | m |
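Beyond looking at the first rows, a couple of other quick checks can confirm that the DataFrame was built as expected. The sketch below rebuilds the same sample data and inspects its shape and column types; passing a number to .head (here 3, chosen only for illustration) limits how many rows are shown.

```python
import pandas as pd

# Same sample data as in the tutorial
my_dict = {
    'name': ["Kevin", "George", "Jane", "Mel", "Thomas", "Erica", "Lisa"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'salary': [30000, 45000, 55000, 78000, 28000, 32000, 70000],
    'gender': ['m', 'm', 'f', 'f', 'm', 'f', 'f']
}
data = pd.DataFrame(my_dict)

# Number of (rows, columns) in the DataFrame
print(data.shape)  # (7, 4)

# Data type of each column; age and salary should be numeric
print(data.dtypes)

# Only the first three rows instead of the default five
first_three = data.head(3)
print(first_three)
```

Knowing which columns are numeric is useful here, because it determines which columns show up in a default summary.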
Generating Simple Descriptive Statistics with Pandas
While Pandas provides methods that return individual descriptive statistics, such as the median, the max, and the min (among others), we can use the .describe method to print out the key descriptive statistics all at once.
print(data.describe())
This returns the output below.
Note that, by default, descriptive statistics are only displayed for numeric columns.
| | age | salary |
|---|---|---|
| count | 7.000000 | 7.000000 |
| mean | 30.142857 | 48285.714286 |
| std | 12.966991 | 20089.087301 |
| min | 18.000000 | 28000.000000 |
| 25% | 20.500000 | 31000.000000 |
| 50% | 27.000000 | 45000.000000 |
| 75% | 35.000000 | 62500.000000 |
| max | 55.000000 | 78000.000000 |
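The name and gender columns are missing from the summary above because they hold text rather than numbers. As a minimal sketch, the include parameter of .describe can be set to 'all' to summarize every column; text columns then report count, unique, top, and freq, while the numeric-only rows are left as NaN for them.

```python
import pandas as pd

# Same sample data as in the tutorial
my_dict = {
    'name': ["Kevin", "George", "Jane", "Mel", "Thomas", "Erica", "Lisa"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'salary': [30000, 45000, 55000, 78000, 28000, 32000, 70000],
    'gender': ['m', 'm', 'f', 'f', 'm', 'f', 'f']
}
data = pd.DataFrame(my_dict)

# Summarize every column, not just the numeric ones
full_summary = data.describe(include='all')
print(full_summary)

# Pull a few values for the gender column out of the summary
print(full_summary.loc['unique', 'gender'])  # number of distinct values
print(full_summary.loc['top', 'gender'])     # most frequent value
print(full_summary.loc['freq', 'gender'])    # how often it occurs
```

In this dataset, gender has two distinct values, and 'f' is the most frequent one, appearing four times.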
Pandas makes it very easy to generate simple descriptive statistics for whatever dataset you are working with.
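For comparison, the individual statistics mentioned earlier can also be computed one at a time. The sketch below, using the same sample data, calls .median, .max, and .min directly and checks that the median matches the 50% row produced by .describe.

```python
import pandas as pd

# Same sample data as in the tutorial
my_dict = {
    'name': ["Kevin", "George", "Jane", "Mel", "Thomas", "Erica", "Lisa"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'salary': [30000, 45000, 55000, 78000, 28000, 32000, 70000],
    'gender': ['m', 'm', 'f', 'f', 'm', 'f', 'f']
}
data = pd.DataFrame(my_dict)

# Individual statistics for a single column
median_age = data['age'].median()
max_salary = data['salary'].max()
min_salary = data['salary'].min()
print(median_age)  # 27.0
print(max_salary)  # 78000
print(min_salary)  # 28000

# The same median appears as the 50% row of the describe output
summary = data.describe()
print(summary.loc['50%', 'age'] == median_age)  # True
```

Calling the individual methods is handy when only one number is needed, while .describe is the quicker route to a full overview.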
Additional Resources
- Summarizing and Analyzing a Pandas DataFrame
- Introduction to Pandas for Data Science
- Indexing, Selecting, and Assigning Data in Pandas
- You can learn more about the describe method in the official Pandas documentation.