Using Pandas for Descriptive Statistics in Python

One of the beautiful things about Python is the ease with which you can generate useful information from a given data set. In this example, we’ll use Pandas to generate some high-level descriptive statistics.

Importing Pandas

Let’s import Pandas and assign it the alias pd, as is convention.

import pandas as pd

Once Pandas is imported, we can generate a simple dataframe that we can later use for analysis. First we’ll create a dictionary:

my_dict = {
    'name': ["Kevin", "George", "Jane", "Mel", "Thomas", "Erica", "Lisa"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'salary': [30000, 45000, 55000, 78000, 28000, 32000, 70000],
    'gender': ['m', 'm', 'f', 'f', 'm', 'f', 'f']
}

Using Pandas, we can turn this regular Python dictionary into a dataframe by passing it to the DataFrame constructor. Each key becomes a column name and each list becomes that column's values.

data = pd.DataFrame(my_dict)

To make sure that the conversion happened correctly, we can use the .head() method to print out the first few rows.

print(data.head())

By default, .head() returns the first five records of our data, which are shown below:

     name  age  salary gender
0   Kevin   20   30000      m
1  George   27   45000      m
2    Jane   35   55000      f
3     Mel   55   78000      f
4  Thomas   18   28000      m
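If five rows is not the number you want, .head() accepts a row count, and .tail() does the same from the end of the dataframe. The sketch below rebuilds the sample data from this post so that it runs on its own; the variable names first_three and last_two are only illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
my_dict = {
    'name': ["Kevin", "George", "Jane", "Mel", "Thomas", "Erica", "Lisa"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'salary': [30000, 45000, 55000, 78000, 28000, 32000, 70000],
    'gender': ['m', 'm', 'f', 'f', 'm', 'f', 'f']
}
data = pd.DataFrame(my_dict)

first_three = data.head(3)  # first three rows instead of the default five
last_two = data.tail(2)     # last two rows

print(first_three)
print(last_two)
print(data.shape)           # (rows, columns) -> (7, 4)
```

Checking data.shape alongside .head() is a quick way to confirm that all seven records and four columns made it into the dataframe.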

Generating Simple Descriptive Statistics with Pandas

While Pandas provides methods that return individual descriptive statistics, such as .median(), .max(), and .min() (among others), we can use the .describe() method to print out the key descriptive statistics in one call.

print(data.describe())

This will return the output below!

Note that, by default, descriptive statistics are only displayed for numeric columns, so the name and gender columns are left out.

             age        salary
count   7.000000      7.000000
mean   30.142857  48285.714286
std    12.966991  20089.087301
min    18.000000  28000.000000
25%    20.500000  31000.000000
50%    27.000000  45000.000000
75%    35.000000  62500.000000
max    55.000000  78000.000000
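The individual methods mentioned earlier can be used to cross-check single values from this table. Here is a minimal sketch that rebuilds the same sample data and pulls out a few statistics for the age column; the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
data = pd.DataFrame({
    'name': ["Kevin", "George", "Jane", "Mel", "Thomas", "Erica", "Lisa"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'salary': [30000, 45000, 55000, 78000, 28000, 32000, 70000],
    'gender': ['m', 'm', 'f', 'f', 'm', 'f', 'f']
})

ages = data['age']

age_median = ages.median()    # matches the 50% row
age_min = ages.min()          # matches the min row
age_max = ages.max()          # matches the max row
age_q1 = ages.quantile(0.25)  # matches the 25% row

print(age_median, age_min, age_max, age_q1)

# Several statistics for several columns at once
summary = data[['age', 'salary']].agg(['min', 'median', 'max'])
print(summary)
```

The 50% row in the .describe() output is the median, which is why .median() and the 50% percentile agree.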

Pandas makes it very easy to generate simple descriptive statistics for whatever dataset you are working with.
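The non-numeric columns can be summarized too. As a rough sketch, passing include="all" to .describe() adds count, unique, top, and freq rows for text columns, and .value_counts() gives the full breakdown of a single column. The data is rebuilt here so the snippet runs on its own, and the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet is self-contained
data = pd.DataFrame({
    'name': ["Kevin", "George", "Jane", "Mel", "Thomas", "Erica", "Lisa"],
    'age': [20, 27, 35, 55, 18, 21, 35],
    'salary': [30000, 45000, 55000, 78000, 28000, 32000, 70000],
    'gender': ['m', 'm', 'f', 'f', 'm', 'f', 'f']
})

# Summarize every column; statistics that do not apply to a column show as NaN
full_summary = data.describe(include="all")
print(full_summary)

# Count how often each value appears in a single text column
gender_counts = data['gender'].value_counts()
print(gender_counts)
```

In the combined summary, numeric rows such as mean are empty for the text columns, and text rows such as unique are empty for the numeric columns.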

You can learn more about the describe method in the official Pandas documentation.
