In this tutorial, you’ll learn how to convert a list of Python dictionaries into a Pandas DataFrame. Pandas provides several ways to do this. You’ll learn how to use the DataFrame constructor, the from_dict and from_records methods, and the json_normalize function.
By the end of this tutorial, you’ll have learned:
- How to convert a list of dictionaries to a Pandas DataFrame
- How to work with different sets of columns across dictionaries
- How to set an index when converting a list of dictionaries to a DataFrame
- How to convert nested dictionaries to a Pandas DataFrame
Summary of Methods
The table below breaks down the different ways in which you can read a list of dictionaries into a Pandas DataFrame. Each of these is covered in depth throughout the tutorial:
Method Name | Works with missing keys | Read only some columns | Set an index | Read nested dictionaries |
---|---|---|---|---|
DataFrame() | Yes | Yes | Yes | No |
from_dict() | Yes | No (columns= raises a ValueError) | Only using .set_index() | No |
from_records() | Yes | Yes | Yes | No |
json_normalize() | Yes | Only by selecting columns afterward | Only using .set_index() | Yes |
Convert a List of Dictionaries to a Pandas DataFrame
In this section, you’ll learn how to convert a list of dictionaries to a Pandas DataFrame using the pd.DataFrame() constructor. Passing a list of dictionaries to the constructor is all it takes to create a DataFrame.
Each dictionary will represent a record in the DataFrame, while the keys become the columns. Let’s take a look at an example where each dictionary contains every key:
# Converting a List of Dictionaries to a DataFrame
import pandas as pd

list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': 'Toronto'},
    {'Name': 'Kate', 'Age': 32, 'Location': 'London'},
    {'Name': 'Evan', 'Age': 36, 'Location': 'London'},
]

df = pd.DataFrame(list_of_dicts)
print(df)

# Returns:
#    Name  Age Location
# 0   Nik   33  Toronto
# 1  Kate   32   London
# 2  Evan   36   London
Because each dictionary in the list contains the same keys, several different methods produce the same result. The following alternatives would also work:
# These methods all produce the same result
df = pd.DataFrame(list_of_dicts)
df = pd.DataFrame.from_dict(list_of_dicts)
df = pd.DataFrame.from_records(list_of_dicts)
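As a quick check, and assuming the same list_of_dicts as above, the sketch below uses the DataFrame.equals() method to confirm that the three approaches build identical DataFrames. The variable names are only illustrative.

```python
import pandas as pd

list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': 'Toronto'},
    {'Name': 'Kate', 'Age': 32, 'Location': 'London'},
    {'Name': 'Evan', 'Age': 36, 'Location': 'London'},
]

# Build the same DataFrame three different ways
df_constructor = pd.DataFrame(list_of_dicts)
df_from_dict = pd.DataFrame.from_dict(list_of_dicts)
df_from_records = pd.DataFrame.from_records(list_of_dicts)

# .equals() compares the shape, the labels and the values of two DataFrames
same_result = (
    df_constructor.equals(df_from_dict)
    and df_constructor.equals(df_from_records)
)
print(same_result)
```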
Working with Missing Keys When Converting a List of Dictionaries to a Pandas DataFrame
Let’s now take a look at a more complex example, in which one of the dictionaries is missing a key. Let’s use the .from_dict() method to see how the data is read:
# Reading Dictionaries with Missing Keys
import pandas as pd

list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': 'Toronto'},
    {'Name': 'Kate', 'Age': 32, 'Location': 'London'},
    {'Name': 'Evan', 'Age': 36},
]

df = pd.DataFrame.from_dict(list_of_dicts)
print(df)

# Returns:
#    Name  Age Location
# 0   Nik   33  Toronto
# 1  Kate   32   London
# 2  Evan   36      NaN
You get the same result whether you use the pd.DataFrame() constructor, the .from_dict() method, or the .from_records() method. Any key that is missing from a dictionary is filled with a missing value, NaN.
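One side effect is worth knowing about, shown here as a small sketch with made-up data: because NaN is a floating-point value, a column of integers with a missing entry is stored as floats by default. The .isna() method can then help you find which records were missing the key.

```python
import pandas as pd

# Hypothetical data: the last dictionary has no 'Age' key
list_of_dicts = [
    {'Name': 'Nik', 'Age': 33},
    {'Name': 'Kate', 'Age': 32},
    {'Name': 'Evan'},
]

df = pd.DataFrame(list_of_dicts)

# The missing key is filled with NaN, so 'Age' becomes a float column
print(df['Age'].dtype)

# Boolean mask that flags the records that were missing the key
missing_age = df['Age'].isna()
print(df.loc[missing_age, 'Name'].tolist())
```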
Reading Only Some Columns When Converting a List of Dictionaries to a Pandas DataFrame
You’ll often want to read dictionaries into a Pandas DataFrame but keep only a subset of the columns. In this case, you can use the columns= parameter. Note that this parameter is only available in the pd.DataFrame() constructor and the pd.DataFrame.from_records() method. Passing it to the pd.DataFrame.from_dict() method with its default orient='columns' will raise a ValueError.
Let’s load the same list of dictionaries but only read two of the columns:
# Reading only a subset of columns
import pandas as pd

list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': 'Toronto'},
    {'Name': 'Kate', 'Age': 32, 'Location': 'London'},
    {'Name': 'Evan', 'Age': 36},
]

df = pd.DataFrame.from_records(list_of_dicts, columns=['Name', 'Age'])
# Same as: df = pd.DataFrame(list_of_dicts, columns=['Name', 'Age'])
print(df)

# Returns:
#    Name  Age
# 0   Nik   33
# 1  Kate   32
# 2  Evan   36
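To see the .from_dict() limitation for yourself, the sketch below catches the ValueError and then shows one possible workaround: build the full DataFrame first and select the columns afterward. The raised flag is just an illustrative helper variable.

```python
import pandas as pd

list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': 'Toronto'},
    {'Name': 'Kate', 'Age': 32, 'Location': 'London'},
    {'Name': 'Evan', 'Age': 36},
]

# With the default orient='columns', from_dict() rejects the columns= parameter
try:
    pd.DataFrame.from_dict(list_of_dicts, columns=['Name', 'Age'])
    raised = False
except ValueError as error:
    raised = True
    print(f'ValueError: {error}')

# Workaround: read everything, then keep only the columns you need
df = pd.DataFrame.from_dict(list_of_dicts)[['Name', 'Age']]
print(df)
```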
Setting an Index When Converting a List of Dictionaries to a Pandas DataFrame
There are two different types of indices you may want to set when creating a DataFrame:
- A DataFrame index that is not part of the data you’re reading (such as 1, 2, 3), or
- A DataFrame index from the data that you’re reading (such as one of the columns)
Let’s take a look at the first use case. For this, we can rely on either the pd.DataFrame() constructor or the pd.DataFrame.from_records() method. To pass in an arbitrary index, we can use the index= parameter to pass in a list of values.
Let’s see how this is done in Pandas:
# Setting an index when reading a list of dictionaries
import pandas as pd

list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': 'Toronto'},
    {'Name': 'Kate', 'Age': 32, 'Location': 'London'},
    {'Name': 'Evan', 'Age': 36, 'Location': 'New York'},
]

df = pd.DataFrame.from_records(list_of_dicts, index=['Employee_001', 'Employee_002', 'Employee_003'])
# Same as: df = pd.DataFrame(list_of_dicts, index=['Employee_001', 'Employee_002', 'Employee_003'])
print(df)

# Returns:
#               Name  Age  Location
# Employee_001   Nik   33   Toronto
# Employee_002  Kate   32    London
# Employee_003  Evan   36  New York
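If you’d rather not type the labels by hand, you could build them from the length of the list. This is only a sketch; the Employee_ naming pattern is simply carried over from the example above, and labels is an illustrative variable name.

```python
import pandas as pd

list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': 'Toronto'},
    {'Name': 'Kate', 'Age': 32, 'Location': 'London'},
    {'Name': 'Evan', 'Age': 36, 'Location': 'New York'},
]

# One zero-padded label per dictionary: Employee_001, Employee_002, ...
labels = [f'Employee_{number:03d}' for number in range(1, len(list_of_dicts) + 1)]

df = pd.DataFrame(list_of_dicts, index=labels)
print(df.index.tolist())
```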
To read a list of dictionaries and set an index based on one of the keys, we can use any of the three methods covered above, followed by the .set_index() method. The pd.DataFrame() constructor and the .from_dict() method don’t provide a parameter for this, although the index= parameter of .from_records() also accepts the name of a field.
Let’s read our data and use the 'Name' column as the index:
# Setting a column as an index
import pandas as pd

list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': 'Toronto'},
    {'Name': 'Kate', 'Age': 32, 'Location': 'London'},
    {'Name': 'Evan', 'Age': 36, 'Location': 'New York'},
]

df = pd.DataFrame(list_of_dicts).set_index('Name')
# Same as: df = pd.DataFrame.from_dict(list_of_dicts).set_index('Name')
# Same as: df = pd.DataFrame.from_records(list_of_dicts).set_index('Name')
print(df)

# Returns:
#       Age  Location
# Name
# Nik    33   Toronto
# Kate   32    London
# Evan   36  New York
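As an aside, the index= parameter of .from_records() can also take the name of a field instead of a list of labels, which should give a similar result in a single step. A minimal sketch, reusing the data above:

```python
import pandas as pd

list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': 'Toronto'},
    {'Name': 'Kate', 'Age': 32, 'Location': 'London'},
    {'Name': 'Evan', 'Age': 36, 'Location': 'New York'},
]

# Passing a field name (rather than a list of labels) uses that field as the index
df = pd.DataFrame.from_records(list_of_dicts, index='Name')

print(df.index.name)
print(df.columns.tolist())
```

Whether you prefer this or .set_index() is largely a matter of taste; .set_index() has the advantage of working the same way with all three methods.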
In the final section, you’ll learn how to use the json_normalize() function to read a list of nested dictionaries into a Pandas DataFrame.
json_normalize: Reading Nested Dictionaries to a Pandas DataFrame
When loading data from different sources, such as web APIs, you may get a list of nested dictionaries returned to you. When reading these lists of dictionaries using the methods shown above, the nested dictionaries will simply be returned as dictionaries in a column.
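To illustrate that behaviour, here is a short sketch using a made-up list of nested dictionaries. The pd.DataFrame() constructor keeps each inner dictionary as a single value in the Location column:

```python
import pandas as pd

# Hypothetical nested data, similar to what a web API might return
list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': {'City': 'Toronto', 'Country': 'Canada'}},
    {'Name': 'Kate', 'Age': 32, 'Location': {'City': 'London', 'Country': 'UK'}},
]

df = pd.DataFrame(list_of_dicts)

# The nested dictionaries are not expanded: there is a single 'Location' column
print(df.columns.tolist())

# Each cell in that column still holds a Python dictionary
first_location = df.loc[0, 'Location']
print(type(first_location))
print(first_location['City'])
```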
However, in many cases, you’ll want each of these fields to become its own column. For this, we can use the pd.json_normalize() function.
Let’s take a look at an example where our list’s dictionaries are nested and use the json_normalize function to convert it to a DataFrame:
# Convert a List of Nested Dictionaries to a DataFrame
import pandas as pd

list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': {'City': 'Toronto', 'Country': 'Canada'}},
    {'Name': 'Kate', 'Age': 32, 'Location': {'City': 'London', 'Country': 'UK'}},
    {'Name': 'Evan', 'Age': 36, 'Location': {'City': 'New York', 'Country': 'USA'}},
]

df = pd.json_normalize(list_of_dicts)
print(df)

# Returns:
#    Name  Age Location.City Location.Country
# 0   Nik   33       Toronto           Canada
# 1  Kate   32        London               UK
# 2  Evan   36      New York              USA
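The flattened column names join the outer and inner keys with a period by default. If you’d prefer a different separator, json_normalize() accepts a sep= parameter. The sketch below uses an underscore, which is just one possible choice:

```python
import pandas as pd

list_of_dicts = [
    {'Name': 'Nik', 'Age': 33, 'Location': {'City': 'Toronto', 'Country': 'Canada'}},
    {'Name': 'Kate', 'Age': 32, 'Location': {'City': 'London', 'Country': 'UK'}},
    {'Name': 'Evan', 'Age': 36, 'Location': {'City': 'New York', 'Country': 'USA'}},
]

# sep= controls how the outer and inner keys are joined in the column names
df = pd.json_normalize(list_of_dicts, sep='_')
print(df.columns.tolist())
```

Underscore-separated names can be convenient because they also work with attribute-style access, such as df.Location_City.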
Conclusion
In this tutorial, you learned how to read a list of dictionaries into a Pandas DataFrame. You learned four different ways to accomplish this: the DataFrame constructor, the from_dict and from_records methods, and the json_normalize function. You also learned how to read only a subset of columns, deal with missing data, and set an index. Finally, you learned how to read a list of nested dictionaries into a Pandas DataFrame.