In this tutorial, you’ll learn how to use the Pandas read_json function to read JSON strings and files into a Pandas DataFrame. JSON is a ubiquitous file format, especially when working with data from the internet, such as from APIs. Thankfully, the Pandas read_json function can read many different layouts of JSON.
Want to write a Pandas DataFrame to JSON instead? Check out this guide on the Pandas to_json method.
By the end of this tutorial, you’ll have learned the following:
- How to understand the Pandas read_json() function
- How to read different orientations of JSON strings into Pandas DataFrames
- How to change the encoding used to read JSON strings
- And much, much more
Understanding the Pandas read_json Function
Before using the Pandas read_json() function, let’s explore the parameters and default arguments it has to offer. As you can see from the code block below, the function accepts a large number of options.
# Understanding the Pandas read_json() Function
import pandas as pd
pd.read_json(path_or_buf, *, orient=None, typ='frame', dtype=None, convert_axes=None, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, encoding_errors='strict', lines=False, chunksize=None, compression='infer', nrows=None, storage_options=None)
While we won’t cover every parameter of the function, we’ll focus on the most important ones:
- path_or_buf= represents the path to the file or the JSON data itself (a literal string in older versions of Pandas, or a file-like object such as io.StringIO)
- orient= indicates the format of the JSON file
- encoding= represents the encoding used to decode the data
- lines= indicates reading the file as a JSON object per line
- nrows= indicates how many lines to read if lines= is set to True
Now that you have a good understanding of the parameters of the function, let’s dive into how to use the Pandas read_json()
function.
How to Read a JSON String with Pandas read_json
In order to read a JSON string in Pandas, you can pass it into the pd.read_json() function. Since Pandas 2.1, passing a literal JSON string directly is deprecated, so the example below wraps the string in io.StringIO first. Pandas will attempt to infer the format of the JSON object and convert it into a DataFrame, if possible.
Let’s take a look at how you can read a JSON string into a Pandas DataFrame:
# Read a JSON String Into a Pandas DataFrame
from io import StringIO
import pandas as pd
json_string = '[{"Name":"Nik","Age":33.0,"Sales":33.33},{"Name":"Kate","Age":33,"Sales":56.32},{"Name":"Isla","Age":37.0,"Sales":43.44444}]'
# Wrap the string in StringIO so Pandas treats it as a file-like object
df = pd.read_json(StringIO(json_string))
print(df)
# Returns:
#    Name  Age     Sales
# 0   Nik   33  33.33000
# 1  Kate   33  56.32000
# 2  Isla   37  43.44444
In the code block above, we imported Pandas and StringIO and then defined a string containing a JSON array of objects. We then wrapped this string in StringIO and passed it into the pd.read_json() function. Finally, we printed the resulting DataFrame, which was successfully read.
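Notice that the Age column comes back as integers even though the JSON contains values such as 33.0. The sketch below shows one way to override that inference with the dtype= parameter from the function signature. The two-row sample string is shortened from the example above and is only for illustration.

```python
from io import StringIO

import pandas as pd

# Illustrative sample data, shortened from the example above
json_string = '[{"Name":"Nik","Age":33.0,"Sales":33.33},{"Name":"Kate","Age":33,"Sales":56.32}]'

# Default behaviour: whole-number floats are inferred as integers
inferred = pd.read_json(StringIO(json_string))

# Explicit dtype mapping: keep Age as a float column
as_float = pd.read_json(StringIO(json_string), dtype={"Age": "float64"})

print(inferred["Age"].dtype)
print(as_float["Age"].dtype)
```

Passing a dictionary only affects the columns it names, so the remaining columns are still inferred as usual.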
How to Read a JSON File From the Web
Similarly, Pandas can read a JSON file (either a local file or one hosted on the internet) when you pass its path (or URL) into the pd.read_json() function. In the code block below, the URL points to a copy of the same JSON data hosted on my GitHub. We can read the file into a DataFrame by passing the URL as a string into the function, as shown below:
# Read a JSON File Into a Pandas DataFrame
import pandas as pd
df = pd.read_json('https://raw.githubusercontent.com/datagy/data/main/samplejson.json')
print(df)
# Returns:
# Name Age Sales
# 0 Nik 33 33.33000
# 1 Kate 33 56.32000
# 2 Isla 37 43.44444
In the code block above, we were able to load a JSON file into a Pandas DataFrame successfully. Let’s now dive into different formats of JSON files, which can be read by using the orient= parameter.
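Before moving on, it’s worth seeing that a local file is read in the same way as a URL. The following is a minimal sketch: the file name sample.json and the two records are made up for the demonstration, and the file is written to a temporary directory so the example is self-contained.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Made-up records, used only to create a file to read back
records = [
    {"Name": "Nik", "Age": 33, "Sales": 33.33},
    {"Name": "Kate", "Age": 33, "Sales": 56.32},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, created only for this demonstration
    json_path = Path(tmp_dir) / "sample.json"
    json_path.write_text(json.dumps(records), encoding="utf-8")

    # A local path is passed exactly like a URL
    local_df = pd.read_json(json_path)

print(local_df)
```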
Understanding JSON Orientation Types in Pandas read_json
JSON comes in many different formats, which Pandas allows you to control using the orient= parameter. In particular, Pandas provides the following options: 'split', 'records', 'index', 'columns', 'values', and 'table'. Let’s explore these options to break down the different possibilities.
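One quick way to get a feel for these layouts is to write the same small DataFrame out once per orientation using the DataFrame’s to_json() method and compare the results. This is only a sketch with a made-up two-row DataFrame; it prints each layout and records whether the top-level JSON value is an object or an array.

```python
import json

import pandas as pd

# Made-up sample data for comparing the layouts
df = pd.DataFrame({"Name": ["Nik", "Kate"], "Age": [33, 33]})

# Serialize the same DataFrame once per orientation
layouts = {
    orient: df.to_json(orient=orient)
    for orient in ["split", "records", "index", "columns", "values", "table"]
}

for orient, text in layouts.items():
    print(f"{orient:>8}: {text}")

# Parse with the standard library to inspect the top-level JSON type
top_level = {
    orient: type(json.loads(text)).__name__ for orient, text in layouts.items()
}
print(top_level)
```

The 'records' and 'values' layouts are JSON arrays, while the other four are JSON objects, which is a useful first clue when you need to work out which orientation an unfamiliar file uses.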
Understanding the Records Orientation in Pandas read_json
A common data format that you’ll encounter with JSON is the 'records' format, which is similar to a list of dictionaries. In order to read this format of JSON, you can pass in orient='records', as shown below:
# Read a JSON String Into a Pandas DataFrame Using Records Orientation
from io import StringIO
import pandas as pd
json_string = """
[
    {
        "Name":"Nik",
        "Age":33.0,
        "Sales":33.33
    },
    {
        "Name":"Kate",
        "Age":33.0,
        "Sales":56.32
    },
    {
        "Name":"Isla",
        "Age":37.0,
        "Sales":43.44444
    }
]"""
# Wrap the string in StringIO (passing literal strings is deprecated since Pandas 2.1)
df = pd.read_json(StringIO(json_string), orient='records')
print(df)
# Returns:
#    Name  Age     Sales
# 0   Nik   33  33.33000
# 1  Kate   33  56.32000
# 2  Isla   37  43.44444
In the code block above, we loaded data in the format of a list of dictionaries, where each dictionary is one record: its keys are the column labels and its values are the values for that row.
Understanding the Index Orientation in Pandas read_json
The 'index' data structure is represented by a dictionary where the keys are the index labels and the values are dictionaries that map column labels to values. This data structure is often found when the index of a dataset is meaningful, rather than a simple range index.
# Read a JSON String Into a Pandas DataFrame Using Index Orientation
from io import StringIO
import pandas as pd
json_string = """{
    "0":{
        "Name":"Nik",
        "Age":33.0,
        "Sales":33.33
    },
    "1":{
        "Name":"Kate",
        "Age":33.0,
        "Sales":56.32
    },
    "2":{
        "Name":"Isla",
        "Age":37.0,
        "Sales":43.44444
    }
}"""
# orient='index' is required: the default would treat the outer keys as columns
df = pd.read_json(StringIO(json_string), orient='index')
print(df)
# Returns:
#    Name  Age     Sales
# 0   Nik   33  33.33000
# 1  Kate   33  56.32000
# 2  Isla   37  43.44444
This can be a fairly common structure to run into when working with data from APIs, and being aware of it can make reading your data much easier.
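To illustrate the case where the index is meaningful, here is a small sketch in which the outer keys are identifiers rather than 0, 1, 2. The keys emp_a and emp_b and the values are hypothetical and exist only for this example.

```python
from io import StringIO

import pandas as pd

# Hypothetical data keyed by an employee ID instead of a range index
json_string = """{
    "emp_a": {"Name": "Nik", "Sales": 33.33},
    "emp_b": {"Name": "Kate", "Sales": 56.32}
}"""

df = pd.read_json(StringIO(json_string), orient="index")

# The outer keys become the row labels, so rows can be looked up by key
print(df.loc["emp_b", "Sales"])
```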
Understanding the Columns Orientation in Pandas read_json
The 'columns'
orientation provides a format that is like a Python dictionary, where the columns are the keys. The values are also dictionaries, where the keys are the index and the values are the values. Let’s see how you can read this data format:
# Read a JSON String Into a Pandas DataFrame Using Columns Orientation
from io import StringIO
import pandas as pd
json_string = """{
    "Name":{
        "0":"Nik",
        "1":"Kate",
        "2":"Isla"
    },
    "Age":{
        "0":33.0,
        "1":33.0,
        "2":37.0
    },
    "Sales":{
        "0":33.33,
        "1":56.32,
        "2":43.44444
    }
}"""
# 'columns' is the default orientation for DataFrames; it is passed here for clarity
df = pd.read_json(StringIO(json_string), orient='columns')
print(df)
# Returns:
#    Name  Age     Sales
# 0   Nik   33  33.33000
# 1  Kate   33  56.32000
# 2  Isla   37  43.44444
In the following section, you’ll learn how to use the values orientation.
Understanding the Values Orientation in Pandas read_json
The 'values' orientation is represented as a list of lists. One of the interesting things about this orientation is that it doesn’t provide column labels. Instead, we can assign the column names directly using the .columns attribute of the resulting DataFrame.
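# Read a JSON String Into a Pandas DataFrame Using Values Orientation
from io import StringIO
import pandas as pd
json_string = """[
    [
        "Nik",
        33.0,
        33.33
    ],
    [
        "Kate",
        33.0,
        56.32
    ],
    [
        "Isla",
        37.0,
        43.44444
    ]
]"""
# Wrap the string in StringIO (passing literal strings is deprecated since Pandas 2.1)
df = pd.read_json(StringIO(json_string), orient='values')
# The JSON carries no labels, so the column names are assigned afterwards
df.columns = ['Name', 'Age', 'Sales']
print(df)
# Returns:
#    Name  Age     Sales
# 0   Nik   33  33.33000
# 1  Kate   33  56.32000
# 2  Isla   37  43.44444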
In the next section, you’ll learn how to read the table orientation.
Understanding the Table Orientation in Pandas read_json
The 'table'
orientation is a fairly complex structure that provides a lot of information about how the data are structured. It includes information on the columns and data types, and then maps in the actual index and data values.
# Read a JSON String Into a Pandas DataFrame Using Table Orientation
from io import StringIO
import pandas as pd
json_string = """{
    "schema":{
        "fields":[
            {
                "name":"index",
                "type":"integer"
            },
            {
                "name":"Name",
                "type":"string"
            },
            {
                "name":"Age",
                "type":"number"
            },
            {
                "name":"Sales",
                "type":"number"
            }
        ],
        "primaryKey":[
            "index"
        ],
        "pandas_version":"1.4.0"
    },
    "data":[
        {
            "index":0,
            "Name":"Nik",
            "Age":33.0,
            "Sales":33.33
        },
        {
            "index":1,
            "Name":"Kate",
            "Age":33.0,
            "Sales":56.32
        },
        {
            "index":2,
            "Name":"Isla",
            "Age":37.0,
            "Sales":43.44444
        }
    ]
}"""
# orient='table' must be passed explicitly so that the schema is applied
df = pd.read_json(StringIO(json_string), orient='table')
print(df)
# Returns (Age stays a float because the schema declares it as "number"):
#    Name   Age     Sales
# 0   Nik  33.0  33.33000
# 1  Kate  33.0  56.32000
# 2  Isla  37.0  43.44444
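Because the table layout carries a schema, data types can survive a round trip that other orientations would re-infer. The sketch below compares the two behaviors using a made-up DataFrame and the to_json() method; treat it as an illustration rather than a complete description of how the schema is handled.

```python
from io import StringIO

import pandas as pd

# Made-up sample data with a float Age column
df = pd.DataFrame({"Name": ["Nik", "Isla"], "Age": [33.0, 37.0]})

# Round trip through 'table': the schema records Age as a number (float)
table_json = df.to_json(orient="table")
from_table = pd.read_json(StringIO(table_json), orient="table")

# Round trip through 'records': there is no schema, so dtypes are inferred again
records_json = df.to_json(orient="records")
from_records = pd.read_json(StringIO(records_json), orient="records")

print(from_table["Age"].dtype)
print(from_records["Age"].dtype)
```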
In the following section, you’ll learn how to use the 'split'
orientation.
Understanding the Split Orientation in Pandas read_json
One of the less common JSON formats is the 'split'
orientation, which breaks the data down into column labels, index values, and data values. This is demonstrated below and can be helpful when reading data from a database format:
# Read a JSON String Into a Pandas DataFrame Using Split Orientation
from io import StringIO
import pandas as pd
json_string = """
{
    "columns":[
        "Name",
        "Age",
        "Sales"
    ],
    "index":[
        0,
        1,
        2
    ],
    "data":[
        [
            "Nik",
            33.0,
            33.33
        ],
        [
            "Kate",
            33.0,
            56.32
        ],
        [
            "Isla",
            37.0,
            43.44444
        ]
    ]
}"""
# orient='split' is required so the three keys are read as labels, index, and data
df = pd.read_json(StringIO(json_string), orient='split')
print(df)
# Returns:
#    Name  Age     Sales
# 0   Nik   33  33.33000
# 1  Kate   33  56.32000
# 2  Isla   37  43.44444
Again, this format isn’t very common, but it’s useful to know that it can be an option to read your data easily.
How to Change the Encoding When Reading JSON Strings in Pandas
By default, Pandas decodes JSON files as UTF-8. In cases where your data uses a different encoding, you can pass it into the encoding= parameter. In the code block below, we explicitly specify the 'utf-8' encoding:
# Read a JSON File Into a Pandas DataFrame Using Different Encoding
import pandas as pd
df = pd.read_json('https://raw.githubusercontent.com/datagy/data/main/samplejson.json', encoding='utf-8')
print(df)
# Returns:
# Name Age Sales
# 0 Nik 33 33.33000
# 1 Kate 33 56.32000
# 2 Isla 37 43.44444
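The parameter matters most when a file is not UTF-8. As a minimal sketch, the example below writes a small file encoded as Latin-1 and reads it back; the file name latin1.json and its contents are made up for the demonstration.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file, written as Latin-1 rather than UTF-8
    json_path = Path(tmp_dir) / "latin1.json"
    json_path.write_bytes('[{"Name":"Zoë","Age":29}]'.encode("latin-1"))

    # Tell Pandas how to decode the bytes on disk
    df = pd.read_json(json_path, encoding="latin-1")

print(df.loc[0, "Name"])
```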
In the next section, you’ll learn how to read a unique JSON format, where each line is its own JSON object.
How to Read Individual Lines as JSON Objects in Pandas
In some cases, you’ll encounter JSON data where each line is its own JSON object (often called JSON Lines or line-delimited JSON). Rather than needing to iterate over each line, you can use the lines=True argument. This will parse each line as its own record, where the key-value pairs are the column label and value.
# Reading JSON Split Across Lines
from io import StringIO
import pandas as pd
json_string = """
{"Name":"Nik","Age":33.0,"Sales":33.33}
{"Name":"Kate","Age":33,"Sales":56.32}
{"Name":"Isla","Age":37.0,"Sales":43.44444}"""
# Wrap the string in StringIO (passing literal strings is deprecated since Pandas 2.1)
df = pd.read_json(StringIO(json_string), lines=True)
print(df)
# Returns:
#    Name  Age     Sales
# 0   Nik   33  33.33000
# 1  Kate   33  56.32000
# 2  Isla   37  43.44444
In the code block above, we passed in our string and used lines=True. Rather than needing to read all of the lines, you can even limit the number of records that are read, using the nrows= parameter.
# Reading Only the First Two Lines of Line-Delimited JSON
from io import StringIO
import pandas as pd
json_string = """{"Name":"Nik","Age":33.0,"Sales":33.33}
{"Name":"Kate","Age":33.0,"Sales":56.32}
{"Name":"Isla","Age":37.0,"Sales":43.44444}"""
# nrows= only works together with lines=True
df = pd.read_json(StringIO(json_string), lines=True, nrows=2)
print(df)
# Returns:
#    Name  Age  Sales
# 0   Nik   33  33.33
# 1  Kate   33  56.32
In the code block above, we specified that we only wanted to read two lines. When the strings are large, this can be a great way to improve performance.
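For data that is too large to load at once, the chunksize= parameter from the function signature offers a related option: instead of a single DataFrame, the function returns an iterator that yields DataFrames of at most that many lines. The following is a small sketch with made-up data and a chunk size of 2, chosen only to keep the example short.

```python
from io import StringIO

import pandas as pd

# Made-up line-delimited JSON standing in for a much larger file
json_lines = """{"Name":"Nik","Age":33,"Sales":33.33}
{"Name":"Kate","Age":33,"Sales":56.32}
{"Name":"Isla","Age":37,"Sales":43.44444}"""

chunk_lengths = []
total_sales = 0.0

# With chunksize set, read_json returns an iterator of DataFrames
with pd.read_json(StringIO(json_lines), lines=True, chunksize=2) as reader:
    for chunk in reader:
        chunk_lengths.append(len(chunk))
        total_sales += chunk["Sales"].sum()

print(chunk_lengths)
print(round(total_sales, 5))
```

Like nrows=, chunksize= requires lines=True, and each chunk can be processed and discarded before the next one is read.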
Conclusion
In this tutorial, you learned how to use the Pandas read_json function to read JSON strings and files into a Pandas DataFrame. You first learned how to read JSON strings and JSON files. Then, you learned how to read different orientations of JSON using the orient= parameter. Finally, you learned how to change the encoding and how to read line-delimited JSON.