In this tutorial, you’ll learn how to convert a Pandas DataFrame to a JSON object and file using Python. Most programming languages can read, parse, and work with JSON. Because of this, knowing how to convert a Pandas DataFrame to JSON is an important skill.
Pandas provides a lot of flexibility when converting a DataFrame to a JSON file. This guide dives into the functionality with practical examples. By the end of this tutorial, you’ll have learned:
- How to convert a Pandas DataFrame to a JSON string or file
- How to customize formats for missing data and floats
- How to customize the structure of the resulting JSON file
- How to compress a JSON file when converting a Pandas DataFrame
Understanding the Pandas to_json Method
To convert a Pandas DataFrame to a JSON string or file, you can use the .to_json() method. Let’s start by exploring the method and what parameters it has available. The method provides a lot of flexibility in how to structure the JSON file.
# Understanding the Pandas .to_json() Method
import pandas as pd
df = pd.DataFrame()
df.to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=True, indent=None, storage_options=None)
As you can see from the code block above, the method has a large number of parameters. Every one of them has a default argument, meaning that you can call the method without passing in any arguments at all.
The table below breaks down the parameters of the .to_json() method, along with their default arguments and accepted values:
Parameter | Description | Default Argument | Accepted Values |
---|---|---|---|
path_or_buf= | The string or path object to write the JSON to. If None , the result is returned as a string. | None | string, path object, or None |
orient= | How to format the JSON string. | None | string |
date_format= | The type of date conversion. | None | None, epoch, iso |
double_precision= | The number of decimal places to use when encoding floating point values. | 10 | int |
force_ascii= | Whether to force encoded strings to be ASCII. | True | Bool |
date_unit= | The time unit to encode to. | 'ms' | string |
default_handler= | Handler to call if the object cannot otherwise be converted to a suitable format for JSON. | None | callable |
lines= | Whether to write out line-delimited JSON. | False | Bool |
compression= | For on-the-fly compression of the output data. | 'infer' | string or dict |
index= | Whether to include the index values in the JSON string. | True | Bool |
indent= | Length of the whitespace used to indent each record. | None | Integer |
storage_options= | Extra options for different storage options such as S3 storage. | None | dict |
.to_json() method explained
Now that you have a strong understanding of the method, let’s load a sample Pandas DataFrame to follow along with.
Loading a Sample Pandas DataFrame
Let’s begin by loading a sample Pandas DataFrame that you can use to follow along with. The data is kept deliberately small so that the examples are easy to follow. Copy and paste the code below into your code editor of choice:
# Loading a Sample Pandas DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['Nik', 'Kate', 'Isla'],
'Age': [33, np.nan, 37],
'Sales': [33.33, 56.32, 43.44444]
})
print(df)
# Returns:
# Name Age Sales
# 0 Nik 33.0 33.33000
# 1 Kate NaN 56.32000
# 2 Isla 37.0 43.44444
We can see that our DataFrame has 3 columns with 3 records. One of the columns contains strings, another contains numbers with a missing value (which Pandas stores as floats, since NaN is a floating point value), and another contains floating point values. Now that we have a DataFrame loaded, let’s get started by converting the DataFrame to a JSON string.
Convert a Pandas DataFrame to a JSON String
The Pandas .to_json() method contains default arguments for all parameters. Because of this, we can call the method without passing in any arguments. By default, Pandas will use an argument of path_or_buf=None, indicating that the DataFrame should be converted to a JSON string.
Let’s see how we can convert our Pandas DataFrame to a JSON string:
# Convert a Pandas DataFrame to a JSON String
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
json_string = df.to_json()
print(json_string)
# Returns:
# {"Name":{"0":"Nik","1":"Kate","2":"Isla"},"Age":{"0":33.0,"1":null,"2":37.0},"Sales":{"0":33.33,"1":56.32,"2":43.44444}}
We can see that by calling the .to_json() method with default arguments on a Pandas DataFrame, a JSON string representing the DataFrame is returned.
You could, of course, parse this string into a Python dictionary. However, if all you want is a dictionary, you can skip JSON entirely and convert the DataFrame directly with the Pandas .to_dict() method.
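The difference between the two routes can be sketched as follows. This is a minimal illustration using the sample DataFrame from this tutorial; the variable names parsed and direct are only placeholders. It shows that going through JSON turns the index labels into strings and the missing value into None, while .to_dict() keeps the original labels.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Isla'],
    'Age': [33, np.nan, 37],
    'Sales': [33.33, 56.32, 43.44444],
})

# Route 1: DataFrame -> JSON string -> Python dictionary
parsed = json.loads(df.to_json())
print(parsed['Name'])
# {'0': 'Nik', '1': 'Kate', '2': 'Isla'}
print(parsed['Age']['1'])
# None

# Route 2: DataFrame -> Python dictionary, without going through JSON
direct = df.to_dict()
print(direct['Name'])
# {0: 'Nik', 1: 'Kate', 2: 'Isla'}
```

If you only need a dictionary inside Python, the second route avoids the string keys that JSON requires.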
Convert a Pandas DataFrame to a JSON File
In order to convert a Pandas DataFrame to a JSON file, you can pass a path string, path object, or file-like object to the Pandas .to_json() method. By passing a string representing the path to the JSON file into our method call, a file is created containing our DataFrame.
Let’s see what this looks like:
# Creating a JSON File with Our Pandas DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
df.to_json('dataframe.json')
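To check what was written, you can read the file back. The sketch below is one way to do that; it writes into a temporary directory (an assumption made here so the example cleans up after itself) and uses pd.read_json() to load the file again.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Isla'],
    'Age': [33, np.nan, 37],
    'Sales': [33.33, 56.32, 43.44444],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / 'dataframe.json'

    # When a path is provided, the method writes the file and returns None
    result = df.to_json(json_path)

    # The file holds the same text that df.to_json() would return as a string
    file_text = json_path.read_text()

    # Load the file back into a DataFrame
    restored = pd.read_json(json_path)

print(result)
# None
print(restored.shape)
# (3, 3)
```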
In the following section, you’ll learn how to customize the structure of our JSON file.
Customizing the JSON Structure of a Pandas DataFrame
The Pandas .to_json() method provides a ton of flexibility in structuring the resulting JSON file. By default, a DataFrame is written using the 'columns' structure. The orient parameter accepts the following options: 'split', 'records', 'index', 'columns', 'values', and 'table'. Let’s explore these options to break down the different possibilities.
Pandas DataFrame to JSON: Split Structure
By passing 'split' into the Pandas .to_json() method’s orient argument, you return a JSON string structured as a dictionary that breaks out the index, columns, and data separately. This is demonstrated below and can be helpful when moving data into a database format:
# Using the 'split' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
print(df.to_json(orient='split'))
# Returns:
# {"columns":["Name","Age","Sales"],"index":[0,1,2],"data":[["Nik",33.0,33.33],["Kate",null,56.32],["Isla",37.0,43.44444]]}
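One convenient property of the 'split' structure is that its three keys line up with the data, index, and columns parameters of the pd.DataFrame() constructor. The sketch below, with the placeholder names payload and rebuilt, uses this to rebuild a DataFrame from the parsed JSON.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Isla'],
    'Age': [33, np.nan, 37],
    'Sales': [33.33, 56.32, 43.44444],
})

# Parse the 'split' JSON into a dictionary with three keys
payload = json.loads(df.to_json(orient='split'))
print(sorted(payload))
# ['columns', 'data', 'index']

# The keys match the constructor's parameter names, so they can be unpacked
rebuilt = pd.DataFrame(**payload)
print(rebuilt['Name'].tolist())
# ['Nik', 'Kate', 'Isla']
```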
Pandas DataFrame to JSON: Records Structure
By passing 'records' into the Pandas .to_json() method’s orient argument, you return a JSON string structured as a list of dictionaries, one per row, where the keys are the column names and the values are that row’s values.
# Using the 'records' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
print(df.to_json(orient='records'))
# Returns:
# [{"Name":"Nik","Age":33.0,"Sales":33.33},{"Name":"Kate","Age":null,"Sales":56.32},{"Name":"Isla","Age":37.0,"Sales":43.44444}]
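The 'records' structure also pairs with the lines= parameter from the table earlier in this tutorial. As a brief sketch: passing lines=True writes each record on its own line (line-delimited JSON) instead of wrapping all records in a single list. The variable names here are only illustrative.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Isla'],
    'Age': [33, np.nan, 37],
    'Sales': [33.33, 56.32, 43.44444],
})

# Write one JSON object per line rather than a single JSON list
json_lines = df.to_json(orient='records', lines=True)

# Each line is a complete JSON document that can be parsed on its own
records = [json.loads(line) for line in json_lines.splitlines()]
print(len(records))
# 3
print(records[0])
# {'Name': 'Nik', 'Age': 33.0, 'Sales': 33.33}
```

This format can be useful when records are processed one at a time, since a reader doesn't need to load the whole file to parse a single record.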
Pandas DataFrame to JSON: Index Structure
By passing 'index' into the Pandas .to_json() method’s orient argument, you return a JSON string structured as a dictionary whose keys are the index labels and whose values are dictionaries mapping each column name to that row’s value.
# Using the 'index' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
print(df.to_json(orient='index'))
# Returns:
# {"0":{"Name":"Nik","Age":33.0,"Sales":33.33},"1":{"Name":"Kate","Age":null,"Sales":56.32},"2":{"Name":"Isla","Age":37.0,"Sales":43.44444}}
Pandas DataFrame to JSON: Columns Structure
By passing 'columns' into the Pandas .to_json() method’s orient argument, you return a JSON string structured as a dictionary whose keys are the column names and whose values are dictionaries mapping each index label to that column’s value. This is the default structure for a DataFrame.
# Using the 'columns' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
print(df.to_json(orient='columns'))
# Returns:
# {"Name":{"0":"Nik","1":"Kate","2":"Isla"},"Age":{"0":33.0,"1":null,"2":37.0},"Sales":{"0":33.33,"1":56.32,"2":43.44444}}
Pandas DataFrame to JSON: Values Structure
By passing 'values' into the Pandas .to_json() method’s orient argument, you return a JSON string that contains only the values, as a list of rows. Neither the column names nor the index are included.
# Using the 'values' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
print(df.to_json(orient='values'))
# Returns:
# [["Nik",33.0,33.33],["Kate",null,56.32],["Isla",37.0,43.44444]]
Pandas DataFrame to JSON: Table Structure
By passing 'table' into the Pandas .to_json() method’s orient argument, you return a JSON string that contains two parts: a schema describing each field and its type, and the data itself as a list of records.
# Using the 'table' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
print(df.to_json(orient='table'))
# Returns:
# {"schema":{"fields":[{"name":"index","type":"integer"},{"name":"Name","type":"string"},{"name":"Age","type":"number"},{"name":"Sales","type":"number"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":0,"Name":"Nik","Age":33.0,"Sales":33.33},{"index":1,"Name":"Kate","Age":null,"Sales":56.32},{"index":2,"Name":"Isla","Age":37.0,"Sales":43.44444}]}
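Because the 'table' output is hard to read on a single line, it can help to parse it and look at the pieces separately. The sketch below pulls the field types out of the schema; the name field_types is just a placeholder. The exact type reported for the string column may vary between Pandas versions, so it isn't shown here.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Isla'],
    'Age': [33, np.nan, 37],
    'Sales': [33.33, 56.32, 43.44444],
})

# Parse the 'table' JSON into its schema and data parts
table = json.loads(df.to_json(orient='table'))

# Map each field name to the type recorded in the schema
field_types = {field['name']: field['type'] for field in table['schema']['fields']}
print(field_types['Age'], field_types['Sales'])
# number number

# The index is recorded as the primary key
print(table['schema']['primaryKey'])
# ['index']

# The data part holds one record per row
print(len(table['data']))
# 3
```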
Modifying Float Values When Converting Pandas DataFrames to JSON
By default, Pandas will encode floating point values using up to 10 decimal places. We can customize this behavior by modifying the double_precision= parameter of the .to_json() method.
One of the values in our DataFrame has five decimal places. Let’s modify the behavior to keep only a single decimal place:
# Modifying Floating Point Precision Values
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
print(df.to_json(double_precision=1))
# Returns:
# {"Name":{"0":"Nik","1":"Kate","2":"Isla"},"Age":{"0":33.0,"1":null,"2":37.0},"Sales":{"0":33.3,"1":56.3,"2":43.4}}
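Note that double_precision= applies to every floating point column at once. If you only want to round a single column, one possible approach (a sketch, not the only way) is to round that column with the DataFrame .round() method before converting. The example below shows both options side by side.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Isla'],
    'Age': [33, np.nan, 37],
    'Sales': [33.33, 56.32, 43.44444],
})

# Option 1: limit all floats to two decimal places while encoding
two_places = json.loads(df.to_json(double_precision=2))
print(two_places['Sales'])
# {'0': 33.33, '1': 56.32, '2': 43.44}

# Option 2: round only the 'Sales' column first, then encode as usual
rounded_first = json.loads(df.round({'Sales': 1}).to_json())
print(rounded_first['Sales'])
# {'0': 33.3, '1': 56.3, '2': 43.4}
```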
In the following section, you’ll learn how to control whether the index is included when converting a DataFrame to JSON.
Convert Pandas DataFrames to JSON and Include the Index
By default, Pandas will include the index when converting a DataFrame to a JSON object. We can modify this behavior by using the index= parameter. Dropping the index with index=False is supported for the 'split' and 'table' orientations. The 'index' and 'columns' orientations need the index labels to build their structure, so they don’t allow it.
Let’s see how to drop the index when converting to JSON:
# Dropping an Index When Converting a DataFrame to JSON
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
print(df.to_json(orient='split', index=False))
# Returns:
# {"columns":["Name","Age","Sales"],"data":[["Nik",33.0,33.33],["Kate",null,56.32],["Isla",37.0,43.44444]]}
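To see the restriction in practice, the sketch below tries to drop the index with the 'columns' orientation, which uses the index labels as keys. Pandas raises a ValueError in that case. The exact error message differs between versions, so it isn't reproduced here.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Isla'],
    'Age': [33, np.nan, 37],
    'Sales': [33.33, 56.32, 43.44444],
})

# With 'split', dropping the index simply removes the "index" key
split_keys = sorted(json.loads(df.to_json(orient='split', index=False)))
print(split_keys)
# ['columns', 'data']

# With 'columns', the index labels are part of the structure,
# so asking to drop them raises an error
raised = False
try:
    df.to_json(orient='columns', index=False)
except ValueError:
    raised = True

print(raised)
# True
```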
In the following section, you’ll learn how to specify compression for your resulting JSON file.
How to Compress Files When Converting Pandas DataFrames to JSON
The Pandas .to_json() method provides significant customizability in how to compress your JSON file. By default, Pandas will attempt to infer the compression to be used based on the file extension that has been provided.
Pandas currently supports the zip, gzip, bz2, zstd, xz, and tar compression formats. Let’s see how we can write our DataFrame to a zip-compressed file:
# Compressing Your JSON File When Converting a DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
df.to_json('DataFrame.json', compression='zip')
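The sketch below illustrates both ways of choosing the compression: letting Pandas infer it from a .zip extension, and naming it explicitly when the extension doesn't give it away. It writes into a temporary directory, and the file names are only examples.

```python
import tempfile
import zipfile
from pathlib import Path

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Isla'],
    'Age': [33, np.nan, 37],
    'Sales': [33.33, 56.32, 43.44444],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Case 1: the .zip extension lets Pandas infer the compression,
    # both when writing and when reading
    zip_path = Path(tmp_dir) / 'dataframe.json.zip'
    df.to_json(zip_path)
    is_zip = zipfile.is_zipfile(zip_path)
    restored_inferred = pd.read_json(zip_path)

    # Case 2: the extension says nothing about compression,
    # so it has to be named explicitly in both directions
    plain_path = Path(tmp_dir) / 'DataFrame.json'
    df.to_json(plain_path, compression='zip')
    restored_explicit = pd.read_json(plain_path, compression='zip')

print(is_zip)
# True
print(restored_explicit['Name'].tolist())
# ['Nik', 'Kate', 'Isla']
```

Using an extension that matches the compression, as in the first case, keeps the reading side simpler.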
Because the file name above ends in .json, Pandas can’t infer the compression from the extension. When you later read this file back into a DataFrame, you’ll need to specify the type of compression used. In the following section, you’ll learn how to modify the indent of your JSON file.
How to Change the Indent of a JSON File When Converting a Pandas DataFrame
Pandas also allows you to specify the indentation used when writing out your resulting JSON file. This is similar to pretty-printing JSON in Python. By using the indent= parameter, you can specify an integer representing the number of spaces used for each level of indentation.
Let’s see what this looks like when we pass in a value of 4:
# Specifying the Indent of a JSON File When Converting a Pandas DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})
print(df.to_json(indent=4))
# Returns:
# {
#     "Name":{
#         "0":"Nik",
#         "1":"Kate",
#         "2":"Isla"
#     },
#     "Age":{
#         "0":33.0,
#         "1":null,
#         "2":37.0
#     },
#     "Sales":{
#         "0":33.33,
#         "1":56.32,
#         "2":43.44444
#     }
# }
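Indentation only changes the whitespace of the output, not the data. As a quick sanity check, the sketch below parses both the compact and the indented strings and confirms that they hold the same content.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Isla'],
    'Age': [33, np.nan, 37],
    'Sales': [33.33, 56.32, 43.44444],
})

compact = df.to_json()
pretty = df.to_json(indent=2)

# The compact form is a single line; the indented form spans several
print(len(compact.splitlines()))
# 1
print(len(pretty.splitlines()) > 1)
# True

# Both strings parse to exactly the same data
same_data = json.loads(compact) == json.loads(pretty)
print(same_data)
# True
```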
Frequently Asked Questions
The Pandas to_json() method allows you to convert a Pandas DataFrame to a JSON string or file. The method provides customization in terms of how the records should be structured, compressed, and represented.
The orient parameter allows you to specify how records should be oriented in the resulting JSON file. This provides significant possibilities in how records are structured.
Conclusion
In this tutorial, you learned how to convert a Pandas DataFrame to a JSON string or file. You first learned about the Pandas .to_json() method and its various parameters and default arguments. You then learned how to convert a DataFrame to a JSON string and file. Then, you learned how to customize the output by specifying the orientation of the JSON file. You also learned how to customize floating point values, the index, compression, and the indentation of the object.