Convert a Pandas DataFrame to JSON

In this tutorial, you’ll learn how to convert a Pandas DataFrame to a JSON object and file using Python. Most programming languages can read, parse, and work with JSON. Because of this, knowing how to convert a Pandas DataFrame to JSON is an important skill.

Pandas provides a lot of flexibility when converting a DataFrame to a JSON file. This guide dives into the functionality with practical examples. By the end of this tutorial, you’ll have learned:

  • How to convert a Pandas DataFrame to a JSON string or file
  • How to customize formats for missing data and floats
  • How to customize the structure of the resulting JSON file
  • How to compress a JSON file when converting a Pandas DataFrame

Understanding the Pandas to_json Method

To convert a Pandas DataFrame to a JSON string or file, you can use the .to_json() method. Let’s start by exploring the method and what parameters it has available. The method provides a lot of flexibility in how to structure the JSON file.

# Understanding the Pandas .to_json() Method
import pandas as pd
df = pd.DataFrame()
df.to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=True, indent=None, storage_options=None)

As you can see from the code block above, the method has a large number of parameters. Every one of them has a default argument, meaning that you can call the method without passing anything in.

The table below breaks down the parameters of the .to_json() method and their default arguments:

| Parameter | Description | Default Argument | Accepted Values |
| --- | --- | --- | --- |
| path_or_buf= | The string or path object to write the JSON to. If None, the result is returned as a string. | None | string, path object, or None |
| orient= | How to format the JSON string. | None | string |
| date_format= | The type of date conversion. | None | None, epoch, iso |
| double_precision= | The number of decimal places to use when encoding floating point values. | 10 | int |
| force_ascii= | Whether to force encoded strings to be ASCII. | True | Bool |
| date_unit= | The time unit to encode to. | 'ms' | string |
| default_handler= | Handler to call if the object cannot otherwise be converted to a suitable format for JSON. | None | callable |
| lines= | Whether to write out line-delimited JSON. | False | Bool |
| compression= | For on-the-fly compression of the output data. | 'infer' | string or dict |
| index= | Whether to include the index values in the JSON string. | True | Bool |
| indent= | Length of the whitespace used to indent each record. | None | Integer |
| storage_options= | Extra options for different storage options such as S3 storage. | None | dict |
The parameters and default arguments of the Pandas .to_json() method explained
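The sample data used in the rest of this tutorial has no dates or non-ASCII text, so date_format= and force_ascii= are not demonstrated elsewhere. As a rough sketch, assuming a small made-up DataFrame (the events name and its values are illustrative only), they might be used like this:

```python
import json
import pandas as pd

# Hypothetical data, used only for this illustration
events = pd.DataFrame({
    'City': ['Zürich', 'Oslo'],
    'Date': pd.to_datetime(['2023-01-15', '2023-02-20']),
})

# date_format='iso' writes ISO 8601 strings instead of epoch timestamps,
# and force_ascii=False keeps non-ASCII characters unescaped
iso_json = events.to_json(orient='records', date_format='iso', force_ascii=False)
print(iso_json)

# With the default force_ascii=True, 'ü' is written as a \u escape sequence
ascii_json = events.to_json(orient='records', date_format='iso')
print(ascii_json)

# Both strings decode to the same data
print(json.loads(iso_json) == json.loads(ascii_json))
```

The exact shape of the ISO timestamp (for example, whether it carries a trailing Z) can differ between Pandas versions, so it is safer to parse it than to compare it as text.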

Now that you have a strong understanding of the method, let’s load a sample Pandas DataFrame to follow along with.

Loading a Sample Pandas DataFrame

Let’s begin by loading a sample Pandas DataFrame that you can use to follow along. The data is kept deliberately small to make the examples easy to follow. Copy and paste the code below into your code editor of choice:

# Loading a Sample Pandas DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Isla'],
    'Age': [33, np.nan, 37],
    'Sales': [33.33, 56.32, 43.44444]
})

print(df)

# Returns:
#    Name   Age     Sales
# 0   Nik  33.0  33.33000
# 1  Kate   NaN  56.32000
# 2  Isla  37.0  43.44444

We can see that our DataFrame has 3 columns with 3 records. One of the columns contains strings, another contains numbers and a missing value (which Pandas stores as floats because of the missing value), and another contains floating point values. Now that we have a DataFrame loaded, let’s get started by converting the DataFrame to a JSON string.

Convert a Pandas DataFrame to a JSON String

The Pandas .to_json() method contains default arguments for all parameters. Because of this, we can call the method without passing in any specification. By default, Pandas will use an argument of path_or_buf=None, indicating that the DataFrame should be converted to a JSON string.

Let’s see how we can convert our Pandas DataFrame to a JSON string:

# Convert a Pandas DataFrame to a JSON String
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

json_string = df.to_json()
print(json_string)

# Returns:
# {"Name":{"0":"Nik","1":"Kate","2":"Isla"},"Age":{"0":33.0,"1":null,"2":37.0},"Sales":{"0":33.33,"1":56.32,"2":43.44444}}

We can see that calling the .to_json() method with default arguments on a Pandas DataFrame returns a JSON string.

You could, of course, parse this string into a Python dictionary. However, if a dictionary is what you want, you can skip JSON entirely and use the Pandas .to_dict() method to convert the DataFrame directly.
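To make the difference concrete, here is a minimal sketch that parses the JSON string with Python’s built-in json module and compares it with a dictionary created directly by Pandas. The variable names parsed and direct are only illustrative:

```python
import json
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

# Parse the JSON string into a Python dictionary
parsed = json.loads(df.to_json())
print(parsed['Name'])      # index labels come back as strings: JSON keys are always strings
print(parsed['Age']['1'])  # JSON null is parsed as None

# Converting directly keeps the integer index labels
direct = df.to_dict()
print(direct['Name'])
```

The round trip through JSON turns the integer index labels into strings and the missing value into None, which is worth keeping in mind if you compare the two dictionaries.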

Convert a Pandas DataFrame to a JSON File

In order to convert a Pandas DataFrame to a JSON file, you can pass a path object or file-like object to the Pandas .to_json() method. By passing a string representing the path to the JSON file into our method call, a file is created containing our DataFrame.

Let’s see what this looks like:

# Creating a JSON File with Our Pandas DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

df.to_json('dataframe.json')
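If you want to confirm what was written, one option is to read the file back. The sketch below writes to a temporary directory (an assumption made only so the example cleans up after itself) and checks that the file holds the same text the method would otherwise return as a string:

```python
import os
import tempfile
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'dataframe.json')

    # When a path is given, the method writes the file and returns None
    result = df.to_json(path)

    # Read the raw text and also load it back into a DataFrame
    with open(path) as f:
        file_contents = f.read()
    df_from_file = pd.read_json(path)

print(result)
print(file_contents == df.to_json())
print(df_from_file)
```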

In the following section, you’ll learn how to customize the structure of our JSON file.

Customizing the JSON Structure of a Pandas DataFrame

The Pandas .to_json() method provides a ton of flexibility in structuring the resulting JSON file. By default, the JSON file will be structured as 'columns'. The method provides the following options: 'split', 'records', 'index', 'columns', 'values', 'table'. Let’s explore these options to break down the different possibilities.

Pandas DataFrame to JSON: Split Structure

By passing 'split' into the orient argument of the Pandas .to_json() method, you return a JSON string structured as a dictionary that breaks out the columns, index, and data separately. This is demonstrated below and can be helpful when moving data into a database format:

# Using the 'split' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

print(df.to_json(orient='split'))

# Returns:
# {"columns":["Name","Age","Sales"],"index":[0,1,2],"data":[["Nik",33.0,33.33],["Kate",null,56.32],["Isla",37.0,43.44444]]}
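Because the 'split' structure uses the keys columns, index, and data, its pieces line up with the arguments of the DataFrame constructor. The following is a small sketch of rebuilding a DataFrame from the parsed JSON; the names parts and rebuilt are illustrative:

```python
import json
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

# Parse the 'split' JSON into a dictionary with three keys
parts = json.loads(df.to_json(orient='split'))
print(list(parts))

# Pass each piece to the matching DataFrame constructor argument
rebuilt = pd.DataFrame(parts['data'], index=parts['index'], columns=parts['columns'])
print(rebuilt)
```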

Pandas DataFrame to JSON: Records Structure

By passing 'records' into the orient argument of the Pandas .to_json() method, you return a JSON string structured as a list of dictionaries, one per row, where the keys are the column names and the values are that row’s values.

# Using the 'records' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

print(df.to_json(orient='records'))

# Returns:
# [{"Name":"Nik","Age":33.0,"Sales":33.33},{"Name":"Kate","Age":null,"Sales":56.32},{"Name":"Isla","Age":37.0,"Sales":43.44444}]
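The 'records' orientation is also the one that works with the lines= parameter listed in the table earlier. As a brief sketch, passing lines=True writes one JSON object per line (often called JSON Lines) instead of a single list:

```python
import json
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

# lines=True is only accepted together with orient='records'
json_lines = df.to_json(orient='records', lines=True)
print(json_lines)

# Each line is a complete JSON document and can be parsed on its own
records = [json.loads(line) for line in json_lines.splitlines()]
print(records[0])
```

This layout can be convenient for logs or streaming, since a reader can process one record at a time without loading the whole file.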

Pandas DataFrame to JSON: Index Structure

By passing 'index' into the orient argument of the Pandas .to_json() method, you return a JSON string structured as a dictionary whose keys are the index labels and whose values are dictionaries mapping each column name to that row’s value.

# Using the 'index' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

print(df.to_json(orient='index'))

# Returns:
# {"0":{"Name":"Nik","Age":33.0,"Sales":33.33},"1":{"Name":"Kate","Age":null,"Sales":56.32},"2":{"Name":"Isla","Age":37.0,"Sales":43.44444}}

Pandas DataFrame to JSON: Columns Structure

By passing 'columns' into the orient argument of the Pandas .to_json() method, you return a JSON string structured as a dictionary whose keys are the column names and whose values are dictionaries mapping each index label to its value. This is the default orientation for a DataFrame.

# Using the 'columns' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

print(df.to_json(orient='columns'))

# Returns:
# {"Name":{"0":"Nik","1":"Kate","2":"Isla"},"Age":{"0":33.0,"1":null,"2":37.0},"Sales":{"0":33.33,"1":56.32,"2":43.44444}}

Pandas DataFrame to JSON: Values Structure

By passing 'values' into the orient argument of the Pandas .to_json() method, you return a JSON string structured as a nested list containing only the values, with no column names or index labels.

# Using the 'values' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

print(df.to_json(orient='values'))

# Returns:
# [["Nik",33.0,33.33],["Kate",null,56.32],["Isla",37.0,43.44444]]

Pandas DataFrame to JSON: Table Structure

By passing 'table' into the orient argument of the Pandas .to_json() method, you return a JSON string structured as a dictionary that contains a schema, which describes each field and its type, alongside the data as a list of records.

# Using the 'table' orientation
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

print(df.to_json(orient='table'))

# Returns:
# {"schema":{"fields":[{"name":"index","type":"integer"},{"name":"Name","type":"string"},{"name":"Age","type":"number"},{"name":"Sales","type":"number"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":0,"Name":"Nik","Age":33.0,"Sales":33.33},{"index":1,"Name":"Kate","Age":null,"Sales":56.32},{"index":2,"Name":"Isla","Age":37.0,"Sales":43.44444}]}
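The single-line output is hard to read, so it can help to parse it and look at the schema on its own. This is only a sketch, and the field_types name is illustrative; the type reported for string columns may depend on your Pandas version:

```python
import json
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

# Parse the 'table' JSON so the schema and data can be inspected separately
table = json.loads(df.to_json(orient='table'))

# Map each field name to the type recorded in the schema
field_types = {field['name']: field['type'] for field in table['schema']['fields']}
print(field_types)

# The index is recorded as the primary key of the table
print(table['schema']['primaryKey'])

# The data itself is stored as a list of records
print(len(table['data']))
```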

Modifying Float Values When Converting Pandas DataFrames to JSON

By default, Pandas will round floating point values to 10 decimal places. We can customize this behavior by modifying the double_precision= parameter of the .to_json() method.

One of the values in our DataFrame is a floating point value with five decimal places. Let’s modify the behavior to keep only a single decimal place:

# Modifying Floating Point Precision Values
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

print(df.to_json(double_precision=1))

# Returns:
# {"Name":{"0":"Nik","1":"Kate","2":"Isla"},"Age":{"0":33.0,"1":null,"2":37.0},"Sales":{"0":33.3,"1":56.3,"2":43.4}}
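Note that the rounding only applies to the JSON output. As a quick sketch, the example below uses two decimal places and then checks that the original DataFrame still holds the full value:

```python
import json
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

# Round floats to two decimal places in the JSON output only
rounded = json.loads(df.to_json(double_precision=2))
print(rounded['Sales'])

# The DataFrame itself is left untouched
print(df.loc[2, 'Sales'])
```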

In the following section, you’ll learn how to control whether the index is included when converting a DataFrame to JSON.

Convert Pandas DataFrames to JSON and Include the Index

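By default, Pandas will include the index when converting a DataFrame to a JSON object. We can modify this behavior by using the index= parameter. Dropping the index with index=False is only supported for some orientations, including 'split' and 'table'. The 'index' and 'columns' orientations are keyed by the index labels, so they raise a ValueError if you try to drop the index.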

Let’s see what this looks like to drop the index when converting to JSON:

# Dropping an Index When Converting a DataFrame to JSON
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

print(df.to_json(orient='split', index=False))

# Returns:
# {"columns":["Name","Age","Sales"],"data":[["Nik",33.0,33.33],["Kate",null,56.32],["Isla",37.0,43.44444]]}
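The sketch below shows the same option with the 'table' orientation, followed by what happens with an orientation that needs the index. The exact wording of the error message varies between Pandas versions, so the example only relies on the exception type:

```python
import json
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

# 'table' also accepts index=False: the records no longer carry an "index" key
table = json.loads(df.to_json(orient='table', index=False))
print(table['data'][0])

# 'columns' nests every value under its index label, so dropping the index is rejected
try:
    df.to_json(orient='columns', index=False)
    rejected = False
except ValueError as error:
    rejected = True
    print(f'ValueError: {error}')
```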

In the following section, you’ll learn how to specify compression for your resulting JSON file.

How to Compress Files When Converting Pandas DataFrames to JSON

The Pandas .to_json() method provides significant customizability in how to compress your JSON file. By default, Pandas will attempt to infer the compression to be used based on the file extension that has been provided.

Pandas currently supports the zip, gzip, bz2, zstd, xz, and tar compression formats. Let’s see how we can compress our JSON file using zip:

# Compressing Your JSON File When Converting a DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

df.to_json('DataFrame.json', compression='zip')
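Reading a compressed file back follows the same rules in reverse. The sketch below, which writes to a temporary directory purely to keep the example tidy, contrasts an explicit compression argument with one inferred from a .gz file extension:

```python
import os
import tempfile
import zipfile
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Explicit compression: the .json extension gives Pandas nothing to infer from
    zip_path = os.path.join(tmp_dir, 'DataFrame.json')
    df.to_json(zip_path, compression='zip')
    is_zip = zipfile.is_zipfile(zip_path)
    from_zip = pd.read_json(zip_path, compression='zip')

    # Inferred compression: the .gz extension is enough for writing and reading
    gz_path = os.path.join(tmp_dir, 'DataFrame.json.gz')
    df.to_json(gz_path)
    from_gz = pd.read_json(gz_path)

print(is_zip)
print(from_zip)
print(from_gz)
```

Using an extension that matches the compression, such as .json.gz or .zip, lets the default compression='infer' do the work on both ends.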

Because the file name above ends in .json rather than .zip, Pandas can’t infer the compression when you read the file back into a DataFrame, so you’ll need to specify the type of compression used. In the following section, you’ll learn how to modify the indent of your JSON file.

How to Change the Indent of a JSON File When Converting a Pandas DataFrame

Pandas also allows you to specify the indentation of your resulting JSON file. This is similar to pretty-printing JSON in Python. By using the indent= parameter, you can specify an integer representing the number of spaces used for each level of indentation.

Let’s see what this looks like when we pass in a value of 4:

# Specifying the Indent of a JSON File When Converting a Pandas DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

print(df.to_json(indent=4))

# Returns:
# {
#     "Name":{
#         "0":"Nik",
#         "1":"Kate",
#         "2":"Isla"
#     },
#     "Age":{
#         "0":33.0,
#         "1":null,
#         "2":37.0
#     },
#     "Sales":{
#         "0":33.33,
#         "1":56.32,
#         "2":43.44444
#     }
# }
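The indent= parameter can be combined with any orientation, and it only changes whitespace. As a short sketch, the example below indents the 'records' orientation and confirms that the indented and compact strings parse to the same data:

```python
import json
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Nik', 'Kate', 'Isla'], 'Age': [33, np.nan, 37], 'Sales': [33.33, 56.32, 43.44444]})

# Indent the 'records' orientation with two spaces per level
pretty = df.to_json(orient='records', indent=2)
print(pretty)

# The compact version holds exactly the same data on a single line
compact = df.to_json(orient='records')
print(json.loads(pretty) == json.loads(compact))
```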

Frequently Asked Questions

How do I write a Pandas DataFrame to a JSON file?

The Pandas to_json() method allows you to convert a Pandas DataFrame to a JSON string or file. The method provides customization in terms of how the records should be structured, compressed, and represented.

What is the orient parameter in Pandas to_json?

The orient parameter allows you to specify how records should be oriented in the resulting JSON file. This provides significant possibilities in how records are structured.

Conclusion

In this tutorial, you learned how to convert a Pandas DataFrame to a JSON string or file. You first learned about the Pandas .to_json() method and its various parameters and default arguments. You then learned how to convert a DataFrame to a JSON string and file. Then, you learned how to customize the output by specifying the orientation of the JSON file. You also learned how to customize floating point values, the index, compression, and the indentation of the object.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.