Skip to content

Pandas Dataframe to CSV File – Export Using .to_csv()

Pandas Export to CSV Cover Image

Learn how to use Pandas to convert a dataframe to a CSV file, using the .to_csv() method, which helps export Pandas to CSV files. You’ll learn how to work with different parameters that allow you to include or exclude an index, change the seperators and encoding, work with missing data, limit columns, and how to compress.

CSVs, short for comma separated values, are highly useful formats that store data in delimited text file (typically separated by commas), that place records on separate rows. They are often used in many applications because of their interchangeability, which allows you to move data between different proprietary formats with ease.

Knowing how to work with CSV files in Python and Pandas will give you a leg up in terms of getting started!

The Quick Answer: Use the .to_csv() Function

Quick Answer - Pandas Export Dataframe to CSV

A Quick Summary

The table below summarizes the key parameters and their scenarios of the Pandas .to_csv() method. Click on a parameter in the table to go to the detailed section below.

ParameterWhen to Use ItAn Example
index = Write with or without and indexdf.to_csv(..., index = False)
columns = Write only a subset of columns to CSVdf.to_csv(..., columns = ['col1', 'col2'])
sep = Change the type of separator to usedf.to_csv(..., sep = '\t')
na_rep = How to change how missing data are saveddf.to_csv(..., na_rep = 'N/A')
header = Whether to include a header or notdf.to_csv(..., header = False)
encoding = Change the encoding type useddf.to_csv(..., encoding = 'utf-8')
compression = Whether to compress the data or notdf.to_csv(..., compression = True)
date_format = Specify the format for datetime valuesdf.to_csv(..., date_format = '%Y-%m-%d')
Read on to understand these parameters in detail

What are CSV Files?

Comma-separated value files, or CSV files, are text files often used to represent tabular data. Data are commonly separated by commas, giving them their name. While data attributes are separated by commas, records tend to be separated by new lines.

CSV files are light-weight and tend to be relatively platform agnostic. Because of this, they are often used to transfer data between different systems.

A CSV file will often store the headers of a table in the first row. Many programs will know to interpret a first row as the header row.

Let’s take a look at how a CSV file may store data:

Name,Year,Sales
Nik,2020,1000
Nik,2021,2300
Jane,2020,1900
Jane,2021,3400

If we were to convert this to a table, it would look like this:

NameYearSales
Nik20201000
Nik20212300
Jane20201900
Jane20213400

Let’s now dive into how to load a sample Pandas dataframe which you’ll use throughout this tutorial to export your data.

Loading a Sample Dataframe

If you want to follow along with this tutorial, feel free to load the dataframe provided below. The dataframe will have three columns and only four records, to keep things lightweight and easy to use.

# Loading a Sample Dataframe
import pandas as pd
df = pd.DataFrame.from_dict(
    {   'Name': ['Nik', 'Nik', 'Jane', 'Jane'],
        'Year': [2020, 2021, 2020, 2021],
        'Sales': [1000, 2300, 1900, 3400],
    }
)
print(df)

# Returns:
#    Name  Year  Sales
# 0   Nik  2020   1000
# 1   Nik  2021   2300
# 2  Jane  2020   1900
# 3  Jane  2021   3400

Now that you have a dataframe to work with, let’s learn how to use Pandas to export a dataframe to a CSV file.

Export Pandas Dataframe to CSV

In order to use Pandas to export a dataframe to a CSV file, you can use the aptly-named dataframe method, .to_csv(). The only required argument of the method is the path_or_buf = parameter, which specifies where the file should be saved.

The argument can take either:

  • A relative filename, which will save the file as a CSV file in the current working directory, or
  • A full file path, which will save the file as a CSV in the location specified

Let’s see what this looks in our Pandas code:

# Export a Pandas Dataframe to CSV
# Relative File Path
df.to_csv('datagy.csv')

# Fixed File Path
df.to_csv('/Users/datagy/datagy.csv')

In the next section, you’ll learn how to remove an index from your Pandas dataframe when exporting to a CSV file.

Exporting Pandas to CSV Without Index

By default, Pandas will include a dataframe index when you export it to a CSV file using the .to_csv() method. If you don’t want to include an index, simply change the index = False parameter.

Let’s see how we can do this:

# Export a Pandas Dataframe to CSV without an Index
# Without the index
df.to_csv('datagy.csv', index = False)

# With the Index
df.to_csv('datagy.csv')

Doing so can be quite helpful when your index is meaningless. However, think carefully about this if you are using the index to store meaningful information, such as time series data.

The default parameter for this is True. Because of this, if you do want to include the index, you can simply leave the argument alone.

Working with Columns When Exporting Pandas to CSV

When you export your data, you may be cognizant of the size of your data. One of the ways that you can reduce the size of the exported CSV file is to limit the number of columns that you export.

You can specify which columns to include in your export using the columns = argument, which accepts a list of columns that you want to include. Any columns not included in the list will not be included in the export.

Let’s see how we can use the columns = parameter to specify a smaller subset of columns to export:

# Export a Pandas Dataframe to CSV with only some columns
# Only certain columns
df.to_csv('datagy.csv', columns=['Name', 'Year'])

# All columns
df.to_csv('datagy.csv')

Want to learn more about calculating the square root in Python? Check out my tutorial here, which will teach you different ways of calculating the square root, both without Python functions and with the help of functions.

Changing the Separator When Exporting Pandas to CSV

While comma-separated value files get their names by virtue of being separated by commas, CSV files can also be delimited by other characters. For example, a common separator is the tab value, which can be represented programatically by \t.

Let’s see how we can use the sep= argument to change our separator in Pandas:

# Export a Pandas Dataframe to CSV with a different delimiter
# Specific delimiter
df.to_csv('datagy.csv', sep='\t')

# Comma delimiter
df.to_csv('datagy.csv')

In the next section, you’ll learn how to label missing data when exporting a dataframe to CSV.

Working with Missing Data When Exporting Pandas to CSV

By default, CSV files will not include any information about missing data. They will actually, simply, not show any value at all.

If, however, you want to display a different value, such as N/A for all your missing values, you can do this using the na_rep = argument. The default argument for this is an empty string, but it accepts any type of string.

Let’s load in the string 'N/A' to make it clear that the data is actually missing:

# Export a Pandas Dataframe to CSV With Missing Data
# Different Missing Identifier
df.to_csv('datagy.csv', na_rep='N/A')

# Blank String for Missing Data
df.to_csv('datagy.csv')

Removing Header When Exporting Pandas to CSV

There may be times in your data science journey where you find yourself needing to export a dataset from Pandas without a header. This can be particularly true when exporting large datasets that will need to be appended together afterwards.

Pandas makes it easy to export a dataframe to a CSV file without the header.

This is done using the header = argument, which accepts a boolean value. By default, it uses the value of True, meaning that the header is included.

Let’s see how we can modify this behaviour in Pandas:

# Export a Pandas Dataframe Without a Header
# Without Header
df.to_csv('datagy.csv', header=False)

# With Header
df.to_csv('datagy.csv')

Pandas to CSV with Different Encodings

When you’re working with string data, you’ll often find yourself needing to encode data. This is less common when you’re working with numerical or encoded data only, but strings will often require a little bit extra instruction on how they are to be interpreted.

The default type of encoding is utf-8, which is an incredibly common encoding format.

Let’s see how we can export a Pandas dataframe to CSV using the latin1 encoding:

# Export a Pandas Dataframe With Encodings
# With latin1 Encoding
df.to_csv('datagy.csv', encoding='latin1')

# With utf-8 Encoding
df.to_csv('datagy.csv')

Compress Data When Exporting Pandas to CSV

When you’re working with large datasets that are meant for long-term storage, it can be helpful to compress the dataset, especially when saving it to a CSV format.

When we compress a dataset, the file size becomes smaller. However, the time that it takes Pandas to export to CSV also increases. Similarly, the time that Pandas will take to read the CSV file into a dataframe increases. This is because the compression step takes longer than simply exporting.

However, there are many use cases when compression is a helpful tool, especially when archiving data for long-term storage, or when data isn’t used frequently.

Let’s see how we can use this boolean argument to compress our data:

# Export a Pandas Dataframe With Compression
# With gzip Compression
df.to_csv('datagy.csv', compression='gzip')

# Without Compression
df.to_csv('datagy.csv')

Check out some other Python tutorials on datagy, including our complete guide to styling Pandas and our comprehensive overview of Pivot Tables in Pandas!

Specify Date Format When Exporting Pandas to CSV

Pandas makes working with date time formats quite easy. However, the databases that you’re moving data between may have specific formats for dates that need to be followed.

Thankfully, Pandas makes this process quite easy, by using the date_format = argument. The argument accepts a string that allows us to specify how dates should be formatted in the exported CSV.

To learn more about Python date formats, check out my tutorial that will teach you to convert strings to date times.

Let’s see how we can apply a YYYY-MM-DD format to our dates when exporting a Pandas dataframe to CSV:

# Export a Pandas Dataframe With Date Format Specified
# With YYYY-MM-DD Format
df.to_csv('datagy.csv', date_format='%Y-%m-%d')

# Without specified date format
df.to_csv('datagy.csv')

Want to learn more about Python f-strings? Check out my in-depth tutorial, which includes a step-by-step video to master Python f-strings!

Conclusion

In this tutorial, you learned how to use Pandas to export your dataframes to CSV files. You learned how to make excellent use of the Pandas .to_csv() function, along with how to use its many parameters to customize how your data as exported. Some of the key ones you learned to use are the index=, which includes or excludes an index, and the encoding= parameter, which specifies the encoding you wish to use.

To learn more about the Pandas .to_csv() method, check out the official documentation here.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.View Author posts

Leave a Reply

Your email address will not be published. Required fields are marked *