
Pandas read_csv() – Read CSV and Delimited Files in Pandas


In this tutorial, you’ll learn how to use the Pandas read_csv() function to read CSV (or other delimited files) into DataFrames. CSV files are a ubiquitous file format that you’ll encounter regardless of the sector you work in. Being able to read them into Pandas DataFrames effectively is an important skill for any Pandas user.

By the end of this tutorial, you’ll have learned the following:

  • How to use the Pandas read_csv() function
  • How to customize the reading of CSV files by specifying columns, headers, data types, and more
  • How to limit the number of lines Pandas reads
  • And much more

Understanding the Pandas read_csv() Function

The Pandas read_csv() function is one of the most commonly used functions in Pandas, and it offers a large number of parameters. In this tutorial, we’ll cover the most important ones, which give you significant flexibility in how files are read.

Take a look at the function below to get a sense of the many different parameters available:

import pandas as pd
pd.read_csv(filepath_or_buffer, *, sep=',', delimiter=None, header='infer', names=_NoDefault.no_default, index_col=None, usecols=None, squeeze=None, prefix=_NoDefault.no_default, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=None, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, encoding_errors='strict', dialect=None, error_bad_lines=None, warn_bad_lines=None, on_bad_lines=None, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None, storage_options=None)

As mentioned, this tutorial won’t cover all of these parameters. However, you’ll learn about the most important ones, including:

  • filepath_or_buffer= provides a string representing the path to the file, including local files, URLs, URL schemes (such as for S3 storage)
  • sep= and delimiter= use a string to indicate what character(s) delimit the file
  • header= specifies the row number(s) to use as the column names and can be used to indicate that no header exists in the file (with None)
  • names= is used to provide a list of column names, either when no column headers are provided or if you want to overwrite them
  • usecols= is used to specify which columns to read in, by passing in a list of column labels
  • skiprows= and skipfooter= can specify a number of rows to skip at the top or bottom (and the skiprows parameter can even accept a callable)
  • parse_dates= accepts a list of columns to parse as dates

The list above covers the most common parameters, which provide most of the functionality you’ll need to read CSV files in Pandas.

Note that as of Pandas 2.0 (released in April 2023), the date_parser parameter has been deprecated in favor of the date_format parameter.
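As a minimal sketch of the date_format= parameter, assuming Pandas 2.0 or later, the snippet below parses day/month/year dates. The data is made up for illustration and is held in memory with io.StringIO rather than read from a file:

```python
# Using date_format= instead of the deprecated date_parser= (requires Pandas 2.0+)
import io
import pandas as pd

# Hypothetical data with day/month/year dates, held in memory instead of a file
data = io.StringIO("Name,Date\nNik,05/06/2022\nKate,17/08/2023\n")

# date_format= takes a strftime-style format string for the parsed columns
df = pd.read_csv(data, parse_dates=['Date'], date_format='%d/%m/%Y')

print(df.dtypes)
```

Passing the format explicitly avoids ambiguity between day-first and month-first dates.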

How to Read a CSV File with Pandas

In order to read a CSV file in Pandas, you can use the read_csv() function and simply pass in the path to the file. In fact, the only required parameter of the Pandas read_csv() function is the path to the CSV file. Let’s take a look at an example of a CSV file:

Name,Age,Location,Company
Nik,34,Toronto,datagy
Kate,33,New York City,Apple
Joe,40,Frankfurt,Siemens
Nancy,23,Tokyo,Nintendo

We can save this data in a file called sample1.csv. In order to read this CSV file using Pandas, we can simply pass the path to that file into our function call. Let’s see what this looks like:

# How to read a CSV file with Pandas
import pandas as pd
df = pd.read_csv('sample1.csv')

print(df.head())

# Returns:
#     Name  Age       Location   Company
# 0    Nik   34        Toronto    datagy
# 1   Kate   33  New York City     Apple
# 2    Joe   40      Frankfurt   Siemens
# 3  Nancy   23          Tokyo  Nintendo

We can see how simple it was to read this CSV file with Pandas. Of course, it helped that the CSV was clean and well-structured. You’ll learn more about how to work with CSV files that aren’t as neatly structured in upcoming sections.

There are a few more things to note here:

  1. Pandas read the first line as the columns of the dataset,
  2. Pandas assumed the file was comma-delimited, and
  3. The index was created using a range index.
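To confirm these three points for yourself, you can inspect the resulting DataFrame. The sketch below uses the same contents as sample1.csv, but loads them from an in-memory string (via io.StringIO) so that it runs without the file on disk:

```python
# Checking what Pandas inferred from a simple CSV
import io
import pandas as pd

# Same contents as sample1.csv, held in memory so the snippet is self-contained
data = io.StringIO(
    "Name,Age,Location,Company\n"
    "Nik,34,Toronto,datagy\n"
    "Kate,33,New York City,Apple\n"
    "Joe,40,Frankfurt,Siemens\n"
    "Nancy,23,Tokyo,Nintendo\n"
)
df = pd.read_csv(data)

print(df.columns.tolist())  # column labels taken from the first line
print(df.index)             # the default range index
print(df.dtypes)            # the data types Pandas inferred
```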

Let’s now dive into how to use a custom delimiter when reading CSV files.

How to Use a Custom Delimiter in Pandas read_csv()

In order to use a custom delimiter when reading CSV files in Pandas, you can use the sep= or the delimiter= arguments. By default, this is set to sep=',', meaning that Pandas will assume the file is comma-delimited.

Let’s take a look at another dataset, which we have now saved in sample2.csv:

Name;Age;Location;Company
Nik;34;Toronto;datagy
Kate;33;New York City;Apple
Joe;40;Frankfurt;Siemens
Nancy;23;Tokyo;Nintendo

The dataset above is the same dataset as we worked with before. However, the values are now separated by semicolons, rather than commas. Since this is different from the default value, we now need to explicitly pass this into the function, as shown below:

# How to read a CSV file with Pandas with custom delimiters
import pandas as pd
df = pd.read_csv('sample2.csv', sep=';')

print(df.head())

# Returns:
#     Name  Age       Location   Company
# 0    Nik   34        Toronto    datagy
# 1   Kate   33  New York City     Apple
# 2    Joe   40      Frankfurt   Siemens
# 3  Nancy   23          Tokyo  Nintendo

We can see that by specifying the delimiter, Pandas was able to read the file correctly. Because delimiters can vary wildly, it’s good to know how to handle these cases.

Similarly, if your data was separated with tabs, you could use sep='\t'.
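Here is a small sketch of the tab-separated case. The data is a shortened, made-up version of our sample dataset, held in memory with io.StringIO rather than saved to a file:

```python
# Reading tab-delimited data with sep='\t'
import io
import pandas as pd

# Hypothetical tab-separated data; '\t' marks the tab characters
data = io.StringIO(
    "Name\tAge\tLocation\n"
    "Nik\t34\tToronto\n"
    "Kate\t33\tNew York City\n"
)
df = pd.read_csv(data, sep='\t')

print(df.head())
```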

How to Specify a Header Row in Pandas read_csv()

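By default, Pandas infers the header row: when no names= are passed, it treats the first line as the header (the same as header=0), and when names= are passed, it assumes there is no header row (the same as header=None). This behavior can be controlled using the header= parameter, which accepts the following values:

  • an integer representing the row to use as the header,
  • a list of integers, which reads multiple rows into a multi-level header,
  • None if no header row is present, and
  • 'infer', the default, which applies the inference described above.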

So far, Pandas has inferred the dataset’s header to start in row 0. However, take a look at the dataset shown below, which we have saved in sample3.csv:

Nik,34,Toronto,datagy
Kate,33,New York City,Apple
Joe,40,Frankfurt,Siemens
Nancy,23,Tokyo,Nintendo

We can see that it’s the same dataset, but without a header row. In these cases, we’ll need to explicitly pass in the column names to use. Let’s take a look at what reading this file looks like:

# Specifying a Header Row in a CSV File
import pandas as pd
cols = ['Name', 'Age', 'Location', 'Company']
df = pd.read_csv('sample3.csv', header=None, names=cols)

print(df.head())

# Returns:
#     Name  Age       Location   Company
# 0    Nik   34        Toronto    datagy
# 1   Kate   33  New York City     Apple
# 2    Joe   40      Frankfurt   Siemens
# 3  Nancy   23          Tokyo  Nintendo

With our code block above, we actually accomplished two things:

  1. We instructed Pandas not to read any line from the CSV file as our header, and
  2. We passed in custom column names into the DataFrame
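The header= and names= parameters can also be combined in other ways. The sketch below, which uses a shortened in-memory copy of our dataset and made-up replacement labels, shows two further cases: overwriting an existing header row, and reading a file with no labels at all:

```python
# Combining header= and names= in different ways
import io
import pandas as pd

data_text = (
    "Name,Age,Location,Company\n"
    "Nik,34,Toronto,datagy\n"
    "Kate,33,New York City,Apple\n"
)

# header=0 marks the first line as a header; names= then replaces those labels
renamed = pd.read_csv(
    io.StringIO(data_text),
    header=0,
    names=['name', 'age', 'city', 'employer'],  # illustrative labels
)

# header=None without names= treats the first line as data and
# labels the columns with integers instead
unlabeled = pd.read_csv(io.StringIO(data_text), header=None)

print(renamed.head())
print(unlabeled.head())
```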

Let’s now take a look at how we can skip rows using the Pandas read_csv() function.

How to Skip Rows or Footers in Pandas read_csv()

Pandas provides significant flexibility in skipping records when reading CSV files, including:

  1. Skipping a set number of rows from the top,
  2. Skipping a list of rows using a list of values,
  3. Skipping rows using a callable, and
  4. Skipping rows from the bottom

Let’s take a look at how this works:

Skipping Rows When Reading a CSV in Pandas

In some cases, reporting solutions will include rows of information about a report, such as a title. We can skip this by specifying a single row reference or a list of rows to skip. Take a look at our sample dataset, which we’ll refer to as sample4a.csv:

Sample report

Name,Age,Location,Company
Nik,34,Toronto,datagy
Kate,33,New York City,Apple
Joe,40,Frankfurt,Siemens
Nancy,23,Tokyo,Nintendo

We can see that we want to skip the first two rows of data. For this, we can simply pass in skiprows=2, as shown below:

# Skipping Rows When Reading a CSV File
import pandas as pd
df = pd.read_csv('sample4a.csv', skiprows=2)

print(df.head())

# Returns:
#     Name  Age       Location   Company
# 0    Nik   34        Toronto    datagy
# 1   Kate   33  New York City     Apple
# 2    Joe   40      Frankfurt   Siemens
# 3  Nancy   23          Tokyo  Nintendo

We can see that Pandas simply jumped over the first two rows in the data. This allowed us to prevent reading the data that’s not part of the actual dataset.
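Passing a list to skiprows= lets you skip specific rows that aren’t next to one another. As a sketch, the made-up data below has a title line above the header and a units line below it, and we skip both by their (zero-based) line numbers:

```python
# Skipping specific rows by passing a list to skiprows=
import io
import pandas as pd

# Hypothetical report: line 0 is a title and line 2 describes units
data = io.StringIO(
    "Sample report\n"
    "Name,Age\n"
    "(text),(years)\n"
    "Nik,34\n"
    "Kate,33\n"
)
df = pd.read_csv(data, skiprows=[0, 2])

print(df.head())
```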

Using a Callable (Function) to Skip Rows in Pandas read_csv

Pandas also allows you to pass in a callable, allowing you to skip rows meeting a condition. At first glance, this might seem confusing. However, the function can be used to read, for example, every second or fifth record. Let’s take a look at how we can read only every second record of our dataset (using the previous sample1.csv):

Name,Age,Location,Company
Nik,34,Toronto,datagy
Kate,33,New York City,Apple
Joe,40,Frankfurt,Siemens
Nancy,23,Tokyo,Nintendo

In order to read only every second row, you can use the following lambda callable in the skiprows= parameter:

# Skipping Rows When Reading a CSV File
import pandas as pd
df = pd.read_csv('sample1.csv', skiprows = lambda x: x % 2)

print(df.head())

# Returns:
#     Name  Age       Location   Company
# 0   Kate   33  New York City     Apple
# 1  Nancy   23          Tokyo  Nintendo

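In the code block above, we passed in the lambda function lambda x: x % 2. Pandas calls this function with each row’s index (counting the header line as row 0) and skips the row whenever the result is truthy. Because x % 2 leaves a remainder of 1 for odd-numbered rows, those rows are skipped, while the header and the even-numbered rows are kept.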
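The same idea extends to other intervals. As a sketch with made-up data, the callable below keeps the header (row 0) and every third row after it, skipping everything else:

```python
# Keeping only every third record with a callable in skiprows=
import io
import pandas as pd

# Hypothetical data: the header is row 0, followed by six records (rows 1 to 6)
data = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n6,60\n")

# The callable receives the row index; returning True skips that row
df = pd.read_csv(data, skiprows=lambda x: x % 3 != 0)

print(df.head())
```

Writing the condition as an explicit comparison (x % 3 != 0) can make it easier to see which rows are skipped than relying on truthiness alone.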

Skipping Footer Rows When Reading a CSV in Pandas

Similarly, Pandas allows you to skip rows in the footer of a dataset. This can be helpful if reporting software includes values describing things like the date the report was run.

Take a look at the dataset below, which we’ve labeled sample4b.csv:

Name,Age,Location,Company
Nik,34,Toronto,datagy
Kate,33,New York City,Apple
Joe,40,Frankfurt,Siemens
Nancy,23,Tokyo,Nintendo

Date run: 05/05/2023

In order to remove the bottom two rows, we can pass in skipfooter=2, as shown below:

# Skipping Footer Rows When Reading a CSV File
import pandas as pd
df = pd.read_csv('sample4b.csv', skipfooter=2, engine='python')

print(df.head())

# Returns:
#     Name  Age       Location   Company
# 0    Nik   34        Toronto    datagy
# 1   Kate   33  New York City     Apple
# 2    Joe   40      Frankfurt   Siemens
# 3  Nancy   23          Tokyo  Nintendo

In the code block above, we passed in two arguments:

  1. skipfooter=2 specifies that we want to skip the bottom two records, and
  2. engine='python' specifies the engine used to parse the file. This isn’t strictly required, but the default C engine doesn’t support skipfooter=, so Pandas falls back to the Python engine and raises a ParserWarning if you leave it out.
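If you’d like to try this without creating a file, here is a self-contained sketch. It uses a shortened, made-up version of the dataset with a single footer line (and no blank line before it), so only one row needs to be skipped:

```python
# Skipping a single footer line with skipfooter= and the Python engine
import io
import pandas as pd

# Hypothetical data with one footer line directly after the records
data = io.StringIO(
    "Name,Age\n"
    "Nik,34\n"
    "Kate,33\n"
    "Date run: 05/05/2023\n"
)
df = pd.read_csv(data, skipfooter=1, engine='python')

print(df.head())
```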

In the following section, you’ll learn how to read only a number of rows in the Pandas read_csv() function.

How to Read Only a Number of Rows in Pandas read_csv()

When working with large datasets, it can be helpful to read only a set number of records. This is useful both when a dataset is too large to hold in memory and when you simply want to take a look at a portion of the data.

In order to read only a number of rows, you can use the nrows= parameter, which accepts an integer number of rows to read. Let’s keep using our original dataset, sample1.csv:

Name,Age,Location,Company
Nik,34,Toronto,datagy
Kate,33,New York City,Apple
Joe,40,Frankfurt,Siemens
Nancy,23,Tokyo,Nintendo

In the code block below, we use the nrows= parameter to read only 2 of the rows:

# Reading Only a Number of Rows in Pandas
import pandas as pd
df = pd.read_csv('sample1.csv', nrows=2)

print(df.head())

# Returns:
#     Name  Age       Location   Company
# 0    Nik   34        Toronto    datagy
# 1   Kate   33  New York City     Apple

In the code block above, we specified that we only wanted to read two rows. This prevents you from needing to load more data into memory than necessary.
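To see the effect on a larger dataset, the sketch below builds a made-up 1,000-row CSV in memory and previews only its first five records:

```python
# Previewing the first few records of a larger dataset with nrows=
import io
import pandas as pd

# Build a hypothetical 1,000-row CSV in memory
rows = "\n".join(f"{i},{i * 2}" for i in range(1000))
big_csv = io.StringIO("id,value\n" + rows + "\n")

# Only the header and the first five records are parsed
preview = pd.read_csv(big_csv, nrows=5)

print(preview)
```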

In the following section, you’ll learn how to read only some columns in a CSV file.

How to Read Only Some Columns in Pandas read_csv()

Pandas also makes it easy to read only specific columns when loading a dataset. In particular, the function allows you to specify columns using two different data types passed into the usecols= parameter:

  1. A list of column labels, or
  2. A callable (function)

In most cases, you’ll end up passing in a list of column labels. When using a callable, Pandas evaluates it against each column name and keeps only the columns for which it returns True.

Let’s see how we can pass in a list of column labels to read only a few columns in Pandas. For this, we’ll use our original sample1.csv file, as shown below:

Name,Age,Location,Company
Nik,34,Toronto,datagy
Kate,33,New York City,Apple
Joe,40,Frankfurt,Siemens
Nancy,23,Tokyo,Nintendo

Let’s now take a look at how we can use the usecols= parameter to read only a subset of columns:

# Reading Only a Number of Columns in Pandas
import pandas as pd
df = pd.read_csv('sample1.csv', usecols=['Name', 'Age'])

print(df.head())

# Returns:
#     Name  Age
# 0    Nik   34
# 1   Kate   33
# 2    Joe   40
# 3  Nancy   23

We can see in the code block above that we used the usecols= parameter to pass in a list of column labels. This allowed us to read only a few columns from the dataset. It’s important to note that we can also pass in a list of integer column positions. To replicate the example above, we could also use usecols=[0, 1].

Another important note to be aware of is that the order of these values doesn’t matter, because Pandas keeps the columns in the order they appear in the file. Using usecols=[0, 1] will result in the same dataset as usecols=[1, 0].
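Here is a sketch of both points, using a shortened in-memory copy of our dataset. The first call passes a callable to usecols=, and the second shows one way to get a different column order, by reordering the DataFrame after it has been read:

```python
# Using a callable in usecols= and reordering columns after reading
import io
import pandas as pd

data_text = (
    "Name,Age,Location,Company\n"
    "Nik,34,Toronto,datagy\n"
    "Kate,33,New York City,Apple\n"
)

# The callable is evaluated against each column name; True keeps the column
wanted = {'name', 'age'}
df = pd.read_csv(io.StringIO(data_text), usecols=lambda col: col.lower() in wanted)

# usecols= ignores the order of its values, so select the columns
# again afterwards if a different order is needed
reordered = pd.read_csv(io.StringIO(data_text), usecols=['Age', 'Name'])[['Age', 'Name']]

print(df.columns.tolist())
print(reordered.columns.tolist())
```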

How to Specify an Index Column in Pandas read_csv()

In order to specify an index column when reading a CSV file in Pandas, you can pass the following into the index_col= parameter:

  1. A column label or position (integer),
  2. A list of column labels or positions, which creates a multi-level index, or
  3. False, which forces Pandas not to use the first column as the index.

Let’s see how we can use our sample1.csv file and read the Name column as the index:

# Specifying an Index Column When Reading CSV Files
import pandas as pd
df = pd.read_csv('sample1.csv', index_col='Name')

print(df.head())

# Returns:
#        Age       Location   Company
# Name                               
# Nik     34        Toronto    datagy
# Kate    33  New York City     Apple
# Joe     40      Frankfurt   Siemens
# Nancy   23          Tokyo  Nintendo

We can see that we passed in the Name column into the index_col= parameter. This allowed us to read that column as the index of the resulting DataFrame.
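Passing a list of columns works the same way and produces a multi-level index. As a sketch, using an in-memory copy of our dataset:

```python
# Using a list of columns to create a multi-level index
import io
import pandas as pd

data = io.StringIO(
    "Name,Age,Location,Company\n"
    "Nik,34,Toronto,datagy\n"
    "Kate,33,New York City,Apple\n"
    "Joe,40,Frankfurt,Siemens\n"
    "Nancy,23,Tokyo,Nintendo\n"
)

# The index levels follow the order of the list passed to index_col=
df = pd.read_csv(data, index_col=['Location', 'Name'])

print(df.head())
```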

How to Parse Dates in Pandas read_csv()

When reading columns as dates, Pandas again provides significant flexibility. By using the parse_dates= parameter, you have a number of different options to parse dates:

  1. A boolean, indicating whether to parse the index column as a date,
  2. A list of integers or column labels, where each column is parsed as a separate date column,
  3. A list of lists, where the columns in each inner list are combined and parsed as a single date column, and
  4. A dictionary of {'column_name': ['list', 'of', 'individual', 'columns']}, where the key represents the name of the resulting column.

Let’s take a look at a simple example first, where we have a date stored in a column named 'Date', as shown in sample5.csv:

Name,Year,Month,Day,Date
Nik,2022,5,5,"2022-05-05"
Kate,2023,6,6,"2023-06-06"
Joe,2024,7,7,"2024-07-07"
Nancy,2025,8,8,"2025-08-08"

To read the Date column as a date, you can pass the label in a list to the parse_dates= parameter, as shown below:

# Parsing Dates When Reading CSV Files in Pandas
import pandas as pd
df = pd.read_csv('sample5.csv', parse_dates=['Date'])

print(df.head())

# Returns:
#     Name  Year  Month  Day       Date
# 0    Nik  2022      5    5 2022-05-05
# 1   Kate  2023      6    6 2023-06-06
# 2    Joe  2024      7    7 2024-07-07
# 3  Nancy  2025      8    8 2025-08-08

We can see that the resulting DataFrame read the date column correctly. We also have three columns representing the year, month, and day. We could pass in a list of lists containing these columns. However, Pandas would call the resulting column 'Year_Month_Day', which isn’t great.

Instead, let’s pass in a dictionary that labels the column, as shown below:

# Parsing Dates When Reading CSV Files in Pandas
import pandas as pd
df = pd.read_csv('sample5.csv', parse_dates={'Other Date': ['Year', 'Month', 'Day']})

print(df.head())

# Returns:
#   Other Date   Name        Date
# 0 2022-05-05    Nik  2022-05-05
# 1 2023-06-06   Kate  2023-06-06
# 2 2024-07-07    Joe  2024-07-07
# 3 2025-08-08  Nancy  2025-08-08

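In the code block above, we passed in parse_dates={'Other Date': ['Year', 'Month', 'Day']}, where the key represents the resulting column label and the value represents the columns to combine. Note that combining multiple columns through parse_dates= (with either a list of lists or a dictionary) has been deprecated since Pandas 2.2. In newer versions, read the columns in as they are and combine them with pd.to_datetime() instead.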

If you’re working with different date formats, it’s best to just read the data in first. Then, you can use the pd.to_datetime() function to correctly format the column.
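As a sketch of this read-first approach, the snippet below uses made-up data with day/month/year dates. It converts the Date column with an explicit format, and also builds a date from the separate Year, Month, and Day columns (renamed to lowercase here, which pd.to_datetime() recognizes as date parts):

```python
# Reading the data first, then converting columns with pd.to_datetime()
import io
import pandas as pd

# Hypothetical data with dates stored as day/month/year strings
data = io.StringIO(
    "Name,Year,Month,Day,Date\n"
    "Nik,2022,5,7,07/05/2022\n"
    "Kate,2023,6,21,21/06/2023\n"
)
df = pd.read_csv(data)

# Convert the string column using an explicit format
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Combine the separate date-part columns into a single date column
parts = df[['Year', 'Month', 'Day']].rename(columns=str.lower)
df['Other Date'] = pd.to_datetime(parts)

print(df.dtypes)
```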

How to Specify Data Types in Pandas read_csv()

In most cases, Pandas will be able to correctly infer the data types of your columns. However, specifying the data types can speed up reading a dataset and helps correct any wrongly inferred types. In order to specify a data type when reading a CSV file using Pandas, you can use the dtype= parameter.

Let’s see how we can specify the data types of our original dataset, sample1.csv, as shown below:

Name,Age,Location,Company
Nik,34,Toronto,datagy
Kate,33,New York City,Apple
Joe,40,Frankfurt,Siemens
Nancy,23,Tokyo,Nintendo

In order to do this, we can pass in a dictionary of column labels and their associated data type, as shown below:

# Specifying Data Types with Pandas read_csv()
import pandas as pd
df = pd.read_csv('sample1.csv', dtype={'Name':str, 'Age':int, 'Location':str, 'Company':str})

print(df.head())

# Returns:
#     Name  Age       Location   Company
# 0    Nik   34        Toronto    datagy
# 1   Kate   33  New York City     Apple
# 2    Joe   40      Frankfurt   Siemens
# 3  Nancy   23          Tokyo  Nintendo

The sample dataset we worked with above had easy-to-infer data types. However, the real power of the dtype= parameter comes when you want to reduce the memory footprint of a dataset by specifying smaller data types, such as np.int32.
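To get a feel for the difference, the sketch below reads an in-memory copy of our dataset twice: once with the inferred types, and once with a smaller integer type for Age and a categorical type for Location (the latter is an illustrative choice, not something the dataset requires):

```python
# Comparing inferred data types with smaller, explicitly specified ones
import io
import numpy as np
import pandas as pd

data_text = (
    "Name,Age,Location,Company\n"
    "Nik,34,Toronto,datagy\n"
    "Kate,33,New York City,Apple\n"
    "Joe,40,Frankfurt,Siemens\n"
    "Nancy,23,Tokyo,Nintendo\n"
)

default = pd.read_csv(io.StringIO(data_text))
compact = pd.read_csv(
    io.StringIO(data_text),
    dtype={'Age': np.int32, 'Location': 'category'},
)

# Compare the data type and the bytes used by the Age column
print(default['Age'].dtype, default['Age'].memory_usage(index=False))
print(compact['Age'].dtype, compact['Age'].memory_usage(index=False))
```

On a dataset this small the savings are negligible, but they scale with the number of rows.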

Conclusion

In this tutorial, you learned how to use the Pandas read_csv() function to read CSV files (or other delimited files). The function provides a tremendous amount of flexibility in terms of how to read files. For example, the function allows you to specify delimiters, set index columns, parse dates, and so much more.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.