In this tutorial, you’ll learn how to work with dates, times, and DateTime objects in Pandas and Python. Working with DateTime in Python and Pandas can be complicated. This guide aims to make the complicated simple by focusing on what you need to know to get started, and on knowing enough to discover more on your own. Dates and times are critical forms of data in many domains, including finance, economics, science, and more.
By the end of this tutorial, you’ll have learned how to:
- Load DateTimes effectively in Pandas
- Access DateTime attributes in Pandas
- Filter a Pandas DataFrame based on DateTime filters
- Resample Pandas DataFrames based on DateTimes
Importing DateTimes in Pandas DataFrames
Pandas intelligently handles DateTime values when you import a dataset into a DataFrame. The library will try to infer the data types of your columns when you first import a dataset. For example, let’s take a look at a very basic dataset that looks like this:
# A very simple .csv file
Date,Amount
01-Jan-22,100
02-Jan-22,125
03-Jan-22,150
You can find the file here. Let’s try to import the dataset into a Pandas DataFrame and check the column data types.
# Loading a Small DataSet
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/datagy/data/main/sample_dates.csv')
print(df)
# Returns
# Date Amount
# 0 01-Jan-22 100
# 1 02-Jan-22 125
# 2 03-Jan-22 150
This is great! It looks like everything worked fine. Not so fast – let’s check the data types of the columns in the dataset. We can do this using the .info() method.
# Checking column data types
print(df.info())
# Returns
# Data columns (total 2 columns):
# # Column Non-Null Count Dtype
# --- ------ -------------- -----
# 0 Date 3 non-null object
# 1 Amount 3 non-null int64
# dtypes: int64(1), object(1)
# memory usage: 176.0+ bytes
We can see that the data type of the Date column is object. This means that the data are stored as strings, so you can’t access the slew of DateTime functionality available in Pandas.
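To make this limitation concrete, here is a minimal sketch showing that the .dt accessor only works once the column actually holds datetime values. It uses pd.to_datetime(), which is covered in more detail below; the format string is an assumption based on the dd-Mon-yy pattern in the sample file above.
# A minimal sketch: .dt only works on datetime columns
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/datagy/data/main/sample_dates.csv')

# On an object (string) column this raises:
# AttributeError: Can only use .dt accessor with datetimelike values
# df['Date'].dt.year

# After converting the column, the DateTime functionality becomes available
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y')
print(df['Date'].dt.year)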
Using Pandas parse_dates to Import DateTimes
One easy way to import data as DateTime is to use the parse_dates= argument. The argument takes a list of columns that Pandas should attempt to parse as dates. Let’s try adding this parameter to our import statement and then re-print the info about our DataFrame:
# Parsing Dates in .read_csv()
df = pd.read_csv('https://raw.githubusercontent.com/datagy/data/main/sample_dates.csv', parse_dates=['Date'])
print(df.info())
# Returns
# Data columns (total 2 columns):
# # Column Non-Null Count Dtype
# --- ------ -------------- -----
# 0 Date 3 non-null datetime64[ns]
# 1 Amount 3 non-null int64
# dtypes: datetime64[ns](1), int64(1)
# memory usage: 176.0 bytes
We can see that our column is now correctly imported with a DateTime data type.
Using to_datetime to Convert Columns to DateTime
The example above worked quite well when we imported a straightforward date format. Now let’s take a look at a more complicated example. We’ll load data from here, which looks like this:
# More complex datetime formats
Date,Close Price,High Price,Low Price,Open Price,Volume
2021-12-10 05AM,48246.57,48359.35,48051.08,48170.66,827.39761
2021-12-10 06AM,47847.59,48430,47810.81,48249.78,1296.18883
2021-12-10 07AM,47694.62,48037.48,47550,47847.59,2299.85298
2021-12-10 08AM,48090.35,48169.06,47587.39,47694.62,1371.25447
When we pass in the Date column as we did earlier, Pandas can’t interpret the date format. Let’s see what this looks like. The code below shows that the date wasn’t actually read as a DateTime format, but rather continues to exist as a string.
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/datagy/data/main/time_data.csv', parse_dates=['Date'])
print(df.info())
# Returns:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 337 entries, 0 to 336
# Data columns (total 6 columns):
# # Column Non-Null Count Dtype
# --- ------ -------------- -----
# 0 Date 337 non-null object
# 1 Close Price 337 non-null float64
# 2 High Price 337 non-null float64
# 3 Low Price 337 non-null float64
# 4 Open Price 337 non-null float64
# 5 Volume 337 non-null float64
# dtypes: float64(5), object(1)
# memory usage: 15.9+ KB
One of the ways we can resolve this is by using the pd.to_datetime() function. The function takes a Series of data and converts it into a DateTime format. We can customize this tremendously by passing in a format specification of how the dates are structured.
The format= parameter can be used to pass in this format. The format codes follow the 1989 C standard. Of course, chances are you don’t actually know the C standard for dates off by heart. The full list can be found here, but the table below breaks down a few of the most important ones.
Format Code | Description | Examples |
---|---|---|
%m | Month as a zero-padded number | 01, 02, 03, … |
%B | Month as a full name (following locale) | December, January, March |
%y | Year without a century as a zero-padded number | 01, 02, 22 |
%Y | Year with the century included | 2001, 2002, 2022 |
%d | Day of the month as a zero-padded number | 01, 02, 03 |
%a | Weekday as the locale’s abbreviated name | Mon, Tue, Wed |
%A | Weekday as the locale’s full name | Monday, Tuesday, Wednesday |
%H | Hour on the 24-hour clock as a zero-padded decimal | 12, 13, 14 |
%I | Hour on the 12-hour clock as a zero-padded decimal | 01, 02, 12 |
%p | Locale’s equivalent of AM or PM | AM, PM |
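Before using these codes in Pandas, it can help to see a couple of them in isolation. The sketch below uses Python’s datetime.strptime() on some made-up example strings (and assumes an English locale for the month and weekday names):
# A quick sketch of how format codes map onto real strings
from datetime import datetime

# %d-%B-%Y -> zero-padded day, full month name, four-digit year
print(datetime.strptime('10-December-2021', '%d-%B-%Y'))
# Returns: 2021-12-10 00:00:00

# %A, %m/%d/%y %I %p -> weekday name, date, 12-hour clock with AM/PM
print(datetime.strptime('Friday, 12/10/21 05 AM', '%A, %m/%d/%y %I %p'))
# Returns: 2021-12-10 05:00:00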
Let’s see how we can make use of these format codes to convert our string into a properly formatted DateTime object. In order to do this, we build the format string from the percent-sign codes and keep any other formatting exactly as it appears in the data, including spaces and hyphens.
# Converting a Complex String to DateTime
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H%p')
print(df.head())
# Returns:
# Date Close Price High Price Low Price Open Price Volume
# 0 2021-11-24 05:00:00 56596.36 56790.76 56483.12 56560.01 1112.23090
# 1 2021-11-24 06:00:00 56607.00 56797.02 56214.85 56596.36 1505.32570
# 2 2021-11-24 07:00:00 56497.47 56702.47 56389.00 56607.00 1238.54469
# 3 2021-11-24 08:00:00 56849.02 57560.00 56389.00 56497.46 2582.79378
# 4 2021-11-24 09:00:00 56682.55 56996.47 56649.93 56849.01 1314.82999
What we did here was pass the Series into the pd.to_datetime() function, along with the format. The format matches the complex pattern and successfully transforms the string into a DateTime object.
It’s not always ideal to convert the column after loading your DataFrame. Because of this, in the next section you’ll learn how to pass in a formatter to the import statement.
Using date_parser to Import Complex DateTime
While you can always convert a column into a DateTime object after loading the DataFrame, it can be a lot cleaner to do this as you’re loading the DataFrame in the first place. This is where the date_parser= parameter comes into play. The parameter takes a function that instructs Pandas how to interpret the string as a DateTime object.
Since it’s a function you won’t use elsewhere, this is an ideal candidate for an anonymous lambda function. Let’s create the function and assign it to the variable parser. We can then pass this function into the .read_csv() function. The function itself will make use of the datetime.strptime() method, which converts a string into a DateTime object.
# Creating a function to parse dates
import pandas as pd
from datetime import datetime
parser = lambda x: datetime.strptime(x, '%Y-%m-%d %H%p')
df = pd.read_csv('https://raw.githubusercontent.com/datagy/data/main/time_data.csv', parse_dates=['Date'], date_parser=parser)
print(df.head())
# Returns:
# Date Close Price High Price Low Price Open Price Volume
# 0 2021-11-24 05:00:00 56596.36 56790.76 56483.12 56560.01 1112.23090
# 1 2021-11-24 06:00:00 56607.00 56797.02 56214.85 56596.36 1505.32570
# 2 2021-11-24 07:00:00 56497.47 56702.47 56389.00 56607.00 1238.54469
# 3 2021-11-24 08:00:00 56849.02 57560.00 56389.00 56497.46 2582.79378
# 4 2021-11-24 09:00:00 56682.55 56996.47 56649.93 56849.01 1314.82999
The function relies on importing the datetime class from the datetime module. The function takes a single argument, x, and uses the format to convert the string into a DateTime object.
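As a reader notes in the comments at the end of this article, newer versions of Pandas deprecate the date_parser= parameter. If you are on Pandas 2.0 or later (an assumption about your environment), a sketch of the suggested alternative is to pass the format string directly through the date_format= parameter instead of a parser function:
# A sketch of the newer alternative (assumes Pandas 2.0+), where date_parser=
# is deprecated in favour of date_format=
import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/datagy/data/main/time_data.csv',
    parse_dates=['Date'],
    date_format='%Y-%m-%d %H%p'
)
print(df['Date'].dtype)
# Returns: datetime64[ns]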
DateTime Attributes and Methods in Pandas
Now that you’ve successfully imported your Pandas DataFrame with properly formatted dates, let’s learn how you can make use of the special attributes that come along with them. For example, you can easily access information about the DateTime, such as the weekday, month name, and more. This is because the DateTime object contains significantly more information than its representation shows.
DateTime Attributes in Pandas
Let’s take a look at a few. Namely, you’ll learn how to create columns containing the weekday, quarter, and hour of the day:
# Parsing out DateTime Parts
df['Weekday'] = df['Date'].dt.dayofweek
df['Quarter'] = df['Date'].dt.quarter
df['Hour'] = df['Date'].dt.hour
print(df[['Date', 'Weekday', 'Quarter', 'Hour']].head())
# Returns:
# Date Weekday Quarter Hour
# 0 2021-11-24 05:00:00 2 4 5
# 1 2021-11-24 06:00:00 2 4 6
# 2 2021-11-24 07:00:00 2 4 7
# 3 2021-11-24 08:00:00 2 4 8
# 4 2021-11-24 09:00:00 2 4 9
There’s a whole slew of data hiding underneath a DateTime object! This allows us to create complex filters on the DataFrame. These attributes can be accessed using the .dt accessor, which is very similar to the .str accessor. It gives you access to vectorized ways of accessing DateTime values.
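The three attributes above are only a sample. As a small sketch of what else is available through .dt, a few more commonly used attributes are shown below:
# A few more of the vectorized attributes available through .dt
print(df['Date'].dt.year.head())    # e.g. 2021
print(df['Date'].dt.month.head())   # 1 through 12
print(df['Date'].dt.day.head())     # day of the month
print(df['Date'].dt.date.head())    # the date part, with the time dropped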
DateTime Methods in Pandas
Similarly, you can apply DateTime methods to your DateTime columns. These look similar to the attributes, but are called with parentheses. The reason these are different is that they represent some form of calculation on the data.
In the example above, you used the .dayofweek attribute to get a numeric representation of the weekday. However, it may be useful to access, for example, the name of the weekday. You can do this by using the .day_name() method, which returns the string representation of the weekday.
# Accessing the String Name of a Week Day
print(df['Date'].dt.day_name())
# Returns:
# 0 Wednesday
# 1 Wednesday
# 2 Wednesday
# 3 Wednesday
# 4 Wednesday
# ...
Similarly, you can access different calculated values. For example, you can calculate the largest and smallest dates using the .max() and .min() methods. Let’s see what this looks like:
# Calculating Max and Min DateTimes
print(df['Date'].max())
print(df['Date'].min())
# Returns:
# 2021-12-14 05:00:00
# 2021-11-24 05:00:00
You can go even further and subtract these two values. This returns a Timedelta object, which provides a representation of the difference between the two DateTimes.
# Subtracting DateTimes in Pandas
print(df['Date'].max() - df['Date'].min())
# Returns: 20 days 00:00:00
This lets you see that there is a range of 20 days in our dataset!
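The Timedelta returned by the subtraction is itself an object with useful attributes and behaviour. Here is a small sketch (the date_range variable name is just for illustration):
# Working with the resulting Timedelta
date_range = df['Date'].max() - df['Date'].min()

print(date_range.days)                      # Returns: 20
print(date_range.total_seconds())           # Returns: 1728000.0
print(date_range / pd.Timedelta(hours=1))   # Returns: 480.0 (the range in hours)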
Filtering a Pandas DataFrame Based on DateTimes
In this section, you’ll learn how to use Pandas DateTimes to filter a DataFrame. This process is incredibly intuitive and very powerful. In order to take full advantage of it, it’s best to set the Date column as the index of the DataFrame. You can do this using the df.set_index() method, which takes a column (or columns) to be set as the new index (or indices).
# Setting a Pandas DataFrame Index
df = df.set_index('Date')
print(df.head())
# Returns:
# Close Price High Price Low Price Open Price Volume
# Date
# 2021-11-24 05:00:00 56596.36 56790.76 56483.12 56560.01 1112.23090
# 2021-11-24 06:00:00 56607.00 56797.02 56214.85 56596.36 1505.32570
# 2021-11-24 07:00:00 56497.47 56702.47 56389.00 56607.00 1238.54469
# 2021-11-24 08:00:00 56849.02 57560.00 56389.00 56497.46 2582.79378
# 2021-11-24 09:00:00 56682.55 56996.47 56649.93 56849.01 1314.82999
While this doesn’t look much different than it did before, it now allows us to easily filter our data. Remember, Pandas indexing works in the format of [row, column]. Because of this, we can simply pass in a DateTime that we want to select. What’s more, we can actually pass in just a date part in order to filter the DataFrame. Let’s try selecting '2021-12-10'.
# Filtering Based on Only a Date
print(df.loc['2021-12-10'].head())
# Returns:
# Close Price High Price Low Price Open Price Volume
# Date
# 2021-12-10 05:00:00 48246.57 48359.35 48051.08 48170.66 827.39761
# 2021-12-10 06:00:00 47847.59 48430.00 47810.81 48249.78 1296.18883
# 2021-12-10 07:00:00 47694.62 48037.48 47550.00 47847.59 2299.85298
# 2021-12-10 08:00:00 48090.35 48169.06 47587.39 47694.62 1371.25447
# 2021-12-10 09:00:00 48366.37 48495.00 47869.80 48090.35 1232.30238
We can reduce this even further! For example, you could simply pass in a year (in the format 'yyyy') or year-month parts ('yyyy-mm').
Let’s say you only wanted to filter your DataFrame to show data from December 2021, and that you only wanted to see the Close Price for that month. You could use the .loc accessor to filter the DataFrame:
# Filtering Only a Date Part and Column
print(df.loc['2021-12', 'Close Price'].head())
# Returns:
# Date
# 2021-12-01 00:00:00 57577.07
# 2021-12-01 01:00:00 56994.58
# 2021-12-01 02:00:00 57261.52
# 2021-12-01 03:00:00 57362.01
# 2021-12-01 04:00:00 57054.36
Because you’re selecting an index (rather than filtering data), you can even include index ranges. This works by providing the start and end of your selection range, separated by a colon (:). Let’s see how you can select only the dates covering '2021-12-03' through '2021-12-06':
# Filtering on a Range of Dates
print(df.loc['2021-12-03':'2021-12-06'])
# Returns:
# Close Price High Price Low Price Open Price Volume
# Date
# 2021-12-03 00:00:00 56513.44 56772.24 56419.09 56484.26 847.92592
# 2021-12-03 01:00:00 56494.53 56727.18 56354.68 56513.44 1051.81425
# 2021-12-03 02:00:00 56257.75 56576.52 56050.81 56494.53 1394.46500
# 2021-12-03 03:00:00 56323.01 56528.81 56089.00 56257.74 1113.47353
# 2021-12-03 04:00:00 56587.40 56700.00 56229.98 56318.89 1318.86346
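Slicing with .loc isn’t the only option. Because the index is now a DatetimeIndex, you can also build boolean masks from its attributes, much like the .dt attributes used earlier. The sketch below assumes the Date index set at the start of this section:
# Filtering with boolean masks built from the DatetimeIndex
# Only rows that fall on a weekday (Monday=0 through Friday=4)
weekday_rows = df[df.index.dayofweek < 5]

# Only rows between 9 AM and 5 PM
business_hours = df[(df.index.hour >= 9) & (df.index.hour < 17)]

print(weekday_rows.head())
print(business_hours.head())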
In the following section, you’ll learn how to take on a more advanced topic: resampling your data.
Resampling Pandas DataFrames using DateTimes
The process of resampling refers to changing the frequency of your data. You have two main methods available when you want to resample your timeseries data:
- Upsampling: increasing the frequency of your data, such as from hours to minutes
- Downsampling: decreasing the frequency of your data, such as from hours to days
Downsampling requires you to decide how existing values should be aggregated into each new, larger period, while upsampling requires you to fill in data points that don’t actually exist. In many cases, the right choice follows logically from the data. For example, when downsampling average values, it may make sense to return the average of all the periods. If you wanted to return the downsampled high values, you would provide the maximum value.
The data in our dataset has likely already been downsampled: the frequency of the data is hourly, but it is likely tracked at the source much, much more frequently. That being said, the data is still quite granular and shows a lot of variation in the hourly ebb and flow.
The Pandas .resample() method allows you to resample a dataset with a timeseries index. The method accepts the periodicity that you want to resample to, such as 'W' for week or 'H' for hour. Since you’ll also want to specify how to aggregate (or fill) your data, you can chain in another method, such as .mean(), to resample with that aggregation function. Let’s resample our hourly data to daily data:
# Resampling an Entire DataFrame with the Same Method
# (assigned to a new variable so the original hourly data stays available for later examples)
daily_df = df.resample('D').mean()
print(daily_df.head())
# Returns:
# Close Price High Price Low Price Open Price Volume
# Date
# 2021-11-24 56694.938947 56992.628947 56471.820526 56664.502632 1690.892599
# 2021-11-25 58265.386250 58496.355000 58005.205417 58189.446250 1756.396467
# 2021-11-26 55535.068333 55949.997083 55287.681667 55753.348333 2746.994611
# 2021-11-27 54612.505833 54867.189583 54382.247500 54570.950417 1238.208315
# 2021-11-28 54639.055417 54863.042500 54279.283750 54532.270000 1506.821404
Here we can see that the hourly data was downsampled to daily data.
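Upsampling works the same way, except that the new, more frequent rows don’t exist in the data and need to be filled somehow. As a small sketch, the daily means computed above (stored in daily_df) can be pushed back to an hourly frequency by carrying the last known value forward with .ffill():
# A sketch of upsampling: daily data back to hourly frequency
hourly_df = daily_df.resample('H').ffill()
print(hourly_df.head())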
Resampling Pandas Timeseries with Different Methods
In many cases, you won’t want to resample your DataFrame using the same method for every column. For example, you may want to resample the High Price column with the .max() method and the Low Price column with the .min() method.
This can be done by chaining the .agg() method on the .resample() method. The .agg() method allows you to pass in a dictionary containing key-value pairs of each column and the method you want to aggregate it with. Let’s see how we can pass in different methods for different columns:
# Resampling Data with Different Methods
agg_df = df.resample('D').agg({
    'Close Price': 'last',
    'High Price': 'max',
    'Low Price': 'min',
    'Open Price': 'first',
    'Volume': 'sum'
})
print(agg_df.head())
# Returns:
# Close Price High Price Low Price Open Price Volume
# Date
# 2021-11-24 57138.29 57560.00 55837.00 56560.01 32126.95939
# 2021-11-25 58960.36 59398.90 57000.00 57138.29 42153.51522
# 2021-11-26 53726.53 59150.00 53500.00 58960.37 65927.87066
# 2021-11-27 54721.03 55280.00 53610.00 53723.72 29716.99957
# 2021-11-28 57274.88 57445.05 53256.64 54716.47 36163.71370
You can see, by passing different methods for each column, how different our dataset turned out to be compared to the one where only the .mean() method was applied! This allows you to gain much more finely-tuned control over your data.
Exercises
It’s time to check your understanding! Try to complete the exercises below. If you need help or want to verify your solution, the solutions follow each question.
Create a format string for the following date format: “Saturday, March 31, 2022 13:34:33”
The following format string matches the criteria above:
date_format = '%A, %B %d, %Y %H:%M:%S'
How would you find the value of the Close Price on December 1, 2021 at 13:00?
You could use the .loc accessor:
# Assumes a freshly loaded DataFrame where 'Date' is still a column;
# skip the set_index() call if 'Date' is already the index
df = df.set_index('Date')
print(df.loc['2021-12-01 13:00:00', 'Close Price'])
# Returns: 57646.5
How would you find the average High Price on December 14, 2021?
You could filter the DataFrame to that date and apply the .mean() method:
# Assumes a freshly loaded DataFrame where 'Date' is still a column
df = df.set_index('Date')
print(df.loc['2021-12-14', 'High Price'].mean())
# Returns: 47139.53166666667
Conclusion and Recap
In this tutorial, you learned how to work with DateTimes in Pandas with Python! The section below provides a quick recap of everything that you learned:
- There are a number of ways to parse dates and times when loading your DataFrame. If passing the columns into the parse_dates= parameter doesn’t work, define a parser function and pass the function into the date_parser= parameter.
- DateTime values in Pandas have attributes and methods that can be accessed using the .dt accessor.
- DateTime values can be resampled, either up or down, to provide either higher or lower granularity in your datasets.
Additional Resources
To learn more about related topics, check out the tutorials below:
You never mention where the “dt” function comes from. You show other imports, but not that. So your examples aren’t “reproducible.”
Hi Harlan, Thanks for your comment! The .dt is a DateTime accessor that comes with Pandas. As long as you have Pandas imported (as in the tutorial), you don’t need to import anything else. Hope that helps!
Hello, in the ‘Using date_parser to Import Complex DateTime’ example you refer to a local file ‘/Users/nikpi/Desktop/time_data.csv’. Is this intentional?
Hi Dup, great catch! I have fixed the link. Thank you!
Hi Nik!!
Maybe the CSV file was updated? Most of the code here doesn’t return the same results shown in the article. It throws the FutureWarning message several times, saying that the date_parser parameter in the pd.read_csv() function is deprecated and will be removed in future versions of pandas. Instead of date_parser, the warning suggests using the date_format parameter to specify the format of the dates being parsed.
I absolutely love your content and this course, it has been amazing. You are super talented. I just think maybe there were some things to update haha. Keep it up!! Last article was awesome
Hi Chris,
Thanks so much for flagging this! I’ll be sure to update the article soon. I really appreciate the feedback :).
Nik