Skip to content

DateTime in Pandas and Python

DateTime in Pandas and Python Cover Image

In this tutorial, you’ll learn how to work with dates, times, and DateTime in Pandas and Python. Working with DateTime in Python and Pandas can be a complicated thing. This guide aims to make the complicated, simple, by focusing on what you need to know to get started and to know enough to discover more on your own. Dates and times are critical forms of data in many domains, including finance, economics, science, and more.

By the end of this tutorial, you’ll have learned how to:

  • Load DateTimes effectively in Pandas
  • Access DateTime attributes in Pandas
  • Filter a Pandas DataFrame based on DateTime filters
  • Resample Pandas DataFrames based on DateTimes
  • Use DateTime in Pandas groupby

Importing DateTimes in Pandas DataFrames

Pandas intelligently handles DateTime values when you import a dataset into a DataFrame. The library will try to infer the data types of your columns when you first import a dataset. For example, let’s take a look at a very basic dataset that looks like this:

# A very simple .csv file
Date,Amount
01-Jan-22,100
02-Jan-22,125
03-Jan-22,150

You can find the file here. Let’s try to import the dataset into a Pandas DataFrame and check the column data types.

# Loading a Small DataSet
df = pd.read_csv('https://raw.githubusercontent.com/datagy/data/main/sample_dates.csv')
print(df)

# Returns
#         Date  Amount
# 0  01-Jan-22     100
# 1  02-Jan-22     125
# 2  03-Jan-22     150

This is great! It looks like everything worked fine. Not so fast – let’s check the data types of the columns in the dataset. We can do this using the .info() method.

# Checking column data types
print(df.info())

# Returns
# Data columns (total 2 columns):
#  #   Column  Non-Null Count  Dtype 
# ---  ------  --------------  ----- 
#  0   Date    3 non-null      object
#  1   Amount  3 non-null      int64 
# dtypes: int64(1), object(1)
# memory usage: 176.0+ bytes

We can see that the data type of the Date column is object. This means that the data are stored as strings, meaning that you can’t access the slew of DateTime functionality available in Pandas.

Using Pandas parse_dates to Import DateTimes

One easy way to import data as DateTime is to use the parse_dates= argument. The argument takes a list of columns that Pandas should attempt to infer to read. Let’s try adding this parameter to our import statement and then re-print out the info about our DataFrame:

# Parsing Dates in .read_csv()
df = pd.read_csv('https://raw.githubusercontent.com/datagy/data/main/sample_dates.csv', parse_dates=['Date'])
print(df.info())

# Returns
# Data columns (total 2 columns):
#  #   Column  Non-Null Count  Dtype         
# ---  ------  --------------  -----         
#  0   Date    3 non-null      datetime64[ns]
#  1   Amount  3 non-null      int64         
# dtypes: datetime64[ns](1), int64(1)
# memory usage: 176.0 bytes

We can see that our column is now correctly imported as a DateTime format.

Using to_datetime to Convert Columns to DateTime

The example above worked quite well when we imported a straightforward date format. Now let’s take a look at a more complicated example. We’ll load data from here, that looks like this:

# More complex datetime formats
Date,Close Price,High Price,Low Price,Open Price,Volume
Date,Close Price,High Price,Low Price,Open Price,Volume
2021-12-10 05AM,48246.57,48359.35,48051.08,48170.66,827.39761
2021-12-10 06AM,47847.59,48430,47810.81,48249.78,1296.18883
2021-12-10 07AM,47694.62,48037.48,47550,47847.59,2299.85298
2021-12-10 08AM,48090.35,48169.06,47587.39,47694.62,1371.25447

When we pass in the Date column as we did earlier, Pandas can’t interpret the date format. Let’s see what this looks like. The code below shows that the date wasn’t actually read as a DateTime format, but rather continues to exist as a string.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/datagy/data/main/time_data.csv', parse_dates=['Date'])
print(df.info())

# Returns:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 337 entries, 0 to 336
# Data columns (total 6 columns):
#  #   Column       Non-Null Count  Dtype  
# ---  ------       --------------  -----  
#  0   Date         337 non-null    object 
#  1   Close Price  337 non-null    float64
#  2   High Price   337 non-null    float64
#  3   Low Price    337 non-null    float64
#  4   Open Price   337 non-null    float64
#  5   Volume       337 non-null    float64
# dtypes: float64(5), object(1)
# memory usage: 15.9+ KB

One of the ways we can resolve this is by using the pd.to_datetime() function. The function takes a Series of data and converts it into a DateTime format. We can customize this tremendously by passing in a format specification of how the dates are structured.

The format= parameter can be used to pass in this format. The format codes follow the 1989 C standard. Of course, chances are you don’t actually know the C standard for dates off by hard. The full list can be found here, but the table below breaks down a few of the most important ones.

Format CodeDescriptionExamples
%mMonth as a zero-padded number01, 02, 03, …
%BMonth as a full name (following locale)December, January, March
%yYear without a century as a zero-padded number01, 02, 22
%YYear with the century included2001, 2002, 2022
%dDay of the month as a zero-padded number01, 02, 03
%aWeekday as the locale’s abbreviated nameMon, Tue, Wed
%AWeekday as the locale’s full nameMonday, Tuesday, Wednesday
%HHour on the 24-hour clock as a zero-padded decimal12, 13, 14
%IHour on the 12-hour clock as a zero-padded decimal12, 1, 2
%pLocale’s equivalent of AM or PMAM, PM
Some key DateTime format codes in Python

Let’s see how we can make use of these format codes to convert our string into a properly formatted DateTime object. In order to do this, we pass in the string using the percent signs and any other formatting exactly as it is, including spaces and hyphens.

# Converting a Complex String to DateTime
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H%p')
print(df.head())

# Returns:
#                  Date  Close Price  High Price  Low Price  Open Price      Volume
# 0 2021-11-24 05:00:00     56596.36    56790.76   56483.12    56560.01  1112.23090
# 1 2021-11-24 06:00:00     56607.00    56797.02   56214.85    56596.36  1505.32570
# 2 2021-11-24 07:00:00     56497.47    56702.47   56389.00    56607.00  1238.54469
# 3 2021-11-24 08:00:00     56849.02    57560.00   56389.00    56497.46  2582.79378
# 4 2021-11-24 09:00:00     56682.55    56996.47   56649.93    56849.01  1314.82999

What we did here was pass the Series into the .to_datetime() function as well as the format. The format matches the complex pattern and successfully transformed the string into a DateTime object.

It’s not always ideal to convert the column after loading your DataFrame. Because of this, in the next section you’ll learn how to pass in a formatter to the import statement.

Using date_parser to Import Complex DateTime

While you can always convert a column into a DateTime object after loading the DataFrame, it can be a lot cleaner to do this as you’re loading the DataFrame in the first place. This is where the date_parser= parameter comes into play. The parameter takes a function that instructs Pandas how to interpret the string as a DateTime object.

Since it’s a function you won’t use elsewhere, this is an ideal candidate for an anonymous lambda function. Let’s create the function and assign it to the variable parser. We can then pass this function into the .read_csv() function. The function itself will make use of the .strptime() method which converts a string into a DateTime object.

# Creating a function to parse dates
import pandas as pd
from datetime import datetime
parser = lambda x: datetime.strptime(x, '%Y-%m-%d %H%p')
df = pd.read_csv('https://e6v4p8w2.rocketcdn.me/Users/nikpi/Desktop/time_data.csv', parse_dates=['Date'], date_parser=parser)

print(df.head())

# Returns:
#                  Date  Close Price  High Price  Low Price  Open Price      Volume
# 0 2021-11-24 05:00:00     56596.36    56790.76   56483.12    56560.01  1112.23090
# 1 2021-11-24 06:00:00     56607.00    56797.02   56214.85    56596.36  1505.32570
# 2 2021-11-24 07:00:00     56497.47    56702.47   56389.00    56607.00  1238.54469
# 3 2021-11-24 08:00:00     56849.02    57560.00   56389.00    56497.46  2582.79378
# 4 2021-11-24 09:00:00     56682.55    56996.47   56649.93    56849.01  1314.82999

The function relies on importing the datetime module from the datetime library. The function takes a single argument, x and uses the format to convert a string into a DateTime object.

DateTime Attributes and Methods in Pandas

Now that you’ve successfully imported your Pandas DataFrame with properly formatted dates, let’s learn how you can make use of the special attributes that come along with them. For example, you can easily access information about the date time, such as the weekday, month name, and more. This is because the DateTime object contains significantly more information than the representation shows.

DateTime Attributes in Pandas

Let’s take a look at a few. Namely, you’ll learn how to create columns containing the weekday, quarter, and hour of the day:

# Parsing out DateTime Parts
df['Weekday'] = df['Date'].dt.dayofweek
df['Quarter'] = df['Date'].dt.quarter
df['Hour'] = df['Date'].dt.hour

print(df[['Date', 'Weekday', 'Quarter', 'Hour']].head())

# Returns:
#                  Date  Weekday  Quarter  Hour
# 0 2021-11-24 05:00:00        2        4     5
# 1 2021-11-24 06:00:00        2        4     6
# 2 2021-11-24 07:00:00        2        4     7
# 3 2021-11-24 08:00:00        2        4     8
# 4 2021-11-24 09:00:00        2        4     9

There’s a whole slew of data hiding underneath a DateTime object! This allows us to create complex filter on the DataFrame. These attributes can be accessed using the .dt accessor, which is very similar to the .str accessor. You then gain access to vectorized versions of accessing DateTime values.

DateTime Methods in Pandas

Similarly, you can apply DateTime methods to your DataTime columns. These look similar to the attributes, but include the () method call features. The reason these are different is that they represent some form of a calculation on the data.

In the example above, you used the .dayofweek accessor to get a numeric representation of the weekday. However, it may be good to access, for example, the name of the weekday. You can do this by using the .day_name() method, which returns the string representation of the weekday.

# Accessing the String Name of a Week Day
print(df['Date'].dt.day_name())

# Returns:
# 0      Wednesday
# 1      Wednesday
# 2      Wednesday
# 3      Wednesday
# 4      Wednesday
# ...

Similarly, you can access different calculated attributes. For example, you can calculate the largest and smallest dates using the .max() and .min() methods. Let’s see what this looks like:

# Calculating Max and Min DateTimes
print(df['Date'].max())
print(df['Date'].min())

# Returns:
# 2021-12-14 05:00:00
# 2021-11-24 05:00:00

You can go even further an subtracted these two values. This returns a TimeDelta object, which provides a representation of the differences in DateTimes.

# Subtracting DateTimes in Pandas
print(df['Date'].max() - df['Date'].min())

# Returns: 20 days 00:00:00

This let’s you see that there is a range of 20 days in our dataset!

Filtering a Pandas DataFrame Based on DateTimes

In this section, you’ll learn how to use Pandas DateTimes to filter a DataFrame. This process is incredibly intuitive and very powerful. In order to take most advantage of this, it’s best to set the Date column to the index of the DataFrame. You can do this using the df.set_index() method, which takes a column (or columns) to be set as the new index (or indices).

# Setting a Pandas DataFrame Index
df = df.set_index('Date')
print(df.head())

# Returns:
#                      Close Price  High Price  Low Price  Open Price      Volume
# Date                                                                           
# 2021-11-24 05:00:00     56596.36    56790.76   56483.12    56560.01  1112.23090
# 2021-11-24 06:00:00     56607.00    56797.02   56214.85    56596.36  1505.32570
# 2021-11-24 07:00:00     56497.47    56702.47   56389.00    56607.00  1238.54469
# 2021-11-24 08:00:00     56849.02    57560.00   56389.00    56497.46  2582.79378
# 2021-11-24 09:00:00     56682.55    56996.47   56649.93    56849.01  1314.82999
# 2021-12-10 07:00:00     47694.62    48037.48   47550.00    47847.59  2299.85298
# 2021-12-10 08:00:00     48090.35    48169.06   47587.39    47694.62  1371.25447
# 2021-12-10 09:00:00     48366.37    48495.00   47869.80    48090.35  1232.30238

While this doesn’t look much different than it did before, it now allows us to easily filter our data. Remember, Pandsa indexing works in the format of [row, column]. Because of this, we can simply pass in a DateTime that we want to select. What’s more, is that we can actually just pass in a date part in order to filter the DataFrame. Let’s try selecting '2021-12-10'.

# Filterting Based on Only a Date
print(df.loc['2021-12-10'].head())

# Returns:
#                      Close Price  High Price  Low Price  Open Price      Volume
# Date                                                                           
# 2021-12-10 05:00:00     48246.57    48359.35   48051.08    48170.66   827.39761
# 2021-12-10 06:00:00     47847.59    48430.00   47810.81    48249.78  1296.18883
# 2021-12-10 07:00:00     47694.62    48037.48   47550.00    47847.59  2299.85298
# 2021-12-10 08:00:00     48090.35    48169.06   47587.39    47694.62  1371.25447
# 2021-12-10 09:00:00     48366.37    48495.00   47869.80    48090.35  1232.30238

We can reduce this even further! For example, you could simply pass in a year (in the format 'yyyy') or year-month parts ('yyyy-mm').

Let’s say you only wanted to filter your DataFrame to only show data from December 2020. Similarly, say you only wanted to see the Close Price for that month. You could use the .loc accessor to filter the DataFrame:

# Filtering Only a Date Part and Column
print(df.loc['2021-12', 'Close Price'].head())

# Returns:
# Date
# 2021-12-01 00:00:00    57577.07
# 2021-12-01 01:00:00    56994.58
# 2021-12-01 02:00:00    57261.52
# 2021-12-01 03:00:00    57362.01
# 2021-12-01 04:00:00    57054.36

Because you’re selecting an index (rather than filtering data), you can even include index ranges. This works by including data in your index selection ranges and separating them using a colon (:). Let’s see how you can select only the dates covering '2021-12-03' through '2021-12-06':

# Filtering on a Range of Dates
print(df.loc['2021-12-03':'2021-12-06'])

# Returns:
#                      Close Price  High Price  Low Price  Open Price      Volume
# Date                                                                           
# 2021-12-03 00:00:00     56513.44    56772.24   56419.09    56484.26   847.92592
# 2021-12-03 01:00:00     56494.53    56727.18   56354.68    56513.44  1051.81425
# 2021-12-03 02:00:00     56257.75    56576.52   56050.81    56494.53  1394.46500
# 2021-12-03 03:00:00     56323.01    56528.81   56089.00    56257.74  1113.47353
# 2021-12-03 04:00:00     56587.40    56700.00   56229.98    56318.89  1318.86346

In the following section, you’ll learn how to take on a more advanced topic: resamping your data.

Resampling Pandas DataFrames using DateTimes

The process of resampling refers to changing the frequency of your data. You have two main methods available when you want to resample your timeseries data:

  1. Upsampling: increasing the frequency of your data, such as from hours to minutes
  2. Downsampling: decreasing the frequency of your data, such as from hours to days

Both methods require you to invent data, since the data points don’t actually exist. In many cases, the method of how you invent that data is determined logically. For example, when downsampling average values, it may make sense to return the average of all the periods. If you wanted to return the downsampled high values, you may provide the maximum value.

The data in our dataset is likely already upsampled. The frequency of the data is hourly and is likely tracked at the source much, much more frequency. That being said, the data is still quite granular and shows a lot of variations in the hourly ebb and flow.

The Pandas .resample() method allows you to resample a dataset with a timeseries index. The method accepts a periodicity that you want to resample to, such as 'W' for week or 'H' for hour. Since you’ll want to provide some method by which to invent your data, you can chain in another method, such as .mean(), to resample with that aggregation function. Let’s resample our hourly data to daily data:

# Resampling an Entire DataFrame with the Same Method
df = df.resample('D').mean()
print(df.head())

# Returns:
#              Close Price    High Price     Low Price    Open Price       Volume
# Date                                                                           
# 2021-11-24  56694.938947  56992.628947  56471.820526  56664.502632  1690.892599
# 2021-11-25  58265.386250  58496.355000  58005.205417  58189.446250  1756.396467
# 2021-11-26  55535.068333  55949.997083  55287.681667  55753.348333  2746.994611
# 2021-11-27  54612.505833  54867.189583  54382.247500  54570.950417  1238.208315
# 2021-11-28  54639.055417  54863.042500  54279.283750  54532.270000  1506.821404

Here we can see that the hourly data was downsampled to daily data.

Resampling Pandas Timeseries with Different Methods

In many cases, you won’t want to resample your DataFrame using the same method for every column. For example, you may want to resample the High Price column with the .max() method and the Low Price column with the .min() method.

This can be done by chaining the .agg() method on the .resample() method. The .agg() method allows you to pass in a dictionary containing key-value pairs of the column and the method you want to aggregate with. Let’s see how we can pass in different methods for different columns:

# Resampling Data with Different Methods
df = df.resample('D').agg({
    'Close Price': 'last',
    'High Price': 'max',
    'Low Price': 'min',
    'Open Price': 'first',
    'Volume': 'sum'
})

print(df.head())

# Returns:
#             Close Price  High Price  Low Price  Open Price       Volume
# Date                                                                   
# 2021-11-24     57138.29    57560.00   55837.00    56560.01  32126.95939
# 2021-11-25     58960.36    59398.90   57000.00    57138.29  42153.51522
# 2021-11-26     53726.53    59150.00   53500.00    58960.37  65927.87066
# 2021-11-27     54721.03    55280.00   53610.00    53723.72  29716.99957
# 2021-11-28     57274.88    57445.05   53256.64    54716.47  36163.71370

You can see by passing different methods for each column how different our dataset turned out to be compared to the one where only the .mean() method was applied! This allows you to gain much more finely-tuned control over your data!

Exercises

It’s time to check your understanding! Try and complete the exercises below. If you need help or want to verify your solution, simply toggle the section below.

The following format string matches the criteria above:

date_format = '%%A, %B %d, %Y %H:%M:%S'

You could use the .loc accessor:

df = df.set_index('Date')
print(df.loc['2021-12-01 13:00:00', 'Close Price'])

# Returns: 57646.5

You could filter the DataFrame to that date and apply the .mean() method:

df = df.set_index('Date')
print(df.loc['2021-12-14', 'High Price'].mean())

# Returns: 47139.53166666667

Conclusion and Recap

In this tutorial, you learned how to work with DateTimes in Pandas with Python! The section below provides a quick recap of everything that you learned:

  • There are a number of ways to parse dates and times when loading your DataFrame. If passing the columns into the parse_dates= parameter doesn’t work, define a parser function and pass the function into date_parser= parameter.
  • DateTime values in Pandas have attributes and methods that can be accessed using the .dt accessor
  • DateTime values can be resampled, either up or down, to provide either higher or lower granularity in your datasets.

Additional Resources

To learn more about related topics, check out the tutorials below:

Leave a Reply

Your email address will not be published.