In this tutorial, you’ll learn how to use Python to get a file extension. You’ll accomplish this using both the `pathlib` library and the `os.path` module.
Being able to work with files easily is one of Python’s greatest strengths. You could, for example, use the `glob` library to iterate over files in a folder. When you do this, each file’s extension may drive further decisions, which makes knowing how to get a file’s extension an important skill! Let’s get started learning how to use Python to get a file’s extension on Windows, Mac, and Linux!
The Quick Answer: Use Pathlib
Using Python Pathlib to Get a File’s Extension
The Python `pathlib` library makes it incredibly easy to work with and manipulate paths, so it makes perfect sense that the library would include a way of accessing a file’s extension.
The `pathlib` library comes with a class named `Path`, which we use to create path-based objects. When we load our file’s path into a `Path` object, we can access specific attributes of the path through the object’s built-in properties.
Let’s see how we can use the `pathlib` library in Python to get a file’s extension:
```python
# Get a file's extension using pathlib
import pathlib

file_path = "/Users/datagy/Desktop/Important Spreadsheet.xlsx"
extension = pathlib.Path(file_path).suffix

print(extension)  # Returns: .xlsx
```
We can see here that we passed a file’s path into the `Path` class, creating a `Path` object. Once we did this, we could access different attributes, including the `.suffix` attribute. We assigned this to a variable named `extension` and printed it, getting `.xlsx`.
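Beyond `.suffix`, `Path` objects expose a few related properties that are handy when a filename contains more than one period. The sketch below reuses the example path; `.stem`, `.name`, and `.suffixes` are standard `pathlib` properties, while the `backup.tar.gz` filename is just an illustrative assumption.

```python
import pathlib

file_path = pathlib.Path("/Users/datagy/Desktop/Important Spreadsheet.xlsx")
print(file_path.suffix)  # .xlsx (the final extension only)
print(file_path.stem)    # Important Spreadsheet (the name without its final suffix)
print(file_path.name)    # Important Spreadsheet.xlsx

# A hypothetical file with a multi-part extension
archive = pathlib.Path("/Users/datagy/Desktop/backup.tar.gz")
print(archive.suffix)             # .gz
print(archive.suffixes)           # ['.tar', '.gz']
print("".join(archive.suffixes))  # .tar.gz
```

If you need the full multi-part extension, joining `.suffixes` is often more useful than `.suffix` alone.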
Using `.suffix` this way works well on both Mac and Linux computers. When you’re working with Windows, however, file paths work a little differently: they use backslashes, which Python treats as escape characters in regular strings.
Because of this, when using Windows, create your file path as a “raw” string. But how do you do this? Simply prefix your string with an `r`, like this: `r'some string'`. This tells Python not to treat the backslashes as escape characters.
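Here is a minimal sketch of the raw-string approach. It uses `pathlib.PureWindowsPath` so that the Windows-style path is parsed the same way no matter which operating system runs the snippet; on an actual Windows machine, `pathlib.Path` parses it the same way. The `C:\Users\datagy\...` path is only an illustrative assumption.

```python
import pathlib

# The r prefix keeps sequences like "\U" and "\D" from being read as escapes.
# Without it, "C:\Users\..." is a SyntaxError, because "\U" starts a Unicode escape.
windows_path = r"C:\Users\datagy\Desktop\Important Spreadsheet.xlsx"

# PureWindowsPath parses Windows-style paths on any OS
extension = pathlib.PureWindowsPath(windows_path).suffix
print(extension)  # .xlsx
```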
Now that we’ve taken a look at how to use `pathlib` in Python to get a file extension, let’s explore how we can do the same using the `os.path` module.
Want to learn more? Check out my in-depth tutorial and video on Towards Data Science to learn how to use the `pathlib` library to automatically rename files in Python!
Using os.path in Python to Get a File’s Extension
The `os` module allows us to easily work with, well, our operating system! Its `os.path` module lets us work with file paths in different ways, including getting a file’s extension.
The `os.path` module has a helpful function, `splitext()`, which splits a file path into two parts: the root and the extension. Thankfully, `splitext()` is a smart function that knows how to separate out file extensions, rather than simply splitting a string on every period.
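To see what “smart” means here, the sketch below compares `splitext()` with a naive `str.split(".")` on a few made-up paths. `splitext()` only looks at the final path component, and it treats leading periods (as in hidden files like `.bashrc`) as part of the name rather than as an extension.

```python
import os.path

paths = [
    "/Users/datagy/Desktop/Important Spreadsheet.xlsx",
    "/Users/datagy/my.folder/notes",  # period in a folder name, no extension
    "/Users/datagy/backup.tar.gz",    # several periods in the filename
    "/Users/datagy/.bashrc",          # hidden file with a leading period
]

for p in paths:
    naive = p.split(".")[-1]        # splits on every period in the whole string
    smart = os.path.splitext(p)[1]  # considers only the final path component
    print(f"{p!r}: naive={naive!r}, splitext={smart!r}")
```

The naive approach returns `'folder/notes'` for the second path, while `splitext()` correctly reports an empty extension.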
Let’s take a look at how we can use the `splitext()` function to get a file’s extension:
```python
# Get a file's extension using os.path
import os.path

file_path = "/Users/datagy/Desktop/Important Spreadsheet.xlsx"
extension = os.path.splitext(file_path)[-1]

print(extension)  # Returns: .xlsx
```
Let’s take a look at what we’ve done here:
- We import `os.path`. Rather than writing `from os import path`, we use this form of import so that the name `path` stays free for our own variables.
- We load our `file_path` variable. Remember: if you’re on Windows, make your file path a raw string by prefixing it with an `r` before the opening quotation mark.
- We apply the `splitext()` function to the file path and then access the last item of the tuple it returns.
`splitext()` returns a tuple: the first item is the root (the path with the extension removed), and the second is the extension, including its leading period. Because of this, if we only want a file’s extension, we can just access the tuple’s last item.
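Because `splitext()` always returns a two-item tuple, you can also unpack both parts at once. This minimal sketch reuses the example path; the names `root`, `extension`, and `csv_path` are just illustrative.

```python
import os.path

file_path = "/Users/datagy/Desktop/Important Spreadsheet.xlsx"

# Unpack the (root, extension) tuple returned by splitext()
root, extension = os.path.splitext(file_path)
print(root)       # /Users/datagy/Desktop/Important Spreadsheet
print(extension)  # .xlsx

# The root keeps the directory, so os.path.basename() gives just the bare filename
print(os.path.basename(root))  # Important Spreadsheet

# Swapping the extension, e.g. to name a converted copy saved alongside the original
csv_path = root + ".csv"
print(csv_path)  # /Users/datagy/Desktop/Important Spreadsheet.csv
```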
How to Use a Python File Extension
Now that you’ve learned two different ways to use Python to get a file’s extension, how can you apply this?
One handy application is to act only on, say, Excel files. If you’re writing a for loop, you could first check whether a file is an Excel file and only then load it into a Pandas DataFrame. This approach lets you skip files that may not actually contain any tabular data.
Let’s see how to do this in Python and Pandas:
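```python
# Load only Excel files into a single DataFrame, using pathlib to check extensions
import pathlib
import pandas as pd

file_paths = [
    "/Users/datagy/Desktop/Important Spreadsheet.xlsx",
    "/Users/datagy/Desktop/A Random Document.docx",
]

frames = []
for file in file_paths:
    # Only read files whose extension marks them as Excel workbooks
    if pathlib.Path(file).suffix in ('.xls', '.xlsx'):
        frames.append(pd.read_excel(file))

# DataFrame.append() was removed in pandas 2.0, so combine the frames with pd.concat()
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```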
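In practice, you often won’t have a hard-coded list of paths. The sketch below iterates over a folder with `pathlib`’s own `Path.glob()` method (an alternative to the `glob` library mentioned in the introduction) and lower-cases each suffix, so that a file named, say, `REPORT.XLSX` is matched too. The helper name `find_files_with_extensions()` is hypothetical, not part of any library.

```python
import pathlib


def find_files_with_extensions(folder, extensions):
    """Hypothetical helper: return files in `folder` whose extension is in `extensions`.

    Extensions are compared case-insensitively, so '.XLSX' matches '.xlsx'.
    """
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in pathlib.Path(folder).glob("*")
        # Skip directories, even ones whose names happen to end in ".xlsx"
        if path.is_file() and path.suffix.lower() in wanted
    )


# Example usage (assumes a folder like this exists on your machine):
# for excel_file in find_files_with_extensions("/Users/datagy/Desktop", (".xls", ".xlsx")):
#     print(excel_file)
```

The returned paths could then be passed to `pd.read_excel()` in the same way as in the loop above.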
In this post, you learned how to use Python to get a file’s extension. You learned how to do this using both the `pathlib` library and the `os.path` module’s `splitext()` function. You also learned how to do this on Windows, Mac, and Linux, so that your code can run across systems.
To learn more about the `splitext()` function, check out the official Python documentation.