In this tutorial, you’ll learn how to use Python to get a file extension. You’ll accomplish this using both the pathlib library and the os.path module.
Being able to work with files easily is one of Python’s greatest strengths. You could, for example, use the glob library to iterate over files in a folder. When you do this, knowing the extension of each file may drive further decisions. Because of this, knowing how to get a file’s extension is an important skill! Let’s get started learning how to use Python to get a file’s extension in Windows, Mac, and Linux!
The Quick Answer: Use Pathlib
Using Python Pathlib to Get a File’s Extension
The Python pathlib library makes it incredibly easy to work with and manipulate paths. Because of this, it makes perfect sense that the library would have a way of accessing a file’s extension.
The pathlib library comes with a class named Path, which we use to create path-based objects. When we load our file’s path into a Path object, we can access specific attributes of the object by using its built-in properties.
Let’s see how we can use the pathlib library in Python to get a file’s extension:
# Get a file's extension using pathlib
import pathlib
file_path = "/Users/datagy/Desktop/Important Spreadsheet.xlsx"
extension = pathlib.Path(file_path).suffix
print(extension)
# Returns: .xlsx
We can see here that we passed a file’s path into the Path class, creating a Path object. After doing this, we can access different attributes, including the .suffix attribute. We assigned this to a variable named extension and printed it, getting .xlsx back.
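Beyond .suffix, Path objects expose a few closely related properties. The short sketch below, using made-up file names chosen only for illustration, shows how they behave for a file with two extensions and for a file with none:

```python
import pathlib

# Hypothetical path with a double extension, used only for illustration
archive = pathlib.Path("/Users/datagy/Desktop/backup.tar.gz")

suffix = archive.suffix      # only the final extension
suffixes = archive.suffixes  # every extension, in order
stem = archive.stem          # file name without the final extension
name = archive.name          # full file name, including extensions

print(suffix)    # .gz
print(suffixes)  # ['.tar', '.gz']
print(stem)      # backup.tar
print(name)      # backup.tar.gz

# A file without an extension gives back an empty string
no_extension = pathlib.Path("/Users/datagy/Desktop/README").suffix
print(repr(no_extension))  # ''
```

Note that .suffix only ever returns the last extension, so if you need to recognize something like .tar.gz you would likely want to check .suffixes instead.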
This method works well for both Mac and Linux computers. On Windows, however, file paths use backslashes as separators, and Python treats a backslash inside a regular string as the start of an escape sequence.
Because of this, when using Windows, create your file path as a “raw” string. But how do you do this? Simply prefix your string with an r, like this: r'some string'. This lets Python know not to treat the backslashes as escape characters.
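As a minimal sketch of the raw-string approach, the example below uses a made-up Windows path. It relies on pathlib’s PureWindowsPath class (not used elsewhere in this tutorial) so that the Windows-style path is parsed the same way regardless of which operating system runs the code:

```python
import pathlib

# Hypothetical Windows path. The r prefix keeps every backslash literal;
# without it, "\U" in "C:\Users" would be read as an escape sequence.
windows_path = r"C:\Users\datagy\Desktop\Important Spreadsheet.xlsx"

# PureWindowsPath applies Windows path rules on any operating system
extension = pathlib.PureWindowsPath(windows_path).suffix
file_name = pathlib.PureWindowsPath(windows_path).name

print(extension)  # .xlsx
print(file_name)  # Important Spreadsheet.xlsx
```

If your code actually runs on Windows, the regular Path class used earlier should behave the same way with a raw string.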
Now that we’ve taken a look at how to use pathlib in Python to get a file extension, let’s explore how we can do the same using the os.path module.
Using os.path in Python to Get a File’s Extension
The os module allows us to easily work with, well, our operating system! Its path submodule lets us work with file paths in different ways, including getting a file’s extension.
The os.path module has a helpful function, splitext(), which splits a file path into two parts: everything before the extension, and the extension itself. Thankfully, splitext() is a smart function that knows how to separate out file extensions. It splits only at the last period in the file name, rather than simply splitting the string on every period.
Let’s take a look at how we can use the splitext() function to get a file’s extension:
# Get a file's extension using os.path
import os.path
file_path = "/Users/datagy/Desktop/Important Spreadsheet.xlsx"
extension = os.path.splitext(file_path)[-1]
print(extension)
# Returns: .xlsx
Let’s take a look at what we’ve done here:
- We import os.path. Rather than writing from os import path, we use this form of import so that the variable name path stays free for other uses.
- We load our file_path variable. Remember: if you’re on Windows, make your file path a raw string by prefixing an r before the opening quotation mark.
- We apply the splitext() function to the file path. We then access the last item of the result.
The splitext() function returns a tuple: the first item is the path without its extension, and the second is the extension. Because of this, if we only want a file’s extension, we can just access the tuple’s last item.
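To make the shape of that tuple concrete, here is a small sketch that unpacks both items and tries a few edge cases. The extra file names are made up purely for illustration:

```python
import os.path

file_path = "/Users/datagy/Desktop/Important Spreadsheet.xlsx"

# Unpack both items of the tuple instead of indexing into it
root, extension = os.path.splitext(file_path)
print(root)       # /Users/datagy/Desktop/Important Spreadsheet
print(extension)  # .xlsx

# Only the last extension is split off a double extension
double = os.path.splitext("backup.tar.gz")
print(double)  # ('backup.tar', '.gz')

# A leading period (a hidden file on Mac and Linux) is not an extension
hidden = os.path.splitext(".bashrc")
print(hidden)  # ('.bashrc', '')

# No period at all means an empty extension
plain = os.path.splitext("README")
print(plain)  # ('README', '')
```

Unpacking into two names can read a little more clearly than [-1] when you also need the rest of the path.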
How to Use a Python File Extension
Now that you’ve learned two different ways to use Python to get a file’s extension, how can you apply this?
One handy method is to act on, say, only Excel files. If you’re writing a for-loop, you could first check to see if a file is an Excel file and then load it into a Pandas dataframe. This approach would let you skip the files that may not actually contain any data.
Let’s see how to do this in Python and Pandas:
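# Load only the Excel files from a list of paths into one DataFrame
import pathlib
import pandas as pd
file_paths = ["/Users/datagy/Desktop/Important Spreadsheet.xlsx", "/Users/datagy/Desktop/A Random Document.docx"]
frames = []
for file in file_paths:
    # Only read files whose extension marks them as Excel workbooks
    if pathlib.Path(file).suffix in ('.xls', '.xlsx'):
        frames.append(pd.read_excel(file))
# DataFrame.append() was removed in pandas 2.0, so combine with pd.concat()
df = pd.concat(frames) if frames else pd.DataFrame()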
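The same idea works without pandas. As a rough sketch, the example below counts the files in a folder by extension. The count_extensions() helper and the file names are hypothetical, and the folder is a temporary one created just for the demonstration:

```python
import pathlib
import tempfile
from collections import Counter


def count_extensions(folder):
    """Count the files in a folder by lowercase extension (hypothetical helper)."""
    counts = Counter()
    for entry in pathlib.Path(folder).iterdir():
        if entry.is_file():
            # Lowercase the suffix so that .XLSX and .xlsx are counted together
            counts[entry.suffix.lower()] += 1
    return counts


# Build a temporary folder containing a few empty, made-up files
with tempfile.TemporaryDirectory() as tmp:
    for file_name in ("report.xlsx", "budget.XLSX", "notes.docx", "README"):
        (pathlib.Path(tmp) / file_name).touch()
    extension_counts = count_extensions(tmp)

print(extension_counts[".xlsx"])  # 2
print(extension_counts[".docx"])  # 1
print(extension_counts[""])       # 1 (the file without an extension)
```

Lowercasing the suffix is a design choice here: file extensions are often written in mixed case, and comparing them case-sensitively could silently skip files.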
Now that you’ve learned a practical example, check out my other Pandas tutorials, including how to calculate an average in Pandas and how to add days to a Pandas column.
Conclusion
In this post, you learned how to use Python to get a file’s extension. You learned how to do this using both the pathlib library and the os.path module’s splitext() function. You learned how to do this in Windows, Mac, and Linux, in order to ensure that your code can run across systems.
To learn more about the splitext() function, check out the official Python documentation for the os.path module.