In this post, you’ll learn different ways to list files in a directory, using both the OS library and the Glob library.
Working with data, you may find yourself in a situation where you need to combine different files or extract data from the latest file.
Sample Folder Structure
To follow along, let’s use the folder structure in the image below. Each of the files contains sales data for a different month.
Importing OS and Glob Libraries
Both os and glob are part of Python’s standard library, so there is nothing extra to install. To import them, simply write:
import os
import glob
Use os’s listdir Function to Return all Files in a Directory
The os module’s listdir function returns a list of the names of all files and directories in a folder. The names do not include the folder’s path, and their order is not guaranteed.
To use it, simply pass the directory as an argument.
To follow along, load the sample files into a single directory. Store the path to the folder Files in a variable named file_path and pass it to the listdir function:
file_path = "FILE_PATH"  # replace with the path to your Files folder
files = os.listdir(file_path)
print(files)
# Returns (order may vary)
# ['November.xlsx', 'October.xlsx', 'Other Files']
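Because listdir returns bare names and mixes files with folders, a common follow-up is to keep only the files. The snippet below is a minimal sketch of that idea; the helper name list_files_only is hypothetical and not part of the os module.

```python
import os

def list_files_only(directory):
    """Return the sorted names of regular files directly inside `directory`."""
    names = []
    for name in os.listdir(directory):
        # listdir returns bare names, so join with the directory before testing
        if os.path.isfile(os.path.join(directory, name)):
            names.append(name)
    # listdir makes no ordering guarantee, so sort for a predictable result
    return sorted(names)
```

With the sample folder, calling list_files_only(file_path) should leave out the Other Files folder and keep the two Excel files.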
Use os’s Walk Function to Return All Files in a Directory and all Sub-directories
If you want to list all the files in a directory and all of its subdirectories, you can use the os.walk function.
This function takes a little more setup than listdir, so take a look at the code below:
files_list = []
for root, directories, files in os.walk(file_path):
    for name in files:
        files_list.append(os.path.join(root, name))
print(files_list)
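The walk function returns a generator that yields one tuple of three items for each folder it visits: the path to that folder, a list of the subdirectory names inside it, and a list of the file names inside it. The outer for loop steps through those tuples, and the inner loop joins each file name to its folder path.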
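The same loop can be narrowed to a single file type by checking each name before appending it. This is a sketch under assumptions: the function name find_files_with_extension and its default extension are illustrative choices, not something the os module provides.

```python
import os

def find_files_with_extension(directory, extension=".xlsx"):
    """Walk `directory` and its subdirectories, returning matching file paths."""
    matches = []
    # os.walk yields (folder path, subdirectory names, file names) per folder
    for root, directories, files in os.walk(directory):
        for name in files:
            # Compare in lowercase so 'Report.XLSX' also matches
            if name.lower().endswith(extension):
                matches.append(os.path.join(root, name))
    # Sort so the result does not depend on traversal order
    return sorted(matches)
```

Calling find_files_with_extension(file_path) on the sample folder would also pick up any Excel files stored inside Other Files.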
Use Glob to List all Files in a Directory
The glob function works similarly to the os listdir function, but it lets you return either all files or only those matching a particular pattern.
A major benefit of the glob library is that each item it returns includes the directory portion of the pattern you passed in, so the results can be handed straight to functions that open files. This can be especially helpful for data-related work.
For example, to return everything in a directory, use the asterisk (*):
file_list = glob.glob("FILE_PATH/*")
print(file_list)
This would return all files and folders in that directory, except those whose names begin with a dot (hidden files), which the asterisk does not match.
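A single asterisk only looks one level deep. To reach subdirectories as well, glob accepts a double asterisk together with recursive=True. The sketch below assumes a hypothetical helper name, list_all_files_recursive, and filters out folders so that only files remain.

```python
import glob
import os

def list_all_files_recursive(directory):
    """Return sorted paths of all non-hidden files under `directory`."""
    # '**' matches zero or more nested folders when recursive=True
    pattern = os.path.join(directory, "**", "*")
    paths = glob.glob(pattern, recursive=True)
    # glob returns folders too, so keep only regular files
    return sorted(p for p in paths if os.path.isfile(p))
```

This gives a result comparable to the os.walk loop shown earlier, with the path already included in each item.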
Use Glob to Return all Files of a File Type in a Directory
Similar to the example above, you can also return only files matching a certain condition. For example, if you want to only return Excel files, you could write:
file_list = glob.glob("FILE_PATH/*.xlsx")
This would return only the files whose names end in .xlsx.
This can be extremely useful in getting filenames of only certain file types.
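Pattern matching also helps with the case mentioned at the start of this post, where you only need the latest file. One way to sketch this, assuming "latest" means most recently modified and using a hypothetical helper named latest_file:

```python
import glob
import os

def latest_file(directory, pattern="*.xlsx"):
    """Return the most recently modified path matching `pattern`, or None."""
    paths = glob.glob(os.path.join(directory, pattern))
    if not paths:
        # Nothing matched, so there is no latest file to return
        return None
    # os.path.getmtime gives each file's last-modified timestamp
    return max(paths, key=os.path.getmtime)
```

If your files are named by date instead, sorting the matched paths by name may be a better fit than relying on modification times.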
Combine Data with Pandas and Glob
Using Pandas and glob, it’s easy to combine multiple Excel files into a single dataframe.
For example, if you wanted to combine the November and October files from the sample files, this could be done easily with Glob and Pandas:
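import pandas as pd

file_list = glob.glob("FILE_PATH/*.xlsx")
files = []
for filename in file_list:
    # Read each workbook into its own dataframe
    df = pd.read_excel(filename)
    files.append(df)
# Stack the dataframes vertically and rebuild the index
frame = pd.concat(files, axis=0, ignore_index=True)
print(frame)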
This returns a single combined dataframe containing the rows of every Excel file in the folder.
Conclusion
In this post, you learned how to list all files in a folder using the os listdir function, how to traverse folders in a directory and get all file names, and how to use glob to get specific file paths. Finally, you learned how to combine Excel files into a single dataframe using glob and Pandas.