In this tutorial, you’ll learn how to use Python to remove special characters from a string. When working with strings, you’ll often encounter special characters such as punctuation and symbols. These can cause problems when you’re trying to conduct text analysis, such as natural language processing. Because of this, knowing how to use Python to remove special characters from a string is an important skill.
Let’s get started!
The Quick Answer: Use re.sub()
Remove Special Characters, Including Spaces, Using Python isalnum
Python has a built-in string method, .isalnum(), which returns True if every character in the string is alphanumeric (and the string is not empty) and returns False otherwise. We can use this to loop over a string and append only the alphanumeric characters to a new string.
Let’s see what this example looks like:
# Remove Special Characters from a String Using .isalnum()
text = 'datagy -- is. great!'
new_text = ''
for character in text:
    if character.isalnum():
        new_text += character
print(new_text)
# Returns: datagyisgreat
Let’s take a look at what we’ve done here:
- We instantiate two strings: one that contains our original text and an empty string to hold the result
- We loop over each character in our string and evaluate whether it is alphanumeric, using the .isalnum() method
- If it is, we add the character to our new string. If it’s not, we do nothing.
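The same loop can also be written more compactly. The sketch below is one possible variation, not the only way to do it: it wraps the logic in a hypothetical helper named keep_alphanumeric and uses a generator expression with str.join() instead of repeated string concatenation. It also shows that .isalnum() is Unicode-aware, so accented letters are kept.

```python
def keep_alphanumeric(text):
    """Return text with every non-alphanumeric character removed.

    keep_alphanumeric is a hypothetical helper name used for illustration.
    """
    # Keep each character only if .isalnum() reports it as a letter or digit
    return ''.join(character for character in text if character.isalnum())


print(keep_alphanumeric('datagy -- is. great!'))
# Returns: datagyisgreat

# .isalnum() works on Unicode characters, so 'é' counts as a letter
print(keep_alphanumeric('café #1'))
# Returns: café1
```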
In the next example, you’ll learn how to get a bit more flexibility (such as keeping spaces) by using the Python regular expressions library, re.
Remove Special Characters Using Python Regular Expressions
The Python regular expressions library, re, comes with a number of helpful functions for manipulating strings. One of these is the re.sub() function, which allows us to substitute every match of a pattern with another string.
One of the perks of the re library is that we don’t need to specify exactly which characters we want to replace. Instead, we can define ranges of characters to replace (or keep).
For example, to keep all ASCII letters, digits, and spaces, we simply tell re.sub() to replace anything that matches the pattern [^a-zA-Z0-9 ], that is, any character except those listed inside the brackets.
Let’s see what this looks like in Python:
# Remove Special Characters from a String Using re.sub()
import re
text = 'datagy -- is. great!'
new_text = re.sub(r"[^a-zA-Z0-9 ]", "", text)
print(new_text)
# Returns: datagy  is great (two spaces remain between 'datagy' and 'is')
Let’s explore what we’ve done here:
- We loaded our string into a variable
- We used the re.sub() function to make our replacement. Here, the function takes three arguments: (1) the pattern we want to replace (we used the ^ at the start of the character set to denote that we want to replace anything except the characters that follow), (2) what we want to replace the matches with, and (3) the string we want to make the replacement in.
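Removing characters can leave consecutive spaces behind where punctuation sat between words. As a minimal sketch, assuming you want single spaces in the result, a second re.sub() call can collapse them. The variable names cleaned and collapsed are illustrative only.

```python
import re

text = 'datagy -- is. great!'

# Step 1: drop everything except ASCII letters, digits, and spaces
cleaned = re.sub(r"[^a-zA-Z0-9 ]", "", text)
print(repr(cleaned))
# Returns: 'datagy  is great'

# Step 2 (optional): collapse runs of spaces into a single space
collapsed = re.sub(r" +", " ", cleaned).strip()
print(repr(collapsed))
# Returns: 'datagy is great'
```

Printing with repr() makes the doubled space visible, which is easy to miss in normal output.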
In the next section, you’ll learn how to use the filter() function to remove special characters from a Python string.
Remove Special Characters from Strings Using Filter
Similar to using a for loop, we can also use the filter() function to remove special characters from a string.
The filter() function accepts two parameters:
- A function to evaluate each item against
- An iterable to filter
Since strings are iterable, we can pass in a function that identifies which characters to keep. Similar to the for loop method, we can use the .isalnum() method to check whether each character is alphanumeric.
Let’s try this out in Python:
# Remove Special Characters from a String Using filter()
text = 'datagy -- is. great!'
new_text = ''.join(filter(str.isalnum, text))
print(new_text)
# Returns: datagyisgreat
Let’s explore how and why this works:
- We use the filter() function to return a filter object that includes only alphanumeric characters
- We then use the str.join() method to join those characters with an empty string as the separator. With this, we’re converting the filter object back into a string.
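To see what filter() hands to str.join(), it can help to inspect the filter object directly. The short sketch below (variable names are illustrative) shows that the result is a lazy iterator rather than a list, and that it can only be consumed once.

```python
text = 'datagy -- is. great!'

filtered = filter(str.isalnum, text)

# A filter object is a lazy iterator, not a list
print(isinstance(filtered, list))
# Returns: False

# Consuming it yields the kept characters one at a time
characters = list(filtered)
print(characters[:6])
# Returns: ['d', 'a', 't', 'a', 'g', 'y']

# The iterator is now exhausted, so joining it again gives an empty string
leftover = ''.join(filtered)
print(repr(leftover))
# Returns: ''
```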
If we want to keep other characters, such as spaces, we can define a custom function to evaluate each character against.
Let’s see how this works in Python:
# Remove Special Characters from a String Using filter()
def remove_special_characters(character):
    if character.isalnum() or character == ' ':
        return True
    else:
        return False
text = 'datagy -- is. great!'
new_text = ''.join(filter(remove_special_characters, text))
print(new_text)
# Returns: datagy  is great (two spaces remain between 'datagy' and 'is')
Let’s break down why this approach works:
- We define a custom function that checks whether a character is alphanumeric or is equal to a space, defined by the ' ' character
- If it is, then True is returned. Otherwise, the function returns False
Because filter() keeps only the items for which the function returns a truthy value, any character that does not evaluate to True is filtered out.
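If the set of characters to keep changes from case to case, one option is to build the checking function on demand. The following is a sketch under that assumption: make_character_filter and extra_allowed are hypothetical names introduced here for illustration, not part of Python or of the examples above.

```python
def make_character_filter(extra_allowed=' '):
    """Build a function that accepts alphanumerics plus any extra_allowed characters.

    make_character_filter and extra_allowed are hypothetical names.
    """
    def is_allowed(character):
        # True keeps the character; False filters it out
        return character.isalnum() or character in extra_allowed
    return is_allowed


text = 'datagy -- is. great!'

# Keep alphanumeric characters, spaces, and periods
keep = make_character_filter(' .')
result = ''.join(filter(keep, text))
print(repr(result))
# Returns: 'datagy  is. great'
```

Returning the boolean expression directly behaves the same as an explicit if/else that returns True or False.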
Conclusion
In this post, you learned how to remove special characters from a Python string. You learned how to do this with the .isalnum() method, the regular expressions library re, and the filter() function. Learning how to do this is a valuable skill as working with textual data becomes increasingly common.
To learn more about the Python regular expressions library re, check out the official documentation.