Python: Remove Special Characters from a String

In this tutorial, you’ll learn how to use Python to remove special characters from a string. Many times, when working with strings, you’ll encounter strings with special characters. These can cause problems when you’re trying to conduct text analysis, such as natural language processing. Because of this, knowing how to use Python to remove special characters from a string is an important skill.

Let’s get started!

The Quick Answer: Use re sub
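
As a minimal sketch of the quick answer: the helper name `strip_special` is hypothetical, and keeping spaces is an assumption about what you want to retain.

```python
import re

def strip_special(text):
    # Drop every character that is not an ASCII letter, a digit, or a space
    return re.sub(r"[^a-zA-Z0-9 ]", "", text)

print(strip_special("Hello, World!"))
# Returns: Hello World
```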

Remove Special Characters Including Spaces Using Python isalnum

Python has a built-in string method, .isalnum(), which returns True if every character in the string is alphanumeric (and the string is not empty), and returns False otherwise. We can use this to loop over a string and append only the alphanumeric characters to a new string.

Let’s see what this example looks like:

# Remove Special Characters from a String Using .isalnum()

text = 'datagy -- is. great!'
new_text = ''

for character in text:
    if character.isalnum():
        new_text += character

print(new_text)

# Returns: datagyisgreat

Let’s take a look at what we’ve done here:

  1. We instantiate two strings: one that contains our original string and an empty string that will hold the result
  2. We loop over each character in our string and evaluate if it is alphanumeric, using the .isalnum() method
  3. If it is, we add the character to our new string. If it’s not, we do nothing.
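
The same loop can be written more compactly. The following is a sketch rather than the only way to do it: it uses a generator expression with str.join(). The second sample string is made up here to show that .isalnum() also treats non-ASCII letters such as é as alphanumeric.

```python
text = 'datagy -- is. great!'

# Keep only the characters for which .isalnum() is True
new_text = ''.join(character for character in text if character.isalnum())
print(new_text)
# Returns: datagyisgreat

# .isalnum() is Unicode-aware, so accented letters are kept as well
accented = 'caf\u00e9 #1!'  # the string 'café #1!'
accented_clean = ''.join(character for character in accented if character.isalnum())
print(accented_clean)
# Returns: café1
```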

In the next example, you’ll learn how to get a bit more flexibility (such as keeping spaces) by using the Python regular expressions library, re.

Remove Special Characters Using Python Regular Expressions

The Python regular expressions library, re, comes with a number of helpful functions to manipulate strings. One of these is the .sub() function, which allows us to replace every match of a pattern with another string.

One of the perks of the re library is that we don’t need to specify exactly what character we want to replace. Because of this, we can set ranges of characters to replace (or keep).

For example, to keep all alphanumeric characters and spaces, we tell the .sub() function to replace every character matched by [^a-zA-Z0-9 ], that is, anything that is not an ASCII letter, a digit, or a space.

Let’s see what this looks like in Python:

# Remove Special Characters from a String Using re.sub()
import re

text = 'datagy -- is. great!'
new_text = re.sub(r"[^a-zA-Z0-9 ]", "", text)

print(new_text)

# Returns: datagy  is great

Let’s explore what we’ve done here:

  1. We loaded our string into a variable
  2. We used the re.sub() function to make our replacement. Here, the function takes three arguments: (1) the pattern we want to replace (the ^ at the start of the square brackets negates the character class, so the pattern matches anything except the characters listed), (2) what we want to replace the characters with, and (3) the string we want to make the replacement in.
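
Removing the punctuation leaves a double space where the dashes used to be. If that matters for your use case, one possible follow-up (an addition to the example above, not part of it) is a second re.sub() call that collapses runs of whitespace:

```python
import re

text = 'datagy -- is. great!'

# Step 1: remove everything except ASCII letters, digits, and spaces
no_special = re.sub(r"[^a-zA-Z0-9 ]", "", text)

# Step 2: collapse runs of whitespace into a single space and trim the ends
new_text = re.sub(r"\s+", " ", no_special).strip()

print(new_text)
# Returns: datagy is great
```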

In the next section, you’ll learn how to use the filter() function to remove special characters from a Python string.

Remove Special Characters from Strings Using Filter

Similar to using a for loop, we can also use the filter() function to remove special characters from a string.

The filter() function accepts two parameters:

  1. A function to evaluate against,
  2. An iterable to filter
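
To illustrate those two parameters, here is a small sketch. The sample string is made up for illustration. Note that filter() returns a lazy filter object rather than a string, so the result has to be materialized, for example with list() or str.join().

```python
sample = 'a1-b2'

# First argument: a function that returns True or False for each item
# Second argument: the iterable to filter (here, a string)
kept = filter(str.isalnum, sample)

print(list(kept))
# Returns: ['a', '1', 'b', '2']
```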

Since strings are iterable, we can pass in a function that tests each character and keeps only the ones we want. Similar to the for loop method, we can use the .isalnum() method to check whether each character is alphanumeric.

Let’s try this out in Python:

# Remove Special Characters from a String Using filter()

text = 'datagy -- is. great!'
new_text = ''.join(filter(str.isalnum, text))

print(new_text)

# Returns: datagyisgreat

Let’s explore how and why this works:

  1. We use the filter() function to return a filter object that includes only alphanumeric characters
  2. We then use the str.join() method to join those characters, with an empty string as the separator.

If you want to keep other characters, such as spaces, you can define a custom function to evaluate against.

Let’s see how this works in Python:

# Remove Special Characters from a String Using filter()

def remove_special_characters(character):
    # True for alphanumeric characters and for spaces, False otherwise
    return character.isalnum() or character == ' '

text = 'datagy -- is. great!'
new_text = ''.join(filter(remove_special_characters, text))

print(new_text)

# Returns: datagy  is great

Let’s break down why this approach works:

  1. We define a custom function that checks whether a character is alphanumeric or is a space, written as ' '
  2. If it is, then True is returned. Otherwise, the expression evaluates to False

Because filter() keeps only the items for which the function returns a truthy value, any character that does not evaluate to True is filtered out.
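
If you need to keep more than one extra character, the custom function can be generalized. The sketch below is one possible variation: the `allowed_extras` set, the `keep_character` name, and the choice to also keep hyphens are assumptions made for illustration.

```python
text = 'datagy -- is. great!'

# Hypothetical set of non-alphanumeric characters we want to keep
allowed_extras = {' ', '-'}

def keep_character(character):
    # True for alphanumeric characters and for anything in allowed_extras
    return character.isalnum() or character in allowed_extras

new_text = ''.join(filter(keep_character, text))
print(new_text)
# Returns: datagy -- is great
```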

Conclusion

In this post, you learned how to remove special characters from a Python string. You learned how to do this with the .isalnum() method, the regular expressions library re, and the filter() function. Learning how to do this is an important skill as working with textual data grows more and more important.

To learn more about the Python regular expressions library re, check out the official documentation.