Following up on our post about some of the most commonly used Excel text functions, today we’re going to focus on cleaning up a block of text that includes paragraph breaks and characters
Here we’re looking at the descriptions of five videos on a made up YouTube channel. Video 1’s description has the proper formatting. All of our descriptions should meet the following criteria:
- Short description
- Links to both our instagram handle and snapchat handle
- Free of any extra spaces, paragraph breaks, and special characters.
Our ultimate goal is to clean all of these cells up using ONLY formulas rather than CTRL+F or doing this process manually.
Note: In this case cleaning up these cells by hand would probably be faster than learning/using the formulas below. However, if you’re dealing with hundreds or even thousands of rows of data, I’m hoping a few of these tricks will open up the possibility for you of using these tips on a much larger scale.
The first thing I noticed in this cell is a space at the beginning of the box. Let’s use TRIM to remove trailing spaces at the beginning and end of a cell.
Although it remove the space at the beginning, we’re still left with the paragraph breaks at the bottom and the meaningless text ‘video431’. Let’s start by replacing the text.
The SUBSTITUTE Function
The Substitute function takes three mandatory arguments. The data you’re working with (B3 in our case), what text you’re looking for, and what text you want to replace.
In this case we’re targeting the test “video431/” and replacing it with nothing. In C3 is our cleaned up cell.
Introducing the CHAR Function
The CHAR function is an amazing function to get familiar with if you’re cleaning up text data at all. Every OS has a character set (hence “CHAR”) used to classify characters in Excel. Here is the link to the Windows List. Capital ‘A’ is character 65. Lower case ‘g’ is 103. And the ‘newline’ character aka ‘carraige return’ is character number 10. Because a ‘carraige return’ has a specific char number, we can use this to our advantage to find and replace these line breaks systematically.
Counting the Number of Paragraph Breaks in the Cell
One of my frequently used tricks when dealing with Text data is to count the number of times something appears in a cell. It allows me to target characters much more easily when I have to remove them.
There’s a lot going on in this formula so let’s break it down
LEN(B3): This finds the total length, or number of characters in B3. In our case, that length is 144.
LEN(SUBSTITUTE(B3,CHAR(10),””)): This is the length of the same string, but with the paragraphs removed. In our case that total is 138.
144-138 = 6, so we know there are a total of 6 paragraph breaks in our cell. Since we know there are two between “with this…” and “Follow us”, that must mean there are four more below the ‘@randominstagramhandle’
It’s kind of a crude drawing, but the tick marks are next to each of the four paragraph breaks we still need to remove.
Introducing the CLEAN function
According the official documentation, the clean function “Removes all nonprintable characters from text”. This includes good ‘ol character 10.
Although we got rid of the four paragraph breaks at the end, we also removed the ones in the middle of our description! That won’t work.
Introducing the REPLACE Function
The Replace function is a cousin of the Substitute Function
It takes four arguments:
- Data to clean (B3)
- Where to start in the data. Here I entered len(b3) – 3 because we want to start at character number 141 out of 144
- How many characters to replace (4)
- What to replace it with (nothing)
Beautiful. That gets us close to the finish line. From there let’s concatenate our snapchat handle on to the block of text.
By adding on one more line break and our snapchat handle, our description is finished!
This one should be a little easier.
It looks like we have some extra characters at the top of the cell and we’re missing the full call to action for our snapchat handle.
By inserting the three asterisks inside a Substitute function, we can get rid of them immediately.
After that, we notice we still have an additional paragraph break at the top of the cell
To remove this we’ll use the optional argument of the substitute function.
Notice at the end of our substitute formula is the number ‘1.’ This fills the optional argument of “instance number”. If you’re replacing char(10), which char(10) do you want to replace? If you enter no arguments here, it will replace all of them. If you provide an argument, it will replace that instance of the character. In our case, it’s the first instance.
Note: Even though it looks like there is still a paragraph break, there is not (row height still affected by cell b4)
Lastly, we have to clean up our snapchat handle.
Since we have the correct layout of our social media CTA, we can substitute out the abbreviated one for the correct one using SUBSTITUTE.
With this description it looks like we have a random parentheses showing up along with incomplete instagram/snapchat callouts.
The first thing we’ll do here is grab the open parentheses and the space all in one formula. Notice that you can include the parentheses and space together and replace both with an empty string.
The final thing we’ll do here is create a “nested” substitute function to clean up the social media call-to-actions.
After cleaning up the instagram handle, we can wrap that entire function in the substitute function to replace the snapchat handle at the same time.
(Note: In that formula, B9 and B10 are the correct mentions of the social media CTA)
In this scenario we have a good description and no social media call to actions. All we have to do hear is concatenate them to our description and we’re done.
By using the ampersand, we can concatenate the social media CTAs and provide proper spacing using good ‘ol char(10), the line break. But there is a slightly easier way to write this. Notice I had to add two char(10)s in a row to get enough space in between the next block.
The REPT Function
The repeat function is a great one to have in your arsenal as well. This takes two arguments:
- What you want to repeat
- How many times.
Using the Rept function here has the same effect. For those of you that would prefer to use the Concatenate function here instead of the ampersand, this is how it would look:
Using concatenate vs. ampersand is mainly a preference thing. Some may like the cleaner look of comma separated values vs. the ampersand.
In either case, we’re done! All of our descriptions are cleaned up and we’re ready to publish these to our YouTube channel.
Functions Mentioned in this blog post: