Using SUBSTITUTE and CHAR to remove a block of unwanted text

The SUBSTITUTE function is incredible for cleaning up text. You can replace a string of text with whatever you’d like, and if the text isn’t found you don’t get an error; It just returns the original string unharmed.

The CHAR (as in “character”) function is a pretty cool weapon as well. You can specify which characters you want to target, even if they’re weird characters, numbers, letters, or paragraph breaks. You’ll see why this is important shortly.

Examples of Random Characters and their character numbers

  • Lowercase “t” = Character Number 116
  • Uppercase “T” = Character Number 84
  • Paragraph Break = Character Number 10

These character numbers allow Excel to target very specific parts of text strings.

Problem:

Remove the second instance of a URL from a block of text

substitute function

Imagine we don’t need to show two instances of the same url in one cell. Let’s erase the second one using substitute

substitute function

Here’s the formula:

=SUBSTITUTE(E2,”DunderMifflin.com”,””,2)

Within E2, we’re substituting the string “DunderMifflin.com” with double quotes – “”. That means we’re essentially replacing the url with nothing. The 2 you see at the end of the formula signifies that we’re replacing the 2nd instance of the url and NOT the first.

While it may look like we’re done, you’ll can see a extra paragraph break that we haven’t replaced with anything yet. Let’s use our friend CHAR() to do that.

substitute function

Looks Good! Let’s analyze the code:

=SUBSTITUTE(E2,CHAR(10)&CHAR(10)&”DunderMifflin.com”,””)

This formula is close, but CHAR(10) is what Excel knows to mean a paragraph break. Using the ampersand sign is a quick way to concatenate two values together. So instead of looking for the url by itself, we’re looking for two paragraph breaks followed by the url, then replacing that entire block of text with nothing. Also note we no longer need the “2” inside of our formula because that block of text does not appear again.

While it may seem much easier to simply do a find and replace in this instance, the substitute function will come in very useful if you have to clean hundreds of rows at once during a deeper analysis. Further, I do not know of a way to delete a paragraph break using CTRL+H, so you’d still be left with huge paragraph breaks at the end of each block of text.

 

Manipulating Text in Excel – LEN, RIGHT, LEFT, & MID

There are many times in Excel when you only need to extrapolate or evaluate PART of a cell. For this, the following functions will come in very handy:

LEN ()

  • Used to find the length of a string/integer within a cell

Example:

The ID of students at ACME Community College all have 9 digits. A data input clerk has been inputting names by hand but the director of enrollment would like an audit performed on the IDs to make sure they are being input properly. How, would you do this?

LEN+formula+excel

Column E contains our formula. LEN(D2) is nine, perfectly in line with our criteria. At the bottom, however, we see one cell with a length of 8. How bad is our problem?

LEN+formula+excel (1)

After a quick filter of our LEN formula, we can see that a total of 12 IDs from our original set have been input incorrectly. From here, we can dive into the specific details as to what was input incorrectly

RIGHT(x)

  • Used to find the x number of characters on the right side of a cell

Example: The director of enrollment was to figure out how many students list a zip code of 90210 for their home address. However, the field for Address includes ALL of the student’s address, include street and house number.

RIGHT+formula+excel

Our RIGHT formula pulls out the zip code from the student address information.

RIGHT+formula+excel (1)

A quick filter reveals that 16 of the students reported a zip code of 90210. Total time to figure out? 1 minute. So far, so good….

LEFT ()

Used to find the x number of characters on the left side of a cell.

Example:

You think you’re out of the weeds, but your director wants you to break down by month the enrollment patterns of the students. ACME C.C. enrolls students year round. Are there a few months out of the year that enroll more students? Let’s find out.

LEFT+excel+formula

We’ve used the LEFT () formula to pull out the month that each student enrolls. Next, let’s create a table that counts how many times each month appears in our data.

Countif+excel

After using a COUNTIF formula along with our LEFT formula across the data set, we can see that January was the most popular month in which to enroll.

MID(x)

  • Used to extract a string the middle of a cell starting at a specific point

Example:

Your director has a list of every major that every student is enrolled in, and she wants you to break down the majors by frequency. There’s only one problem. Because of the way the data was entered, the data entry clerk also included ‘ACME’ (the school) and ‘CA’ (the campus for this data). MID to the rescue.

MID+formula+excel

The beginning of our string has the name of the school, ACME. The end of our string has the campus location (CA). We need to pull out just the major from this string.

Once we’ve picked the cell (J2), for the MID formula we have to specify the starting point of the formula. Since we know that “ACME_” is the beginning of every entry (total of 5 characters), we can start our MID formula at character number 6. The last part of the mid formula is telling the formula how MANY characters to pull. Although we have a consistent beginning to our string, the length of the major name varies from student to student. ‘Philosophy’ is a different number of characters as compared to ‘English’…

MID+excel

Since we know that every entry ends in “_CA”, we have a consistent number of characters at the beginning and end of our string. The beginning has 5 characters. The end has 3 characters. Therefore, if we calculate the LENGTH of the string and subtract 8, we’ll end up with the perfect number of characters for our MID formula. Therefore, our final formula for cell K2 (shown in the pic above) is:

=MID(J2,6,LEN(J2)-8)

With our MID formula calculated for every student, we can calculate the total MAJOR count for each student using COUNTIF. It looks like Computer Science takes the top spot:

MID+formula+Excel (1)