COMP 10024 – Regular Expressions – The Art of Pattern Matching

Published April 6, 2026, 4:59 pm

A regular expression (or regex) is a specialized language for describing search patterns. A simple search looks for one specific word, but a regex can look for “any word that starts with a capital letter and ends in a number.” Regexes are used extensively in tools such as grep, sed, and vi, and in programming languages such as Python and Perl.


1. Literal Matching

The simplest form of a regular expression is just a literal string of characters. This is what you do when you use a standard “Find” command.

Regex   Matches
the     “the” in “this is the sample”
is      “is” in “this is nice” (twice: once inside “this” and once as the word “is”)
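A literal regex matches wherever its characters appear, even inside a longer word. The sketch below makes the two matches of “is” visible. It assumes a grep that supports the -o option (print only the matched text), which GNU and BSD grep provide but POSIX does not require.

```shell
# Print each match of the literal pattern "is" on its own line.
# -o (only-matching) is a GNU/BSD extension, not a POSIX option.
matches=$(printf 'this is nice\n' | grep -o 'is')
printf '%s\n' "$matches"

# Count the matches: one inside "this", one for the word "is".
match_count=$(printf '%s\n' "$matches" | grep -c 'is')
echo "match count: $match_count"
```

By contrast, “the” occurs only once in “this is the sample”, because “this” does not contain those three letters in a row.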

2. Anchors: Defining the Boundary

By default, a regex matches a pattern anywhere on a line. Anchors restrict the match to the very beginning or the very end of the line.

  • The Caret (^): Anchors the match to the start of the line.
  • The Dollar Sign ($): Anchors the match to the end of the line.
# Matches lines that START with "The"
grep "^The" myfile.txt

# Matches lines that END with "done"
grep "done$" myfile.txt

# Matches lines that contain ONLY the word "exit"
grep "^exit$" myfile.txt
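To try the anchors without a real myfile.txt, the sketch below feeds grep a few made-up lines (the sample text is hypothetical). Note that ^The also matches “Then”: an anchor fixes where the match starts or ends, but says nothing about word boundaries.

```shell
# Hypothetical sample lines standing in for myfile.txt.
sample='The cat sat
All done
exit
please exit now
Then we were done'

# Lines that START with "The" (this also catches "Then ...").
starts_with_the=$(printf '%s\n' "$sample" | grep '^The')

# Lines that END with "done".
ends_with_done=$(printf '%s\n' "$sample" | grep 'done$')

# Lines consisting of exactly "exit" and nothing else.
only_exit=$(printf '%s\n' "$sample" | grep '^exit$')

echo "Start with The:"; printf '%s\n' "$starts_with_the"
echo "End with done:";  printf '%s\n' "$ends_with_done"
echo "Only exit:";      printf '%s\n' "$only_exit"
```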

3. The Wildcard and Escaping

The dot (.) is the “universal character”: it matches exactly one instance of any character (letter, number, space, or symbol) except a newline.

What if you want to find a literal dot? To search for a period, a dollar sign, or another special character without its “magic” meaning, escape it with a backslash (\).

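# Matches "cat", "cot", "c9t", "c t"
grep "c.t" myfile.txt

# Matches a literal period (the backslash turns off the wildcard)
grep '\.' myfile.txt

# Matches a literal dollar sign. Use single quotes: inside double
# quotes the shell turns \$ into a bare $, the end-of-line anchor.
grep '\$' prices.txt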
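The sketch below checks the wildcard and escaping rules on made-up input lines (all sample data is hypothetical). It also shows why quoting matters: inside double quotes the shell itself converts \$ into a plain $, so grep receives the end-of-line anchor and matches every line, while single quotes pass the backslash through to grep unchanged.

```shell
# Hypothetical words to test the "c.t" wildcard against.
words='cat
cot
c9t
c t
ct
coat'
# "." matches exactly one character, so "ct" (none) and "coat" (two) are rejected.
dot_count=$(printf '%s\n' "$words" | grep -c 'c.t')

# Wildcard dot versus literal dot.
versions='v1.0
v100'
wild_dot=$(printf '%s\n' "$versions" | grep -c 'v1.0')      # "." also matches the "0" in v100
literal_dot=$(printf '%s\n' "$versions" | grep -c 'v1\.0')  # only the real period matches

# Literal dollar sign: quoting decides what grep actually receives.
prices='Widget $5
Gadget 7 dollars'
single_quoted=$(printf '%s\n' "$prices" | grep -c '\$')  # grep receives \$ (literal dollar)
double_quoted=$(printf '%s\n' "$prices" | grep -c "\$")  # grep receives $ (end-of-line anchor)

echo "c.t matches: $dot_count"
echo "wildcard dot: $wild_dot, escaped dot: $literal_dot"
echo "single-quoted dollar: $single_quoted, double-quoted dollar: $double_quoted"
```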

4. Ranges and Sets ([ ])

Square brackets define a Character Class. They tell the regex: “Match exactly ONE character at this position, as long as it is inside these brackets.”

  • [aeiou] : Matches any single vowel.
  • [0-9] : The dash indicates a range; this matches any single digit.
  • [A-Z] : Matches any uppercase letter.
  • [a-zA-Z0-9] : Matches any single alphanumeric character.
# Match "meat" or "meet"
grep "me[ea]t" recipes.txt

# Match any line containing three digits in a row (longer numbers such as 2026 match too)
grep "[0-9][0-9][0-9]" data.txt
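Each bracket expression consumes exactly one character. The sketch below, on hypothetical sample lines, shows that, and also that an unanchored [0-9][0-9][0-9] finds three digits anywhere on a line; adding ^ and $ limits it to lines that are exactly a three-digit number.

```shell
# Hypothetical sample lines.
foods='meat
meet
melt
met'
# [ea] matches one character, so only "meat" and "meet" qualify.
me_count=$(printf '%s\n' "$foods" | grep -c 'me[ea]t')

numbers='42
123
2026
abc'
# Unanchored: any line containing three digits in a row (123 and 2026).
three_in_a_row=$(printf '%s\n' "$numbers" | grep -c '[0-9][0-9][0-9]')
# Anchored at both ends: lines that are exactly a three-digit number.
exactly_three=$(printf '%s\n' "$numbers" | grep -c '^[0-9][0-9][0-9]$')

echo "me[ea]t: $me_count"
echo "three digits in a row: $three_in_a_row, exactly three digits: $exactly_three"
```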

Lab: The Regex Detective

In this lab, you will use grep and regular expressions to find specific patterns within your system files.

Task 1: Exploring the Dictionary

  1. Many UNIX-like systems include a word list at /usr/share/dict/words (on some Linux distributions it comes from an optional package you may need to install). We will use this as our playground.
  2. Find all words that start with “pre”:

    grep "^pre" /usr/share/dict/words
  3. Find all words that end with “ing”:

    grep "ing$" /usr/share/dict/words
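If /usr/share/dict/words is not available, the same two patterns can be tried on a small stand-in list. The words below are made up for illustration, chosen so that each anchor rejects a word that contains the text in the wrong place.

```shell
# Hypothetical stand-in for /usr/share/dict/words.
dict='prefix
prepare
repress
singing
ring
kingdom'

# "^pre": "repress" contains "pre" but not at the start, so it is excluded.
pre_words=$(printf '%s\n' "$dict" | grep '^pre')
# "ing$": "kingdom" contains "ing" but not at the end, so it is excluded.
ing_words=$(printf '%s\n' "$dict" | grep 'ing$')

printf '%s\n' "$pre_words" "$ing_words"
```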

Task 2: Using the Wildcard

  1. Find all 4-letter words that start with “b” and end with “d”:

    grep "^b..d$" /usr/share/dict/words
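  2. How many 5-character entries does the dictionary contain? (The dot matches any character, so entries containing apostrophes or other symbols are counted too.)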

    grep "^.....$" /usr/share/dict/words | wc -l
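The sketch below runs both Task 2 patterns on a hypothetical stand-in list. It includes one possessive-style entry to show that the dot counts an apostrophe as a character, and compares piping to wc -l with grep's own -c (count matching lines) option. The tr step only strips the padding some wc implementations add.

```shell
# Hypothetical stand-in word list; "Abe's" mimics a possessive entry.
dict="bird
bold
bread
bed
band
brand
Abe's"

# Four characters, starting with "b" and ending with "d".
b_d_words=$(printf '%s\n' "$dict" | grep '^b..d$')

# Exactly five characters of any kind: "bread", "brand", and "Abe's".
five_via_wc=$(printf '%s\n' "$dict" | grep '^.....$' | wc -l | tr -d ' ')
# grep -c counts matching lines directly and gives the same number.
five_via_c=$(printf '%s\n' "$dict" | grep -c '^.....$')

printf '%s\n' "$b_d_words"
echo "five-character entries: $five_via_wc (wc -l), $five_via_c (grep -c)"
```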

Task 3: Numeric Ranges

  1. Look at your /etc/passwd file, which contains system user information.
  2. Find any lines that contain a 3-digit number:

    grep "[0-9][0-9][0-9]" /etc/passwd
  3. Find lines that start with the letter “a”, “b”, or “c”:

    grep "^[abc]" /etc/passwd
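Real /etc/passwd contents differ from machine to machine, so the sketch below uses a few hypothetical passwd-style lines (fields: name, password placeholder, UID, GID, comment, home directory, shell). Because the three-digit pattern is unanchored, a UID such as 1000 matches as well as 999.

```shell
# Hypothetical /etc/passwd-style lines (name:password:UID:GID:comment:home:shell).
passwd='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
avahi:x:70:70::/var/run/avahi:/usr/sbin/nologin
carol:x:999:999::/home/carol:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/sh'

# Lines containing three digits in a row (here: UIDs of 100 or more).
three_digit=$(printf '%s\n' "$passwd" | grep -c '[0-9][0-9][0-9]')
# Lines whose first character is a, b, or c.
abc_start=$(printf '%s\n' "$passwd" | grep -c '^[abc]')

echo "lines with three digits in a row: $three_digit"
echo "lines starting with a, b, or c: $abc_start"
```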

Task 4: Escaping Special Characters

  1. Create a file named test.txt and add these lines:
    The cost is $5.00
    Is it raining?
    The end.
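  2. Try to find the question mark: grep '?' test.txt (in grep's default basic syntax a plain ? is already literal; with grep -E it is special, so escape it: grep -E '\?' test.txt)
  3. Try to find the lines ending in a period: grep '\.$' test.txt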
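Task 4 can be scripted as below. This sketch creates test.txt in the current directory (overwriting any existing file with that name) and, for comparison, also runs the unescaped period, which matches any final character.

```shell
# Step 1: create test.txt with the three sample lines (overwrites any existing file).
printf '%s\n' 'The cost is $5.00' 'Is it raining?' 'The end.' > test.txt

# Step 2: in basic regex syntax "?" is literal; with -E it must be escaped.
q_basic=$(grep -c '?' test.txt)
q_extended=$(grep -E -c '\?' test.txt)

# Step 3: "\." is a literal period and "$" anchors it to the end of the line.
period_end=$(grep -c '\.$' test.txt)
# Without the backslash, "." matches any last character, so every line matches.
any_end=$(grep -c '.$' test.txt)

echo "question mark: $q_basic (basic), $q_extended (grep -E)"
echo "ending in a period: $period_end, ending in any character: $any_end"
```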

Summary Challenge

Construct a single grep command that finds all words in the dictionary that meet these three criteria:

  • Start with a capital “S”
  • Are exactly 6 characters long
  • End with the letter “y”


This post was written by amax
