COMP 10024 – Regular Expressions – The Art of Pattern Matching
April 6, 2026 4:59 pm
A Regular Expression (or regex) is a specialized language used to describe search patterns. While a simple search looks for a specific word, a regex can look for “any word starting with a capital letter that ends in a number.” Regular expressions are used extensively in tools like grep, sed, and vi, and in programming languages like Python and Perl.
1. Literal Matching
The simplest form of a regular expression is just a literal string of characters. This is what you do when you use a standard “Find” command.
| Regex | Matches |
|---|---|
| the | Matches “the” in “this is the sample”. |
| is | Matches “is” in “this is nice”. |
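A literal pattern is found wherever its characters appear in sequence, even inside a longer word. The sketch below runs the second table row's sentence through grep to show this. It relies on grep's -o option (supported by GNU and BSD grep, though not required by POSIX), which prints each match on its own line instead of the whole line.

```shell
#!/bin/sh
# Literal matching: "is" matches wherever the characters i and s
# appear next to each other, including inside the word "this".
# -o prints every individual match on its own line.
matches=$(printf 'this is nice\n' | grep -o 'is')
echo "$matches"

# Count the matches: one inside "this", one for the word "is".
count=$(printf '%s\n' "$matches" | grep -c 'is')
echo "number of matches: $count"
```

Running it prints “is” twice, followed by “number of matches: 2”.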
2. Anchors: Defining the Boundary
By default, a regex will match a pattern anywhere on a line. Anchors allow you to restrict the match to the very beginning or the very end of a string.
- The Caret (^): Anchors the match to the start of the line.
- The Dollar Sign ($): Anchors the match to the end of the line.
# Matches lines that START with "The"
grep "^The" myfile.txt
# Matches lines that END with "done"
grep "done$" myfile.txt
# Matches lines that contain ONLY the word "exit"
grep "^exit$" myfile.txt
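To compare anchored and unanchored searches without a real myfile.txt, the sketch below pipes a few made-up sample lines (hypothetical content) into grep and counts the matching lines with grep -c.

```shell
#!/bin/sh
# Hypothetical sample lines standing in for myfile.txt.
sample='The start of a line
Not The start
we are done
done? not at the end
exit
exit now'

# Unanchored "The" matches two lines; anchored "^The" only the first.
plain_the=$(printf '%s\n' "$sample" | grep -c 'The')
start_the=$(printf '%s\n' "$sample" | grep -c '^The')
# Unanchored "done" matches two lines; "done$" only the line ending in it.
plain_done=$(printf '%s\n' "$sample" | grep -c 'done')
end_done=$(printf '%s\n' "$sample" | grep -c 'done$')
# "^exit$" matches only a line consisting of exactly "exit".
only_exit=$(printf '%s\n' "$sample" | grep -c '^exit$')

echo "The: $plain_the anywhere, $start_the at the start"
echo "done: $plain_done anywhere, $end_done at the end"
echo "exit alone on a line: $only_exit"
```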
3. The Wildcard and Escaping
The dot (.) is the “universal character”: it matches exactly one instance of any character (letter, number, space, or symbol) except a newline.
What if you want to find a literal dot? To search for a period or a dollar sign without triggering its “magic” meaning, escape it with a backslash (\). At the shell prompt, wrap such patterns in single quotes: inside double quotes, the shell itself removes the backslash in front of a $ before grep ever sees it.
# Matches "cat", "cot", "c9t", "c t"
grep "c.t" myfile.txt
# Matches a literal dollar sign (single quotes stop the shell from removing the backslash)
grep '\$' prices.txt
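The sketch below contrasts the wildcard dot with an escaped dot, and shows why the quoting around \$ matters at the shell prompt. The sample lines are made up for illustration; grep -c prints the number of matching lines.

```shell
#!/bin/sh
# Hypothetical sample lines; single quotes keep $5.00 literal.
sample='cat
c9t
c.t
The cost is $5.00
free'

# Unescaped dot: any single character between c and t (cat, c9t, c.t).
any=$(printf '%s\n' "$sample" | grep -c 'c.t')
# Escaped dot: only a literal period between c and t (c.t).
literal=$(printf '%s\n' "$sample" | grep -c 'c\.t')
# Escaped dollar in single quotes: only the line with a real "$".
dollar=$(printf '%s\n' "$sample" | grep -c '\$')
# Pitfall: in double quotes the shell turns \$ into $, so grep receives
# a bare end-of-line anchor, which matches every line.
every=$(printf '%s\n' "$sample" | grep -c "\$")

echo "c.t: $any, c\\.t: $literal, '\\\$': $dollar, \"\\\$\": $every"
```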
4. Ranges and Sets ([ ])
Square brackets define a Character Class. They tell the regex: “Match exactly ONE character at this position, as long as it is inside these brackets.”
- [aeiou]: Matches any single lowercase vowel.
- [0-9]: The dash indicates a range; this matches any single digit.
- [A-Z]: Matches any single uppercase letter.
- [a-zA-Z0-9]: Matches any single alphanumeric character.
# Match "meat" or "meet"
grep "me[ea]t" recipes.txt
# Match lines containing three digits in a row (this also matches inside longer numbers such as 12345)
grep "[0-9][0-9][0-9]" data.txt
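Here the two bracket-expression examples run against a few hypothetical lines standing in for recipes.txt and data.txt. The three-digit pattern also matches the longer number, because it only requires three digits in a row somewhere on the line.

```shell
#!/bin/sh
# Hypothetical lines standing in for recipes.txt and data.txt.
sample='meat
meet
mat
moot
id 42
id 123
zip 90210'

# [ea] matches exactly one character, e or a, at that position.
me_t=$(printf '%s\n' "$sample" | grep 'me[ea]t')
# Three bracket expressions in a row: three consecutive digits anywhere.
three=$(printf '%s\n' "$sample" | grep '[0-9][0-9][0-9]')

echo "$me_t"    # meat and meet, but not mat or moot
echo "$three"   # id 123 and zip 90210, but not id 42
```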
Lab: The Regex Detective
In this lab, you will use grep and regular expressions to find specific patterns within your system files.
Task 1: Exploring the Dictionary
- Most UNIX systems have a dictionary file at /usr/share/dict/words. We will use this as our playground.
- Find all words that start with “pre”:
  grep "^pre" /usr/share/dict/words
- Find all words that end with “ing”:
  grep "ing$" /usr/share/dict/words
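If /usr/share/dict/words is missing on your system, the same anchored patterns can be practiced on a small stand-in list. The word list below is hypothetical; it includes “apprentice” and “kingdom” to show that the anchors reject “pre” and “ing” when they appear mid-word.

```shell
#!/bin/sh
# Hypothetical stand-in for /usr/share/dict/words, one word per line.
words='apprentice
kingdom
prefix
preview
press
sing
singing
string'

# ^pre: "pre" must be the first three characters of the line.
pre_words=$(printf '%s\n' "$words" | grep '^pre')
# ing$: "ing" must be the last three characters of the line.
ing_words=$(printf '%s\n' "$words" | grep 'ing$')

echo "$pre_words"
echo "$ing_words"
```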
Task 2: Using the Wildcard
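- Find all 4-letter words that start with “b” and end with “d”:
  grep "^b..d$" /usr/share/dict/words
- How many 5-letter words exist in the dictionary? (Because the dot matches any character, entries that contain punctuation, such as an apostrophe, are counted too.)
  grep "^.....$" /usr/share/dict/words | wc -l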
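grep can also count matching lines itself with -c, which gives the same number as piping its output to wc -l. A minimal sketch on a hypothetical stand-in word list:

```shell
#!/bin/sh
# Hypothetical stand-in word list.
words='bad
band
bird
bold
brand
apple
hello
world'

# b, then exactly two characters, then d: four characters in total.
four_bd=$(printf '%s\n' "$words" | grep '^b..d$')
# Five dots between the anchors: lines exactly five characters long.
five_wc=$(printf '%s\n' "$words" | grep '^.....$' | wc -l)
five_c=$(printf '%s\n' "$words" | grep -c '^.....$')

echo "$four_bd"
echo "5-character words: $five_c (wc -l says $five_wc)"
```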
Task 3: Numeric Ranges
- Look at your /etc/passwd file, which contains system user information.
- Find any lines that contain three digits in a row (longer numbers, such as a user ID of 1000, also match):
  grep "[0-9][0-9][0-9]" /etc/passwd
- Find lines that start with the letter “a”, “b”, or “c”:
  grep "^[abc]" /etc/passwd
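Because /etc/passwd differs between machines, the sketch below runs the two Task 3 patterns on a few hypothetical lines in the same colon-separated format (name:password:UID:GID:comment:home:shell) and uses cut to print just the user names that matched.

```shell
#!/bin/sh
# Hypothetical /etc/passwd-style lines; real files will differ.
passwd='root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/sh
carol:x:1001:1001:Carol:/home/carol:/bin/sh
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin'

# Lines with three consecutive digits; cut keeps field 1 (the user name).
digit_users=$(printf '%s\n' "$passwd" | grep '[0-9][0-9][0-9]' | cut -d: -f1)
# Lines whose very first character is a, b, or c.
abc_users=$(printf '%s\n' "$passwd" | grep '^[abc]' | cut -d: -f1)

echo "$digit_users"
echo "$abc_users"
```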
Task 4: Escaping Special Characters
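- Create a file named test.txt and add these lines:
  The cost is $5.00
  Is it raining?
  The end.
- Try to find the question mark. In grep's default (basic) syntax, ? is an ordinary character, so it needs no backslash; only with grep -E must it be escaped as '\?':
  grep '?' test.txt
- Try to find the lines ending in a period:
  grep "\.$" test.txt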
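The sketch below builds test.txt with printf instead of an editor and runs the Task 4 searches. It works in a temporary directory created with mktemp -d (an assumption made so that an existing test.txt is not overwritten) and also shows the grep -E form, where ? is an operator and must be escaped.

```shell
#!/bin/sh
# Scratch directory so an existing test.txt is left alone.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Write the three lab lines; single quotes keep $5.00 literal.
printf '%s\n' 'The cost is $5.00' 'Is it raining?' 'The end.' > test.txt

# Basic syntax (grep's default): ? is an ordinary character.
question=$(grep '?' test.txt)
# Extended syntax (-E): ? is an operator, so escape it for a literal match.
question_ere=$(grep -E '\?' test.txt)
# \. is a literal period, and $ anchors it to the end of the line.
period=$(grep '\.$' test.txt)

echo "$question"
echo "$question_ere"
echo "$period"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```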
Summary Challenge
Construct a single grep command that finds all words in the dictionary that meet these three criteria:
- Start with a capital “S”
- Are exactly 6 characters long
- End with the letter “y”
Categorised in: COMP-10024, Lectures, Portfolio
This post was written by amax