What is Regex (Regular Expressions)?
A regular expression (regex) is a sequence of characters that defines a search pattern, used for matching, extracting, and manipulating text in strings.
Originally formalized by Stephen Kleene in 1951, regular expressions are supported in most programming languages, text editors, and command-line tools, though syntax varies between flavors like PCRE, ECMAScript, POSIX, and RE2.
Basic syntax
A regex pattern is built from literal characters and metacharacters:
- . matches any single character
- * means zero or more of the preceding element
- + means one or more
- ? means zero or one
- ^ anchors to the start of a string
- $ anchors to the end
- [abc] matches any one of a, b, or c
- \d matches any digit, \w matches word characters, \s matches whitespace
Pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Matches: user@example.com
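As a rough illustration, the pattern above can be tried with Python's built-in re module. The names EMAIL_RE and looks_like_email are illustrative only, and the pattern is a simplified check rather than a full email validator.

```python
import re

# Pattern copied from the text above; EMAIL_RE is an illustrative name.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def looks_like_email(text: str) -> bool:
    """Return True if text matches the simplified email pattern."""
    return EMAIL_RE.match(text) is not None


print(looks_like_email("user@example.com"))  # True
print(looks_like_email("user@example"))      # False: no dot plus 2+ letters at the end
print(looks_like_email("user example.com"))  # False: no @ sign
```

Each piece maps back to the list: the character classes in square brackets, + for "one or more", \. for a literal dot, and ^ and $ to anchor the whole string.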
Capture groups and backreferences
Parentheses () create capture groups that extract matched substrings. In the pattern (\d{4})-(\d{2})-(\d{2}) matched against 2026-03-21, group 1 captures 2026, group 2 captures 03, and group 3 captures 21.
Named groups like (?<year>\d{4}) make patterns more readable. Backreferences like \1 refer back to previously captured groups within the same pattern.
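A minimal sketch of groups and backreferences in Python follows. One flavor difference to note: Python's re module spells named groups (?P<name>...), while the (?<name>...) form shown above is the ECMAScript, PCRE, and .NET spelling. The variable names and the doubled-word sample are assumptions made for illustration.

```python
import re

# Python spells named groups (?P<name>...) rather than (?<name>...).
DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = DATE_RE.search("2026-03-21")
numbered = m.groups()     # ('2026', '03', '21')
named = m.groupdict()     # {'year': '2026', 'month': '03', 'day': '21'}

# Backreference: \1 must repeat exactly what group 1 captured,
# so this pattern finds a word that appears twice in a row.
DOUBLED_RE = re.compile(r"\b(\w+) \1\b")
doubled = DOUBLED_RE.search("this is is a test").group(1)  # 'is'

print(numbered, named, doubled)
```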
Lookaheads and lookbehinds
These are zero-width assertions — they check what’s around a match without including it:
- (?=...) positive lookahead: match only if followed by…
- (?!...) negative lookahead: match only if NOT followed by…
- (?<=...) positive lookbehind: match only if preceded by…
- (?<!...) negative lookbehind: match only if NOT preceded by…
Example: \d+(?=px) matches 16 in 16px but not 16 in 16em.
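The same idea can be sketched in Python, with the sample strings chosen purely for illustration. Note that the negative forms also exclude a neighboring digit; without that, the engine would backtrack and match part of a number (for example the 1 in 16px).

```python
import re

css = "16px 16em 24px"
# Positive lookahead: digits only when "px" follows; "px" is not in the match.
followed_by_px = re.findall(r"\d+(?=px)", css)          # ['16', '24']
# Negative lookahead: digits NOT followed by another digit or by "px".
not_followed_by_px = re.findall(r"\d+(?!\d|px)", css)   # ['16'] (from 16em)

prices = "cost $40, qty 3"
# Positive lookbehind: digits only when preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", prices)        # ['40']
# Negative lookbehind: digits NOT preceded by a dollar sign or another digit.
not_after_dollar = re.findall(r"(?<![\$\d])\d+", prices)  # ['3']

print(followed_by_px, not_followed_by_px, after_dollar, not_after_dollar)
```

Python's re module requires lookbehind patterns to have a fixed width; other flavors differ on this point.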
Where regex is used
- Validation: Email addresses, phone numbers, URLs, dates
- Search and replace: In code editors, sed, and string manipulation functions
- Log parsing: Extracting timestamps, error codes, IP addresses from log files
- Web scraping: Pulling structured data from HTML (though a proper parser is usually better)
- Routing: Web frameworks use regex patterns for URL routing
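Two of these uses, log parsing and search and replace, can be sketched briefly in Python. The log line and its format are hypothetical, invented here only to have something to match against.

```python
import re

# Hypothetical log line; the format is an assumption for illustration.
LOG_LINE = "2026-03-21 14:05:09 ERROR 192.168.0.12 code=E504 upstream timed out"

LOG_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"code=(?P<code>\w+)"
)

# Log parsing: pull the named fields out of the line.
fields = LOG_RE.match(LOG_LINE).groupdict()

# Search and replace: mask the last octet of the IP address.
masked = re.sub(r"(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}", r"\1.xxx", LOG_LINE)

print(fields)
print(masked)
```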
Common gotchas
Quantifiers are greedy by default: .* matches as much as possible. Append ? (as in .*?) for non-greedy (lazy) matching. In backtracking engines, certain patterns can take exponential time on crafted input; exploiting this is known as ReDoS (Regular Expression Denial of Service).
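The difference between greedy and lazy matching is easiest to see side by side. This is a small sketch in Python; the sample string is made up for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the end of the string, then backs up to the last ">".
greedy = re.findall(r"<.*>", html)   # ['<b>bold</b> and <i>italic</i>']

# Lazy: .*? stops at the first ">" that lets the match succeed.
lazy = re.findall(r"<.*?>", html)    # ['<b>', '</b>', '<i>', '</i>']

print(greedy)
print(lazy)
```

On the performance side, the classic shape to watch for is a quantifier nested inside another quantifier, such as (a+)+$, applied to input that almost matches.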
Test and debug patterns with the Regex Tester, reference syntax with the Regex Cheatsheet, or visualize pattern logic with the Regex Visualizer.