Regex Cheat Sheet: Common Patterns & Quick Reference
Regular expressions are one of the most powerful tools in a developer's toolkit, yet they remain intimidating for many programmers. This comprehensive cheat sheet breaks down regex patterns into digestible sections with practical examples you can use immediately.
Whether you're validating email addresses, parsing log files, or cleaning up messy data, this guide will help you write better regex patterns faster. We'll cover everything from basic character matching to advanced lookaround assertions.
Regex Basics
Regular expressions (regex or regexp) are patterns used to match character combinations in strings. They're supported in virtually every programming language—JavaScript, Python, Java, PHP, Ruby, Go, and more—as well as text editors like VS Code, Sublime Text, and command-line tools like grep and sed.
At their core, regex patterns consist of two types of characters: literal characters that match themselves exactly, and metacharacters that have special meanings and define matching rules.
The simplest regex is a literal string. The pattern hello matches the text "hello" exactly wherever it appears. But the real power comes from metacharacters that add flexibility—like matching any digit, repeating patterns, or anchoring to specific positions.
Pro tip: Use our Regex Tester to experiment with patterns in real time. You'll see matches highlighted instantly as you type, making it much easier to understand how patterns work.
Literal Characters vs Metacharacters
Most characters in a regex pattern are literal—they match themselves. The pattern cat matches the letters c, a, and t in that exact sequence. However, certain characters have special meanings:
- . ^ $ * + ? { } [ ] \ | ( ) are metacharacters
- To match these literally, escape them with a backslash: \. matches a period
- Inside character classes [], most metacharacters lose their special meaning
For example, example\.com matches "example.com" literally, while example.com would match "exampleXcom" because the unescaped dot matches any character.
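The difference between an escaped and an unescaped dot can be checked directly. The snippet below is a minimal sketch that assumes Python's re module as the engine; the variable names are illustrative only.

```python
import re

loose = re.compile(r"example.com")    # unescaped dot: matches any character
strict = re.compile(r"example\.com")  # escaped dot: matches a literal period only

loose_hit = loose.search("exampleXcom") is not None
strict_hit = strict.search("exampleXcom") is not None
print(loose_hit, strict_hit)  # True False

# re.escape backslash-escapes the metacharacters in arbitrary literal text.
literal = re.escape("1+1=2?")
print(re.fullmatch(literal, "1+1=2?") is not None)  # True
```

When the text to match comes from user input, escaping it programmatically is usually safer than escaping by hand.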
Character Classes
Character classes let you match one character from a set of possibilities. They're the foundation of flexible pattern matching and come in two forms: predefined shorthand classes and custom bracket expressions.
| Pattern | Matches | Example |
|---|---|---|
| . | Any character except newline | h.t → hat, hot, hit, h@t |
| \d | Any digit [0-9] | \d{3} → 123, 456, 789 |
| \D | Any non-digit | \D+ → abc, xyz, @#$ |
| \w | Word character [a-zA-Z0-9_] | \w+ → hello_world, var123 |
| \W | Non-word character | \W → @, #, space, punctuation |
| \s | Whitespace (space, tab, newline) | \s+ → any whitespace sequence |
| \S | Non-whitespace | \S+ → any visible characters |
| [abc] | Any of a, b, or c | [aeiou] → any vowel |
| [^abc] | Not a, b, or c | [^0-9] → any non-digit |
| [a-z] | Range: a through z | [A-Za-z] → any letter |
| [a-z0-9] | Multiple ranges | [a-fA-F0-9] → hex digits |
Custom Character Classes
Bracket expressions [] let you define your own character sets. Inside brackets, most metacharacters lose their special meaning, so you don't need to escape them. The exceptions are ], \, a ^ in first position, and a - between two characters.
- [aeiou] matches any single vowel
- [0-9] matches any digit (equivalent to \d)
- [a-zA-Z] matches any letter, uppercase or lowercase
- [^0-9] matches anything except digits (the ^ negates the class)
- [a-z-] matches lowercase letters or a hyphen (hyphen at the end is literal)
Quick tip: The order of characters in a character class doesn't matter. [abc] and [bca] are identical. The class matches a single character that is any one of those listed.
Practical Examples
Here are some real-world uses of character classes:
- [A-Z][a-z]+ matches capitalized words like "Hello" or "World"
- \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} matches IPv4-style addresses (basic pattern; it checks the shape only, not the 0-255 range of each part)
- [a-fA-F0-9]{6} matches hex color codes like "FF5733"
- [^\s]+ matches any sequence of non-whitespace (a "word" in the broadest sense)
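The patterns above can be tried against a sample string. This is a sketch using Python's re module; the sample text is made up for illustration.

```python
import re

text = "Hello from 192.168.1.10, color FF5733 by World"

capitalized = re.findall(r"[A-Z][a-z]+", text)
ips = re.findall(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", text)
hex_codes = re.findall(r"[a-fA-F0-9]{6}", text)
tokens = re.findall(r"[^\s]+", "a  b\tc")

print(capitalized)  # ['Hello', 'World']
print(ips)          # ['192.168.1.10']
print(hex_codes)    # ['FF5733']
print(tokens)       # ['a', 'b', 'c']

# The basic IP pattern checks shape only, so out-of-range parts still pass.
shape_only = re.fullmatch(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", "999.999.999.999")
print(shape_only is not None)  # True
```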
Quantifiers and Repetition
Quantifiers specify how many times a pattern should repeat. They're placed after the element you want to repeat—a character, character class, or group.
| Quantifier | Meaning | Example |
|---|---|---|
| * | 0 or more times | ab*c → ac, abc, abbc, abbbc |
| + | 1 or more times | ab+c → abc, abbc (not ac) |
| ? | 0 or 1 time (optional) | colou?r → color, colour |
| {n} | Exactly n times | \d{4} → 2026, 1999 |
| {n,} | n or more times | \w{3,} → words with 3+ chars |
| {n,m} | Between n and m times | \d{2,4} → 12, 123, 1234 |
| *? | Lazy/minimal match (0 or more) | <.*?> → first tag only |
| +? | Lazy/minimal match (1 or more) | ".+?" → first quoted string |
| ?? | Lazy/minimal match (0 or 1) | \d?? → matches 0 digits if possible |
Greedy vs Lazy Matching
This is one of the most important concepts in regex. By default, quantifiers are greedy—they match as much text as possible while still allowing the overall pattern to match.
Consider the HTML string <b>bold</b> and <i>italic</i>:
- <.*> (greedy) matches the entire string from the first < to the last >
- <.*?> (lazy) matches just <b>, then </b>, then <i>, then </i> separately
Adding ? after a quantifier makes it lazy (also called non-greedy or minimal). It matches as little text as possible while still allowing the pattern to succeed.
Pro tip: When extracting content between delimiters (quotes, tags, brackets), almost always use lazy quantifiers. The pattern ".*?" correctly extracts individual quoted strings, while ".*" would match from the first quote to the last quote in the entire text.
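The greedy and lazy behavior described above can be reproduced as follows. This is a small sketch assuming Python's re module; the quoted sample string is invented for the example.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.*>", html)
lazy = re.findall(r"<.*?>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

quoted = 'say "hi" then "bye"'
lazy_quotes = re.findall(r'".*?"', quoted)
greedy_quotes = re.findall(r'".*"', quoted)
print(lazy_quotes)    # ['"hi"', '"bye"']
print(greedy_quotes)  # ['"hi" then "bye"']
```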
Common Quantifier Patterns
Here are patterns you'll use constantly:
- \d+ matches one or more digits (numbers like 42, 1000, 7)
- \w+ matches one or more word characters (identifiers, variable names)
- \s* matches optional whitespace (zero or more spaces/tabs)
- .+? matches any characters lazily (content between markers)
- [a-z]{2,} matches words with at least 2 lowercase letters
- \d{3}-\d{3}-\d{4} matches phone numbers like 555-123-4567
Anchors and Boundaries
Anchors don't match characters—they match positions in the string. They're essential for ensuring patterns match at specific locations rather than anywhere in the text.
| Anchor | Position | Example |
|---|---|---|
| ^ | Start of string (or line with m flag) | ^Hello → matches "Hello world" but not "Say Hello" |
| $ | End of string (or line with m flag) | end$ → matches "The end" but not "end of story" |
| \b | Word boundary | \bcat\b → matches "cat" but not "category" |
| \B | Not a word boundary | \Bcat\B → matches "concatenate" but not "cat" |
| \A | Start of string (never line) | Like ^ but ignores multiline mode (not available in JavaScript) |
| \Z | End of string (never line) | Like $ but ignores multiline mode (not available in JavaScript; in PCRE, Java, and .NET it also matches just before a final newline) |
Word Boundaries Explained
The word boundary \b is incredibly useful but often misunderstood. It matches the position between a word character (\w) and a non-word character (\W), or at the start or end of the string when the adjacent character is a word character.
Consider the pattern \bcat\b applied to different strings:
- "the cat sat" → matches (cat is surrounded by spaces)
- "category" → no match (cat is followed by word character 'e')
- "concatenate" → no match (cat is preceded and followed by word characters)
- "cat" → matches (cat is at start and end of string)
- "cat!" → matches (cat is followed by punctuation, a non-word character)
This makes \b perfect for finding whole words without accidentally matching parts of larger words.
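The five cases listed above can be verified in one pass. A minimal sketch, assuming Python's re module:

```python
import re

whole_word = re.compile(r"\bcat\b")
samples = ["the cat sat", "category", "concatenate", "cat", "cat!"]

# Map each sample to whether the whole word "cat" occurs in it.
results = {s: whole_word.search(s) is not None for s in samples}
for sample, found in results.items():
    print(f"{sample!r}: {found}")
```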
Start and End Anchors
The ^ and $ anchors are essential for validation. When you want to ensure an entire string matches a pattern (not just contains it), wrap your pattern with these anchors.
- ^\d+$ ensures the entire string is digits (validates numeric input)
- ^[A-Z] ensures the string starts with an uppercase letter
- [.!?]$ ensures the string ends with punctuation
- ^https?:// ensures a URL starts with http:// or https://
Quick tip: When validating user input (email, phone, username), always use ^ and $ to anchor your pattern. Without them, the pattern \d{3} would accept "abc123def" when you probably want to reject anything that isn't exactly 3 digits.
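The effect of anchoring can be seen in a short sketch. It assumes Python's re module, where one engine-specific detail applies: $ also matches just before a trailing newline, so re.fullmatch is the stricter choice for validation there.

```python
import re

unanchored = re.search(r"\d{3}", "abc123def") is not None
anchored = re.search(r"^\d{3}$", "abc123def") is not None
exact = re.search(r"^\d{3}$", "123") is not None
print(unanchored, anchored, exact)  # True False True

# Python-specific caveat: $ tolerates one trailing newline, fullmatch does not.
dollar_newline = re.search(r"^\d{3}$", "123\n") is not None
fullmatch_newline = re.fullmatch(r"\d{3}", "123\n") is not None
print(dollar_newline, fullmatch_newline)  # True False
```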
Groups and Capturing
Parentheses () serve two purposes in regex: they group parts of a pattern together, and they capture the matched text for later use. This is where regex becomes truly powerful for extraction and transformation.
| Syntax | Purpose | Example |
|---|---|---|
| (abc) | Capturing group | (\d{3})-(\d{4}) captures the two digit groups of a number like 555-1234 |
| (?:abc) | Non-capturing group | (?:https?://)?example\.com groups without capturing |
| (a\|b) | Alternation (OR) | (cat\|dog) matches either "cat" or "dog" |
| \1 | Backreference to group 1 | (\w+)\s+\1 matches repeated words like "the the" |
| (?<name>abc) | Named capturing group | (?<year>\d{4})-(?<month>\d{2}) for dates |
Capturing Groups
When you wrap part of a pattern in parentheses, the regex engine captures the matched text. You can then reference these captures in your code or even within the regex itself using backreferences.
For example, the pattern (\d{3})-(\d{3})-(\d{4}) applied to "555-123-4567" creates three captures:
- Group 1: "555"
- Group 2: "123"
- Group 3: "4567"
In most programming languages, you can access these captures through match objects or replacement strings. This lets you reformat data easily—turning "555-123-4567" into "(555) 123-4567" with a replacement like ($1) $2-$3.
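As a sketch of how captures are read and reused, the example below assumes Python's re module. Two syntax differences from the text apply there: replacement strings use \1 instead of $1, and named groups are written (?P<name>...) instead of (?<name>...).

```python
import re

phone = re.compile(r"(\d{3})-(\d{3})-(\d{4})")

match = phone.search("555-123-4567")
parts = match.groups()
print(parts)  # ('555', '123', '4567')

# Reuse the captures in a replacement string (\1 is Python's spelling of $1).
formatted = phone.sub(r"(\1) \2-\3", "555-123-4567")
print(formatted)  # (555) 123-4567

# Named groups, in Python's (?P<name>...) spelling.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "2026-03")
fields = date.groupdict()
print(fields)  # {'year': '2026', 'month': '03'}
```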
Non-Capturing Groups
Sometimes you need grouping for quantifiers or alternation but don't need to capture the text. Use (?:...) for better performance and cleaner capture numbering.
Compare these patterns:
- (https?)://([\w.]+) creates two captures: protocol and domain
- (?:https?)://([\w.]+) creates one capture: just the domain
The second pattern is more efficient and makes your captures easier to work with since you don't have to skip over groups you don't care about.
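The difference in capture numbering shows up immediately when the groups are inspected. A minimal sketch, assuming Python's re module and an illustrative URL:

```python
import re

url = "https://example.com"

capturing = re.search(r"(https?)://([\w.]+)", url).groups()
non_capturing = re.search(r"(?:https?)://([\w.]+)", url).groups()

print(capturing)      # ('https', 'example.com')
print(non_capturing)  # ('example.com',)
```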
Alternation and OR Logic
The pipe | inside a group creates an OR condition. The pattern (cat|dog|bird) matches any of those three words.
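Important: the engine tries alternatives left to right and takes the first one that lets the overall pattern succeed. In an unanchored search, (cat|category) applied to "category" matches only "cat", because "cat" succeeds first and the longer alternative is never tried. Put longer alternatives first: (category|cat).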
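The ordering effect, and the way anchoring changes it, can be sketched as follows (assuming Python's re module). When the whole string must match, the engine backtracks into the second alternative, so order matters mainly for unanchored searches.

```python
import re

short_first = re.search(r"cat|category", "category").group()
long_first = re.search(r"category|cat", "category").group()
print(short_first)  # cat
print(long_first)   # category

# With a full-string match the first alternative fails at the end of "cat",
# so the engine backtracks and tries "category".
full = re.fullmatch(r"cat|category", "category").group()
print(full)  # category
```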
Pro tip: Use our Text Replacer tool to test regex replacements with capturing groups. You can see exactly how your captures are being used in the replacement string.
Backreferences
Backreferences let you match the same text that was captured earlier in the pattern. This is perfect for finding repeated words, matching paired delimiters, or validating consistent formatting.
- (\w+)\s+\1 matches repeated words like "the the" or "hello hello"
- (['"])(.*?)\1 matches quoted strings with matching quotes
- <(\w+)>.*?</\1> matches HTML tags with matching open/close tags
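Each of the three backreference patterns can be exercised on a small input. This is a sketch assuming Python's re module; the sample strings are invented for the example.

```python
import re

# Repeated word: \1 must equal whatever group 1 captured.
repeated = re.search(r"(\w+)\s+\1", "it was the the best").group()
print(repeated)  # the the

# Matching quotes: the closing quote must be the same character as the opening one.
# findall returns one (quote, content) tuple per match.
quotes = re.findall(r"""(['"])(.*?)\1""", '''say 'hi' and "it's ok"''')
print(quotes)  # [("'", 'hi'), ('"', "it's ok")]

# Matching tags: the closing tag name must repeat the opening tag name.
element = re.search(r"<(\w+)>.*?</\1>", "<b>bold</b> and <i>italic</i>").group()
print(element)  # <b>bold</b>
```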
Lookahead and Lookbehind
Lookaround assertions are zero-width—they match a position without consuming characters. They let you check what comes before or after a position without including it in the match.
| Syntax | Type | Meaning |
|---|---|---|
| (?=...) | Positive lookahead | Matches if followed by pattern |
| (?!...) | Negative lookahead | Matches if NOT followed by pattern |
| (?<=...) | Positive lookbehind | Matches if preceded by pattern |
| (?<!...) | Negative lookbehind | Matches if NOT preceded by pattern |
Lookahead Examples
Positive lookahead (?=...) checks that a pattern follows the current position:
- \d+(?= dollars) matches numbers followed by " dollars" but doesn't include " dollars" in the match
- ^(?=.*\d)(?=.*[A-Z]).+$ validates that a string (such as a password) contains at least one digit and one uppercase letter
Negative lookahead (?!...) checks that a pattern does NOT follow:
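- \d+(?! dollars) matches numbers NOT followed by " dollars" (note that on "100 dollars" it still matches "10" by backtracking; \d+(?!\d| dollars) avoids that)
- ^(?!.*password).*$ matches strings that don't contain "password"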
Lookbehind Examples
Positive lookbehind (?<=...) checks that a pattern precedes the current position:
- (?<=\$)\d+ matches numbers preceded by a dollar sign, without including the $
- (?<=@)\w+ matches usernames after @ symbols
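Negative lookbehind (?<!...) checks that a pattern does NOT precede:
- (?<!un)happy matches "happy" but not "unhappy"
- (?<!//.*)\bfunction\b matches the "function" keyword when it is not inside a // comment (this needs variable-length lookbehind, which JavaScript and .NET support but engines such as Python's re and PCRE reject)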
Quick tip: Lookaround assertions are powerful but can be confusing. Remember: they check conditions without moving the match position forward. Think of them as "peek ahead" or "peek behind" operations.
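The lookaround examples can be checked with a short sketch, assuming Python's re module. It also shows two things that are easy to miss: a negative lookahead after \d+ can still match part of a number by backtracking, and Python's re rejects the variable-length lookbehind pattern at compile time.

```python
import re

amounts = re.findall(r"\d+(?= dollars)", "100 dollars and 50 euros")
prices = re.findall(r"(?<=\$)\d+", "cost: $25 or 30")
moods = re.findall(r"(?<!un)happy", "happy unhappy")
print(amounts, prices, moods)  # ['100'] ['25'] ['happy']

# Backtracking pitfall: "10" is a number not followed by " dollars".
partial = re.findall(r"\d+(?! dollars)", "100 dollars")
# Forbidding a following digit as well closes the gap.
guarded = re.findall(r"\d+(?!\d| dollars)", "100 dollars and 50 euros")
print(partial, guarded)  # ['10'] ['50']

# Python's re only accepts fixed-width lookbehind.
try:
    re.compile(r"(?<!//.*)\bfunction\b")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False
print(variable_lookbehind_ok)  # False
```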
Password Validation with Lookahead
One of the most practical uses of lookahead is password validation. You can check multiple requirements without complex logic:
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
This pattern ensures:
- At least one lowercase letter: (?=.*[a-z])
- At least one uppercase letter: (?=.*[A-Z])
- At least one digit: (?=.*\d)
- At least one special character: (?=.*[@$!%*?&])
- Minimum 8 characters: {8,}
- Only letters, digits, and the listed special characters: [A-Za-z\d@$!%*?&]
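A sketch of the pattern in use, assuming Python's re module. The helper name is_valid_password and the sample passwords are hypothetical.

```python
import re

PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_valid_password(candidate: str) -> bool:
    """Return True when the candidate satisfies every rule in the pattern."""
    return PASSWORD.search(candidate) is not None

print(is_valid_password("Passw0rd!"))   # True
print(is_valid_password("password1!"))  # False: no uppercase letter
print(is_valid_password("Passw0rd"))    # False: no special character
print(is_valid_password("Pw0!"))        # False: shorter than 8 characters
print(is_valid_password("Passw0rd!#"))  # False: '#' is outside the allowed set
```

The last case is worth noting: the final character class acts as an allow-list, so any character outside it causes rejection even when all four lookaheads pass.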
Regex Flags and Modifiers
Flags modify how the regex engine interprets your pattern. They're typically added after the closing delimiter in languages like JavaScript (/pattern/flags) or as parameters in function calls.
| Flag | Name | Effect |
|---|---|---|
| i | Case-insensitive | Makes pattern match regardless of case |
| g | Global | Finds all matches, not just the first |
| m | Multiline | Makes ^ and $ match line boundaries |
| s | Dotall | Makes . match newlines too |
| u | Unicode | Enables full Unicode support |
| x | Extended | Allows whitespace and comments in pattern |
Case-Insensitive Flag (i)
The i flag makes your pattern match regardless of letter case. Without it, hello only matches "hello" exactly. With it, the pattern matches "hello", "Hello", "HELLO", "HeLLo", etc.
This is essential for user input validation where you don't want to force specific capitalization. For example, /^yes$/i accepts "yes", "Yes", "YES", or any other case variation.
Global Flag (g)
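By default, a regex in JavaScript or Perl-style syntax stops after finding the first match. The g flag tells the engine to find all matches in the string. Other languages expose the same choice through separate functions rather than a flag.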
This is crucial for operations like "find and replace all" or counting occurrences. In JavaScript, string.match(/\d+/g) returns an array of all number sequences, while without g it returns only the first match.
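For comparison, here is a sketch of the same two behaviors in Python's re module, which is an assumption of this example. Python has no g flag: re.search returns the first match and re.findall returns all of them, while case-insensitivity is requested with re.IGNORECASE.

```python
import re

# Case-insensitive: the equivalent of /^yes$/i.
accepted = re.fullmatch(r"yes", "YES", flags=re.IGNORECASE) is not None
print(accepted)  # True

# "Global" behavior is a function choice rather than a flag.
first = re.search(r"\d+", "a1 b22 c333").group()
every = re.findall(r"\d+", "a1 b22 c333")
print(first)  # 1
print(every)  # ['1', '22', '333']
```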
Multiline Flag (m)
Normally, ^ and $ match the start and end of the entire string. With the m flag, they match the start and end of each line within the string.
This is useful for processing multi-line text like log files or CSV data. The pattern /^ERROR/m matches "ERROR" at the start of any line, not just the first line of the file.
Dotall Flag (s)
By default, the dot . matches any character except newlines. The s flag (also called "single-line" mode, confusingly) makes the dot match newlines too.
This is helpful when matching content that spans multiple lines, like HTML tags or multi-line comments: <div>.*?</div> with the s flag matches div elements even if they contain line breaks.
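The multiline and dotall flags can be compared side by side. This is a sketch assuming Python's re module, where the flags are spelled re.MULTILINE and re.DOTALL; the log and HTML strings are invented for the example.

```python
import re

log = "INFO start\nERROR disk full\nERROR timeout"

# Without the m flag, ^ only matches at the very start of the string.
plain = re.findall(r"^ERROR", log)
multiline = re.findall(r"^ERROR", log, flags=re.MULTILINE)
print(plain)      # []
print(multiline)  # ['ERROR', 'ERROR']

html = "<div>line one\nline two</div>"

# Without the s flag, the dot stops at the newline and the match fails.
no_dotall = re.search(r"<div>.*?</div>", html)
dotall = re.search(r"<div>.*?</div>", html, flags=re.DOTALL)
print(no_dotall)             # None
print(dotall.group() == html)  # True
```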
Common Patterns Library
Here's a collection of battle-tested regex patterns for common validation and extraction tasks. These patterns balance accuracy with practicality—perfect for most use cases.
Email Addresses
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
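This pattern handles most valid email addresses. It requires a local part (before @), a domain name, and a TLD of at least two letters. Note that fully RFC-compliant email validation is extremely complex. This pattern covers the vast majority of real-world addresses, but it also accepts some malformed ones, such as domains with consecutive dots.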
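A quick sketch of the email pattern in use, assuming Python's re module. The helper name is_email and the sample addresses are hypothetical; the last case shows a malformed address the pattern still accepts.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_email(candidate: str) -> bool:
    """Return True when the candidate has the shape the pattern expects."""
    return EMAIL.search(candidate) is not None

print(is_email("user.name+tag@example.co.uk"))  # True
print(is_email("no-at-sign.com"))               # False: missing @
print(is_email("user@localhost"))               # False: no dot and TLD
print(is_email("a@b..com"))                     # True: consecutive dots slip through
```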
URLs
^https?://[^\s/$.?#].[^\s]*$
Matches HTTP and HTTPS URLs. For more permissive matching (optional protocol), use:
^(?:https?://)?(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:/[^\s]*)?$
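Both URL patterns can be compared on a few inputs. This is a sketch assuming Python's re module, with made-up sample URLs. It also surfaces a limitation of the permissive pattern: it allows only one label before the TLD, apart from an optional www. prefix.

```python
import re

STRICT_URL = re.compile(r"^https?://[^\s/$.?#].[^\s]*$")
LOOSE_URL = re.compile(
    r"^(?:https?://)?(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:/[^\s]*)?$"
)

strict_ok = STRICT_URL.search("https://example.com/path?q=1") is not None
strict_ftp = STRICT_URL.search("ftp://example.com") is not None
print(strict_ok, strict_ftp)  # True False

loose_bare = LOOSE_URL.search("example.com/docs") is not None
loose_www = LOOSE_URL.search("www.example.com") is not None
# Limitation: a subdomain other than www. is rejected.
loose_sub = LOOSE_URL.search("sub.example.com") is not None
print(loose_bare, loose_www, loose_sub)  # True True False
```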
Phone Numbers
^\+?1?\s*\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$
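Because the pattern captures the three digit groups, it can normalize as well as validate. The sketch below assumes Python's re module; the helper name normalize_phone and the output format are hypothetical choices for illustration.

```python
import re
from typing import Optional

PHONE = re.compile(r"^\+?1?\s*\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_phone(raw: str) -> Optional[str]:
    """Return the number as NNN-NNN-NNNN, or None when it does not match."""
    match = PHONE.search(raw)
    if match is None:
        return None
    return "-".join(match.groups())

print(normalize_phone("(555) 123-4567"))   # 555-123-4567
print(normalize_phone("+1 555.123.4567"))  # 555-123-4567
print(normalize_phone("5551234567"))       # 555-123-4567
print(normalize_phone("555-1234"))         # None
```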