Regular Expressions (RegEx)
Various useful tools and resources for working with regular expressions.
Specifications
| Engine / Flavor | Origin / Common Use | Lookahead | Lookbehind | Backreferences | Named Groups | Unicode \p{} | Non-greedy Quantifiers | Guarantees Linear Time | Notable Traits |
|---|---|---|---|---|---|---|---|---|---|
| POSIX (BRE/ERE) | grep, sed, awk | ❌ | ❌ | ✅ (BRE limited) | ❌ | Partial | ❌ | ✅ (deterministic) | Classic Unix tools, minimal |
| PCRE | PHP, Perl, Apache | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | De facto standard, very expressive |
| Python re | Python | ✅ | ✅ (fixed width) | ✅ | ✅ `(?P<>)` | Partial (full via regex lib) | ✅ | ❌ | PCRE-like with some constraints |
| JavaScript (ECMAScript) | Web, Node.js | ✅ | ✅ (since ES2018) | ✅ | ✅ `(?<name>)` | ✅ (`\p{}` in ES2018+) | ✅ | ❌ | Minimal but modern |
| google/re2 | Go, Google infra | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | No backtracking, ultra fast |
| Java java.util.regex | JVM ecosystem | ✅ | ✅ (fixed width) | ✅ | ✅ | Partial | ✅ | ❌ | PCRE-ish, stable |
| .NET | C#, F#, etc. | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | Very feature-rich, balancing groups |
| Rust regex | Rust crate | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | Like RE2, safe & Unicode-first |
| PostgreSQL | SQL | ✅ (limited) | ✅ | ✅ | ❌ | Partial | ✅ | ❌ | POSIX-based |
| Oniguruma | Ruby | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | Advanced Unicode and named groups |
💡 Tips
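- Decide whether you want to "match" the whole string or "search" for a pattern inside it
- For matching, use `^` and `$` to anchor the pattern to the beginning and end of the string
  - Be careful with multi-line strings: in multi-line mode the anchors also match at line boundaries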
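As a small illustration of the tips above, the following sketch uses Python's `re` module (chosen here only as an example engine; other flavors differ in detail) to contrast matching with searching and to show how multi-line input changes what the anchors do. The sample string is made up for the demonstration.

```python
import re

text = "id=42\nname=alice"

# match() only tries at position 0, so digits later in the string are missed.
match_result = re.match(r"\d+", text)
# search() scans forward and returns the first occurrence anywhere.
search_result = re.search(r"\d+", text)
# fullmatch() succeeds only if the entire string matches the pattern.
full_result = re.fullmatch(r"\w+=\d+", "id=42")

# Without re.MULTILINE, ^ matches only at the very start of the string.
single_line_keys = re.findall(r"^\w+(?==)", text)
# With re.MULTILINE, ^ also matches right after each newline.
multi_line_keys = re.findall(r"^\w+(?==)", text, flags=re.MULTILINE)

# In Python, $ also matches just before a trailing newline.
dollar_allows_newline = re.search(r"^\d+$", "42\n") is not None
fullmatch_rejects_newline = re.fullmatch(r"\d+", "42\n") is None

print(match_result, search_result.group(), full_result.group())
print(single_line_keys, multi_line_keys)
print(dollar_allows_newline, fullmatch_rejects_newline)
```

For strict validation in Python, `re.fullmatch` is usually the safer choice, since `$` tolerates one trailing newline.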
Tools
Characters
| Character | RegEx | Notes |
|---|---|---|
| Lowercase letter | [a-z] | |
| Uppercase letter | [A-Z] | |
| Letter | [a-zA-Z] | |
| Digit | \d | Equal to [0-9] in ASCII mode; Unicode-aware engines may also match other decimal digits |
| Non-digit | \D | |
| Whitespace | \s | Could be space, tab, etc. |
| Non-whitespace | \S | |
| Word character | \w | Equal to [a-zA-Z0-9_] in ASCII mode; Unicode-aware engines may also match other letters and digits |
| Non-word | \W | |
| Word boundary | \b | |
| Non-boundary | \B | |
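The shorthand classes in the table are not always identical to their ASCII ranges. As a hedged illustration using Python's `re` module (other engines have their own defaults), `\d` and `\w` are Unicode-aware for `str` patterns unless the `re.ASCII` flag is set:

```python
import re

# Two Arabic-Indic digits (U+0664, U+0662), written as escapes for clarity.
arabic_indic = "\u0664\u0662"

# By default, \d in a str pattern matches any Unicode decimal digit.
unicode_digits = re.fullmatch(r"\d+", arabic_indic) is not None
# With re.ASCII, \d is restricted to [0-9].
ascii_digits = re.fullmatch(r"\d+", arabic_indic, flags=re.ASCII) is not None
# An explicit range is always ASCII-only.
explicit_range = re.fullmatch(r"[0-9]+", arabic_indic) is not None

# The same applies to \w: accented letters count as word characters by default.
unicode_word = re.fullmatch(r"\w+", "caf\u00e9") is not None
ascii_word = re.fullmatch(r"\w+", "caf\u00e9", flags=re.ASCII) is not None

print(unicode_digits, ascii_digits, explicit_range)
print(unicode_word, ascii_word)
```

If the input must be plain ASCII digits (for example before calling a parser in another system), spelling out `[0-9]` avoids surprises.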
Quantifiers
note
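In possessive mode, a quantifier never gives back characters it has already consumed, so the regex engine will not backtrack into it to find a match. This is useful when you know that no shorter repetition could lead to a match. It can be more efficient than greedy or reluctant matching, but a pattern that would only succeed after backtracking fails instead. Not every engine supports possessive quantifiers.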
| Quantifiers | RegEx | Notes |
|---|---|---|
| 0 or more | * | Greedy Match (Prefer more) |
| 1 or more | + | Greedy Match (Prefer more) |
| 0 or 1 | ? | Greedy Match (Prefer 1) |
| 0 or more | *? | Reluctant Match (Prefer 0) |
| 1 or more | +? | Reluctant Match (Prefer 1) |
| 0 or 1 | ?? | Reluctant Match (Prefer 0) |
| 0 or more | *+ | Possessive Match |
| 1 or more | ++ | Possessive Match |
| 0 or 1 | ?+ | Possessive Match |
| Exactly n | {n}, {n}+, {n}? | The count is fixed, so all three forms consume the same text |
| At least n | {n,} | Greedy Match (Prefer more) |
| Between n and m | {n,m} | Greedy Match (Prefer more) |
| At least n | {n,}? | Reluctant Match (Prefer fewer) |
| Between n and m | {n,m}? | Reluctant Match (Prefer fewer) |
| At least n | {n,}+ | Possessive Match |
| Between n and m | {n,m}+ | Possessive Match |
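To make the three modes concrete, here is a minimal sketch in Python. The sample strings are invented for the demonstration. Possessive quantifiers are only available in Python's `re` module from version 3.11, so that part is guarded by a version check; on older versions it is skipped.

```python
import re
import sys

html = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as possible, then backtracks to the LAST </b>.
greedy = re.search(r"<b>.*</b>", html).group()
# Reluctant: .*? grabs as little as possible, stopping at the FIRST </b>.
reluctant = re.search(r"<b>.*?</b>", html).group()

# Possessive quantifiers need Python 3.11 or newer in the re module.
POSSESSIVE_SUPPORTED = sys.version_info >= (3, 11)

# Greedy \d+ gives one digit back so the trailing \d can match.
greedy_digits_found = re.search(r"\d+\d", "12345") is not None

possessive_digits_found = None
if POSSESSIVE_SUPPORTED:
    # Possessive \d++ keeps every digit, so the trailing \d has nothing left.
    possessive_digits_found = re.search(r"\d++\d", "12345") is not None

print(greedy)
print(reluctant)
print(greedy_digits_found, possessive_digits_found)
```

The last pair shows the trade-off described in the note: the possessive form fails fast, but it also fails on input that the greedy form would accept.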
Snippets
File Name
This takes more than RegEx to fully validate, see String Sanitizing
Email
TODO
URL
TODO
Private IP Address[^1]
| Start | End | CIDR |
|---|---|---|
| 127.0.0.0 | 127.255.255.255 | 127.0.0.0/8 |
| 10.0.0.0 | 10.255.255.255 | 10.0.0.0/8 |
| 172.16.0.0 | 172.31.255.255 | 172.16.0.0/12 |
| 192.168.0.0 | 192.168.255.255 | 192.168.0.0/16 |
(^127\.)|
(^10\.)|
(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|
(^192\.168\.)
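A minimal sketch of how the pattern above might be used from Python, joined onto one line. The helper name `looks_private` is hypothetical. Note two limits: the pattern only checks the prefix, so it does not validate that the rest of the string is a well-formed IPv4 address, and `127.0.0.0/8` is the loopback range rather than an RFC 1918 private range, although the table above includes it.

```python
import re

# The snippet from above, joined into a single pattern.
PRIVATE_PREFIX = re.compile(
    r"(^127\.)|"
    r"(^10\.)|"
    r"(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|"
    r"(^192\.168\.)"
)


def looks_private(ip: str) -> bool:
    """Return True if the string starts with one of the listed prefixes.

    Hypothetical helper: it checks the prefix only and does not verify
    that the remaining octets are valid.
    """
    return PRIVATE_PREFIX.search(ip) is not None


for candidate in ["10.1.2.3", "172.31.255.255", "172.32.0.1", "8.8.8.8"]:
    print(candidate, looks_private(candidate))
```

When full validation matters, parsing the address first (for example with Python's standard `ipaddress` module) and then checking network membership is usually more robust than a regex.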
Base64-encoded string
^[a-zA-Z0-9+/]*={0,2}$
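The pattern above checks the alphabet and the amount of padding, but not that the length is a multiple of four. The sketch below compares it with a stricter variant. The stricter pattern is an addition for illustration, not part of the original snippet, and both cover only the standard alphabet (`+` and `/`), not the URL-safe one.

```python
import re

# The pattern from above: right alphabet, up to two '=' at the end.
LOOSE_BASE64 = re.compile(r"^[a-zA-Z0-9+/]*={0,2}$")

# Assumed stricter variant: groups of four characters, with padding
# allowed only in the final group.
STRICT_BASE64 = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def is_loose_base64(value: str) -> bool:
    return LOOSE_BASE64.fullmatch(value) is not None


def is_strict_base64(value: str) -> bool:
    return STRICT_BASE64.fullmatch(value) is not None


for sample in ["aGVsbG8=", "abc", "a==="]:
    print(sample, is_loose_base64(sample), is_strict_base64(sample))
```

Both patterns accept the empty string, and neither proves that the decoded bytes are meaningful; they only check the textual shape.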
URL-safe slug
- Has at least one character
- Contains only letters, numbers, hyphens, and underscores
- Does not start or end with a hyphen or underscore
- Does not have consecutive hyphens or underscores
^[a-zA-Z0-9]+([-_][a-zA-Z0-9]+)*$
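The four rules can be checked against the pattern with a few sample strings. This is a small sketch in Python; the helper name `is_slug` is hypothetical, and `fullmatch` is used so that a trailing newline is rejected as well.

```python
import re

SLUG = re.compile(r"^[a-zA-Z0-9]+([-_][a-zA-Z0-9]+)*$")


def is_slug(value: str) -> bool:
    """Hypothetical helper: True if the whole string is a valid slug."""
    return SLUG.fullmatch(value) is not None


samples = ["hello-world_2", "a", "", "-hello", "hello_", "hello--world"]
for sample in samples:
    print(repr(sample), is_slug(sample))
```

The structure of the pattern does the work: every separator must be followed by at least one alphanumeric character, which rules out both trailing and consecutive separators.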
Semantic Version
^(\d+)\.(\d+)\.(\d+)$
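The three capture groups make it easy to turn a version string into a comparable tuple. The sketch below assumes Python; `parse_core_version` is a hypothetical helper name. The pattern covers only the `MAJOR.MINOR.PATCH` core: pre-release and build metadata suffixes are rejected, and leading zeros are accepted even though the Semantic Versioning specification disallows them.

```python
import re
from typing import Optional, Tuple

# re.ASCII keeps \d limited to 0-9.
SEMVER_CORE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$", re.ASCII)


def parse_core_version(value: str) -> Optional[Tuple[int, int, int]]:
    """Hypothetical helper: return (major, minor, patch) or None."""
    match = SEMVER_CORE.fullmatch(value)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


print(parse_core_version("1.10.0"))
print(parse_core_version("1.2"))
print(parse_core_version("1.2.3-beta"))
# Tuples compare numerically, unlike the raw strings.
print(parse_core_version("1.10.0") > parse_core_version("1.9.5"))
```

Comparing the parsed tuples avoids the classic string-comparison mistake where `"1.10.0"` sorts before `"1.9.5"`.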
Chinese Characters
[\u4e00-\u9fa5]
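A short sketch of the character class in use, assuming Python (whose `re` module understands `\uXXXX` escapes in `str` patterns). The range sits inside the CJK Unified Ideographs block, so it also matches Japanese kanji and does not cover the extension blocks or CJK punctuation. The sample text is made up.

```python
import re

HAN = re.compile(r"[\u4e00-\u9fa5]+")

mixed = "Hello 世界, regex 很好用!"

# Extract each run of consecutive ideographs from mixed text.
runs = HAN.findall(mixed)
# Hiragana lives in a different block, so it is not matched.
kana_runs = HAN.findall("ひらがな")
# fullmatch() checks that a string consists only of ideographs.
only_han = HAN.fullmatch("正規表達式") is not None

print(runs)
print(kana_runs)
print(only_han)
```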
🇹🇼 Taiwan Address
This is a WIP, might not be perfect.
^[\u4e00-\u9fa5]+?(縣|市)[\u4e00-\u9fa5]+?(鄉|鎮|市|區)[\u4e00-\u9fa5]+?(路|街|大道)([\u4e00-\u9fa5]+?段)?(\d+?巷)?(\d+?弄)?[\d\-、]+?號(.+?棟)?((B?[\d]+?|G|L)樓)?(之\d+?)?$
🇹🇼 Taiwan Address with Village
^[\u4e00-\u9fa5]+?(縣|市)[\u4e00-\u9fa5]+?(鄉|鎮|市|區)([\u4e00-\u9fa5]+?(村|里))?(\d+?鄰)?[\u4e00-\u9fa5]+?(路|街|大道)([\u4e00-\u9fa5]+?段)?(\d+?巷)?(\d+?弄)?[\d\-、]+?號(.+?棟)?((B?[\d]+?|G|L)樓)?(之\d+?)?$
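The two address patterns can be exercised as follows. This is a sketch in Python under a few assumptions: the Chinese literals in the patterns are written out as the usual address units (縣, 市, 鄉, 鎮, 區, 村, 里, 鄰, 路, 街, 大道, 段, 巷, 弄, 號, 棟, 樓, 之), and the sample addresses are invented for the test rather than taken from the original page.

```python
import re

HAN = r"[\u4e00-\u9fa5]+?"

# Shared tail: road, optional section/lane/alley, number, building, floor.
ROAD_AND_NUMBER = (
    HAN + r"(路|街|大道)"
    r"(" + HAN + r"段)?(\d+?巷)?(\d+?弄)?"
    r"[\d\-、]+?號(.+?棟)?((B?[\d]+?|G|L)樓)?(之\d+?)?$"
)
CITY_AND_DISTRICT = r"^" + HAN + r"(縣|市)" + HAN + r"(鄉|鎮|市|區)"

TAIWAN_ADDRESS = re.compile(CITY_AND_DISTRICT + ROAD_AND_NUMBER)
# Variant with an optional village (村/里) and neighborhood (鄰) part.
TAIWAN_ADDRESS_WITH_VILLAGE = re.compile(
    CITY_AND_DISTRICT + r"(" + HAN + r"(村|里))?(\d+?鄰)?" + ROAD_AND_NUMBER
)

taipei = "台北市大安區忠孝東路四段100號5樓"
hsinchu = "新竹縣竹北市光明六路10號"
hualien = "花蓮縣壽豐鄉志學村3鄰中正路100號"

print(TAIWAN_ADDRESS.match(taipei) is not None)
print(TAIWAN_ADDRESS.match(hsinchu) is not None)
print(TAIWAN_ADDRESS_WITH_VILLAGE.match(hualien) is not None)
print(TAIWAN_ADDRESS.match("100號") is not None)
```

Because the patterns are still a work in progress, a handful of real addresses from the target data set is a better acceptance test than these samples.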