🎰 Regular Expressions (RegEx)

Various useful tools and resources for working with regular expressions.

Specifications​

| Engine / Flavor | Origin / Common Use | Lookahead | Lookbehind | Backreferences | Named Groups | Unicode \p{} | Non-greedy Quantifiers | Guarantees Linear Time | Notable Traits |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| POSIX (BRE/ERE) | grep, sed, awk | ❌ | ❌ | ✅ (BRE limited) | ❌ | Partial | ❌ | ✅ (deterministic) | Classic Unix tools, minimal |
| PCRE | PHP, Perl, Apache | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | De facto standard, very expressive |
| Python re | Python | ✅ | ✅ (fixed width) | ✅ | ✅ (?P<>) | Partial (full via regex lib) | ✅ | ❌ | PCRE-like with some constraints |
| JavaScript (ECMAScript) | Web, Node.js | ✅ | ✅ (since ES2018) | ✅ | ✅ (?<name>) | ✅ (\p{} in ES2018+) | ✅ | ❌ | Minimal but modern |
| google/re2 | Go, Google infra | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | No backtracking, no look-around, ultra fast |
| Java java.util.regex | JVM ecosystem | ✅ | ✅ (bounded width) | ✅ | ✅ | Partial | ✅ | ❌ | PCRE-ish, stable |
| .NET | C#, F#, etc. | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | Very feature-rich, balancing groups |
| Rust regex | Rust crate | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | Like RE2, safe & Unicode-first |
| PostgreSQL | SQL | ✅ (limited) | ✅ (limited) | ✅ | ❌ | Partial | ✅ | ❌ | POSIX-based |
| Oniguruma | Ruby | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | Advanced Unicode and named groups |
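
A few cells of the Python re row can be checked directly. The following is a minimal sketch, assuming CPython's standard re module; the sample strings and variable names are made up for illustration.

```python
import re

# Named groups use the (?P<name>...) syntax in Python.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-05")
year = date.group("year")

# Fixed-width lookbehind is accepted.
price = re.search(r"(?<=\$)\d+", "cost $42").group()

# Variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=\$+)\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Backreference: \1 must repeat whatever group 1 captured.
repeated = re.search(r"\b(\w+) \1\b", "it is is fine").group(1)

print(year, price, variable_width_ok, repeated)
```

Engines without backtracking (RE2, Rust regex) reject backreferences and look-around at compile time, which is the price of their linear-time guarantee.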

💡 Tips

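  • Decide whether you want to "match" (the whole string must fit the pattern) or "search" (the pattern may occur anywhere in the string)
  • For matching, use ^ and $ to anchor the pattern to the beginning and end of the string
    • Be careful with multi-line strings: in multi-line mode, ^ and $ match at every line boundary, and in many engines $ also matches just before a trailing newline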
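
The difference between matching and searching, and the multi-line pitfalls, can be sketched in Python as follows. This assumes the standard re module; the sample text is invented for illustration.

```python
import re

text = "id=42\nname=regex"

# search() finds the pattern anywhere in the string.
found = re.search(r"\d+", text).group()

# match() only tries at the very start of the string.
at_start = re.match(r"\d+", text)

# fullmatch() requires the whole string to fit the pattern.
whole_ok = re.fullmatch(r"\d+", "42") is not None
whole_bad = re.fullmatch(r"\d+", "42abc") is not None

# Pitfall 1: $ also matches just before a trailing newline; \Z does not.
dollar_newline = re.search(r"^\d+$", "42\n") is not None
strict_end = re.search(r"^\d+\Z", "42\n") is not None

# Pitfall 2: ^ and $ only match at line boundaries with re.MULTILINE.
single_line = re.search(r"^name=\w+$", text)
multi_line = re.search(r"^name=\w+$", text, flags=re.MULTILINE).group()

print(found, at_start, whole_ok, whole_bad)
print(dollar_newline, strict_end, single_line, multi_line)
```

For validation, fullmatch() (or \A and \Z) tends to be safer than ^ and $, because it is not affected by a trailing newline or by the multi-line flag.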

Tools​

Characters​

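| Character | RegEx | Notes |
| --- | --- | --- |
| Lowercase letter | [a-z] | |
| Uppercase letter | [A-Z] | |
| Letter | [a-zA-Z] | |
| Digit | \d | Equal to [0-9] in ASCII mode; Unicode-aware engines may also match other decimal digits |
| Non-digit | \D | |
| Whitespace | \s | Could be space, tab, newline, etc. |
| Non-whitespace | \S | |
| Word character | \w | Equal to [a-zA-Z0-9_] in ASCII mode |
| Non-word | \W | |
| Word boundary | \b | |
| Non-boundary | \B | |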
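
As a quick illustration of the classes above, here is a small Python sketch. It assumes the standard re module, where str patterns are Unicode-aware by default and re.ASCII restricts \d and \w to the ASCII ranges; the sample strings are invented.

```python
import re

sample = "Tab\there, 2 words_and 42!"

digits = re.findall(r"\d+", sample)   # runs of digits
words = re.findall(r"\w+", sample)    # runs of word characters
spaces = re.findall(r"\s", sample)    # individual whitespace characters

# \b matches between a word character and a non-word character,
# so "cat" inside "concat" is not a whole word.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# "\u0663" is ARABIC-INDIC DIGIT THREE: a digit for \d unless re.ASCII is set.
unicode_digits = re.findall(r"\d", "\u06633")
ascii_digits = re.findall(r"\d", "\u06633", flags=re.ASCII)

print(digits, words, spaces, whole_words, unicode_digits, ascii_digits)
```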

Quantifiers​

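Note

A possessive quantifier matches as much as it can and never gives characters back, so the regex engine does not backtrack into it. This is useful when you know that there is only one possible way to match. It can be more efficient than greedy or reluctant matching, but it fails on inputs where the match depends on giving characters back. For example, a.*+b never matches, because .*+ consumes the final b and will not release it.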

| Quantifier | RegEx | Notes |
| --- | --- | --- |
| 0 or more | * | Greedy Match (Prefer more) |
| 1 or more | + | Greedy Match (Prefer more) |
| 0 or 1 | ? | Greedy Match (Prefer 1) |
| 0 or more | *? | Reluctant Match (Prefer 0) |
| 1 or more | +? | Reluctant Match (Prefer 1) |
| 0 or 1 | ?? | Reluctant Match (Prefer 0) |
| 0 or more | *+ | Possessive Match |
| 1 or more | ++ | Possessive Match |
| 0 or 1 | ?+ | Possessive Match |
| Exactly n | {n}, {n}+, {n}? | All three modes behave the same |
| At least n | {n,} | Greedy Match (Prefer more) |
| Between n and m | {n,m} | Greedy Match (Prefer more) |
| At least n | {n,}? | Reluctant Match (Prefer less) |
| Between n and m | {n,m}? | Reluctant Match (Prefer less) |
| At least n | {n,}+ | Possessive Match |
| Between n and m | {n,m}+ | Possessive Match |
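
The three modes can be compared side by side. This is a minimal Python sketch with invented sample strings; it assumes the standard re module, which accepts possessive quantifiers only from Python 3.11, so that part is guarded by a version check.

```python
import re
import sys

html = "<b>bold</b>"

# Greedy: takes as much as possible, then backtracks to the last ">".
greedy = re.search(r"<.+>", html).group()

# Reluctant: takes as little as possible, stopping at the first ">".
reluctant = re.search(r"<.+?>", html).group()

# The same contrast with a bounded quantifier.
greedy_range = re.search(r"\d{2,4}", "123456").group()
reluctant_range = re.search(r"\d{2,4}?", "123456").group()

# Possessive: .*+ swallows the trailing "b" and never gives it back,
# so the pattern cannot match even though the greedy form does.
greedy_ab = re.search(r"a.*b", "aXXb").group()
possessive_ab = None
if sys.version_info >= (3, 11):
    possessive_ab = re.search(r"a.*+b", "aXXb")

print(greedy, reluctant, greedy_range, reluctant_range, greedy_ab, possessive_ab)
```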

Snippets​

File Name​

Fully validating a file name takes more than RegEx; see String Sanitizing.

Email​

TODO

URL​

TODO

Private IP Address [1]

| Start | End | CIDR |
| --- | --- | --- |
| 127.0.0.0 | 127.255.255.255 | 127.0.0.0/8 |
| 10.0.0.0 | 10.255.255.255 | 10.0.0.0/8 |
| 172.16.0.0 | 172.31.255.255 | 172.16.0.0/12 |
| 192.168.0.0 | 192.168.255.255 | 192.168.0.0/16 |
(^127\.)|
(^10\.)|
(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|
(^192\.168\.)
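
The pattern above only inspects the leading octets, so it is best applied to strings that are already known to be valid IPv4 addresses. The sketch below, in Python, joins the four lines into one pattern and compares it with a stricter check built on the standard ipaddress module. The function names are hypothetical, and the network list simply mirrors the table in this section.

```python
import ipaddress
import re

# The four lines of the snippet joined into a single alternation.
PRIVATE_PREFIX = re.compile(
    r"(^127\.)|"
    r"(^10\.)|"
    r"(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|"
    r"(^192\.168\.)"
)

# Same ranges as the table, expressed as CIDR networks.
LISTED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def has_private_prefix(ip: str) -> bool:
    """Prefix-only check: does not verify that ip is a well-formed address."""
    return PRIVATE_PREFIX.search(ip) is not None


def in_listed_networks(ip: str) -> bool:
    """Parse the address first, then test membership in the listed ranges."""
    try:
        addr = ipaddress.IPv4Address(ip)
    except ValueError:
        return False
    return any(addr in net for net in LISTED_NETWORKS)


for candidate in ("192.168.1.10", "172.32.0.1", "10.999.0.1"):
    print(candidate, has_private_prefix(candidate), in_listed_networks(candidate))
```

The last candidate shows the trade-off: the regex accepts 10.999.0.1 because it never looks past the first octet, while the parsing approach rejects it.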

Base64 encoded string​

^[a-zA-Z0-9+/]*={0,2}$
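
This pattern checks the alphabet and the padding characters but not the length, so strings such as abc or a= pass even though they cannot be decoded. The Python sketch below contrasts it with a stricter variant that requires groups of four characters. STRICT_BASE64 is an assumption added for illustration, not part of the original snippet.

```python
import base64
import re

# Pattern from this section: alphabet plus up to two "=" at the end.
LOOSE_BASE64 = re.compile(r"^[a-zA-Z0-9+/]*={0,2}$")

# Hypothetical stricter variant: whole groups of 4, with padding only in the last group.
STRICT_BASE64 = re.compile(
    r"^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
)


def looks_like_base64(s: str, strict: bool = True) -> bool:
    pattern = STRICT_BASE64 if strict else LOOSE_BASE64
    # fullmatch() avoids accepting a trailing newline, which "$" would allow.
    return pattern.fullmatch(s) is not None


encoded = base64.b64encode(b"hi").decode("ascii")
print(encoded, looks_like_base64(encoded))
print(looks_like_base64("abc", strict=False), looks_like_base64("abc"))
```

Neither pattern proves that the decoded bytes are meaningful; attempting base64.b64decode(s, validate=True) is the more direct test when the data will be decoded anyway.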

URL-safe slug​

  • Has at least one character
  • Contains only letters, numbers, hyphens, and underscores
  • Does not start or end with a hyphen or underscore
  • Does not have consecutive hyphens or underscores
^[a-zA-Z0-9]+([-_][a-zA-Z0-9]+)*$
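
Each rule listed above corresponds to a case the pattern rejects. A minimal Python sketch, with a hypothetical helper name is_slug and invented sample slugs:

```python
import re

SLUG = re.compile(r"^[a-zA-Z0-9]+([-_][a-zA-Z0-9]+)*$")


def is_slug(value: str) -> bool:
    # fullmatch() also rejects a trailing newline that "$" alone would accept.
    return SLUG.fullmatch(value) is not None


checks = {
    "hello-world_2": is_slug("hello-world_2"),  # valid
    "": is_slug(""),                            # no characters
    "-hello": is_slug("-hello"),                # starts with a separator
    "hello_": is_slug("hello_"),                # ends with a separator
    "hello--world": is_slug("hello--world"),    # consecutive separators
}
print(checks)
```

Because every separator must be followed by at least one alphanumeric character, there is only one way to split a valid slug, which keeps the pattern free of ambiguous backtracking.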

Semantic Version​

^(\d+)\.(\d+)\.(\d+)$
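
This pattern covers only the MAJOR.MINOR.PATCH core. The Semantic Versioning specification also allows pre-release and build metadata suffixes and forbids leading zeros, none of which is handled here. A small Python sketch, with a hypothetical helper parse_version, shows how the three capture groups can be used:

```python
import re
from typing import Optional, Tuple

SEMVER_CORE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(version: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) as integers, or None if the string does not fit."""
    m = SEMVER_CORE.fullmatch(version)
    if m is None:
        return None
    major, minor, patch = (int(part) for part in m.groups())
    return (major, minor, patch)


print(parse_version("1.12.3"))
print(parse_version("1.2.3-beta.1"))
# Comparing integer tuples orders 1.10.0 after 1.9.9, unlike string comparison.
print(parse_version("1.10.0") > parse_version("1.9.9"))
```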

Chinese Characters​

[\u4e00-\u9fa5]
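
The range U+4E00 to U+9FA5 covers most of the CJK Unified Ideographs block (which ends at U+9FFF) but none of the extension blocks, and the same code points are also used for Japanese kanji. A short Python sketch with hypothetical helper names:

```python
import re

HAN = re.compile(r"[\u4e00-\u9fa5]")


def han_chars(text: str) -> list:
    """Return every character of text that falls in the U+4E00..U+9FA5 range."""
    return HAN.findall(text)


def contains_han(text: str) -> bool:
    return HAN.search(text) is not None


print(han_chars("Hello 世界!"))
print(contains_han("こんにちは"))  # hiragana lies outside the range
```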

🇹🇼 Taiwan Address

This is a work in progress and might not be perfect.

^[\u4e00-\u9fa5]+?(縣|市)[\u4e00-\u9fa5]+?(鄉|鎮|市|區)[\u4e00-\u9fa5]+?(路|街|大道)([\u4e00-\u9fa5]+?段)?(\d+?巷)?(\d+?弄)?[\d\-、]+?號(.+?棟)?((B?[\d]+?|G|L)樓)?(之\d+?)?$

🇹🇼 Taiwan Address with Village

^[\u4e00-\u9fa5]+?(縣|市)[\u4e00-\u9fa5]+?(鄉|鎮|市|區)([\u4e00-\u9fa5]+?(村|里))?(\d+?鄰)?[\u4e00-\u9fa5]+?(路|街|大道)([\u4e00-\u9fa5]+?段)?(\d+?巷)?(\d+?弄)?[\d\-、]+?號(.+?棟)?((B?[\d]+?|G|L)樓)?(之\d+?)?$
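
Since the village and neighborhood parts are optional, the village-aware pattern also accepts addresses without them. The Python sketch below splits the pattern across lines for readability (the concatenation is the same expression) and tries it on two sample strings. The helper name is_tw_address and the second address are made up for illustration.

```python
import re

TW_ADDRESS = re.compile(
    r"^[\u4e00-\u9fa5]+?(縣|市)"                      # county or city
    r"[\u4e00-\u9fa5]+?(鄉|鎮|市|區)"                 # township or district
    r"([\u4e00-\u9fa5]+?(村|里))?(\d+?鄰)?"           # optional village and neighborhood
    r"[\u4e00-\u9fa5]+?(路|街|大道)"                  # road, street or boulevard
    r"([\u4e00-\u9fa5]+?段)?(\d+?巷)?(\d+?弄)?"       # optional section, lane, alley
    r"[\d\-、]+?號(.+?棟)?((B?[\d]+?|G|L)樓)?(之\d+?)?$"  # number, building, floor, suffix
)


def is_tw_address(address: str) -> bool:
    return TW_ADDRESS.fullmatch(address) is not None


samples = [
    "台北市信義區信義路五段7號89樓",      # no village part
    "新北市板橋區留侯里5鄰府中路29號",    # illustrative address with village and neighborhood
    "台北市信義區",                        # missing road and number
]
for sample in samples:
    print(sample, is_tw_address(sample))
```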

References​

Footnotes​

  1. https://stackoverflow.com/a/2814102/10325430