crstin – Learn Regex - The Easy Way

What is a Regular Expression?

A regular expression is a group of characters or symbols used to find a specific pattern in a text.

A regular expression is a pattern that is matched against a subject string from left to right. "Regular expression" is a mouthful, so you will usually find the term abbreviated as "regex" or "regexp". Regular expressions are used for replacing text within a string, validating forms, extracting a substring from a string based on a pattern match, and much more.

Imagine you are writing an application and want to set rules for the usernames that users may choose. A username may contain lowercase letters, numbers, underscores, and hyphens. Its length is also limited so that it does not look ugly. The following regular expression validates a username:

Start of the line
        |   3 to 15 characters long
        |             |
        |             |
        ^[a-z0-9_-]{3,15}$
              |          |
              |          |
              |     End of the line
letters, numbers, underscores, hyphens

The regular expression above accepts the strings `john_doe`, `jo-hn_doe`, and `john12_as`. It does not match `Jo` because that string contains an uppercase letter and is also too short.
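
The username rule can be tried out directly. The following is a minimal sketch in Python using the standard `re` module; the helper name `is_valid_username` is made up for this illustration and is not part of the original text.

```python
import re

# Pattern copied from the diagram above.
USERNAME_PATTERN = re.compile(r"^[a-z0-9_-]{3,15}$")


def is_valid_username(name: str) -> bool:
    """Hypothetical helper: True if `name` matches the username pattern."""
    return USERNAME_PATTERN.match(name) is not None


for candidate in ["john_doe", "jo-hn_doe", "john12_as", "Jo"]:
    print(candidate, is_valid_username(candidate))
# john_doe True
# jo-hn_doe True
# john12_as True
# Jo False
```

One caveat for this particular engine: in Python, `$` also matches just before a single trailing newline, so `re.fullmatch` may be the stricter choice when validating user input.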

Basic Matchers
Meta Characters
Shorthand Character Sets
Lookaround
Flags
Greedy vs lazy matching

1. Basic Matchers

A regular expression is just a pattern of characters that we use to perform a search in a text. For example, the regular expression `the` means: the letter `t`, followed by the letter `h`, followed by the letter `e`.
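
As a rough illustration, here is how a literal pattern behaves in Python's `re` module (the choice of Python is an assumption of this sketch; the sample sentence is the one used throughout this guide). Matching is case-sensitive by default, so `the` skips the capitalised word at the start of the sentence.

```python
import re

text = "The fat cat sat on the mat."

# re.search scans left to right and returns the first match it finds.
first = re.search(r"the", text)
print(first.span())    # (19, 22): the lowercase "the" near the end

# The capitalised word is only found by a pattern that spells it that way.
capital = re.search(r"The", text)
print(capital.span())  # (0, 3)
```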

"the" => The fat cat sat on the mat.

Meta character	Description
.	Period matches any single character except a line break.
[ ]	Character class. Matches any character contained between the square brackets.
[^ ]	Negated character class. Matches any character that is not contained between the square brackets.
*	Matches 0 or more repetitions of the preceding symbol.
+	Matches 1 or more repetitions of the preceding symbol.
?	Makes the preceding symbol optional.
{n,m}	Braces. Matches at least "n" but not more than "m" repetitions of the preceding symbol.
(xyz)	Character group. Matches the characters xyz in that exact order.
|	Alternation. Matches either the characters before or the characters after the symbol.
\	Escapes the next character. This allows you to match the reserved characters `[ ] ( ) { } . * + ? ^ $ \ |`.
^	Matches the beginning of the input.
$	Matches the end of the input.
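
Each row of the table above can be checked with a one-line experiment. The sketch below uses Python's `re` module; the sample strings are invented for illustration only.

```python
import re

text = "The car parked in the garage."

# .  matches any single character except a line break
dot = re.findall(r".ar", text)                      # ['car', 'par', 'gar']

# [ ] character class, [^ ] negated character class
char_class = re.findall(r"[Tt]he", text)            # ['The', 'the']
negated = re.findall(r"[^c]ar", text)               # ['par', 'gar']

# *, +, ? and {n,m} repeat the preceding symbol
star = re.findall(r"ab*", "a ab abbb")              # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")              # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")   # ['color', 'colour']
braces = re.findall(r"[0-9]{2,3}", "7 42 123")      # ['42', '123']

# ( ) groups characters, | picks one of the alternatives
grouped = [m.group(0) for m in re.finditer(r"(c|g|p)ar", text)]
# ['car', 'par', 'gar']

# \ escapes a reserved character so it is matched literally
escaped = re.findall(r"\.", text)                   # ['.']

# ^ and $ anchor the match to the start and end of the input
starts = re.search(r"^The", text) is not None       # True
ends = re.search(r"garage\.$", text) is not None    # True
```

`re.finditer` is used for the grouped pattern because `re.findall` returns only the captured group (here `c`, `p`, `g`) when a pattern contains one.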

Shorthand	Description
.	Any character except a line break.
\w	Matches alphanumeric characters and the underscore: `[a-zA-Z0-9_]`
\W	Matches non-alphanumeric characters: `[^\w]`
\d	Matches a digit: `[0-9]`
\D	Matches a non-digit: `[^\d]`
\s	Matches a whitespace character: `[\t\n\f\r\p{Z}]`
\S	Matches a non-whitespace character: `[^\s]`
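
The shorthand sets can be compared side by side on one string. This is a small sketch in Python with an invented sample string. Note that for text patterns Python's `\w`, `\d`, and `\s` are Unicode-aware by default, so the ASCII ranges in the table correspond to what you get with the `re.ASCII` flag.

```python
import re

sample = "Room 42, floor_3!"

word_runs = re.findall(r"\w+", sample)       # ['Room', '42', 'floor_3']
non_word = re.findall(r"\W", sample)         # [' ', ',', ' ', '!']
digit_runs = re.findall(r"\d+", sample)      # ['42', '3']
non_digit_runs = re.findall(r"\D+", sample)  # ['Room ', ', floor_', '!']
spaces = re.findall(r"\s", sample)           # [' ', ' ']
non_space_runs = re.findall(r"\S+", sample)  # ['Room', '42,', 'floor_3!']
```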

Symbol	Description
?=	Positive Lookahead
?!	Negative Lookahead
?<=	Positive Lookbehind
?<!	Negative Lookbehind
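
Lookarounds check what surrounds a match without including it in the result. The following sketch, assuming Python's `re` module and the sample sentence used earlier in this guide, shows one pattern per row of the table. Python requires lookbehind patterns to have a fixed width, which holds for the ones used here.

```python
import re

text = "The fat cat sat on the mat."

# Positive lookahead: "The"/"the" only when followed by " fat"
pos_ahead = re.findall(r"[Tt]he(?=\sfat)", text)               # ['The']

# Negative lookahead: "The"/"the" only when NOT followed by " fat"
neg_ahead = re.findall(r"[Tt]he(?!\sfat)", text)               # ['the']

# Positive lookbehind: "fat" or "mat" only when preceded by "The "/"the "
pos_behind = re.findall(r"(?<=[Tt]he\s)(?:fat|mat)", text)     # ['fat', 'mat']

# Negative lookbehind: "cat" only when NOT preceded by "The "/"the "
neg_behind = re.findall(r"(?<![Tt]he\s)cat", text)             # ['cat']
```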

Flag	Description
i	Case insensitive: Sets matching to be case-insensitive.
g	Global Search: Search for a pattern throughout the input string.
m	Multiline: The anchor meta characters `^` and `$` match at the start and end of each line instead of only at the start and end of the input.
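
How flags are written depends on the engine. As a sketch in Python: `i` and `m` correspond to `re.IGNORECASE` and `re.MULTILINE`, while there is no `g` flag at all; choosing `re.findall` (all matches) over `re.search` (first match only) plays that role. The sample strings are illustrative.

```python
import re

text = "The fat cat sat on the mat."

# i: case-insensitive matching
sensitive = re.findall(r"the", text)                         # ['the']
insensitive = re.findall(r"the", text, flags=re.IGNORECASE)  # ['The', 'the']

# g: first match only versus every match in the input
first_only = re.search(r".at", text).group(0)                # 'fat'
all_matches = re.findall(r".at", text)                       # ['fat', 'cat', 'sat', 'mat']

# m: let $ match at the end of every line
lines = "The fat\ncat sat\non the mat."
end_of_input = re.findall(r".at\.?$", lines)                 # ['mat.']
end_of_lines = re.findall(r".at\.?$", lines, flags=re.MULTILINE)
# ['fat', 'sat', 'mat.']
```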

Table of Contents

1. Basic Matchers

2. Meta Characters

2.1 Full stop

2.2 Character set

2.2.1 Negated character set

2.3 Repetitions

2.3.1 The Star

2.3.2 The Plus

2.3.3 The Question Mark

2.4 Braces

2.5 Capturing Group

2.5.1 Non-capturing group

2.6 Alternation

2.7 Escaping special character

2.8 Anchors

2.8.1 Caret

2.8.2 Dollar

3. Shorthand Character Sets

4. Lookaround

4.1 Positive Lookahead

4.2 Negative Lookahead

4.3 Positive Lookbehind

4.4 Negative Lookbehind

5. Flags

5.1 Case Insensitive

5.2 Global search

5.3 Multiline

6. Greedy vs lazy matching