What is a Wildcard in Regex?
When creating regular expressions, wildcards are used when you want to match against some characters, but you're not sure exactly which ones. This means you can search through user-inputted text without knowing exactly what they've entered, and still find the part you're looking for. You could also use them to validate a user's input against a specific pattern, to make sure it's in the correct format.
This blog post is a modified excerpt from the book "What the F.+k is Regex?" - you can get the full digital copy to read from Leanpub or Amazon. Thank you!
For example, if you need to make sure a password contains at least one lowercase and uppercase letter, a number, and a symbol, one of the easiest and fastest ways to check these requirements is to use a regex pattern with wildcards - we'll look into this later!
There are three types of wildcard - the dot/full-stop/period, using parentheses and pipes, and square brackets - and we'll go through each of them to see what they do and how they work.
Dot - Literally Anything
.air
A dot (also known as a full stop or period) matches any single character, exactly once.
In our example above, .air could be used to match "fair" and "pair", but it will also match capital letters, numbers, symbols, and whitespace in that first position (for example, "4air" and "#air"). Keep in mind that the dot does not match new-line characters (\n) by default, but this can be changed using a flag - this is covered in more detail within "What the F.+k Is Regex?".
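As a rough illustration of the dot, here is a small sketch. The article itself is not tied to any programming language, so Python's built-in re module is an assumption here, and the sample strings and variable names are made up for the example. It uses fullmatch so that the whole string has to fit the pattern, and re.DOTALL is Python's name for the flag that lets the dot match a new-line.

```python
import re

# Illustrative names and inputs only; not taken from the article.
dot_pattern = re.compile(r".air")

for candidate in ["fair", "pair", "4air", "#air", " air", "air", "\nair"]:
    # fullmatch returns a match object (truthy) or None (falsy).
    print(repr(candidate), bool(dot_pattern.fullmatch(candidate)))

# With the DOTALL flag, the dot is also allowed to match "\n".
dotall_pattern = re.compile(r".air", re.DOTALL)
print(bool(dotall_pattern.fullmatch("\nair")))
```

Note that "air" on its own does not match, because the dot needs exactly one character to consume.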
Parentheses - Grouping (Either-or)
(f|rep)air
Parentheses (brackets) and the pipe character let us list alternatives. In the above example, we specify that either "f" or "rep" must appear before "air". Therefore, this search pattern will match only "fair" or "repair".
In both examples we've looked at so far, the wildcard was placed at the start, but wildcards can appear anywhere in a pattern, so str(ip|ob)e will match both "stripe" and "strobe".
You can also use several pipes at once if you need more than two options. For example, the pattern (f|p|l|rep)air will match the following inputs:
- "fair"
- "pair"
- "lair"
- "repair"
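To see the either-or behaviour in practice, here is a minimal sketch, assuming Python's re module (the article does not prescribe a language) and a made-up word list. It checks each word against the two grouped patterns above and also shows which alternative was picked.

```python
import re

# Illustrative names and inputs only; not taken from the article.
air_pattern = re.compile(r"(f|p|l|rep)air")
stripe_pattern = re.compile(r"str(ip|ob)e")

words = ["fair", "pair", "lair", "repair", "hair", "air"]
matched_words = [w for w in words if air_pattern.fullmatch(w)]
print(matched_words)

# group(1) returns the text matched by the first pair of parentheses.
chosen = air_pattern.fullmatch("repair").group(1)
print(chosen)

print(bool(stripe_pattern.fullmatch("stripe")), bool(stripe_pattern.fullmatch("strobe")))
```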
Square Brackets - Sets
At this point, regex starts to get slightly more complicated. Using square brackets, regex will match against any one of the characters you put between them.
[fp]air
The example above will match only "fair" and "pair", because of the f and p in the square brackets.
You can list as many letters, numbers, or symbols (with some exceptions) in here as you want to, and the pattern will match exactly one of them.
Therefore, a set will not work for matching a sequence of several characters. If we try to use this method to match "repair" as we did with parentheses, a pattern such as [rep]air will instead match any of the following words (regardless of whether they are nonsensical or not!):
- "rair"
- "eair"
- "pair"
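The difference between a set and a group can be sketched as follows, assuming Python's re module and illustrative variable names. A set consumes exactly one character, so [rep]air cannot cover the whole word "repair", although a search will still find "pair" inside it.

```python
import re

# Illustrative names and inputs only; not taken from the article.
set_pattern = re.compile(r"[rep]air")

for word in ["rair", "eair", "pair", "repair"]:
    print(word, bool(set_pattern.fullmatch(word)))

# search looks for the pattern anywhere in the string,
# so it finds the "pair" at the end of "repair".
found = set_pattern.search("repair").group()
print(found)
```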
Ranges within Sets
[a-z]air
Square brackets are also commonly used for ranges of characters. For example, the above regex pattern matches exactly one lowercase letter between a and z, followed by "air". Ranges are case-sensitive, so if you need to match uppercase and lowercase letters, you'll need to specify both. This can be done quite easily, as multiple ranges can be specified within the same set, as seen in the following example.
[B-Db-d]are
Keep in mind that ranges don't have to span the entire alphabet - they can start and end at any letter, so this pattern will match "bare", "care", and "dare", as well as the same words starting with a capital letter (e.g. "Bare").
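Here is a short sketch of both range patterns, again assuming Python's re module with made-up sample words. It highlights that ranges are case-sensitive and that a set may hold more than one range.

```python
import re

# Illustrative names and inputs only; not taken from the article.
lower_pattern = re.compile(r"[a-z]air")
mixed_pattern = re.compile(r"[B-Db-d]are")

for word in ["fair", "Fair", "4air"]:
    print(word, bool(lower_pattern.fullmatch(word)))

are_words = ["bare", "care", "dare", "Bare", "Dare", "fare", "are"]
matched_are = [w for w in are_words if mixed_pattern.fullmatch(w)]
print(matched_are)
```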
Excluding from Sets
[^abc]
In addition to specifying ranges, you can also add the caret symbol (^) as the first character inside the square brackets to specify that you want to match any character except the ones that follow. Above, the pattern matches any single character except the lowercase letters "a", "b", and "c".
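A negated set can be tried out like this. The sketch assumes Python's re module, and the list of sample characters is illustrative. In Python, a negated set also matches a new-line character, unlike the dot.

```python
import re

# Illustrative names and inputs only; not taken from the article.
not_abc = re.compile(r"[^abc]")

samples = ["a", "b", "c", "d", "A", "7", "-", " ", "\n"]
accepted = [ch for ch in samples if not_abc.fullmatch(ch)]
print(accepted)
```

The uppercase "A" is accepted because only the lowercase letters are excluded.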
Exceptions
There are some quirky exceptions when using sets in regex, although you can understand why once you know what each character is for.
It is important to note that the "anything" wildcard (denoted by .) has no special meaning inside square brackets - there, the period matches only a literal period.
Hello[!.]
This pattern will match against either an exclamation mark or a full stop ("Hello!" or "Hello."), rather than matching against anything.
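As a quick check of this behaviour, here is a minimal sketch assuming Python's re module (the inputs and names are illustrative). If the dot kept its wildcard meaning inside the set, "Hello?" would match too.

```python
import re

# Illustrative names and inputs only; not taken from the article.
greeting_pattern = re.compile(r"Hello[!.]")

for text in ["Hello!", "Hello.", "Hello?", "Hellox"]:
    print(text, bool(greeting_pattern.fullmatch(text)))
```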
Also, if you would like to match the hyphen (-) itself, rather than use it to specify a range of characters, it must go at either the start or the end of the character set, or have a backslash (\) placed in front of it. To understand why, there is an entire chapter in "What the F.+k Is Regex?" about escape characters.
In the following two examples, we will match against any letter or the hyphen character:
[A-Za-z-]
[A-Z\-a-z]
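Both spellings can be compared with a small sketch, assuming Python's re module and illustrative sample characters. The two patterns should accept the same inputs.

```python
import re

# Illustrative names and inputs only; not taken from the article.
trailing_hyphen = re.compile(r"[A-Za-z-]")   # hyphen at the end of the set
escaped_hyphen = re.compile(r"[A-Z\-a-z]")   # hyphen escaped with a backslash

samples = ["a", "Z", "-", "5", "_"]
by_trailing = [ch for ch in samples if trailing_hyphen.fullmatch(ch)]
by_escaped = [ch for ch in samples if escaped_hyphen.fullmatch(ch)]
print(by_trailing)
print(by_escaped)
```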
The same applies to the caret symbol (^). Normally, you would put this character at the start of the set to exclude what follows, but if you specifically need to match it, you would place it somewhere in the middle or at the end instead, or put a backslash in front of it. In the following two examples, we will match against any letter or the caret symbol:
[A-Z^a-z]
[\^A-Za-z]
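The effect of the caret's position can be sketched as below, assuming Python's re module with illustrative names and inputs. The third pattern is added only for contrast: with an unescaped caret in first position, the set is negated instead.

```python
import re

# Illustrative names and inputs only; not taken from the article.
caret_in_middle = re.compile(r"[A-Z^a-z]")   # literal caret between two ranges
caret_escaped = re.compile(r"[\^A-Za-z]")    # literal caret, escaped at the start
caret_negating = re.compile(r"[^A-Za-z]")    # leading caret: anything except letters

samples = ["a", "Q", "^", "5"]
for pattern in (caret_in_middle, caret_escaped, caret_negating):
    print(pattern.pattern, [ch for ch in samples if pattern.fullmatch(ch)])
```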
Conclusion
Wildcards are a core part of regular expressions, and can be extremely powerful when combined. In a future blog post, we'll look at quantifiers, which tell the pattern to match a specific number of wildcards in a row (for example, if you need three uppercase letters in a row).
Happy hacking!