PHP support for regular expressions is built into the language. In PHP they are commonly called Perl-compatible regular expressions, or PCRE for short. This is not meant to be a comprehensive discussion of PCRE; for a complete description, see the PCRE section of the PHP manual. That said, there are just a few basic things you need to know about PCRE in order to start using them:
- Delimiters
- Meta-characters
- Escape Sequences
- Anchors
- Subpatterns
- Functions
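Before we get started, here is an example of a simple regular expression. The following pattern finds every two-digit number followed by a letter (the quantifier {2,2} means "at least two and at most two", and can also be written {2}):
/[0-9]{2,2}[a-zA-Z]/
For example, running preg_match_all() with this pattern against the string on the first line below, then printing the string and the matches, will return: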
aab2k3oo45l987xz0x11Cv22
Array
(
[0] => Array
(
[0] => 45l
[1] => 87x
[2] => 11C
)
)
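A minimal sketch that reproduces the output above. It assumes the subject string is the one echoed on the first line of that output, and it uses the letter class [a-zA-Z] as the last element of the pattern (\w would also accept digits and underscores, so it would match 987 rather than 87x). The variable names are illustrative only.

```php
<?php
// Subject string taken from the first line of the output above.
$s = "aab2k3oo45l987xz0x11Cv22";

// Exactly two digits followed by one ASCII letter.
$regex = '/[0-9]{2,2}[a-zA-Z]/';

// Collect every non-overlapping match into $matches[0].
preg_match_all($regex, $s, $matches);

echo $s . PHP_EOL;
print_r($matches);
```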
Delimiters
All regular expressions are enclosed in delimiters: the same character marks the start and the end of the pattern. This can be almost any character, but it should not be one that appears in the pattern itself, since every occurrence inside the pattern would then have to be escaped with a backslash. For example, in the above PCRE the “/” character was used as the delimiter. A delimiter cannot be a whitespace, backslash, or alphanumeric character.
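To illustrate why the choice of delimiter matters, here is a small sketch that matches part of a URL path twice: once with "/" as the delimiter and once with "#". The URL and variable names are made up for the example.

```php
<?php
// Hypothetical subject string containing several slashes.
$url = "https://example.com/docs/php/pcre";

// With "/" as the delimiter, every literal slash must be escaped.
$with_slash = '/\/docs\/(\w+)\//';

// With "#" as the delimiter, the slashes need no escaping.
$with_hash = '#/docs/(\w+)/#';

preg_match($with_slash, $url, $a);
preg_match($with_hash, $url, $b);

// Both patterns describe the same thing, so the results are identical.
var_dump($a === $b);
print_r($b);
```

Both calls capture "php" in the first subpattern; the second pattern is simply easier to read.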
Meta-characters
Meta-characters allow for alternatives or repetitions in patterns. A few were used in the above example:
- [ character class definition start
- ] character class definition ending
- - character range
- ( subpattern start
- ) subpattern end
- { min-max quantifier start
- } min-max quantifier ending
- \ escape character
Some characters have special meaning inside a character class:
- \ escape character
- ^ class negation, but only if first character e.g. [^a-z789]
- - character range
- ] terminates the character class
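The character class rules above can be sketched as follows. The subject string and variable names are assumptions chosen only to show a range, a negated class, and a literal hyphen.

```php
<?php
$s = "a1-b2_c3";

// [a-c] matches one character in the range a..c.
preg_match_all('/[a-c]/', $s, $letters);

// [^a-c] negates the class: any one character NOT in a..c.
preg_match_all('/[^a-c]/', $s, $others);

// A hyphen placed last in the class is a literal hyphen, not a range.
preg_match_all('/[_-]/', $s, $punct);

print_r($letters[0]); // a, b, c
print_r($others[0]);  // 1, -, 2, _, 3
print_r($punct[0]);   // -, _
```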
Escape Sequences
If the escape character \ is followed by a non-alphanumeric character, it takes away any special meaning that character may have. For example, a dot normally matches any character, but \. matches only a literal dot, and \\ matches a literal backslash. (Writing \" inside a double-quoted PHP string works in a similar way, but that escape is handled by PHP itself before the pattern ever reaches PCRE.)
If the escape character \ is followed by an alphanumeric character, the pair takes on a special meaning. Escape codes can encode non-printing characters in a visible manner. For example:
- \f is a form feed character
- \n is a newline character
- \t is a tab character
Escape sequences can be used to match any character by giving their octal character codes. For example:
- \040 space character
- \007 bel character
- \033 esc character
- \011 tab character
- \113 the letter K
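As a quick check of the octal codes above, this sketch uses \040 and \113 in patterns. The patterns are written in single-quoted PHP strings so that the backslash sequence reaches PCRE unchanged; the subject strings are made up.

```php
<?php
// \040 is the octal code for a space: split a sentence on spaces.
$parts = preg_split('/\040/', "hello big world");

// \113 is the octal code for the letter K.
$found = preg_match('/\113/', "OK", $m);

print_r($parts); // hello, big, world
print_r($m);     // K
```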
Escape sequences can also represent generic character types. For example:
- \d is any decimal digit
- \s is any whitespace character
- \w is any word character
Capitalizing the letter gives the opposite meaning. For example:
- \S is any non-whitespace character
- \W is any non-word character
- \D is any non-digit
For example, splitting the string "U29qexVhA8 LJQYP" at every decimal digit (\d) and printing each piece in square brackets returns:
[U]
[]
[qexVhA]
[ LJQYP]
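A minimal sketch that produces the four bracketed lines above, assuming the subject is the same string used in the anchors example and that the split is done with preg_split() on \d. The variable names are illustrative.

```php
<?php
$s = "U29qexVhA8 LJQYP";

// Split wherever a single decimal digit occurs. The adjacent digits
// "2" and "9" leave an empty piece between them.
$pieces = preg_split('/\d/', $s);

foreach ($pieces as $piece) {
    echo "[" . $piece . "]" . PHP_EOL;
}
```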
Anchors
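Outside a character class, in the default matching mode, the circumflex character (^) is an assertion that is true only if the current matching point is at the start of the subject string. Likewise, the dollar character ($) is an assertion that is true only if the current matching point is at the end of the subject string, or immediately before a newline that is the last character in the string. For example: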
<?php
$s = "U29qexVhA8 LJQYP";
$regex = "/QYP$/";
preg_match ($regex, $s, $matches);
print_r ($matches);
$regex = "/^U29/";
preg_match ($regex, $s, $matches);
print_r ($matches);
?>
Returns:
Array
(
[0] => QYP
)
Array
(
[0] => U29
)
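Using both anchors in one pattern is a common way to check that an entire string has a given shape. The helper below is hypothetical (its name and the three-digit rule are made up for illustration) and uses scalar type declarations, so it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: true only if the string is exactly three digits
// from start (^) to end ($).
function is_three_digits(string $s): bool
{
    return preg_match('/^[0-9]{3}$/', $s) === 1;
}

var_dump(is_three_digits("123"));  // bool(true)
var_dump(is_three_digits("a123")); // bool(false): does not start with a digit
var_dump(is_three_digits("1234")); // bool(false): a fourth digit precedes the end
```

Without the anchors, the same pattern would also accept "a123" and "1234", because three consecutive digits occur somewhere inside them.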
Subpatterns
Subpatterns are delimited by parentheses.
The purpose of subpatterns is to:
- Localize a set of alternatives, and
- Set up the subpattern as a capturing subpattern.
For example, given a set of localized alternatives, /phil(harmonic|anthropist)/:
<?php
$s = "the st. louis philharmonic orchestra";
$regex = '/phil(harmonic|anthropist)/';
preg_match_all($regex, $s, $matches);
echo $s . PHP_EOL;
print_r ($matches);
?>
returns:
the st. louis philharmonic orchestra
Array
(
[0] => Array
(
[0] => philharmonic
)
[1] => Array
(
[0] => harmonic
)
)
In this example, ‘philharmonic’ was captured, with ‘harmonic’ being captured as a subpattern.
If an opening parenthesis is followed by “?:”, the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns.
For example:
<?php
$s = "the new york jets and the new york mets";
$regex = '/new york (?:jets|giants)|(mets|yankees)/';
preg_match_all($regex, $s, $matches);
echo $s . PHP_EOL;
print_r ($matches);
?>
produces:
the new york jets and the new york mets
Array
(
[0] => Array
(
[0] => new york jets
[1] => mets
)
[1] => Array
(
[0] =>
[1] => mets
)
)
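To see the difference that the non-capturing form (?: ... ) makes, this sketch runs the same alternation with and without it. The subject string and variable names are chosen only for illustration.

```php
<?php
$s = "the new york jets";

// Ordinary parentheses: the alternative that matched is captured as [1].
preg_match('/new york (jets|giants)/', $s, $capturing);

// (?: ... ) groups the alternatives without capturing anything.
preg_match('/new york (?:jets|giants)/', $s, $non_capturing);

print_r($capturing);     // [0] => new york jets, [1] => jets
print_r($non_capturing); // [0] => new york jets
```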
Functions
preg_filter() - Perform a regular expression search and replace, returning only the subjects where there was a match
$subject = "It all depends what this is.";
$replace = "is";
$pattern = "/this/";
echo "replacing this".PHP_EOL."[$subject ]".PHP_EOL."with".PHP_EOL . "[";
print (preg_filter($pattern, $replace, $subject));
echo "]" . PHP_EOL;
will print:
replacing this
[It all depends what this is. ]
with
[It all depends what is is.]
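With a single string that matches, preg_filter() and preg_replace() give the same result. The difference shows up with an array of subjects, as in this sketch (the array contents are made up): preg_replace() returns every element, while preg_filter() returns only the elements where the pattern matched.

```php
<?php
$subjects = array('this one', 'that one');

// preg_replace() keeps the non-matching element unchanged.
$replaced = preg_replace('/this/', 'THIS', $subjects);

// preg_filter() drops the non-matching element.
$filtered = preg_filter('/this/', 'THIS', $subjects);

print_r($replaced); // [0] => THIS one, [1] => that one
print_r($filtered); // [0] => THIS one
```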
preg_grep() - Return array entries that match the pattern
preg_last_error() - Returns the error code of the last PCRE regex execution
$names = array ('John', 'Bob', 'Teresa', 'Lisa', 'Jimmy', 'Beverly');
$grepped = preg_grep("/^B+/", $names);
print_r($grepped);
if (preg_last_error() == PREG_NO_ERROR) {
print 'There was no preg error.' . PHP_EOL;
}
will print:
Array
(
[1] => Bob
[5] => Beverly
)
There was no preg error.
preg_match_all() - Perform a global regular expression match (see earlier examples for usage)
preg_match() - Perform a regular expression match
preg_match() is similar to preg_match_all(), except that it stops searching after the first match is found.
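A short side-by-side sketch of the two functions on the same subject (the string and variable names are illustrative). Both return the number of matches found, which for preg_match() is at most 1.

```php
<?php
$s = "10 apples and 20 pears";

// Stops after the first match.
$first = preg_match('/\d+/', $s, $one);

// Keeps going until the end of the subject.
$count = preg_match_all('/\d+/', $s, $all);

echo $first . PHP_EOL; // 1
print_r($one);         // [0] => 10
echo $count . PHP_EOL; // 2
print_r($all[0]);      // [0] => 10, [1] => 20
```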
preg_quote() - Quote regular expression characters
$regex_chars = "\ + * ? [ ^ ] $ ( ) { } = ! < > | : -";
echo preg_quote ($regex_chars) . PHP_EOL;
will print:
\\ \+ \* \? \[ \^ \] \$ \( \) \{ \} \= \! \< \> \| \: \-
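In practice, preg_quote() is mostly useful for embedding literal text in a pattern. The sketch below assumes "/" as the delimiter and passes it as the optional second argument so that it gets escaped as well; the sample text is made up.

```php
<?php
// Literal text that happens to contain regex characters and slashes.
$literal = "1+1=2 (or /so/)";

// The second argument tells preg_quote() to escape the delimiter too.
$pattern = '/' . preg_quote($literal, '/') . '/';

echo $pattern . PHP_EOL;
var_dump(preg_match($pattern, "I hear 1+1=2 (or /so/) they say"));
```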
preg_replace_callback() - Perform a regular expression search and replace using a callback
$bicycle = array ('frame', 'chain', 'gruppo', 'seatpost', 'tires',
'handlebars', 'stem', 'saddle');
$pattern = "/\w+/";
print_r(preg_replace_callback ($pattern, 'my_callback', $bicycle));
function my_callback ($matches) {
$s = "";
foreach ($matches as $match) {
$s .= strtoupper($match);
}
return $s;
}
will print:
Array
(
[0] => FRAME
[1] => CHAIN
[2] => GRUPPO
[3] => SEATPOST
[4] => TIRES
[5] => HANDLEBARS
[6] => STEM
[7] => SADDLE
)
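The callback can also be an anonymous function, and it can compute the replacement from the match instead of just changing its case. A small sketch, assuming PHP 7 or later for the type declarations; the subject string is made up.

```php
<?php
$s = "3 apples and 4 pears";

// $m[0] holds the text of the current match; return its replacement.
$doubled = preg_replace_callback('/\d+/', function (array $m): string {
    return (string) ((int) $m[0] * 2);
}, $s);

echo $doubled . PHP_EOL; // 6 apples and 8 pears
```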
preg_replace() - Perform a regular expression search and replace, returning a string when the subject is a string and an array when the subject is an array
$string = "this sentence has been capitalized.";
$replace = "T";
$pattern = "/^\w/";
$ar = preg_replace ($pattern, $replace, $string);
print_r($ar);
Will print:
This sentence has been capitalized.
preg_split() - Split string by a regular expression
$string = "Isn't PHP a cool language?";
$pattern = "/\s/";
$ar = preg_split ($pattern, $string);
print_r ($ar);
Will print:
Array
(
[0] => Isn't
[1] => PHP
[2] => a
[3] => cool
[4] => language?
)
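When the input may contain runs of whitespace or leading and trailing spaces, splitting on a single \s produces empty pieces. One possible way to handle this, sketched below with a made-up input, is to split on \s+ and pass the PREG_SPLIT_NO_EMPTY flag.

```php
<?php
$string = "  Isn't   PHP cool?  ";

// -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops empty pieces.
$words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($words); // Isn't, PHP, cool?
```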