Know Your MYSQL: How to use regular expression in MYSQL
MySQL offers the ability
to use regular expressions to perform complex searches against your data. A
regular expression is a tool that provides for a concise and flexible way to
identify strings of text based on user-defined patterns.
This article will
discuss the MySQL regular expression operators, review their use and syntax,
and identify the constructs and special characters that can be used in a MySQL
regular expression, as well as provide a few examples of their use.
MySQL Regular Expression
Operators
The following operators
are used in MySQL to perform regular expression operations. These are used in a
WHERE clause similar to the well-known and often used LIKE operator.
- REGEXP: The
pattern matching operator for using regular expressions.
- NOT
REGEXP:
The negation of the REGEXP operator.
- RLIKE: A synonym
for the REGEXP operator.
MySQL Regular Expression
Syntax
The basic syntax used
for MySQL regular expression operations is:
-- For the REGEXP Operator
SELECT {COLUMN_NAME} FROM
{TABLE_NAME} WHERE {COLUMN_NAME} REGEXP '{REGEXP_PATTERN}';
-- For the NOT REGEXP Operator
SELECT {COLUMN_NAME} FROM
{TABLE_NAME} WHERE {COLUMN_NAME} NOT REGEXP '{REGEXP_PATTERN}';
-- For the RLIKE Alias Operator
SELECT {COLUMN_NAME} FROM
{TABLE_NAME} WHERE {COLUMN_NAME} RLIKE '{REGEXP_PATTERN}';
To provide more
detailed, yet simple, example of a MySQL regular expression operation, take the
following statement. It will retrieve all the columns of each record in the
table PRICE where the PRICELIST_ID matches the pattern specified (starts with
the numeric range 0-9 occurring one or more times, followed by an ‘_’
(underscore), and then the character sequence ‘USD’.
SELECT * FROM PRICE WHERE
PRICELIST_ID REGEXP '^[0-9]+_USD';
Another example of a
MySQL regular expression operation, can be shown in the following statement. It
will retrieve all columns of each record from the PRICE table where the
PRICE_ID matches the pattern specified (starts with an O, followed by and ‘_’
(underscore), then the numeric range 0-9 occurring one or more times, followed
by and ‘_’ (underscore), then ending with either the character sequence USD, or
BRA.
SELECT * FROM PRICE WHERE
PRICE_ID REGEXP '^O_[0-9]+_[USD|BRA]';
MySQL REGEXP Constructs
and Special Characters
A MySQL regular
expression may use any of the following constructs and special characters to
construct a pattern for use with the REGEXP operators. The construct or special
character is shown, followed by a description of each and what operations in
performs within the pattern for the regular expression.
- ^ :
Match the beginning of a string.
- $ :
Match the end of a string.
- . :
Match any character (including carriage return and newline characters).
- a* :
Match any sequence of zero or more a characters.
- a+ :
Match any sequence of one or more a characters.
- a? :
Match either zero or one a characters.
- de|abc :
Match either of the character sequences, de or abc.
- (abc)* :
Match zero or more instances of the character sequence abc.
- {1},{2,3} : Provides
a more general way of writing regular expressions that match many
occurences of the previous atom (or “piece”) of the pattern. i.e. a?
can be written as a{0,1}.
- [a-dX],[^a-dX] :
Matches any character that is (or is not, if ^ is used) either a, b, c, d,
or X. A “-” character between two other characters forms a
range that maches all characters from the first character to the second.
- [.characters.] :
Within a bracket expression (using “[” and “]”), matches the sequence of
characters of that collating element. i.e. the pattern [[.period.]] would
match the ‘.’ (period) character.
- [=character_class=] :
Within a bracket expression, represents an equivalence class. It matches
all characters with the same collation value, including itself.
- [:character_class:] :
Within a bracket expression, represents a character class that matches all
characters belonging to that class. i.e. the pattern [[:alpha:]] would match
against a string that is all aphabetic characters.
- [[:<:]],[[:>:]] :
These markers stand for word boundaries, and as such they match the
beginning and ending of words, respectively.
* NOTE: MySQL interprets the
“\” (backslash) character as an escape character. If you choose to use the “\”
character as part of your pattern in a regular expression it will need to
escaped with another backslash “\\”.
Comments
Post a Comment