8 Practical PHP Regular Expressions

Devolio

    Monday, October 15. 2007

    Here are eight practical PHP regular expressions and techniques that I've used over the past few years with Perl Compatible Regular Expressions (PCRE). This guide goes over the eight validation techniques and briefly describes how each one works: usernames, telephone numbers, email addresses, and more.

    Validating Usernames

    Something often overlooked, but simple to do with a regular expression, is username validation. For example, we may want our usernames to be between 4 and 28 characters long and limited to letters, digits, and underscores.
    $string = "userNaME4234432_";
    if (preg_match('/^[a-z\d_]{4,28}$/i', $string)) {
    echo "example 1 successful.";
    }
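
The check above can be wrapped in a small helper so it is easy to try against several inputs. This is only a sketch: the function name is_valid_username is hypothetical, and the D modifier is my addition so that a trailing newline is not accepted.

```php
<?php
// Hypothetical helper: 4 to 28 characters, letters, digits, or underscores.
function is_valid_username(string $name): bool
{
    // i = case-insensitive, D = "$" matches only at the very end of the string.
    return preg_match('/^[a-z\d_]{4,28}$/iD', $name) === 1;
}

foreach (['userNaME4234432_', 'abc', 'bad name!'] as $candidate) {
    echo $candidate, ': ', is_valid_username($candidate) ? 'valid' : 'invalid', "\n";
}
```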

    Validating Telephone Numbers

    A much more interesting example would be matching telephone numbers (US/Canada). We'll be expecting the number to be in the following form: (###)###-####
    $string = "(555)555-5555";
    if (preg_match('/^(?:\([2-9][0-9]{2}\)|[2-9][0-9]{2})[-. ]?[0-9]{3}[-. ]?[0-9]{4}$/', $string)) {
    echo "example 2 successful.";
    }

    Thanks to Chris for pointing out that there are no US area codes below 200.

    Again, whether the phone number is typed like (###) ###-#### or ###-###-####, it will validate successfully. There is also a little more leeway than strictly checking for the right number of digits, because the area code may or may not be wrapped in parentheses, and the groups can be separated by a dash, period, or space.

    Email Addresses

    Another practical example would be an email address. This is fairly straightforward to do. There are three basic portions of an email address: the username, the @ symbol, and the domain name. The following example checks that the email address is in a valid form. We'll use a more complicated email address to make sure the pattern works well with longer addresses too.
    $string = "first.last@domain.co.uk";
    if (preg_match(
    '/^[^0-9][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[@][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[.][a-zA-Z]{2,4}$/',
    $string)) {
    echo "example 3 successful.";
    }
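
For comparison, PHP also ships a built-in check, filter_var() with FILTER_VALIDATE_EMAIL. It is not part of the original example; the sketch below simply runs the pattern above and the built-in filter side by side. The addresses are placeholders chosen for illustration.

```php
<?php
// The pattern from the example above, unchanged.
$pattern = '/^[^0-9][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[@][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[.][a-zA-Z]{2,4}$/';

// Placeholder addresses for illustration only.
$addresses = ['first.last@example.co.uk', 'foo+bar@example.com'];

foreach ($addresses as $address) {
    $byRegex  = preg_match($pattern, $address) === 1;
    $byFilter = filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
    printf("%-26s regex: %-3s filter_var: %s\n", $address, $byRegex ? 'yes' : 'no', $byFilter ? 'yes' : 'no');
}
```

The pattern rejects the plus-addressed entry while the built-in filter accepts it, which is the kind of over-strictness to watch for with hand-written email patterns.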

    Postal Codes

    Validating postal codes (ZIP codes, in the US) is another practical example, and a good one to show how ? works in regular expressions.

    $string = "55324-4324";
    if (preg_match('/^[0-9]{5,5}([- ]?[0-9]{4,4})?$/', $string)) {
    echo "example 4 successful.";
    }

    In this example, the ? says that the extra four digits at the end can either be absent or appear exactly once. That way, whether or not the user types them in, the input still validates correctly.
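
To see the optional group in action, here is a small sketch that runs a few sample inputs through an equivalent pattern (written with {5} and {4} instead of {5,5} and {4,4}). The sample values are made up for illustration.

```php
<?php
$pattern = '/^[0-9]{5}([- ]?[0-9]{4})?$/';

// The trailing group may appear zero times or once, never twice.
$samples = ['55324', '55324-4324', '55324 4324', '553244324', '55324-4324-4324'];

foreach ($samples as $zip) {
    echo $zip, ': ', preg_match($pattern, $zip) === 1 ? 'match' : 'no match', "\n";
}
```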

    IP Addresses

    Without pinging it or making sure it's actually real, we can make sure that an IP address is in the right form. We'll be expecting a normally formed IP address, such as 255.255.255.0.
    $string = "255.255.255.0";
    if (preg_match(
    '/^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)(?:[.](?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$/',
    $string)) {
    echo "example 5 successful.";
    }
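
As an alternative to a hand-written pattern, PHP's built-in filter_var() can do the same range checking. This is not part of the original example; the helper name is_ipv4 is hypothetical.

```php
<?php
// Hypothetical helper around PHP's built-in IP validation filter.
function is_ipv4(string $ip): bool
{
    // FILTER_FLAG_IPV4 restricts the check to dotted-quad IPv4 addresses.
    return filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}

foreach (['255.255.255.0', '299.299.299.299', '1.2.3'] as $ip) {
    echo $ip, ': ', is_ipv4($ip) ? 'valid' : 'invalid', "\n";
}
```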

    Hexadecimal Colors

    Moving right along with numbers, we could check for Hexadecimal color codes, in short hand or long hand format (#333, 333, #333333 or 333333) with an optional # symbol. This could be useful in a lot of different ways... maybe previewing CSS files? Grabbing colors off pages? The options are endless.
    $string = "#666666";
    if (preg_match('/^#?(?:(?:[a-f\d]{3}){1,2})$/i', $string)) {
    echo "example 6 successful.";
    }
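
If you also want to normalize what you matched, a capture group makes that easy. A minimal sketch, assuming you want a lowercase #rrggbb result; the function name normalize_hex_color is hypothetical.

```php
<?php
// Hypothetical helper: returns "#rrggbb" in lowercase, or null if the input is not a hex color.
function normalize_hex_color(string $color): ?string
{
    // Optional "#", then exactly 3 or 6 hex digits, captured in group 1.
    if (preg_match('/^#?((?:[a-f\d]{3}){1,2})$/iD', $color, $m) !== 1) {
        return null;
    }
    $hex = strtolower($m[1]);
    if (strlen($hex) === 3) {
        // Expand shorthand: "abc" becomes "aabbcc".
        $hex = $hex[0] . $hex[0] . $hex[1] . $hex[1] . $hex[2] . $hex[2];
    }
    return '#' . $hex;
}
```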

    Multi-line Comments

    A simple way to find or remove multi-line comments in PHP, CSS, and other languages could be useful as well.
    $string = "/* commmmment */";
    if (preg_match('~/\*.*?\*/~s', $string)) {
    echo "example 7 successful.";
    }
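
For the "remove" half, a non-greedy pattern with the s modifier can be passed to preg_replace(). This is a rough sketch on a made-up CSS string; note that it will also strip comment-like text inside string literals, so it is not a real parser.

```php
<?php
$source = "a { color: red; } /* first\n comment */ b { color: blue; } /* second */";

// Lazy .*? stops at the first closing */; the s modifier lets . match newlines.
$stripped = preg_replace('~/\*.*?\*/~s', '', $source);

echo $stripped, "\n";
```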

    Dates

    And my last simple, yet practical example would be dates, in my favorite MM/DD/YYYY format.
    $string = "10/15/2007";
    if (preg_match('/^\d{1,2}\/\d{1,2}\/\d{4}$/', $string)) {
    echo "example 8 successful.";
    }
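
The pattern only checks the shape of the input, so something like 2/30/2004 still passes. One way to close that gap is PHP's checkdate(); the sketch below combines the two. The function name is_valid_us_date is hypothetical.

```php
<?php
// Hypothetical helper: MM/DD/YYYY that must also be a real calendar date.
function is_valid_us_date(string $date): bool
{
    if (preg_match('/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/D', $date, $m) !== 1) {
        return false;
    }
    // checkdate(month, day, year) knows month lengths and leap years.
    return checkdate((int) $m[1], (int) $m[2], (int) $m[3]);
}
```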


    Thanks to Dave Doyle for correcting and improving the username, zip code, IP address, and date regular expressions.

    These are just some examples of the regular expressions I've written to “get the job done” for quite a while. They work well for the uses in which I've needed them, and hopefully they'll be of some use to you as well.

    Have some regular expressions you're having a problem with? Check out our Guide for PHP Regular Expressions and PCRE Tester and Cheat Sheet. Looking for a regular expression to do something particular? Leave a comment, I'd love to hear what you have to say, and would love to hear some of your ideas for other regular expressions.







    Comments

    #1 - .mario said:
    2007-10-15 17:25 - (Reply)

    \w,/i! :-)

    http://www.regular-expressions.info/reference.html

    #2 - Ryan Ginstrom said:
    2007-10-15 21:49 - (Reply)

    Your telephone number regex seems to have a bug -- it will fail if there's a space after the parentheses.
    Pass: (555)555-5555
    Fail: (555) 555-5555

    #2.1 - Joey said:
    2007-10-15 22:00 - (Reply)

    Thanks for letting me know Ryan, I've fixed it now. Turns out I had pasted a version of it from an older file than I thought.

    #3 - Dave Doyle said:
    2007-10-15 23:47 - (Reply)

    If I could make some constructive suggestions (and forgive me, because I'm a Perl guy, not a PHP guy so forgive me if my syntax isn't quite perfect). Even these suggestions aren't perfect, but they're a little further down the road:

    Username Validation: '/^[a-z\d_]{4,28}$/i' (added \d character class and made case-insensitive)

    Phone: You may actually be better off just stripping all non-digit characters and checking for 7 or 10 digits as there are so many variations in how someone might enter a standard North American Phone number.

    Email address: I'm not gonna touch this one as it's damn hard to write a regexp to validate email reliably and I ain't that good.

    Postal Code (in this case zip): '^\d{5}(?:[- ]?\d{4})?$' ( if you need exactly x characters you need not do {x,x}, merely {x}; also specifying "non-capturing" braces so \1 is not populated)

    IP Address is actually wrong. You're enabling it to match numbers that are too high (299.299.299.299 is valid in your regex but not a valid IP). Something more like:
    '^(?:25[0-5]|2[0-4]\d|[1-9]\d|\d)(?:[.](?:25[0-5]|2[0-4]\d|[1-9]\d|\d)){3}$' (limits to maximum 255)

    Hex Colors: '/^#(?:(?:[a-f\d]{3}){1,2})$/i' (once again, add case-insensitivity and \d character class, make non-capturing)

    Multiline comments: This may not actually catch multiline. if you add 's' after the '/' this enables . to match newlines. However, even adding this will break if there are nested comments in a section.

    Dates: Can be made a little stricter - '/^(?:1[012]|0?[1-9])\/(?:0?[1-9]|[12]\d|3[01])\/[12]\d{3}$/' (restrict month to 1 to 12, day to 1 to 31, year to 1000 to 2999; even so, this doesn't fully validate, as 2/30/2004 is accepted but is not a real date... need some proper date checking for real assurance).

    Not trying to be nitpicky and all this was written in your little comment box so it's not tested. Just wanted to point out some potential problems.
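
The phone suggestion above (strip everything that isn't a digit, then count what's left) could be sketched roughly like this. The function name normalize_phone is hypothetical, and the 7-or-10 rule is taken straight from the suggestion.

```php
<?php
// Hypothetical helper: keep only the digits, then check how many remain.
function normalize_phone(string $input): ?string
{
    // \D matches any character that is not a digit.
    $digits = preg_replace('/\D+/', '', $input);
    // Accept 7 digits (local) or 10 digits (area code plus local number).
    if (strlen($digits) === 7 || strlen($digits) === 10) {
        return $digits;
    }
    return null;
}
```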

    #3.1 - Joey said:
    2007-10-15 23:59 - (Reply)

    By all means, I appreciate the suggestions.

    Admittedly, I'm not the greatest at regular expressions; these are just some I've used before that are relatively simple and worked for me. I'll go through and update the article later in the day. Thank you for the advice.

    #3.2 - Takkie said:
    2007-10-16 04:05 - (Reply)

    @Dave: You've posted my thoughts about this article almost literally ;-)

    @Joey: Trying to validate an email address using regular expressions is nearly never worth the effort. Either you'll accept bogus email addresses or, more often (and worse!), the regular expression will be too strict and will exclude addresses that are perfectly valid. So either use a very non-strict regex like /^\S+@\S+\.\S+$/D, which will match almost anything that's remotely similar to an email address, or use a regex like the one described on this page: http://www.iamcal.com/publish/articles/php/parsing_email/

    As a final tip for anybody using the dollar sign in regular expressions: by default the dollar sign matches the end of the string or immediately before(!) the final character if it's a newline. To prevent this behavior (let $ only match the end of the subject string) you'll need to use the D modifier. (More info: http://php.net/manual/en/reference.pcre.pattern.modifiers.php)

    #3.2.1 - Jeroen 2007-10-16 09:11 - (Reply)

    As Takkie mentioned, you should use the /D modifier. This prevents form input hacks. Imagine I would write a script to send a "test@devolio.com\n,spamm0r1@hotmail.com,spamm0r2@hotmail.com,etc@hotmail.com" to a form that sends mail (or something similar). Your regexp will allow the input because it only matches this part: "test@devolio.com".

    Alternatively, you can use the \A and \z meta-characters.

    #3.2.1.1 - Jeroen 2007-10-16 09:14 - (Reply)

    Reading back my comment, I thought I should elaborate some more: use \A and \z instead of ^ and $ to indicate start and end of subject.
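
A small sketch of the difference, using a placeholder address with one trailing newline. A plain $ lets that final newline slip through; the D modifier and \z do not.

```php
<?php
$subject = "user@example.com\n"; // note the trailing newline

var_dump(preg_match('/^\S+@\S+\.\S+$/', $subject));   // "$" also matches just before a final newline
var_dump(preg_match('/^\S+@\S+\.\S+$/D', $subject));  // D: "$" matches only at the very end
var_dump(preg_match('/\A\S+@\S+\.\S+\z/', $subject)); // \z: always the very end of the subject
```

The first call reports a match and the other two do not.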

    #4 - Dave Doyle said:
    2007-10-15 23:53 - (Reply)

    whoops. i already see a mistake I made on IP:

    '^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)(?:[.](?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$'

    (missed the 100's and forgot $ anchor)

    Anyhow, you get the idea.

    #5 - Jon 2007-10-16 07:04 - (Reply)

    Here's a CSV parsing regex:

    http://snippets.dzone.com/posts/show/4430

    #6 - Jos Hirth said:
    2007-10-16 08:23 - (Reply)

    The email regex isn't correct, I'm afraid. The RFC allows a lot more than that. Eg foo+bar@domail.tld is valid. Also any email addresses from x.org won't work.

    Check Drupal's implementation (includes/common.inc valid_email_address()). That one is pretty solid.

    #7 - TomB said:
    2007-10-16 15:21 - (Reply)

    You cannot verify an email address for RFC 822 compliance with just a regular expression. Since there have been real classes doing that far better for ages, one would think word had gotten around. With PHP, just use PEAR; with Perl, use CPAN for such things.
    To get an impression
    http://cvs.php.net/viewvc.cgi/pear/Validate/Validate.php?revision=1.120&content-type=text%2Fplain&pathrev=1.120

    #8 - Martin 2007-10-17 15:25 - (Reply)

    The world doesn't consist only of Americans you know, so most of these are useless unless you have a very specific target audience and never plan on providing international support.

    #8.1 - Dave Doyle 2007-10-17 15:32 - (Reply)

    You assume he wrote this for every programmer the world over? Why would you assume that?

    Offer some help. Show him other regexes that might be applicable in YOUR locale.

    #8.2 - Phil Steels said:
    2007-10-18 10:02 - (Reply)

    I've never understood why people make unnecessary comments.. eg. "The world doesn't consist only of Americans"

    Being British, it is frustrating that when you are searching for a solution 9 times out of 10 the answer leans towards America, but so what?! It gives you an idea which you can then modify to suit your own needs.

    I am grateful for ANY help I can get, no matter which part of the world it comes from.

    Thanks to everyone above for their time - I for one definitely appreciate it! :-)

    Phil.

    #8.2.1 - Dave Doyle said:
    2007-10-18 10:19 - (Reply)

    Well, I'm Canadian. But as you said, you can take what you need and learn from that. (Granted, American Address/Phone info mirrors ours pretty closely anyhow... not as big a concern for me obviously)

    Here's some Canadian Bacon flavour then.

    Canadian Postal codes:

    preg_match('/\A([A-Z]\d[A-Z])\s*(\d[A-Z]\d)\z/i', $the_postal_code, $matches);

    So, we're using the \A and \z anchors instead of ^ and $, making it case insensitive, and being a little loose by allowing any amount of whitespace in the middle. If it's valid we can just do:

    $valid_post_code = strtoupper( $matches[1] . ' ' . $matches[2] );

    Yay! Valid Canadian Postal Code!

    #9 - Jos Hirth said:
    2007-10-30 17:22 - (Reply)

    @Joey

    You should stop displaying the email addresses completely. The generic imaterrorist[at]whitehouse.gov obfuscation is already handled by most bots.

    And yes, the email address I used here (and only here) got already spammed.

    Sort that out ASAP.

    #10 - Chris 2007-12-01 12:42 - (Reply)

    I don't mean to be a jackass, but I've been looking around the net and it bothers me that the numbers [0-9] are included in the area code and/or the first 3 digits of a phone number. I may be stupid so bear with me.

    I checked out on a couple websites that North American area codes (and most likely the first 3 digits of a 7 digit phone number) do not contain [0,1] in the first digit.

    In other words shouldn't the area code or first 3 digits be [2-9]{1}[0-9]{2}? I'm no expert at regular expressions but at least that's what I think.

    #10.1 - Joey said:
    2007-12-01 13:01 - (Reply)

    Hey Chris,

    You're neither stupid nor a jackass, you're completely correct. Thanks for letting me know, fixing it now.

    #11 - Shycon Design said:
    2008-01-15 17:37 - (Reply)

    Just stumbling along and found this page. Looks very helpful as I dive into some more complex PHP programming. Thanks!

    #12 - Joe said:
    2008-02-06 17:27 - (Reply)

    I'm trying to write a regular expression. I only need the first match.

    I need it to give me a set of data between a given word and another given word. match(word1, $txtSearch, word2)

    The data looks like this:

    Word1: word, word, word, word, word,
    word, word, word, word, word,
    word, word, word, word, word,
    word, word
    Word2:

    So all I need is the data between these two words. Can anyone help me out?

    Thanks.

    #12.1 - Joey said:
    2008-02-08 00:26 - (Reply)

    Hey Joe,

    I can probably give you a good starting point. Could you show me a test case or two I could run them against?

    #12.1.1 - Joe said:
    2008-02-08 11:04 - (Reply)

    Nevermind I figured it out.

    I had to do a screen scrape of a web site.
    I then stripped out all the HTML tags:
    $txtSearch = strip_tags($h->body);

    Next I had to remove everything before a certain word:
    // Get everything between the words "Synonyms:" and "Source:"
    preg_match('/\\Synonyms:\s?.+[a-b].+[^.]+?\s\Source/i', $txtSearch, $matches1);
    $newSearch1 = $matches1[0];

    Then I had to remove extra junk:
    // Remove the =20 and *.
    $newSearch2 = str_replace('=20', '', $newSearch1);
    $newSearch3 = str_replace('*', '', $newSearch2);

    Finally I had to remove the two words that initially defined my range of words:
    // Remove the words "Synonyms:" and "Source:"
    $newSearch4 = str_replace('Synonyms:  ', '', $newSearch3);
    $finalTxt = str_replace('Source', '', $newSearch4);

    Then I displayed the results:
    return $finalTxt;

    (This was in a function I made so I just returned the variable.)
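
For anyone with the same "everything between two words" problem, a capture group with a lazy quantifier can do the match and the trimming in one step. This is only a sketch, not the code used above; the function name text_between and the sample text are made up.

```php
<?php
// Hypothetical helper: returns the text between $start and $end, or null if not found.
function text_between(string $text, string $start, string $end): ?string
{
    // preg_quote() escapes any regex metacharacters in the marker words.
    // i = case-insensitive, s = "." also matches newlines; (.*?) stops at the first $end.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/is';
    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }
    return trim($m[1]);
}

$sample = "Synonyms: quick, fast,\nrapid, swift\nSource: example";
echo text_between($sample, 'Synonyms:', 'Source'), "\n";
```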

    #13 - google searcher said:
    2008-04-29 20:00 - (Reply)

    /^[(/*)+.+(*/)]$/
    Your pattern to match multiline php comments doesn't make any sense to me.
    First of all, you're using the ^ and $ symbols. In PCRE (preg), without the 'm' modifier they anchor the match to the start and end of the whole subject string; with 'm' they also match at the start and end of each line.
    Why would you need to anchor to the beginning or end at all when trying to match PHP multi-line comments? I think this is irrelevant.

    I think this is a better alternative:
    ~/\*.*?\*/~s where '~' is the delimiter and 's' is the modifier to make '.' match newline characters as well. And '\' is the escape character.

    if (preg_match('~/\*.*?\*/~s', '/* commmmment
    here is a valid php comment /**/')) {
    echo 'now it is successful!';
    }

    See it here:
    http://nancywalshee03.freehostia.com/regextester/regex_tester.php?seeSaved=6s7pah8k

    This page has great google rankings, maybe it will be more beneficial now. :-)

    #14 - Peter Li 2008-05-05 21:52 - (Reply)

    I think there is a small bug in your email regexp:

    ^[^0-9][a-zA-Z0-9_]+

    Your regexp matches an email that does not start with a digit... this is possibly correct, however it would allow any character, including comma, plus, dollar, etc.
    You probably really wanted just alpha chars. So you probably want to use:

    ^[a-zA-Z][a-zA-Z0-9_]+


    hence your example would be:

    $string = "first.last@domain.co.uk";
    if (preg_match(
    '/^[a-zA-Z][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[@][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[.][a-zA-Z]{2,4}$/',
    $string)) {
    echo "example 3 successful.";
    }

    #15 - atomiku said:
    2008-05-10 19:27 - (Reply)

    Excellent, this is very very handy.

    Thanks!

    #16 - monk said:
    2008-05-13 18:15 - (Reply)

    Thanks a lot dude ! Great examples, great explained! Keep up with this!

    #17 - James 2008-06-30 03:50 - (Reply)

    Your email regex needs to take into account that you can in fact have a plus (+) sign in the username part of the address.

    Here is a great function with regex to validate an email address:

    http://code.iamcal.com/php/rfc822/rfc2822.phps

    #18 - Graham said:
    2008-07-20 00:21 - (Reply)

    Just StumbleUpon'd this article, and it was very helpful. Even more helpful were your loyal readers and their comments. Definitely bookmark worthy. Thanks for your help!

    #19 - one.perfect.sunrise 2008-08-26 06:30 - (Reply)

    About "Validating Telephone Numbers":


    $phone_number = '0893010144';
    //$phone_number = '0893 010144';
    //$phone_number = '0893 01 01 44';
    //$phone_number = '+359893010144';
    //$phone_number = '+359893 01 01 44';
    //$phone_number = '(359) 893010144';
    //$phone_number = '(+359) 893010144';
    //$phone_number = '(+359) 893-01-01-44';
    //$phone_number = '(+359) 893 010 144';
    //$phone_number = '(+359) 893 01 01 44';
    //$phone_number = '(+359) 893 010 144';
    //$phone_number = '(+359)893010144';

    $pattern = '/^\(?\+?[0-9]{3}\)?([0-9- ]){6,13}$/';

    if (preg_match($pattern, $phone_number)) {
    echo 'ok ...';
    } else {
    echo 'bad ...';
    }

    Try this. After the three-digit prefix (with optional parentheses and plus sign), the rest of the phone number, counting digits, spaces, and dashes together, must be between 6 and 13 characters long. You can change this length if you want ...

