
Last updated: Sun, 25 Nov 2007

preg_match

(PHP 4, PHP 5)

preg_match - Perform a regular expression match

Description

int preg_match ( string $pattern , string $subject [, array &$matches [, int $flags [, int $offset ]]] )

Searches subject for a match to the regular expression given in pattern.

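Parameters

pattern

The pattern to search for, as a string.

subject

The input string.

matches

If matches is provided, it is filled with the results of the search. $matches[0] will contain the text that matched the full pattern, $matches[1] will contain the text that matched the first capturing subpattern, and so on.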

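flags

flags can be the following flag:

PREG_OFFSET_CAPTURE: If this flag is passed, the string offset of every match is returned as well. Note that this turns each element of matches into an array whose element 0 is the matched string and whose element 1 is the offset of that string within subject.

offset

Normally, the search starts from the beginning of the subject string. The optional parameter offset can be used to specify the position (in bytes) at which to start the search.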

Note: Using offset is not equivalent to passing substr($subject, $offset) to preg_match() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x). Compare:

<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
print_r($matches);
?>

The above example will output:
Array
(
)

while this example

<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, substr($subject,3), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>

will produce
Array
(
    [0] => Array
        (
            [0] => def
            [1] => 0
        )

)
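
The matches and flags parameters described above can be illustrated with a minimal sketch. The subject string and variable names here are made up for illustration and are not part of the original page.

```php
<?php
// Hypothetical input: a sentence containing an ISO-style date.
$subject = "Released on 2007-11-25.";
$pattern = '/(\d{4})-(\d{2})-(\d{2})/';

// Without flags, each element of $matches is a plain string.
preg_match($pattern, $subject, $matches);
$full = $matches[0];  // text matched by the whole pattern
$year = $matches[1];  // text matched by the first subpattern

// With PREG_OFFSET_CAPTURE, each element becomes array(string, byte offset).
preg_match($pattern, $subject, $withOffsets, PREG_OFFSET_CAPTURE);
list($text, $offset) = $withOffsets[0];
```

Here $offset counts bytes from the start of $subject, which matters for multibyte input.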

         

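Return Values

preg_match() returns the number of times pattern matches. That is either 0 (no match) or 1, because preg_match() stops searching after the first match. preg_match_all(), by contrast, continues until it reaches the end of subject. preg_match() returns FALSE if an error occurred.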
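
Because 0 and FALSE are both falsy, the three outcomes are easiest to tell apart with strict comparisons. The following is a minimal sketch; the helper name describe_match() is hypothetical and only serves the illustration.

```php
<?php
// Hypothetical helper: classifies the three possible outcomes of preg_match().
function describe_match($pattern, $subject)
{
    // @ suppresses the warning that an invalid pattern raises,
    // so the FALSE return value can be handled here instead.
    $result = @preg_match($pattern, $subject);
    if ($result === false) {
        return "error";
    }
    return $result === 1 ? "match" : "no match";
}
```

A loose check such as `if (!preg_match(...))` would treat an invalid pattern the same way as a subject that simply does not match.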

Changelog

Version	Description
4.3.3	The offset parameter was added.
4.3.0	The PREG_OFFSET_CAPTURE flag was added.
4.3.0	The flags parameter was added.

Examples

Example #1 Find the string "php"


<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

Example #2 Find the word "web"


<?php
/* The \b in the pattern indicates a word boundary, so only the standalone
 * word "web" is matched, and not part of a word such as "webbing" or "cobweb" */
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}

if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

Example #3 Getting the domain name out of a URL


<?php
// get host name from URL
preg_match('@^(?:http://)?([^/]+)@i',
    "http://www.php.net/index.html", $matches);
$host = $matches[1];

// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>

The above example will output:

domain name is: php.net

Notes

Tip

Avoid using preg_match() if you only want to check whether one string is contained in another string. strpos() or strstr() will be faster.
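
A minimal sketch of the plain substring check this tip recommends; the sample sentence is reused from the examples above and the variable names are illustrative.

```php
<?php
$haystack = "PHP is the web scripting language of choice.";

// strpos() returns the byte position of the first occurrence, or FALSE.
// Position 0 is a valid hit, so compare with !== false rather than truthiness.
$containsWeb = strpos($haystack, "web") !== false;
$startsWithPhp = strpos($haystack, "PHP") === 0;

// stripos() is the case-insensitive variant.
$containsPhpAnyCase = stripos($haystack, "php") !== false;
```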


User Contributed Notes

andy at jmedia web dawt com
07-Mar-2008 01:21


If you want to test for FALSE use === instead.



$result = preg_match("/badtest/J",$string);

if($result === FALSE) {

   // bad query

   error_log("Whoops!");

}  else {

   echo("Matched " . $result . " times");

}

Thijs van Beek
05-Mar-2008 07:56


To the comment below about the validation of phone numbers.



PEAR offers some brilliant classes for phone number validation.



Check out http://pear.php.net/packages.php?catpid=50&catname=Validate



Regards

Thijs

contact at bradjasper dot com
22-Feb-2008 06:49


<?php



//    After not being able to find a comprehensive phone number expression,

//    I came up with my own which handles many ways to format a number

//

//    Accepted

//    490-5473    (559) 585-1635

//    (231)-826-4402    2072444529    315-789-7555 x52

//    708.333.0003    1-559-584-9639    308-882-7111 ext 7234

//

//    Rejected

//    765-0600 489-4151    60-415-5389    315-789-7555, x52



function validatePhoneNumber( $sPhoneNum ) {



    // The dot is escaped (\.) so that it matches a literal "." separator only
    return preg_match('/^(1(\/|-|\s|\.|))?(\(?\d{3}\)?)?(\/|-|\s|\.|)?\d{3}'

                . '(\/|-|\s|\.|)?\d{4}(\/|-|\s|\.|)?((x|ext)(.*)\d+)?$/i', $sPhoneNum );

}



?>

roy at intelligent-imaging dot com
13-Feb-2008 04:35


In regard to Adlez below:



Your function 'Entities' returns an uninitialized variable. Rather a waste of time, don't you think? Perhaps you should check your code before submitting it and save everyone time ....

jasperhorn [AT] gma-remove-il (dot) com
05-Feb-2008 09:01


> This fixes BK's bugs when checking an email address:

> 

> "/^([a-z0-9._-](\+[a-z0-9])*)+@[a-z0-9.-]+\.[a-z]{2,6}$/i"



misplaced that * I guess



>

> Features:

> -Accepts + addressing, and must have characters after the +

> -No special characters such as: ! # $ % & ' * / = ? ^ ` { | } who needs 'em!

> -No spaces.

> 

> Caveats still remaining:

> -You should use trim() on your email address first.



Trimming... easy fix, add [:space:] or \s to the front and end of the expression



> -Allows multiple ...... dots



Needs a restructure of the expression - having a different character class before and after dots



> -Allows dots in the .wrong places.

> -Allows domain names with dashes - in the wrong places.



I would come up with the following (mind that I am just getting started with regexes as well):



"/^\s*[a-z][a-z0-9]*(\.[a-z0-9][a-z0-9-]*)*(\+[a-z0-9]*)?

@[a-z0-9][a-z0-9-]*(\.[a-z0-9][a-z0-9-]*)*\.[a-z]{2,6}\s*$/i"

(line break added because of technical problems posting long lines on php.net)



The only problem that still applies, is the dashes in the wrong places (though the only wrong place is right before a dot). Fixing this is tricky, I tried using lookbehind, but that did not work. You do have to keep in mind that there may only be one letter between two dots.

Paperweight
22-Jan-2008 02:03


This fixes BK's bugs when checking an email address:



"/^([a-z0-9._-](\+[a-z0-9])*)+@[a-z0-9.-]+\.[a-z]{2,6}$/i"



Features:

-Accepts + addressing, and must have characters after the +

-No special characters such as: ! # $ % & ' * / = ? ^ ` { | } who needs 'em!

-No spaces.



Caveats still remaining:

-You should use trim() on your email address first.

-Allows multiple ...... dots

-Allows dots in the .wrong places.

-Allows domain names with dashes - in the wrong places.

ghoti
17-Jan-2008 05:01


Another pointer regarding the example from BK below....  Email addresses can actually contain a great deal more non-alphanumeric characters than this regexp implies.  The characters are listed here in http://www.faqs.org/rfcs/rfc2822.html section 3.2.4.



For validating email addresses prior to an actual emailed challenge, I have been using the following regexp with eregi for years to match the left-hand side of an email address:



  ^[a-z0-9][a-z0-9!#$%&'*+/=?^_`{|}-]+$



Of course, the right-hand side is this:



  ^([a-z0-9][a-z0-9-]+\.)+[a-z][a-z]+$



Anything more restrictive violates standards.



Watch your quotes.

Peter
15-Jan-2008 10:57


Hi BK, your expression contains two errors:

- any address with a space in the middle of the first part will also be accepted

- addresses with a plus sign in the first part will not be accepted (though they are valid)

wannabe at php dot net
12-Jan-2008 11:35


Hey BK, why not just:



$email = trim($_POST["email"]);



You should make a habit out of doing this on anything submitted by a user anyway.

joseph at no-spam dot xylex dot net
28-Dec-2007 05:01


A quick example of using named recursion and negative lookaheads for finding the outermost div.  You can use this same idea for any type of nested tags.



<?php

$sample = 

"lead in text to capture <div>

    outside div text

    <div>

        inner div text

        <div>

            deep nested text

        </div>

    </div>

    bottom of outside div text

</div> end of text to capture";



preg_match(

'#^(?P<a>.*?)(?P<b>.?<div((.(?!<div))|(?P>b))*?.</div>)(?P<c>.*?)$#s', 

$sample, $matches);

echo "<pre>";



var_dump($matches);

//$matches['a'] == "lead in text to capture"

//$matches['b'] == the outermost <div> and child contents (with a leading space)

//$matches['c'] == " end of text to capture"

?>

ahigerd at timeips dot com
28-Dec-2007 03:19


One note on the regular expressions provided that claim to validate e-mail addresses: They're incomplete. To quote the note submission page, just a couple paragraphs above the box where you type in your note:



"(And if you're posting an example of validating email addresses, please don't bother. Your example is almost certainly wrong for some small subset of cases. See this information from O'Reilly Mastering Regular Expressions book [http://examples.oreilly.com/regex/readme.html] for the gory details.)"



That said, the expressions as provided aren't COMPLETELY irrelevant -- they WILL validate MOST e-mail addresses, and you won't really be blocking any significant portion of the population by using them. Just be aware of the limitations.

BK
23-Dec-2007 05:20


I found Frebby's post below from 28-Oct-2007 to be a rock solid way to validate a user's e-mail address, and it even accepts subdomains such as .co.uk. However, it fails if the user or the browser adds a space before the text entry (this sometimes happens when clicking into a form field, particularly in IE 6). 



There may be other ways to address that problem, but here's a simple fix that works well within basic form processing scripts:



<?php 



$email = $_POST["email"];

$errorurl = "your error page URL here" ;



if (!preg_match("/^[\ a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,6}$/i", $email)) {

header( "Location: $errorurl" );

   exit ;

}



?>



The difference? Just add a backslash and a space before the a-z portion of the first segment:



[\ a-z0-9._-]+@



That's it! Enjoy

bjorn dot padding at gmail dot com
18-Dec-2007 08:52


To test if a regular expression is syntactically correct:



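<?php
function preg_test($regex)
{
    // preg_match() returns FALSE (not 0) when the pattern fails to compile
    if (@preg_match($regex, '') === false)
    {
        $error = error_get_last();
        // Cuts off the leading "preg_match() [...]: " part of the warning;
        // the fixed offset of 70 depends on the exact format of that text
        throw new Exception(substr($error['message'], 70));
    }
    else
        return true;
}
?>

usage:

<?php
if (preg_test('/.*/i'))
     print "correct!";

// Prints "correct!"
?>

<?php
if (preg_test('/.**/i'))
     print "correct!";

// Throws exception with message 'Compilation failed: nothing to repeat at offset 2'
?>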

noel at HATESPAM dot noelswanson dot com
07-Dec-2007 04:08


Not quite sure why no one has posted this before (unless I missed it somewhere) but "Example 1531. Getting the domain name out of a URL" clearly doesn't work for domains such as .co.uk.



Here is a simple improvement that does a much better job at extracting a domain from a URL (though not perfect). It assumes the following: country code TLD's have two letters (eg .uk, .jp, .au), and their subdomains have two or three letters, (eg .gov.uk, .co.uk). These are parsed in three parts. Anything else is parsed in two parts.



Hope it helps!



Noel



<?php

function extract_domain($url){

    preg_match('@^(?:http://)?([^/]+)@i', $url, $matches);

    $host = $matches[1];



    // get last three segments of host name if country code TLD with sub domain, eg .co.uk

    preg_match('/[^.]+\.[^.]{2,3}\.[^.]{2}$/', $host, $matches);

    if (empty($matches)) {

        // get last two segments of host name if generic TLD

        preg_match('/[^.]+\.[^.]+$/', $host, $matches);

    }

    return $matches[0];

}

?>

vladimir at pixeltomorrow dot com
02-Dec-2007 06:03


Here's a nice workaround to check if your regex is valid. 



Sometimes PHP may throw an error like:



Warning: preg_match() [function.preg-match]: Unknown modifier '$' in foo.php on line 2



You can't really tell whether the 'false' value was returned because the rule isn't valid, or because your regex rule simply doesn't match your string.



To find out what the deal is with your regex rule (maybe you're building it on the fly, etc.), you can check whether the "false" result really came back because the string doesn't match, or because a warning was issued.



For the last part, I like to use try... catch expressions, which I highly recommend. Let's have a look:



<?php

function testMyRule($rule, $string) {
    // don't forget to enable warning reporting if disabled
    try {
        // catch the preg_match() warning output inside a buffer
        ob_start(); // start buffer
        $result = serialize(preg_match($rule, $string));
        $pwarnings = ob_get_contents(); // get output, including the warning if any
        ob_end_clean(); // discard the buffered output

        // strpos() returns 0 for a hit at the very start, so compare with false
        if (strpos($pwarnings, 'Warning') !== false) { // is warning?
            throw new Exception($pwarnings);
        }

        return unserialize($result);
    } catch (Exception $e) {
        echo $e->getMessage();
        die();
    }
} // end of function

?>



Now, you have to make sure error reporting will allow warnings. Next, we serialize the result of preg_match() applied to your string.

If this issues a warning, we catch it inside the buffer, and later check whether the buffer contains the warning.

We're serializing/unserializing the result of preg_match() because, if we didn't serialize it, it would come back as a string instead of a boolean.



Enjoy!



Vladimir Ghetau
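
As an alternative to output buffering, the warning can be captured with a temporary error handler. This is only a sketch under assumptions: the helper name regex_error() is hypothetical, and the anonymous function requires PHP 5.3 or later.

```php
<?php
// Hypothetical helper: returns null when the pattern compiles,
// or the text of the warning when it does not.
function regex_error($pattern)
{
    $message = null;
    // Temporarily capture any warning raised while compiling the pattern.
    set_error_handler(function ($severity, $text) use (&$message) {
        $message = $text;
        return true; // mark the warning as handled
    });
    preg_match($pattern, '');
    restore_error_handler();
    return $message;
}
```

This does not depend on display_errors or on the word "Warning" appearing in the output, which the buffering approach relies on.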

boyan7640 at gmail dot com
19-Nov-2007 07:19


If you try to find the offset when searching in a UTF-8 string (containing multibyte characters, like Cyrillic characters) with preg_match, using the PREG_OFFSET_CAPTURE flag, you may get a different result from what you expected.



First of all you must compile PHP with Multibyte Support (mbstring). Then you must configure it to use the Multibyte Support functions (mb_*) or turn on some PHP runtime configuration settings (php.ini, Apache vhost conf file, .htaccess or somewhere else):

     php_value           default_charset UTF-8

     php_value           mbstring.func_overload 7

     php_value           mbstring.internal_encoding UTF-8

     php_value           mbstring.detect_order UTF-8



When using preg_match with the PREG_OFFSET_CAPTURE flag and a UTF-8 string, the function will count bytes and NOT characters, so some multibyte characters count as 2 bytes rather than 1 character. That's why the offset will be larger than you expected.



My simple solution is using mb_strpos:

     ...

     preg_match($pattern, $found_text, $matches, PREG_OFFSET_CAPTURE);

     // This will convert $matches[0][1] multibyte byte length to multibyte character length (UTF-8)

     $matches[0][1] = mb_strpos($found_text, $matches[0][0]);

     ...



P.S. The $pattern variable must use "/u" switch for Unicode!!!



-------------------------------------------------

PHP Version 5.2.4

Multibyte regex (oniguruma) version 4.4.4

-------------------------------------------------

frebby at gmail dot com
28-Oct-2007 06:12


If you wonder how to check for a correct e-mail address and such (you can use it for usernames and anything you want, but this is for e-mail) you can use this little piece of code to validate the user's e-mail:



We'll assume that they have been processing a form, entering their e-mail as "email" and now PHP will take care of the rest:



$emailcheck = $_POST["email"];

if(!preg_match("/^[a-z0-9\å\ä\ö._-]+@

[a-z0-9\å\ä\ö.-]+\.[a-z]{2,6}$/i", $emailcheck))

$errors[] = "- Your e-mail is missing or is not valid."; 



(note that the preg_match had to be cut or I couldn't post it since it was too long so I cut it after @ so just put them together again.)



If we split the parts it would look like this:



 [a-z0-9._-]+@



This is the name part of the email, such as greatguy3 (then @domain.com), so this allows dot, underscore and - as well as alphabetical letters and digits.



[a-z0-9.-]+\.



This is the domain part, note that there must be a dot after domain name, so it's harder to fake an email. Same here though, A-Z, 0-9, dot and - (if your domain has - in it, such as nintendo-wii.com) 



[a-z]{2,6}$/i



This is the last part of your email, the .com/.net/.info, whichever you use. The numbers between {} is how many letters are limited (in this case min 2 and max 6) and it would allow "us" up to "org.uk" and "museum" and only A-Z letters are used for obvious reasons. The "i" there is there so you can use both uppercase and lowercase characters. (A-Z & a-z)



So a valid email address with this code would be "coolguy3@cooldomain.com" and a nonvalid one would be "zûmgz^;*@hot_mail.bananá"



This is only the email part, so this is not the fullcode. Paste this in your form process to use it with the rest of your code!



Hope this helps.

erandra at gmail dot com
11-Oct-2007 04:39


Here is a sample code to check for alphabetic characters only with an exception to space, hyphen and single quotes using preg_match().



$alpha = "some very funny string'9-2'";



/* check for alphabets and hyphens, quotes and space in the string but no numbers */

  if(preg_match("/^[a-zA-Z\-\'\ ]+$/u", $alpha)){

  return 1;

  }else

  return 0;



One can just add a '\' followed by the character one wishes to allow, e.g. [\@].



I hope this is helpful to someone out there.



-Erandra

Adlez
26-Sep-2007 07:22


Quick function to filter input. 

Filters any javascript, html, sql injections, and RFI.



<?php

function entities($text){

 $text = "";

 for ( $i = 0; $i <= strlen($text) - 1; $i += 1) {

  $text .= "&#" .ord($text{$i});

 }

 return $eresult;

}

function filter($text){

 if (preg_match("#(on(.*?)\=|script|xmlns|expression|

javascript|\>|\<|http)#si","$text",$ntext)){

  $re = entities($ntext[1]);

  $text = str_replace($ntext[0],$re,$text);

 }

 $text = mysql_real_escape_string($text);

 return $text;

}

foreach ($_POST as $x => $y){

 $_POST[$x] = filter($y);

}

foreach ($_GET as $x => $y){

 $_GET[$x] = filter($y);

}

foreach ($_COOKIE as $x => $y){

 $_COOKIE[$x] = filter($y);

}

?>

Who Needs Email at Reg dot Ex
26-Aug-2007 03:17


regex for validating emails, from Perl's RFC2822 package:

 

http://en.wikipedia.org/wiki/Talk:E-mail_address

creature
01-Aug-2007 10:06


>>what about .mil, .golf,.tv etc etc



ICANN Does not list .golf TLD



A complete List of Top Level Domains from ICANN here:

http://data.iana.org/TLD/tlds-alpha-by-domain.txt



I also found this article about verifying email addresses:

http://www.regular-expressions.info/email.html

razortongue
26-Jul-2007 08:47


Maybe it will sound obvious, but I've encountered this a few times...



If you are using preg_match() to validate user input, remember to include ^ and $ in your regex, or take the input from $matches[0] after successfully matching a pattern, i.e.

preg_match('/[0-9]+/', '123 UNION SELECT ... --') will return 1, but when you use it in a SQL statement, the injected code will probably be executed (if you don't escape the user argument). Note that $matches[0] == '123', so it can be used as a valid input.
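
A small sketch of the difference, using the same input as above. The D modifier is an addition not mentioned in the note: without it, $ also matches just before a trailing newline.

```php
<?php
$input = "123 UNION SELECT ... --";

// Unanchored: succeeds because "123" is found somewhere in the input.
$unanchored = preg_match('/[0-9]+/', $input);

// Anchored: the whole input must consist of digits, so this fails.
$anchored = preg_match('/^[0-9]+$/D', $input);

// Without D, "$" tolerates one trailing newline; with D it does not.
$newlineLoose  = preg_match('/^[0-9]+$/', "123\n");
$newlineStrict = preg_match('/^[0-9]+$/D', "123\n");
```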

alexandre at NO-DAMN-SPAM-BOTS-gaigalas dot net
25-Jul-2007 05:44


Match and replace for arrays. Useful for parsing entire $_POST



Only array_preg_match examples:



<?php



function array_preg_match(array $patterns, array $subjects, &$errors = array()) {

    $errors = array();

    foreach ($patterns as $k => $v) preg_match($v, $subjects[$k]) or $errors[$k] = TRUE;    

    return count($errors) == 0 ? TRUE : FALSE;

}



function array_preg_replace(array $patterns, array $replacements, array $subject) {

    $r = array();

    foreach ($patterns as $k => $v) $r[$k] = preg_replace($v, $replacements[$k], $subject[$k]);    

    return $r+$subject;

}



$arr1 = array('name' => 'Alexandre', 'phone' => '44559999');

$arr2 = array('name' => '', 'phone' => '44559999c');

        array_preg_match(array(
            'name' => '#.+#', //Not empty
            'phone' => '#^\d*$#' // Only digits, optional
        ), $arr1, $match_errors);
        print_r($match_errors); // Empty, it is ok.

        array_preg_match(array(
            'name' => '#.+#', //Not empty
            'phone' => '#^\d*$#' // Only digits, optional
        ), $arr2, $match_errors);
        print_r($match_errors); // Two indexes, name and phone, both not ok.



?>

Antti Haapala
23-Jul-2007 07:22


Ne'er try to verify email address by using some random regex you just invented sitting on the toilet seat. It will not work properly. The proper regex for email validation is something along the lines of 



"([-!#$%&'*+/=?_`{|}~a-z0-9^]

+(\.[-!#$%&'*+/=?_`{|}~a-z0-9

^]+)*|"([\x0b\x0c\x21\x01-\x08\

x0e-\x1f\x23-\x5b\x5d-\x7f]|\\[\x

0b\x0c\x01-\x09\x0e-\x7f])*")@((

[a-z0-9]([-a-z0-9]*[a-z0-9])?\.)+[

a-z0-9]([-a-z0-9]*[a-z0-9]){1,1}|

\[((25[0-5]|2[0-4][0-9]|[01]?[0-9]

[0-9]?)\.){3,3}(25[0-5]|2[0-4][0-9

]|[01]?[0-9][0-9]?|[-a-z0-9]*[a-z0

-9]:([\x0b\x0c\x01-\x08\x0e-\x1f\

x21-\x5a\x53-\x7f]|\\[\x0b\x0c\x0

1-\x09\x0e-\x7f])+)\])". 



However, you shouldn't even try that regex. If you do not understand what that regexp does, then please do not try to write one yourself. If you need a _truly_ _valid_ e-mail address, no regexp is going to help you - just send a verification message to the user-supplied address with a link or code the user can paste to verify the address. IF you still WISH - against my recommendation - to use some validating regexp then *please* just make it warn loudly that the address may be invalid; do not write code that throws a fatal error outright. I am quite fed up with sites that do not accept my .name e-mail address, or some other valid, working forms for that matter.

David W.
11-Jul-2007 05:33


I just started using PHP and this section doesn't clarify whether or not you must use "/" as your regular expression delimiters.



I want to clarify that you can use almost any character as your delimiter. The delimiter is automatically the first character of your regular expression string. This makes it a bit easier if you are looking for things that might contain a forward slash. For example:



preg_match('#</b>#', $string);



Instead of:



preg_match('/<\/b>/', $string);



Or:



preg_match('@/my/dir/name/@', $string);



Instead of:



preg_match('/\/my\/dir\/name\//', $string);



This can greatly boost readability. Not quite as flexible as in Perl (You can't use control characters or \n which can really come in handy when you aren't quite sure what characters might be in your regular expression), but switching to another delimiter can make your code a bit easier to read.
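
When the text to match comes from a variable, preg_quote() can escape it for whichever delimiter was chosen. The sketch below uses made-up sample strings; the second argument to preg_quote() names the delimiter that should be escaped as well.

```php
<?php
// Hypothetical text that should be matched literally.
$needle  = "/my/dir/name/ (v1.0)";
$subject = "installed to /my/dir/name/ (v1.0) today";

// preg_quote() escapes regex metacharacters such as "(" and ".";
// passing '@' additionally escapes the delimiter if it occurs in $needle.
$pattern = '@' . preg_quote($needle, '@') . '@';
$found = preg_match($pattern, $subject);
```

With "@" as the delimiter, the slashes in $needle need no escaping at all.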

chuckie
06-Dec-2006 11:19


This is a function to convert byte offsets into (UTF-8) character offsets (this is regardless of whether you use the /u modifier):



<?php



function mb_preg_match($ps_pattern, $ps_subject, &$pa_matches, $pn_flags = 0, $pn_offset = 0, $ps_encoding = NULL) {

  // WARNING! - All this function does is to correct offsets, nothing else:

  //

  if (is_null($ps_encoding))

    $ps_encoding = mb_internal_encoding();



  $pn_offset = strlen(mb_substr($ps_subject, 0, $pn_offset, $ps_encoding));

  $ret = preg_match($ps_pattern, $ps_subject, $pa_matches, $pn_flags, $pn_offset);



  if ($ret && ($pn_flags & PREG_OFFSET_CAPTURE))

    foreach($pa_matches as &$ha_subpattern)

      $ha_subpattern[1] = mb_strlen(substr($ps_subject, 0, $ha_subpattern[1]), $ps_encoding);



  return $ret;

  }



?>

Izzy
18-Aug-2006 04:27


Concerning the German umlauts (and other language-specific chars as accented letters etc.): If you use unicode (utf-8), you can match them easily with the unicode character property \pL (match any unicode letter) and the "u" modifier, so e.g.



<?php preg_match("/[\w\pL]/u",$var); ?>



would really match all "words" in $var - whether they contain umlauts or not. Took me a while to figure this out, so maybe this comment will save the day for someone else :-)
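
A minimal sketch of the effect, assuming UTF-8 input. The sample name is made up, and the umlaut is written as a byte escape so the example does not depend on the encoding of the source file.

```php
<?php
// "\xC3\xBC" is the UTF-8 encoding of the letter u-umlaut.
$name = "M\xC3\xBCller";

// An ASCII-only class rejects the umlaut.
$asciiOnly = preg_match('/^[a-zA-Z]+$/', $name);

// \pL with the "u" modifier accepts any Unicode letter.
$unicodeLetters = preg_match('/^\pL+$/u', $name);
```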

patrick at procurios dot nl
30-Jan-2006 03:17


This is the only function in which the assertion \G can be used in a regular expression. \G matches only if the current position in 'subject' is the same as specified by the index 'offset'. It is comparable to the ^ assertion, but whereas ^ matches at position 0, \G matches at position 'offset'.
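
The difference between ^ and \G can be sketched with the subject string from the manual's own offset example; the variable names are illustrative.

```php
<?php
$subject = "abcdef";

// ^ still refers to the real start of the subject, so this fails.
$caret = preg_match('/^def/', $subject, $m1, 0, 3);

// \G anchors at the starting offset, so this succeeds.
$atOffset = preg_match('/\Gdef/', $subject, $m2, 0, 3);

// "ef" begins one byte after the offset, so \G rejects it,
// while the unanchored pattern finds it.
$anchoredLater   = preg_match('/\Gef/', $subject, $m3, 0, 3);
$unanchoredLater = preg_match('/ef/', $subject, $m4, 0, 3);
```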

bloopy.org
26-Jan-2006 08:18


Intending to use preg_match to check whether an email address is in a valid format? The following page contains some very useful information about possible formats of email addresses, some of which may surprise you: http://en.wikipedia.org/wiki/E-mail_address

john at recaffeinated d0t c0m
28-Dec-2005 01:27


Here's a format for matching US phone numbers in the following formats:



###-###-####

(###) ###-####

##########



It restricts the area codes to >= 200 and exchanges to >= 100, since values below these are invalid.



<?php

$pattern = "/(\([2-9]\d{2}\)\s?|[2-9]\d{2}-|[2-9]\d{2})" 

         . "[1-9]\d{2}"

         . "-?\d{4}/";

?>

phpnet_spam at erif dot org
27-Oct-2005 02:37


Test for valid US phone number, and get it back formatted at the same time:



  function getUSPhone($var) {

    $US_PHONE_PREG ="/^(?:\+?1[\-\s]?)?(\(\d{3}\)|\d{3})[\-\s\.]?"; //area code

    $US_PHONE_PREG.="(\d{3})[\-\.]?(\d{4})"; // seven digits

    $US_PHONE_PREG.="(?:\s?x|\s|\s?ext(?:\.|\s)?)?(\d*)?$/"; // any extension

    if (!preg_match($US_PHONE_PREG,$var,$match)) {

      return false;

    } else {

      $tmp = "+1 ";

      if (substr($match[1],0,1) == "(") {

        $tmp.=$match[1];

      } else {

        $tmp.="(".$match[1].")";

      }

      $tmp.=" ".$match[2]."-".$match[3];

      if ($match[4] <> '') $tmp.=" x".$match[4];

      return $tmp;

    }

  }



usage:



  $phone = $_REQUEST["phone"];

  if (!($phone = getUSPhone($phone))) {

    //error gracefully :)

  }

Rasqual
05-Jul-2005 01:03


Do not forget PCRE has many compatible features with Perl.

One that is often neglected is the ability to return the matches as an associative array (Perl's hash).



For example, here's a code snippet that will parse a subset of the XML Schema 'duration' datatype:



<?php

$duration_tag = 'PT2M37.5S';  // 2 minutes and 37.5 seconds



// drop the milliseconds part

preg_match(

  '#^PT(?:(?P<minutes>\d+)M)?(?P<seconds>\d+)(?:\.\d+)?S$#',

  $duration_tag,

  $matches);



print_r($matches);

?>



Here is the corresponding output:

Array

(

    [0] => PT2M37.5S

    [minutes] => 2

    [1] => 2

    [seconds] => 37

    [2] => 37

)

info at reiner-keller dot de
12-Feb-2005 03:03


Pointing to the post of "internet at sourcelibre dot com": Instead of using PerlRegExp for e.g. german "Umlaute" like



<?php



$bolMatch = preg_match("/^[a-zA-ZäöüÄÖÜ]+$/", $strData);



?>



use the setlocale() function and the POSIX format like



<?php



setlocale (LC_ALL, 'de_DE');

$bolMatch = preg_match("/^[[:alpha:]]+$/", $strData);



?>



This works for any country related special character set.



Remember since the "Umlaute"-Domains have been released it's almost mandatory to change your RegExp to give those a chance to feed your forms which use "Umlaute"-Domains (e-mail and internet address).



Life can be so easy reading the manual ;-)

hfuecks at phppatterns dot com
13-Jan-2005 10:11


Note that the PREG_OFFSET_CAPTURE flag, as far as I've tested, returns the offset in bytes not characters, which may not be what you're expecting if you're using the /u pattern modifier to make the regex UTF-8 aware (i.e. multibyte characters will result in a greater offset than you expect)
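
A minimal sketch of the byte-versus-character difference, assuming UTF-8 input. It counts characters with PCRE itself rather than mbstring, which is a substitution for illustration; the sample string is made up.

```php
<?php
// "\xC3\xBC" is the UTF-8 encoding of u-umlaut: one character, two bytes.
$subject = "M\xC3\xBCller GmbH";

preg_match('/GmbH/u', $subject, $matches, PREG_OFFSET_CAPTURE);
$byteOffset = $matches[0][1];

// Count the characters in front of the match: with /u, "." matches one
// UTF-8 character, and /s lets it match newlines too.
$prefix = substr($subject, 0, $byteOffset);
$charOffset = preg_match_all('/./us', $prefix, $unused);
```

The byte offset is the value to pass to substr() or to the offset parameter; the character offset is only needed for character-based functions.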

mark at portinc dot net
03-Feb-2004 11:30


<?php // some may find this useful... :)



$iptables = file ('/proc/net/ip_conntrack'); 

$services = file ('/etc/services');

$GREP = '!([a-z]+) '     .// [1] protocol 

        '\\s*([^ ]+) '     .// [2] protocol in decimal

        '([^ ]+) '        .// [3] time-to-live 

        '?([A-Z_]|[^ ]+)?'.// [4] state 

        ' src=(.*?) '     .// [5] source address 

        'dst=(.*?) '      .// [6] destination address

        'sport=(\\d{1,5}) '.// [7] source port 

        'dport=(\\d{1,5}) '.// [8] destination port 

        'src=(.*?) '      .// [9] reversed source

        'dst=(.*?) '      .//[10] reversed destination

        'sport=(\\d{1,5}) './/[11] reversed source port

        'dport=(\\d{1,5}) './/[12] reversed destination port

        '\\[([^]]+)\\] '    .//[13] status

        'use=([0-9]+)!';   //[14] use



$ports = array();

foreach($services as $s) { 

  if (preg_match ("/^([a-zA-Z-]+)\\s*([0-9]{1,5})\\//",$s,$x)) {

     $ports[ $x[2] ] = $x[1];

} }

for($i=0;$i < count($iptables);$i++) { 

  if ( preg_match ($GREP, $iptables[$i], $x) ) {

     // translate known ports... . . 

     $x[7] =(array_key_exists($x[7],$ports))?$ports[$x[7]]:$x[7]; 

     $x[8] =(array_key_exists($x[8],$ports))?$ports[$x[8]]:$x[8]; 

     print_r($x);

  }  // on a nice sortable-table... bon appetite!

}

?>

nico at kamensek dot de
18-Jan-2004 04:31


As I did not find any working IPv6 regexp, I just created one. Here it is:



$pattern1 = '([A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}';

$pattern2 = '[A-Fa-f0-9]{1,4}::([A-Fa-f0-9]{1,4}:){0,5}[A-Fa-f0-9]{1,4}';

$pattern3 = '([A-Fa-f0-9]{1,4}:){2}:([A-Fa-f0-9]{1,4}:){0,4}[A-Fa-f0-9]{1,4}';

$pattern4 = '([A-Fa-f0-9]{1,4}:){3}:([A-Fa-f0-9]{1,4}:){0,3}[A-Fa-f0-9]{1,4}';

$pattern5 = '([A-Fa-f0-9]{1,4}:){4}:([A-Fa-f0-9]{1,4}:){0,2}[A-Fa-f0-9]{1,4}';

$pattern6 = '([A-Fa-f0-9]{1,4}:){5}:([A-Fa-f0-9]{1,4}:){0,1}[A-Fa-f0-9]{1,4}';

$pattern7 = '([A-Fa-f0-9]{1,4}:){6}:[A-Fa-f0-9]{1,4}';



patterns 1 to 7 represent different cases. $full is the complete pattern which should work for all correct IPv6 addresses.



$full = "/^($pattern1)$|^($pattern2)$|^($pattern3)$"
      . "|^($pattern4)$|^($pattern5)$|^($pattern6)$|^($pattern7)$/";
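
Where the filter extension is available (PHP 5.2 and later), the same check can be delegated to filter_var() instead of a hand-written pattern. This swaps in a different technique from the regexp above; the wrapper name is_ipv6() is hypothetical.

```php
<?php
// Hypothetical wrapper around filter_var(); it returns true only for
// strings that the filter extension accepts as IPv6 addresses.
function is_ipv6($address)
{
    return filter_var($address, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false;
}
```

This also covers compressed forms such as "::1", which patterns 1 to 7 above do not appear to handle.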

thivierr at telus dot net
24-Nov-2003 06:23


A web server log record can be parsed as follows:



$line_in = '209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"';



if (preg_match('!^([^ ]+) ([^ ]+) ([^ ]+) \[([^\]]+)\] "([^ ]+) ([^ ]+) ([^/]+)/([^"]+)" ([^ ]+) ([^ ]+) ([^ ]+) (.+)!',

  $line_in,

  $elements))

{

  print_r($elements);

}



Array

(

    [0] => 209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"

    [1] => 209.6.145.47

    [2] => -

    [3] => -

    [4] => 22/Nov/2003:19:02:30 -0500

    [5] => GET

    [6] => /dir/doc.htm

    [7] => HTTP

    [8] => 1.0

    [9] => 200

    [10] => 6776

    [11] => "http://search.yahoo.com/search?p=key+words=UTF-8"

    [12] => "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"

)



Notes:  

1) For the referer field ($elements[11]), I intentionally capture the double quotes (") and don't use them as delimiters, because sometimes double quotes do appear in a referer URL.  Double quotes can appear as %22 or \".  Both have to be handled correctly.  So, I strip off the double quotes in a second step.

2) The URLs should be further parsed using parse_url, which is quicker and more reliable than preg_match.

3) I assume the requested protocol (HTTP/1.1) always has a slash character in the middle, which might not always be the case, but I'll take the risk.

4) The agent field ($elements[12]) is the most unstructured field, so I make no assumptions about its format.  If the record is truncated, the agent field will not be delimited properly with a quote at the end.  So, both cases must be handled.

5) A hyphen (- or "-") means a field has no value.  It is necessary to convert these to an appropriate value (such as empty string, null, or 0).

6) Finally, there should be appropriate code to handle malformed web log entries, which are common, due to junk data.  I never assume I've seen all cases.

bjorn at kulturkonsult dot no
01-Apr-2003 10:56


If you want to match all scandinavian characters (æÆøØåÅöÖäÄ) in addition to those matched by \w, you might want to use this regexp:



/^[\w\xe6\xc6\xf8\xd8\xe5\xc5\xf6\xd6\xe4\xc4]+$/



Remember that \w respects the current locale used in PCRE's character tables.
