Last updated: Mon, 05 Feb 2007

preg_match

(PHP 4, PHP 5)

preg_match - Perform a regular expression match

Description

int preg_match ( string pattern, string subject [, array &matches [, int flags [, int offset]]] )

Searches subject for a match to the regular expression given in pattern.

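Parameters

pattern

The pattern to search for, as a string.

subject

The input string.

matches

If matches is provided, it is filled with the results of the search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

flags

flags can be the following flag:

PREG_OFFSET_CAPTURE: If this flag is passed, the offset of each match within the subject string is returned as well. Note that this changes the contents of matches into an array where every element is itself an array consisting of the matched string at index 0 and its offset into subject at index 1.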
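
As a minimal sketch of what PREG_OFFSET_CAPTURE changes, the following compares the shape of the matches array with and without the flag. The subject string, the pattern and the variable names are made up for illustration.

```php
<?php
$subject = "abcdef";

// Without flags: each element of $plain is a plain string.
preg_match('/c(d)e/', $subject, $plain);
// $plain[0] is "cde", $plain[1] is "d"

// With PREG_OFFSET_CAPTURE: each element is array(matched text, byte offset).
preg_match('/c(d)e/', $subject, $withOffsets, PREG_OFFSET_CAPTURE);
// $withOffsets[0] is array("cde", 2), $withOffsets[1] is array("d", 3)

print_r($withOffsets);
```

Code that reads $matches[1] as a string therefore has to be changed to read $matches[1][0] once the flag is set.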

offset

Normally, the search starts from the beginning of the subject string. The optional parameter offset can be used to specify an alternate position (in bytes) from which to start the search.

Note: Using offset is not equivalent to passing substr($subject, $offset) to preg_match() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x). Compare:
<?php $subject = "abcdef"; $pattern = '/^def/'; preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3); print_r($matches); ?>
The above example will output:
Array
(
)

         
while this example

<?php $subject = "abcdef"; $pattern = '/^def/'; preg_match($pattern, substr($subject,3), $matches, PREG_OFFSET_CAPTURE); print_r($matches); ?>
will produce
Array
(
    [0] => Array
        (
            [0] => def
            [1] => 0
        )

)

         

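Return Values

preg_match() returns the number of times pattern matches. That will be either 0 times (no match) or 1 time, because preg_match() stops searching after the first match. preg_match_all(), by contrast, continues until it reaches the end of subject. preg_match() returns FALSE if an error occurred.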
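
Because 0 and FALSE are both falsy, the three outcomes can only be told apart with a strict comparison. The sketch below illustrates this; describe_match() is a hypothetical helper name, and the @ operator is used only to keep the demonstration output free of the warning that an invalid pattern raises.

```php
<?php
// Hypothetical helper: it only labels the three possible outcomes.
function describe_match($pattern, $subject)
{
    // @ suppresses the warning an invalid pattern would raise in this demo.
    $result = @preg_match($pattern, $subject);

    if ($result === false) {
        return "error";      // e.g. the pattern failed to compile
    }
    if ($result === 0) {
        return "no match";
    }
    return "match";          // $result === 1
}

echo describe_match('/php/i', 'PHP is fun'), "\n";       // match
echo describe_match('/perl/', 'PHP is fun'), "\n";       // no match
echo describe_match('/(unclosed/', 'PHP is fun'), "\n";  // error
```

A plain if (preg_match(...)) is enough when an error can be treated like a non-match; the strict checks matter when the two cases need different handling.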

ChangeLog

Version	Description
4.3.3	The `offset` parameter was added.
4.3.0	The `PREG_OFFSET_CAPTURE` flag was added.
4.3.0	The `flags` parameter was added.

Examples

Example 1613. Find the string of text "php"


<?php

// The "i" after the pattern delimiter indicates a case-insensitive search

if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {

    echo "A match was found.";

} else {

    echo "A match was not found.";

}

?>

Example 1614. Find the word "web"


<?php

/* The \b in the pattern indicates a word boundary, so only the distinct

 * word "web" is matched, and not part of a word like "webbing" or "cobweb" */

if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {

    echo "A match was found.";

} else {

    echo "A match was not found.";

}



if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {

    echo "A match was found.";

} else {

    echo "A match was not found.";

}

?>

Example 1615. Getting the domain name out of a URL


<?php

// get host name from URL

preg_match('@^(?:http://)?([^/]+)@i',

    "http://www.php.net/index.html", $matches);

$host = $matches[1];



// get last two segments of host name

preg_match('/[^.]+\.[^.]+$/', $host, $matches);

echo "domain name is: {$matches[0]}\n";

?>

The above example will output:


domain name is: php.net

Notes

Tip

Do not use preg_match() if you only want to check whether one string is contained in another string. Use strpos() or strstr() instead, as they will be faster.
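
A short sketch of the strpos() alternative follows. The sample sentence is taken from the examples above; the variable names are illustrative only.

```php
<?php
$haystack = "PHP is the web scripting language of choice.";

// strpos() returns the byte position of the first occurrence, or false.
// Position 0 is a valid hit, so compare with !== false rather than
// relying on truthiness.
$containsWeb   = strpos($haystack, "web") !== false;
$startsWithPhp = strpos($haystack, "PHP") === 0;

var_dump($containsWeb);   // bool(true)
var_dump($startsWithPhp); // bool(true)
```

Note that strpos() is case-sensitive and knows nothing about word boundaries, so it replaces only the plain substring case, not patterns such as /\bweb\b/i.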

See Also

preg_match_all()
preg_replace()
preg_split()

User Contributed Notes

Hayley Watson
22-Mar-2007 08:19


RFC3986 offers a regular expression for matching URIs. Modified slightly:



<?php preg_match(

'_^(?:([^:/?#]+):)?(?://([^/?#]*))?'.

'([^?#]*)(?:\?([^#]*))?(?:#(.*))?$_', 

$uri, $uri_parts); ?>



recognises anything that is a valid URI according to that specification; $uri_parts is of course an array with the elements

[0] => the entire URI (of course)

[1] => the protocol scheme (e.g., http)

[2] => the naming authority (the entity responsible for identifying the resource)

[3] => the path

[4] => the contents of any querystring

[5] => any value of any fragment



With the example



http://www.php.net/manual/en/function.preg-match.php#id5827438



those parts are

[0] => http://www.php.net/manual/en/function.preg-match.php#id5827438

[1] => http

[2] => www.php.net

[3] => /manual/en/function.preg-match.php

[4] => 

[5] => id5827438



Of course, the parse_url() function is probably a much more legible solution - but it's amusing to note that PHP uses over 200 lines of C to implement that function!
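
For comparison, here is a minimal sketch of the parse_url() route applied to the same example URI. Unlike the regular expression, it returns an associative array and leaves out components that are absent.

```php
<?php
$uri = 'http://www.php.net/manual/en/function.preg-match.php#id5827438';

// parse_url() returns an associative array holding only the components
// that are present in the input.
$parts = parse_url($uri);

echo $parts['scheme'], "\n";   // http
echo $parts['host'], "\n";     // www.php.net
echo $parts['path'], "\n";     // /manual/en/function.preg-match.php
echo $parts['fragment'], "\n"; // id5827438

// There is no query string here, so no 'query' key is set.
var_dump(isset($parts['query'])); // bool(false)
```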

legolas558 d0t users dot sf dot net
18-Mar-2007 02:55


I wanted to match all php tags in a string but I found preg_match_all not to behave correctly:



<?php

// additional spacing added not to confuse markup highlighters

preg_match_all('/<\\?'.'php.*\\?'.'>/i', $content, $m);

?>



Returned only the single line php tags:

  [0]=>

  array(4) {

    [0]=>

    string(23) "<?php doSomething(); ?>"

    [1]=>

    string(19) "<?php echo 'something'; ?>"

    [2]=>

    string(34) "<?php doSomethingElse();?>"

    [3]=>

    string(35) "<?php echo 'something else'; ?>"

  }



So I gave up and wrote this small function:



<?php



function php_tags($s) {

// retrieve all PHP tags from $s

// PHP tags gatherer by legolas558

// initialize variables

$offset = 0;

$tags = array();

// loop till php tags are found

while (false !== ($p = strpos($s, '<'.'?php', $offset))) {

    $p += 6;

    $et = strpos($s, '?'.'>', $p);

    // could not find termination tag

    if ($et===false)

        break;

    // add tag to array

    $tags[] = substr($s, $p, $et-$p);

    // properly increment offset

    $offset = $et+2;

}

return $tags;

}



?>



This function does the job, however if you have used strings containing "<?php" and "?>" in your PHP code tags it will not work.
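
A likely reason the original preg_match_all() call returned only single-line tags is that, by default, the dot does not match newlines. The sketch below, which is not the note author's code, adds the s modifier and a lazy quantifier; the sample input is invented.

```php
<?php
// Sample input: one single-line block and one block spanning lines.
$content = "a <?php one(); ?> b\n<?php\n two();\n?> c";

// s: let . match newlines;  .*? (lazy): stop at the first closing tag.
$pattern = '/<\?php(.*?)\?>/is';

preg_match_all($pattern, $content, $m);

print_r($m[1]); // the code between the tags, untrimmed
```

This shares the limitation mentioned above: a closing tag inside a PHP string literal still ends the match early. The tokenizer function token_get_all() may be a more robust starting point when that matters.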

php at hood dot id dot au
05-Mar-2007 03:41


trevorhughdavis,



It's not that simple, for example di.fm is its own domain. What about something.com.au? But if you increase your 5 to 6, xx.xxx domains would then not be counted as international domains. And then there are longer 2LDs around.



To really do it properly would require a list of TLDs which require a 2LD.

trevorhughdavis at gmail dot com
04-Mar-2007 08:19


Getting the domain name out of a URL, improved



Foreign countries have more bits to their domains.



ie http://lse.ac.uk



ac.uk is not the hostname.  



function bigworldhostname($url){

    preg_match('@^(?:http://)?([^/]+)@i',$url, $matches);

    

    $host = $matches[1];



    preg_match('/[^.]+\.[^.]+$/', $host, $usHost);

     

    if (strlen($usHost[0])>5){

        return $usHost[0];

    }

    else {

        preg_match('/[^.]+\.[^.]+\.[^.]+$/', $host, $restofWorldHost);

        return $restofWorldHost[0];

    }

}

nospam1 at piacitelli dot org
24-Feb-2007 08:41


Concerning note by warcraft at libero dot it , 20-Jul-2006 07:29.



The proposed code for checking the "codice fiscale" (Italian fiscal ID) could rule out valid submissions.



Indeed, issued "codici fiscali" may be modified by the government in the case of persons with the same name, birthday and birthplace, by turning one or more of the seven digits into letters according to the correspondence

0123456789 - LMNPQRSTUV.



Hence the correct code to check a submitted   "codice fiscale" is the following modification of warcraft's code:



$cf='/[a-z]{6}[0-9lmnpqrstuv]{2}[abcdehlmprst][0-9lmnpqrstuv]{2}'.

       '[a-z][0-9lmnpqrstuv]{3}[a-z]/i';



return preg_match($cf,$var);



namely: 



- there are 16 characters



- characters at positions (count start from 1) from 1-6, 9, 12 and 16 

are in [a-z];



- all other characters are in [0-9lmnpqrstuv]



Gherardo

anon
22-Feb-2007 12:59


axima's function, which attempts to match a date with preg_match, has an error in its pattern.

If the day is the 10th or the 20th then it fails. So you want to use: /^([1-9]|0[1-9]|[12][0-9]|3[01])\D

([1-9]|0[1-9]|1[012])\D

(19[0-9][0-9]|20[0-9][0-9])$/

izua dot richard at gmail dot com
07-Feb-2007 02:04


Be careful when mixing curl with preg_match.

If you view the source from one browser, set the CURLOPT_USERAGENT as the same string supplied by the browser. I lost over three hours figuring out why this regular expression



/<h2 class=r><a href=([\s\S]+?) class=l>([\s\S]+?)<\/a><\/h2>/



won't actually give me the link and name of search results (google search). I was reading the source from Opera, and identifying myself as Mozilla from curl.



I know it's off topic, but it looked to me like a preg_match bug at first.

David
04-Feb-2007 08:36


The regex a couple of notes below this one for checking email addresses that uses eregi will fail for e-mail addresses of the forms:



david+php@risner.org

david%php@risner.org



These are both allowed according to the RFC's and are used.  For example, the + is often used to force sorting to a particular folder.

Chortos-2 <chortos at inbox dot lv>
17-Dec-2006 01:29


With respect to escaped backslashes, this pattern would be the one to be used rather than Elwin’s:



/[^\\\](\\\\\\\)*["']/



(don’t forget to escape the appropriate quotation mark when using in real code). If escaping backslashes is not allowed, this one should do the thing:



/[^\\\]["']/



The pattern supplied by Elwin matches nothing else than *any* apostrophe, quotation mark or exclamation mark. By the way, the double backslash is compiled into a single backslash by the Zend Engine, and then PCRE thinks it’s just escaping the following character, so in the end it’s completely lost.
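
The two layers of unescaping described here (first PHP's string parser, then PCRE) can be checked with a small experiment. This is a sketch using double-quoted PHP strings; the test subjects are invented.

```php
<?php
// Goal: detect an apostrophe that is preceded by a backslash.
// The PHP literal below holds four backslashes; PHP collapses each pair,
// so PCRE receives two backslashes plus an apostrophe and reads the two
// backslashes as one literal backslash.
$pattern = "/\\\\'/";

var_dump(preg_match($pattern, "it\\'s escaped")); // int(1)
var_dump(preg_match($pattern, "it's not"));       // int(0)

// With only two backslashes in the literal, PCRE receives a single
// backslash before the apostrophe and treats it as an escaped plain
// apostrophe, so any apostrophe matches.
var_dump(preg_match("/\\'/", "it's not"));        // int(1)
```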

zohar at zohararad dot com
11-Dec-2006 03:03


After a lot of work I finally managed to write a function that closes open HTML tags in a string. It's not perfect, but it works OK on simple strings. I hope it helps someone.



<?php

function cleanOpenTags($str){

  //clean up tabs and new lines

  $str = preg_replace("/\s\s+|\t\t+|\n\r+|\n\n+|\r\r+/","",$str);

  $pattern = "/(<[^>]+>).*>?/smU";

  //split into HTML tags

  $text = preg_split($pattern,$str,-1,PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    

  $i = -1;

  $ordered = array();

  $otag = "/<([a-z]+)[^>]*>/";

  $noclose = "/<[a-z]{2}\s\/>/";

  $block = "/<(p|div)>/";

    

  //order the string into HTML elements and place in array

  foreach($text as $part){

    if(preg_match($otag,$part) && !preg_match($noclose,$part)) $i++;    

    $ordered[$i] .= $part;

  }



  //now iterate through new array and find unclosed elements

  $clean = "";

  foreach($ordered as $key => $part){

    preg_match($otag,$part,$m);

    $ctag = "</$m[1]>";

    if(!preg_match($ctag,$part)) { //if no closing tag is present

      //find out if there are any other closing tags in the part.

      //if yes, append close tag before them

      if(preg_match("/(<\/[a-z]+>)/",$part,$m1, PREG_OFFSET_CAPTURE)){

        $pos = $m1[0][1]; //get str position of the close tag

        $ctag = $ctag.$m1[0][0];

        $ordered[$key] = substr_replace($part,$ctag,$pos,strlen($ctag));

        //otherwise append at end of string

      } else $ordered[$key] .= $ctag;

    }

    

    //remove duplicate close tags

    if(preg_match("/<\/([a-z]+)><\/([a-z]+)>/",$ordered[$key], $m2, PREG_OFFSET_CAPTURE)){ 

      if($m2[1][0] == $m2[2][0]) {

        $ordered[$key] = substr($ordered[$key],0,$m2[2][1]-2);

      }

    }

    $clean .= $ordered[$key];

  }

return $clean;

}

?>

chuckie
06-Dec-2006 11:19


This is a function to convert byte offsets into (UTF-8) character offsets (this is regardless of whether you use the /u modifier):



<?php



function mb_preg_match($ps_pattern, $ps_subject, &$pa_matches, $pn_flags = NULL, $pn_offset = 0, $ps_encoding = NULL) {

  // WARNING! - All this function does is to correct offsets, nothing else:

  //

  if (is_null($ps_encoding))

    $ps_encoding = mb_internal_encoding();



  $pn_offset = strlen(mb_substr($ps_subject, 0, $pn_offset, $ps_encoding));

  $ret = preg_match($ps_pattern, $ps_subject, $pa_matches, $pn_flags, $pn_offset);



  if ($ret && ($pn_flags & PREG_OFFSET_CAPTURE))

    foreach($pa_matches as &$ha_subpattern)

      $ha_subpattern[1] = mb_strlen(substr($ps_subject, 0, $ha_subpattern[1]), $ps_encoding);



  return $ret;

  }



?>

Elwin van Huissteden
30-Nov-2006 07:05


While trying to learn how to write patterns for preg_match, I picked apart other people's patterns and came to some conclusions. As an experiment, I tried to write a pattern that detects whether a backslash ( \ ) is used before a single quote ( ' ). I ended up with a pattern that detects whether any quotes are present at all, in any form.



Maybe people could also use this, but if you see what i did wrong, please do tell me:



<?php

$string1 = "a \'quoted\' \"string\"";

$string2 = "a 'quoted' string";

$string3 = "a &#039;quoted&#039; &quot;string&quot;";

$string4 = "a not-quoted string";



if (preg_match("/[\\'!']|[\\\"!\"]/",$string1)) {

echo $string1.' wrong way'."\n";

} else {

echo $string1.' right way'."\n";

}

// returns: a \'quoted\' "string" wrong way



if (preg_match("/[\\'!']|[\\\"!\"]/",$string2)) {

echo $string2.' wrong way'."\n";

} else {

echo $string2.' right way'."\n";

}

// returns: a 'quoted' string wrong way



if (preg_match("/[\\'!']|[\\\"!\"]/",$string3)) {

echo $string3.' wrong way'."\n";

} else {

echo $string3.' right way'."\n";

}

// returns: a 'quoted' "string" right way



if (preg_match("/[\\'!']|[\\\"!\"]/",html_entity_decode($string3))) {

echo html_entity_decode($string3).' wrong way'."\n";

} else {

echo html_entity_decode($string3).' right way'."\n";

}

// returns: a 'quoted' "string" wrong way



if (preg_match("/[\\'!']|[\\\"!\"]/",$string4)) {

echo $string4.' wrong way'."\n";

} else {

echo $string4.' right way'."\n";

}

// returns: a not-quoted string right way



?>

09-Nov-2006 11:34


I've seen lots of different and increasingly complicated ways to check an email on here. The one I use and which I copied from a book years back is:



<?php

if(eregi ("^[[:alnum:]][a-z0-9_.-]*@[a-z0-9.-]+\.[a-z]{2,6}$", stripslashes(trim($_POST['email']))))

{

  echo "good";

}

else

{

  echo "bad";

}

?>



It's never let me down.

preg regexp
21-Sep-2006 06:37


If you want to have perl equivalent regexp match:

$`, $& and $'

before the match, the match itself, after the match



Here's one way to do it:



echo preg_match("/(.*?)(and)(.*)/", "this and that",$matches);

print_r($matches);



$` = ${1};

$& = ${2};

$' = ${3};



Notice (.*) else the end won't match.



Note that if you only need $&, simply use ${0}.



Here's another way, which is a bit simpler to remember:



echo preg_match("/^(.*?)(and)(.*?)$/", "this and that",$matches);

print_r($matches);
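
Another way to get the three pieces, without adding capture groups to the pattern, is to ask for the match offset and cut the subject with substr(). This is a sketch of that alternative, not part of the note above; the variable names are illustrative.

```php
<?php
$subject = "this and that";

if (preg_match('/and/', $subject, $m, PREG_OFFSET_CAPTURE)) {
    list($match, $offset) = $m[0];

    // Text before the match (Perl's prematch variable).
    $before = substr($subject, 0, $offset);
    // Text after the match (Perl's postmatch variable); the cast covers
    // PHP versions where substr() returns false at the end of the string.
    $after  = (string) substr($subject, $offset + strlen($match));

    echo "[{$before}][{$match}][{$after}]\n"; // [this ][and][ that]
}
```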

Niklas Åkerlund
14-Sep-2006 12:23


My somewhat simpler version of Fred Schenk's excellent code...



<?

  function is_email($Addr) 

  {

    $p = '/^[a-z0-9!#$%&*+-=?^_`{|}~]+(\.[a-z0-9!#$%&*+-=?^_`{|}~]+)*';

    $p.= '@([-a-z0-9]+\.)+([a-z]{2,3}';

    $p.= '|info|arpa|aero|coop|name|museum)$/ix';

    return preg_match($p, $Addr);

  }

?>

Fred Schenk
09-Sep-2006 09:01


I've seen some regexp's here to validate email addresses. I just wanted to add my own version which accepts the following formats in front of the @-sign:

1. A single character from the following range: a-z0-9!#$%&*+-=?^_`{|}~

2. Two or more characters from the previous range including a dot but not starting or ending with a dot

3. Any quoted string



After the @-sign there is the possibility for any number of subdomains, then the domain name and finally a two letter TLD or one of the TLD's specified.



And finally the check is case insensitive.



I used the info on this page and in the RFCs to come to this format. If anybody sees any possible improvement I'd be happy to hear from you.



<?php

function isValidEmailAddress($address='') {

  $pattern = '/^(([a-z0-9!#$%&*+-=?^_`{|}~]'.

                 '[a-z0-9!#$%&*+-=?^_`{|}~.]*'.

                 '[a-z0-9!#$%&*+-=?^_`{|}~])'.

             '|[a-z0-9!#$%&*+-?^_`{|}~]|'.

             '("[^"]+"))'.

             '[@]'.

             '([-a-z0-9]+\.)+'.

             '([a-z]{2}'.

                 '|com|net|edu|org'.

                 '|gov|mil|int|biz'.

                 '|pro|info|arpa|aero'.

                 '|coop|name|museum)$/ix';

  return preg_match ($pattern, $address);

}

?>



PS: I have the pattern-code on one line, but I had to split it up for this post...I hope it still works this way but haven't tested it. If the code gives an error try removing all the extra lines...

Eric Vautier
09-Sep-2006 03:00

Was stumped for a little while trying to match a word, but exclude the same word if part of a specific expression. For example: match "media" but not "windows media" or "quicktime media". What needed to be used in my case is the following:

<?php /(?<!windows|quicktime)\smedia/ ?>

? - check
< - look left
! - exclude
windows|quicktime - the list of words to exclude
\s - white space
media - word to match
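
The negative lookbehind can be tried against a few sample phrases. The phrases and the $results array below are illustrative assumptions; the pattern is the one from the note.

```php
<?php
$pattern = '/(?<!windows|quicktime)\smedia/';

$samples = array("social media", "windows media", "quicktime media");
$results = array();
foreach ($samples as $s) {
    // 1 if " media" occurs and is not directly preceded by an excluded word
    $results[$s] = preg_match($pattern, $s);
    echo $s, " => ", $results[$s], "\n";
}
```

Only "social media" matches. PCRE requires lookbehind alternatives to have a fixed length each, which is why whole words work here but something like (?<!windows\s*) would not compile.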

dave at kupesoft dot com
28-Aug-2006 11:30


Here's a way to run a regular expression and catch the match in one line



<?php



/* ... */



$match = preg_match('/^GET (.+) HTTP\/1\.1/U', $request, $match) ? $match[1] : false;



?>



~

Izzy
18-Aug-2006 04:27


Concerning the German umlauts (and other language-specific chars as accented letters etc.): If you use unicode (utf-8), you can match them easily with the unicode character property \pL (match any unicode letter) and the "u" modifier, so e.g.



<?php preg_match("/[\w\pL]/u",$var); ?>



would really match all "words" in $var - whether they contain umlauts or not. Took me a while to figure this out, so maybe this comment will save the day for someone else :-)
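
A small sketch of \pL in use, assuming the source file and the strings are UTF-8 encoded and that PCRE was built with Unicode property support. The sample words are invented.

```php
<?php
// Source file and strings are assumed to be UTF-8 encoded.
$words = array("Größe", "naïve", "abc123", "two words");

foreach ($words as $w) {
    // \pL = any Unicode letter; the u modifier makes PCRE read UTF-8.
    $lettersOnly = preg_match('/^\pL+$/u', $w);
    echo $w, ": ", $lettersOnly, "\n";
}
```

The first two words consist of letters only and match; the last two contain digits or a space and do not.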

Michal
16-Aug-2006 07:34


I think that it is not absolutely clear. RFC 3696 allows \@ or "@" in the local part of an address, so the test <?php count(explode('@', $email, 3)) == 2 ?> is not RFC 3696 compliant! It rejects valid addresses like these:



"Abc@def"@example.com or 

 Abc\@def@example.com

WK Group
21-Jul-2006 09:34


In spite of the note on the page with the posting form saying that my regex is 'almost certainly wrong,' I still think it may be worth posting here :-)



The regex from the earlier message has a few defects (according to the email syntax described in RFC 3696):

- permits the local part to end with a dot,

- the size of the local part is not restricted,

- the length of the domain part is checked incorrectly (eg, allows up to 70 characters instead of 64),

- doesn't allow a dot at the very end.



I'd like to suggest the following function for validating emails (though I'm not sure whether or not neighbouring hyphens are allowed in domain part):



<?php

function is_email($email){

    $x = '\d\w!\#\$%&\'*+\-/=?\^_`{|}~';    //just for clarity



    return count($email = explode('@', $email, 3)) == 2

        && strlen($email[0]) < 65

        && strlen($email[1]) < 256

        && preg_match("#^[$x]+(\.?([$x]+\.)*[$x]+)?$#", $email[0])

        && preg_match('#^(([a-z0-9]+-*)?[a-z0-9]+\.)+[a-z]{2,6}.?$#', $email[1]);

}

?>

warcraft at libero dot it
20-Jul-2006 11:29


Useful functions for Italians:



Codice Fiscale (fiscal code):

preg_match("/[a-z]{6}[0-9]{2}[abcdehlmprst]{1}[0-9]{2}[a-z0-9]{5}/i", $text)



To detect a percentage: -20% +50% etc



function is_percent($text)

    {

    return preg_match("/[\+-] *[0-9]{1,2}%/", $text);

    }

pnomolos --- gmail --- com
13-Jul-2006 05:36


A note to lcvalentine and najeli ... to make things much easier, you can include 'x' in the regular expression modifiers which makes whitespace outside of character classes mean nothing (meaning you don't have to remove breaks), as well as allowing you to comment the regular expression... like so!



preg_match( "/^

     [\d\w\/+!=#|$?%{^&}*`'~-] # Wow that's ugly looking

     [\d\w\/\.+!=#|$?%{^&}*`'~-]*@ # So's that one

     [A-Z0-9]

     [A-Z0-9.-]{0,61}

     [A-Z0-9]\. # Letters or numbers, then a dot

     [A-Z]{2,6}$/ix", 'user@subdom.dom.tld'

);
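
To show the x modifier on something easier to read than an email pattern, here is a sketch with a simple date format. The pattern and sample date are assumptions made for the example.

```php
<?php
// x modifier: whitespace in the pattern is ignored and # starts a comment
// (outside character classes), so the pattern can be laid out in parts.
$pattern = '/^
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
$/x';

if (preg_match($pattern, '2007-02-05', $m)) {
    echo "year={$m[1]} month={$m[2]} day={$m[3]}\n";
}
```

With x in effect, a literal space has to be written as \s, as an escaped space, or inside a character class.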

najeli at gmail dot com
10-Jul-2006 03:03


A little comment to lcvalentine mail validation expression - it recognizes emails like "user@fo.com" as non valid, but there are some valid ones (at least in Poland, ex. wp.pl, o2.pl etc.).



After changing {1,61} to {0,61} everything works fine, I hope.



<?php



preg_match( "/^

     [\d\w\/+!=#|$?%{^&}*`'~-]

     [\d\w\/\.+!=#|$?%{^&}*`'~-]*@

     [A-Z0-9]

     [A-Z0-9.-]{0,61}

     [A-Z0-9]\.

     [A-Z]{2,6}$/i", 'user@subdom.dom.tld'

);



?>



(remove breaks)

volkank at developera dot com
07-Jul-2006 11:01


I will add some note about my last post.



Leading zeros in IP addresses can cause problems on both Windows and Linux, because it can be unclear whether an octet is meant as decimal or octal (if the octal is not written properly).



"66.163.161.117" is in a decimal syntax but in "066.163.161.117" the first octet 066 is in octal syntax.

So "066.163.161.117" is recognized as  decimal "54.163.161.117" by the operating system.

BTW octal is a rather rare syntax, so you may not want or need to match it.



***

Unless you specially want to match IP addresses including both decimal and octal syntax; you can use Chortos-2's pattern which is suitable for most conditions.



<?php 

//DECIMAL syntax IP match



//$num="(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])";

$num='(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])';



if (!preg_match("/^$num\\.$num\\.$num\\.$num$/", $ip_addr,$match)) //validate IP

...



preg_match_all("/$num\\.$num\\.$num\\.$num/",$test,$match); //collect IP addresses from a text(notice that ^$ not present in pattern)

...



?> 



***

Also, my previous pattern still has a bug and needs some changes to correctly match both decimal and octal syntax.
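
The decimal-only octet pattern can be checked quickly against the addresses discussed in this note. This is a sketch: the pattern is assembled by concatenation, and \z is used instead of $ as an extra precaution so that a trailing newline is not accepted.

```php
<?php
// Octet pattern from the note: 0-255 with no leading zeros.
$num = '(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])';

// \z anchors at the very end of the string (unlike $, which also allows
// a trailing newline).
$pattern = '/^' . $num . '\.' . $num . '\.' . $num . '\.' . $num . '\z/';

var_dump(preg_match($pattern, '66.163.161.117'));  // int(1)
var_dump(preg_match($pattern, '066.163.161.117')); // int(0)
var_dump(preg_match($pattern, '256.1.1.1'));       // int(0)
```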

steve at webcommons dot biz
06-Jul-2006 03:48


This function (for PHP 4.3.0+) uses preg_match to return the regex position (like strpos, but using a regex pattern instead):



  function preg_pos($sPattern, $sSubject, &$FoundString, $iOffset = 0) {

      $FoundString = NULL;

      

      if (preg_match($sPattern, $sSubject, $aMatches, PREG_OFFSET_CAPTURE, $iOffset) > 0) {

        $FoundString = $aMatches[0][0];

        return $aMatches[0][1];

      }

      else {

        return FALSE;

      }

  }



It also returns the actual string found using the pattern, via $FoundString.

lcvalentine at gmail dot com
24-May-2006 06:53


After doing some testing for my company and reading the RFCs mentioned on wikipedia, I have found that the following RegEx appears to match any standards-based e-mail address.



Please test it in your own configs and respond if it works well so other users don't waste too much time looking:



(remove breaks)



<?php



preg_match( "/^

     [\d\w\/+!=#|$?%{^&}*`'~-]

     [\d\w\/\.+!=#|$?%{^&}*`'~-]*@

     [A-Z0-9]

     [A-Z0-9.-]{1,61}

     [A-Z0-9]\.

     [A-Z]{2,6}$/i", 'user@subdom.dom.tld'

);



?>

agilo3 at gmail dot com
22-May-2006 06:54


I seem to have made a few critical mistakes in my previous entry, to correct the problem I'm re-pasting my entry (I hope an admin can delete the other entry?):



I want to make an important notice to everyone using preg_match to validate a file used for inclusion.



If you use preg_match like so (like I have in the past):

<?php

 if (preg_match("/(.*)\.txt$/", $_GET['file'])) {

  include($_GET['file']);

 }

?>



Be sure to know that you can get around that security by using a null string terminator, like so:

page.php?file=/etc/passwd%00.txt



Quick explanation: strings end in a null terminator, which is what separates strings (%00 is the URL-encoded form of the null character).

What this does is effectively rule out everything after %00 while still validating the string (if I understand correctly how preg_match handles this), leading to the inclusion of the server's /etc/passwd file.



One way to go around it is by doing something like this:

<?php

 if (preg_match("/^[a-zA-Z0-9\/\-\_]+\.txt$/", $_GET['file'])) {

  include($_GET['file']);

 }

?>

Which will check if (from the start) it consists of alphanumeric characters and can possibly contain a slash (subdirectories), underscore and a dash (used by some in filenames) and ends at the end of the string in ".txt".
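
A stricter variant, offered only as a sketch, accepts a bare file name and resolves it inside one fixed directory. The helper name resolve_include(), the directory and the .txt rule are all assumptions made for the example.

```php
<?php
// Hypothetical helper: map a user-supplied name to a file inside one
// directory, or return false. $baseDir and the *.txt rule are assumptions.
function resolve_include($name, $baseDir)
{
    // Accept only a simple name; \z (not $) so that a trailing newline is
    // refused, and no slash, dot-dot or NUL byte can get through.
    if (!is_string($name) || !preg_match('/^[a-zA-Z0-9_-]+\.txt\z/', $name)) {
        return false;
    }
    return rtrim($baseDir, '/') . '/' . $name;
}

var_dump(resolve_include('notes.txt', '/var/www/pages'));         // the full path
var_dump(resolve_include("/etc/passwd\0.txt", '/var/www/pages')); // bool(false)
var_dump(resolve_include('../secret.txt', '/var/www/pages'));     // bool(false)
```

Where the set of includable files is small, checking the name against a fixed list with in_array() avoids pattern matching on user input altogether.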

azuretek at gmail dot com
19-May-2006 03:34


If you want to use a prefix for a folder in your project you can do as follows. In my case I wanted to make our dev and prod environments include based on whichever folder we were using as the document root. Though it would be also useful if you want a system to act differently based on which folder it resides in, eg. different results for email, security, and urls. This will return the proper info no matter where the file is as long as it's contained within the document root.



You can do it like this:



$envPrefix = $_ENV['PWD'];

preg_match('/\/.*\/(.*)_project/', $envPrefix, $matches);

$envPrefix = $matches[1];



will return:



array(2) {

  [0]=>

  string(25) "/home/dev_project"

  [1]=>

  string(3) "dev"

}



You can then use that prefix to include the proper files. This method is useful for developers with separate copies of their project, live and dev. It helps with merging updates, since little or nothing has to change between the copies.

ickata at ickata dot net
17-May-2006 04:23


If you want to perform a case-insensitive match with

cyrillic characters, there is a problem if your server is 

Linux-based. So here is a useful function to perform 

a case-insensitive match:



<?php

// define some functions which will transform a string

// to lower- or uppercases, because strtolower() and

// strtoupper do not work with cyrillic characters:



function cstrtolower($str) {

   return strtr($str, 

"АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЬЮЯ",

 "абвгдежзийклмнопрстуфхцчшщъьюя");

}



function cstrtoupper($str) {

   return strtr($str, 

"абвгдежзийклмнопрстуфхцчшщъьюя",

 "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЬЮЯ");

}



// define a function which will create our filter  -we 

// need to break apart the characters of the keyword 

// into groups, containing the lowercase and the 

// uppercase of each character:



function createFilter ($string) {

    $string = cstrtolower($string);

    for ($i=0;$i<strlen($string);$i++) {

        $letter_small = substr($string,$i,1);

        $letter_big = cstrtoupper($letter_small);

        $newstr .= '['.$letter_small.$letter_big.']';

    }

    return $newstr;

}



// Example:



$string = "Това е ТесТ - проверка";

$keyword = "тест";



if (preg_match ("/".createFilter($keyword)."/", $string))

   echo "A match was found.";

else echo "A match was NOT found!";



?>



And one more thing - if you want to perform a match

with a whole word only, do not use "/\bkeyword\b/",

use this:



<?php

preg_match ("/[А-Яа-яA-Za-z]keyword[А-Яа-яA-Za-z]/");

?>

axima at prameasolutions dot com
11-May-2006 08:34


A function to validate and extract date parts from a string in the following formats:

dd-mm-yyyy or dd/mm/yyyy or dd mm yyyy

d-m-yyyy and so on

The delimiter can actually be anything except a digit.



function get_correct_date($input = ''){

//clean up the input

$input = trim($input);

//matching pattern, split in two only to keep the lines short

$pattern = '/^([1-9]|0[1-9]|[12][1-9]|3[01])\D([1-9]|0[1-9]|1[012])\D'

         . '(19[0-9][0-9]|20[0-9][0-9])$/';

//check the input: pattern first, then subject

preg_match($pattern, $input, $parts);

return $parts;

}



The function returns an empty array on failure, or

array(full match, dd, mm, yyyy) on success.

martin at kouba dot at
04-Apr-2006 08:02


in reply to "brodseba at brodseba dot com"



hmm, wouldn't it be much easier to do it this way?



function preg_match_between($a_sStart, $a_sEnd, $a_sSubject)

{

  $pattern = '/'. $a_sStart .'(.*?)'. $a_sEnd .'/';

  preg_match($pattern, $a_sSubject, $result);



  return $result[1];

}
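
Both versions insert the start and end markers into the pattern unescaped, so markers containing characters such as [ ] / or . are interpreted as regex syntax. The following sketch shows one way to treat them as literal text with preg_quote(); the function name preg_match_between_literal is made up for this example.

```php
<?php
// Variant of the function above that treats both markers as literal text.
// The name preg_match_between_literal is made up for this sketch.
function preg_match_between_literal($start, $end, $subject)
{
    // preg_quote() escapes regex metacharacters; the second argument also
    // escapes the delimiter used here.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';

    if (preg_match($pattern, $subject, $result)) {
        return $result[1];
    }
    return false; // markers not found
}

echo preg_match_between_literal('[b]', '[/b]', 'some [b]bold[/b] text'), "\n"; // bold
```

Returning false when nothing is found also avoids reading an undefined index, which the shorter version would do on a failed match.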

brodseba at brodseba dot com
21-Mar-2006 03:56


This little function is self-explanatory.



function preg_match_between($a_sStart, $a_sEnd, $a_sSubject)

{

  $pattern = '/'. $a_sStart .'(.*?)'. $a_sEnd .'/';

  preg_match($pattern, $a_sSubject, $result);



  $pattern = '/'. $a_sStart .'/';

  $result = preg_replace($pattern, '', $result[0]);



  $pattern = '/'. $a_sEnd .'/';

  $result = preg_replace($pattern, '', $result);



  return $result;

}

Chris Shucksmith <chris at shucksmith dot com>
05-Mar-2006 03:05


I created the following snippet to parse fixed-width tabular data into an array of arrays. I use this to create an HTML table showing the output from a Linux shell command. It requires an array of column widths used to build a regular expression. This is passed to preg_match_all in multiline mode over the entire command line output. $matches is examined and the table built.



This example formats a table of SIP Peers connected to an Asterisk VOIP Server. The command output looks like:



| # asterisk -rqx 'sip show peers'

| Name/username              Host            Dyn Nat ACL Port      Status

| 308/308                    (Unspecified)    D          0       UNKNOWN

| 303/303                    45.230.86.123    D   N       5060     OK (84 ms)

| 302/302                    192.168.14.71    D          5060     OK (80 ms)

| 6 sip peers [3 online , 3 offline]



Code:

<table>

<tr><th></th> <th>Extension</th> <th>Host</th> <th>Dynamic</th> <th>NAT</th><th>ACL</th> <th>Port</th> <th>Status</th> </tr>

<?php

    $dfout = exec("sudo asterisk -rqx 'sip show peers'", $pss);

    $psline = implode("\n", $pss); // merge command output with linebreaks



    $table = array(27,16,4,4,4,9,-1);   // table column widths (-1 = all remaining)

    $pattern = '';                      // start from an empty pattern

    foreach($table as $t) {

        $pattern = ($t == -1) ? $pattern.'(.{0,})' : $pattern.'(.{'.$t.'})';

    }

    $pattern = '/^'.$pattern.'$/m';    // anchor pattern on lines, multiline mode



    if (preg_match_all($pattern, $psline, $matches,PREG_SET_ORDER)) {

        unset($matches[0]);     // remove first row (column headings)

        // print_r($matches);   // handy to visualise the data structure



        foreach ($matches as $m) {

                echo '<tr><td>';

                if (strpos($m[7],'OK')      !== false) echo '<img src="/img/dg.png">';

                if (strpos($m[7],'LAGGED')  !== false) echo '<img src="/img/dy.png">';

                if (strpos($m[7],'UNKNOWN') !== false) echo '<img src="/img/dr.png">';

                echo '</td><td>'.$m[1].'</td><td>'.$m[2].'</td><td>'.$m[3].'</td><td>';

                echo $m[4].'</td><td>'.$m[5].'</td><td>'.$m[6].'</td><td>'.$m[7];

                echo '</td></tr>';

        }

  } else {

        echo '<img src="/img/dr.png"> Connection to server returned no data.';

  }

?>

</table>

jpittman2 at gmail dot com
23-Feb-2006 02:08


Here's an extremely complicated regular expression to match the various parts of an Oracle 8i SELECT statement.  Obviously the SELECT statement is contrived, but hopefully this RegExp will work on just about any valid SELECT statement.  If there are any problems, feel free to comment.



<?

$sql = "SELECT /*+ ORDERED */ UNIQUE foo, bar, baz FROM fly, flam"

     . " WHERE bee=1 AND fly=3"

     . " START WITH foo=1 CONNECT BY some_condition"

     . " CONNECT BY some_other_condition"

     . " GROUP BY a_bunch_of_fields HAVING having_clause"

     . " CONNECT BY some_other_connect_by_condition"

     . " INTERSECT (SELECT * FROM friday WHERE beep=1)"

     . " ORDER BY one, two, three"

     . " FOR UPDATE bee bop boo";



$re = "/^                                              # Match beginning of string

         SELECT\\s+                                    # SELECT

         (?:(?P<hints>\\/\\*\\+\\s+.+\\s+\\*\\/|--\\+\\s+.+\\s+--)\\s+)? # Hints

         (?:(?P<DUA>DISTINCT|UNIQUE|ALL)\\s+)?        # (self-explanatory)

         (?P<fields>.+?)                              # fields

         \\s+FROM\\s+                                  # FROM

         (?:(?P<tables>.+?)                          # tables

          (?:\\s+WHERE\\s+(?P<where>.+?))?             # WHERE Clauses

          (?:\\s+

           (?:(?:START\\s+WITH\\s(?P<startWith>.+?)\\s+)? # START WITH

              CONNECT\\s+BY\\s+(?P<connectBy>.+?)        # CONNECT BY

           )                                           # Hierarchical Query

           |

           (?:GROUP\\s+BY\\s+(?P<groupBy>.+?)            # Group By

              (?:\\s+HAVING\\s+(?P<having>.+?))?)        # Having

          )*                                         # Hier,Group

          (?:\\s+(?P<UIM>UNION(?:\\s+ALL)?|INTERSECT|MINUS) # UNION,INTSECT,MINUS

             \\s+\\((?P<subquery>.+)\\))?                    # UIM subquery

          (?:\\s+ORDER\\s+BY\\s+(?P<orderBy>.+?))?      # Order by

          (?:\\s+FOR\\s+UPDATE\\s*(?P<forUpdate>.+?)?)? # For Update

         )                                           # tables

        $                                            # Match end of string

       /xi";

 

$matches = array();

preg_match($re, $sql, $matches);

var_dump($matches);

?>

volkank at developera dot com
17-Feb-2006 04:12


A correct IP matching pattern:



Max's IP pattern fails on the IP '009.111.111.1',

and Chortos-2's pattern fails on both '09.111.111.1' and '009.111.111.1'.



Most of the other patterns posted here also fail when used with preg_match_all(): they return a truncated IP,

e.g.

  $num="([0-9]|[0-9]{2}|1\d\d|2[0-4]\d|25[0-5])";

  $test="127.0.0.112 10.0.0.2";

  preg_match_all("/$num\\.$num\\.$num\\.$num/",$test,$match);

  print_r($match);

will print "127.0.0.1" instead of "127.0.0.112", which is wrong.



To make my pattern work with preg_match_all() (parsing multiple IPs),

I list the alternatives in reverse order, longest first.

  

This is my new IP octet pattern, which is probably perfect :)

$num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";



/*

25[0-5]    => 250-255

2[0-4]\d   => 200-249

[01]?\d\d  => 00-99,000-199

\d         => 0-9

*/



<?

$num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";

$ip_addr='009.111.111.100';

if (!preg_match("/^$num\\.$num\\.$num\\.$num$/", $ip_addr, $match)) {

    echo "Wrong IP Address\n";

} else {

    echo $match[0];

}



?>



<?

    $num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";

    $test="127.0.0.112 10.0.0.2";

    preg_match_all("/$num\\.$num\\.$num\\.$num/",$test,$match);

    print_r($match);

      

?>
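
As an alternative to a hand-written octet pattern, PHP's filter extension (available since PHP 5.2) can validate a single address without a regular expression. A minimal sketch, assuming the filter extension is enabled; the function name is_valid_ipv4() is made up for this example:

```php
<?php
// Hypothetical helper: validate a dotted-quad IPv4 address with filter_var().
function is_valid_ipv4($ip_addr)
{
    // filter_var() returns the filtered string on success and false on failure.
    return filter_var($ip_addr, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}

var_dump(is_valid_ipv4('127.0.0.112'));
var_dump(is_valid_ipv4('257.255.34.6'));
```

This only validates one address at a time; to pull several addresses out of a longer text, a pattern used with preg_match_all() is still needed.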

roberta at lexi dot net
14-Feb-2006 03:25


How to verify a Canadian postal code!



if (!preg_match("/^[a-z]\d[a-z] ?\d[a-z]\d$/i" , $postalcode)) 

{

     echo "Your postal code has an incorrect format.";

}

brunosermeus at gmail dot com
08-Feb-2006 05:21


I've created this function to show how easy it is to use regular expressions instead of one of the classes available online, which can be very slow at processing a feed.



This function is an RSS reader that only needs the URL as a parameter.



<?php

function RSSreader($url)

{

$rssstring = file_get_contents($url);

preg_match_all("#<title>(.*?)</title>#s",$rssstring,$titel);   



preg_match_all("#<item>(.*?)</item>#s",$rssstring,$items);   

$n=count($items[0]);



for($i=0;$i<$n;$i++)

    {

    $rsstemp= $items[0][$i];

    preg_match_all("#<title>(.*?)</title>#s",$rsstemp,$titles);   

    $title[$i]= $titles[1][0];

    preg_match_all("#<pubDate>(.*?)</pubDate>#s",$rsstemp,$dates);  

    $date[$i]= $dates[1][0]; 

    preg_match_all("#<link>(.*?)</link>#s",$rsstemp,$links);   

    $link[$i]= $links[1][0];

    }    



echo "<h2>".$titel[1][0]."</h2>";

    for($i=0;$i<$n;$i++)

    {

        $timestamp=strtotime($date[$i]);

        $datum=date('d-m-Y H\hi', $timestamp);    

        if(!empty($title[$i])) echo $datum."\t\t\t <a href=".$link[$i]." target=\"_blank\">".$title[$i]."</a><br>";

    }

}



?>

patrick at procurios dot nl
30-Jan-2006 03:17


The \G assertion can be used in a regular expression together with the 'offset' parameter of this function. \G matches only if the current position in 'subject' is the same as the one specified by 'offset'. It is comparable to the ^ assertion, but whereas ^ matches at position 0, \G matches at position 'offset'.
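
The difference between ^ and \G can be checked with a short script. This is only a sketch; the subject string and variable names are chosen for illustration:

```php
<?php
$subject = 'abcdef';

// ^ anchors at position 0 of the subject, so it fails even though
// the search starts at offset 3.
$caret = preg_match('/^def/', $subject, $m1, 0, 3);

// \G anchors at the offset where the search starts.
$g_at_3 = preg_match('/\Gdef/', $subject, $m2, 0, 3);

// Starting one byte earlier, \G must match at offset 2, where "cde" is found instead.
$g_at_2 = preg_match('/\Gdef/', $subject, $m3, 0, 2);

var_dump($caret, $g_at_3, $g_at_2);
```

Only the second call should report a match.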

bloopy.org
26-Jan-2006 08:18


Intending to use preg_match to check whether an email address is in a valid format? The following page contains some very useful information about possible formats of email addresses, some of which may surprise you: http://en.wikipedia.org/wiki/E-mail_address

Zientar
03-Jan-2006 12:54


With this function you can check your date and time in this format: "YYYY-MM-DD HH:MM:SS"



<?php



function Check_Date_Time($date_time)

{

 // Build the pattern by concatenation so that no line breaks end up inside it.

 $pattern = "/^([1-9][[:digit:]]{3})-"

          . "(0[1-9]|1[012])-(0[1-9]|[12][[:digit:]]|3[01]) "

          . "(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])$/";

 if (preg_match($pattern, $date_time, $part) && checkdate($part[2], $part[3], $part[1]))

    {

     return true;

    }

     else 

     {

      return false;

     }

}            



$my_date_time = "2006-01-02 16:50:15";



if (Check_Date_Time($my_date_time))

    {

    echo "My date '".$my_date_time."' is correct";

    }

    else

        {

    echo "My date '".$my_date_time."' is incorrect";

        }



?>

john at recaffeinated d0t c0m
28-Dec-2005 01:27


Here's a pattern for matching US phone numbers in the following formats:



###-###-####

(###) ###-####

##########



It restricts the area codes to >= 200 and exchanges to >= 100, since values below these are invalid.



<?php

$pattern = "/(\([2-9]\d{2}\)\s?|[2-9]\d{2}-|[2-9]\d{2})" 

         . "[1-9]\d{2}"

         . "-?\d{4}/";

?>
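
The pattern above has no anchors, so it also matches a phone number embedded in a longer string. A possible way to use it for validating a whole field is sketched below; the added ^ and $ anchors and the helper name is_us_phone() are assumptions made for this example, not part of the original note:

```php
<?php
// Same alternatives as above, with ^ and $ added so the whole string must match.
$pattern = "/^(\([2-9]\d{2}\)\s?|[2-9]\d{2}-|[2-9]\d{2})"
         . "[1-9]\d{2}"
         . "-?\d{4}$/";

// Hypothetical helper name, used only for this example.
function is_us_phone($number, $pattern)
{
    return preg_match($pattern, $number) === 1;
}

var_dump(is_us_phone('(212) 555-1234', $pattern));
var_dump(is_us_phone('112-555-1234', $pattern));
```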

max99x [at] gmail [dot] com
07-Nov-2005 02:11


Here's an improvement on the URL detecting function written by [rickyale at ig dot com dot br]. It detects SRC, HREF and URL links, in addition to URLs in CSS code, and Javascript imports. It also understands html entities(such as &amp;) inside URLs.



<?php

function get_links($url) {

    if( !($body = @file_get_contents($url)) ) return FALSE;

    //Pattern building across multiple lines to avoid page distortion.

    $pattern  = "/((@import\s+[\"'`]([\w:?=@&\/#._;-]+)[\"'`];)|";

    $pattern .= "(:\s*url\s*\([\s\"'`]*([\w:?=@&\/#._;-]+)";

    $pattern .= "([\s\"'`]*\))|<[^>]*\s+(src|href|url)\=[\s\"'`]*";

    $pattern .= "([\w:?=@&\/#._;-]+)[\s\"'`]*[^>]*>))/i";

    //End pattern building.

    preg_match_all ($pattern, $body, $matches);

    return (is_array($matches)) ? $matches:FALSE;

}

?>



$matches[3] will contain Javascript import links, $matches[5] will contain the CSS links, and $matches[8] will contain the regular URL/SRC/HREF HTML links. To get them all in one neat array, you might use something like this:



<?php

function x_array_merge($arr1,$arr2) {

    for($i=0;$i<count($arr1);$i++) {

        $arr[$i]=($arr1[$i] == '')?$arr2[$i]:$arr1[$i];

    }

    return $arr;

}



$url = 'http://www.google.com';

$m = get_links($url);

$links = x_array_merge($m[3],x_array_merge($m[5],$m[8]));

?>

rebootconcepts.com
06-Nov-2005 12:33


Guarantee (one) trailing slash in $dir:



<?php

preg_match( '|(.*)/*$|U', $dir, $matches );

$dir = $matches[1] . '/';

?>



For whatever reason,

<?php $dir = preg_replace( '|^([^/]*)/*$|', '$1/', $dir ); ?>

and

<?php $dir = preg_replace( '|/*$|U', '/', $dir ); ?>

don't work (perfectly). The match-and-concatenate combination is the only thing I could get to work if there was a '/' within $dir (like $dir = "foo/bar";).
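
One likely reason the plain preg_replace() attempts misbehave is that '/*$' can match twice: once on the run of trailing slashes and once more as an empty match at the very end. A sketch that limits the replacement to the first match; the helper name with_trailing_slash() is made up for this example:

```php
<?php
// Hypothetical helper: guarantee exactly one trailing slash.
function with_trailing_slash($dir)
{
    // Limit 1 stops after the first match. Without it, preg_replace() would also
    // replace the empty match at the very end of the string and append a second slash.
    return preg_replace('|/*$|', '/', $dir, 1);
}

echo with_trailing_slash('foo/bar///');
```

For this particular task, rtrim($dir, '/') . '/' gives the same result without a regular expression.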

phpnet_spam at erif dot org
27-Oct-2005 02:37


Test for valid US phone number, and get it back formatted at the same time:



  function getUSPhone($var) {

    $US_PHONE_PREG ="/^(?:\+?1[\-\s]?)?(\(\d{3}\)|\d{3})[\-\s\.]?"; //area code

    $US_PHONE_PREG.="(\d{3})[\-\.]?(\d{4})"; // seven digits

    $US_PHONE_PREG.="(?:\s?x|\s|\s?ext(?:\.|\s)?)?(\d*)?$/"; // any extension

    if (!preg_match($US_PHONE_PREG,$var,$match)) {

      return false;

    } else {

      $tmp = "+1 ";

      if (substr($match[1],0,1) == "(") {

        $tmp.=$match[1];

      } else {

        $tmp.="(".$match[1].")";

      }

      $tmp.=" ".$match[2]."-".$match[3];

      if ($match[4] <> '') $tmp.=" x".$match[4];

      return $tmp;

    }

  }



usage:



  $phone = $_REQUEST["phone"];

  if (!($phone = getUSPhone($phone))) {

    //error gracefully :)

  }

1413 at blargh dot com
07-Oct-2005 05:41


For a system I'm writing, I get MAC addresses in a huge number of formats.  I needed something to handle all of the following:



0-1-2-3-4-5

00:a0:e0:15:55:2f

89 78 77 87 88 9a

0098:8832:aa33

bc de f3-00 e0 90

00e090-ee33cc

::5c:12::3c

0123456789ab



and more.  The function I came up with is:



<?php

function ValidateMAC($str)

{

  preg_match("/^([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})$/", $str, $arr);

  if(strlen($arr[0]) != strlen($str))

    return FALSE;

  return sprintf("%02X:%02X:%02X:%02X:%02X:%02X", hexdec($arr[1]), hexdec($arr[2]), hexdec($arr[3]), hexdec($arr[4]), hexdec($arr[5]), hexdec($arr[6]));

}



   $testStrings = array("0-1-2-3-4-5","00:a0:e0:15:55:2f","89 78 77 87 88 9a","0098:8832:aa33","bc de f3-00 e0 90","00e090-ee33cc","bf:55:6e:7t:55:44", "::5c:12::3c","0123456789ab");

   foreach($testStrings as $str)

     {

       $res = ValidateMAC($str);

       print("$str => $res<br>");

     }



?>



This returns:



0-1-2-3-4-5 => 00:01:02:03:04:05

00:a0:e0:15:55:2f => 00:A0:E0:15:55:2F

89 78 77 87 88 9a => 89:78:77:87:88:9A

0098:8832:aa33 => 00:98:88:32:AA:33

bc de f3-00 e0 90 => BC:DE:F3:00:E0:90

00e090-ee33cc => 00:E0:90:EE:33:CC

bf:55:6e:7t:55:44 =>

::5c:12::3c => 00:00:5C:12:00:3C

0123456789ab => 01:23:45:67:89:AB

tlex at NOSPAM dot psyko dot ro
23-Sep-2005 03:34


To check a Romanian landline phone number, and to return "Bucharest", "Proper" or "Unknown", I've used this function:



<?

function verify_destination($destination) {

    $destination_match = "Unknown"; // default, also used when the length is not 10

    $dst_length=strlen($destination);

    if ($dst_length=="10"){

        if(preg_match("/^021[2-7]{1}[0-9]{6}$/",$destination)) {

            $destination_match="Bucharest";

        } elseif (preg_match("/^02[3-6]{1}[0-9]{1}[1-7]{1}[0-9]{5}$/",$destination)) {

            $destination_match = "Proper";

        } else {

            $destination_match = "Unknown";

        }

    }

    return ($destination_match);

}

?>

paullomax at gmail dot com
07-Sep-2005 02:01


If you want some email validation that doesn't reject valid emails (which the ones above do), try this code (from http://iamcal.com/publish/articles/php/parsing_email)



function is_valid_email_address($email){



        $qtext = '[^\\x0d\\x22\\x5c\\x80-\\xff]';



        $dtext = '[^\\x0d\\x5b-\\x5d\\x80-\\xff]';



        $atom = '[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c'.

            '\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+';



        $quoted_pair = '\\x5c[\\x00-\\x7f]';



        $domain_literal = "\\x5b($dtext|$quoted_pair)*\\x5d";



        $quoted_string = "\\x22($qtext|$quoted_pair)*\\x22";



        $domain_ref = $atom;



        $sub_domain = "($domain_ref|$domain_literal)";



        $word = "($atom|$quoted_string)";



        $domain = "$sub_domain(\\x2e$sub_domain)*";



        $local_part = "$word(\\x2e$word)*";



        $addr_spec = "$local_part\\x40$domain";



        return preg_match("!^$addr_spec$!", $email) ? 1 : 0;

    }

webmaster at swirldrop dot com
27-Jul-2005 03:46


To replace any characters in a string that could be 'dangerous' to put in an HTML/XML file with their numeric entities (e.g. &#233; for é [e acute]), you can use the following function:



function htmlnumericentities($str){

  return preg_replace('/[^!-%\x27-;=?-~ ]/e', '"&#".ord("$0").chr(59)', $str);

};//EoFn htmlnumericentities



To change any named entities (e.g. &euro;) to numeric entities, call:

$str = htmlnumericentities(html_entity_decode($str));
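
The /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0. On newer versions the same replacement can be written with preg_replace_callback(). A minimal sketch, assuming PHP 5.3 or later for the anonymous function; the name html_numeric_entities() is made up so that it does not clash with the function above:

```php
<?php
// Hypothetical rewrite of htmlnumericentities() without the /e modifier.
function html_numeric_entities($str)
{
    return preg_replace_callback(
        '/[^!-%\x27-;=?-~ ]/',
        function ($m) {
            // $m[0] is the single byte that matched; emit its numeric entity.
            return '&#' . ord($m[0]) . ';';
        },
        $str
    );
}

echo html_numeric_entities('a < b & c');
```

Like the original, this works byte by byte, so it suits single-byte encodings such as ISO-8859-1 rather than UTF-8 input.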

hippiejohn1020 --- attT --- yahoo.com
27-Jul-2005 01:38


Watch out when using c-style comments around a preg_match or preg_* for that matter. In certain situations (like example below) the result will not be as expected. This one is of course easy to catch but worth noting.



/* 

    we will comment out this section



    if (preg_match ("/anything.*/", $var)) {

        code here;

    }

*/



This is (I believe) because comments are recognized first when the code is parsed (as they should be). So in the preg_match call, the asterisk (*) and the ending delimiter (/) are interpreted as the end of the comment, and the rest of your (supposedly commented-out) code is interpreted as PHP.
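
Two possible ways around this are sketched below: choosing a delimiter other than "/" so that "*/" never appears in the source, or using single-line comments. The helper name contains_anything() is hypothetical and only gives the example something to comment out:

```php
<?php
// Hypothetical helper, used only to have something to comment out.
function contains_anything($var)
{
    // A '#' delimiter keeps the sequence "*/" out of the source text.
    return preg_match('#anything.*#', $var) === 1;
}

/*
    This block comment ends where intended:

    if (contains_anything($var)) {
        code here;
    }
*/

// Single-line comments are another option, since "*/" has no meaning inside them:
// if (preg_match("/anything.*/", $var)) { code here; }

var_dump(contains_anything('say anything you like'));
```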

ian at remove-this dot mecnet dot net
13-Jul-2005 11:26


Linux kernel 2.6.11 changed the format of /proc/net/ip_conntrack.  I have updated the regular expression mark@portinc.net created in a comment below so that his function works again.



// Updated this regular expression for kernel 2.6.11 changes to /proc/net/ip_conntrack

$GREP = '!([a-z]+) '    .// [1] protocol

      '\\s*([^ ]+) '        .// [2] protocol in decimal

      '([^ ]+) '            .// [3] time-to-live

      '?([A-Z_]|[^ ]+)?'    .// [4] state

      ' src=(.*?) '        .// [5] source address

      'dst=(.*?) '            .// [6] destination address

      'sport=(\\d{1,5}) '    .// [7] source port

      'dport=(\\d{1,5}) '    .// [8] destination port

      'packets=(.*?) '        .// [9] num of packets so far

      'bytes=(.*?) '        .// [10] num of bytes so far

      'src=(.*?) '            .// [11] reversed source

      'dst=(.*?) '            .// [12] reversed destination

      'sport=(\\d{1,5}) '    .// [13] reversed source port

      'dport=(\\d{1,5}) '    .// [14] reversed destination port

      'packets=(.*?) '        .// [15] reversed num of packets so far

      'bytes=(.*?) '        .// [16] reversed num of bytes so far

      '\\[([^]]+)\\] '        .// [17] status

      'mark=(.*?) '        .// [18] marked?

      'use=([0-9]+)!';         // [19] use

masterkumon at yahoo dot com
07-Jul-2005 09:49


A SHORT NOTE ON PATTERNS FOR BEGINNERS:

int preg_match ( string pattern, string subject)

"/^[a-z0-9 ]*$/"

/ = delimiter that begins and ends the pattern

^ = the match must start exactly at the beginning of the subject

$ = the match must end exactly at the end of the subject

[] = matches any single character listed inside the brackets.

[a-z0-9 ] = matches any character from a to z, or 0 to 9, or a space (there is a space between 9 and ])

* = the preceding item may occur 0 or more times. A "[]" class on its own matches exactly one character position, so a quantifier is needed to match more. In place of the "*" in the example:

    to match 1 or more characters use "+"

    to match 0 or more characters use "*"

    to match exactly 5 characters use {5}

    to match 5 to 6 characters use {5,6}

<?

preg_match ($pattern, $subject);

?>

$pattern="/^[a-z0-9 ]*$/";

    $subject="abcdefgadfafda65" ->TRUE

    $subject="abcdefg ad f afda 65" ->TRUE

$pattern="/[a-z0-9 ]*/";

    $subject="$$abcdefgadfafda65><" ->TRUE 

        Why? Because there is no "^" at the beginning and no "$" at the end of $pattern, the regex may match any part of $subject, so the other characters around it do not matter. (Since "*" also allows zero characters, this pattern in fact matches every subject.)

        If you use only "^" or only "$", the match is tied to the beginning or to the end of $subject only.

A LITTLE MORE ADVANCED

Checking a file name that must have a 3-character extension (".xxx"), where the name and the extension contain only letters and digits.

Here is the pattern: "/^[a-zA-Z0-9]*\.[a-zA-Z0-9]{3}$/"

\. = a literal "." separating the name from the extension. A special character that should be matched literally must be preceded by a "\".

{3} = the extension has exactly 3 characters



OTHER SPECIAL CHARACTERS. I haven't examined them in detail.

. = wildcard for any character

| = OR
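
The effect of the anchors can be checked with a few calls. This is only a sketch; the variable names and the sample file names are chosen for illustration:

```php
<?php
$anchored   = '/^[a-z0-9 ]*$/';
$unanchored = '/[a-z0-9 ]*/';
$filename   = '/^[a-zA-Z0-9]*\.[a-zA-Z0-9]{3}$/';

$subject = '$$abcdefgadfafda65><';

var_dump(preg_match($anchored, 'abcdefg ad f afda 65')); // only allowed characters
var_dump(preg_match($anchored, $subject));               // "$", ">" and "<" are not in the class

// Unanchored: the call succeeds, but only with an empty match at position 0.
preg_match($unanchored, $subject, $m);
var_dump($m[0]);

var_dump(preg_match($filename, 'report.txt'));
var_dump(preg_match($filename, 'report.jpeg'));
```

The empty match in the unanchored case shows why a pattern built only from "*" items is rarely a useful test without "^" and "$".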

Rasqual
05-Jul-2005 01:03


Do not forget PCRE has many compatible features with Perl.

One that is often neglected is the ability to return the matches as an associative array (Perl's hash).



For example, here's a code snippet that will parse a subset of the XML Schema 'duration' datatype:



<?php

$duration_tag = 'PT2M37.5S';  // 2 minutes and 37.5 seconds



// drop the milliseconds part

preg_match(

  '#^PT(?:(?P<minutes>\d+)M)?(?P<seconds>\d+)(?:\.\d+)?S$#',

  $duration_tag,

  $matches);



print_r($matches);

?>



Here is the corresponding output:

Array

(

    [0] => PT2M37.5S

    [minutes] => 2

    [1] => 2

    [seconds] => 37

    [2] => 37

)

ligjason at netscape dot net
28-Jun-2005 12:02


Regular Expression Library - http://www.regexlib.com/Default.aspx

i at camerongreen dot org
26-Jun-2005 12:01


The isvalidemail function has any number of things wrong with it, for a start there is a missing ) bracket so it won't compile.



Once you fix that, the delimiters used give me an error, so you need to enclose the pattern in forward slashes. Using word boundaries as the anchors is a bad idea, as any string that contained a valid email anywhere in it (along with who knows what else, maybe a CSS attack or SQL injection) would be accepted as a valid email.



Moving on, it then only accepts emails in uppercase; the email address of the expression's author, for instance, won't pass his own regular expression. 



I don't have time at the moment to look up the appropriate rfc, but until someone puts up a better one here is my email checking function which at least compiles :)  



function isValidEmail($email_address) {

    $regex = '/^[A-Za-z0-9][\w.-]*@[A-Za-z0-9][\w\-\.]+\.[A-Za-z0-9]{2,6}$/';



    return (preg_match($regex, $email_address));

}



Note : It doesn't accept emails with percentage signs (easy to change) and it requires the user id, first subdomain and last subdomain to start with a letter or number.



Cameron Green

MKP dev a.t g! mail d0t com (parseme)
24-Jun-2005 07:30


Domain name parsing is tricky with an RE. The simplest, most efficient method, is probably just to split the domain by . and individually verify each part, like so:

<?php

$email_address = 'tom@rocks.my.socks.tld'; // An RE subject



function verify_addr ($address) {

   $return = false;

   if (preg_match ('/^[\w.]+@([\w.]+)\.[a-z]{2,6}$/i', $address, $domain)) {

      $domain = explode ('.', $domain[1]); // [1] is the captured domain part, without the TLD

      // Split the domain into sections wherein the last element is either 'co', 'org', or the likes, or the primary domain name

      $return = true; // Assume the parts are fine until one of them fails

      foreach ($domain as $part) { // Iterate through the parts

         if (substr ($part, 0, 1) == '_' || substr ($part, -1) == '_') {

            $return = false; // The first or last character is _

            break;

         }

      }

   }

   return $return;

}



if (verify_addr ($email_address)) {

   // The address is good, what now?

} else {

   // Address is invalid... Do something.

}

?>



An alternative would be to look for _. and ._ in the domain section, or to just ignore that restriction entirely and use this RE:

/^[\w.]+@(?:[\w.]{2,63}\.)+[a-z]{2,6}$/i

tom at depman dot com
23-Jun-2005 10:14


There have been several examples of abbreviating strings, or an ellipsis function as some may call it. Of course I couldn't find any until after I wrote this, so I thought I'd share it with you. Basically, this takes a long string (like a TEXT column from MySQL) and shortens it to give a quick preview of the text. But instead of chopping it in the middle of a word, it uses preg_match to look for the last space before the limit and chops there. Hope this helps someone.



<?php

// the maximum length of characters of text to display

$MAX_LEN = 50; 

$text_to_display = "Connect to a MySQL database or get 

        some other source for a long string you'd like 

        to display but don't want to chop the words in half";

/*

    check to see if the text is longer than our wanted length

    and if so find the last word less than the $MAX_LEN 

    and display the string up to that last word.

    If youre using PHP version 4.3 or above you may need to

    specify PREG_OFFSET_CAPTURE to get the preg_match to work.

*/

function abreviated_text( $text_to_display, $MAX_LEN=30 ){

    $text_to_display_abr = ''; // stays empty when no abbreviation is needed

    if ( strlen($text_to_display) > $MAX_LEN ){

        preg_match ( "/.* /", substr($text_to_display,0,$MAX_LEN), $found );

        $text_to_display_abr = substr("{$found[0]}",0,-1);

    }

    // if abreviation took place

    if ( $text_to_display_abr ) 

        // do something special like add a <a href=""> tag

        return $text_to_display_abr."...";

    else

        // simply display the text if it does not exceed the length

        return $text_to_display;

}

// use as

echo abreviated_text($text_to_display,$MAX_LEN);

/*

    prints out:

    Connect to a MySQL database or get some other...

    instead of:

    Connect to a MySQL database or get some other sour...

*/

?>

webmaster at swirldrop dot com
07-Jun-2005 10:05


An improvement of the regular expression from hackajar <matt> yahoo <trot> com for e-mail addresses is this:



<?php

if(preg_match( '/^[A-Z0-9._-]+@[A-Z0-9][A-Z0-9.-]{0,61}[A-Z0-9]\.[A-Z.]{2,6}$/i' , $data)

) return true;

?>



This stops the domain name starting or ending with a hyphen (or a full stop), and limits the domain to a minimum 2 and a maximum 63 characters. I've also added a full stop in the last character class to allow for 63-character domain names with a country code like .org.uk.



The 63 character limit is just for the bit before the TLD (i.e. only 'php', not '.net'). I think this is right, but I'm not totally sure.

webmaster at m-bread dot com
07-Jun-2005 09:47


If you want to get all the text characters from a string, possibly entered by a user, and filter out all the non-alphanumeric characters (perhaps to make an ID to enter user-submitted details into a database record), then you can use the function below. It returns a string of only the alphanumeric characters from the input string (all in lower case), with all other characters removed.



<?php

function removeNonAN($string){

preg_match_all('/(?:([a-z0-9]+)|.)/i', $string, $matches);

return strtolower(implode('', $matches[1]));

};//EoFn removeNonAN

?>



It took me quite a while to come up with this regular expression. I hope it saves someone else that time.
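
The same filtering can also be sketched with preg_replace(), which deletes the unwanted characters directly instead of collecting the wanted ones. The function name remove_non_an() is made up for this example:

```php
<?php
// Hypothetical alternative to removeNonAN(), using preg_replace() instead.
function remove_non_an($string)
{
    // Delete every byte that is not an ASCII letter or digit, then lower-case the rest.
    return strtolower(preg_replace('/[^a-z0-9]+/i', '', $string));
}

echo remove_non_an('Hello, World 42!');
```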

hackajar <matt> yahoo <trot> com
06-Jun-2005 06:12


In regards to Stony666 email validator:



Per RFC 1035, domain names must be a combination of letters, numbers and hyphens. _ and % are not allowed. They should be no less than 2 and no greater than 63 characters. Hyphens may not appear at the beginning or end.



Per RFC 822, regarding the username part of an email address: % is generally not accepted in the "new" (ha, 1982 'new') format. _ and - are OK (as well as ".", but only in the "real" username, not in a subdomain).



here's something a little better:



if(preg_match('/^[A-Z0-9._-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i', $data)) return true;



One small problem with this, though: I can't work out how to limit the domain name to 2-63 characters, nor how to check for hyphens at the beginning and end. Maybe someone else can toss in a better revision?

stoney666 at gmail dot com
25-May-2005 06:31


Update to my last entry: I noticed that the email validation function didn't actually work like it was supposed to. Here's the working version.



<?php

function validate_email($email_address) {

    if (preg_match("/^[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,6}$/i", $email_address)) {

        return true; }

    else { return false; }

}

?>

Chortos-2
14-May-2005 05:30


max wrote a fix for satch666's function, but it too has a little bug... If you write IP 09.111.111.1, it will return TRUE.



<?

$num="(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])";

 

/* 

\\d => numbers 0-9 

[1-9]\\d => numbers 10-99 --> Here was the bug in max's code

1\\d\\d => numbers 100-199

2[0-4]\\d => numbers 200-249

25[0-5] => numbers 250-255

*/



if (!preg_match("/^$num\\.$num\\.$num\\.$num$/", $ip_addr)) echo "Wrong IP Address\n";

?>



P.S. Why did you write [0-9] and not \d?

Gaspard
10-May-2005 06:47


In case someone needs it: this validates a birth date in the format DDMMYYYY (JJMMAAAA in French).



<?php

if (preg_match("/^(0[1-9]|[12][0-9]|3[01])"      // day

             . "(0[1-9]|1[0-2])"                 // month

             . "(19\d{2}|200[0-5])$/", $date))   // year, 1900 to 2005



    echo "Ok" ;

?>

21-Apr-2005 08:37


If you are using an older version of PHP, you will find that preg_match(",", "foo,bar") works as one might like. However, for newer versions, this needs to be preg_match("/,/", "foo,bar"). You'll get an odd message about a delimiter if this is the problem.

MikeS
09-Apr-2005 05:34


For anyone who is looking for info about preg_match crashes on long strings, I may have a solution for you. After wasting 2 hours, I finally found out that it is a bug in PCRE and not a problem with my input data or regex. In my case I was able to turn on the ungreedy (U) modifier and it worked fine! Before that, my regex would crash on strings of around 1800 chars. With no modification to the regex aside from the ungreedy modifier, I ran it on strings up to 500,000 chars long! (Not that it crashed at 500K; I just stopped trying to find a limit after that.)



Of course this "fix" depends on the nature of regex and what you're trying to do.



Hope this helps someone!

max at clnet dot cz
07-Apr-2005 11:40


satch666 wrote a fix for the function valid_ipv4(), but it doesn't work well. I think this code really is functional. 



<?

$num="([0-9]|[0-9]{2}|1\d\d|2[0-4]\d|25[0-5])";

 

/* 

[0-9] => numbers 0-9 

[0-9]{2} => numbers 00-99 --> This is missing in satch666's code. It means that if you enter the IP 25.213.110.1, the function returns FALSE!

1\d\d => numbers 100-199

2[0-4]\d => numbers 200-249

25[0-5] => numbers 250-255

*/



if (!preg_match("/^$num\.$num\.$num\.$num$/", $ip_addr)) echo "Wrong IP Address\n";

?>

carsten at senseofview dot de
14-Mar-2005 08:57


The ExtractString function does not have a real error, but it does have a flaw. What if it is called like this:



ExtractString($row, 'action="', '"');



It would find 'action="' correctly, but perhaps not the first " after the $start-string. If $row consists of



<form method="post" action="script.php">



strpos($str_lower, $end) would return the first " in the method-attribute. So I made some modifications and it seems to work fine.



function ExtractString($str, $start, $end)

{

    $str_low = strtolower($str);

    $pos_start = strpos($str_low, $start);

    $pos_end = strpos($str_low, $end, ($pos_start + strlen($start)));

    if ( ($pos_start !== false) && ($pos_end !== false) )

    {

        $pos1 = $pos_start + strlen($start);

        $pos2 = $pos_end - $pos1;

        return substr($str, $pos1, $pos2);

    }

}

erm(at)the[dash]erm/dot/com
12-Mar-2005 06:15


This is a modified version of the valid_ipv4 function that will test for a valid ip address with wild cards.



ie 192.168.0.*

or even 192.168.*.1



function valid_ipv4($ip_addr)

{



        $num="(\*|[0-9]{1,3}|^1?\d\d$|2[0-4]\d|25[0-5])";



        if(preg_match("/$num\.$num\.$num\.$num/",$ip_addr,$matches))

        {

            print_r ($matches);

                return $matches[0];

        } else {

                return false;

        }

}
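
The octet pattern above still lets values such as 999 through, because of [0-9]{1,3} and the missing anchors around the whole address. A stricter variant is sketched below; it is an assumption about what the wildcard check is meant to accept, and the name valid_ipv4_wildcard() is made up for this example:

```php
<?php
// Hypothetical stricter variant: each octet is either "*" or a number from 0 to 255.
function valid_ipv4_wildcard($ip_addr)
{
    $octet = '(?:\*|25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)';

    // ^ and $ (with the D modifier) force the whole string to be an address.
    $pattern = '/^' . $octet . '(?:\.' . $octet . '){3}$/D';

    return preg_match($pattern, $ip_addr) === 1;
}

var_dump(valid_ipv4_wildcard('192.168.*.1'));
var_dump(valid_ipv4_wildcard('192.168.0.999'));
```

Unlike some of the other patterns on this page, this sketch rejects octets with leading zeros such as 09.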

info at reiner-keller dot de
12-Feb-2005 03:03


Pointing to the post of "internet at sourcelibre dot com": Instead of using PerlRegExp for e.g. german "Umlaute" like



<?php



$bolMatch = preg_match("/^[a-zA-ZäöüÄÖÜ]+$/", $strData);



?>



use the setlocale() function and the POSIX character class, like



<?php



setlocale (LC_ALL, 'de_DE');

$bolMatch = preg_match("/^[[:alpha:]]+$/", $strData);



?>



This works for any country related special character set.



Remember: since domains containing umlauts have been released, it's almost mandatory to change your regular expressions so that forms accepting e-mail and internet addresses give those domains a chance.



Life can be so easy when you read the manual ;-)

mikeblake a.t. akunno d.o.t net
25-Jan-2005 12:38


The author of ExtractString below has made an error (email at albert-martin dot com). 



if (strpos($str_low, $start) !== false && strpos($str_lower, $end) !== false)



Should have been



if (strpos($str_low, $start) !== false && strpos($str_low, $end) !== false)



Note the slight variable name mistake at the second strpos

kalaxy at nospam dot gmail dot com
19-Jan-2005 12:20


This is another way of implementing array_preg_match.  It also shows use of the array_walk() and create_function() functions.



<?php

// array_preg_match

//  make sure $subject is an array and it will return an array 

//  of all the elements in $subject that preg_match the 

//  $pattern.

function array_preg_match($pattern, $subject, $retainkey = false){

  $matches = '';        //make sure it's empty

  array_walk($subject, 

    create_function('$val, $key, $array', 

                    'if (preg_match("' . $pattern . '", "$val")) $array['. ($retainkey ? '$key':'') .'] = $val;'),

    &$matches);

  return $matches;

}

?>



kalon mills
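
On current PHP versions, create_function() and call-time pass-by-reference are no longer available. The same result can be sketched with preg_grep(), which filters an array by a pattern. The function name array_preg_match_sketch() is made up for this example:

```php
<?php
// Hypothetical sketch of array_preg_match() built on preg_grep().
function array_preg_match_sketch($pattern, array $subject, $retainkey = false)
{
    // preg_grep() returns the matching elements and preserves their keys.
    $matches = preg_grep($pattern, $subject);

    return $retainkey ? $matches : array_values($matches);
}

$fruit = array('x' => 'apple', 'y' => 'banana', 'z' => 'avocado');
print_r(array_preg_match_sketch('/^a/', $fruit, true));
```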

hfuecks at phppatterns dot com
13-Jan-2005 10:11


Note that the PREG_OFFSET_CAPTURE flag, as far as I've tested, returns the offset in bytes not characters, which may not be what you're expecting if you're using the /u pattern modifier to make the regex UTF-8 aware (i.e. multibyte characters will result in a greater offset than you expect)
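
The byte-versus-character difference can be seen with a two-character subject. This is a sketch under the assumption that PCRE was built with UTF-8 support; counting characters with preg_match_all() is just one possible way to convert the offset, chosen here to avoid depending on the mbstring extension:

```php
<?php
// "é" written as its two UTF-8 bytes, so the example does not depend on the file encoding.
$subject = "\xC3\xA9b";

preg_match('/b/u', $subject, $m, PREG_OFFSET_CAPTURE);
$byte_offset = $m[0][1];

// Count the UTF-8 characters in front of the match to get a character offset.
$char_offset = preg_match_all('/./su', substr($subject, 0, $byte_offset), $unused);

var_dump($byte_offset, $char_offset);
```

Where mbstring is available, mb_strlen(substr($subject, 0, $byte_offset), 'UTF-8') should give the same character offset.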

29-Dec-2004 05:44


This constant helps in validating a phone number that does not need to be in one particular format. It matches the following US phone formats:



Phone number can be in many variations of the following:

(Xxx) Xxx-Xxxx

(Xxx) Xxx Xxxx

Xxx Xxx Xxxx

Xxx-Xxx-Xxxx

XxxXxxXxxx

Xxx.Xxx.Xxxx



define( "REGEXP_PHONE", "/^(\(|){1}[2-9][0-9]{2}(\)|){1}([\. -]|)[2-9][0-9]{2}([\. -]|)[0-9]{4}$/" );

carboffin at msn dot com
24-Dec-2004 12:54


Here's just some quick code intended to be used in validating URL vars or input strings.



<?php

if(preg_match("/^[a-z0-9]+$/i", $file)){

# Include or validate existence, etc.

}

?>

satch666 at dot nospam dot hotmail dot com
18-Dec-2004 01:53


what a lapsus! where i said 'subpattern' at my post below, replace such by 'type of number' or by 'case';

satch666 at dot nospam dot hotmail dot com
18-Dec-2004 01:44


some fix for the function valid_ipv4() proposed by selt:



For example, the invalid IP 257.255.34.6 is accepted as valid, and the result returned is 57.255.34.6.



The first subpattern of numbers in the pattern cannot match '257' as a whole, but '57' is a valid string for the '1?\d\d' pattern, and nothing in the pattern ties the match to the limits of the string.



I have tried using '^1?\d\d$', and it works. In plain English: if the string has 3 characters, it starts with the digit '1' followed by 2 more digits, and the string ends there; if it has 2 characters, both are any digit; anything else doesn't match the pattern. In other words, it defines the range of numbers from '00' to '199'.



So the function becomes this (after modifying the pattern and removing a variable called $range that was not used in the function):



<?

function valid_ipv4($ip_addr)

{

       $num="([0-9]|^1?\d\d$|2[0-4]\d|25[0-5])";



       if(preg_match("/$num\.$num\.$num\.$num/",$ip_addr,$matches))

       {

               return $matches[0];

       } else {

               return false;

       }

}

?>

internet at sourcelibre dot com
04-Dec-2004 01:34


This helped me to build a mask for all French characters. Just modify $str in order to build your own mask. 



<pre>

<?php

$str = "��������������������";

$strlen = strlen($str);

$array = array();

$mask = "/^[a-zA-Z";

for ($i = 0; $i < $strlen; $i++) {

    $char = $str[$i];

    $hexa = dechex(ord($char));

    echo htmlentities($char)." = ". $hexa . "\n";

    $array[$i] = $hexa;

    $mask .= '\\x' . $hexa;

}

$mask .= " ]+$/";

echo $mask;

?>

</pre>

zubfatal, root at it dot dk
25-Nov-2004 10:56


<?php



/**

 *  Search through an array using a regular expression, and return an array with the matched values

 *    Note: if $boolMatchesOnly is set to true, a new array will be created,

 *  regardless of $boolNewArray

 *

 * @author      Tim Johannessen <admin@stillborn.dk>

 * @version     1.0.0

 * @param        string    $strRegEx    Regular expression to search through the array

 * @param       array    $arrHaystack    The array to search through

 * @param        bool    $boolNewArray    If set to true, the index for the array will be re-assigned - If set to false the existing key value will be used

 * @return      array    Returns an array with the matches found in the array

 */



function array_preg_match($strRegEx = "", $arrHaystack = NULL, $boolNewArray = 0, $boolMatchesOnly = 0) {

    if (strlen($strRegEx) < 1) {

        return "ERR: \$strRegEx argument is missing.";

    }

    

    elseif (!is_array($arrHaystack) || count($arrHaystack) < 1) {

        return "ERR: \$arrHaystack is empty, or not an array.";

    }

    

    else {

        $arrTmp = array();

        

        // search through $arrHaystack, and build new array

        foreach($arrHaystack as $key => $value) {

            if ($boolMatchesOnly) {

                if (preg_match_all($strRegEx, $value, $tmpRes)) {

                    $arrTmp[] = $tmpRes;

                }

            }

            

            else {

                if (preg_match($strRegEx, $value, $tmpRes)) {

                    if ($boolNewArray) { $arrTmp[] = $value; }

                    else { $arrTmp[$key] = $value; }

                }

            }

        }

        

        return $arrTmp;

    }

}



?>



// zubfatal

email at albert-martin dot com
23-Oct-2004 06:39


Here is a faster way of extracting a special phrase from a HTML page:



Instead of using preg_match, e.g. like this:

preg_match("/<title>(.*)<\/title>/i", $html_content, $match);



use the following:

<?php

function ExtractString($str, $start, $end) {

  $str_low = strtolower($str);

  if (strpos($str_low, $start) !== false && strpos($str_low, $end) !== false) {

    $pos1 = strpos($str_low, $start) + strlen($start);

    $pos2 = strpos($str_low, $end) - $pos1;

    return substr($str, $pos1, $pos2);

  }

}

$match = ExtractString($html_content, "<title>", "</title>");

?>
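
A possible variant of the same idea, sketched under the assumption that stripos() is available (PHP 5 and later): it avoids the lowercase copy and only looks for the end marker after the start marker. The name extract_between is hypothetical.

```php
<?php
// Hypothetical variant of ExtractString(): case-insensitive search, and the
// end marker is only searched for after the end of the start marker.
function extract_between($str, $start, $end)
{
    $from = stripos($str, $start);
    if ($from === false) {
        return null; // start marker not present
    }
    $from += strlen($start);

    $to = stripos($str, $end, $from);
    if ($to === false) {
        return null; // no end marker after the start marker
    }
    return substr($str, $from, $to - $from);
}
```

A call such as extract_between($html_content, "<title>", "</title>") then returns the text between the two markers, or null when either marker is missing.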

j dot gizmo at aon dot at
09-Oct-2004 10:00


in reply to rchoudhury --} pinkgreetings {-- com....



the code pasted below (with the switch statement) CANNOT work.



the construct works like this

<?php

switch ($key)

{

case <expr>:

 //this will be executed if $key==<expr>, where <expr> can be a literal, function call etc.

echo "1";

break;

}



//the following will work

switch (true)

{

case preg_match("/pattern/",$key):

//if $key matches the pattern, preg_match() returns 1, which loosely equals true, so this case is executed

blablablabla();

break;

}

?>



however, it makes no sense to compare $key to the return value of preg_match(), and calling preg_match without a second parameter is utterly senseless as well (PHP can't smell what you want to compare pattern to)

the syntax error in your regular expression is the double slash in the beginning.



(RTFM)
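
For reference, the switch (true) construct can be sketched as a small runnable function. The patterns are taken from the note being replied to, with the doubled slash and the stray spaces removed; the function name classify_key and its return values are hypothetical.

```php
<?php
// Hypothetical helper illustrating switch (true) with preg_match().
// Each case expression is evaluated in order and compared against true.
function classify_key($key)
{
    switch (true) {
        case preg_match('/^(meta_keywords|meta_desc|doctype|xmlns|lang|dir|charset)$/', $key) === 1:
            return 'page';
        case preg_match('/^(site_title|site_desc|site_css)$/', $key) === 1:
            return 'site';
        default:
            return 'other';
    }
}
```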

rchoudhury --} pinkgreetings {-- com
18-Aug-2004 01:57


I was looking for an easy way to match multiple conditions inside a switch, and preg_match() seemed like a straightforward solution:



<?php

// stuff

foreach (func_get_arg(0) as $key => $value) {

    switch ($key) {

        case preg_match("//^(meta_keywords | meta_desc | doctype | xmlns | lang | dir | charset)$/"):

            $this->g_page_vars[$key] = $value;

            break 1;

        case preg_match("//^(site_title|site_desc|site_css)$/"):

            $this->g_page_vars[$key] = $g_site_vars[$key];

            break 1;

    }

// do stuff inside loop

}

// etc

?>



However, while it seemed to work on one server using php 4.3.8, where it accepted only one argument (pattern) and assumed the second one (subject) to be $key, another server running 4.3.8 breaks and returns an obvious warning of "Warning: preg_match() expects at least 2 parameters, 1 given". 



You probably think "why not just give preg_match a second argument then?" -- well, if we were to do that it'd be $key in this context, but that returns this error: "Warning: Unknown modifier '^'". So now the regex is bad?



One possible solution may lie in php.ini settings, though since I don't have access to that file on either server I can't check and find out.



http://www.phpbuilder.com/lists/php-developer-list/2003101/0201.php has some comments and other suggestions for the same concept, namely in using:

<?php

switch(true) {

    case preg_match("/regex/",$data):

    // etc.

}

?>

...but this doesn't address the current single argument problem. 



Either way, it's a useful way of working a switch, but it might not work.

ebiven
06-Jul-2004 05:53


To regex a North American phone number you can assume NxxNxxXXXX, where N = 2 through 9 and x = 0 through 9.  North American numbers cannot start with a 0 or a 1 in either the Area Code or the Office Code.  So, adapted from the other phone number regex here, you would get:



/^[2-9][0-9]{2}[-][2-9][0-9]{2}[-][0-9]{4}$/
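
A minimal usage sketch of that pattern; the function name is_nanp_number is hypothetical, and the dashes are assumed to be mandatory, as in the pattern itself.

```php
<?php
// Hypothetical wrapper around the pattern above: true only for
// NXX-NXX-XXXX, where N is 2-9 and X is 0-9.
function is_nanp_number($number)
{
    return preg_match('/^[2-9][0-9]{2}-[2-9][0-9]{2}-[0-9]{4}$/', $number) === 1;
}
```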

05-May-2004 11:23


A very simple Phone number validation function.

Returns the Phone number if the number is in the xxx-xxx-xxxx format. x being 0-9.

Returns false if missing digits or improper characters are included.



<?php

function VALIDATE_USPHONE($phonenumber)

{

// The pattern must stay on one line: a line break inside it would become part of the regex.
if (preg_match("/^[0-9]{3}-[0-9]{3}-[0-9]{4}$/", $phonenumber)) {

   return $phonenumber;

 } else {

   return false;

   }

}



?>

selt
11-Feb-2004 08:11


Concerning a list of notes started on November 11; ie

<?php

$num="([0-9]|1?\d\d|2[0-4]\d|25[0-5])";

?>

It is interesting to note that alternatives are tried from left to right. Therefore an address such as 127.0.0.127, sent to preg_match() with an array for the matches, would return 127.0.0.1: the last octet is already satisfied by [0-9], so the longer alternatives are never tried.



so, to obtain a proper mechanism for stripping valid IPs from a string (any string that is) one would have to use:



<?php

function valid_ipv4($ip_addr)

{

       // 25[0-5] and 2[0-4]\d come first, so 255 is not cut short to 25 by 1?\d\d
       $num="(25[0-5]|2[0-4]\d|1?\d\d|[0-9])";

       $range="([1-9]|1\d|2\d|3[0-2])";



       if(preg_match("/$num\.$num\.$num\.$num/",$ip_addr,$matches))

       {

               return $matches[0];

       } else {

               return false;

       }

}

?>
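
The effect of the ordering can be checked with a short sketch. The helper first_ipv4_match and the two variable names are hypothetical; the second sub-pattern simply lists the longest alternatives first.

```php
<?php
// Hypothetical helper: returns the first dotted-quad substring found with
// the given octet sub-pattern, or false when nothing matches.
function first_ipv4_match($num, $subject)
{
    $pattern = '/' . implode('\.', array($num, $num, $num, $num)) . '/';
    return preg_match($pattern, $subject, $m) ? $m[0] : false;
}

// Single digit listed first: the last octet stops after one digit.
$short_first = '([0-9]|1?\d\d|2[0-4]\d|25[0-5])';

// Longest alternatives listed first: the full octet is consumed.
$long_first = '(25[0-5]|2[0-4]\d|1?\d\d|[0-9])';
```

With these definitions, first_ipv4_match($short_first, '127.0.0.127') gives 127.0.0.1, while first_ipv4_match($long_first, '127.0.0.127') gives 127.0.0.127.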



thanks for all the postings ! They're the best way to learn.

mark at portinc dot net
03-Feb-2004 11:30


<?php // some may find this useful... :)



$iptables = file ('/proc/net/ip_conntrack'); 

$services = file ('/etc/services');

$GREP = '!([a-z]+) '     .// [1] protocol 

        '\\s*([^ ]+) '     .// [2] protocol in decimal

        '([^ ]+) '        .// [3] time-to-live 

        '?([A-Z_]|[^ ]+)?'.// [4] state 

        ' src=(.*?) '     .// [5] source address 

        'dst=(.*?) '      .// [6] destination address

        'sport=(\\d{1,5}) '.// [7] source port 

        'dport=(\\d{1,5}) '.// [8] destination port 

        'src=(.*?) '      .// [9] reversed source

        'dst=(.*?) '      .//[10] reversed destination

        'sport=(\\d{1,5}) './/[11] reversed source port

        'dport=(\\d{1,5}) './/[12] reversed destination port

        '\\[([^]]+)\\] '    .//[13] status

        'use=([0-9]+)!';   //[14] use



$ports = array();

foreach($services as $s) { 

  if (preg_match ("/^([a-zA-Z-]+)\\s*([0-9]{1,5})\\//",$s,$x)) {

     $ports[ $x[2] ] = $x[1];

} }

for($i=0;$i < count($iptables);$i++) { 

  if ( preg_match ($GREP, $iptables[$i], $x) ) {

     // translate known ports... . . 

     $x[7] =(array_key_exists($x[7],$ports))?$ports[$x[7]]:$x[7]; 

     $x[8] =(array_key_exists($x[8],$ports))?$ports[$x[8]]:$x[8]; 

     print_r($x);

  }  // on a nice sortable-table... bon appetite!

}

?>

nico at kamensek dot de
18-Jan-2004 04:31


As I did not find any working IPv6 regexp, I just created one. Here it is:



$pattern1 = '([A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}';

$pattern2 = '[A-Fa-f0-9]{1,4}::([A-Fa-f0-9]{1,4}:){0,5}[A-Fa-f0-9]{1,4}';

$pattern3 = '([A-Fa-f0-9]{1,4}:){2}:([A-Fa-f0-9]{1,4}:){0,4}[A-Fa-f0-9]{1,4}';

$pattern4 = '([A-Fa-f0-9]{1,4}:){3}:([A-Fa-f0-9]{1,4}:){0,3}[A-Fa-f0-9]{1,4}';

$pattern5 = '([A-Fa-f0-9]{1,4}:){4}:([A-Fa-f0-9]{1,4}:){0,2}[A-Fa-f0-9]{1,4}';

$pattern6 = '([A-Fa-f0-9]{1,4}:){5}:([A-Fa-f0-9]{1,4}:){0,1}[A-Fa-f0-9]{1,4}';

$pattern7 = '([A-Fa-f0-9]{1,4}:){6}:[A-Fa-f0-9]{1,4}';



patterns 1 to 7 represent different cases. $full is the complete pattern which should work for all correct IPv6 addresses.



$full = "/^($pattern1)$|^($pattern2)$|^($pattern3)$|^($pattern4)$|^($pattern5)$|^($pattern6)$|^($pattern7)$/";
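
A usage sketch, assuming the seven sub-patterns exactly as given in this note; the function name matches_ipv6_patterns is hypothetical. Note that every sub-pattern starts with a hexadecimal group, so forms that begin with "::" (for example "::1") are not covered.

```php
<?php
// Sketch only: joins the seven sub-patterns from this note into one
// anchored expression. The function name is hypothetical.
function matches_ipv6_patterns($addr)
{
    $patterns = array(
        '([A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}',
        '[A-Fa-f0-9]{1,4}::([A-Fa-f0-9]{1,4}:){0,5}[A-Fa-f0-9]{1,4}',
        '([A-Fa-f0-9]{1,4}:){2}:([A-Fa-f0-9]{1,4}:){0,4}[A-Fa-f0-9]{1,4}',
        '([A-Fa-f0-9]{1,4}:){3}:([A-Fa-f0-9]{1,4}:){0,3}[A-Fa-f0-9]{1,4}',
        '([A-Fa-f0-9]{1,4}:){4}:([A-Fa-f0-9]{1,4}:){0,2}[A-Fa-f0-9]{1,4}',
        '([A-Fa-f0-9]{1,4}:){5}:([A-Fa-f0-9]{1,4}:){0,1}[A-Fa-f0-9]{1,4}',
        '([A-Fa-f0-9]{1,4}:){6}:[A-Fa-f0-9]{1,4}',
    );

    // One pair of anchors around a non-capturing group of all alternatives.
    $full = '/^(?:' . implode('|', $patterns) . ')$/';

    return preg_match($full, $addr) === 1;
}
```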

brion at pobox dot com
01-Dec-2003 12:35


Some patterns may cause the PCRE functions to crash PHP, particularly when dealing with relatively large amounts of input data.



See the 'LIMITATIONS' section of http://www.pcre.org/pcre.txt about this and other limitations.
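
This does not prevent a hard crash, but where the engine reports a failure instead of crashing, it helps to tell "no match" (0) apart from "error" (FALSE). A minimal sketch, assuming PHP 5.2 or later, where preg_last_error() is available; the wrapper name checked_preg_match is hypothetical.

```php
<?php
// Hypothetical wrapper: returns true/false for match/no match, and throws
// when preg_match() reports an error instead of a result.
function checked_preg_match($pattern, $subject)
{
    $result = preg_match($pattern, $subject);

    // FALSE means the match could not be completed; 0 means no match.
    if ($result === false) {
        throw new RuntimeException(
            'preg_match() failed, PCRE error code ' . preg_last_error()
        );
    }
    return $result === 1;
}
```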

thivierr at telus dot net
24-Nov-2003 06:23


A web server log record can be parsed as follows:



$line_in = '209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"';



if (preg_match('!^([^ ]+) ([^ ]+) ([^ ]+) \[([^\]]+)\] "([^ ]+) ([^ ]+) ([^/]+)/([^"]+)" ([^ ]+) ([^ ]+) ([^ ]+) (.+)!',

  $line_in,

  $elements))

{

  print_r($elements);

}



Array

(

    [0] => 209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"

    [1] => 209.6.145.47

    [2] => -

    [3] => -

    [4] => 22/Nov/2003:19:02:30 -0500

    [5] => GET

    [6] => /dir/doc.htm

    [7] => HTTP

    [8] => 1.0

    [9] => 200

    [10] => 6776

    [11] => "http://search.yahoo.com/search?p=key+words=UTF-8"

    [12] => "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"

)



Notes:  

1) For the referer field ($elements[11]), I intentionally capture the double quotes (") and don't use them as delimiters, because sometimes double-quotes do appear in a referer URL.  Double quotes can appear as %22 or \".  Both have to be handled correctly.  So, I strip off the double quotes in a second step.

2) The URLs should be further parsed, using parse_url, which is quicker and more reliable than preg_match.

3) I assume the requested protocol (HTTP/1.1) always has a slash character in the middle, which might not always be the case, but I'll take the risk.

4) The agent field ($elements[12]) is the most unstructured field, so I make no assumptions about its format.  If the record is truncated, the agent field will not be delimited properly with a quote at the end.  So, both cases must be handled.

5) A hyphen  (- or "-") means a field has no value.  It is necessary to convert these to an appropriate value (such as empty string, null, or 0).

6) Finally, there should be appropriate code to handle malformed web log entries, which are common, due to junk data.  I never assume I've seen all cases.
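
The second step described in notes 1, 2, 4 and 5 might look roughly like this. It is only a sketch: the function names and the keys of the returned array are hypothetical, and malformed records (note 6) are not handled.

```php
<?php
// Hypothetical helper: a lone hyphen means "no value" (note 5).
function log_blank_to_null($value)
{
    return $value === '-' ? null : $value;
}

// Hypothetical post-processing of the $elements array filled by the
// preg_match() call above. Array keys are illustrative names only.
function normalize_log_fields(array $elements)
{
    // Notes 1 and 4: strip a leading and a trailing double quote if present,
    // so a truncated agent field without a closing quote still works.
    $referer = log_blank_to_null(preg_replace('/^"|"$/', '', $elements[11]));
    $agent   = log_blank_to_null(preg_replace('/^"|"$/', '', $elements[12]));

    return array(
        'host'    => $elements[1],
        'ident'   => log_blank_to_null($elements[2]),
        'user'    => log_blank_to_null($elements[3]),
        'time'    => $elements[4],
        'method'  => $elements[5],
        'path'    => $elements[6],
        'status'  => (int) $elements[9],
        'bytes'   => $elements[10] === '-' ? 0 : (int) $elements[10],
        // Note 2: leave the URL decomposition to parse_url().
        'referer' => $referer === null ? null : parse_url($referer),
        'agent'   => $agent,
    );
}
```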

nospam at 1111-internet dot com
12-Nov-2003 05:29


Backreferences (ala preg_replace) work within the search string if you use the backslash syntax. Consider:



<?php

if (preg_match("/([0-9])(.*?)(\\1)/", "01231234", $match))

{

    print_r($match);

}

?>



Result: Array ( [0] => 1231 [1] => 1 [2] => 23 [3] => 1 )



This is alluded to in the description of preg_match_all, but worth reiterating here.

bjorn at kulturkonsult dot no
01-Apr-2003 10:56


If you want to match all Scandinavian characters (æÆøØåÅöÖäÄ) in addition to those matched by \w, you might want to use this regexp:



/^[\w\xe6\xc6\xf8\xd8\xe5\xc5\xf6\xd6\xe4\xc4]+$/



Remember that \w respects the current locale used in PCRE's character tables.
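
A short usage sketch, assuming the subject string is ISO-8859-1 (Latin-1) encoded, since the hexadecimal codes in the pattern are single Latin-1 bytes; UTF-8 input would need a different pattern. The function name is_scandinavian_word is hypothetical.

```php
<?php
// Assumes ISO-8859-1 input: the escapes are the Latin-1 byte values of the
// ten Scandinavian letters listed above. Function name is hypothetical.
function is_scandinavian_word($text)
{
    return preg_match('/^[\w\xe6\xc6\xf8\xd8\xe5\xc5\xf6\xd6\xe4\xc4]+$/', $text) === 1;
}
```

For example, is_scandinavian_word("bl\xe5b\xe6r") is true, while a string containing a space is rejected.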
