I'm trying to extract the postal codes from yell.com using PHP and preg_match_all. I managed to extract the postal code, but only together with the address. Here is an example:
$URL = "http://www.yell.com/ucs/UcsSearchAction.do?scrambleSeed=17824062&keywords=shop&layout=&companyName=&location=London&searchType=advance&broaderLocation=&clarifyIndex=0&clarifyOptions=CLOTHES+SHOPS|CLOTHES+SHOPS+-+LADIES|&ooa=&M=&ssm=1&lCOption32=RES|CLOTHES+SHOPS+-+LADIES&bandedclarifyResults=1";
// get yell.com page in a string
$htmlContent = $baseClass->getContent($URL);
// get postal code along with the address
// (the "/" in "</span>" must be escaped because "/" is also the pattern delimiter)
$result2 = preg_match_all("/(.*)<\/span>/", $htmlContent, $matches);
print_r($matches);
The above code outputs something like:

Array ( [0] => Array ( [0] => 7, Royal Parade, Chislehurst, Kent BR7 6NR [1] => 55, Monmouth St, London, WC2H 9DG ....

The problem is that I don't know how to extract only the postal code without the address, because postcodes don't have a fixed number of characters (compare BR7 6NR and WC2H 9DG). Basically I need to extract the last 2 words from each array item. Thank you in advance for any help!
quick & dirty:
# your array item
$string = "7, Royal Parade, Chislehurst, Kent BR7 6NR";
# split on runs of whitespace (trim first so a trailing space doesn't leave an empty last element)
$bits = preg_split('/\s+/', trim($string));
# last two bits
end($bits);
$postcode = prev($bits) . " " . end($bits);
echo $postcode;
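The same idea can be applied to every address at once. This is a sketch, assuming the address strings have already been pulled out of $matches into a plain array (the sample values below are copied from the question's output, and lastTwoWords is a hypothetical helper name). array_slice with a negative offset takes the last two elements directly, which avoids moving the array's internal pointer with end()/prev().

```php
<?php
// Hypothetical helper: return the last two whitespace-separated words of an
// address string, which for these yell.com results is the postcode.
function lastTwoWords(string $address): string
{
    // Split on runs of whitespace; trim() first so trailing spaces
    // do not produce an empty last element.
    $bits = preg_split('/\s+/', trim($address));
    // A negative offset makes array_slice count from the end of the array.
    return implode(' ', array_slice($bits, -2));
}

// Sample items copied from the question's print_r output.
$addresses = [
    "7, Royal Parade, Chislehurst, Kent BR7 6NR",
    "55, Monmouth St, London, WC2H 9DG",
];

$postcodes = array_map('lastTwoWords', $addresses);
print_r($postcodes);
```

Like the original, this still assumes the postcode is always exactly the last two words of the string.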
If you just need to match the last two words in a string, you can use this regex:
\b\w+\s+\w+$
This matches exactly what it says: a word boundary, a non-empty word, some whitespace, then another word, followed by the end-of-string anchor.
<?php
$text = "7, Royal Parade, Chislehurst, Kent BR7 6NR";
$result = preg_match("/\\b\\w+\\s+\\w+$/", $text, $matches);
print_r($matches);
?>
This prints:
Array
(
[0] => BR7 6NR
)
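To get the postcode for every address rather than a single string, the same pattern can be run in a loop. A minimal sketch, assuming the addresses are in a plain array of strings (sample values copied from the question); the pattern is written in single quotes here, so the backslashes don't need the extra escaping used in the double-quoted version above.

```php
<?php
// Sample address strings copied from the question's output.
$addresses = [
    "7, Royal Parade, Chislehurst, Kent BR7 6NR",
    "55, Monmouth St, London, WC2H 9DG",
];

$postcodes = [];
foreach ($addresses as $address) {
    // Single-quoted pattern: \b, \w and \s reach PCRE unchanged.
    if (preg_match('/\b\w+\s+\w+$/', $address, $m)) {
        $postcodes[] = $m[0]; // $m[0] is the whole match: the last two words
    }
}
print_r($postcodes);
```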
You can also make the regex more robust by allowing optional trailing whitespace after the last word (\s*), etc., but anchoring the match with $ is the main idea.
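A sketch of that more robust variant, assuming the scraped text may end with stray spaces or a newline (the sample string is made up for illustration). The two words go into a capture group so the trailing whitespace matched by \s* is not part of the result; the plain pattern fails on this input because $ cannot match after the spaces.

```php
<?php
// \s* before $ tolerates trailing whitespace; the capture group keeps
// that whitespace out of the extracted postcode.
$pattern = '/\b(\w+\s+\w+)\s*$/';

// Illustrative input with trailing spaces and a newline.
$text = "7, Royal Parade, Chislehurst, Kent BR7 6NR  \n";

if (preg_match($pattern, $text, $m)) {
    echo $m[1], "\n"; // group 1 holds the last two words only
}
```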