I'm trying to debug some PHP, but I'm not so hot on regex. Can someone please translate this for me? (If it even is regex.)
public static function fetch($number)
{
    $number = str_replace(" ", "", $number);
    $html = file_get_contents('http://w2.brreg.no/enhet/sok/detalj.jsp?orgnr=' . $number);
    preg_match_all('/\<td style="width.*\<b\>(.*)[: ]*\<\/b\>/msU', $html, $keys);
    preg_match_all('/\<\/b\>.*\<td.*\>(.*)\<\/td\>/msU', $html, $values);
    if (!$keys[1])
    {
        return null;
    }
    // ... (rest of the method not included in the question)
}
I've kept the PHP snippet for context, in case it helps :D Thanks :)
I'm only translating the first one; the second one is similar.
/ # regex delimiter
\<td style="width # match <td style="width (unnecessary escaping of < !)
.* # match anything (as few characters as possible, see below)
\<b\> # match <b> (again, unnecessary escaping!)
(.*) # match anything (lazily) and capture it
[: ]* # match any number of colons or spaces
\<\/b\> # match </b>
/msU # regex delimiter; multiline option (unnecessary),
# dot-all option (dot matches newline)
# and ungreedy option (quantifiers are lazy by default).
EDIT: U is not the Unicode option but the ungreedy option. My mistake. The regex isn't that bad after all :)
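The commented breakdown above can actually be run as a regex: PCRE's x (extended) flag ignores unescaped whitespace and treats "#" up to the end of the line as a comment. This is a sketch under a few assumptions: the delimiter is switched to ~ so the "/" in "</b>" and in the comments doesn't end the pattern early, the space in "<td style" is escaped because x mode would otherwise drop it, and the sample rows (including the labels Navn and Adresse) are made up, not copied from the real brreg.no page.

```php
<?php
// The annotated pattern, runnable thanks to the x (extended) flag:
// unescaped whitespace is ignored and "#" starts a comment.
// ~ is used as the delimiter so no "/" needs escaping.
$pattern = '~
    <td\ style="width   # match <td style="width (space escaped for x mode)
    .*                  # match anything, lazily because of U
    <b>                 # match <b>
    (.*)                # match anything lazily and capture it
    [:\ ]*              # match any number of colons or spaces
    </b>                # match </b>
~msxU';

// Hypothetical sample rows; the labels are made up for illustration.
$html = '<tr><td style="width:30%"><b>Navn: </b></td><td>Example AS</td></tr>'
      . '<tr><td style="width:30%"><b>Adresse: </b></td><td>Storgata 1</td></tr>';

preg_match_all($pattern, $html, $keys);
echo implode(', ', $keys[1]), "\n"; // Navn, Adresse
```

Note how the lazy capture stops before ": ", so the trailing colon and space never end up in the key.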
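I'd suggest using these regexes instead. They do the same thing with explicit lazy quantifiers (*?), so they need neither the U flag nor the unnecessary m flag and escapes: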
/<td style="width.*?<b>(.*?)[: ]*<\/b>/s
/<\/b>.*?<td.*?>(.*?)<\/td>/s
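A quick way to convince yourself that the suggested patterns behave like the originals is to run both on the same input. The sample HTML below is hypothetical (made-up labels and values); on the real page, results could differ if the markup doesn't follow this shape.

```php
<?php
// Hypothetical two-row snippet in the shape the patterns expect.
$html = '<tr><td style="width:30%"><b>Navn: </b></td><td>Example AS</td></tr>'
      . '<tr><td style="width:30%"><b>Adresse: </b></td><td>Storgata 1</td></tr>';

// Original patterns: .* made lazy by the U flag.
preg_match_all('/\<td style="width.*\<b\>(.*)[: ]*\<\/b\>/msU', $html, $oldKeys);
preg_match_all('/\<\/b\>.*\<td.*\>(.*)\<\/td\>/msU', $html, $oldValues);

// Suggested patterns: explicit lazy quantifiers (*?), no U or m flag.
preg_match_all('/<td style="width.*?<b>(.*?)[: ]*<\/b>/s', $html, $newKeys);
preg_match_all('/<\/b>.*?<td.*?>(.*?)<\/td>/s', $html, $newValues);

var_dump($oldKeys[1] === $newKeys[1]);     // bool(true)
var_dump($oldValues[1] === $newValues[1]); // bool(true)
```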
More or less, it returns the {extracted} part from <td style="width ..."><b>{extracted}: </b>
To help you understand regular expressions, I recommend downloading Expresso (for Windows), a free (registration required) expression parser and testing tool.
I believe it's trying to match the following structure:
<td width=.....><b>key:</b></td><td>value</td>
It's parsing the string twice: once for keys, which are taken from the first column, and a second time for values, which are taken from the second column.
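Because both passes walk the table in document order, the i-th key lines up with the i-th value. The rest of fetch() isn't shown in the question, so pairing them with array_combine() is only an assumption about what it might do; the sample rows are hypothetical.

```php
<?php
// Hypothetical sample rows in the shape the patterns expect.
$html = '<tr><td style="width:30%"><b>Navn: </b></td><td>Example AS</td></tr>'
      . '<tr><td style="width:30%"><b>Adresse: </b></td><td>Storgata 1</td></tr>';

// Pass 1: the bold labels in the first column.
preg_match_all('/\<td style="width.*\<b\>(.*)[: ]*\<\/b\>/msU', $html, $keys);
// Pass 2: the cell that follows each label.
preg_match_all('/\<\/b\>.*\<td.*\>(.*)\<\/td\>/msU', $html, $values);

// array_combine() needs equally long arrays, so check before zipping.
$data = count($keys[1]) === count($values[1])
    ? array_combine($keys[1], $values[1])
    : null;

print_r($data);
```

If a row is missing its label or its value, the counts drift apart and every later pair would be wrong, which is one more reason the XPath approach below is sturdier.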
If you want some advice: your regex may not work as expected. In your case, it's better to use XPath.
See this snippet:
$str = "
<html>
<body>
<table>
<tr>
<td style='width:500px'><b>foo : </b> bar</td>
<td style='width:200;vertical-align:'><b>baz :</b> qux</td>
</tr>
</table>
</body>
</html>
";
$xml = simplexml_load_string($str);
$results = array();
foreach ($xml->xpath('//td[@style][b]') as $row) {
    $value = trim((string)$row);
    $key = trim((string)$row->b, ' :');
    $results[$key] = $value;
}
var_dump($results);
This will print:
array(2) {
["foo"]=>
string(3) "bar"
["baz"]=>
string(3) "qux"
}
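One caveat: simplexml_load_string() only accepts well-formed XML, and real pages often aren't (void tags like <br>, unclosed elements). A sketch of the same XPath query using DOMDocument::loadHTML(), which tolerates HTML, together with DOMXPath; the markup is a hypothetical example, not the real brreg.no page.

```php
<?php
// Hypothetical markup that is valid HTML but not well-formed XML
// (the <br> tags are never closed), so simplexml_load_string() would fail on it.
$html = "<html><body><table><tr>
<td style='width:500px'><b>foo : </b> bar<br></td>
<td style='width:200px'><b>baz :</b> qux<br></td>
</tr></table></body></html>";

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect HTML parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = array();
foreach ($xpath->query('//td[@style][b]') as $td) {
    // Key: text of the <b> label, minus surrounding spaces and colons.
    $key = trim($td->getElementsByTagName('b')->item(0)->textContent, ' :');

    // Value: only the text nodes directly inside the cell, like SimpleXML's (string) cast.
    $value = '';
    foreach ($td->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $value .= $child->nodeValue;
        }
    }
    $results[$key] = trim($value);
}
var_dump($results);
```

This should print the same array as the SimpleXML version.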