How can I fix this?
REGEX:
//REGEX
$match_expression = '/Rt..tt<\/td> <td>(.*)<\/td>/';
preg_match($match_expression,$text,$matches1);
$final = $matches1[1];
//THIS IS WORKING
<tr> <td class="rowhead vtop">RtÅ¡tt</td> <td><img border=0 src="http://somephoto"><br /> <br />INFO INFO INFO</td>
</tr>
//THIS IS NOT WORKING
<tr> <td class="rowhead vtop">Rtštt</td> <td> <br />
INFO<br />
INFO<br /></td></tr>
And this is exactly why you shouldn't be using Regular Expressions to extract data from an HTML document.
Real-world markup is too arbitrary for a pattern to match it reliably, so I won't give you a "proper" regular expression: there is none (the solutions given by other users might work... until they break). Use a DOM parser like DOMDocument or phpQuery to extract data from your document.
Here is an example using phpQuery:
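$pq = phpQuery::newDocumentFile('somefile.html');
// Select the label cells; ":parent" would filter for non-empty elements, not walk up to the row
$labels = $pq->find('td.rowhead.vtop');
$matches = array();
foreach ($labels as $label) {
    // Iterating a phpQuery object yields plain DOM nodes, so wrap each one with pq()
    // and take the <td> that follows the label cell
    $matches[] = pq($label)->next('td')->html();
}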
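If you would rather not pull in phpQuery, roughly the same thing can be done with the built-in DOMDocument and DOMXPath. This is only a sketch: the inline $html sample, the XPath expression, and the assumption that the page is UTF-8 are mine, not taken from your real document.

```php
<?php
// Sample row adapted from the question; "\xC5\xA1" is the UTF-8 encoding of "š".
$html = '<table>'
      . "<tr> <td class=\"rowhead vtop\">Rt\xC5\xA1tt</td> <td> <br />\n"
      . "INFO<br />\n"
      . 'INFO<br /></td></tr>'
      . '</table>';

$doc = new DOMDocument();
// Real pages are rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
// The XML declaration is a common hint that makes libxml read the input as UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// For every label cell, take the first <td> that follows it in the same row.
$cells = $xpath->query('//td[contains(@class, "rowhead")]/following-sibling::td[1]');

$matches = array();
foreach ($cells as $cell) {
    // Rebuild the cell's inner HTML by serializing each child node.
    $inner = '';
    foreach ($cell->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $matches[] = $inner;
}

print_r($matches);
```

Because the parser handles line breaks inside the cell, both of your sample rows go through the same code path.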
You're doing it wrong!
Having said that, your pattern fails on the second sample because . does not match newlines by default. Add the s (PCRE_DOTALL) modifier, so
/Rt..tt<\/td> <td>(.*)<\/td>/
should be
/Rt..tt<\/td> <td>(.*)<\/td>/s
See http://php.net/manual/en/reference.pcre.pattern.modifiers.php
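To see the difference the modifier makes, here is a small check against the multi-line sample from the question. The variable names are mine, and the lazy .*? in the second pattern is my own addition: with s enabled, a greedy .* can run on to the last </td> in the text, so the lazy form is the safer guess when there are several rows.

```php
<?php
// Multi-line sample from the question; "\xC5\xA1" is "š" encoded as two UTF-8 bytes,
// which is why the two dots in the pattern line up with it (no u modifier is used).
$text = "<tr> <td class=\"rowhead vtop\">Rt\xC5\xA1tt</td> <td> <br />\n"
      . "INFO<br />\n"
      . "INFO<br /></td></tr>";

// Original pattern: the dot stops at the first "\n", so no match is found.
$without = preg_match('/Rt..tt<\/td> <td>(.*)<\/td>/', $text, $m1);

// With the s modifier the dot also matches newlines; .*? stops at the first </td>.
$with = preg_match('/Rt..tt<\/td> <td>(.*?)<\/td>/s', $text, $m2);

var_dump($without);                  // int(0)
var_dump($with);                     // int(1)
echo isset($m2[1]) ? $m2[1] : '', "\n";
```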
$s = explode('</tr>', $str);
foreach ($s as $v) {
    $m = strpos($v, "img border");
    if ($m !== FALSE) {
        print substr($v, $m);
    }
}