Greetings everyone
I have the following regular expression:
$thread_views_exp = '~<td class="alt1" align="center">.*</td> <td class="alt2" align="center">(.*)</td> </tr>~isU';
The purpose of this is to get all the 'views' (first column from left) for the threads at this sample URL: http://www.swalif.net/softs/swalif45. Everything works fine except for the first value.
Sample Output:
Array
(
[0] => 12 528
[1] => 2,732
[2] => 506
[3] => 73
[4] => 83
[5] => 245
[6] => 100
[7] => 201
[8] => 55
[9] => 55
[10] => 37
[11] => 349
[12] => 123
[13] => 75
[14] => 173
[15] => 260
[16] => 101
[17] => 660
[18] => 158
[19] => 66
[20] => 177
[21] => 165
[22] => 228
[23] => 812
[24] => 347
[25] => 197
[26] => 348
[27] => 263
[28] => 176
[29] => 315
[30] => 173
[31] => 273
[32] => 199
)
Thanks for your assistance. Imran
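It seems to be a case of .* matching too much across table cells. The U modifier makes it lazy, but a lazy quantifier still keeps expanding until the rest of the pattern fits. My test also gave me an extraneous <td>. But there is a simple way to make the regex more stringent: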
$rx = '~<td class="alt1" align="center">.*</td> <td class="alt2" align="center">([\d,]+)</td> </tr>~isU';
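Here [\d,]+ replaces the .* inside the capture group (with the U modifier, that .* behaves like .*?), so only digits and thousands separators are captured and the matches are exact. The previous .* was eating up too much.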
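A minimal sketch of applying the stricter pattern with preg_match_all. The two-row HTML fragment is hypothetical and only assumes the layout the pattern expects (an alt1 cell, a single space, the alt2 views cell, a single space, </tr>); the real page markup may differ. It also shows one way to turn captured values like "2,732" into integers.

```php
<?php
// Hypothetical two-row fragment (not taken from the real page), laid out the way
// the pattern expects: alt1 cell, space, alt2 "views" cell, space, </tr>.
$html = '<tr><td class="alt1" align="center">Thread A</td> <td class="alt2" align="center">2,732</td> </tr>'
      . "\n"
      . '<tr><td class="alt1" align="center">Thread B</td> <td class="alt2" align="center">506</td> </tr>';

// The stricter pattern from the answer: the capture group only accepts digits and commas.
$rx = '~<td class="alt1" align="center">.*</td> <td class="alt2" align="center">([\d,]+)</td> </tr>~isU';

preg_match_all($rx, $html, $matches);
$views = $matches[1];  // capture group 1 of every match: '2,732' and '506'

// Strip the thousands separators so the values can be used as integers.
$counts = array_map(function ($v) {
    return (int) str_replace(',', '', $v);
}, $views);

print_r($counts);  // holds 2732 and 506
```

Because the capture group cannot contain < or >, a match can never smuggle a tag such as <td> into the captured value; a row whose views cell holds anything other than digits and commas simply fails to match at that position.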
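General tip: you might want to use [^<>]* instead of .* for safely matching text content between HTML tags, since it cannot run past a < or >. You could also use \s+ instead of literal spaces, so that line breaks and indentation between the tags still match.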
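A small sketch of both tips together, assuming a hypothetical row whose cells are separated by newlines and indentation rather than single spaces (the real page may or may not look like this). It compares the question's pattern with a variant that uses [^<>]* and \s+; since [^<>]* cannot cross a tag, the U modifier is no longer needed.

```php
<?php
// Hypothetical row with markup spread over several indented lines (assumed, not from the real page).
$html = "<tr>\n"
      . "  <td class=\"alt1\" align=\"center\">Thread A</td>\n"
      . "  <td class=\"alt2\" align=\"center\">2,732</td>\n"
      . "</tr>";

// Question's pattern: requires exactly one literal space between the tags.
$strict = '~<td class="alt1" align="center">.*</td> <td class="alt2" align="center">(.*)</td> </tr>~isU';

// Tips applied: [^<>]* stays inside a single cell, \s+ accepts any run of whitespace.
$tolerant = '~<td class="alt1" align="center">[^<>]*</td>\s+<td class="alt2" align="center">([^<>]*)</td>\s+</tr>~i';

$strictCount = preg_match_all($strict, $html, $strictMatches);
$tolerantCount = preg_match_all($tolerant, $html, $tolerantMatches);

echo $strictCount, "\n";            // 0: the cells are separated by "\n  ", not " "
echo $tolerantMatches[1][0], "\n";  // the views value captured by the tolerant pattern
```

One caveat of [^<>]*: if a cell contains nested markup (a link or an image, for example), [^<>]* will not match that cell at all, so it is best used for cells known to hold plain text.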
Maybe try
~<td class="alt2" [^\<\>]+?>([\d,]+)</td>~isU
This assumes that the td cells you are interested in are always of class="alt2".
And there's no need to escape the < and > signs inside a character class, i.e.:
~<td class="alt2" [^<>]+?>([\d,]+)</td>~isU
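A quick check of the alt2-only pattern, using a hypothetical row in which the views cell is assumed to be the only alt2 cell holding a bare number. It also runs the escaped and unescaped versions side by side: in PCRE a backslash before a non-alphanumeric character such as < or > just makes it literal, so both character classes are the same. (Under the U modifier, +? actually becomes greedy, but since [^<>] cannot cross the closing >, that makes no difference here.)

```php
<?php
// Hypothetical row (assumed layout): a title cell, the alt2 views cell, and another alt2 text cell.
$html = '<tr><td class="alt1" align="left">Thread A</td> '
      . '<td class="alt2" align="center">2,732</td> '
      . '<td class="alt2" align="left">by someone</td> </tr>';

$escaped   = '~<td class="alt2" [^\<\>]+?>([\d,]+)</td>~isU';  // \< and \> inside [...] are literal
$unescaped = '~<td class="alt2" [^<>]+?>([\d,]+)</td>~isU';

preg_match_all($escaped, $html, $a);
preg_match_all($unescaped, $html, $b);

print_r($a[1]);             // only the alt2 cell whose content is digits and commas is captured
var_dump($a[1] === $b[1]);  // both spellings of the character class behave the same
```

If the real page has other alt2 cells that also contain only a number (a replies count, say), this pattern would capture those too, which is exactly the assumption mentioned above.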