Suppose we have this HTML content, and we want to extract Content1, Content2, ... with a regular expression.
<li>Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4</li>
If I use the line below:
preg_match_all('/<li>(.*)<\/li>/', $text, $result);
I get an array with a single row containing:
Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4
And by using this code:
preg_match_all('/<li>(.*?)<\/li>/', $text, $result);
I get an array with 4 rows containing Content1, Content2, ...
Why is (.*) not working, since it means "match any character zero or more times"?
* matches in a greedy fashion, *? matches in a non-greedy fashion.
What this means is that .* will match as many characters as possible, including all intermediate </li><li> pairs, stopping only at the last occurrence of </li>. On the other hand, .*? will match as few characters as possible, stopping at the first occurrence of </li>.
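To see the difference side by side, here is a minimal sketch. It assumes the four list items sit on a single line, because . does not match newlines unless the s modifier is added; the variable names $greedy and $lazy are just illustrative.

```php
<?php
// Assumption: all items are on one line, since "." does not match
// newlines without the "s" modifier.
$text = '<li>Content1</li><li>Content2</li><li>Content3</li><li>Content4</li>';

// Greedy: .* first runs to the end of the string, then backtracks
// only as far as needed, which lands on the LAST </li>.
preg_match_all('/<li>(.*)<\/li>/', $text, $greedy);

// Non-greedy: .*? expands one character at a time and stops at the
// FIRST </li> it can reach, so each item is matched separately.
preg_match_all('/<li>(.*?)<\/li>/', $text, $lazy);

print_r($greedy[1]); // one element spanning everything between the first <li> and the last </li>
print_r($lazy[1]);   // four elements: Content1 .. Content4
```

Index 1 of each result array holds the text captured by the parenthesized group, which is why the sketch prints $greedy[1] and $lazy[1] rather than the full matches at index 0.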
Because .* is greedy: it eats up as much as it can (i.e. up to the last </li>) while still allowing the pattern to match. .*?, on the other hand, is not greedy and eats up as little as possible (stopping at the first </li>).
See this article's section about greediness of regular expressions.