(.*) instead of (.*?)

Developer https://www.devze.com 2022-12-26 03:41 Source: Internet
Suppose we have this HTML content and want to extract Content1, Content2, ... with a regular expression.

<li>Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4</li>

If I use the line below

preg_match_all('/<li>(.*)<\/li>/', $text, $result);

I will get an array with a single row containing:

Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4

And by using this code:

preg_match_all('/<li>(.*?)<\/li>/', $text, $result);

I will get an array with 4 rows containing Content1, Content2, ...

Why doesn't (.*) work, given that it means "match any character zero or more times"?


* matches in a greedy fashion; *? matches in a non-greedy fashion.

What this means is that .* will match as many characters as possible, including all intermediate </li><li> pairs, stopping only at the last occurrence of </li>. On the other hand, .*? will match as few characters as possible, stopping at the first occurrence of </li>.
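
The difference can be seen side by side in a minimal sketch. The $text value here is a hypothetical sample, not the asker's exact input: the items are placed on one line because, by default, . does not match a newline, so with one item per line the greedy pattern would only run across item boundaries if the s modifier were added.

```php
<?php
// Hypothetical sample input: all items on one line, so that '.' (which does
// not match a newline by default) can run across the </li><li> boundaries.
$text = '<li>Content1</li><li>Content2</li><li>Content3</li><li>Content4</li>';

// Greedy: .* takes as much as possible, so the match ends at the LAST </li>.
preg_match_all('/<li>(.*)<\/li>/', $text, $greedy);

// Non-greedy: .*? takes as little as possible, so each match ends at the
// FIRST </li> that follows its <li>.
preg_match_all('/<li>(.*?)<\/li>/', $text, $lazy);

print_r($greedy[1]); // one element: everything between the first <li> and the last </li>
print_r($lazy[1]);   // four elements: Content1 .. Content4
```

Index 1 of each result array holds the text captured by the first parenthesised group, which is why the sketch prints $greedy[1] and $lazy[1] rather than the whole result.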


Because .* itself is greedy and eats up as much as it can (i.e. up to the last </li>) while still allowing the pattern to match. .*? on the other hand is not greedy and eats up as little as possible (stopping at the first </li>).
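
The phrase "while still allowing the pattern to match" can be illustrated with a small sketch. The input string is a hypothetical example: even though .* is greedy, it gives characters back when the rest of the pattern would otherwise fail, so with only one </li> present the greedy capture is the same as the non-greedy one.

```php
<?php
// Hypothetical input: a single list item followed by trailing text.
$text = '<li>Content1</li> trailing text';

// .* first consumes everything up to the end of the string, then backtracks
// until <\/li> can match, so the capture ends just before the only </li>.
preg_match('/<li>(.*)<\/li>/', $text, $m);

echo $m[1], PHP_EOL; // prints "Content1"
```

Greediness therefore only changes the result when more than one position would let the overall pattern succeed, as in the multi-item list from the question.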


See this article's section about greediness of regular expressions.
