How to get this regex working?

开发者 (Developer) https://www.devze.com 2023-01-14 16:36 Source: web
I have a small problem: I want to find, in

<tr><td>3</td><td>foo</td><td>2</td>

the foo. I use:

$<tr><td>\d</td><td>(.*)</td>$

to find the foo, but it doesn't work: the pattern matches not the </td> right after foo but the </td> at the end of the string.


You have to make the .* lazy instead of greedy. Read more about lazy vs greedy here.
Your end of string anchors ($) also don't make sense. Try:

<tr><td>\d<\/td><td>(.*?)<\/td>

(As seen on rubular.)

NOTE: I don't advocate using regex to parse HTML. But sometimes the task at hand is simple enough to be handled by regex, and a full-blown XML parser would be overkill (for example: this question). Knowing how to pick the "right tool for the job" is an important skill in programming.
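
To make the greedy vs. lazy difference concrete, here is a minimal sketch in Python. The choice of Python and of re.search is an assumption, since the question does not say which regex engine is in use. It runs both patterns against the row from the question.

```python
import re

# The row from the question.
row = "<tr><td>3</td><td>foo</td><td>2</td>"

# Greedy: .* first consumes the rest of the string, then backtracks
# only as far as the LAST </td>.
greedy = re.search(r"<tr><td>\d</td><td>(.*)</td>", row)

# Lazy: .*? expands one character at a time and stops at the FIRST </td>.
lazy = re.search(r"<tr><td>\d</td><td>(.*?)</td>", row)

print(greedy.group(1))  # foo</td><td>2
print(lazy.group(1))    # foo
```

In Python the forward slash needs no escaping, so </td> and <\/td> behave the same here.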


Your leading $ should be a ^.

If you don't want to match all the way to the end of the string, don't use a $ at the end. However, since * is greedy, it'll grab as much as it can. Some regex implementations have a non-greedy version (.*?) which would work, but you probably just want to change (.*) to ([^<]*).
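
A short sketch of the negated character class, again assuming Python's re module (the question does not name an engine). Because [^<]* cannot cross a <, the capture stops at the first tag regardless of greediness. The trade-off is that a cell containing nested markup will not match at all.

```python
import re

row = "<tr><td>3</td><td>foo</td><td>2</td>"

# [^<]* matches any run of characters that are not "<",
# so it can never run past the closing </td> of the second cell.
pattern = re.compile(r"^<tr><td>\d</td><td>([^<]*)</td>")

m = pattern.search(row)
cell = m.group(1) if m else None
print(cell)  # foo

# Limitation: a cell with nested tags does not match this pattern.
nested = pattern.search("<tr><td>3</td><td><b>foo</b></td>")
print(nested)  # None
```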


Use:

^<tr><td>\d</td><td>(.*?)</td>

(insert obligatory comment about not using regex to parse xml)
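
For comparison, here is a rough sketch of the parser route that the comment above alludes to, using html.parser from Python's standard library. CellCollector is a hypothetical helper written for this illustration, not something from the original answers, and it assumes the cells are not nested.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Hypothetical helper: collects the text content of each <td> cell."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # append text to the current cell


parser = CellCollector()
parser.feed("<tr><td>3</td><td>foo</td><td>2</td>")
parser.close()

print(parser.cells)     # ['3', 'foo', '2']
print(parser.cells[1])  # foo
```

This is more code than the one-line regex, which is the "right tool for the job" trade-off mentioned earlier: for a single fixed row the regex is probably enough.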
