Get content inside HTML tags with RegExp

Developer https://www.devze.com 2023-02-03 23:51 Source: web
I'd like to extract the content from a large file of table cells using regexp and process the data using PHP.

Here's the data I would like to match:

<td>Current Value: </td><td>100.178</td>

I tried using this regexp to match and retrieve the text:

preg_match("<td>Current Value: </td><td>(.+?)</td>", $data, $output);

However, I get an "Unknown modifier" warning and my $output variable comes out empty.

How can I accomplish this, and could you give me a brief summary of how the solution works so I can understand why my code didn't?


You need to add delimiters around your regex:

preg_match("#<td>Current Value: </td><td>(.+?)</td>#", $data, $output);

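The standard delimiter is /, but you can use any other non-alphanumeric, non-backslash, non-whitespace character if you wish (which makes sense here because the regex itself contains slashes). In your case, PHP took the leading < as the opening delimiter and the first > as its matching closing delimiter, so the pattern became just td and everything after it (Current Value: </td>...) was read as modifiers. C is not a valid modifier, hence the "Unknown modifier" warning; preg_match() then returns false, so $output never receives any matches.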
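A minimal sketch of the corrected call wrapped in a small function, assuming $data holds markup shaped like the sample in the question. The helper name extractCellAfter is hypothetical. It also uses preg_quote(), which escapes regex metacharacters in a literal label and, when given a second argument, the delimiter as well, so the label can never break the pattern the way the angle brackets did.

```php
<?php
// Hypothetical helper: returns the text of the <td> that follows the
// <td> containing $label, or null if the pattern does not match.
function extractCellAfter(string $data, string $label): ?string
{
    // '#' is the delimiter, so the '/' in '</td>' needs no escaping.
    // preg_quote() escapes regex metacharacters in $label, plus the
    // delimiter passed as its second argument.
    $pattern = '#<td>' . preg_quote($label, '#') . '</td><td>(.+?)</td>#';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $data, $output) === 1) {
        // $output[0] is the whole match; $output[1] is the first capture group.
        return $output[1];
    }
    return null;
}

$data = '<td>Current Value: </td><td>100.178</td>';
echo extractCellAfter($data, 'Current Value: '), "\n"; // prints 100.178
```

Checking the return value against 1 (rather than just truthiness) keeps "no match" (0) and "bad pattern" (false) from being confused with a successful match.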

One more tip, aside from the canonical exhortation "Thou shalt not parse HTML with regexen" (though I think a regex is perfectly OK for a narrow, well-defined case like this): use ([^<>]+) instead of (.+?). Because the negated character class can never match < or >, your capture can never run across tag boundaries into neighboring cells, a common source of errors when dealing with markup languages.
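To see the difference, here is a sketch run against an assumed input in which the value cell is empty and a later cell holds the number. The lazy (.+?) must consume at least one character, so it stretches across the cell boundary; ([^<>]+) simply fails to match.

```php
<?php
// Assumed sample input: the value cell is empty, a later cell holds a number.
$data = '<td>Current Value: </td><td></td><td>100.178</td>';

// Lazy dot: (.+?) needs at least one character, so it keeps extending
// past '</td><td>' until the next '</td>', crossing a cell boundary.
$lazy = '#<td>Current Value: </td><td>(.+?)</td>#';

// Negated class: ([^<>]+) can never consume '<' or '>', so any match
// is confined to the text of a single cell.
$noTags = '#<td>Current Value: </td><td>([^<>]+)</td>#';

if (preg_match($lazy, $data, $m) === 1) {
    echo "lazy:    ", $m[1], "\n"; // lazy:    </td><td>100.178
}

if (preg_match($noTags, $data, $m) === 1) {
    echo "no tags: ", $m[1], "\n";
} else {
    echo "no tags: no match\n"; // the empty cell is not silently skipped
}
```

The trade-off is that ([^<>]+) also refuses a value wrapped in inline markup such as <b>100.178</b>; for this table layout that is usually the safer failure mode.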


I would suggest you use a DOM Parser. It will make your life a lot easier, keep your code cleaner, and will be easier to maintain.

http://simplehtmldom.sourceforge.net/

This has some examples of accessing child elements: http://simplehtmldom.sourceforge.net/manual.htm#section_traverse
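PHP also ships its own DOM parser (DOMDocument together with DOMXPath), so the same lookup can be done without a third-party library. This is a sketch using those built-in classes rather than the Simple HTML DOM API linked above, assuming the dom extension is enabled. The XPath expression is an assumption about the table layout: it finds the cell whose trimmed text is "Current Value:" and takes the next <td> sibling.

```php
<?php
// Assumed sample markup; DOMDocument requires PHP's dom extension.
$html = '<table><tr><td>Current Value: </td><td>100.178</td></tr></table>';

// Collect libxml parse warnings internally instead of emitting them
// (real-world HTML often triggers many).
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Find the <td> whose whitespace-normalized text is "Current Value:",
// then take the first <td> sibling that follows it.
$cells = $xpath->query(
    "//td[normalize-space(.) = 'Current Value:']/following-sibling::td[1]"
);

if ($cells !== false && $cells->length > 0) {
    echo trim($cells->item(0)->textContent), "\n"; // 100.178
}
```

Unlike the regex, this keeps working if the cells gain attributes, whitespace, or nested inline tags, since the parser builds the tree and textContent flattens any markup inside the cell.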
