I want to parse HTML content that looks something like this:
<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>
I need to capture just the "Lorem<br> <b>Ipsun</b>" inside the first div. How can I achieve this?
PS: the HTML inside the first div spans multiple lines; it's an article.
Thanks
Trying to parse HTML with regex is not a pleasant experience, because HTML isn't a regular language. An alternative is to use an HTML parser such as Simple HTML DOM or the DOM library.
Simple HTML DOM Example:
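require_once 'simple_html_dom.php'; // assumes the Simple HTML DOM file is on the include path
$html = str_get_html('<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>');
// innertext is the inner HTML of the first div whose id is "sometext"
echo $html->find('div[id=sometext]', 0)->innertext;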
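The DOM library mentioned above can do the same job without a third-party file. The following is only a minimal sketch, assuming PHP's bundled DOM extension is enabled; the variable names ($source, $inner) are made up for illustration. It finds the div by id with XPath and joins the serialized child nodes, since DOMDocument has no direct innerHTML property.

```php
<?php
// Sketch: read the inner HTML of <div id="sometext"> with the built-in DOM extension.
$source = '<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them (real-world HTML is rarely clean).
libxml_use_internal_errors(true);
$doc->loadHTML($source);
libxml_clear_errors();

// Locate the target div by its id attribute.
$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@id="sometext"]')->item(0);

// Serialize each child node and concatenate them to get the inner HTML.
$inner = '';
if ($div !== null) {
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}

echo $inner;
```

Note that loadHTML() does not assume UTF-8 by default, so an article containing non-ASCII text may need its encoding declared before loading.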
Assuming that the id is known:
preg_match('#<div id="sometext">(.*?)</div>#s', $text, $match);
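As a usage sketch (the sample strings and the $text, $nested and $m variables are made up for illustration): the captured content ends up in $match[1], and the s modifier lets the dot match newlines, which matters for a multi-line article. One caveat is that the lazy (.*?) stops at the first closing </div>, so a div nested inside the article would cut the match short.

```php
<?php
// Multi-line input: the "s" modifier lets "." match the newline inside the div.
$text = "<div id=\"sometext\">Lorem<br>\n<b>Ipsun</b></div><span>content</span><div id=\"block\">lorem2</div>";

$match = [];
if (preg_match('#<div id="sometext">(.*?)</div>#s', $text, $match)) {
    // Group 1 holds everything between the opening tag and the first </div>.
    echo $match[1], "\n";
}

// Caveat: with a nested div, the lazy quantifier stops at the inner </div>.
$nested = '<div id="sometext">a<div>b</div>c</div>';
$m = [];
preg_match('#<div id="sometext">(.*?)</div>#s', $nested, $m);
echo $m[1], "\n"; // truncated: the trailing "c" is lost
```

If the article can contain nested divs, a real parser is likely the safer choice.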