I want to parse HTML into a DOM tree and find all the text that is NOT inside <a> tags. I googled it and found "PHP Simple HTML DOM Parser", which seems able to parse HTML into a DOM tree. However, I can only find the elements that are inside an <a> tag, not the text outside them.
PS: it doesn't support the CSS3 :not selector yet.
Does anyone have experience with this? Thank you.
I hope I'm not misunderstanding the question, but can't you use PHP's built-in DOM functions to find the text inside the <a> tags?
$doc = new DOMDocument();
// Fetch and parse the remote page
$doc->loadHTMLFile("http://blahblah.com/blah.html");
// Print the text content of every <a> element
$elem_list = $doc->getElementsByTagName("a");
foreach ($elem_list as $elem) {
    echo $elem->textContent;
}
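If the goal is the opposite, the text outside the <a> tags, the same built-in DOM extension can probably do it with an XPath query. The following is only a minimal sketch: the function name textOutsideAnchors and the sample markup are made up for illustration, and it assumes the HTML is available as a string.

```php
<?php
// Hypothetical helper (not part of PHP or any library): returns the
// non-empty text fragments that are not inside any <a> element.
function textOutsideAnchors(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $fragments = [];
    // Select every text node that has no <a> element among its ancestors.
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $fragments[] = $text;
        }
    }
    return $fragments;
}

// Sample markup, assumed for illustration only.
$outside = textOutsideAnchors('<p>Hello <a href="#">link text</a> world</p>');
echo implode(' ', $outside), PHP_EOL;
```

The XPath predicate does the filtering that the CSS3 :not selector would otherwise have been used for, so no tags need to be removed from the document.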
In that case I would remove all <a> tags and their contents (for example with regular expressions) and then load the resulting HTML into your DOM parser of choice.
Update: Even better, parse the HTML right away and use the built-in functions to remove the <a> tags, or loop through all tags and just skip the <a> tags. Regular expressions should be avoided for HTML.
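The "parse first, then remove the <a> tags" idea from the update could look roughly like this with PHP's built-in DOM classes. This is a sketch under assumptions: the function name stripAnchorsAndGetText and the sample markup are hypothetical, and whitespace is collapsed only to keep the result tidy.

```php
<?php
// Hypothetical helper (not part of PHP or any library): removes every
// <a> element, including its contents, and returns the remaining text.
function stripAnchorsAndGetText(string $html): string
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // getElementsByTagName() returns a live list, so copy it to an array
    // first; removing nodes while iterating the live list skips elements.
    $anchors = iterator_to_array($doc->getElementsByTagName('a'));
    foreach ($anchors as $anchor) {
        $anchor->parentNode->removeChild($anchor);
    }

    // Collapse the whitespace left behind where the links used to be.
    $text = $doc->documentElement->textContent;
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Sample markup, assumed for illustration only.
echo stripAnchorsAndGetText('<p>Keep this <a href="#">drop this</a> and this</p>'), PHP_EOL;
```

Because the removal happens on the parsed tree, nested or malformed markup is handled by the parser and not by a regular expression.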
I have used this class (PHP Simple HTML DOM Parser) many times. It's an excellent solution for parsing HTML/DOM in PHP.
$html = new simple_html_dom();
// Load your HTML as a string
$html->load('........ HTML ..........');
// Collect the inner content of every <a> element
$a = $html->find('a');
$text = '';
foreach ($a as $link) {
    $text .= $link->innertext;
}
The variable $text now contains all the text inside the <a> tags. Hope it helps.