
How to extract blocks of text from an HTML page?

Source: https://www.devze.com, 2023-02-15 20:11 (collected from the web)
I would like to extract blocks of text with more than 100 words from a large HTML page using PHP. Whether the text is contained in <p>...</p> doesn't matter. I only care about the number of words that make up a coherent text block, so text outside of HTML paragraphs should also be taken into consideration.

How can this be done?


I use phpQuery. Are you familiar with jQuery? The two share the same syntax. You might be concerned about installing a new library, but trust me, this library is well worth the extra overhead.

phpQuery

You can then access it like this:

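require_once 'phpQuery/phpQuery.php';     // path is an assumption; adjust to your install
$doc = phpQuery::newDocumentHTML($html);  // $html is assumed to hold the page markup
foreach ($doc->find('p') as $element) {
   $text = pq($element)->text();          // text content of the paragraph, tags stripped
   if (str_word_count($text) > 100) {     // keep only blocks with more than 100 words
      echo $text, "\n";
   }
}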


Use the PHP Simple HTML DOM Parser.

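// Assumes simple_html_dom.php is included and $html came from file_get_html() or str_get_html().
foreach ($html->find('p') as $element) {
   // plaintext holds the element's text without tags; src only applies to elements such as <img>
   echo str_word_count($element->plaintext), "\n";
}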
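
Both answers above look only at <p> elements, while the question also asks about text outside paragraphs. A minimal sketch of one way to cover that with PHP's built-in DOM extension, with no third-party library: every text node is attributed to its nearest non-inline ancestor, and the collected text per ancestor is then filtered by word count. The function name extractTextBlocks, the $minWords parameter and the list of inline tags are assumptions made for illustration; they do not come from the thread.

```php
<?php
// Sketch only: extractTextBlocks() and $minWords are illustrative names.
function extractTextBlocks(string $html, int $minWords = 100): array
{
    // Tags treated as inline; their text counts toward the enclosing block.
    // The list is an assumption and can be extended.
    $inline = ['a', 'abbr', 'b', 'cite', 'code', 'em', 'font', 'i', 'mark',
               'q', 'small', 'span', 'strong', 'sub', 'sup', 'u'];

    // Real-world HTML is often malformed; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // All non-empty text nodes that are not inside <script> or <style>.
    $xpath = new DOMXPath($doc);
    $query = '//text()[normalize-space()]'
           . '[not(ancestor::script) and not(ancestor::style)]';

    $blocks = [];
    foreach ($xpath->query($query) as $textNode) {
        // Climb past inline wrappers to the nearest block-like container.
        $container = $textNode->parentNode;
        while ($container->parentNode instanceof DOMElement
               && in_array($container->nodeName, $inline, true)) {
            $container = $container->parentNode;
        }
        // Group text by the container's position in the tree.
        // A space is added between nodes so text split by <br> does not fuse.
        $key = $container->getNodePath();
        $blocks[$key] = ($blocks[$key] ?? '') . ' ' . $textNode->nodeValue;
    }

    $result = [];
    foreach ($blocks as $text) {
        $text = trim(preg_replace('/\s+/', ' ', $text));
        if (str_word_count($text) > $minWords) {
            $result[] = $text;
        }
    }
    return $result;
}
```

It could be called as extractTextBlocks(file_get_contents('page.html')), where page.html is a placeholder file name. One simplification to be aware of: loose text in a container that is interrupted by a nested block element is still merged into a single block, so the grouping is only a heuristic for what counts as a coherent block.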