Inside an innermost tag: how to get all the formatting operations in effect on the text?

Developer https://www.devze.com 2023-02-14 10:51 Source: Internet
My requirement is to get the news content from different news websites, approximately 250 of them. The news content is somewhere in the body, and I can get to the first paragraph of the news content based on the Google snippets/meta info. To get the other paragraphs, I am trying to go up the HTML tree until I find a division (div) or a table body, but because of that I am getting some undesired text that is not related to the news item. What I found is that the relevant news text in most of the web pages is styled or formatted in a similar way. So is there a way I can capture all the styling applied to the first paragraph, so that I can then filter out unwanted text using that formatting information?

I am using HTML Agility Pack and XPath for my requirement. Thank you.


You could look at my answer to the following question on SO: Advanced HTML Agility Pack usage. It seems to be somewhat related to yours.
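
To make the idea in the question concrete, here is a minimal sketch of filtering by a "formatting signature". It is written in Python with the standard library's `xml.etree.ElementTree` rather than HTML Agility Pack, purely for illustration. The sample markup and the helper names `signature` and `similar_paragraphs` are hypothetical, and the sketch assumes well-formed markup and inline `class`/`style` attributes only; styling that comes from external stylesheets is not captured.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real pages are usually messier).
SAMPLE = """<html><body>
<div id="main">
  <p class="story" style="font-size:14px">First paragraph of the news item.</p>
  <p class="story" style="font-size:14px">Second paragraph of the news item.</p>
  <p class="ad">Buy something unrelated.</p>
  <div class="related">
    <p class="story" style="font-size:14px">Teaser for another story.</p>
  </div>
</div>
</body></html>"""


def signature(node, parents):
    """Return the chain of (tag, class, style) from the root down to node."""
    chain = []
    while node is not None:
        chain.append((node.tag, node.get("class"), node.get("style")))
        node = parents.get(node)  # None once we pass the root
    return tuple(reversed(chain))


def similar_paragraphs(root, first_text):
    """Collect paragraphs formatted like the one starting with first_text."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}

    # Locate the known first paragraph (e.g. matched from a search snippet).
    first = next(p for p in root.iter("p")
                 if (p.text or "").startswith(first_text))

    # Walk up the tree until a div or tbody is found, as in the question.
    container = parents.get(first)
    while container is not None and container.tag not in ("div", "tbody"):
        container = parents.get(container)
    if container is None:
        container = root

    # Keep only paragraphs whose ancestor/attribute chain matches the first one.
    wanted = signature(first, parents)
    return ["".join(p.itertext()).strip()
            for p in container.iter("p")
            if signature(p, parents) == wanted]


if __name__ == "__main__":
    page = ET.fromstring(SAMPLE)
    for text in similar_paragraphs(page, "First paragraph"):
        print(text)
```

With this sample, the advertisement is dropped because its `class` differs, and the teaser is dropped because it sits inside an extra `div`, so its ancestor chain differs. In HTML Agility Pack the same approach would presumably walk `ParentNode` and compare attributes in a similar way, but that mapping is an assumption and not something tested here.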
