
Inside an innermost tag: how can I get all the formatting that applies to its text?

Developer (https://www.devze.com), 2023-02-14 10:51, source: web

Related topics: css-selectors html-agility-pack

My requirement is to extract the news content from roughly 250 different news websites. The news content is somewhere in the body, and based on the Google snippets/meta info I can locate the first paragraph of the news content wherever it is. To get the remaining paragraphs, I walk up the HTML tree until I reach a div or a table body, but that also picks up unwanted text that is not related to the news item. What I have noticed is that on most of these pages, all the relevant news paragraphs are styled or formatted the same way. So is there a way to capture all the styling applied to the first paragraph, so that I can use that formatting information to filter out the unwanted text?
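One way to act on the "same formatting" observation is to give every block element a formatting signature and keep only the blocks whose signature matches the first paragraph's. The sketch below is a hedged illustration, not an HTML Agility Pack solution: it uses Python's standard-library `html.parser`, and the names `BlockCollector`, `same_format_text`, `INLINE_TAGS`, and the sample page are all hypothetical. It assumes the signature is the chain of `(tag, class, style)` attributes of a block and its block ancestors, that inline tags such as `<b>` or `<a>` do not change a paragraph's formatting, and that the markup closes its block tags. Formatting that comes only from external CSS rules is not visible in the markup, so this only compares tags and attributes.

```python
from html.parser import HTMLParser

# Inline tags are "transparent": text inside <b>, <a>, ... is credited to the
# enclosing block, so emphasis inside a paragraph does not change its signature.
INLINE_TAGS = {"a", "b", "i", "em", "strong", "span", "u", "small"}
# Void elements never get an end tag, so they are never pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "wbr", "source"}


class BlockCollector(HTMLParser):
    """Collect each block element's text together with a formatting
    signature: the chain of (tag, class, style) of its block ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []        # (tag, class, style) of currently open blocks
        self.open_blocks = []  # index into self.blocks, parallel to self.stack
        self.blocks = []       # [signature, list of text fragments]

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS or tag in VOID_TAGS:
            return
        a = dict(attrs)
        self.stack.append((tag, a.get("class") or "", a.get("style") or ""))
        self.blocks.append([tuple(self.stack), []])
        self.open_blocks.append(len(self.blocks) - 1)

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS or tag in VOID_TAGS:
            return
        # Pop back to the nearest matching open tag (tolerates unclosed children).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                del self.open_blocks[i:]
                break

    def handle_data(self, data):
        # Text belongs to the innermost open block element.
        if self.open_blocks:
            self.blocks[self.open_blocks[-1]][1].append(data)


def same_format_text(html, snippet):
    """Return the text of every block whose signature matches the block
    that contains `snippet` (e.g. the known first paragraph)."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    # Join fragments and collapse runs of whitespace.
    blocks = [(sig, " ".join("".join(parts).split())) for sig, parts in parser.blocks]
    target = next((sig for sig, text in blocks if snippet in text), None)
    if target is None:
        return []
    return [text for sig, text in blocks if sig == target and text]


page = """
<div class="nav"><p class="menu">Home</p><p class="menu">World</p></div>
<div class="story">
  <p class="body-text">Heavy rain flooded the <b>city centre</b> on Monday.</p>
  <p class="body-text">Officials said roads would reopen on Wednesday.</p>
  <p class="caption">Photo: staff</p>
  <p class="body-text">More rain is forecast.</p>
</div>
<div class="ads"><p class="body-text">Buy now!</p></div>
"""

for paragraph in same_format_text(page, "Heavy rain flooded"):
    print(paragraph)
```

On the sample page this prints the three `body-text` paragraphs inside `div.story`. It skips the caption and the menu, and it also skips the identically classed paragraph in `div.ads`, because ancestor classes are part of the signature. The same idea should carry over to HTML Agility Pack by building the signature from a node's ancestors instead of a parser stack.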

I am using HTML Agility Pack and XPath for this. Thank you.
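Since the question already uses XPath, a second hedged sketch can follow the climbing strategy from the question literally (first paragraph, then up to the nearest div or tbody) and apply the formatting filter as an XPath predicate on the first paragraph's class. It uses Python's `xml.etree.ElementTree`, which supports only a limited XPath subset and parses only well-formed XHTML; real-world pages need an HTML-tolerant parser such as HTML Agility Pack. The function `paragraphs_like_first`, the `CONTAINER_TAGS` set, and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET

# The container tags the question climbs to from the first paragraph.
CONTAINER_TAGS = {"div", "tbody"}


def paragraphs_like_first(xhtml, snippet):
    """Find the <p> containing `snippet`, climb to its nearest div/tbody,
    then select paragraphs in that container that share its class."""
    root = ET.fromstring(xhtml)
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}

    first = next((p for p in root.iter("p") if snippet in "".join(p.itertext())), None)
    if first is None:
        return []

    # Walk up until a div or tbody is reached (fall back to the root).
    container = parent.get(first)
    while container is not None and container.tag not in CONTAINER_TAGS:
        container = parent.get(container)
    if container is None:
        container = root

    # ElementTree understands [@attr='value'] predicates; assumes the class
    # value contains no single quote. Without a class, fall back to every <p>.
    cls = first.get("class")
    path = f".//p[@class='{cls}']" if cls else ".//p"
    return [" ".join("".join(p.itertext()).split()) for p in container.iterfind(path)]


page = """<html><body>
<div class="story">
  <p class="lead">Editor's pick</p>
  <table><tbody>
    <tr><td><p class="body-text">Heavy rain flooded the <b>city centre</b>.</p></td></tr>
    <tr><td><p class="caption">Photo: staff</p></td></tr>
    <tr><td><p class="body-text">Roads reopen on Wednesday.</p></td></tr>
  </tbody></table>
</div>
<div class="ads"><p class="body-text">Buy now!</p></div>
</body></html>"""

print(paragraphs_like_first(page, "Heavy rain"))
```

Here the climb stops at the `tbody`, so only the two `body-text` paragraphs in the table are returned; the caption, the lead outside the table, and the ad are dropped. In a full XPath 1.0 engine, the climb itself could be written with the ancestor axis, for example `ancestor::*[self::div or self::tbody][1]`, instead of the explicit parent map.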

You could look at my answer to the following question on SO: Advanced HTML Agility Pack usage. It seems to be somewhat related to yours.