
How to determine whether a web page has RSS or not in C#

开发者 (Developer) https://www.devze.com 2022-12-12 12:25 Source: web

I have a task to do.

I need to download a web page and to see if the page contains any RSS feeds.

I know how to download a web page to a string using the HTTP APIs in C#, but how can I determine whether the downloaded page string contains any RSS feeds?

Thanks

Jack


I expect you would have to load the page into a DOM (XmlDocument, XDocument, or HtmlDocument) and check for nodes like:

<link rel="alternate" type="application/atom+xml" ...

This should be (in XPath) something like "/html/head/link[@rel='alternate' and @type='application/atom+xml']" (RSS feeds use the type 'application/rss+xml' instead); then look at @title and @href.
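As a rough illustration of the DOM approach, here is a minimal sketch using XDocument from System.Xml.Linq. It assumes the page is well-formed XML (XHTML) and will throw on typical malformed HTML. The class and method names (FeedLinkFinder, FindFeeds) are made up for this example, and it matches on local element names rather than the exact XPath above so that an XHTML default namespace does not hide the link elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class FeedLinkFinder
{
    // MIME types conventionally used in feed autodiscovery links.
    private static readonly string[] FeedTypes =
    {
        "application/rss+xml",
        "application/atom+xml"
    };

    // Returns (title, href) pairs for every <link rel="alternate"> feed reference.
    // Throws System.Xml.XmlException if the markup is not well-formed XML.
    public static List<(string Title, string Href)> FindFeeds(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);

        return doc.Descendants()
            // Compare local names so a default XHTML namespace does not matter.
            .Where(e => e.Name.LocalName == "link")
            .Where(e => string.Equals((string)e.Attribute("rel"), "alternate",
                                      StringComparison.OrdinalIgnoreCase))
            .Where(e => FeedTypes.Contains(((string)e.Attribute("type") ?? "").Trim(),
                                           StringComparer.OrdinalIgnoreCase))
            .Select(e => ((string)e.Attribute("title") ?? "",
                          (string)e.Attribute("href") ?? ""))
            .ToList();
    }
}
```

An empty result means no feed link was advertised in the markup; it does not prove the site has no feed.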


Instead of loading the HTML into an XmlDocument (which may not be possible if the page isn't valid XHTML), try the HTML Agility Pack. It gives you an XmlDocument-like API but also works with malformed HTML.

But generally, you would look for that link tag in the page's head.
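A minimal sketch of the HTML Agility Pack approach might look like the following. It assumes the third-party HtmlAgilityPack NuGet package is referenced; FeedLinkScanner and FindFeedUrls are hypothetical names chosen for this example. The attribute checks are done in C# rather than in the XPath so that the comparison can be case-insensitive.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

// Hypothetical helper, not part of the HTML Agility Pack itself.
public static class FeedLinkScanner
{
    // Returns the href of every <link rel="alternate"> that advertises an RSS or Atom feed.
    public static List<string> FindFeedUrls(string html)
    {
        var feeds = new List<string>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerates malformed HTML

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//link[@href]");
        if (links == null) return feeds;

        foreach (HtmlNode link in links)
        {
            string rel = link.GetAttributeValue("rel", "");
            string type = link.GetAttributeValue("type", "").Trim();

            bool isFeedType =
                type.Equals("application/rss+xml", StringComparison.OrdinalIgnoreCase) ||
                type.Equals("application/atom+xml", StringComparison.OrdinalIgnoreCase);

            if (isFeedType && rel.Equals("alternate", StringComparison.OrdinalIgnoreCase))
                feeds.Add(link.GetAttributeValue("href", ""));
        }

        return feeds;
    }
}
```

The returned href values may be relative, so they would need to be resolved against the page URL (for example with new Uri(baseUri, href)) before being fetched.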


Use a regular expression to check the HTML for the link tag.
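One way the regular-expression idea could be sketched is shown below. It is only a heuristic: it first isolates each link tag and then tests its attributes separately, so attribute order does not matter, but it can still be fooled by commented-out markup or unusual quoting. FeedLinkRegex and HasFeedLink are hypothetical names for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating a regex-based check.
public static class FeedLinkRegex
{
    // Matches a whole <link ...> tag.
    private static readonly Regex LinkTag =
        new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches type="application/rss+xml" or type="application/atom+xml"
    // with double quotes, single quotes, or no quotes.
    private static readonly Regex FeedType =
        new Regex(@"\btype\s*=\s*[""']?application/(rss|atom)\+xml",
                  RegexOptions.IgnoreCase);

    // Matches rel="alternate" with the same quoting tolerance.
    private static readonly Regex RelAlternate =
        new Regex(@"\brel\s*=\s*[""']?alternate\b", RegexOptions.IgnoreCase);

    // True if any <link> tag has both rel=alternate and a feed MIME type.
    public static bool HasFeedLink(string html)
    {
        foreach (Match tag in LinkTag.Matches(html))
        {
            if (FeedType.IsMatch(tag.Value) && RelAlternate.IsMatch(tag.Value))
                return true;
        }
        return false;
    }
}
```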

A more exhaustive approach would be to spider each href link and examine the response's Content-Type and whether the body contains rss or atom tags.
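The per-link check in the spidering approach could be sketched roughly as follows. It assumes an HttpClient supplied by the caller; FeedProbe, LooksLikeFeed, and IsFeedUrlAsync are hypothetical names. Because many feeds are served with a generic type such as text/xml, the sketch also inspects the root element: rss for RSS 2.0, feed for Atom, and RDF for RSS 1.0.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml;

// Hypothetical helper illustrating the "fetch and inspect" check.
public static class FeedProbe
{
    // Decides from the media type and body whether a response looks like a feed.
    public static bool LooksLikeFeed(string mediaType, string body)
    {
        if (string.Equals(mediaType, "application/rss+xml", StringComparison.OrdinalIgnoreCase) ||
            string.Equals(mediaType, "application/atom+xml", StringComparison.OrdinalIgnoreCase))
            return true;

        try
        {
            // Ignore any DOCTYPE so no external DTD is processed.
            var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
            using (var reader = XmlReader.Create(new StringReader(body ?? ""), settings))
            {
                reader.MoveToContent(); // advance to the root element
                string root = reader.LocalName;
                return root == "rss" || root == "feed" || root == "RDF";
            }
        }
        catch (XmlException)
        {
            // Not well-formed XML (for example ordinary HTML), so not a feed.
            return false;
        }
    }

    // Downloads the URL and applies LooksLikeFeed to the response.
    public static async Task<bool> IsFeedUrlAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            if (!response.IsSuccessStatusCode) return false;

            string mediaType = response.Content.Headers.ContentType?.MediaType;
            string body = await response.Content.ReadAsStringAsync();
            return LooksLikeFeed(mediaType, body);
        }
    }
}
```

Fetching every link on a page can be slow and puts load on the target site, so in practice this would probably be limited to links that already look feed-like.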
