Some of my website's URLs are duplicated. I need to know which of them are indexed by Google. I need a function in C# that tells me which of my URLs are indexed.
In Google's search you can type: site:yourdomain
and it will show you the pages from that domain that are in the index. You can also use the Google Custom Search API to do this programmatically: http://code.google.com/apis/customsearch/v1/overview.html
It returns JSON results that you can convert into C# objects using the DataContractJsonSerializer (the plain DataContractSerializer reads XML, not JSON).
You'll need to sign up for an API key if you go this route.
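Here is a rough sketch of how that could look. It assumes the JSON endpoint at https://www.googleapis.com/customsearch/v1 with the key, cx (search engine ID) and q parameters, and that each result in the "items" array carries a "link" field. The class and method names (SearchItem, SearchResponse, IndexChecker) are made up for illustration, and the "site:" query and URL comparison are just one way to decide whether a page counts as indexed. Results from a custom search engine may not match exactly what google.com shows, so treat the answer as a hint.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Threading.Tasks;

// Hypothetical types: only the fields needed here are mapped,
// other members in the JSON are ignored by the serializer.
[DataContract]
public class SearchItem
{
    [DataMember(Name = "link")]
    public string Link { get; set; }
}

[DataContract]
public class SearchResponse
{
    // Assumption: "items" is missing when nothing was found, so this can be null.
    [DataMember(Name = "items")]
    public SearchItem[] Items { get; set; }
}

public static class IndexChecker
{
    // Convert the JSON response text into C# objects.
    public static SearchResponse Parse(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(SearchResponse));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            return (SearchResponse)serializer.ReadObject(stream);
        }
    }

    // True if any result links to the given URL.
    // The comparison is deliberately loose: it ignores case and a trailing slash.
    public static bool ContainsUrl(SearchResponse response, string url)
    {
        if (response == null || response.Items == null) return false;
        string wanted = url.TrimEnd('/');
        return response.Items.Any(i => i.Link != null &&
            string.Equals(i.Link.TrimEnd('/'), wanted, StringComparison.OrdinalIgnoreCase));
    }

    // Ask the API for "site:host/path" and check whether the URL comes back.
    // apiKey and engineId are your own values from Google.
    public static async Task<bool> IsIndexedAsync(
        HttpClient http, string apiKey, string engineId, string url)
    {
        var uri = new Uri(url);
        string query = "site:" + uri.Host + uri.AbsolutePath;
        string requestUri = "https://www.googleapis.com/customsearch/v1"
            + "?key=" + Uri.EscapeDataString(apiKey)
            + "&cx=" + Uri.EscapeDataString(engineId)
            + "&q=" + Uri.EscapeDataString(query);
        string json = await http.GetStringAsync(requestUri);
        return ContainsUrl(Parse(json), url);
    }
}
```

You would call IsIndexedAsync once per URL, for example in a loop over your duplicated URLs, reusing a single HttpClient. Keep in mind the API has a daily query quota, so checking a large site this way can get slow or cost money.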
Edit: As for Html Agility Pack, I have a blog post that shows how you can extract the links on a page: Finding links on a Web page