I am screen scraping a webpage and sending it as an HTML email.
What is the easiest/best way to manipulate the HTML so that all images and CSS files use full http addresses?
My current method is similar to the following (typed by hand), and it is very error-prone:
string html = rawHtml.Replace("=\"", "=\"" + Request["SERVER_NAME"]);
Here is the current function we use to screen scrape the page with a GET request:
using System.IO;
using System.Net;

// Downloads the page at "address" with an HTTP GET and returns the body as a string
public static string WebGet(string address)
{
    using (WebClient client = new WebClient())
    using (StreamReader reader = new StreamReader(client.OpenRead(address)))
    {
        return reader.ReadToEnd();
    }
}
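Whatever does the rewriting, the key step is resolving each src/href value against the URL the page was fetched from (the address passed to WebGet). Prefixing with Request["SERVER_NAME"] only adds a bare host name with no scheme, and a blanket Replace on `="` also touches unrelated attributes (class, alt, ...) and URLs that are already absolute. A minimal sketch using System.Uri, which applies the standard relative-reference resolution rules; `UrlHelper` and `ToAbsolute` are hypothetical names, not part of any library:

```csharp
using System;

public static class UrlHelper
{
    // Hypothetical helper: turns one src/href value from the scraped page into an
    // absolute URL. Relative values ("logo.png", "../css/site.css", "/img/a.png")
    // are resolved against pageUrl; values that are not valid relative references
    // (for example absolute "http://..." URLs) are returned unchanged.
    public static string ToAbsolute(string pageUrl, string value)
    {
        Uri relative;
        if (!Uri.TryCreate(value, UriKind.Relative, out relative))
            return value;

        return new Uri(new Uri(pageUrl), relative).AbsoluteUri;
    }
}
```

For a page fetched from http://example.com/shop/page.aspx, "/images/logo.png" resolves to http://example.com/images/logo.png and "css/site.css" to http://example.com/shop/css/site.css. Parsing the value with UriKind.Relative (rather than RelativeOrAbsolute) is deliberate: on non-Windows .NET, a string such as "/images/logo.png" can otherwise be interpreted as an absolute file path.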
It sounds like what you need is an HTML parser. Once you have parsed the HTML string, you can manipulate the DOM directly: find all img elements (and the link elements that pull in CSS files), check each src or href, and prefix it with the site's address (http:// plus Request["SERVER_NAME"]) when it is relative.
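A minimal sketch of that parser-based approach, assuming the HtmlAgilityPack NuGet package is referenced. `HtmlUrlFixer`, `MakeUrlsAbsolute` and `RewriteAttribute` are hypothetical names; the sketch only rewrites img/@src and link/@href (the images and CSS files from the question) and resolves each relative value against the scraped page's URL with System.Uri instead of string concatenation:

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class HtmlUrlFixer
{
    // Hypothetical helper: rewrites <img src> and <link href> to absolute URLs,
    // resolving each relative value against the scraped page's own URL.
    public static string MakeUrlsAbsolute(string html, string pageUrl)
    {
        Uri baseUri = new Uri(pageUrl);
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        RewriteAttribute(doc, "//img[@src]", "src", baseUri);    // images
        RewriteAttribute(doc, "//link[@href]", "href", baseUri); // <link> elements such as stylesheets

        return doc.DocumentNode.OuterHtml;
    }

    private static void RewriteAttribute(HtmlDocument doc, string xpath, string attribute, Uri baseUri)
    {
        // By default SelectNodes returns null when nothing matches
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
            return;

        foreach (HtmlNode node in nodes)
        {
            string value = node.GetAttributeValue(attribute, "");
            Uri relative;
            // Only relative references are rewritten; absolute URLs are left alone
            if (Uri.TryCreate(value, UriKind.Relative, out relative))
                node.SetAttributeValue(attribute, new Uri(baseUri, relative).AbsoluteUri);
        }
    }
}
```

In the question's setup this would be called on the result of WebGet, passing the same address as pageUrl. Elements other than img and link (for example script src, or url(...) inside inline CSS) are not touched by this sketch.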
I don't code in ASP.NET, but I found the Html Agility Pack:
http://htmlagilitypack.codeplex.com/
And here is a useful article I found explaining how to use it:
https://web.archive.org/web/20211020001935/https://www.4guysfromrolla.com/articles/011211-1.aspx