
Screen scrape to email with full url for images and css

Developer (https://www.devze.com), 2023-03-26 15:03. Source: web

I am screen scraping a webpage and sending it as an HTML email.

What is the easiest/best way to manipulate the HTML so that all image and CSS file references use full http addresses?

Our current method is similar to the following (typed by hand, not copied), and it is very error-prone:

string html = rawHtml.Replace("=\"", "=\"" + Request["SERVER_NAME"]);


Here is the function we currently use to screen scrape with a GET request:

// Requires: using System.IO; using System.Net;
public static string WebGet(string address)
{
    // Fetch the page with an HTTP GET and return the response body as a string.
    using (WebClient client = new WebClient())
    using (StreamReader reader = new StreamReader(client.OpenRead(address)))
    {
        return reader.ReadToEnd();
    }
}


It sounds like what you need is an HTML parser. Once you have parsed the HTML string, you can manipulate the DOM directly: find all img elements, check each src, and prepend Request["SERVER_NAME"] wherever the value is not already a full address. The same applies to the href of link elements for the CSS files.
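
As a rough sketch of the "check and prepend" step: instead of concatenating strings, the check can be left to System.Uri, which resolves a relative reference against a base address and leaves an already absolute one alone. The class and method names here (UrlHelper, ToAbsolute) are made up for illustration, and pageAddress is assumed to be the address that was passed to the scraper.

```csharp
using System;

public static class UrlHelper
{
    // Hypothetical helper: turns a possibly relative URL found in the scraped
    // page into an absolute one, using the page's own address as the base.
    public static string ToAbsolute(string pageAddress, string url)
    {
        Uri baseUri = new Uri(pageAddress);
        Uri resolved;

        // Uri.TryCreate(Uri, string, out Uri) resolves relative references
        // against baseUri; if it cannot, the original value is returned as-is.
        return Uri.TryCreate(baseUri, url, out resolved) ? resolved.AbsoluteUri : url;
    }
}
```

For example, UrlHelper.ToAbsolute("http://example.com/news/index.html", "images/logo.png") gives http://example.com/news/images/logo.png, while a value that is already a full http address comes back unchanged.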

I don't code in ASP, but I found this:

http://htmlagilitypack.codeplex.com/

And here is a useful article I found explaining how to use it:

https://web.archive.org/web/20211020001935/https://www.4guysfromrolla.com/articles/011211-1.aspx
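
To give an idea of how this could look with the Html Agility Pack, here is a minimal sketch. It assumes the HtmlAgilityPack package is referenced by the project; HtmlUrlRewriter, MakeUrlsAbsolute and Rewrite are hypothetical names, and pageAddress is assumed to be the address the page was scraped from. It only touches img src and link href, so other references (scripts, background images inside CSS) would need similar handling.

```csharp
using System;
using HtmlAgilityPack; // third-party library, must be added to the project

public static class HtmlUrlRewriter
{
    // Hypothetical helper: rewrites img src and link href values in the
    // scraped HTML to absolute URLs based on the page's address.
    public static string MakeUrlsAbsolute(string rawHtml, string pageAddress)
    {
        Uri baseUri = new Uri(pageAddress);
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(rawHtml);

        Rewrite(doc, "//img[@src]", "src", baseUri);
        Rewrite(doc, "//link[@href]", "href", baseUri);

        return doc.DocumentNode.OuterHtml;
    }

    private static void Rewrite(HtmlDocument doc, string xpath, string attribute, Uri baseUri)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode node in nodes)
        {
            string value = node.GetAttributeValue(attribute, "");
            Uri resolved;

            // Skip empty values; resolve everything else against the page address.
            if (value.Length > 0 && Uri.TryCreate(baseUri, value, out resolved))
                node.SetAttributeValue(attribute, resolved.AbsoluteUri);
        }
    }
}
```

With the WebGet function from the question, this would be called as string html = HtmlUrlRewriter.MakeUrlsAbsolute(WebGet(address), address); before the result is used as the email body.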
