Parse page HTML output

Developer https://www.devze.com 2023-01-17 14:39 Source: Internet
I'd like to know one (or more) ways to parse the HTML output of a page. I want to detect certain patterns in the HTML that will be sent to the client and log some information when they are present.


Everything you need is in the Page.Render method. Override it and do what you want in there.

// Requires: using System.IO; using System.Text; using System.Web.UI;
protected override void Render(HtmlTextWriter writer)
{
    // Render the page into an in-memory buffer instead of the response.
    // The HtmlTextWriter writes through the StringWriter into the StringBuilder.
    StringBuilder stringBuilder = new StringBuilder();
    using (StringWriter stringWriter = new StringWriter(stringBuilder))
    using (HtmlTextWriter htmlTextWriter = new HtmlTextWriter(stringWriter))
    {
        base.Render(htmlTextWriter); // render the page into the buffer
    }

    string theHtml = stringBuilder.ToString(); // the complete page HTML as a string

    //---------------------------------------------
    // inspect theHtml here: detect patterns, log, etc.
    //---------------------------------------------

    writer.Write(theHtml); // send the HTML to the client through the original writer
}
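
As a rough sketch of the "do stuff on theHtml" step, the captured string could be handed to a small helper that counts matches for a few regular expressions. The class name HtmlPatternScanner and both patterns (inline script tags, plain http:// links) are hypothetical and chosen only for illustration; they are not part of the original answer, and regular expressions are only suitable for simple textual patterns.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlPatternScanner
{
    // Hypothetical patterns, for illustration only; replace with whatever you need to detect.
    private static readonly Dictionary<string, Regex> Patterns = new Dictionary<string, Regex>
    {
        // <script> opening tags that carry no src attribute
        { "inline-script", new Regex(@"<script(?![^>]*\bsrc\s*=)[^>]*>", RegexOptions.IgnoreCase | RegexOptions.Compiled) },
        // href/src attributes that point at a plain http:// URL
        { "insecure-link", new Regex(@"\b(?:href|src)\s*=\s*[""']http://", RegexOptions.IgnoreCase | RegexOptions.Compiled) },
    };

    // Returns one entry per pattern that occurs at least once, mapped to its match count.
    public static Dictionary<string, int> Scan(string html)
    {
        var hits = new Dictionary<string, int>();
        foreach (var pair in Patterns)
        {
            int count = pair.Value.Matches(html ?? string.Empty).Count;
            if (count > 0)
            {
                hits[pair.Key] = count;
            }
        }
        return hits;
    }
}
```

Inside Render, after theHtml has been captured, the result could be logged with something like System.Diagnostics.Trace.TraceInformation("{0}: {1} match(es)", hit.Key, hit.Value) for each hit in HtmlPatternScanner.Scan(theHtml). Any logging framework would work equally well.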


It depends on what exactly you mean by "parse", but something like the HTML Agility Pack can build an XML-like structure from an HTML document, essentially a proper HTML DOM data structure. You can then convert it straight to XML, query it with LINQ, and so on.
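
A minimal sketch of the DOM approach, assuming the HtmlAgilityPack NuGet package is referenced. The helper name HtmlDomInspector.FindLinks and the idea of filtering links by prefix are hypothetical, picked only to show a LINQ query over the parsed document.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class HtmlDomInspector
{
    // Returns the href value of every <a> element whose target starts with the given prefix.
    public static string[] FindLinks(string html, string prefix)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser: also accepts malformed HTML

        return doc.DocumentNode
            .Descendants("a")
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .Where(href => href.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .ToArray();
    }
}
```

Combined with the Render override, HtmlDomInspector.FindLinks(theHtml, "http://") would list the outgoing plain-HTTP links on the page. Working on the parsed tree is generally more robust than matching raw text when the pattern depends on element structure.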
