Read HTML from C# WinForms

Developer https://www.devze.com 2023-02-15 04:17 Source: Internet
I need to read the title of a web site using C# WinForms. What is the best way to do it? I searched on Google but didn't find anything.

Thanks in advance.


You want to use the WebClient class, found in the System.Net namespace.

using System.Net;

With WebClient you can download a whole web page as a string and then do whatever you want with that string. :)

Example:

WebClient client = new WebClient();
string content = client.DownloadString("http://www.google.com");

Then just parse the string any way you want. :) In this example you might want to find the title element and extract the title like this:

string title = content.Substring(content.IndexOf("<title>"), content.IndexOf("</title>") - content.IndexOf("<title>")).Replace("<title>", "").Trim();
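
The one-liner above throws if the page has no <title> tag and misses upper-case tags. A slightly more defensive sketch of the same IndexOf/Substring idea follows. The TitleExtractor class and ExtractTitle method are hypothetical names used only for illustration, and it still assumes a plain <title> tag without attributes:

```csharp
using System;

static class TitleExtractor
{
    // Returns the text between <title> and </title>, or null if either tag is missing.
    // Assumption: the opening tag carries no attributes.
    public static string ExtractTitle(string html)
    {
        const string openTag = "<title>";
        const string closeTag = "</title>";

        // Case-insensitive search so <TITLE> is found as well.
        int start = html.IndexOf(openTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += openTag.Length;

        // Look for the closing tag only after the opening one.
        int end = html.IndexOf(closeTag, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return html.Substring(start, end - start).Trim();
    }
}
```

You would call it as TitleExtractor.ExtractTitle(content) and check the result for null before using it.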

Hope it helps. :)


If you have to parse the whole web page, you can try the HTML Agility Pack. If all you need is the title, a regular expression will do.

Since the title is almost always in the <title> tag, you can extract it straight away.
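
As a rough sketch of the regular expression approach, something like the following could work. The class name, method name and the pattern itself are my own assumptions, not part of any library. A regex is fine for pulling out one tag but is not a general HTML parser:

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexTitleExtractor
{
    // Matches <title ...>text</title>, ignoring case and allowing line breaks in the text.
    private static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>(.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the trimmed title text, or null if no title element is found.
    public static string ExtractTitle(string html)
    {
        Match match = TitlePattern.Match(html);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }
}
```

The lazy (.*?) group stops at the first closing tag, and Singleline lets the title span more than one line.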

To download the HTML, you can use a WebClient or the HttpWebRequest/HttpWebResponse objects.
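
For the HttpWebRequest/HttpWebResponse route, a minimal sketch might look like this. PageDownloader, ReadAllText and DownloadHtml are hypothetical names, and the code assumes the page is UTF-8 encoded, which is not true of every site:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class PageDownloader
{
    // Reads a stream to the end as text using the given encoding.
    public static string ReadAllText(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Downloads the page at the given URL and returns its HTML (needs network access).
    public static string DownloadHtml(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            // Assumption: the response body is UTF-8.
            return ReadAllText(stream, Encoding.UTF8);
        }
    }
}
```

The returned string can then be passed to whichever title extraction you prefer.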


Personally I like and use SgmlReader to parse HTML:

using System;
using System.IO;
using System.Net;
using System.Xml;
using Sgml;

class Program
{
    static void Main()
    {
        var url = "http://www.stackoverflow.com";
        using (var reader = new SgmlReader())
        using (var client = new WebClient())
        using (var streamReader = new StreamReader(client.OpenRead(url)))
        {
            reader.DocType = "HTML";
            reader.WhitespaceHandling = WhitespaceHandling.All;
            reader.CaseFolding = Sgml.CaseFolding.ToLower;
            reader.InputStream = streamReader;

            var doc = new XmlDocument();
            doc.PreserveWhitespace = true;
            doc.XmlResolver = null;
            doc.Load(reader);
            var title = doc.SelectSingleNode("//title");
            if (title != null)
            {
                Console.WriteLine(title.InnerText);
            }
        }
    }
}