Read HTML from C# WinForms

Developer https://www.devze.com 2023-02-15 04:17 Source: Internet

I need to read the title of a web site using C# WinForms. What is the best way to do it? I searched on Google but didn't find anything.

Thanks in advance.

You want to use the WebClient class, found in the System.Net namespace.

using System.Net;

With WebClient you can download a whole web page as a string and then do whatever you want with that string. :)

Example:

WebClient client = new WebClient();
string content = client.DownloadString("http://www.google.com");

Then just parse the string any way you want. :) In this example you might want to find the title element and extract the title like this:

string title = content.Substring(content.IndexOf("<title>"), content.IndexOf("</title>") - content.IndexOf("<title>")).Replace("<title>", "").Trim();
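The one-liner above assumes the page contains a lowercase <title> tag with no attributes; if it does not, IndexOf returns -1 and Substring throws. A slightly more defensive sketch of the same IndexOf/Substring idea might look like this. The TitleParser class and GetTitle method are hypothetical names used only for illustration, and the sample HTML strings are made up:

```csharp
using System;

static class TitleParser
{
    // Hypothetical helper (not part of the original answer).
    // Returns the text between <title> and </title>, or null if either tag is missing.
    public static string GetTitle(string content)
    {
        const string openTag = "<title>";
        const string closeTag = "</title>";

        // Case-insensitive search, since pages may use <TITLE>.
        int start = content.IndexOf(openTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += openTag.Length;

        // Look for the closing tag only after the opening tag.
        int end = content.IndexOf(closeTag, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return content.Substring(start, end - start).Trim();
    }

    static void Main()
    {
        // Sample input; in practice this would be the string from DownloadString.
        Console.WriteLine(GetTitle("<html><head><title> Example Domain </title></head></html>")); // prints "Example Domain"
        Console.WriteLine(GetTitle("<html><body>no title here</body></html>") == null);            // prints "True"
    }
}
```

This still will not match a title tag that carries attributes, so treat it as a quick check rather than a real HTML parser.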

Hope it helps. :)

If you have to parse the whole web page, you can try HTML Agility Pack. If all you need is the title, a regular expression will do.

Since the title is almost always in the <title> tag, you can extract it directly.

To download the HTML, you can use WebClient or the HttpWebRequest/HttpWebResponse objects.
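As a rough sketch of the regular-expression route, something like the following could work once the HTML is available as a string (for example from WebClient.DownloadString). The TitleExtractor class, the ExtractTitle method, and the sample HTML are assumptions made for illustration, and the pattern only handles the common case of a single, well-formed title element:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class TitleExtractor
{
    // Hypothetical helper: returns the decoded, trimmed title text, or null if no title is found.
    public static string ExtractTitle(string html)
    {
        // <title ...> may carry attributes and any casing; Singleline lets '.' span line breaks.
        Match match = Regex.Match(
            html,
            @"<title\b[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        if (!match.Success) return null;

        // Decode entities such as &amp; before trimming surrounding whitespace.
        return WebUtility.HtmlDecode(match.Groups[1].Value).Trim();
    }

    static void Main()
    {
        // Made-up sample page; replace with the downloaded HTML string.
        string html = "<html><head><TITLE lang=\"en\"> Tom &amp; Jerry </TITLE></head><body></body></html>";
        Console.WriteLine(ExtractTitle(html)); // prints "Tom & Jerry"
    }
}
```

In a WinForms application the returned string could then be assigned to a label or text box; for anything beyond the title, a real parser is the safer choice.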

Personally I like and use SgmlReader to parse HTML:

using System;
using System.IO;
using System.Net;
using System.Xml;
using Sgml;

class Program
{
    static void Main()
    {
        var url = "http://www.stackoverflow.com";
        using (var reader = new SgmlReader())
        using (var client = new WebClient())
        using (var streamReader = new StreamReader(client.OpenRead(url)))
        {
            reader.DocType = "HTML";
            reader.WhitespaceHandling = WhitespaceHandling.All;
            reader.CaseFolding = Sgml.CaseFolding.ToLower;
            reader.InputStream = streamReader;

            var doc = new XmlDocument();
            doc.PreserveWhitespace = true;
            doc.XmlResolver = null;
            doc.Load(reader);
            var title = doc.SelectSingleNode("//title");
            if (title != null)
            {
                Console.WriteLine(title.InnerText);
            }
        }
    }
}