HtmlAgilityPack - how to grab <DIV> data in a large web page

Developer https://www.devze.com 2023-03-16 20:08 Source: Internet
I am trying to extract data from a web page. The data sits in <div> elements of one particular class, <div class="personal_info">. Each page contains 10 to 15 such <div>s, all with the same class "personal_info" (as shown in the HTML below), and I want to extract all of them from every page.

<div class="personal_info"><span class="bold">Rama Anand</span><br><br> Mobile: 9916184586<br>rama_asset@hotmail.com<br> Bangalore</div>

To do this I started using HtmlAgilityPack, as someone on Stack Overflow suggested, but I am stuck right at the beginning because I don't know the library well. My C# code looks like this:

HtmlAgilityPack.HtmlDocument docHtml = new HtmlAgilityPack.HtmlDocument();
HtmlAgilityPack.HtmlWeb docHFile = new HtmlAgilityPack.HtmlWeb();

// HtmlWeb.Load downloads the page and returns a parsed HtmlDocument
docHtml = docHFile.Load("http://127.0.0.1/2.html");

How do I continue from here so that the data from every <div> whose class is "personal_info" can be extracted? A suggestion with an example would be appreciated.


I can't check this right now, but isn't it:

var infos = docHtml.DocumentNode.SelectNodes("//div[@class='personal_info']");
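
To make the XPath approach concrete, here is a minimal sketch that parses a string instead of a live URL and collects the text of each matching <div>. The class name PersonalInfoXPath, the helper Extract, and the sample markup are made up for illustration. One thing worth guarding against: SelectNodes returns null, not an empty collection, when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class PersonalInfoXPath
{
    // Hypothetical helper: returns the trimmed text of every
    // <div class="personal_info"> found in the given HTML string.
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();

        // SelectNodes returns null when no node matches the XPath.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='personal_info']");
        if (nodes == null)
            return results;

        foreach (var div in nodes)
            results.Add(div.InnerText.Trim());

        return results;
    }

    public static void Main()
    {
        // Made-up sample markup standing in for the real page.
        var html =
            "<div class=\"personal_info\"><span class=\"bold\">Example Name</span><br> Mobile: 0000000000</div>" +
            "<div class=\"other\">ignored</div>" +
            "<div class=\"personal_info\">Second entry</div>";

        foreach (var text in Extract(html))
            Console.WriteLine(text);
    }
}
```

Note that @class='personal_info' compares the whole attribute value, so a <div> carrying additional classes would not match; contains(@class, 'personal_info') is a looser alternative if that matters for your pages.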


To load a URL you can do something like:

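// Requires: using System.Net; using System.Text;
var document = new HtmlAgilityPack.HtmlDocument();
var url = "http://www.google.com";
var request = (HttpWebRequest)WebRequest.Create(url);
// Dispose the response and its stream once the document has been parsed
using (var response = request.GetResponse())
using (var responseStream = response.GetResponseStream())
{
    document.Load(responseStream, Encoding.UTF8);
}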

Also note that there is a fork that lets you use jQuery selectors with HtmlAgilityPack.

IEnumerable<HtmlNode> myList = document.QuerySelectorAll(".personal_info");

http://yosi-havia.blogspot.com/2010/10/using-jquery-selectors-on-server-sidec.html


What happened to Where?

node.DescendantNodes().Where(node_it => node_it.Name=="div");

If you want to start from the top (root) node, use page.DocumentNode as "node".
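
The Where call above filters on the tag name only. A sketch of how the same LINQ style could also filter on the class attribute is shown below; the class name PersonalInfoLinq, the helper FindDivs, and the sample markup are assumptions for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class PersonalInfoLinq
{
    // Hypothetical helper: all <div> descendants of root whose class
    // attribute is exactly "personal_info".
    public static List<HtmlNode> FindDivs(HtmlNode root)
    {
        return root
            .Descendants("div")
            // GetAttributeValue returns the supplied default ("") when the attribute is missing.
            .Where(d => d.GetAttributeValue("class", "") == "personal_info")
            .ToList();
    }

    public static void Main()
    {
        var page = new HtmlDocument();
        // Made-up sample markup standing in for the real page.
        page.LoadHtml(
            "<div class=\"personal_info\">First</div>" +
            "<div class=\"other\">ignored</div>" +
            "<div class=\"personal_info\">Second</div>");

        foreach (var div in FindDivs(page.DocumentNode))
            Console.WriteLine(div.InnerText.Trim());
    }
}
```

Unlike SelectNodes, this LINQ pipeline yields an empty list when nothing matches, so no null check is needed before iterating.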
