How to extract innermost table from html file with the help of the html agility pack?

Developer https://www.devze.com 2022-12-25 18:29 Source: web

I am parsing tabular information from an HTML file with the Html Agility Pack, and that part works.

The problem arises when the table I want to extract is the innermost one, or when I don't know how deeply it is nested. There can be any number of nested tables, and from them I want to extract the table whose column names are NAME and ADDRESS.

Example:

<table>
    <tr><td>
        <table>
           <tr><td>PHONE NO.</td><td>OTHER INFO.</td></tr>
           <tr><td>
              <table>
                 <tr><td>AMOUNT</td></tr>
                 <tr><td>50000</td></tr>
                 <tr><td>80000</td></tr>
              </table>
           </td></tr>
           <tr><td>
              <table>
                 <tr><td>
                     <table>
                         <tr><td>
                              <table>
                                 <tr><td> NAME </td><td>ADDRESS</td></tr>
                                 <tr><td> ABC  </td><td> kfks   </td></tr>
                                 <tr><td> BCD  </td><td> fdsa   </td></tr>
                              </table>
                         </td></tr>
                     </table>
                 </td></tr>
              </table>
           </td></tr>
        </table>
    </td></tr>
</table>

There are many tables, but I want to extract the one whose column names are NAME and ADDRESS. What should I do?

Load the document as an HtmlDocument. Then use an XPath query to find a table that contains no other tables and has a td in the first row containing "NAME". Note that XPath string comparison is case-sensitive, so the literal must match the header text exactly.

The XPath implementation is the standard .NET one from System.Xml.XPath, so any documentation about using XPath with XmlDocument will be applicable.

using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("file.html");
// Innermost table (no nested tables) whose first row has a cell reading NAME.
HtmlNode el = doc.DocumentNode.SelectSingleNode("//table[not(descendant::table) and tr[1]/td['NAME' = normalize-space()]]");

If the position of the "NAME" column were fixed (for example, always the second column), you could use something like 'NAME' = normalize-space(tr[1]/td[2]).
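
As a rough sketch of that fixed-position variant, the predicate can be dropped into the full query like this. The class name FixedColumnLookup and the method FindTable are hypothetical names chosen for illustration, and the sketch assumes the header text contains no single quote, because it is embedded in the XPath string literal unchanged.

```csharp
using HtmlAgilityPack;

public static class FixedColumnLookup
{
    // Hypothetical helper: returns the first innermost table whose header
    // cell at the given 1-based column position equals headerText
    // (case-sensitive), or null when no table matches.
    // Assumption: headerText contains no single quote.
    public static HtmlNode FindTable(HtmlDocument doc, int column, string headerText)
    {
        // normalize-space(node-set) uses the first node of the set,
        // i.e. the td at the requested position in the first row.
        string xpath =
            "//table[not(descendant::table) and '" + headerText +
            "' = normalize-space(tr[1]/td[" + column + "])]";
        return doc.DocumentNode.SelectSingleNode(xpath);
    }
}
```

For the sample in the question, where NAME is the first column, the call would be FixedColumnLookup.FindTable(doc, 1, "NAME").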

To find a table based on several column names, without the innermost-table condition:

HtmlNode el = doc.DocumentNode.SelectSingleNode("//table[tr[1]/td['NAME' = normalize-space()] and tr[1]/td['ADDRESS' = normalize-space()]]");

var table = doc.DocumentNode.SelectSingleNode("//table [not(descendant::table) and tr[1]/td[normalize-space()='ADDRESS'] ]");
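
Putting the pieces together, here is a minimal sketch that applies the innermost-table condition and both column names in one query, then reads the data rows. The class name InnermostTableDemo, the method ExtractRows, and the inline sample HTML are assumptions made for illustration. It also assumes the rows are direct children of the table; if the markup has an explicit tbody element, the tr steps would need to go through it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class InnermostTableDemo
{
    // Returns the data rows (header excluded) of the first innermost table
    // whose first row contains both a NAME and an ADDRESS cell.
    // Returns an empty list when no such table exists.
    public static List<string[]> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // not(descendant::table): the table has no nested tables.
        // tr[1]/td[...]: the first row has a cell with exactly that text.
        const string xpath =
            "//table[not(descendant::table)" +
            " and tr[1]/td[normalize-space() = 'NAME']" +
            " and tr[1]/td[normalize-space() = 'ADDRESS']]";

        var rows = new List<string[]>();
        HtmlNode table = doc.DocumentNode.SelectSingleNode(xpath);
        if (table == null)
        {
            return rows;
        }

        // SelectNodes returns null rather than an empty collection
        // when nothing matches, so check before iterating.
        HtmlNodeCollection dataRows = table.SelectNodes("tr[position() > 1]");
        if (dataRows == null)
        {
            return rows;
        }

        foreach (HtmlNode tr in dataRows)
        {
            rows.Add(tr.Elements("td")
                       .Select(td => td.InnerText.Trim())
                       .ToArray());
        }
        return rows;
    }

    public static void Main()
    {
        // Shortened stand-in for the nested sample in the question.
        string html =
            "<table><tr><td>" +
            "<table><tr><td>AMOUNT</td></tr><tr><td>50000</td></tr></table>" +
            "</td></tr><tr><td>" +
            "<table><tr><td> NAME </td><td>ADDRESS</td></tr>" +
            "<tr><td> ABC </td><td> kfks </td></tr>" +
            "<tr><td> BCD </td><td> fdsa </td></tr></table>" +
            "</td></tr></table>";

        foreach (string[] row in ExtractRows(html))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

The header comparison is case-sensitive, so a page that writes the headers as "Name" or "name" would need the literals adjusted to match.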
