
libxml2 on iPhone

Developer https://www.devze.com 2023-01-02 05:35 Source: web

I'm trying to parse an HTML file with libxml2. Usually this works fine, but not in this case:

<p>
    <b>Titles</b>
    (Some Text)
    <table>
        <tr>
            <td valign="top">
                …Something1...
            </td>
            <td align="right" valign="top">
                …Something2...
            </td>
        </tr>
    </table>
</p>

I run this XPath query to get the first <td>:

//p[b='Titles']/table/tr/td[0]

but nothing is returned, because libxml2 treats the <table> tag not as a child of the <p> tag but as a sibling that follows it.

So, finally, the question: why?


Are you using the HTML parser or the XML parser? AFAIR, HTML allows only inline elements inside <p> (you cannot put a <table> inside a <p>), so the HTML parser auto-closes the <p> tag as soon as it sees the <table> tag (in HTML, you don't have to close every tag). So your HTML is roughly equivalent to (attributes omitted):

<P>
  <B>Titles</B>
  Some text...

<TABLE>
  <TR>
    <TD>...Something1...
    <TD>...Something2...
</TABLE>

Try using the XML parser from libxml2 instead of the HTML parser.
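To illustrate why an XML parser helps here, below is a minimal sketch. It does not use libxml2 itself: it uses Python's standard-library xml.etree.ElementTree as a stand-in, on the assumption that any XML parser keeps the nesting exactly as written. The cell text and the helper name table_is_child_of_p are placeholders, not part of the original page. Note that this only works when the input happens to be well-formed XML, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# The markup from the question, with placeholder cell text.
SNIPPET = """
<p>
    <b>Titles</b>
    (Some Text)
    <table>
        <tr>
            <td valign="top">Something1</td>
            <td align="right" valign="top">Something2</td>
        </tr>
    </table>
</p>
"""


def table_is_child_of_p(markup: str) -> bool:
    """Parse markup as XML and report whether <table> is a direct child of a root <p>."""
    root = ET.fromstring(markup.strip())  # the root element of the snippet
    # find("table") only looks at direct children of the root.
    return root.tag == "p" and root.find("table") is not None


print(table_is_child_of_p(SNIPPET))  # prints True
```

An XML parser has no knowledge of HTML content rules, so it never closes <p> on its own; the price is that it rejects input with unclosed tags.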


//p[b='Titles']/table/tr/td[0]

The error is in the indexing: XPath positions are 1-based, so td[0] never matches any node.

The corrected XPath expression is:

//p[b='Titles']/table/tr/td[1]
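The 1-based positions can be checked with a small sketch. As an assumption for illustration, it uses Python's standard-library xml.etree.ElementTree (which supports a subset of XPath) rather than libxml2, and a compact, well-formed copy of the markup with placeholder cell text; the <root> wrapper and the helper name cell_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Compact, well-formed copy of the question's markup (placeholder text),
# wrapped in a hypothetical <root> so that <p> is a descendant of the root.
DOC = ET.fromstring(
    "<root><p><b>Titles</b><table><tr>"
    "<td>Something1</td><td>Something2</td>"
    "</tr></table></p></root>"
)


def cell_text(position: int) -> str:
    """Return the text of the <td> at the given 1-based position, or "" if none."""
    cell = DOC.find(f".//p[b='Titles']/table/tr/td[{position}]")
    return cell.text if cell is not None else ""


print(cell_text(1))  # prints Something1
print(cell_text(2))  # prints Something2
```

Position 1 selects the first cell and position 2 the second; a position past the last cell simply matches nothing.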
