I have a series of HTML files that all share the same structure.
Take this example:
> <html>
> <head>
> <title>main page</title>
> </head>
> <body>
> <table><tr>
> <td>content1</td>
> </tr></table>
> </body>
> </html>
I want to extract the content of the title tag and of the td tag. How can I do this using HtmlUnit? I am new to HtmlUnit, so any help is appreciated.
See this instructive snippet from the HTMLUnit page.
In there you first construct a client, then retrieve your page, and finally ask for the title text (page.getTitleText()), or get the entire page as an HTML string (page.asXml()). You could then assertContains on that string.
There are plenty of other options, like retrieving elements by id. Best see the examples for yourself.
HtmlUnit is a testing tool, not a DOM parser.
To parse HTML into a DOM, use the Validator.nu HTML parser (http://about.validator.nu/htmlparser/) and its HtmlDocumentBuilder class.
Once you have a Document, you can call myDocument.getElementsByTagName("title") to find the title element.
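As a rough sketch of that last step, the snippet below pulls the title and td text out of a W3C DOM Document with getElementsByTagName. To keep it self-contained it uses the JDK's built-in XML DocumentBuilder as a stand-in parser. That is an assumption that only holds because the sample markup happens to be well-formed XML; for real-world HTML you would obtain the Document from HtmlDocumentBuilder instead. The class name TagTextExtractor and the helpers parse and textOf are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of HtmlUnit or the Validator.nu parser.
class TagTextExtractor {

    // Parse a markup string into a W3C DOM Document.
    // Assumption: the input is well-formed XML, so the JDK parser accepts it.
    static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // Collect the trimmed text content of every element with the given tag name.
    static List<String> textOf(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // The sample page from the question, as a single string.
        String html = "<html><head><title>main page</title></head>"
                + "<body><table><tr><td>content1</td></tr></table></body></html>";
        Document doc = parse(html);
        System.out.println(textOf(doc, "title")); // prints [main page]
        System.out.println(textOf(doc, "td"));    // prints [content1]
    }
}
```

Since every file has the same structure, the same two textOf calls should work for each of them once it has been parsed.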