How do you get a DOMDocument from a given HTML string using WebKit? In other words, what's the implementation of DOMDocumentFromHTML: for something like the following:
NSString * htmlString = @"<html><body><p>Test</body></html>";
DOMDocument * document = [self DOMDocumentFromHTML: htmlString];
DOMNode * bodyNode = [[document getElementsByTagName: @"body"] item: 0];
// ... etc.
This seems like it should be straightforward to do, yet I'm still having trouble figuring out how. :(
Not an actual answer to the question, but I've now concluded that WebKit and DOMDocument are probably not the most appropriate tools for what I want to do, which is to process an HTML document that is never shown to the user. The NSXMLDocument class turns an HTML document into a manipulable object structure directly and synchronously:
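NSError * error = nil;
NSString * htmlString = @"<html><body><p>Test</body></html>";
// NSXMLDocumentTidyHTML lets the parser repair malformed HTML
// (such as the unclosed <p> above) into well-formed XHTML.
NSXMLDocument * doc =
    [[NSXMLDocument alloc] initWithXMLString: htmlString
                                     options: NSXMLDocumentTidyHTML
                                       error: &error];
NSLog(@"Error is: %@", error);
NSLog(@"Doc is: %@", doc);
NSLog(@"Root element is: %@", [doc rootElement]);
NSLog(@"Root element's children are: %@", [[doc rootElement] children]);
// Manual reference counting: balance the alloc above (omit under ARC).
[doc release];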
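To mirror the getElementsByTagName: lookup from the question, the body element can also be pulled out of the tidied NSXMLDocument. This is a minimal sketch, assuming ARC and assuming the tidy pass produces the usual html/head/body structure; the function name BodyTextFromHTML is hypothetical and not part of any Apple API.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (assumes ARC): tidy an HTML string into an
// NSXMLDocument and return the text content of its <body> element,
// or nil if parsing fails or no <body> element is found.
static NSString * BodyTextFromHTML(NSString * htmlString, NSError ** outError)
{
    NSXMLDocument * doc =
        [[NSXMLDocument alloc] initWithXMLString: htmlString
                                         options: NSXMLDocumentTidyHTML
                                           error: outError];
    if (doc == nil) {
        return nil;
    }
    // elementsForName: only looks at the direct children of the receiver;
    // after tidying, <body> is expected to be a child of the <html> root.
    NSArray * bodies = [[doc rootElement] elementsForName: @"body"];
    if ([bodies count] == 0) {
        return nil;
    }
    // stringValue concatenates the text of all descendant text nodes.
    return [[bodies objectAtIndex: 0] stringValue];
}
```

For deeper or more selective lookups, NSXMLNode's nodesForXPath:error: is the more general tool; elementsForName: is used here only because body sits directly under the root.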
From what I can gather from another answer on this site, there is no synchronous method such as my requested DOMDocumentFromHTML: available in WebKit.
So far, the best I've been able to do is the following asynchronous combination of giveDOMDocumentFromHTML:usingBaseURL: and takeDOMDocument:.
- (void) giveDOMDocumentFromHTML: (NSString *) htmlString
                    usingBaseURL: (NSURL *) baseURL
{
    WebView * webView = [[WebView alloc] init];
    [webView setFrameLoadDelegate: self];
    [[webView mainFrame] loadHTMLString: htmlString
                                baseURL: baseURL];
}

- (void) takeDOMDocument: (DOMDocument *) document
{
    DOMHTMLElement * bodyNode =
        (DOMHTMLElement *) [[document getElementsByTagName: @"body"] item: 0];
    NSLog(@"Body is: %@", [bodyNode innerHTML]);
}
They are hooked together through the following delegate method:
- (void) webView: (WebView *) webView
    didFinishLoadForFrame: (WebFrame *) frame
{
    if (frame == [webView mainFrame]) {
        [self takeDOMDocument: [frame DOMDocument]];
    }
}
The above works, but has at least the following remaining issues:
- I'm not sure where the allocated WebView should be sent a release or autorelease message.
- I would prefer (or need) the application to remain blocked until the HTML page has been processed. In the above scheme, the application keeps processing user input while the WebView is loading and parsing the HTML. (Note that the WebView is never shown on screen.)
So there is still plenty of room for improvement. Can anyone provide a synchronous implementation of DOMDocumentFromHTML: as outlined in the original question?
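One possible direction for the synchronous version asked about here is to keep the asynchronous WebView load but spin the current run loop until the frame-load delegate reports that the main frame has finished (or failed). This is a hedged sketch, not a confirmed answer: HTMLDOMLoader is a hypothetical class name, and it assumes ARC, a call on the main thread, and the legacy WebView API. The loader owns the WebView because the returned DOMDocument belongs to the WebView's main frame, so the document should only be used while the loader is alive.

```objective-c
#import <Cocoa/Cocoa.h>
#import <WebKit/WebKit.h>

// Hypothetical helper (assumes ARC, main thread, legacy WebView API).
// On older SDKs WebFrameLoadDelegate was an informal protocol; drop the
// <WebFrameLoadDelegate> conformance there.
@interface HTMLDOMLoader : NSObject <WebFrameLoadDelegate>
- (DOMDocument *) DOMDocumentFromHTML: (NSString *) htmlString
                         usingBaseURL: (NSURL *) baseURL;
@end

@implementation HTMLDOMLoader
{
    WebView * _webView;   // owned here so the returned DOMDocument stays valid
    BOOL _finished;       // set by the delegate callbacks below
}

- (DOMDocument *) DOMDocumentFromHTML: (NSString *) htmlString
                         usingBaseURL: (NSURL *) baseURL
{
    if (_webView == nil) {
        // The WebView is never added to a window, so it is never displayed.
        _webView = [[WebView alloc] initWithFrame: NSZeroRect
                                        frameName: nil
                                        groupName: nil];
        [_webView setFrameLoadDelegate: self];
    }
    _finished = NO;
    [[_webView mainFrame] loadHTMLString: htmlString baseURL: baseURL];

    // Run the loop in short slices and recheck the flag after each one,
    // until the main frame has finished or failed to load.
    while (!_finished) {
        [[NSRunLoop currentRunLoop]
            runMode: NSDefaultRunLoopMode
         beforeDate: [NSDate dateWithTimeIntervalSinceNow: 0.1]];
    }
    return [[_webView mainFrame] DOMDocument];
}

- (void) webView: (WebView *) sender didFinishLoadForFrame: (WebFrame *) frame
{
    if (frame == [sender mainFrame]) {
        _finished = YES;
    }
}

- (void) webView: (WebView *) sender
    didFailLoadWithError: (NSError *) error
                forFrame: (WebFrame *) frame
{
    if (frame == [sender mainFrame]) {
        _finished = YES;   // stop waiting; the document may be incomplete
    }
}

- (void) webView: (WebView *) sender
    didFailProvisionalLoadWithError: (NSError *) error
                           forFrame: (WebFrame *) frame
{
    if (frame == [sender mainFrame]) {
        _finished = YES;
    }
}

- (void) dealloc
{
    // The frame-load delegate is not retained by the WebView, so clear it.
    [_webView setFrameLoadDelegate: nil];
}
@end
```

This does not truly block the application: while the loop spins, timers and other run-loop sources scheduled in the default mode can still fire, and nothing guards against a re-entrant call to DOMDocumentFromHTML:usingBaseURL:. The 0.1 second limit date is an arbitrary choice so that the flag is rechecked regularly even if the completion callback is not delivered through an input source.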