Parsing HTML with XPath in Objective-C

Developer https://www.devze.com 2023-01-29 09:19 Source: Internet

Hey guys, I'm trying to parse the HTML at http://lib.harvard.edu/libraries/hours.html with XPath in Objective-C, for an application that shows the operating hours for each day of the week at each of the 50 libraries listed on the website. I found code for XPath parsing of HTML in Objective-C at cocoawithlove.com/2008/10/using-libxml2-for-parsing-and-xpath.html, but I'm still a little confused about how to obtain the hours for each day for each library. The relevant function seems to be

NSArray *PerformHTMLXPathQuery(NSData *document, NSString *query)

and my code so far is

NSURL *urlPath = [NSURL URLWithString:@"http://lib.harvard.edu/libraries/hours.html"];
NSArray *array = PerformHTMLXPathQuery([NSData dataWithContentsOfURL:urlPath], NSString *query);

but since I've never used XPath before, I'm not sure what string to pass as the second parameter. Does anyone have any ideas?

Also, I'm not quite sure what to do with the array that PerformHTMLXPathQuery() returns. I feel like cocoawithlove.com/2008/10/using-libxml2-for-parsing-and-xpath.html gives a pretty good explanation, but since I've never used XPath before it doesn't make much sense to me yet. So, to summarize, assuming my code so far is correct, I want to know what to pass as the second parameter of PerformHTMLXPathQuery() and how to extract the relevant data from the array it returns. Any help would be much appreciated!


XPath is a language for navigating XML documents. The query parameter is an XPath query string, which you hope will be able to extract the elements you want from the HTML file. I say "hope" because

  1. I don't know how well XPath plays with HTML 4 documents
  2. I've had a look at the source of the page you want to parse and it is quite complex.

Anyway, those points aside, you'll be wanting to learn how to create an XPath expression. Fortunately, Google is your friend and typing "XPath" into it brings up the W3Schools tutorial on XPath. I have only skimmed it but it looks like what you need.
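
To make that a bit more concrete, here is a rough sketch of how the pieces might fit together. I haven't worked out the real structure of the hours page, so the query "//table//tr" is only a placeholder assumption (it selects every table row anywhere in the document) and you will need to adapt it after looking at the page source. The dictionary keys "nodeName", "nodeContent" and "nodeChildArray" are the ones I believe the Cocoa with Love code uses for each result node; check them against your copy of XPathQuery.m. TextOfNode is a helper I made up for this example.

```objective-c
#import <Foundation/Foundation.h>
#import "XPathQuery.h"  // from the Cocoa with Love article; declares PerformHTMLXPathQuery

// Hypothetical helper: concatenates the text of a node and all of its descendants.
// Assumes each node is a dictionary with optional "nodeContent" and "nodeChildArray" keys.
static NSString *TextOfNode(NSDictionary *node)
{
    NSMutableString *text = [NSMutableString string];
    NSString *content = node[@"nodeContent"];
    if (content) {
        [text appendString:content];
    }
    for (NSDictionary *child in node[@"nodeChildArray"]) {
        [text appendString:TextOfNode(child)];
    }
    return text;
}

int main(void)
{
    @autoreleasepool {
        NSURL *url = [NSURL URLWithString:@"http://lib.harvard.edu/libraries/hours.html"];
        NSData *html = [NSData dataWithContentsOfURL:url];
        if (!html) {
            NSLog(@"Could not download the page");
            return 1;
        }

        // Placeholder query (an assumption, not taken from the real page layout).
        NSString *query = @"//table//tr";
        NSArray *rows = PerformHTMLXPathQuery(html, query);

        NSCharacterSet *blank = [NSCharacterSet whitespaceAndNewlineCharacterSet];
        for (NSDictionary *row in rows) {
            // Gather the trimmed text of every <td> that is a direct child of this row.
            NSMutableArray *cells = [NSMutableArray array];
            for (NSDictionary *child in row[@"nodeChildArray"]) {
                if ([child[@"nodeName"] isEqualToString:@"td"]) {
                    [cells addObject:[TextOfNode(child) stringByTrimmingCharactersInSet:blank]];
                }
            }
            if (cells.count > 0) {
                NSLog(@"%@", [cells componentsJoinedByString:@" | "]);
            }
        }
    }
    return 0;
}
```

If the rows print in a recognisable order (library name first, then one cell per day, say), you can map each cell to a weekday by its index. If they don't, narrow the query, for example by adding a predicate on an attribute you find in the page source.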
