
iPhone - possible to query Wikipedia to see if web page exists?

Developer https://www.devze.com 2023-03-24 04:55 Source: web

I am curious whether there is a way to check if a Wikipedia page exists. I have implemented a custom search that replaces the spaces in the search text with underscores (_), but I have no way to check whether the resulting path actually exists.

    targetWiki = inputCustomTarget.text;
    targetWiki = [targetWiki stringByReplacingOccurrencesOfString:@" " withString:@"_"];
    targetWiki = [NSString stringWithFormat:@"http://en.m.wikipedia.org/wiki/%@", targetWiki];

Would I have to parse the response in order to find out if a page exists?
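Replacing spaces alone may not be enough to get a valid URL: a title containing characters such as ?, # or non-ASCII letters needs percent-encoding, otherwise +[NSURL URLWithString:] can return nil or point at the wrong resource. Below is a minimal sketch of building the article URL, assuming iOS 7 or later (for stringByAddingPercentEncodingWithAllowedCharacters:). The function name WikiArticleURL is hypothetical, and the sketch uses https because Wikipedia serves over HTTPS and App Transport Security (iOS 9+) blocks plain http by default.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a mobile Wikipedia article URL from free-form search text.
static NSURL *WikiArticleURL(NSString *searchText) {
    // Trim surrounding whitespace, then use underscores for spaces, as Wikipedia URLs do.
    NSString *trimmed = [searchText stringByTrimmingCharactersInSet:
                            [NSCharacterSet whitespaceCharacterSet]];
    NSString *title = [trimmed stringByReplacingOccurrencesOfString:@" " withString:@"_"];

    // Percent-encode anything not allowed in a URL path (e.g. "?", "#", non-ASCII letters).
    NSString *encoded = [title stringByAddingPercentEncodingWithAllowedCharacters:
                            [NSCharacterSet URLPathAllowedCharacterSet]];
    return [NSURL URLWithString:[@"https://en.m.wikipedia.org/wiki/" stringByAppendingString:encoded]];
}
```

For example, WikiArticleURL(@"Why?") yields https://en.m.wikipedia.org/wiki/Why%3F instead of a URL whose "?" starts a query string.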


There should be no need to parse the result; just check for a 200 status code in the - (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response callback (cast the response to NSHTTPURLResponse to read its statusCode). If the page does not exist, you should get a 404.
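The delegate approach can be sketched roughly as follows. This is a minimal, hypothetical helper (WikiPageChecker and its completion block are not part of any framework), assuming ARC and a run loop on the calling thread, as on the main thread of an iOS app. It cancels the connection as soon as the headers arrive, since the status code is all it needs. NSURLConnection was later deprecated in favor of NSURLSession, but the same status check applies there.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper class: asynchronously checks whether a URL answers with HTTP 200.
@interface WikiPageChecker : NSObject <NSURLConnectionDataDelegate>
@property (nonatomic, copy) void (^completion)(BOOL exists, NSInteger statusCode);
- (void)checkURL:(NSURL *)url;
@end

@implementation WikiPageChecker {
    NSURLConnection *_connection;
}

- (void)checkURL:(NSURL *)url {
    // Starts immediately, scheduled on the current thread's run loop.
    _connection = [NSURLConnection connectionWithRequest:[NSURLRequest requestWithURL:url]
                                                delegate:self];
}

- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response {
    NSInteger status = 0;
    if ([response isKindOfClass:[NSHTTPURLResponse class]]) {
        status = [(NSHTTPURLResponse *)response statusCode];
    }
    [connection cancel];   // the status code is all we need; skip downloading the page body
    _connection = nil;
    if (self.completion) self.completion(status == 200, status);
}

- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error {
    // Network failure: report "not found" with status 0.
    _connection = nil;
    if (self.completion) self.completion(NO, 0);
}
@end
```

Usage: create a checker, set its completion block, then call -checkURL: with the URL built from the search text and act on the exists flag inside the block.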

Edit:

I should add that nonexistent pages on the main Wikipedia site (en.wikipedia.org, not the mobile en.m.wikipedia.org) do return the correct 404 status code. This could change in the future, so it is not guaranteed to stay reliable, but neither is parsing the content. Here is a sample I put together to demonstrate this:

NSURLRequest *exists = [NSURLRequest requestWithURL:[NSURL URLWithString:@"http://en.wikipedia.org/wiki/Qwerty"]];
// Redirects to Blivet
NSURLRequest *redirects = [NSURLRequest requestWithURL:[NSURL URLWithString:@"http://en.wikipedia.org/wiki/Poiuyt"]];
NSURLRequest *nonexistent = [NSURLRequest requestWithURL:[NSURL URLWithString:@"http://en.wikipedia.org/wiki/Jklfdsa"]];

// sendSynchronousRequest: hands back an NSURLResponse; for HTTP URLs it is an NSHTTPURLResponse.
NSURLResponse *resp_exists = nil;
NSURLResponse *resp_redirects = nil;
NSURLResponse *resp_nonexistent = nil;

[NSURLConnection sendSynchronousRequest:exists returningResponse:&resp_exists error:NULL];
[NSURLConnection sendSynchronousRequest:redirects returningResponse:&resp_redirects error:NULL];
[NSURLConnection sendSynchronousRequest:nonexistent returningResponse:&resp_nonexistent error:NULL];

// statusCode is an NSInteger, so log it with %ld and a (long) cast.
NSLog(@"\nExists: %ld\nRedirects: %ld\nNon Existant: %ld",
      (long)[(NSHTTPURLResponse *)resp_exists statusCode],
      (long)[(NSHTTPURLResponse *)resp_redirects statusCode],
      (long)[(NSHTTPURLResponse *)resp_nonexistent statusCode]);

And here is the output:

Exists: 200
Redirects: 200
Non Existant: 404

So if a page exists, or automatically redirects to a page that does exist, you will get a 200 status code; if it does not exist, you will get a 404. If you would like to capture the redirect, you will need to implement -connection:willSendRequest:redirectResponse: and act accordingly.

Note: This example code is synchronous to keep it compact. That is not ideal; production code should send asynchronous requests and use the NSURLConnectionDelegate methods.
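The -connection:willSendRequest:redirectResponse: hook mentioned above can be sketched as follows, using a hypothetical delegate class WikiRedirectTracker that records where an HTTP redirect pointed and then follows it. One caveat worth testing before relying on it: as far as I know, MediaWiki serves its own page redirects (such as Poiuyt to Blivet) with a 200 at the original URL rather than an HTTP 3xx, so this callback may only fire for protocol-level redirects such as http to https.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that remembers the target of the last HTTP-level redirect.
@interface WikiRedirectTracker : NSObject <NSURLConnectionDataDelegate>
@property (nonatomic, strong) NSURL *redirectedTo; // stays nil if no HTTP redirect was seen
@end

@implementation WikiRedirectTracker
- (NSURLRequest *)connection:(NSURLConnection *)connection
             willSendRequest:(NSURLRequest *)request
            redirectResponse:(NSURLResponse *)redirectResponse {
    // redirectResponse is nil for the initial request; non-nil means the server redirected us.
    if (redirectResponse != nil) {
        self.redirectedTo = request.URL;
    }
    return request; // returning the request follows the redirect
}
@end
```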


You can't check the response code, because it will always be 200 (at least for the en.m.wikipedia.org URLs in the question).

I think the best way to see if a page exists is to parse the response and check whether you land on the default 'search results' page.

Another option would be to use the MediaWiki API:

http://en.wikipedia.org/w/api.php?action=opensearch&search=term

Check whether the search term appears in the returned response.
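A sketch of that check, assuming the JSON format of the opensearch endpoint (an array whose second element is the list of matching titles) and assuming an exact match, if it exists, shows up among the default number of results. It compares case-insensitively as a simplification; Wikipedia titles are really only case-insensitive in their first letter. WikiOpenSearchURL and WikiOpenSearchContainsTitle are hypothetical names, and NSURLComponents/NSURLQueryItem need iOS 8 or later.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds an opensearch query URL; NSURLComponents handles the escaping.
static NSURL *WikiOpenSearchURL(NSString *term) {
    NSURLComponents *components = [NSURLComponents componentsWithString:@"https://en.wikipedia.org/w/api.php"];
    components.queryItems = @[
        [NSURLQueryItem queryItemWithName:@"action" value:@"opensearch"],
        [NSURLQueryItem queryItemWithName:@"format" value:@"json"],
        [NSURLQueryItem queryItemWithName:@"search" value:term],
    ];
    return components.URL;
}

// Hypothetical helper: YES if `term` matches one of the titles in an opensearch JSON response.
static BOOL WikiOpenSearchContainsTitle(NSData *json, NSString *term) {
    id root = [NSJSONSerialization JSONObjectWithData:json options:0 error:NULL];
    if (![root isKindOfClass:[NSArray class]]) return NO;
    NSArray *array = root;
    if (array.count < 2 || ![array[1] isKindOfClass:[NSArray class]]) return NO;

    // API titles use spaces, so undo the underscore substitution from the question's code.
    NSString *wanted = [term stringByReplacingOccurrencesOfString:@"_" withString:@" "];
    for (id title in (NSArray *)array[1]) {
        if ([title isKindOfClass:[NSString class]] &&
            [(NSString *)title caseInsensitiveCompare:wanted] == NSOrderedSame) {
            return YES;
        }
    }
    return NO;
}
```

Fetch WikiOpenSearchURL(term) with any of the request APIs discussed here and pass the response body to WikiOpenSearchContainsTitle.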


Yes, I'm afraid you will probably need to parse the results to know whether the page exists. However, there might be an alternative: the complete English Wikipedia dump files, which are available here:

http://en.wikipedia.org/wiki/Wikipedia:Database_download#Latest_complete_dump_of_English_Wikipedia

Obviously this raw data is huge, but you could write a parser to extract all the valid page titles and then compress that information into (say) a Core Data database, which might fit on the iPhone. You could then check a title locally without having to request the page.
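The offline lookup can be illustrated with an in-memory NSSet standing in for the Core Data store suggested above. This sketch assumes you have already extracted a newline-separated list of article titles from the dump, with spaces written as underscores to match the URLs; the function names are hypothetical, and whether the full English title list fits comfortably in memory on a device is not something I have checked.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a set of titles from newline-separated text (one title per line).
static NSSet<NSString *> *WikiTitleSetFromText(NSString *text) {
    NSMutableSet<NSString *> *titles = [NSMutableSet set];
    [text enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
        NSString *trimmed = [line stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
        if (trimmed.length > 0) [titles addObject:trimmed];
    }];
    return titles;
}

// Hypothetical helper: offline existence check, no network request needed.
static BOOL WikiTitleExists(NSSet<NSString *> *titles, NSString *searchText) {
    NSString *key = [searchText stringByReplacingOccurrencesOfString:@" " withString:@"_"];
    return [titles containsObject:key];
}
```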

But to be honest, I'd probably parse the page and perhaps cache the answer so I only have to do it once.
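A minimal in-memory cache for that idea, assuming ARC; WikiExistenceCache is a hypothetical class. It remembers the result per title so each page is checked only once per app session; persisting the results across launches would be a separate step.

```objc
#import <Foundation/Foundation.h>

// Hypothetical in-memory cache of "does this title exist?" answers.
@interface WikiExistenceCache : NSObject
- (NSNumber *)cachedExistenceForTitle:(NSString *)title; // nil means "not checked yet"
- (void)recordExistence:(BOOL)exists forTitle:(NSString *)title;
@end

// Normalizes a title so "Blivet fork" and "Blivet_fork" share one cache entry.
static NSString *WikiCacheKey(NSString *title) {
    return [title stringByReplacingOccurrencesOfString:@" " withString:@"_"];
}

@implementation WikiExistenceCache {
    NSMutableDictionary *_results; // normalized title -> @YES / @NO
}

- (instancetype)init {
    if ((self = [super init])) {
        _results = [[NSMutableDictionary alloc] init];
    }
    return self;
}

- (NSNumber *)cachedExistenceForTitle:(NSString *)title {
    return _results[WikiCacheKey(title)];
}

- (void)recordExistence:(BOOL)exists forTitle:(NSString *)title {
    _results[WikiCacheKey(title)] = @(exists);
}
@end
```

Check the cache first; only when it returns nil, fetch and parse the page, then record the answer.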

EDIT: I'm afraid the answer given by Joe is not fully correct. When I use the domain from the original question (i.e. en.m.wikipedia.org), Joe's sample code gives the following output:

Exists: 200
Redirects: 200
Non Existant: 200

If I use en.wikipedia.org, my results agree with Joe's; however, that is not the domain the question asked about. I am based in the UK, which might also have a bearing on the results.
