So I am trying to read the contents of an HTML file to scrape some metadata off of a particular website.
The issue I am running into, however, is that performing the HTTP request in Objective-C using the Cocoa library calls gives me a different HTML file than when I perform the call via a web browser or my Python implementation.
The reason this is annoying is that I am scraping a key that is generated on every request. The site seems to know when I am performing the request via Cocoa instead of from the Python library or from the browser.
Can anyone shed any light on this?
Here is the Python code I run:
import urllib2
complete_url = 'http://sample-site.com/1?ax=1&ts=123123.12'
request = urllib2.Request(complete_url)
response = urllib2.urlopen(request)
html_data = response.read()
Here are the Cocoa attempts I've tried:
NSString * completeUrl = [url stringByAppendingFormat:@"//%d?ax=1&ts=%1.2f", pageNumber, time];
Another attempt:
NSMutableURLRequest* request = [[[NSMutableURLRequest alloc] initWithURL:hypeURL] autorelease];
[request setValue:userAgent forHTTPHeaderField:@"User-Agent"];
NSURLResponse* response = nil;
NSError* error = nil;
NSData* data = [NSURLConnection sendSynchronousRequest:request returningResponse:&response error:&error];
NSString *hypeHTML = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
The Cocoa attempts do retrieve the HTML, but the HTML contains the key values I parse, which are supposed to be regenerated on each refresh. With Cocoa, the key values do not change between executions of the application (the same key is in the HTML every time), whereas with Python the HTML correctly contains a different key for each request.
Thanks
The website probably detects the user-agent and returns different content based on it.
Simply change the user-agent in the header of your request:
NSString* userAgent = @"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051111 Firefox/1.5 BAVM/1.0.0";
NSURL* url = [NSURL URLWithString:@"http://www.stackoverflow.com/"];
NSMutableURLRequest* request = [[[NSMutableURLRequest alloc] initWithURL:url] autorelease];
[request setValue:userAgent forHTTPHeaderField:@"User-Agent"];
NSURLResponse* response = nil;
NSError* error = nil;
NSData* data = [NSURLConnection sendSynchronousRequest:request returningResponse:&response error:&error];
NSString *result = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
NSLog(@"%@",result);
With this code, the server thinks you're running Firefox on Linux :)
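For comparison, the same idea can be sketched on the Python side. This is a minimal sketch using Python 3's urllib.request (the question uses the older urllib2); the URL and User-Agent string are just the sample values from this thread, and build_request and fetch_html are hypothetical helper names, not part of any library.

```python
import urllib.request

# Sample values taken from this thread for illustration; substitute the real
# page URL and whichever browser User-Agent string you want to present.
URL = "http://sample-site.com/1?ax=1&ts=123123.12"
USER_AGENT = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051111 Firefox/1.5"


def build_request(url, user_agent):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent, timeout=10):
    """Fetch the page and return the body as text (performs network I/O)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Building the request does not touch the network, so the header can be
# inspected before anything is sent. urllib stores header names capitalized
# as "User-agent".
request = build_request(URL, USER_AGENT)
print(request.get_header("User-agent"))
```

Calling fetch_html(URL, USER_AGENT) twice and comparing the scraped key between the two responses is a quick way to check whether the header alone explains the difference.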
To see your current user-agent, or to look up user-agent strings for specific browsers:
http://www.useragentstring.com/
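It can also help to check what a client library sends when you set nothing at all. The sketch below assumes Python 3's urllib.request and prints the default User-Agent that its opener attaches; Cocoa's default string will differ, so treat this only as an illustration of how to find the value a server might be keying on.

```python
import urllib.request

# build_opener() returns an OpenerDirector whose addheaders list holds the
# headers added to every request that does not override them.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Typically something like "Python-urllib/3.x", which a server can easily
# tell apart from a browser.
print(default_headers["User-agent"])
```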