I'm trying to scrape a web page using Nokogiri/Mechanize. When I run
page = agent.get(url)
page.class
=> Mechanize::File
sometimes I get a Page object and sometimes a File object, but I need a Page object every time. I tried adding a pluggable_parser for plain/text, but that doesn't work for me.
Does anyone have an idea how I can fix this, how I can find out the content type from a File object, or how I can cast a File to a Page object?
Thanks, Michael
Most likely the page you're requesting is unavailable and the server returns a plaintext error page.
See the docs on Mechanize::File.
The content type is in page.response['content-type'].
It's definitely possible to change the content type of the response and then create a Mechanize::Page from the data without downloading it again, but I don't think that would give you anything useful.
Check the response code as well; it's in page.code.
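To make those two checks concrete, here is a minimal sketch that reads the content type and the response code from whatever agent.get returned. The helper names describe_response and html_response? are made up for illustration and are not part of Mechanize; the only things taken from Mechanize are page.code and page.response['content-type'], as mentioned above.

```ruby
# Hypothetical helper (not part of Mechanize): collects the fields
# mentioned above from the object returned by agent.get.
def describe_response(page)
  {
    klass:        page.class.name,                 # e.g. "Mechanize::File" or "Mechanize::Page"
    code:         page.code,                       # HTTP status, typically a string such as "200"
    content_type: page.response['content-type']    # raw Content-Type header, may be nil
  }
end

# Hypothetical helper: true when the Content-Type header looks like HTML.
def html_response?(page)
  page.response['content-type'].to_s.match?(%r{\A(text/html|application/xhtml\+xml)}i)
end

# Usage with Mechanize (assumes the mechanize gem is installed and `url` is set):
#   require 'mechanize'
#   agent = Mechanize.new
#   page  = agent.get(url)
#   p describe_response(page)
#   warn "not HTML, got a #{page.class}" unless html_response?(page)
```

If html_response? returns false, the server did not send HTML for that URL, so it is probably worth looking at the body and the status code before trying to force the result into a Page.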