Mechanize returns a File instead of a Page object

Developer https://www.devze.com 2023-04-08 16:19 Source: web
I'm trying to scrape a web page with Nokogiri/Mechanize. When I run

page = agent.get(url)
page.class
 => Mechanize::File

I sometimes get a Page object and sometimes a File object, but I need a Page object every time. I tried adding a pluggable_parser for plain/text, but that doesn't work for me.

Does anyone have an idea how I can fix this, how I can find out the content type of a File object, or how I can cast a File to a Page object?

Thanks, Michael


Most likely the page you're requesting is unavailable and the server returns a plaintext error page.

See the docs on Mechanize::File.

The content type is in page.response['content-type'].
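A minimal sketch of reading that header follows. The helper name media_type_of and the FakePage stand-in are assumptions made for illustration and are not part of Mechanize; with a real agent you would pass in whatever agent.get(url) returned.

```ruby
# Hypothetical helper (not part of Mechanize): returns the bare media type,
# e.g. "text/plain", from any object that exposes response['content-type'].
def media_type_of(page)
  header = page.response['content-type'].to_s
  # Drop parameters such as "; charset=utf-8" and normalise the case.
  header.split(';').first.to_s.strip.downcase
end

# Stand-in for a Mechanize::File or Mechanize::Page so the sketch runs offline.
FakePage = Struct.new(:response, :code)

plain = FakePage.new({ 'content-type' => 'text/plain; charset=UTF-8' }, '200')
puts media_type_of(plain) # prints "text/plain"
```

Comparing the result against "text/html" is one way to tell, before touching page.class, why a File came back instead of a Page.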

It's definitely possible to change the content type of the response and then create a Mechanize::Page from the data without having to download it again - but I don't think that would give you anything useful.

Check the response code as well, it's in page.code.
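A small sketch of such a check is shown here. The helper success? and the StubResponse stand-in are assumptions for illustration and are not part of Mechanize. Mechanize reports the status as a string such as "200", so the helper converts it before comparing.

```ruby
# Hypothetical helper (not part of Mechanize): true when the status is 2xx.
# page.code is a string such as "200", hence the to_i.
def success?(page)
  page.code.to_i.between?(200, 299)
end

# Stand-in for a Mechanize::File or Mechanize::Page so the sketch runs offline.
StubResponse = Struct.new(:code)

puts success?(StubResponse.new('200')) # prints "true"
puts success?(StubResponse.new('503')) # prints "false"
```

As far as I know, Mechanize by default raises Mechanize::ResponseCodeError for most 4xx and 5xx responses, so a File that comes back without an exception will often carry a 2xx code anyway.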
