How do I open Amazon reviews iframe with Nokogiri?

Developer https://www.devze.com 2023-03-10 01:44 Source: web
I can get the reviews iframe URL from Amazon::AWS::ItemSearch, but I'm having trouble loading the iframe content into Nokogiri.

require 'open-uri'
require 'nokogiri'
iframe_url = "http://www.amazon.com/reviews/iframe?[...snip...]"
iframe = Nokogiri::HTML(URI.open(iframe_url))

This gives a 403 error:

OpenURI::HTTPError: 403 Forbidden

If I paste the URL into my browser, the page is just blank. However, if I put it in an iframe on my web page like this, the content loads fine:

%iframe{:src => "http://www.amazon.com/reviews/iframe?[...snip...]"}/

I thought of doing something like this:

require 'nokogiri'
iframe_url = "http://www.amazon.com/reviews/iframe?[...snip...]"
html = %(<iframe src="#{iframe_url}"></iframe>)
iframe = Nokogiri::HTML(html)

but this won't fetch the remote content. How can I do this?


It looks like Amazon checks the HTTP Referer header and responds accordingly. I recommend trying Mechanize to retrieve that page with all of the headers set correctly.
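To test the Referer theory without extra gems, here is a minimal sketch using Ruby's standard Net::HTTP in place of Mechanize (a swapped-in library, not the one the answer names). It assumes, as the answer suggests, that Amazon looks at the Referer header; the header values and the helper names `build_iframe_request` and `fetch_iframe_html` are hypothetical placeholders.

```ruby
require 'net/http'
require 'uri'

# Hypothetical placeholder values. A sensible Referer to try is the URL of
# the page on which the iframe already loads correctly.
DEFAULT_REFERER    = 'http://www.example.com/'
DEFAULT_USER_AGENT = 'Mozilla/5.0 (compatible; ReviewFetcher/0.1)'

# Build a GET request for the iframe URL carrying browser-like headers.
def build_iframe_request(url, referer: DEFAULT_REFERER, user_agent: DEFAULT_USER_AGENT)
  request = Net::HTTP::Get.new(URI(url))
  request['Referer']    = referer
  request['User-Agent'] = user_agent
  request
end

# Send the request and return the response body as a String.
# Net::HTTPResponse#value raises an exception unless the status is 2xx,
# so a 403 surfaces as an error instead of an empty document.
def fetch_iframe_html(url, **headers)
  uri = URI(url)
  request = build_iframe_request(url, **headers)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  response.value
  response.body
end
```

Usage, under the same assumptions: `html = fetch_iframe_html(iframe_url, referer: 'http://yourapp.com/page')` (a hypothetical referer), then `Nokogiri::HTML(html)`. If Amazon also checks cookies or other headers, Mechanize's session handling may still be the easier route.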


The 403 Forbidden is due to Amazon blocking iframe requests. Google does the same thing: if you load Google in Chrome and open the Network inspector, you'll see something like this in the response headers:

[Screenshot: Google's response headers in Chrome's Network inspector]

In the screenshot, you can see that Google sets a header called X-Frame-Options with the value SAMEORIGIN. Modern browsers read this header, and it is the browser itself that prevents the page from being embedded. That's why it works when you put it straight into your webpage.
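To make the header's effect concrete, here is a simplified model of the check a browser applies before rendering a page inside an `<iframe>`. The names `origin_of` and `frame_allowed?` are illustrative, not a real API. The sketch assumes a single-valued X-Frame-Options header, compares only against the direct embedding page, and ignores Content-Security-Policy `frame-ancestors`, which modern browsers let take precedence.

```ruby
require 'uri'

# An origin is the (scheme, host, port) triple used by the same-origin policy.
def origin_of(url)
  uri = URI(url)
  [uri.scheme.to_s.downcase, uri.host.to_s.downcase, uri.port]
end

# Simplified: would a browser render +framed_url+ inside an iframe on
# +embedder_url+, given the framed page's X-Frame-Options value?
def frame_allowed?(x_frame_options, framed_url, embedder_url)
  case x_frame_options.to_s.strip.downcase
  when 'deny'
    false # never framable
  when 'sameorigin'
    origin_of(framed_url) == origin_of(embedder_url)
  else
    true # header absent or unrecognised (e.g. obsolete ALLOW-FROM): no restriction
  end
end
```

For example, `frame_allowed?('SAMEORIGIN', 'https://www.google.com/', 'http://yourapp.com/')` is `false`, while the same call with no header is `true`.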

I've worked around this by creating a proxy on my own domain and iframing that instead. For example, set up a route at yourapp.com/proxy whose GET action makes a server-side HTTP request for the URL you want and renders the result into your view. Take the URL to request from a query parameter, so instead of iframing http://amazon.com/... you iframe http://yourapp.com/proxy?url=http://amazon.com/..., and voilà! You get the same result.
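A minimal sketch of the proxy's core logic in plain Ruby, assuming the target URL arrives percent-encoded in a `url` query parameter. The host allowlist `ALLOWED_PROXY_HOSTS` and the helpers `proxy_target` and `proxy_body` are hypothetical additions, not part of the answer; the allowlist keeps the endpoint from becoming an open proxy to arbitrary sites.

```ruby
require 'net/http'
require 'uri'

# Hypothetical allowlist: only pages on these hosts are proxied.
ALLOWED_PROXY_HOSTS = %w[www.amazon.com amazon.com].freeze

# Extract and validate the target URL from the proxy's query string, e.g.
# "url=http%3A%2F%2Fwww.amazon.com%2Freviews%2Fiframe%3F...".
# The url value must be percent-encoded; otherwise any "&" inside the
# Amazon URL would be split off as a separate query parameter.
def proxy_target(query_string)
  params = URI.decode_www_form(query_string.to_s).to_h
  raw = params.fetch('url') { raise ArgumentError, 'missing url parameter' }
  target = URI(raw)
  host = target.host.to_s.downcase
  unless target.is_a?(URI::HTTP) && ALLOWED_PROXY_HOSTS.include?(host)
    raise ArgumentError, "refusing to proxy #{raw}"
  end
  target
end

# Fetch the validated target server-side and return its body as a String,
# ready to be rendered by the proxy route.
def proxy_body(query_string)
  Net::HTTP.get(proxy_target(query_string))
end
```

In a Rails app, a hypothetical `ProxyController#show` action for the yourapp.com/proxy route could call `proxy_body(request.query_string)` and `render html: body.html_safe`. Keep in mind that the proxied HTML then runs under your own origin.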
