Ruby open-uri, returns error when opening a png URL

Developer https://www.devze.com 2022-12-21 03:57 Source: Internet

I am writing a crawler that downloads the images of the Gantz manga, starting at http://manga.bleachexile.com/gantz-chapter-1.html and continuing through the following chapters.

It worked until the crawler tried to open an image in chapter 273:

bad URI(is not URI?): http://static.bleachexile.com/manga/gantz/273/Gantz[0273]_p001[Whatever-Illuminati].png

But I assume this URL is valid, because I can open it in Firefox. Any thoughts?

Partial code:

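require 'open-uri'

# Pick the first image URL on the page that belongs to the manga.
img_link = nav.page.image_urls.find { |x| x.include?("manga/gantz") }
img_name = RAILS_ROOT + "/public/#{nome}/#{cap}/" + nome + template.sub('::cap::', cap.to_s).sub('::pag::', i.to_s)
# Write in binary mode so the PNG bytes are stored unchanged.
File.open(img_name, 'wb') do |img|
  img.write(open(img_link) { |f| f.read })
end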

It is not a valid URI. Only certain characters are allowed in a URI, and square brackets must be percent-encoded when they appear in the path. Firefox, like other browsers, tries to do as much as possible for the user instead of complaining when the input is not standards compliant.

It is valid in the following form:

open("http://static.bleachexile.com/manga/gantz/273/Gantz%5B0273%5D_p001%5BWhatever-Illuminati%5D.png") # => #<File:/tmp/open-uri20100226-3342-clj08a-0>
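The difference between the two forms can be checked without any network access by handing both strings to Ruby's URI parser. The following is a minimal sketch; `parseable_uri?` is a hypothetical helper name introduced here for illustration, not part of the standard library.

```ruby
require 'uri'

raw     = "http://static.bleachexile.com/manga/gantz/273/Gantz[0273]_p001[Whatever-Illuminati].png"
escaped = "http://static.bleachexile.com/manga/gantz/273/Gantz%5B0273%5D_p001%5BWhatever-Illuminati%5D.png"

# Hypothetical helper: true when URI.parse accepts the string,
# false when it raises URI::InvalidURIError.
def parseable_uri?(str)
  URI.parse(str)
  true
rescue URI::InvalidURIError
  false
end

puts parseable_uri?(raw)      # form with literal square brackets
puts parseable_uri?(escaped)  # form with %5B and %5D
```

open-uri parses the string as a URI before it fetches anything, which is why the bracketed form fails with "bad URI" before any request is made.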

You could try to escape it like this:

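uri.gsub(/\/.*/) do |t|
  # Escape everything after the first slash except a small set of safe characters.
  t.gsub(/[^.\/a-zA-Z0-9\-_ ]/) do |c|
    # Percent-encode each byte; c.bytes works on Ruby 1.9+, where c[0] is a String.
    c.bytes.map { |b| "%%%02X" % b }.join
  end.gsub(" ", "+") # spaces become "+"
end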

But be careful: if the website already uses correctly escaped URIs and you escape them a second time, they will no longer point to the same location.
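One way to reduce the risk of double escaping is to leave existing `%XX` sequences alone and encode only the remaining unsafe characters. The sketch below assumes an absolute URL with no query string or fragment; `escape_path_once` is a hypothetical helper name, and the set of safe characters is this sketch's choice (unreserved characters plus `/`).

```ruby
# Hypothetical helper: percent-encode characters that are not allowed in a
# URI path, keeping existing %XX escapes so that an already escaped URL is
# returned unchanged.
# Assumes an absolute URL (scheme://host/path) without query or fragment.
def escape_path_once(url)
  # Split off "scheme://host" so that only the path is rewritten.
  prefix, path = url.match(%r{\A([a-z][a-z0-9+.\-]*://[^/]*)(.*)\z}i).captures
  escaped = path.gsub(/%\h\h|[^A-Za-z0-9\-._~\/]/) do |m|
    if m.length == 3
      m # an existing %XX escape: keep as is
    else
      # Encode every byte of the character as %XX with uppercase hex digits.
      m.bytes.map { |b| format('%%%02X', b) }.join
    end
  end
  prefix + escaped
end

raw  = "http://static.bleachexile.com/manga/gantz/273/Gantz[0273]_p001[Whatever-Illuminati].png"
once = escape_path_once(raw)
puts once
puts escape_path_once(once) == once
```

The result can then be passed to open-uri. On Ruby 3.0 and later that call is written `URI.open(url)`, because plain `open` no longer accepts URLs.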
