how to convert webpage apostrophe (’) to ascii 39 in ruby 1.8.7

Developer (开发者) https://www.devze.com 2022-12-29 18:49 Source: web
That's pretty much it. I'm using Nokogiri to scrape a web page that has &#8217 ; characters in it, and I can't figure out how to do the conversion. Here's what I tried:

str.gsub(/&#8217;/,"'")  
str.gsub("’","'")  
str.gsub("ΓÇÖ","'") # that's how it looks when I do a puts

(In the above, there's no space between the &#8217 and the ";", but if I don't put the space in, SO converts it to an apostrophe -- the cruel, cruel irony!)

I'm sure this is covered somewhere, but couldn't find the solution here or on the web.

TIA


str.gsub("\342\200\231", "'") should work

I got this from:

    '’'.to_s
    => "\342\200\231"
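
To make that concrete, here is a minimal sketch of the substitution wrapped in a helper. The method name `ascii_apostrophes` is made up for illustration, and the extra `gsub` for the literal `&#8217;` entity is an assumption in case the parser hands back the entity text undecoded. The octal escapes are just the three UTF-8 bytes of the curly apostrophe (U+2019), which is why they match on 1.8.7, where strings are plain byte sequences.

```ruby
# Hypothetical helper name; not part of Nokogiri or the Ruby standard library.
def ascii_apostrophes(str)
  str.gsub("\342\200\231", "'").  # UTF-8 bytes of U+2019 (right single quote)
      gsub("&#8217;", "'")        # assumption: entity text left undecoded
end

puts ascii_apostrophes("That\342\200\231s pretty much it")
```

Note that `gsub` returns a new string, so either assign the result or use `gsub!` to modify `str` in place.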

Other HTML characters that may be substituted (from http://ask.metafilter.com/62656/Eliminating-odd-characters-from-web-site ):

"\342\200\176" - "'"  
"\342\200\177" - "'"  
"\342\200\230" - "'"  
"\342\200\231" - "'"  
"\342\200\232" - ','  
"\342\200\233" - "'"  
"\342\200\234" - '"'  
"\342\200\235" - '"'  
"\342\200\041" - '-'  
"\342\200\174" - '-'  
"\342\200\220" - '-'  
"\342\200\223" - '-'  
"\342\200\224" - '--'  
"\342\200\225" - '--'  
"\342\200\042" - '--'  
"\342\200\246" - '...' 
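
A list like this can be applied in one pass by looping over the pairs. The following is only a sketch: the names `REPLACEMENTS` and `asciify` are hypothetical, and it includes just a subset of the entries above, namely ones I could match to a Unicode code point. It uses an array of pairs rather than a hash passed to `gsub`, since the hash form is not available on 1.8.7.

```ruby
# Hypothetical names, for illustration only. Each pair is
# [UTF-8 bytes as octal escapes, ASCII replacement].
REPLACEMENTS = [
  ["\342\200\230", "'"],   # U+2018 left single quote
  ["\342\200\231", "'"],   # U+2019 right single quote
  ["\342\200\234", '"'],   # U+201C left double quote
  ["\342\200\235", '"'],   # U+201D right double quote
  ["\342\200\223", "-"],   # U+2013 en dash
  ["\342\200\224", "--"],  # U+2014 em dash
  ["\342\200\246", "..."]  # U+2026 horizontal ellipsis
]

# Apply every substitution in turn and return the cleaned-up copy.
def asciify(str)
  REPLACEMENTS.inject(str) { |s, (from, to)| s.gsub(from, to) }
end

puts asciify("\342\200\234Hi\342\200\235 \342\200\224 it\342\200\231s done\342\200\246")
```

Keeping the pairs in one table makes it easy to add or drop entries as you run into new characters on the pages you scrape.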