That's pretty much it. I'm using Nokogiri to scrape a web page that has &#8217 ; characters in it, and I can't figure out how to do the conversion. Here's what I tried:
str.gsub(/&#8217;/,"'")
str.gsub("’","'")
str.gsub("ΓÇÖ","'") # that's how it looks when I do a puts
(In the above, there's no space between the &#8217 and the ";", but if I don't put the space in, SO converts it to an apostrophe -- the cruel, cruel irony!)
I'm sure this is covered somewhere, but I couldn't find the solution here or on the web.
TIA
str.gsub("\342\200\231", "'")
should work
I got this from:
'’'.to_s
=> "\342\200\231"
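For what it's worth, those three octal bytes are just the UTF-8 encoding of U+2019 (RIGHT SINGLE QUOTATION MARK). The octal-escaped output above looks like what an older Ruby (1.8) prints; on Ruby 1.9 and later you can write the character as "\u2019" instead. Here is a minimal sketch, assuming the scraped string is already valid UTF-8. The constant and method names are made up for illustration.

```ruby
# encoding: utf-8

# U+2019 RIGHT SINGLE QUOTATION MARK; its UTF-8 bytes are E2 80 99,
# i.e. "\342\200\231" in octal escapes.
RIGHT_SINGLE_QUOTE = "\u2019"

# Hypothetical helper: replace every curly apostrophe with a plain ASCII one.
def straighten_apostrophes(text)
  text.gsub(RIGHT_SINGLE_QUOTE, "'")
end

straighten_apostrophes("It\u2019s here")  # => "It's here"
```

If the string comes back tagged with some other encoding, this may not match, so checking str.encoding first is probably worthwhile.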
Other HTML characters that may need substituting (list from http://ask.metafilter.com/62656/Eliminating-odd-characters-from-web-site):
"\342\200\176" - "'"
"\342\200\177" - "'"
"\342\200\230" - "'"
"\342\200\231" - "'"
"\342\200\232" - ','
"\342\200\233" - "'"
"\342\200\234" - '"'
"\342\200\235" - '"'
"\342\200\041" - '-'
"\342\200\174" - '-'
"\342\200\220" - '-'
"\342\200\223" - '-'
"\342\200\224" - '--'
"\342\200\225" - '--'
"\342\200\042" - '--'
"\342\200\246" - '...'
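Rather than chaining one gsub per character, the substitutions in the list can be done in a single pass with a hash. This is only a sketch: it covers the entries that correspond to well-known Unicode punctuation (U+2010 through U+2026), assumes the input is UTF-8, and uses names I invented for the example.

```ruby
# encoding: utf-8

# Curly/typographic punctuation mapped to ASCII stand-ins.
# Subset of the list above, written as Unicode escapes.
PUNCTUATION_TO_ASCII = {
  "\u2018" => "'",    # left single quote     "\342\200\230"
  "\u2019" => "'",    # right single quote    "\342\200\231"
  "\u201A" => ",",    # low single quote      "\342\200\232"
  "\u201B" => "'",    # reversed single quote "\342\200\233"
  "\u201C" => '"',    # left double quote     "\342\200\234"
  "\u201D" => '"',    # right double quote    "\342\200\235"
  "\u2010" => "-",    # hyphen                "\342\200\220"
  "\u2013" => "-",    # en dash               "\342\200\223"
  "\u2014" => "--",   # em dash               "\342\200\224"
  "\u2015" => "--",   # horizontal bar        "\342\200\225"
  "\u2026" => "..."   # ellipsis              "\342\200\246"
}.freeze

# One regexp that matches any of the keys.
PUNCTUATION_PATTERN = Regexp.union(PUNCTUATION_TO_ASCII.keys)

# Hypothetical helper: gsub looks each match up in the hash.
def asciify_punctuation(text)
  text.gsub(PUNCTUATION_PATTERN, PUNCTUATION_TO_ASCII)
end

asciify_punctuation("\u201CIt\u2019s done\u201D \u2014 finally\u2026")
# => "\"It's done\" -- finally..."
```

Passing a hash as the second argument to gsub makes it easy to add or drop entries without touching the method itself.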