"Invalid multibyte char (US-ASCII)" error for ä, ü, ö, ß which are Ascii!

Developer https://www.devze.com 2023-01-11 19:34 Source: the Web
My application needs to handle some international characters, namely ä, ü, ö and ß, which are still ASCII.

When I tested how Ruby handles these characters, I got this error:

test.rb:1: invalid multibyte char (US-ASCII)
test.rb:1: invalid multibyte char (US-ASCII)

for this code:

puts "i like my chars: ä, ü, ö and ß!"

But the strange thing is that when I use the Interactive Ruby Shell (irb), I get no error!

EDIT: In my application, I'm retrieving the data from an external API. The code above is just an example!


No, those characters aren't in ASCII. ASCII doesn't define any characters above Unicode U+007F (decimal 127). See the Wikipedia ASCII entry for more details.
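A quick way to check this in Ruby is to print each character's code point: every one is above 127, so none of them is ASCII. This is a minimal sketch; the magic comment on its first line only matters on Ruby 1.9, since Ruby 2.0 and later read source files as UTF-8 by default.

```ruby
# encoding: utf-8
# Print each character's Unicode code point. ASCII stops at 127 (U+007F),
# so a code point above that means the character is not ASCII.
chars = %w[ä ü ö ß]
codes = chars.map(&:ord)

chars.zip(codes).each do |ch, cp|
  puts format("%s  U+%04X  decimal %d  ascii_only? %s", ch, cp, cp, ch.ascii_only?)
end
```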

I suspect the interactive Ruby shell is taking the native encoding of your shell, rather than ASCII.
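One way to test this suspicion is to print the encoding Ruby assigned to the script's source text next to the encodings it derives from your environment. The exact values depend on your Ruby version and locale: without a magic comment, Ruby 1.9 reports US-ASCII for the source, while Ruby 2.0 and later report UTF-8.

```ruby
# Encoding Ruby assumes for this script's source text. Without a magic
# comment this is US-ASCII on Ruby 1.9 and UTF-8 on Ruby 2.0+.
puts "source encoding:   #{__ENCODING__}"

# Encodings Ruby derives from the environment's locale settings; the
# suspicion above is that an interactive session works with these instead.
puts "default external:  #{Encoding.default_external}"
puts "locale charmap:    #{Encoding.locale_charmap}"
```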

Do you have a way of specifying the encoding of your .rb file? If so, use it; otherwise, change your scripts so they genuinely are ASCII.


Put the magic comment # coding: utf-8 at the beginning of your script (on the second line if you're using a shebang line).

#!/usr/local/bin/ruby
# coding: utf-8

puts "i like my chars: ä, ü, ö and ß!"
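To confirm that the magic comment took effect, a variant of the script above can print the encoding Ruby attached to the string literal. This is a sketch; the variable name greeting is illustrative only.

```ruby
#!/usr/local/bin/ruby
# coding: utf-8

greeting = "i like my chars: ä, ü, ö and ß!"

# With the magic comment in place, the literal is tagged UTF-8 and its
# bytes are valid in that encoding.
puts greeting.encoding          # UTF-8
puts greeting.valid_encoding?   # true
puts greeting
```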


Those are not ASCII characters. They just happen to be encoded in a single byte in some legacy, ASCII-derived character sets. Most likely your source file is being saved as UTF-8 because it contains non-ASCII characters, and Ruby is correctly flagging those bytes as invalid in US-ASCII.
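The difference shows up clearly when you compare byte sequences. This sketch picks ISO-8859-1 as the example legacy charset (an assumption; other ASCII-derived single-byte sets behave similarly for these letters): there ä is the single byte 228 (0xE4), UTF-8 needs two bytes, and converting to US-ASCII fails outright.

```ruby
# encoding: utf-8
s = "ä"

# In ISO-8859-1 (a legacy, ASCII-derived charset) "ä" is one byte...
latin1_bytes = s.encode("ISO-8859-1").bytes.to_a
puts latin1_bytes.inspect   # [228]

# ...while in UTF-8 it takes two bytes.
utf8_bytes = s.bytes.to_a
puts utf8_bytes.inspect     # [195, 164]

# US-ASCII has no slot for it at all, so conversion raises.
ascii_error =
  begin
    s.encode("US-ASCII")
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end
puts ascii_error.class      # Encoding::UndefinedConversionError
```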

You're only getting away with it at the interactive prompt because your terminal is using some legacy character encoding.


To keep the source file pure ASCII, write the characters as Unicode escape sequences:

puts "i like my chars: \u00E4, \u00FC, \u00F6 and \u00DF!"

This works in Ruby 1.9, at least; I can't remember whether it works in 1.8.
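A short sketch of the escape approach: the script below contains only ASCII, yet the resulting string is UTF-8. The to_unicode_escapes helper is hypothetical (it is not part of Ruby) and is only there to show how such escapes could be generated from existing text.

```ruby
# This file is pure ASCII: the \u escapes below are resolved at parse
# time, and Ruby tags the resulting string as UTF-8.
s = "i like my chars: \u00E4, \u00FC, \u00F6 and \u00DF!"
puts s.encoding   # UTF-8

# Hypothetical helper (not part of Ruby): turn every non-ASCII character
# back into an escape sequence that can live in an ASCII-only file.
def to_unicode_escapes(str)
  str.each_char.map do |c|
    if c.ascii_only?
      c
    elsif c.ord > 0xFFFF
      format("\\u{%X}", c.ord)   # code points beyond U+FFFF need the braced form
    else
      format("\\u%04X", c.ord)
    end
  end.join
end

puts to_unicode_escapes(s)  # i like my chars: \u00E4, \u00FC, \u00F6 and \u00DF!
```

The braced \u{...} form is used only for characters outside the Basic Multilingual Plane, because the four-digit form cannot express them.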


What worked for me was removing the shebang line and running the script directly with ruby19 on Mac OS X:

GreyJewel:Moralytics atma$ head -n1 moralytics 
# coding: utf-8
GreyJewel:Moralytics atma$ which ruby
/opt/local/bin/ruby
GreyJewel:Moralytics atma$ ruby --version
ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin11]
GreyJewel:Moralytics atma$ ruby moralytics 
Το λεξικό που χρησιμοποιούμε έχει: 407117 λέξεις
Το λεξιλόγειο του χρήστη atmat εμπεριέχει: 3943 λέξεις συνολικά.

That worked. With the shebang line in place, I get the error; I can't say for sure why.


Add a magic comment to any script where you use non-ASCII characters. It should go at the top of the script.

# encoding: utf-8

It worked like a charm for me.

Or, if you want to apply it project-wide, you have the option of the magic-encoding gem.
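If you would rather not add a gem, the same effect can be sketched by hand. The add_magic_comment helper below is hypothetical and does not reflect how the magic-encoding gem works internally; it only transforms a string of source code, placing the comment after a shebang line when one is present, because Ruby honors the magic comment only on the first line, or on the second line after a shebang.

```ruby
# Hypothetical helper, not the magic-encoding gem's API: returns the
# source text with a magic comment added in the position Ruby reads it.
MAGIC_COMMENT = "# encoding: utf-8\n"
CODING_RE = /coding[:=]\s*[\w.-]+/

def add_magic_comment(source)
  lines = source.each_line.to_a
  # Leave files alone that already declare an encoding in their first two lines.
  return source if lines.first(2).any? { |l| l.start_with?("#") && l =~ CODING_RE }

  # Ruby honors the comment on line 1, or on line 2 after a shebang.
  if lines.first && lines.first.start_with?("#!")
    lines[0] += "\n" unless lines[0].end_with?("\n")
    lines.insert(1, MAGIC_COMMENT)
  else
    lines.insert(0, MAGIC_COMMENT)
  end
  lines.join
end
```

Applying it project-wide would mean reading each .rb file, passing its contents through the helper, and writing the result back.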
