CSV importing in Rails - invalid byte sequence in UTF-8 with non-English characters

Developer https://www.devze.com 2023-04-03 10:49 Source: Internet
I'm using the CSVMapper gem to import records from a CSV file into a Rails 3 model. (I chose this gem because it was the easiest way I found to do this.)

Anyway, the code I'm using to import the records is the following:

r = import('doc/socios_full.csv') do
    map_to Associate
    after_row lambda{|row, associate| associate.save }
    start_at_row 1
    [group,member,family_relationship_code,family_relationship_description,last_name,names,...]
#The previous line is actually longer, with more atts, but it's been cut to explain the example
end

This works very well, except when the parser encounters non-English characters such as ó, é, ñ, í, or °. That's when I get the following error:

ArgumentError: invalid byte sequence in UTF-8
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1831:in `sub!'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1831:in `block in shift'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1825:in `loop'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1825:in `shift'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1767:in `each'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/csv-mapper-0.5.1/lib/csv-mapper.rb:106:in `each_with_index'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/csv-mapper-0.5.1/lib/csv-mapper.rb:106:in `import'
    from (irb):63
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/railties-3.0.9/lib/rails/commands/console.rb:44:in `start'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/railties-3.0.9/lib/rails/commands/console.rb:8:in `start'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/railties-3.0.9/lib/rails/commands.rb:23:in `<top (required)>'
    from script/rails:6:in `require'
    from script/rails:6:in `<main>'

I'm sure this is the cause, because if I replace these characters the problem goes away until the parser finds another non-English character. The trouble is that the file has 50k records, so searching for every character I can think of and re-running the whole import each time is very time-consuming.

Is there a way to ignore these errors and allow the parser to go on? Or is there an easier way to import this CSV file?


Do it like this:

CSV.foreach(filename, :headers => true, :encoding => 'ISO-8859-1') do |row|

I had the same problem trying to read a CSV file saved from MS Excel. You can specify the encoding as an option; without it, CSV reads the file using Ruby's default external encoding, which is usually UTF-8.
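
To make the encoding option concrete, here is a minimal sketch, assuming the file really is ISO-8859-1 (Latin-1). The `'ISO-8859-1:UTF-8'` form asks CSV to read Latin-1 bytes and transcode them to UTF-8 before parsing. The helper names `read_latin1_csv` and `write_sample_latin1_csv` and the sample data are made up for illustration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: reads a Latin-1 CSV file and returns its rows as
# hashes whose strings have been transcoded to UTF-8.
def read_latin1_csv(path)
  rows = []
  CSV.foreach(path, headers: true, encoding: 'ISO-8859-1:UTF-8') do |row|
    rows << row.to_h
  end
  rows
end

# Hypothetical helper: writes a tiny sample file encoded as ISO-8859-1.
# The \u escapes spell "Núñez" and "José" (made-up data).
def write_sample_latin1_csv
  file = Tempfile.new(['socios', '.csv'])
  file.binmode
  file.write("last_name,names\nN\u00FA\u00F1ez,Jos\u00E9\n".encode('ISO-8859-1'))
  file.close
  file
end

sample = write_sample_latin1_csv
rows = read_latin1_csv(sample.path)
puts rows.first['last_name']
puts rows.first['last_name'].encoding
```

If the file turns out to be in another encoding (Excel on Windows often writes Windows-1252), the same pattern should apply with that encoding name in place of `ISO-8859-1`.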


I solved it with a different approach. This is a much easier way to import CSV files into a Rails 3 model than using an external gem:

    require 'csv'
    CSV.foreach('doc/socios_full.csv') do |row|
        record = Associate.new(
            :group => row[0],
            :member => row[1],
            :family_relationship_code => row[2],
            :family_relationship_description => row[3],
            :last_name => row[4],
            :names => row[5],
            ...
        )
        record.save!
    end

It works flawlessly, even with non-English characters (I just tried it on a 75k-record import file!). Hope it's helpful for someone.


Maybe you can try something like this:

csv_string.force_encoding('ISO-8859-1')
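
One thing worth knowing about this approach: `force_encoding` only relabels the bytes, it does not convert them. A rough sketch, assuming the data is Latin-1 and that UTF-8 is what the rest of the application expects, is to relabel and then call `encode`. The helper name `latin1_bytes_to_utf8` and the sample bytes are hypothetical.

```ruby
# Hypothetical helper: treats the given bytes as ISO-8859-1 and returns a
# new UTF-8 string. The input string is left untouched.
def latin1_bytes_to_utf8(raw)
  raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end

# Made-up sample: "Núñez" as Latin-1 bytes (ú = 0xFA, ñ = 0xF1).
raw = "N\xFA\xF1ez".b

# Labelled as UTF-8, these bytes are not a valid sequence, which is the
# situation that produces "invalid byte sequence in UTF-8".
mislabeled = raw.dup.force_encoding('UTF-8')
puts mislabeled.valid_encoding?   # false

# Relabelled as Latin-1 and then transcoded, the string is valid UTF-8.
fixed = latin1_bytes_to_utf8(raw)
puts fixed.valid_encoding?        # true
puts fixed
```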


The following approach should work in any model, assuming you are confident that the CSV contains the correct header names:

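  def self.import(file)
    # Requires the CSV library (require 'csv') to be loaded.
    CSV.foreach(file.path, headers: true) do |row|
      obj = new
      # Assign every model attribute that has a matching CSV header.
      obj.attributes.each_key do |attribute|
        index = row.headers.index(attribute)
        obj.send("#{attribute}=", row[index]) if index
      end
      obj.save
    end
  end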
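
The header-matching idea can be tried outside Rails with a small sketch like the one below. It assumes a plain `Struct` as a stand-in for the model, so the names `build_from_csv` and `person_class`, and the sample data, are hypothetical and not part of ActiveRecord.

```ruby
require 'csv'

# Hypothetical helper: builds one object per CSV row, copying each column
# whose header matches a member of the given Struct class. Columns with no
# matching member are ignored; members with no matching column stay nil.
def build_from_csv(csv_text, klass)
  CSV.parse(csv_text, headers: true).map do |row|
    obj = klass.new
    klass.members.each do |attribute|
      value = row[attribute.to_s]
      obj[attribute] = value unless value.nil?
    end
    obj
  end
end

# Stand-in for a model with two attributes (made-up example).
person_class = Struct.new(:last_name, :names)

people = build_from_csv("names,last_name,unknown\nAna,Perez,x\n", person_class)
puts people.first.last_name
puts people.first.names
```

Because the lookup is by header name, the column order in the file does not matter, which is the main advantage over indexing with `row[0]`, `row[1]`, and so on.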