Storing CSV data in Ruby hash

Developer https://www.devze.com 2023-01-29 22:34 Source: web

Say I have a CSV file with 4 fields,

ID,name,pay,age

and about 32,000 records.

What's the best way to stick this into a hash in Ruby?

In other words, an example record would look like:

{:rec1 => {:id => "00001", :name => "Bob", :pay => 150, :age => 95 } }

Thanks for the help!


You can use the Excelsior rubygem for this:

require 'excelsior'

csv = ...  # the raw CSV input
result = {}
counter = 1
Excelsior::Reader.rows(csv) do |row|
  # each record gets a :recN key, as in the question's example
  row_hash = result[:"rec#{counter}"] = {}

  row.each do |col_name, col_val|
    row_hash[col_name.intern] = col_val
  end
  counter += 1
end

# do something with result...
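If you'd rather not add a gem, the same :recN-keyed structure can be built with Ruby's standard library CSV (a minimal sketch, parsing from an inline string rather than a file):

```ruby
require 'csv'

data = <<~CSV
  id,name,pay,age
  00001,Bob,150,95
  00002,Fred,151,90
CSV

result = {}
CSV.parse(data, headers: true).each_with_index do |row, i|
  # row.to_h gives string keys; symbolize them to match the question's style
  result[:"rec#{i + 1}"] = row.to_h.transform_keys(&:to_sym)
end

result
# => {:rec1=>{:id=>"00001", :name=>"Bob", :pay=>"150", :age=>"95"}, :rec2=>...}
```

Note that every value comes back as a string unless you convert it yourself.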


Typically we'd want to use an :id field for the Hash key, since it'd be the same as a primary key in a database table:

{"00001" => {:name => "Bob", :pay => 150, :age => 95 } }

This will create a hash shaped like that:

require 'ap'

# Pretend this is CSV data...
csv = [
  %w[ id     name  pay age ],
  %w[ 1      bob   150 95  ],
  %w[ 2      fred  151 90  ],
  %w[ 3      sam   140 85  ],
  %w[ 31999  jane  150 95  ]

]

# pull headers from the first record
headers = csv.shift

# drop the first header, which is the ID. We'll use it as the key so we won't need a name for it.
headers.shift

# loop over the remaining records, adding them to a hash
data = csv.inject({}) do |h, row|
  h[row.shift.rjust(5, '0')] = Hash[headers.zip(row)]
  h
end
ap data

# >> {
# >>     "00001" => {
# >>         "name" => "bob",
# >>          "pay" => "150",
# >>          "age" => "95"
# >>     },
# >>     "00002" => {
# >>         "name" => "fred",
# >>          "pay" => "151",
# >>          "age" => "90"
# >>     },
# >>     "00003" => {
# >>         "name" => "sam",
# >>          "pay" => "140",
# >>          "age" => "85"
# >>     },
# >>     "31999" => {
# >>         "name" => "jane",
# >>          "pay" => "150",
# >>          "age" => "95"
# >>     }
# >> }
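The values in that output are all strings. To get integer pay and age, as in the question's example record, one option is to coerce those two columns while building the hash (a sketch using stdlib CSV instead of ap):

```ruby
require 'csv'

data = "id,name,pay,age\n00001,bob,150,95\n00002,fred,151,90\n"

result = {}
CSV.parse(data, headers: true).each do |row|
  h = row.to_h
  # keep the zero-padded id string as the key; coerce pay/age to integers
  result[h["id"]] = { name: h["name"], pay: h["pay"].to_i, age: h["age"].to_i }
end

result
# => {"00001"=>{:name=>"bob", :pay=>150, :age=>95}, "00002"=>{:name=>"fred", :pay=>151, :age=>90}}
```

CSV's built-in `converters: :numeric` would also turn "00001" into the integer 1, losing the zero padding, which is why the id column is left untouched here.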


Check out the Ruby gem smarter_csv, which parses CSV files and returns an array of hashes, one per row. It can also do chunking, to deal more efficiently with large CSV files, so you can pass the chunks to parallel Resque workers or mass-create records with Mongoid or MongoMapper.

It comes with plenty of useful options; check out the documentation on GitHub.

require 'smarter_csv'
filename = '/tmp/input.csv'
array = SmarterCSV.process(filename)

=>

[ {:id=> 1, :name => "Bob", :pay => 150, :age => 95 } ,
 ...
]

See also:

  • https://github.com/tilo/smarter_csv
  • http://www.unixgods.org/~tilo/Ruby/process_csv_as_hashes.html
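The chunking idea is easy to see in miniature. This sketch uses stdlib CSV and each_slice to produce batches like the ones smarter_csv's chunk_size option hands to your block:

```ruby
require 'csv'

data = "id,name,pay,age\n" + (1..5).map { |i| "#{i},user#{i},100,30" }.join("\n")

chunks = []
CSV.parse(data, headers: true)
   .map { |row| row.to_h.transform_keys(&:to_sym) }
   .each_slice(2) { |batch| chunks << batch }  # hand each batch to a worker here

chunks.length  # => 3 (two rows per batch; the last batch has one)
```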


require 'csv'

Hash[*CSV.read(filename, headers: true).flat_map.with_index { |r, i| ["rec#{i + 1}", r.to_hash] }]
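Unrolled, that one-liner reads like this (a sketch, parsing from a string instead of a file):

```ruby
require 'csv'

csv_text = "id,name,pay,age\n00001,Bob,150,95\n"

pairs = CSV.parse(csv_text, headers: true).flat_map.with_index do |row, i|
  ["rec#{i + 1}", row.to_hash]  # each row becomes a key/value pair
end
result = Hash[*pairs]
# => {"rec1"=>{"id"=>"00001", "name"=>"Bob", "pay"=>"150", "age"=>"95"}}
```

flat_map flattens the ["recN", hash] pairs into one flat array, and Hash[*...] zips them back into keys and values.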
