
How do I parse data from a tab-separated TXT file?

开发者 (Developer), https://www.devze.com, 2023-03-16 10:23, source: internet
I am using Ruby 1.8.7 and Rails 2.3.8. I want to parse the data from a TXT dump file whose fields are separated by tabs.

Some rows in this dump contain CSS properties that look like invalid data.

When I run my code using the FasterCSV gem:

  # "\t" must be double-quoted: '\t' is a literal backslash followed by "t", not a tab
  FasterCSV.foreach(txt_file, :quote_char => '"', :col_sep => "\t", :row_sep => :auto, :headers => :first_row) do |row|
    col = row.fields
    puts col[15]
  end

the console shows the error "Illegal quoting on line 38." Can anyone suggest how to skip the rows that contain invalid data and continue loading the remaining rows?


Here's one way to do it. We drop to a lower level, calling shift to parse one row at a time and silencing the MalformedCSVError exception so the loop continues with the next row. The drawback is that the loop doesn't look very nice; if anyone can improve it, you're welcome to edit the code.

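FasterCSV.open(filename, :quote_char => '"', :col_sep => "\t", :headers => true) do |csv|
  row = true
  while row
    begin
      row = csv.shift   # parse the next row; raises MalformedCSVError on illegal quoting
      break unless row  # shift returns nil at end of file

      # Do things with the row here...
    rescue FasterCSV::MalformedCSVError
      next              # ignore the malformed row and carry on with the next one
    end
  end
end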
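A possibly tidier variant, sketched for Ruby 1.9.2 or later (where the standard CSV library is the former FasterCSV): parse each physical line on its own with CSV.parse_line and rescue CSV::MalformedCSVError per line, so one bad row cannot disturb the reader for the rows after it. This assumes no quoted field spans several lines; each_valid_row is a hypothetical helper name, not part of any library.

```ruby
require "csv"

# Hypothetical helper: parses a tab-separated source one physical line at a
# time, yields the fields of every line that parses cleanly, and returns the
# line numbers of lines that raised CSV::MalformedCSVError.
# Assumption: no quoted field spans more than one line.
def each_valid_row(source)
  skipped = []
  source.each_line.with_index(1) do |line, lineno|
    begin
      fields = CSV.parse_line(line, col_sep: "\t")
    rescue CSV::MalformedCSVError
      skipped << lineno # remember the bad line, then move on to the next one
      next
    end
    yield fields if fields # nil would mean nothing was parsed
  end
  skipped
end
```

Called on a file, for example File.open(txt_file) { |f| each_valid_row(f) { |fields| puts fields[15] } }, it returns the physical line numbers it skipped, which is handy for logging rejected rows. The header line is yielded like any other row, so drop it yourself if needed.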


Just read the file as a regular text file (not with FasterCSV) and split each line on \t as you already do; that should work.
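A minimal sketch of that approach, assuming Ruby 1.9 or later and that no field contains a tab or a line break (the TSV rule); parse_tsv_lines is a hypothetical name. Since no CSV quoting rules apply, a double quote inside a CSS value is kept as plain text instead of triggering "Illegal quoting".

```ruby
# Hypothetical helper: reads tab-separated lines from any source that
# responds to each_line (a File, an IO, or a String) and splits them on "\t".
# No quote handling is applied, so '"' characters stay in the field text.
def parse_tsv_lines(source)
  # chomp drops the line ending; the -1 limit keeps trailing empty fields,
  # which String#split would otherwise discard.
  source.each_line.map { |line| line.chomp.split("\t", -1) }
end
```

For example, File.open(txt_file) { |f| parse_tsv_lines(f).drop(1).each { |cols| puts cols[15] } } skips the header row and prints the 16th column. For very large dumps, iterating with File.foreach instead of building the whole array would keep memory use flat.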


So the problem is that TSV files don't have a quote character. The specification simply states that fields may not contain tabs.

The CSV library doesn't really support this use case. I've worked around it by specifying a quote character that I know won't appear in my data. For example:

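# Targets Ruby 1.9+, where CSV is the former FasterCSV library.
# On 1.9, "# encoding: utf-8" must head the file because '☎' is non-ASCII.
CSV.foreach(txt_file, :quote_char => '☎', :col_sep => "\t") do |row|
  puts row[15]
end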
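The same workaround, sketched for Ruby 2.x and later with keyword options and a header row; parse_tsv_with_unused_quote is a hypothetical name, and the sketch assumes "☎" never occurs in the data. Because the chosen quote character never appears, CSV never enters quoted mode and every double quote in a CSS value is read as ordinary text.

```ruby
require "csv"

# Hypothetical helper for the "unused quote character" workaround.
# Assumes "☎" never appears in the data, so CSV never sees a quoted field
# and treats every '"' as an ordinary character.
def parse_tsv_with_unused_quote(text)
  CSV.parse(text, col_sep: "\t", quote_char: "☎", headers: true)
end

# Example: a CSS value containing double quotes no longer raises
# CSV::MalformedCSVError ("Illegal quoting").
sample = "id\tstyle\n1\tcolor: red\n2\tfont-family: \"Arial\", sans-serif\n"
parse_tsv_with_unused_quote(sample).each do |row|
  puts row["style"]
end
```

The trade-off is that genuinely quoted TSV fields (if a producer ever emits them) keep their quote marks, which is usually what you want for raw dumps.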