I have text data files with several rows. Some of those rows contain wrong data, which I need to replace with the values from the rows that have the correct data. For example:
Col1 Col2 Col3 Col4 .......
A1?% A foo fooo .......
B€(2 B .................
C&6 Z .................
A?04 Y .................
B++3 Q .................
C!5 C .................
D*9 D .................
The actual data is different, but this is a simplified version of it. As you can see, Col2 does not always agree with Col1: A1 has A in Col2, but A4 has Y, and so on. The remaining columns (Col3, Col4, ...) depend on Col2. So whenever Col1 starts with A (A1, A2, A3, etc.), I need to check that Col2 is A. If it is not, I have to update Col2, Col3, ... from the row where it is A.
How can this be accomplished in Perl? I know this kind of operation can be done in a database with an UPDATE statement, but I don't have that luxury here and have to do it programmatically.
Edit: The files are tab-delimited, and the data are strings that can contain any alphanumeric or ASCII character.
The way I would do this is to open an input file handle and an output file handle, then go through the file line by line checking column one; if it's fine, just plop the line into the output as it is.
If it does need to change, I would build a new line with the necessary changes and write that to the output file instead.
This is a simple approach; while not the greatest or most elegant, it would give you what you need quickly.
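A minimal sketch of that line-by-line filter follows. The names `row_is_ok`, `filter_rows` and the `$fix_row` callback are hypothetical, and the "row is fine" rule (Col2 equals the first character of Col1) is an assumption taken from the question's example. In-memory filehandles stand in for the real input and output files so the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical rule, based on the question's example: a row is fine
# when Col2 equals the first character of Col1.
sub row_is_ok {
    my ( $col1, $col2 ) = @_;
    return substr( $col1, 0, 1 ) eq $col2;
}

# Copy rows from $in to $out; rows that fail row_is_ok are passed
# through the caller-supplied $fix_row callback before being written.
sub filter_rows {
    my ( $in, $out, $fix_row ) = @_;

    my $header = <$in>;
    print {$out} $header if defined $header;    # header goes out untouched

    while ( my $line = <$in> ) {
        chomp $line;
        my @cols = split /\t/, $line, -1;       # -1 keeps trailing empty fields
        if ( @cols >= 2 && !row_is_ok( @cols[ 0, 1 ] ) ) {
            @cols = $fix_row->(@cols);
        }
        print {$out} join( "\t", @cols ), "\n";
    }
}

# Demo with in-memory handles; in real use these would be the old
# file and a separate output file opened with open().
my $input = "Col1\tCol2\nA1\tA\nA4\tY\n";
open my $in, '<', \$input or die "Cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "Cannot open output: $!";

# Placeholder fix: only sets Col2 from Col1's first character.
filter_rows( $in, $out, sub { my @c = @_; $c[1] = substr( $c[0], 0, 1 ); return @c } );
close $out;
print $result;
```

Writing to a separate output file (and renaming it afterwards) avoids clobbering the input while it is still being read.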
Populate a hash where the key is Col2 (A, B, C, etc.) and the value is the rest of the columns (Col3, Col4, etc.). Only add an entry when Col1 and Col2 match the way you want.
Then, when writing out the file, if Col1 and Col2 do not match, look up the first character of Col1 in the hash. That gives you the Col3, Col4, ... values to insert.
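Here is a sketch of that hash lookup, assuming chomped, tab-delimited lines with the header first and no blank lines, and assuming a row is "correct" when Col2 equals the first character of Col1. `fix_rows` is a hypothetical name. Because a correct row can appear after a wrong one (C&6 comes before C!5 in the example), it makes two passes over the rows: one to fill the hash, one to rewrite.

```perl
use strict;
use warnings;

# Hypothetical helper: $lines is an array ref of chomped, tab-delimited
# lines, header first. Returns an array ref of the corrected lines.
sub fix_rows {
    my ($lines) = @_;
    my ( $header, @rows ) = @$lines;
    my %good;    # first character of Col1 => [Col2, Col3, ...] of a correct row

    # Pass 1: remember the first correct row seen for each key.
    for my $line (@rows) {
        my ( $col1, @rest ) = split /\t/, $line, -1;
        my $key = substr $col1, 0, 1;
        $good{$key} //= [@rest] if defined $rest[0] && $rest[0] eq $key;
    }

    # Pass 2: replace Col2 onward on rows that disagree with their key.
    my @out = ($header);
    for my $line (@rows) {
        my ( $col1, @rest ) = split /\t/, $line, -1;
        my $key = substr $col1, 0, 1;
        if ( defined $rest[0] && $rest[0] ne $key && exists $good{$key} ) {
            @rest = @{ $good{$key} };
        }
        push @out, join( "\t", $col1, @rest );
    }
    return \@out;
}

my @lines = (
    "Col1\tCol2\tCol3",
    "A1\tA\tfoo",
    "C6\tZ\tqux",
    "A4\tY\tbaz",
    "C5\tC\tok",
);
print "$_\n" for @{ fix_rows( \@lines ) };
```

With this input, A4 becomes `A4 A foo` and C6 becomes `C6 C ok`, even though the correct C row comes later in the file.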
Use a CSV processor!
At the very least Text::CSV, or relatives like Text::CSV_XS (faster) or Text::CSV::Encoded (e.g. for UTF-8).
DBD::CSV provides SQL.
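As a sketch of the Text::CSV route for tab-delimited files: `sep_char` switches the separator to a tab, and quoting is turned off (`quote_char` and `escape_char` set to `undef`) on the assumption that the files use bare tabs with no quoting. Text::CSV is a CPAN module, not part of core Perl. In-memory handles again stand in for real files; the loop only round-trips rows, and the fix-up logic would go where the comment says.

```perl
use strict;
use warnings;
use Text::CSV;    # from CPAN; uses Text::CSV_XS when it is installed

my $csv = Text::CSV->new({
    sep_char    => "\t",
    quote_char  => undef,    # assumption: fields are never quoted
    escape_char => undef,
    binary      => 1,        # allow non-ASCII bytes in fields
    eol         => "\n",
    auto_diag   => 1,
}) or die "Cannot use CSV: " . Text::CSV->error_diag;

my $input = "Col1\tCol2\tCol3\nA1\tA\tfoo\nA4\tY\tbaz\n";
open my $in, '<', \$input or die "Cannot open input: $!";
my $output = '';
open my $out, '>', \$output or die "Cannot open output: $!";

while ( my $row = $csv->getline($in) ) {
    # $row is an array ref: [Col1, Col2, Col3, ...]; fix it up here
    $csv->print( $out, $row );
}
close $out;
print $output;
```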
Below is a skeleton of a basic program structure to allow you to do this. If I knew what you wanted to do I could be a lot more helpful.
I had initially made the simplest possible guess and treated your input files as fixed-width columns (widths 7, 6, *). Since you have told me that they are tab-delimited, I have changed the code that splits the data into fields.
```perl
use autodie;
use strict;
use warnings;
use English qw<$INPUT_LINE_NUMBER>;

my %data;
my $line_no;

open( my $h, '<', 'good_file.dat' );
while ( <$h> ) {
    chomp;    # drop the newline so it doesn't end up inside $data
    my ( $col1, $col2, $data ) = split( /\t+/, $_, 3 );
    # next unless index( $col1, 'A' ) == 0;
    $line_no = $INPUT_LINE_NUMBER;
    my $rec = {
        col1 => $col1,
        col2 => $col2,
        data => $data,
        line => $line_no,
    };
    # index each record both by "Col1-Col2" and by its line number
    push( @{ $data{"$col1-$col2"} }, $rec );
    $data{$line_no} = $rec;
}
close $h;

open( $h, '<', 'old_file.dat' );
while ( <$h> ) {
    chomp;
    my ( $col1, $col2, $data ) = split( /\t+/, $_, 3 );
    ...    # compare against %data and update the records here
}
close $h;
```
The following is just a way you could print your values back into the file.
```perl
open( $h, '>', 'old_file.dat' );
# %data also holds each record under its line number
foreach my $rec ( grep {; defined } @data{ 1 .. $line_no } ) {
    printf $h "%s\t%s\t%s\n", @$rec{qw<col1 col2 data>};
}
close $h;
```
But you really haven't given anyone enough information to help you.