I have the following CSV files, I want to merge these into a single CSV
01.csv
apples,48,12,7
pear,17,16,2
orange,22,6,1
02.csv
apples,51,8,6
grape,87,42,12
pear,22,3,7
03.csv
apples,11,12,13
grape,81,5,8
pear,11,5,6
04.csv
apples,14,12,8
orange,5,7,9
Desired output:
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,87,42,12,81,5,8,,,
pear,17,16,2,22,3,7,11,5,6,,,
orange,22,6,1,,,,,,5,7,9
Can anyone provide guidance on how to achieve this? Preferably using Powershell but open to alternatives like Perl if that's easier.
Thanks Pantik, your code's output is close to what I want:
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,87,42,12,81,5,8
orange,22,6,1,5,7,9
pear,17,16,2,22,3,7,11,5,6
Unfortunately I need "placeholder" commas in place for w开发者_如何学Pythonhen the entry is NOT present in a CSV file, e.g. orange,22,6,1,,,,,,5,7,9 rather than orange,22,6,1,5,7,9
UPDATE: I would like these parsed in order of the filenames, e.g.:
$myFiles = @(gci *.csv) | sort Name
foreach ($file in $myFiles){
regards ted
Here is my Perl version:
use strict;
use warnings;
my $filenum = 0;
my ( %fruits, %data );
foreach my $file ( sort glob("*.csv") ) {
$filenum++;
open my $fh, "<", $file or die $!;
while ( my $line = <$fh> ) {
chomp $line;
my ( $fruit, @values ) = split /,/, $line;
$fruits{$fruit} = 1;
$data{$filenum}{$fruit} = \@values;
}
close $fh;
}
foreach my $fruit ( sort keys %fruits ) {
print $fruit, ",", join( ",", map { $data{$_}{$fruit} ? @{ $data{$_}{$fruit} } : ",," } 1 .. $filenum ), "\n";
}
Which gives me:
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
So do you have a typo for grape or i have misunderstood something?
Ok, gangabass solution works, and is cooler than mine, but I'll add mine anyway. It is slightly stricter, and preserves a data structure that can be used as well. So, enjoy. ;)
use strict;
use warnings;
opendir my $dir, '.' or die $!;
my @csv = grep (/^\d+\.csv$/i, readdir $dir);
closedir $dir;
# sorting numerically based on leading digits in filename
@csv = sort {($a=~/^(\d+)/)[0] <=> ($b=~/^(\d+)/)[0]} @csv;
my %data;
# To print empty records we first need to know all the names
for my $file (@csv) {
open my $fh, '<', $file or die $!;
while (<$fh>) {
if (m/^([^,]+),/) {
@{ $data{$1} } = ();
}
}
close $fh;
}
# Now we can fill in values
for my $file (@csv) {
open my $fh, '<', $file or die $!;
my %tmp;
while (<$fh>) {
chomp;
next if (/^\s*$/);
my ($tag,@values) = split (/,/);
$tmp{$tag} = \@values;
}
for my $key (keys %data) {
unless (defined $tmp{$key}) {
# Fill in empty values
@{$tmp{$key}} = ("","","");
}
push @{ $data{$key} }, @{ $tmp{$key} };
}
}
&myreport;
sub myreport {
for my $key (sort keys %data) {
print "$key," . (join ',', @{$data{$key}}), "\n";
}
}
Powershell:
$produce = "apples","grape","orange","pear"
$produce_hash = @{}
$produce | foreach-object {$produce_hash[$_] = @(,$_)}
$myFiles = @(gci *.csv) | sort Name
foreach ($file in $myFiles){
$file_hash = @{}
$produce | foreach-object {$file_hash[$_] = @($null,$null,$null)}
get-content $file | foreach-object{
$line = $_.split(",")
$file_hash[$line[0]] = $line[1..3]
}
$produce | foreach-object {
$produce_hash[$_] += $file_hash[$_]
}
}
$ofs = ","
$out = @()
$produce | foreach-object {
$out += [string]$produce_hash[$_]
}
$out | out-file "outputfile.csv"
gc outputfile.csv
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
Should be easy to modify for additional items. Just add them to the $produce array.
Second Powershell solution (as requested)
$produce = @()
$produce_hash = @{}
$file_count = -1
$myFiles = @(gci 0*.csv) | sort Name
foreach ($file in $myFiles){
$file_count ++
$file_hash = @{}
get-content $file | foreach-object{
$line = $_.split(",")
if ($produce -contains $line[0]){
$file_hash[$line[0]] += $line[1..3]
}
else {
$produce += $line[0]
$file_hash[$line[0]] = @(,$line[0]) + (@($null) * 3 * $file_count) + $line[1..3]
}
}
$produce | foreach-object {
if ($file_hash[$_]){$produce_hash[$_] += $file_hash[$_]}
else {$produce_hash[$_] += @(,$null) * 3}
}
}
$ofs = ","
$out = @()
$produce_hash.keys | foreach-object {
$out += [string]$produce_hash[$_]
}
$out | out-file "outputfile.csv"
gc outputfile.csv
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
you have to parse the files, I don't see easier way hot to do it
solution in powershell:
UPDATE: ok, adjusted a bit - hopefully understandable
$items = @{}
$colCount = 0 # total amount of columns
# loop through all files
foreach ($file in (gci *.csv | sort Name))
{
$content = Get-Content $file
$itemsToAdd = 0; # columns added by this file
foreach ($line in $content)
{
if ($line -match "^(?<group>\w+),(?<value>.*)")
{
$group = $matches["group"]
if (-not $items.ContainsKey($group))
{ # in case the row doesn't exists add and fill with empty columns
$items.Add($group, @())
for($i = 0; $i -lt $colCount; $i++) { $items[$group] += "" }
}
# add new values to correct row
$matches["value"].Split(",") | foreach { $items[$group] += $_ }
$itemsToAdd = ($matches["value"].Split(",") | measure).Count # saves col count
}
}
# in case that file didn't contain some row, add empty cols for those rows
$colCount += $itemsToAdd
$toAddEmpty = @()
$items.Keys | ? { (($items[$_] | measure).Count -lt $colCount) } | foreach { $toAddEmpty += $_ }
foreach ($key in $toAddEmpty)
{
for($i = 0; $i -lt $itemsToAdd; $i++) { $items[$key] += "" }
}
}
# output
Remove-Item "output.csv" -ea 0
foreach ($key in $items.Keys)
{
"$key,{0}" -f [string]::Join(",", $items[$key]) | Add-Content "output.csv"
}
Output:
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
Here is a more consise way how to do it. However, it still doesn't add the commas when the item is missing.
Get-ChildItem D:\temp\a\ *.csv |
Get-Content |
ForEach-Object -begin { $result=@{} } -process {
$name, $otherCols = $_ -split '(?<=\w+),'
if (!$result[$name]) { $result[$name] = @() }
$result[$name] += $otherCols
} -end {
$result.GetEnumerator() | % {
"{0},{1}" -f $_.Key, ($_.Value -join ",")
}
} | Sort
精彩评论