We receive fixed length datasets from a client that look something like this:
1 SOMEFILE 20110922
2 20110101ABC999
3 JOHN SMITH 19800201
4 5000000 1000
2 20060101DEF999
3 JANE KOTZE 19811001
4 200000 800
5 5200000 1800
where the number in the first position on each line indicates the type of information in the line. The types are:
1 Header record (only appears once, in the first line)
2 Contract record
3 Person record
4 Amounts record
5 Trailer record (only appears once, in the last line)
The information in 2, 3 and 4 all actually relate to one record, and I need to find a way at upload stage to combine them into one. There are no identifiers that explicitly specify which combinations of 2, 3 and 4 belong with one another, but in all cases they have been ordered in the raw data to appear directly below one another.
What I need is a preprocessing step that will take the original data and then combine the correct 2,3 and 4 lines into one record (and then output again as a txt file), like this:
20110101ABC999JOHN SMITH 198002015000000 1000
20060101DEF999JANE KOTZE 19811001200000 800
I have thought of bcp'ing into SQL (or even just using Access) and assigning an auto-incremented integer as PK. i.e:
PK Type Record
1 1 SOMEFILE 20110922
2 2 20110101ABC999
3 3 JOHN SMITH 19800201
4 4 5000000 1000
5 2 20060101DEF999
6 3 JANE KOTZE 19811001
7 4 200000 800
8 5 5200000 1800
and then doing something like:
select
type2.[record]+type3.[record]+type4.[record]
from
(select [record] from uploaded where [type]=2) as type2
join
(select [record] from uploaded where [type]=3) as type3
on type2.PK + 1 = type3.PK
join
(select [record] from uploaded where [type]=4) as type4
on type2.PK + 2 = type4.PK
But what I am worried about is that this is entirely dependent on SQL Server assigning the PKs in the order that the data appears in die input file; I am not sure that this would necessarily be the case.
Does anyo开发者_运维问答ne know? Or know of a better way to do this?
Thanks
KarlEdit: added second solution
Solution 1:
You can not be sure regarding SQL Server insert order.
You have to do some text file processings before importing your data in SQL Server. For example, you can use PowerShell
to add a PK
into file thus:
$rows = GET-CONTENT -PATH D:\BD\Samples\MyData.txt
for($i=0; $i -lt $rows.length; $i++)
{
$row = $rows[$i]
$temp=("00000"+[string]($i+1))
$rows[$i]=$temp.substring($temp.length-5)+" "+$row
}
SET-CONTENT -PATH D:\BD\Samples\MyDataResults.txt $rows
Before (MyData.txt
content):
1 SOMEFILE 20110922
2 20110101ABC999
3 JOHN SMITH 19800201
4 5000000 1000
2 20060101DEF999
3 JANE KOTZE 19811001
4 200000 800
5 5200000 1800
After PowerShell
processing (MyDataResults.txt
content):
00001 1 SOMEFILE 20110922
00002 2 20110101ABC999
00003 3 JOHN SMITH 19800201
00004 4 5000000 1000
00005 2 20060101DEF999
00006 3 JANE KOTZE 19811001
00007 4 200000 800
00008 5 5200000 1800
In both PS scripts I assume you can insert max. 99999 rows.
Solution 2:
$rows = GET-CONTENT -PATH D:\BD\Samples\MyData.txt
$rows[0]="00000 "+$row
$rows[$rows.length-1]="99999 "+$row
$groupid=0
for($i=1; $i -lt $rows.length-1; $i=$i+3)
{
$groupid++
$row = $rows[$i]
$temp=("00000"+[string]$groupid)
$rows[$i]=$temp.substring($temp.length-5)+" "+$row
$row = $rows[$i+1]
$temp=("00000"+[string]$groupid)
$rows[$i+1]=$temp.substring($temp.length-5)+" "+$row
$row = $rows[$i+2]
$temp=("00000"+[string]$groupid)
$rows[$i+2]=$temp.substring($temp.length-5)+" "+$row
}
SET-CONTENT -PATH D:\BD\Samples\MyDataResults2.txt $rows
Results:
00000 4 200000 800
00001 2 20110101ABC999
00001 3 JOHN SMITH 19800201
00001 4 5000000 1000
00002 2 20060101DEF999
00002 3 JANE KOTZE 19811001
00002 4 200000 800
99999 4 200000 800
精彩评论