I have a HUGE file with a lot of HL7 segments. It must be split into 1000 (or so ) smaller files. Since it has HL7 data, there is a pattern (logic) to go by. Each data chunk starts with "MSH|" and ends when next segment starts with "MSH|".
The script must be windows (c开发者_Python百科md) based or VBS as I cannot install any software on that machine.
File structure:
MSH|abc|123|....
s2|sdsd|2323|
...
..
MSH|ns|43|...
...
..
..
MSH|sdfns|4343|...
...
..
asds|sds
MSH|sfns|3|...
...
..
as|ss
File in above example, must be split into 2 or 3 files. Also, the files comes from UNIX, so newlines must remain as they are in the source file.
Any help?
This is a sample script that I used to parse large hl7 files into separate files with the new file names based on the data file. Uses REBOL which does not require installation ie. the core version does not make any registry entries.
I have a more generalised version that scans an incoming directory and splits them into single files and then waits for the next file to arrive.
Rebol [
file: %split-hl7.r
author: "Graham Chiu"
date: 17-Feb-2010
purpose: {split HL7 messages into single messages}
]
fn: %05112010_0730.dat
outdir: %05112010_0730/
if not exists? outdir [
make-dir outdir
]
data: read fn
cnt: 0
filename: join copy/part form fn -4 + length? form fn "-"
separator: rejoin [ newline "MSH"]
parse/all data [
some [
[ copy result to separator | copy result to end ]
(
write to-file rejoin [ outdir filename cnt ".txt" ] result
print "Got result"
?? result
cnt: cnt + 1
)
1 skip
]
]
HL7 has a lot of segments - I assume that you know that your file has only MSH segments. So, have you tried parsing the file for the string "(newline)MSH|"? Just keep a running buffer and dump that into an output file when it gets too big.
精彩评论