Parsing using awk or sed in Unix

Developer https://www.devze.com 2023-01-14 02:00 Source: Internet
I have multiple files with hundreds of thousands of records in the following format:

2010/08/10 10:07:52|TrainZoom|1393|38797|MyCustomerAPI->,mask = ZPTA,TransId = 1281460071578,testing :-> , capture -> : , IMDB = 9113290830, equipmentName = GMT, technologyName = RMS,,,)|

The fields are separated by pipes, and the last field contains many sub-fields separated by commas. What I want to do is store all these values in a database. For the above example, my DB fields would be date-time, process-name, thread-id, sequence-num, API name, transId, imdb, equipmentName, technologyName.

The API name is not in a fixed format, but I can assume that it has a fixed length.

Please let me know if I can do this in Unix using awk or sed.

Thanks,


Just for fun:

# "mytable" is a placeholder table name; the backticks are needed because the column names contain hyphens
sed -e 's/\([^|]*\)|\([^|]*\)|\([^|]*\)|\([^|]*\)|/insert into mytable set `date-time`=\1, `process-name`=\2, `thread-id`=\3, `sequence-num`=\4, /' \
    -e 's/,[^=]*,/,/g' \
    -e 's/ \([,=]\)/\1/g' \
    -e 's/\([,=]\) /\1/g' \
    -e 's/\([^,]*\)=\([^,]*\),/\1="\2",/g' \
    -e 's/"[^"]*$/";/' file.txt | mysql dbname


Yes, you can do at least the parsing in awk:

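awk -F"|" 'BEGIN { q = "\047" }   # \047 is a single quote
{
    printf "insert into table values("
    # pass the data as arguments so a "%" in a field cannot break the format
    printf "%s%s%s,%s%s%s,", q, $1, q, q, $2, q
    # fill up the rest yourself
}
' file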

Once completed, the above produces a series of insert statements. You can either pipe them to your database client, or save them to a .sql file and then use the database client to execute it.
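
As a rough illustration of how the remaining fields could be filled in, here is a minimal sketch that looks up the "key = value" pairs inside the fifth field by name. The names log_to_sql, quote, value_of and the table log_records are made up for this example, and the column order is an assumption, so adjust them to the real schema.

```shell
#!/bin/sh
# Sketch only: log_to_sql, quote, value_of and the table name log_records
# are hypothetical names for illustration, not something from the question.
log_to_sql() {
    awk -F'|' '
    BEGIN { q = "\047" }                      # \047 is a single quote

    # Wrap s in single quotes, doubling any single quote inside it.
    function quote(s) { gsub(q, q q, s); return q s q }

    # Return the value of "key = value" inside the comma-separated string s.
    function value_of(s, key,    n, i, parts, kv) {
        n = split(s, parts, ",")
        for (i = 1; i <= n; i++) {
            if (split(parts[i], kv, "=") != 2) continue
            gsub(/^[ \t]+|[ \t]+$/, "", kv[1])
            gsub(/^[ \t]+|[ \t]+$/, "", kv[2])
            if (kv[1] == key) return kv[2]
        }
        return ""
    }

    {
        api = $5
        sub(/,.*/, "", api)                   # API name: text before the first comma
        printf "INSERT INTO log_records VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s);\n",
            quote($1), quote($2), quote($3), quote($4), quote(api),
            quote(value_of($5, "TransId")), quote(value_of($5, "IMDB")),
            quote(value_of($5, "equipmentName")), quote(value_of($5, "technologyName"))
    }'
}
```

The function reads log lines on standard input, so something like `log_to_sql < file > inserts.sql` would write one statement per record. Looking fields up by key rather than by position means a record with a missing or reordered pair yields an empty value instead of a shifted one.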


I would use Perl or Python to do the job, as they provide DB connectors for different flavors of databases. I don't see what you want awk or sed to do, as they are "text to text transformers".

If you want some sample code, you should extend your question to clarify your goal (for example, text => SQL script?) and the target DB.


You can do the parsing, at least, directly in bash. I'm not sure what you want to do with the values, but this script parses each line and displays the individual fields:

#!/bin/bash

while IFS='|' read -r dateTime processName threadId sequenceNum other; do
    IFS=',' read -r apiName mask transId testing capture imdb equipmentName technologyName other <<< "$other"

    echo "date-time:       $dateTime"
    echo "process-name:    $processName"
    echo "thread-id:       $threadId"
    echo "sequence-num:    $sequenceNum"
    echo "api-name:        $apiName"
    echo "trans-id:        $transId"
    echo "imdb:            $imdb"
    echo "equipment-name:  $equipmentName"
    echo "technology-name: $technologyName"
done
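
Note that with the script above, a variable such as $transId still holds the whole "TransId = 1281460071578" text, including the key and any surrounding spaces. A small helper along these lines could reduce each field to its bare value; field_value is a hypothetical name, and the sketch assumes the value itself never contains an "=".

```shell
#!/bin/bash
# Sketch only: field_value is a made-up helper name.
# Strips an optional "name =" prefix and surrounding whitespace from one field.
field_value() {
    local v=$1
    v=${v#*=}                        # drop everything up to the first "=", if any
    v=${v#"${v%%[![:space:]]*}"}     # trim leading whitespace
    v=${v%"${v##*[![:space:]]}"}     # trim trailing whitespace
    printf '%s\n' "$v"
}
```

Inside the loop it could be used as `transId=$(field_value "$transId")`. Fields without an "=" (such as the API name) pass through with only the whitespace trimmed.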