String tokenisation algorithm won't tokenise

Developer https://www.devze.com 2023-01-18 21:58 Source: Internet
Morning all, I am writing a bash script to extract the values of certain XML tags from all files in a given directory. I have decided to do this by tokenising each line and returning the relevant token. The problem is that it isn't tokenising correctly and I can't quite work out why. Here is the smallest example I could make that reproduces the issue:

#!/bin/bash
for file in `ls $MY_DIRECTORY`
do
    for line in `cat $MY_DIRECTORY/$file`
    do
        LOCALIFS=$IFS
        IFS=<>\"

        TOKENS=( $line )
        IFS=$LOCALIFS
        echo "Token 0: ${TOKENS[0]}" 
        echo "Token 1: ${TOKENS[1]}" 
        echo "Token 2: ${TOKENS[2]}" 
        echo "Token 3: ${TOKENS[3]}" 

    done
 done

I'm guessing the issue is to do with my fiddling with IFS inside a loop which itself uses IFS (i.e. the cat operation), but this has never been a problem before.

Any ideas?

Thanks, Rik


Use a better tool to parse XML. Ideally that should be a real parser, but if your requirement is simple and you know how your XML is structured, simple string manipulation might suffice. For example, given the following XML file, suppose you want to get the value of tag3:

$  cat file
blah
<tag1>value1 </tag1>
<tag2>value2 </tag2>
<tag3>value3
</tag3>
blah

$ awk -vRS="</tag3>" '/tag3/{ gsub(/.*tag3>/,"");print}' file
value3

So, to iterate over the XML files in your directory:

for file in *.xml
do
  value="$(awk -vRS="</tag3>" '/tag3/{ gsub(/.*tag3>/,"");print}' "$file" )"
  echo "$value"
done 
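
If you would rather keep the tokenising approach in plain bash, two things in the question's script look like the likely culprits. Unquoted, `IFS=<>\"` is parsed by the shell as an empty assignment `IFS=` followed by a `<>` redirection on a file named `"`, so no splitting happens at all. Separately, `for line in $(cat ...)` iterates over whitespace-separated words rather than whole lines. Below is a minimal sketch of one way around both, assuming bash and a file whose tags each sit on a single line; the function names `tokenise_line` and `print_tokens` are made up for illustration and are not part of the original script.

```shell
#!/bin/bash

# Hypothetical helper: split one line on '<', '>' and '"' and print one
# token per output line. Prefixing the assignment to `read` limits the
# IFS change to that single command, so there is nothing to save/restore.
tokenise_line() {
    local line=$1
    local -a tokens
    # The separators are quoted so the shell does not treat <> as a redirection.
    IFS='<>"' read -r -a tokens <<< "$line"
    printf '%s\n' "${tokens[@]}"
}

# Hypothetical helper: feed a file to tokenise_line one whole line at a time.
# `IFS= read -r` keeps leading/trailing spaces and backslashes intact; the
# `|| [[ -n $line ]]` part also handles a last line with no trailing newline.
print_tokens() {
    local file=$1 line
    while IFS= read -r line || [[ -n $line ]]; do
        tokenise_line "$line"
    done < "$file"
}
```

Calling `print_tokens "$MY_DIRECTORY/$file"` inside a `for file in "$MY_DIRECTORY"/*` style loop would then replace the inner loop of the question's script. Note that a line starting with `<` produces an empty first token, so for a line like `<tag1>value1 </tag1>` the tag name ends up at index 1 and the value at index 2.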