AWK Script - What does this script do?

Developer https://www.devze.com 2023-03-11 15:21 Source: web
I need to duplicate processing of this AWK script but cannot figure out what it is doing. Can anyone please advise what the basic function of this script is?

It takes an input file and creates an output file, but I do not have access to either file to see what it is doing. It has something to do with the pipe delimiter, which delimits columns in the input file.

{
    if (NR == 1) {
        line = $0
        len = length(line)
        newlen = len
        while ( substr(line,newlen-1,1) == "|" ) {
            newlen = newlen - 1
        }
        line = substr(line,1,newlen-1)
    }
    else {
        print line
        line = $0
    }
}
END {
    len = length(line)
    newlen = len
    while ( substr(line,newlen-1,1) == "|" ) {
        newlen = newlen - 1
    }
    line = substr(line,1,newlen-1)
    print line
}


It looks like it's trimming all trailing pipe characters on the first and last lines only.


Wow, whoever wrote this must have been paid by the line.

The block of code that occurs twice, from len = length(line) to line = substr(line,1,newlen-1), is a string transformation that could be expressed more simply and clearly as a regular expression replacement. It strips the run of | characters at the end of line. Because the loop tests the character at newlen-1 rather than newlen, the final character is always removed whatever it is, so a line ending in something other than | still loses one character (this may be accidental). The transformation is roughly gsub(/(\|+|.)$/, "", line), or gsub(/\|+$/, "", line) if the behavior with no final | doesn't matter. One edge case differs: for a line such as a|b the original also removes the | before the final character, giving a, which sub(/\|*.$/, "", line) reproduces exactly.
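
As a rough check of that reading, the sketch below runs the question's loop next to a single sub() call on a few made-up strings. The names compare_trim, trim_loop and trim_re, the sample strings, and the \|*.$ pattern (final character plus any run of | before it) are my own additions, not part of the original script.

```shell
# Compare the original trimming loop with a single sub() call.
# compare_trim, trim_loop, trim_re and the sample strings are illustrative.
compare_trim() {
  printf '%s\n' 'a|b|||' 'a|b|' 'a|b' 'abc' | awk '
    # The loop as written in the question.
    function trim_loop(line,    newlen) {
      newlen = length(line)
      while (substr(line, newlen - 1, 1) == "|")
        newlen = newlen - 1
      return substr(line, 1, newlen - 1)
    }
    # Drop the final character plus any run of | just before it.
    function trim_re(line) {
      sub(/\|*.$/, "", line)
      return line
    }
    # Print: input -> loop result / regex result
    { printf "%s -> %s / %s\n", $0, trim_loop($0), trim_re($0) }
  '
}
compare_trim
```

If the two columns agree on every row, the regex is a faithful stand-in for the loop on those inputs; the a|b and abc rows are the ones that show the final character being dropped even when it is not a pipe.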

As for the overall structure, there are three parts in the code: what's done for the first line (if (NR == 1) {…}), what's done for other lines (else {…}), and what's done after the last line (END {…}). On the first line, the variable line is set to $0 transformed. On subsequent lines, the saved line is printed, then line is set to the current line. Finally, the last line is printed, transformed. This print-previous-then-save-current pattern is a common trick for treating the last line differently: when you read a line, you can't know whether it's the last one, so you save it, print the previous line and move on; in the END block you do the different thing for the last line.
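
Stripped of the pipe trimming, the pattern can be sketched on its own. This is a minimal illustration, not the original script: the function name mark_last and the " (last)" suffix are invented here just to make the last-line handling visible.

```shell
# Print-previous-then-save-current: only END knows which line was last.
# mark_last and the "(last)" suffix are made up for illustration.
mark_last() {
  awk '
    NR > 1 { print prev }                      # a later line exists, so prev was not the last
    { prev = $0 }                              # hold the current line back by one step
    END { if (NR > 0) print prev " (last)" }   # prev now holds the final line
  '
}
printf '%s\n' one two three | mark_last
```

Every line is printed one step late, so by the time END runs the only line still held back is the final one.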

Here's how I'd write it. The data flow is similarly nontrivial (but hardly contrived either), but at least it's not drowned in a messy text transformation.

function cleanup (line) { gsub(/(\|+|.)$/, "", line); return line }
NR != 1 { print prev }
{ prev = (NR == 1 ? cleanup($0) : $0) }
END { print cleanup(prev) }
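
To see the effect, the script above can be run on a small invented sample. The three pipe-delimited rows and the wrapper name trim_first_last are assumptions for illustration, since the real input file isn't available.

```shell
# Run the rewritten script on invented sample data (three pipe-delimited rows).
trim_first_last() {
  awk '
    function cleanup(line) { gsub(/(\|+|.)$/, "", line); return line }
    NR != 1 { print prev }
    { prev = (NR == 1 ? cleanup($0) : $0) }
    END { print cleanup(prev) }
  '
}
printf '%s\n' 'id|name|||' '1|foo||' '2|bar|||' | trim_first_last
```

Only the first and last rows lose their trailing pipes; the middle row passes through untouched.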


I may be wrong, but at a quick glance it seems to filter out the | character in a file.
