(I use BSD sed.)
This bash script:
sed -E -f parsefile < parsewords.d
With this command file:
# Delete everything before BEGIN RTL and after END RTL
\?/\* BEGIN RTL \*/?,\?/\* END RTL \*/?!d
# Delete comments unless they begin with /*!
s?/\*[^!].*\*/??g
# Delete blank lines
/^[ ]*$/d
# Break line into words
s/[^A-Za-z0-9_]+/ /g
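# Remove leading and trailing spaces and tabs
# (two anchored substitutions, because a greedy (.*) would keep the trailing blanks)
s/^[[:space:]]+//
s/[[:space:]]+$//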
With this input file:
any stuff
/* BEGIN RTL */
/*! INPUTS: a b c d ph1 */ /* Comment */
x = a && b || c && d;
y = x ? a : b; /* hello */
z = ph1 ? x : z;
w = c || x || (z || d);
/* END RTL */
Produces this result:
INPUTS a b c d ph1
x a b c d
y x a b
z ph1 x z
w c x z d
That's fine so far but what I'd really like to have is something like this:
x = a && b || c && d; x a b c d
y = x ? a : b; y x a b
z = ph1 ? x : z; z ph1 x z
w = c || x || (z || d); w c x z d
so that the original line is retained along with the mods that the script is making.
Is this possible with sed, or should I use something else? (Any other comments are welcome too.)
EDIT: This is not a parsing question. It is about retaining the original input line along with sed modifications.
A solution using 'sed'.
Input file (infile):
any stuff
/* BEGIN RTL */
/*! INPUTS: a b c d ph1 */ /* Comment */
x = a && b || c && d;
y = x ? a : b; /* hello */
z = ph1 ? x : z;
w = c || x || (z || d);
/* END RTL */
'Sed' program (script.sed):
# Delete everything before BEGIN RTL and after END RTL
\?/\* BEGIN RTL \*/?,\?/\* END RTL \*/?!d
# Delete comments unless they begin with /*!
s?/\*[^!].*\*/??g
# Delete blank lines
/^[ ]*$/d
# Copy current line in hold space.
h
# Break line into words
s/[^A-Za-z0-9_]+/ /g
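# Append the word list to the hold space, fetch both lines back, and join
# them with a single space (the original line already ends in ';').
# [[:space:]] is used instead of \s, which not every BSD sed accepts.
H ; g ; s/\n/ / ; s/;[[:space:]]+/; /
# Remove leading and trailing spaces and tabs
# (two anchored substitutions, because a greedy (.*) would keep the trailing blanks)
s/^[[:space:]]+//
s/[[:space:]]+$//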
Execution:
$ sed -E -f script.sed infile
Output (I'm not sure how the line containing 'INPUTS' should be handled; adapt the script as needed):
/*! INPUTS: a b c d ph1 */ INPUTS a b c d ph1
x = a && b || c && d; x a b c d
y = x ? a : b; y x a b
z = ph1 ? x : z; z ph1 x z
w = c || x || (z || d); w c x z d
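The hold-space idea can also be shown on its own, together with one possible way of treating the 'INPUTS' line. This is only a sketch: `pair_words` is a made-up helper name, it assumes ordinary comments have already been stripped from its input, and it assumes that lines starting with `/*!` should be printed as words only, as in the result the question shows. It uses `x` and `G` instead of `H` and `g`, which ends up in the same place.

```shell
# pair_words is a hypothetical helper name used only for this sketch.
# It expects lines that have already had ordinary comments stripped.
pair_words() {
  sed -E '
    # Lines starting with /*! are reduced to their words and printed alone
    /^\/\*!/{
      s/[^A-Za-z0-9_]+/ /g
      s/^ //
      s/ $//
      b
    }
    # Save the untouched line in the hold space
    h
    # Reduce the pattern space to a space-separated word list
    s/[^A-Za-z0-9_]+/ /g
    s/^ //
    s/ $//
    # Swap: the pattern space now holds the original, the hold space the words
    x
    # Append the hold space (the words) after a newline
    G
    # Join the two parts with one space, dropping blanks before the newline
    s/ *\n/ /
  '
}

printf '%s\n' '/*! INPUTS: a b c d ph1 */ ' 'x = a && b || c && d;' | pair_words
# prints:
# INPUTS a b c d ph1
# x = a && b || c && d; x a b c d
```

The same block could be pasted into script.sed after the blank-line deletion, replacing the `h` ... `g` section, if that treatment of the 'INPUTS' line is what is wanted.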
I'd say using sed for this kind of task will prove to be difficult.
Maybe you need to look into parsing/lexing?
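Short of a real lexer, awk may be a middle ground for the "something else" the question asks about, since it can keep the original record in one variable while editing a copy. The sketch below covers only the pairing step, not the BEGIN/END RTL filtering or the comment stripping; `pair_words_awk` is a hypothetical name.

```shell
# pair_words_awk is a hypothetical helper name used only for this sketch.
pair_words_awk() {
  awk '{
    orig = $0                            # untouched copy of the input line
    sub(/[ \t]+$/, "", orig)             # drop trailing blanks from the copy
    words = $0
    gsub(/[^A-Za-z0-9_]+/, " ", words)   # collapse non-word runs to one space
    sub(/^ /, "", words)                 # trim the leading blank, if any
    sub(/ $/, "", words)                 # trim the trailing blank, if any
    print orig " " words
  }'
}

printf '%s\n' 'z = ph1 ? x : z;' | pair_words_awk
# prints: z = ph1 ? x : z; z ph1 x z
```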