
Parse text using regular expressions

Developer (开发者), https://www.devze.com, 2022-12-19 05:29, source: web

I have a dictionary in .txt format that looks like this:

term 1
    definition 1
    definition 2

term 2
    definition 1
    definition 2
    definition 3
etc.

There is always a tab before each definition, so it basically looks like this:

term 1
[tab]definition 1
[tab]definition 2
etc.

Now I need to wrap every term and its definitions in a <term> tag, i.e.:

<term>
term 1
    definition 1
    definition 2
</term>

I was trying to use regular expressions to find each term with its definitions, but with no luck. Could you please help me with this?

Thank you for any suggestions!


Try this regular expression:

(^|\n).+(\n[ \t]+.+)*

Assuming that ^ marks the start of the string, \n is the line break character and . does not match line breaks.
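A minimal sketch of applying this regex, assuming Python's re module. Without re.MULTILINE, Python's ^ matches only at the start of the string and . does not match a line break, which are exactly the assumptions above. Every match after the first begins with the newline captured by group 1, so the hypothetical helper wrap_terms keeps that newline outside the opening tag:

```python
import re

# Regex from the answer: a term line, then any number of lines that start
# with spaces or tabs. Group 1 is either the start of the string (empty)
# or the newline that precedes a later term.
TERM_RE = re.compile(r"(^|\n).+(\n[ \t]+.+)*")


def wrap_terms(text: str) -> str:
    """Wrap each term and its definitions in <term>...</term> (hypothetical helper)."""

    def wrap(m: re.Match) -> str:
        prefix = m.group(1)              # "" for the first term, "\n" for later ones
        body = m.group(0)[len(prefix):]  # the term line plus its definition lines
        return f"{prefix}<term>\n{body}\n</term>"

    return TERM_RE.sub(wrap, text)


sample = (
    "term 1\n\tdefinition 1\n\tdefinition 2\n"
    "\n"
    "term 2\n\tdefinition 1\n\tdefinition 2\n\tdefinition 3\n"
)
print(wrap_terms(sample))
```

The blank line between terms stays outside both tags, because a bare newline cannot satisfy the .+ that must follow group 1.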


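Assuming an implementation that

  1. Matches in multi-line mode (/.../m)
  2. Therefore lets ^ match at the start of every line (\A always anchors at the start of the whole string in Perl, PCRE, Python and Java, so it would only ever find the first term)

this should match one "term":

^[^\t\n][^\n]*\n(\t[^\n]+\n)+

Excluding \n from the first character keeps a blank line from being pulled into the following term.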
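A minimal sketch of the line-anchored pattern, assuming Python's re module, where re.MULTILINE plays the role of /m. Because each match ends with its own trailing newline, a plain replacement template is enough here, with no callback. The name wrap_terms_m is hypothetical. One caveat: the pattern requires every definition line to end with a newline, so if the file lacks a final newline, its last definition line is left outside the closing tag.

```python
import re

# Line-anchored pattern: with re.MULTILINE, ^ matches at the start of every
# line. The first character must be neither a tab nor a newline, so
# definition lines and blank lines never start a match.
TERM_RE = re.compile(r"^[^\t\n][^\n]*\n(\t[^\n]+\n)+", re.MULTILINE)


def wrap_terms_m(text: str) -> str:
    """Wrap each term block in <term>...</term> (hypothetical helper)."""
    # \g<0> is the whole match, which already ends with a newline.
    return TERM_RE.sub(r"<term>\n\g<0></term>\n", text)


sample = (
    "term 1\n\tdefinition 1\n\tdefinition 2\n"
    "\n"
    "term 2\n\tdefinition 1\n\tdefinition 2\n\tdefinition 3\n"
)
print(wrap_terms_m(sample), end="")
```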


Match a line with a leading non-whitespace character followed by one or more lines with leading TABs:

$ perl -0777 -pe 's/^(\S.+\n(^\t.+\n)+)/<term>\n$1<\/term>\n/mg' dict
<term>
term 1
        definition 1
        definition 2
</term>

<term>
term 2
        definition 1
        definition 2
        definition 3
</term>
