Parse text using regular expressions

Developers https://www.devze.com 2022-12-19 05:29 Source: web

Related topics: parsing, regex

I have a dictionary in .txt format, which looks like this:

term 1
    definition 1
    definition 2

term 2
    definition 1
    definition 2
    definition 3
etc.

There is always a tab before each definition, so basically it looks like this:

term 1
[tab]definition 1
[tab]definition 2
etc.

Now I need to wrap every term and its definitions in a <term> tag, i.e.:

<term>
term 1
    definition 1
    definition 2
</term>

I was trying to use regular expressions to find each term with its definitions, but with no luck. Could you please help me with this?

Thank you for any suggestions!

Try this regular expression:

(^|\n).+(\n[ \t]+.+)*

This assumes that ^ marks the start of the string, \n is the line break character, and . does not match line breaks.
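The pattern above consumes the preceding line break as part of each match, so the replacement has to put it back. Here is a minimal sketch in Python (my choice of language, the question doesn't name one) that captures the line break and the term block in separate groups. The names TERM_BLOCK and wrap_terms are made up for illustration. Note that because of the trailing `*`, a line with no definitions under it gets wrapped as well.

```python
import re

# Same pattern as above, regrouped: group 1 is the start of the string or the
# preceding line break, group 2 is the term line plus its indented lines.
TERM_BLOCK = re.compile(r"(^|\n)(.+(?:\n[ \t]+.+)*)")


def wrap_terms(text: str) -> str:
    """Wrap each term and its indented definitions in <term>...</term>."""
    return TERM_BLOCK.sub(
        # Re-emit the line break before the opening tag so blank lines survive.
        lambda m: f"{m.group(1)}<term>\n{m.group(2)}\n</term>",
        text,
    )


sample = "term 1\n\tdefinition 1\n\tdefinition 2\n\nterm 2\n\tdefinition 1\n"
print(wrap_terms(sample))
```

Without re.MULTILINE, `^` in Python matches only at the start of the string and `.` does not match line breaks, which lines up with the assumptions stated for this pattern.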

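Assuming an implementation that

Matches multiple lines (/.../m)
Uses ^ to indicate the start of a line (\A anchors only at the start of the whole string in most engines, so it would find just the first term)

this should match one "term":

^[^\t\n][^\n]*\n(\t[^\n]+\n)+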
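If the goal is to pull the terms apart rather than just wrap them, the same "term line, then one or more tab lines" shape can be captured in groups. A rough sketch in Python follows. It anchors each block with `^` under re.MULTILINE, since Python's `\A` matches only at the very start of the input. TERM_RE and parse_terms are hypothetical names, and the sketch assumes the last definition line ends with a line break.

```python
import re

# One term line that does not start with a tab, followed by one or more
# tab-led definition lines. re.MULTILINE lets "^" match at every line start.
TERM_RE = re.compile(r"^([^\t\n][^\n]*)\n((?:\t[^\n]+\n)+)", re.MULTILINE)


def parse_terms(text):
    """Return a list of (term, [definitions]) pairs in file order."""
    entries = []
    for match in TERM_RE.finditer(text):
        term = match.group(1)
        # Drop the leading tab from each definition line.
        definitions = [line.lstrip("\t") for line in match.group(2).splitlines()]
        entries.append((term, definitions))
    return entries


sample = "term 1\n\tdefinition 1\n\tdefinition 2\n\nterm 2\n\tdefinition 1\n"
for term, definitions in parse_terms(sample):
    print(term, definitions)
```

A term with no definitions under it is skipped by this pattern, because the definition group requires at least one tab-led line.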

Match a line with a leading non-whitespace character followed by one or more lines with leading TABs:

$ perl -0777 -pe 's/^(\S.+\n(^\t.+\n)+)/<term>\n$1<\/term>\n/mg' dict
<term>
term 1
        definition 1
        definition 2
</term>

<term>
term 2
        definition 1
        definition 2
        definition 3
</term>
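
For anyone without Perl at hand, roughly the same substitution can be written in Python. This is only a sketch of the one-liner above, not a drop-in replacement: BLOCK and wrap_blocks are names I made up, and the text is passed in as one string, which corresponds to reading the whole file at once.

```python
import re

# A line starting with a non-whitespace character, followed by one or more
# lines that start with a tab. As in the Perl version, "\S.+" means the term
# line must be at least two characters long.
BLOCK = re.compile(r"^(\S.+\n(?:\t.+\n)+)", re.MULTILINE)


def wrap_blocks(text: str) -> str:
    """Wrap every matched block in <term> tags, leaving other lines alone."""
    return BLOCK.sub(r"<term>\n\1</term>\n", text)


sample = "term 1\n\tdefinition 1\n\tdefinition 2\n\nterm 2\n\tdefinition 1\n"
print(wrap_blocks(sample), end="")
```

Blank lines between entries are not part of any match, so they pass through unchanged, which is why the output keeps an empty line between the wrapped blocks.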
