Given a wikiText string such as:
{{ValueDescription
|key=highway
|value=secondary
|image=Image:Meyenburg-L134.jpg
|description=A highway linking large towns.
|onNode=no
|onWay=yes
|onArea=no
|combination=
* {{Tag|name}}
* {{Tag|ref}}
|implies=
* {{Tag|motorcar||yes}}
}}
I'd like to parse the templates ValueDescription and Tag in Java/Groovy.
I tried the regex /\{\{\s*Tag(.+)\}\}/ and it works fine (it returns |name, |ref and |motorcar||yes), but /\{\{\s*ValueDescription(.+)\}\}/ doesn't work (it should return all the text above).
The expected output
Is there a way to skip nested templates in the regex?
Ideally I would rather use a simple wikiText 2 xml tool, but I couldn't find anything like that.
Thanks! Mulone
Arbitrarily nested templates can't be matched with a regular expression, since nesting makes the grammar non-regular. You need something capable of dealing with a context-free grammar. ANTLR is a fine option.
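If a full ANTLR grammar feels like too much for this one case, a hand-written scanner that tracks brace depth can also cope with the nesting. The following is only a minimal sketch: TemplateScanner and extract are hypothetical names, it assumes templates are delimited by plain {{ and }} (no triple-brace parameters, no whitespace between {{ and the name), and it matches the name as a prefix, just like the regexes in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library: finds template bodies by
// counting "{{" and "}}" instead of relying on a regular expression.
class TemplateScanner {

    // Returns the text between "{{name" and its matching "}}" for every
    // occurrence of the named template, including nested occurrences.
    static List<String> extract(String text, String name) {
        List<String> bodies = new ArrayList<>();
        String open = "{{" + name;
        int from = 0;
        while (true) {
            int start = text.indexOf(open, from);
            if (start < 0) {
                break;
            }
            int bodyStart = start + open.length();
            int depth = 1;          // we are inside one template already
            int i = bodyStart;
            while (i < text.length() && depth > 0) {
                if (text.startsWith("{{", i)) {
                    depth++;        // a nested template opens
                    i += 2;
                } else if (text.startsWith("}}", i)) {
                    depth--;        // a template closes
                    i += 2;
                } else {
                    i++;
                }
            }
            if (depth == 0) {
                // i points just past the matching "}}"
                bodies.add(text.substring(bodyStart, i - 2));
            }
            // unbalanced templates are skipped silently
            from = bodyStart;
        }
        return bodies;
    }
}
```

Calling TemplateScanner.extract(wikiText, "ValueDescription") on the text from the question should give the whole body including the nested Tag templates, while TemplateScanner.extract(wikiText, "Tag") gives the three Tag bodies.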
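By default . does not match line terminators, which is why the multi-line ValueDescription body fails to match. Create your regex pattern with the Pattern.DOTALL option, like this: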
Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+)\\}\\}", Pattern.DOTALL);
Sample Code:
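import java.util.regex.Matcher;
import java.util.regex.Pattern;
// str holds the wikiText shown in the question
Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+)\\}\\}", Pattern.DOTALL);
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println("Matched: [" + m.group(1) + ']');
}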
OUTPUT
Matched: [
|key=highway
|value=secondary
|image=Image:Meyenburg-L134.jpg
|description=A highway linking large towns.
|onNode=no
|onWay=yes
|onArea=no
|combination=
* {{Tag|name}}
* {{Tag|ref}}
|implies=
* {{Tag|motorcar||yes}}
]
Update
Assuming the closing }} appears on a separate line for each {{ValueDescription, the following pattern will capture multiple ValueDescription templates:
Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+?)\n\\}\\}", Pattern.DOTALL);
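To see how the reluctant pattern behaves on more than one template, here is a small sketch. ValueDescriptionFinder and findAll are hypothetical names used only for illustration, and it relies on the same assumption as above: each closing }} of a ValueDescription sits at the start of its own line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern above, for illustration only.
class ValueDescriptionFinder {

    // (.+?) is reluctant, so each match stops at the first "\n}}" it meets.
    private static final Pattern VALUE_DESCRIPTION = Pattern.compile(
            "\\{\\{\\s*ValueDescription(.+?)\n\\}\\}", Pattern.DOTALL);

    // Returns the body of every ValueDescription template found in text.
    static List<String> findAll(String text) {
        List<String> bodies = new ArrayList<>();
        Matcher m = VALUE_DESCRIPTION.matcher(text);
        while (m.find()) {
            bodies.add(m.group(1));
        }
        return bodies;
    }
}
```

Nested {{Tag|...}} templates are left intact inside each body because their closing }} follows other text on the same line rather than a newline. If a nested template ever ends on a line of its own, this pattern will cut the match short.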