I have a simple XML file and I want to remove everything before the first <item> tag.
<sometag>
<something>
.....
</something>
<item>item1
</item>
....
</sometag>
The following Java code is not working:
String cleanxml = rawxml.replace("^[\\s\\S]+<item>", "");
What is the correct way to do this? And how do I address the non-greedy issue? Sorry, I'm a C# programmer.
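Well, if you want to use regex, then you can use replaceAll. This solution uses a reluctant quantifier and a capturing group that the replacement string refers to as $1. The leading (?s) enables DOTALL mode; without it, . does not match line terminators, so the pattern could not get past the newlines in your file:
String cleanxml = rawxml.replaceAll("(?s).*?(<item>.*)", "$1");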
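A runnable sketch of the replaceAll approach, for anyone who wants to try it. The class and method names are hypothetical, and the sample input is a trimmed version of the XML in the question. The (?s) flag is included because the question's XML spans several lines.

```java
// Hypothetical demo class; the sample input is a trimmed version of the question's XML.
class ReplaceAllDemo {

    // Keeps the first <item> and everything after it.
    static String stripBeforeFirstItem(String rawxml) {
        // (?s)        DOTALL: '.' also matches line terminators
        // .*?         reluctant: the shortest prefix that lets the rest match
        // (<item>.*)  group 1: the first <item> through the end of the input
        // "$1"        the replacement writes group 1 back, dropping the prefix
        return rawxml.replaceAll("(?s).*?(<item>.*)", "$1");
    }

    public static void main(String[] args) {
        String rawxml = "<sometag>\n<something>\n</something>\n<item>item1\n</item>\n</sometag>";
        // Prints the text starting at "<item>item1"
        System.out.println(stripBeforeFirstItem(rawxml));
    }
}
```

If the input contains no <item>, the pattern never matches and the string comes back unchanged.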
Alternatively, you can use replaceFirst. This solution uses a positive lookahead, plus (?s) so that . also matches newlines:
String cleanxml = rawxml.replaceFirst("(?s).*?(?=<item>)", "");
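A small sketch of the lookahead variant, with hypothetical class and method names. The point of the lookahead is that it checks for <item> without consuming it, so only the prefix is matched and replaced with the empty string.

```java
// Hypothetical demo class illustrating replaceFirst with a positive lookahead.
class ReplaceFirstDemo {

    static String stripBeforeFirstItem(String rawxml) {
        // (?s)        DOTALL: '.' also matches line terminators
        // .*?         the shortest prefix...
        // (?=<item>)  ...that is immediately followed by <item>; the lookahead
        //             is zero-width, so <item> itself is not part of the match
        return rawxml.replaceFirst("(?s).*?(?=<item>)", "");
    }

    public static void main(String[] args) {
        String rawxml = "<sometag>\n<something>\n</something>\n<item>item1\n</item>\n</sometag>";
        System.out.println(stripBeforeFirstItem(rawxml));
    }
}
```

When the input already starts with <item>, the match is empty and nothing changes; when there is no <item>, there is no match at all.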
It makes more sense to just use indexOf and substring, though:
String cleanxml = rawxml.substring(rawxml.indexOf("<item>"));
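One caveat with the one-liner: indexOf returns -1 when there is no <item>, and substring(-1) throws StringIndexOutOfBoundsException. A minimal guarded sketch follows; the names are hypothetical, and returning the input unchanged in that case is an assumption chosen to match what the regex versions do (throwing or returning an empty string would be equally reasonable).

```java
// Hypothetical demo class: indexOf/substring with a guard for the "not found" case.
class SubstringDemo {

    static String stripBeforeFirstItem(String rawxml) {
        int start = rawxml.indexOf("<item>");
        // indexOf returns -1 when <item> is absent; substring(-1) would throw,
        // so hand the input back unchanged instead.
        return start >= 0 ? rawxml.substring(start) : rawxml;
    }

    public static void main(String[] args) {
        String rawxml = "<sometag>\n<something>\n</something>\n<item>item1\n</item>\n</sometag>";
        System.out.println(stripBeforeFirstItem(rawxml));
        System.out.println(stripBeforeFirstItem("no items here"));
    }
}
```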
The reason replace doesn't work is that neither its char overload nor its CharSequence overload is regex-based; both do plain character (sequence) replacement.
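To make the difference concrete, here is a sketch that runs the question's own pattern through replace and replaceAll on a hypothetical two-item input. It also shows the greedy problem from the question: even under replaceAll, that pattern deletes the <item> tag itself, and because + is greedy it runs all the way to the last <item>. Making it reluctant with +? stops at the first one.

```java
// Hypothetical demo class comparing literal replace with regex replaceAll.
class LiteralVsRegexDemo {

    // replace(CharSequence, CharSequence) looks for the exact text ^[\s\S]+<item>,
    // which never occurs in the input, so the string comes back unchanged.
    static String withReplace(String s) {
        return s.replace("^[\\s\\S]+<item>", "");
    }

    // replaceAll compiles the same text as a regex; the greedy + runs to the
    // LAST <item>, and the <item> tag itself is consumed by the match.
    static String withGreedy(String s) {
        return s.replaceAll("^[\\s\\S]+<item>", "");
    }

    // +? is reluctant, so the match stops at the FIRST <item> (still consuming it).
    static String withReluctant(String s) {
        return s.replaceAll("^[\\s\\S]+?<item>", "");
    }

    public static void main(String[] args) {
        String s = "<r><item>1</item><item>2</item></r>";
        System.out.println(withReplace(s));   // <r><item>1</item><item>2</item></r>
        System.out.println(withGreedy(s));    // 2</item></r>
        System.out.println(withReluctant(s)); // 1</item><item>2</item></r>
    }
}
```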
Also, as others are warning you, unless you're processing very simple XML, you shouldn't use regex; use an actual XML parser instead.
... What is the correct way to do this? ...
Since you asked about the correct way: the correct way to do this is to parse the XML, remove the nodes, and re-serialize the document to a String. You should never use regular expressions to manipulate XML or any other structured document that has parsers available (JSON, YAML, etc.).
For small XML I would suggest JDOM.
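A rough sketch of the parse/remove/re-serialize approach. It swaps in the JDK's built-in DOM and javax.xml.transform APIs instead of JDOM so it runs without extra dependencies, and it assumes the <item> elements are direct children of the root and that "everything before" means the root's preceding child nodes. Unlike the string approaches, it keeps the root element, so the result stays well-formed XML. Class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; uses the JDK's DOM API rather than JDOM.
class DomStripDemo {

    // Removes the root's child nodes that precede its first <item> child,
    // then serializes the document back to a String.
    static String stripBeforeFirstItem(String rawxml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rawxml)));
            Element root = doc.getDocumentElement();

            // Assumption: <item> elements are direct children of the root.
            Node firstItem = null;
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE && "item".equals(n.getNodeName())) {
                    firstItem = n;
                    break;
                }
            }
            if (firstItem == null) {
                return rawxml; // no <item> child: leave the document untouched
            }

            // Drop every preceding sibling (elements, whitespace text, comments, ...).
            while (firstItem.getPreviousSibling() != null) {
                root.removeChild(firstItem.getPreviousSibling());
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                | TransformerException e) {
            // Wrapped in an unchecked exception to keep the sketch short.
            throw new IllegalArgumentException("could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String rawxml = "<sometag><something>x</something><item>item1</item></sometag>";
        System.out.println(stripBeforeFirstItem(rawxml));
    }
}
```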
Use replaceAll or replaceFirst; plain replace only looks for literal string matches. HTH