I'm trying to do some parsing in Java. I'm using the Cobra HTML Parser to get the HTML into a DOM, then XPath to get the nodes I want. When I get down to the desired level I call node.getTextContent(), but this gives me a string like
"\n\n\nValue\n-\nValue\n\n\n"
Is there a built-in way to get rid of the line breaks? I would like to apply a regex like
(?:\s*([^-]+)\s*-\s*([^-]+)\s*)
to the inner text, and would really prefer not to have to deal with the various whitespace characters that can appear between the pieces of text.
Example Input:
Value
-
Value
Thanks
You can use String.replaceAll().
String trimmed = original_string.replaceAll("\n", "");
The first argument is a regular expression, so you could, for instance, remove every contiguous block of whitespace from the original string with replaceAll("\\s+", "").
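To make the difference concrete, here is a minimal sketch using the string from the question. The class and method names (WhitespaceExample, stripAll, collapse) are made up for illustration. Note that removing all whitespace also glues together words inside a value, so if your values can contain spaces, collapsing each run to a single space and trimming may be closer to what you want.

```java
public class WhitespaceExample {
    // Remove every run of whitespace (spaces, tabs, line breaks).
    static String stripAll(String s) {
        return s.replaceAll("\\s+", "");
    }

    // Collapse each run of whitespace to one space, then drop the
    // leading and trailing space. Keeps words inside a value apart.
    static String collapse(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "\n\n\nValue\n-\nValue\n\n\n";
        System.out.println(stripAll(raw)); // prints Value-Value
        System.out.println(collapse(raw)); // prints Value - Value
    }
}
```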
I'm not totally sure I understood the question correctly, but the simplest way to remove all the whitespace would be:
String s = node.getTextContent().replaceAll("\\s","");
If you just want to get rid of the leading/trailing whitespace, use trim().
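If the end goal is to pull out the two values around the dash, one option is to leave the whitespace handling out of the regex entirely and trim each captured group afterwards. This is only a sketch under that assumption: the class name ValuePairExample, the helper splitPair, and the simplified pattern are mine, not something Cobra or the DOM API provides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ValuePairExample {
    // Two fields separated by a single dash. [^-] also matches line
    // breaks, so surrounding whitespace ends up inside the groups
    // and is removed with trim() below.
    private static final Pattern PAIR = Pattern.compile("([^-]+)-([^-]+)");

    // Returns the two trimmed values, or null if the text is not of
    // the form "value - value".
    static String[] splitPair(String text) {
        Matcher m = PAIR.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] pair = splitPair("\n\n\nValue\n-\nValue\n\n\n");
        System.out.println(pair[0] + " | " + pair[1]); // prints Value | Value
    }
}
```

This keeps spaces inside a value intact, which replaceAll("\\s", "") would not.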