Java Inner Text (getTextContents()) Problem

Developer https://www.devze.com 2022-12-13 14:35 Source: web
I'm trying to do some parsing in Java. I'm using the Cobra HTML Parser to get the HTML into a DOM, then using XPath to get the nodes I want. When I get down to the desired level I call node.getTextContents(), but this gives me a string like

"\n\n\nValue\n-\nValue\n\n\n"

Is there a built-in way to get rid of the line breaks? I would like to apply a regex like

(?:\s*([^-]+)\s*-\s*([^-]+)\s*)

on the inner text, and would really prefer not to have to deal with the different whitespace characters that may appear between the text.

Example Input:

Value
-
Value

Thanks


You can use String.replaceAll().

String trimmed = original_string.replaceAll("\n", "");

The first argument is a regular expression, so you could, for instance, remove every contiguous block of whitespace from the original string with replaceAll("\\s+", "").
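To make the difference concrete, here is a minimal sketch run against the string from the question. The class and method names (WhitespaceCleanup, stripAll, collapse) are made up for illustration. Collapsing each whitespace run to a single space, rather than deleting it, is a variation on the suggestion above, in case you want to keep a separator between the tokens.

```java
final class WhitespaceCleanup {

    // Deletes every run of whitespace (spaces, tabs, line breaks).
    static String stripAll(String text) {
        return text.replaceAll("\\s+", "");
    }

    // Replaces every run of whitespace with one space, then drops
    // the leading and trailing space that may be left over.
    static String collapse(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "\n\n\nValue\n-\nValue\n\n\n";
        System.out.println(stripAll(raw)); // prints Value-Value
        System.out.println(collapse(raw)); // prints Value - Value
    }
}
```

Note that replaceAll returns a new String, since Java strings are immutable, so the result has to be assigned or passed on.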


I'm not totally sure I understood the question correctly, but the simplest way to remove all the whitespace would be:

String s = node.getTextContents().replaceAll("\\s","");

If you just want to get rid of the leading/trailing whitespace, use trim().
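If the goal is to pull the two values out with the regex from the question, one option (a sketch, not the only way) is to leave the text as it is and call trim() on each captured group. The [^-]+ groups also match line breaks, so without the trim() the captures would still carry the surrounding whitespace. ValuePairParser and parse are hypothetical names, and the input is assumed to be the plain String that node.getTextContents() returned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ValuePairParser {

    // The pattern from the question, unchanged.
    private static final Pattern PAIR =
            Pattern.compile("(?:\\s*([^-]+)\\s*-\\s*([^-]+)\\s*)");

    // Returns {left, right} with surrounding whitespace removed,
    // or an empty array when the text has no "value - value" shape.
    static String[] parse(String text) {
        Matcher m = PAIR.matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(1).trim(), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] pair = parse("\n\n\nValue\n-\nValue\n\n\n");
        System.out.println(pair[0]); // prints Value
        System.out.println(pair[1]); // prints Value
    }
}
```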
