Java - Reg. Ex. File Question

Developer https://www.devze.com 2022-12-27 18:23 Source: Internet
I'm grabbing lines from a text file and sifting line by line using regular expressions. I'm trying to search for blank lines, meaning nothing or just whitespace.

However, what exactly is empty space? I know that whitespace is \s but what is a line that is nothing at all? null (\0)? newline (\n)?

I tried the test harness in the Java tutorial to see what an empty space is, but no luck so far.


An empty string "" is a string. It's not null. It doesn't have any character, not even \0 (which is just a character in Java, i.e. it's not a string terminator (JLS 10.9)).

The following are all true:

"" != null
"" instanceof String
"".contains("")

The following are true exclusively for an empty string:

"".matches("")
"".matches("^$")
"".length() == 0
"".isEmpty()

The following is true for the empty string as well as for any string containing only whitespace:

"".matches("\\s*");

This is because * means zero or more repetitions of a pattern. Zero repetitions of a whitespace character is the empty string.

The following is also true for all strings containing only whitespace:

s.trim().isEmpty()
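
For the line-by-line case in the question, a minimal sketch might look like this. The class name BlankLineCheck, both helper methods, and the sample text are assumptions made for illustration; a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class BlankLineCheck {
    // True when the line is empty or contains only whitespace.
    static boolean isBlank(String line) {
        return line.matches("\\s*");
    }

    // Counts blank lines. readLine() strips the line terminator, so an
    // empty line arrives as "" (not null, not "\n"); null means end of input.
    static int countBlankLines(BufferedReader reader) {
        try {
            int count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (isBlank(line)) {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample text standing in for a file.
        String text = "first\n\n   \nlast\n";
        BufferedReader reader = new BufferedReader(new StringReader(text));
        System.out.println(countBlankLines(reader)); // prints 2
    }
}
```

With a real file, the same loop should work over a BufferedReader obtained from the file instead of the StringReader.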

Further discussions

I noticed that \s* detects one or more whitespaces. How do I make it so that it detects only whitespace? For example "test test" would be invalid?

\s* matches zero or more whitespaces, and "test test".matches("\\s*") is false.

However, you can find \s* in "test test", just as you can find it in any string, because \s* can match the empty string, and every string contains("").
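
The difference between matching the whole string and finding a match somewhere inside it can be sketched as follows. The class and method names are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    static final Pattern WS = Pattern.compile("\\s*");

    // matches() succeeds only if the pattern consumes the whole input.
    static boolean wholeInputIsWhitespace(String s) {
        return WS.matcher(s).matches();
    }

    // find() succeeds if the pattern matches anywhere, including an empty match.
    static boolean findsSomewhere(String s) {
        return WS.matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "test test";
        System.out.println(wholeInputIsWhitespace(s)); // false
        System.out.println(findsSomewhere(s));         // true

        Matcher m = WS.matcher(s);
        if (m.find()) {
            // The first match is the empty string at index 0.
            System.out.println(m.start() + ".." + m.end()); // 0..0
        }
    }
}
```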

Figured it out... ^\s*[^a-zA-Z0-9\W]|^$

[^a-zA-Z0-9\W] doesn't really make any sense: it excludes letters, digits, and every non-word character, which leaves only the underscore. In fact, "_".matches("^\\s*[^a-zA-Z0-9\\W]|^$") is true.

Perhaps the confusion is because matches in Java needs to match the whole string (i.e. as if you've surrounded the entire pattern with ^ and $), so you can drop the anchors for matches, but you'd need them for, say, find. The proper regex for such methods would then be "^\\s*$", with the anchors explicitly included.
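
A small sketch of the anchored pattern used with find, next to the regex from the comment. The names AnchoredFind, BLANK, ASKER_REGEX, and isBlank are assumptions for illustration only.

```java
import java.util.regex.Pattern;

class AnchoredFind {
    // Anchors are explicit because find() does not require a whole-input match.
    static final Pattern BLANK = Pattern.compile("^\\s*$");

    // The regex from the comment above, kept verbatim for comparison.
    static final String ASKER_REGEX = "^\\s*[^a-zA-Z0-9\\W]|^$";

    static boolean isBlank(String line) {
        return BLANK.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(isBlank(""));          // true
        System.out.println(isBlank("  \t"));      // true
        System.out.println(isBlank("test test")); // false

        // The asker's regex accepts an underscore but rejects a
        // whitespace-only line, so it is not a blank-line test.
        System.out.println("_".matches(ASKER_REGEX));   // true
        System.out.println("   ".matches(ASKER_REGEX)); // false
    }
}
```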

The following is an excerpt from cletus's original answer (which is now deleted):

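import java.util.regex.Matcher;
import java.util.regex.Pattern;

// fileString holds the entire file contents as a single String
Pattern p = Pattern.compile("^\\s*$", Pattern.MULTILINE);
Matcher m = p.matcher(fileString);
while (m.find()) {
  ...
}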

Pattern.MULTILINE makes ^ and $ match at the start and end of each line within fileString, not only at the start and end of the whole input.
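
As a rough illustration of the MULTILINE approach, the sketch below counts matches in a small in-memory string. The class name, the countMatches helper, and the sample input are assumptions, not part of the quoted answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineBlankLines {
    static final Pattern BLANK_LINE = Pattern.compile("^\\s*$", Pattern.MULTILINE);

    // Counts how many times the blank-line pattern is found in the text.
    static int countMatches(String fileString) {
        Matcher m = BLANK_LINE.matcher(fileString);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical input: one empty line and one line of three spaces.
        System.out.println(countMatches("a\n\nb\n   \nc")); // prints 2
        System.out.println(countMatches("a\nb"));           // prints 0
    }
}
```

Because \s also matches line terminators, a single match can span several consecutive blank lines, so the number of matches need not equal the number of blank lines. If an exact per-line count matters, testing each line separately is more predictable.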


I usually use the StringUtils class from Apache Commons Lang. It has nice isEmpty() and isBlank() methods that also handle nulls nicely:

Checks if a String is empty ("") or null.

 StringUtils.isEmpty(null)      = true
 StringUtils.isEmpty("")        = true
 StringUtils.isEmpty(" ")       = false
 StringUtils.isEmpty("bob")     = false
 StringUtils.isEmpty("  bob  ") = false


Checks if a String is whitespace, empty ("") or null.

 StringUtils.isBlank(null)      = true
 StringUtils.isBlank("")        = true
 StringUtils.isBlank(" ")       = true
 StringUtils.isBlank("bob")     = false
 StringUtils.isBlank("  bob  ") = false
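
Without the library on the classpath, the same null-safe behavior can be approximated in plain Java. This is only a sketch under assumed names (NullSafeChecks, isEmpty, isBlank); it is not the library's own implementation.

```java
class NullSafeChecks {
    // True for null or the empty string.
    static boolean isEmpty(String s) {
        return s == null || s.isEmpty();
    }

    // True for null, the empty string, or a string of only whitespace.
    static boolean isBlank(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(null));      // true
        System.out.println(isEmpty(" "));       // false
        System.out.println(isBlank(" "));       // true
        System.out.println(isBlank("  bob  ")); // false
    }
}
```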