How to match any non-whitespace character except a particular one?

Developer https://www.devze.com 2023-03-08 02:46 Source: web
In Perl \S matches any non-whitespace character.

How can I match any non-whitespace character except a backslash \?


You can use a character class:

/[^\s\\]/

This matches any single character that is neither whitespace nor a backslash; the backslash has to be escaped as \\ inside the class. Here's how character classes work in general:

[abc] means "match a, b or c"; [^abc] means "match any character except a, b or c".


You can use a lookahead:

/(?=\S)[^\\]/


This worked for me using sed (edit: a comment below points out that sed doesn't support \s inside brackets):

[^ ]

while

[^\s] 

didn't

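# Intended: delete everything except whitespace and 'g'.
# Inside brackets, sed reads \s as a literal backslash and an 's', so the space is deleted too.
echo "ghai ghai" | sed "s/[^\sg]//g"
gg

# With a literal space in the class, the space survives.
echo "ghai ghai" | sed "s/[^ g]//g"
g g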


On my system: CentOS 5

I can use \s outside of bracket expressions but have to use [:space:] inside them. In fact, [:space:] works only inside a bracket expression, so to match a single whitespace character with it I have to write [[:space:]], which is really strange.

echo a b cX | sed -r "s/(a\sb[[:space:]]c[^[:space:]])/Result: \1/"

Result: a b cX
  • the first space is matched by \s
  • the second space is matched by the alternative [[:space:]]
  • the X is matched by "anything but whitespace", [^[:space:]]

These two will not work:

a[:space:]b  instead use a\sb or a[[:space:]]b

a[^\s]b      instead use a[^[:space:]]b
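
Applied to the original question, the same bracket-expression rules give a class for "non-whitespace except a backslash". This is only a minimal sketch, assuming a sed that supports POSIX character classes; the sample input and the replacement character X are made up for illustration:

```shell
# Replace every character that is neither whitespace nor a backslash with X.
# Inside the brackets, [:space:] stands for whitespace and \\ is simply two
# literal backslashes (a backslash is not an escape in a POSIX bracket expression).
out=$(printf '%s\n' 'a b\c' | sed 's/[^[:space:]\\]/X/g')
printf '%s\n' "$out"
# prints: X X\X
```

The space and the backslash are left alone; everything else is replaced.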


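If you are using regular expressions in bash, grep, or a similar tool rather than in Perl, \S may not be available to match all non-whitespace characters. Spelled out for ASCII input, \S is equivalent to [^\r\n\t\f\v ]. Note that this spelling only works where the tool interprets backslash escapes inside brackets; POSIX bracket expressions treat a backslash literally, and there the portable form is [^[:space:]].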

So, instead of this:

[^\s\\]

...you would write this instead, which excludes the whitespace characters (\r, \n, \t, \f, \v and the space) as well as the backslash (written \\ in the regex):

[^\r\n\t\f\v \\]
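
In tools that use POSIX bracket expressions, where a backslash inside brackets is a literal character, the same class can also be spelled with [:space:]. Here is a minimal sketch, assuming a POSIX grep; has_plain_char and the sample strings are hypothetical names chosen for illustration:

```shell
# Hypothetical helper: succeeds if the argument contains at least one
# character that is neither whitespace nor a backslash.
has_plain_char() {
    printf '%s\n' "$1" | grep -q '[^[:space:]\\]'
}

# Try a few sample strings and report whether each one matches.
for s in 'abc' ' \ ' '\\' 'a\b'; do
    if has_plain_char "$s"; then
        printf '[%s] Matched\n' "$s"
    else
        printf '[%s] Missed\n' "$s"
    fi
done
```

Strings made up only of whitespace and backslashes are reported as Missed; the others are reported as Matched.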

References:

  1. [my answer] Unix & Linux: Any non-whitespace regular expression


In this case, it's easier to restate the problem "non-whitespace, but not a backslash" as "neither whitespace nor a backslash", as the accepted answer shows:

/[^\s\\]/

However, for trickier problems, the regex set feature might be handy. You can perform set operations on character classes to get what you want. This one subtracts the set containing just the backslash from the set of non-whitespace characters:

use v5.18;
use experimental qw(regex_sets);

# After "abc", match one non-whitespace character that is not a backslash
my $regex = qr/abc(?[ [\S] - [\\] ])/;

while( <DATA> ) {
    chomp;
    say "[$_] ", /$regex/ ? 'Matched' : 'Missed';
}

__DATA__
abcd
abc d
abc\d
abcxyz
abc\\xyz

The output shows that neither whitespace nor the backslash matches after c:

[abcd] Matched
[abc d] Missed
[abc\d] Missed
[abcxyz] Matched
[abc\\xyz] Missed

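This gets more interesting when the larger set would be difficult to express gracefully and set operations can refine it. Both of the following match a lowercase ASCII consonant, and I'd rather see the set operation (the second line) than the hand-written ranges (the first):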

[b-df-hj-np-tv-z]
(?[ [a-z] - [aeiou] ])