
Shell metacharacters - A shorter way to match these types of filenames?

Developer https://www.devze.com 2023-02-10 17:10 Source: Internet
For an exercise, I wrote a pattern made of shell metacharacters that matches filenames containing at most 3 uppercase characters.

Example

a -> match
A -> match
Ab -> match
AbC -> match
AbCd -> match 
...
ABCD -> no match, 4 uppercase chars

This is what I've come up with, but I have a feeling I could make it shorter:

ls @(!(*[A-Z]*)|*[A-Z]*|*[A-Z]*[A-Z]*|*[A-Z]*[A-Z]*[A-Z]*)

EDIT

Sorry for the confusion. First of all, I'm only allowed to use metacharacters: no regular expressions, no test, and no tools like awk, sed, or anything else. Moreover, the uppercase letters need not be consecutive.

EDIT

Okay, this one seems to work (but is even longer!).

export LC_COLLATE=C
shopt -s extglob

ls @(!(*[A-Z]*)|!(*[A-Z]*)[A-Z]!(*[A-Z]*)|[A-Z]!(*[A-Z]*)[A-Z]!(*[A-Z]*)|!(*[A-Z]*)[A-Z]!(*[A-Z]*)[A-Z]!(*[A-Z]*)[A-Z]!(*[A-Z]*))


Your pattern doesn't work for me. One problem is that in many non-C locales [A-Z] includes some lowercase characters.

$ for c in a A b B z Z; do if [[ $c = [A-Z] ]]; then echo "match: $c"; else echo "no match: $c"; fi; done
no match: a
match: A
match: b
match: B
match: z
match: Z

Try it again with LANG=C. If you want to match only uppercase characters regardless of locale, use [[:upper:]].
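A minimal sketch of the [[:upper:]] suggestion: the same loop as above, with the test moved into a helper function (the name classify_char is hypothetical, not from the answer). Because [[:upper:]] is a character class rather than a range, it should not depend on the collation order the way [A-Z] does.

```shell
#!/usr/bin/env bash
# classify_char is a hypothetical helper, not part of the original answer.
# It tests a single character against the [[:upper:]] class instead of [A-Z].
classify_char() {
    if [[ $1 = [[:upper:]] ]]; then
        echo "match: $1"
    else
        echo "no match: $1"
    fi
}

for c in a A b B z Z; do
    classify_char "$c"
done
```

For plain ASCII letters this should report only A, B and Z as matches.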

Another reason yours doesn't work is that parts of it always match.

For example:

!(*[A-Z]*)

(even if it's corrected using [[:upper:]]) matches anything that contains no uppercase characters, regardless of length. However, the rest of the (partially corrected) pattern requires uppercase characters explicitly, while the asterisks implicitly allow any characters, including more uppercase ones. So just the first part of that:

*[[:upper:]]*

says to include all strings that contain at least one uppercase character, regardless of how many more there may be: one, ten, a million.
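A quick illustration of that point: a single *[[:upper:]]* accepts a string with one uppercase letter just as readily as one with ten. The sample strings here are arbitrary.

```shell
#!/usr/bin/env bash
# Every sample contains at least one uppercase letter, so every one of them
# matches *[[:upper:]]*, no matter how many uppercase letters follow.
count=0
for s in A AbCd ABCDEFGHIJ; do
    if [[ $s == *[[:upper:]]* ]]; then
        echo "matches: $s"
        count=$((count + 1))
    fi
done
echo "$count of 3 samples matched"
```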

Instead, try this:

if [[ $string != *[[:upper:]]*[[:upper:]]*[[:upper:]]*[[:upper:]]* ]]
then
    echo "match: fewer than four uppercase characters"
fi

It simply checks whether the string contains four or more uppercase characters and reports a match when it doesn't.
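The same check can be packaged as a function and run against the examples from the question. This is a sketch; the function name at_most_three_upper is hypothetical, and it uses nothing but pattern matching inside [[ ]].

```shell
#!/usr/bin/env bash
# at_most_three_upper (hypothetical name) returns status 0 when its argument
# does NOT contain four or more uppercase characters, i.e. when it has 0-3.
at_most_three_upper() {
    [[ $1 != *[[:upper:]]*[[:upper:]]*[[:upper:]]*[[:upper:]]* ]]
}

# The examples from the question.
for s in a A Ab AbC AbCd ABCD; do
    if at_most_three_upper "$s"; then
        echo "$s -> match"
    else
        echo "$s -> no match"
    fi
done
```

Only ABCD should be reported as "no match". Since the uppercase letters are matched by separate [[:upper:]] items with * between them, they may be adjacent or spread out.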

You could also use a regular expression (in Bash 3.2 or greater):

if [[ ! $string =~ ^.*[[:upper:]].*[[:upper:]].*[[:upper:]].*[[:upper:]].*$ ]]
then
    echo "match: fewer than four uppercase characters"
fi

Another way is to delete all the non-uppercase characters and check the length of what remains.

Demo:

#!/bin/bash
strings[1]='AbCdEfghi'
strings[2]='ABCD'
strings[3]='Ab1Cd2Ef3ghi'
strings[4]='A1BbC2Dd'

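for string in "${strings[@]}"
do
    # Delete every character that is not uppercase; only the uppercase ones remain
    test=${string//[^[:upper:]]}
    # More than three uppercase characters left means no match
    if (( ${#test} > 3 ))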
    then
        echo "no match: $string"
    else
        echo "match: $string"
    fi
done


Try:

ls | grep -E '^([^A-Z]*[A-Z][^A-Z]*){0,3}$'


$ echo "BCDAdf" | awk '{m=gsub(/[A-Z]/,"");print (m<4) ?"match":"no match"}'
no match

$ echo "CDAdf" | awk '{m=gsub(/[A-Z]/,"");print (m<4) ?"match":"no match"}'
match


Erik mentioned using grep, so I'll use it too.

I think it should be:

/bin/ls -1 | grep -E '^[^A-Z]*([A-Z][^A-Z]*([A-Z][^A-Z]*([A-Z][^A-Z]*)?)?)?$'

which can be shortened to:

/bin/ls -1 | grep -E '^[^A-Z]*([A-Z][^A-Z]*){0,3}$'
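To check that the shortened expression also accepts names with no uppercase letters at all (the leading [^A-Z]* handles that case), you can feed it sample names with printf instead of ls. The helper name filter_names is hypothetical, and LC_ALL=C is set for grep so that [A-Z] covers only the ASCII uppercase letters.

```shell
#!/usr/bin/env bash
# filter_names (hypothetical) reads names on stdin, one per line, and keeps
# only those with at most three uppercase ASCII letters.
filter_names() {
    LC_ALL=C grep -E '^[^A-Z]*([A-Z][^A-Z]*){0,3}$'
}

# Sample names standing in for the output of /bin/ls -1.
printf '%s\n' a A Ab AbC AbCd AbCdEf ABCD | filter_names
```

Every sample except ABCD should be printed; AbCdEf has exactly three uppercase letters, so it still passes.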

If you really want to use bash extended patterns, it should look like this:

/bin/ls -1 *([^A-Z])?([A-Z]*([^A-Z]))?([A-Z]*([^A-Z]))?([A-Z]*([^A-Z]))

Note that you have to set LC_COLLATE=C and enable extended globbing with shopt -s extglob for this to work.
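Here is a self-contained way to try the extended pattern without touching real files: it creates a few sample files in a temporary directory and expands the glob there. The sample names are arbitrary (chosen so they stay distinct on case-insensitive filesystems), and the sketch sets LC_ALL=C rather than LC_COLLATE=C because an LC_ALL already present in the environment would otherwise override LC_COLLATE.

```shell
#!/usr/bin/env bash
shopt -s extglob nullglob  # extglob for *(...) and ?(...); nullglob: no match -> empty list
export LC_ALL=C            # byte-order collation, so [A-Z] means ASCII A..Z only

dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT  # remove the scratch directory when the script exits
cd "$dir" || exit 1

# Sample files with 0, 1, 2, 3, 4 and 4 uppercase letters respectively.
touch lower Ab AbC AbCdE ABCD aBcDeFgH

# Optional run of non-uppercase characters, then up to three
# "one uppercase letter + run of non-uppercase characters" groups.
matches=( *([^A-Z])?([A-Z]*([^A-Z]))?([A-Z]*([^A-Z]))?([A-Z]*([^A-Z])) )
printf '%s\n' "${matches[@]}"
```

With byte-order sorting this should list Ab, AbC, AbCdE and lower; ABCD and aBcDeFgH each have four uppercase letters and are left out.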


If you want to write it shorter, you can use the fact that parameter expansion (i.e. variable expansion) happens before filename expansion, and do something bizarre like this:

u='[A-Z]' # $u == uppercase characters
U='[^A-Z]' # $U == non-uppercase characters
/bin/ls -1 *($U)?($u*($U))?($u*($U))?($u*($U))

Whether that's a good idea, I leave for you to decide. ;-)


I believe something like this is correct:

!(*([^[:upper:]])[[:upper:]]*([^[:upper:]])[[:upper:]]*([^[:upper:]])[[:upper:]]*([^[:upper:]])[[:upper:]]*)

with extglob, of course.
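A sketch of that pattern used with [[ ]] on strings rather than filenames; the helper name fewer_than_four_upper is hypothetical. shopt -s extglob has to run before the function is defined, because the shell must have extglob enabled when it parses the !(...) pattern.

```shell
#!/usr/bin/env bash
shopt -s extglob  # must be enabled before the function below is parsed

# fewer_than_four_upper (hypothetical): the inner pattern matches strings with
# four or more uppercase characters, and !(...) negates it.
fewer_than_four_upper() {
    [[ $1 == !(*([^[:upper:]])[[:upper:]]*([^[:upper:]])[[:upper:]]*([^[:upper:]])[[:upper:]]*([^[:upper:]])[[:upper:]]*) ]]
}

for s in abc xAyBzC xAyBzCwD; do
    if fewer_than_four_upper "$s"; then
        echo "match: $s"
    else
        echo "no match: $s"
    fi
done
```

Because the character class is [[:upper:]] rather than [A-Z], this version should not need LC_COLLATE=C.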
