
Checking a string to see if it contains a numeric character in UNIX

Developer (https://www.devze.com), 2023-01-08 11:53. Source: Internet

I'm new to UNIX, having only started it at work today, but experienced with Java, and have the following code:

#!/bin/bash
echo "Please enter a word:"
read word
grep -i $word $1 | cut -d',' -f1,2 | tr "," "-"> output

This works fine, but I now need to check, when the word is read, that it contains nothing but letters. If it contains numeric characters, the script should print an "Invalid input!" message and ask the user to enter the word again. I assumed regular expressions with an if statement would be the easy way to do this, but I cannot get my head around how to use them in UNIX, as I am used to how they are applied in Java. Any help with this would be greatly appreciated; I couldn't find anything when searching, as all the regular-expression solutions for Linux I found only dealt with whether the input was entirely numeric or not.


Yet another approach. Grep exits with 0 if a match is found, so you can test the exit code:

echo "${word}" | grep -q '[0-9]'
if [ $? = 0 ]; then
    echo 'Invalid input'
fi

This is /bin/sh compatible.


Incorporating Daenyth and John's suggestions, this becomes

if echo "${word}" | grep '[0-9]' >/dev/null; then
    echo 'Invalid input'
fi
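
The checks above reject digits only, while the question asks for "nothing but letters" and for a re-prompt. Here is a rough sketch of how the same grep exit-status idea could cover both. The function name prompt_word is made up for illustration, and I am assuming "letters" means ASCII letters:

```shell
#!/bin/sh
# prompt_word is a hypothetical helper, not part of the original answer.
# It re-prompts until the line read from stdin is made of ASCII letters only,
# then prints that line on stdout. Prompts and errors go to stderr.
prompt_word() {
    while :; do
        printf 'Please enter a word: ' >&2
        # Stop with a failure status if input ends before a valid word.
        IFS= read -r word || return 1
        # grep exits 0 only when the whole line is one or more ASCII letters.
        if printf '%s\n' "$word" | LC_ALL=C grep '^[A-Za-z][A-Za-z]*$' >/dev/null; then
            printf '%s\n' "$word"
            return 0
        fi
        echo 'Invalid input!' >&2
    done
}
```

In a script you would call it as word=$(prompt_word) || exit 1 and then run the grep/cut/tr pipeline with "$word".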


The double bracket operator is an extended version of the test command which supports regexes via the =~ operator:

#!/bin/bash

while true; do
    read -p "Please enter a word: " word
    if [[ $word =~ [0-9] ]]; then
        echo 'Invalid input!' >&2
    else
        break
    fi
done

This is a bash-specific feature. Bash is a newer shell that is not available on all flavors of UNIX--though by "newer" I mean "only recently developed in the post-vacuum tube era" and by "not all flavors of UNIX" I mean relics like old versions of Solaris and HP-UX.

In my opinion this is the simplest option and bash is plenty portable these days, but if being portable to old UNIXes is in fact important then you'll need to use the other posters' sh-compatible answers. sh is the most common and most widely supported shell, but the price you pay for portability is losing things like =~.
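
The loop above only rejects digits. If you want to accept letters and nothing else, one possible variation is to anchor the regex at both ends. This is a sketch only: is_letters is a name I made up, and [[:alpha:]] follows the current locale, so it may accept accented letters too.

```shell
#!/bin/bash
# is_letters is a hypothetical helper name, not from the original answer.
# It succeeds only if its argument is one or more letters.
is_letters() {
    # The pattern is left unquoted on purpose: quoting the right-hand
    # side of =~ makes bash match it as a literal string.
    [[ $1 =~ ^[[:alpha:]]+$ ]]
}

# Try a few sample inputs.
for candidate in hello abc1def 'two words' ''; do
    if is_letters "$candidate"; then
        echo "valid: '$candidate'"
    else
        echo "invalid: '$candidate'"
    fi
done
```

Swapping the test in the while loop for ! is_letters "$word" would keep the re-prompting behaviour.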


If you're trying to write portable shell code, your options for string manipulation are limited. You can use shell globbing patterns (which are a lot less expressive than regexps) in the case construct:

export LC_COLLATE=C
read word
while
  case "$word" in
    *[!A-Za-z]*) echo >&2 "Invalid input, please enter letters only"; true;;
    *) false;;
  esac
do
  read word
done

EDIT: setting LC_COLLATE is necessary because in most non-C locales, character ranges like A-Z don't have the “obvious” meaning. I assume you want only ASCII letters; if you also want letters with diacritics, don't change LC_COLLATE, and replace A-Za-z by [:alpha:] (so the whole pattern becomes *[![:alpha:]]*).
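
If you would rather not change the locale for the whole script, the case test can be wrapped in a function that runs in a subshell. This is just a sketch under a couple of assumptions: is_ascii_letters is a made-up name, an empty string is treated as invalid, and LC_ALL is set instead of LC_COLLATE because LC_ALL takes precedence when it is already set in the environment.

```shell
#!/bin/sh
# is_ascii_letters is a hypothetical helper, not from the original answer.
# The function body is a subshell ( ... ), so the locale change below
# does not leak into the rest of the script.
is_ascii_letters() (
    # LC_ALL, when set, overrides LC_COLLATE, so set LC_ALL here.
    LC_ALL=C
    export LC_ALL
    case "$1" in
        '' | *[!A-Za-z]*) false ;;   # empty, or contains a non-letter
        *) true ;;
    esac
)
```

The loop then becomes: until is_ascii_letters "$word"; do ...; read word; done.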

For full regexps, see the expr command. EDIT: Note that expr, like several other basic shell tools, has pitfalls with some special strings; the z characters below prevent $word from being interpreted as reserved words by expr.

export LC_COLLATE=C
read word
until expr "z$word" : 'z[A-Za-z]*$' >/dev/null; do
  echo >&2 "Invalid input, please enter letters only"
  read word
done
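
To see the effect of the z prefix in isolation, here is a small sketch that wraps the expr test in a function and feeds it a word that some expr implementations treat as a keyword. The name is_letters_expr is hypothetical, and LC_ALL=C is my choice for limiting the locale change to the single expr call:

```shell
#!/bin/sh
# is_letters_expr is a hypothetical helper, not from the original answer.
# It succeeds when its argument contains only ASCII letters (or is empty).
is_letters_expr() {
    # The leading z keeps words such as "length" or "match" from being
    # parsed as expr keywords; expr anchors the regex at the start itself.
    LC_ALL=C expr "z$1" : 'z[A-Za-z]*$' >/dev/null
}

# Try a plain word, a possible expr keyword, and a word with a digit.
for candidate in hello length abc1def; do
    if is_letters_expr "$candidate"; then
        echo "valid: $candidate"
    else
        echo "invalid: $candidate"
    fi
done
```

expr exits with status 0 when the match count is non-zero, which is what the function relies on.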

If you only target recent enough versions of bash, there are other options, such as the =~ operator of [[ ... ]] conditional commands.

Note that your last line has a bug: the first command should be

grep -i "$word" "$1"

The quotes are needed because, somewhat counter-intuitively, "$foo" means “the value of the variable called foo” whereas plain $foo means “take the value of foo, split it into separate words where it contains whitespace, and treat each word as a globbing pattern and try to expand it”. (In fact, if you've already checked that $word contains only letters, leaving out the quotes won't do any harm, but it takes more time to think of these special cases than to just put the quotes in every time.)
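
A quick way to see the word-splitting difference for yourself is to count the arguments a command receives. This is only a sketch; count_args is a throwaway helper invented for the demonstration, and the default IFS is assumed:

```shell
#!/bin/sh
# count_args is a hypothetical helper that prints how many arguments it got.
count_args() { echo "$#"; }

word='two words'
count_args $word     # unquoted: split on whitespace, prints 2
count_args "$word"   # quoted: passed as one argument, prints 1
```

With grep -i $word $1, the second word would end up being treated as a file name rather than as part of the pattern.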


Yet another (quite) portable way to do it ...

if test "$word" != "`printf "%s" "$word" | tr -dc '[:alpha:]'`"; then
   echo invalid
fi


One portable (assuming bash >= 3) way to do this is to remove all digits and test whether anything is left:

#!/bin/bash
read -p "Enter a number: " var
if [[ -n ${var//[0-9]} ]]; then
    echo "Contains non-numbers!"
else
    echo "ok!"
fi

Coming from Java, it's important to note that bash has no real concept of objects or data types. Everything is a string, and complex data structures are painful at best.

For more info on what I did, and other related functions, google for bash string manipulation.


Playing around with Bash parameter expansion and character classes:

# cf. http://wiki.bash-hackers.org/syntax/pe

word="abc1def"
word="abc,def"
word=$'abc\177def'
# cf. http://mywiki.wooledge.org/BashFAQ/058 (no NUL byte in Bash variable)
word=$'abc\000def'   
word="abcdef"

(
set -xv
[[ "${word}" != "${word/[[:digit:]]/}" ]] && echo invalid || echo valid
[[ -n "${word//[[:alpha:]]/}" ]] && echo invalid || echo valid
)


Everyone's answers seem to be based on the assumption that the only invalid characters are numbers. The original question states that they need to check that the string contains "nothing but letters".

I think the best way to do it is

nonalpha=$(echo "$word" | sed 's/[[:alpha:]]//g')
if [[ ${#nonalpha} -gt 0 ]]; then
    echo "Invalid character(s): $nonalpha"
fi

If you found this page looking for a way to detect non-numeric characters in your string (like I did!), replace [[:alpha:]] with [[:digit:]].
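
Since the letters and digits cases differ only in the character class, the sed call could be wrapped in a small function that takes the class name as a parameter. This is a sketch, not a drop-in tool: strip_class is a made-up name and it assumes the first argument is a valid POSIX class name such as alpha or digit.

```shell
#!/bin/sh
# strip_class is a hypothetical helper, not from the original answer.
# Usage: strip_class CLASS STRING
# Prints the characters of STRING that are NOT in the POSIX class CLASS.
strip_class() {
    printf '%s\n' "$2" | sed "s/[[:$1:]]//g"
}

# An empty result means every character belonged to the class.
leftover=$(strip_class alpha 'abc1def')
if [ -n "$leftover" ]; then
    echo "Invalid character(s): $leftover"
fi
```

Calling strip_class digit "$word" gives the non-numeric variant mentioned above.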
