
Bash empty array expansion with `set -u`

Developer https://www.devze.com 2023-04-09 02:22 Source: Internet

I'm writing a bash script which has set -u, and I have a problem with empty array expansion: bash appears to treat an empty array as an unset variable during expansion:

$ set -u
$ arr=()
$ echo "foo: '${arr[@]}'"
bash: arr[@]: unbound variable

(declare -a arr doesn't help either.)

A common solution to this is to use ${arr[@]-} instead, thus substituting an empty string instead of the ("undefined") empty array. However this is not a good solution, since now you can't discern between an array with a single empty string in it and an empty array. (@-expansion is special in bash, it expands "${arr[@]}" into "${arr[0]}" "${arr[1]}" …, which makes it a perfect tool for building command lines.)

$ countArgs() { echo $#; }
$ countArgs a b c
3
$ countArgs
0
$ countArgs ""
1
$ brr=("")
$ countArgs "${brr[@]}"
1
$ countArgs "${arr[@]-}"
1
$ countArgs "${arr[@]}"
bash: arr[@]: unbound variable
$ set +u
$ countArgs "${arr[@]}"
0

So is there a way around that problem, other than checking the length of an array in an if (see code sample below), or turning off the -u setting for that short piece?

if [ "${#arr[@]}" = 0 ]; then
   veryLongCommandLine
else
   veryLongCommandLine "${arr[@]}"
fi

Update: Removed bugs tag due to explanation by ikegami.


According to the documentation,

An array variable is considered set if a subscript has been assigned a value. The null string is a valid value.

No subscript has been assigned a value, so the array isn't set.

But while the documentation suggests an error is appropriate here, this is no longer the case since 4.4.

$ bash --version | head -n 1
GNU bash, version 4.4.19(1)-release (x86_64-pc-linux-gnu)

$ set -u

$ arr=()

$ echo "foo: '${arr[@]}'"
foo: ''
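
If a script has to run on both sides of that version boundary, one option is to check the version at runtime. This is a minimal sketch, assuming the behaviour changed exactly at 4.4 as described above; the function name `empty_array_expansion_is_safe` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when this bash is 4.4 or newer, i.e. when
# "${arr[@]}" on an empty array is expected not to trip `set -u`.
empty_array_expansion_is_safe() {
    local major=${BASH_VERSINFO[0]} minor=${BASH_VERSINFO[1]}
    (( major > 4 || (major == 4 && minor >= 4) ))
}

if empty_array_expansion_is_safe; then
    echo 'plain "${arr[@]}" is fine under set -u'
else
    echo 'use the ${arr[@]+"${arr[@]}"} idiom instead'
fi
```

Checking the version is only a fallback; the conditional expansion works on either side of the boundary and needs no check.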

There is a conditional you can use inline to achieve what you want in older versions: Use ${arr[@]+"${arr[@]}"} instead of "${arr[@]}".

$ function args { perl -E'say 0+@ARGV; say "$_: $ARGV[$_]" for 0..$#ARGV' -- "$@" ; }

$ set -u

$ arr=()

$ args "${arr[@]}"
-bash: arr[@]: unbound variable

$ args ${arr[@]+"${arr[@]}"}
0

$ arr=("")

$ args ${arr[@]+"${arr[@]}"}
1
0: 

$ arr=(a b c)

$ args ${arr[@]+"${arr[@]}"}
3
0: a
1: b
2: c

Tested with bash 4.2.25 and 4.3.11.
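
In practice the idiom is most useful when assembling a command line from optional pieces. A minimal sketch follows; `count_args` and `extra_opts` are names invented for this example, standing in for a real command and its optional flags.

```shell
#!/usr/bin/env bash
set -u

# Stand-in for a real command: prints how many arguments it received.
count_args() { echo "$#"; }

extra_opts=()                       # may legitimately stay empty

# Expands to nothing when extra_opts is empty, and to one word per
# element otherwise, without tripping `set -u` on older bash versions.
none=$(count_args fixed ${extra_opts[@]+"${extra_opts[@]}"})

extra_opts+=( --name "two words" "" )
some=$(count_args fixed ${extra_opts[@]+"${extra_opts[@]}"})

echo "without options: $none, with options: $some"
```

Note that the outer expansion is deliberately left unquoted; only the nested "${extra_opts[@]}" carries the quotes.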


The only safe idiom is ${arr[@]+"${arr[@]}"}

Unless you only care about Bash 4.4+, but you wouldn't be looking at this question if that were the case :)

This is already the recommendation in ikegami's answer, but there's a lot of misinformation and guesswork in this thread. Other patterns, such as ${arr[@]-} or ${arr[@]:0}, are not safe across all major versions of Bash.

As the table below shows, the only expansion that is reliable across all modern-ish Bash versions is ${arr[@]+"${arr[@]}"} (column +"). Of note, several other expansions fail in Bash 4.2, including (unfortunately) the shorter ${arr[@]:0} idiom, which doesn't just produce an incorrect result but actually fails. If you need to support versions prior to 4.4, and in particular 4.2, this is the only working idiom.

[Table (image not preserved): Bash empty array expansion with `set -u`]

Unfortunately, other + expansions that look the same at a glance behave differently. Using :+ instead of + (:+" in the table), for example, does not work because the colon form treats an array with a single empty element (('')) as "null" and thus doesn't (consistently) expand to the same result.

Quoting the full expansion instead of the nested array ("${arr[@]+${arr[@]}}", "+ in the table), which I would have expected to be roughly equivalent, is similarly unsafe in 4.2.

You can see the code that generated this data along with results for several additional versions of bash in this gist.
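
A rough way to run a similar comparison on whatever bash is at hand is sketched below. This is not the gist's code; `count_args`, `try_variant` and the variant labels are invented here, and the results will differ between bash versions, so no expected output is listed.

```shell
#!/usr/bin/env bash
# Print how many words each expansion yields for three test arrays.
# Each case runs in a subshell so that an "unbound variable" abort
# only kills that one case; such cases are reported as "error".
count_args() { echo "$#"; }

try_variant() {   # $1 = variant label, $2 = array literal, e.g. "('')"
    (
        set -u
        eval "arr=$2"   # literal comes from the fixed list below
        case $1 in
            plain)       count_args "${arr[@]}" ;;
            dash)        count_args "${arr[@]-}" ;;
            slice)       count_args "${arr[@]:0}" ;;
            colon-plus)  count_args ${arr[@]:+"${arr[@]}"} ;;
            plus)        count_args ${arr[@]+"${arr[@]}"} ;;
        esac
    ) 2>/dev/null || echo error
}

for variant in plain dash slice colon-plus plus; do
    for literal in "()" "('')" "(a b c)"; do
        printf '%-10s %-8s -> %s\n' "$variant" "$literal" \
            "$(try_variant "$variant" "$literal")"
    done
done
```

A correct variant should report 0, 1 and 3 for the three arrays, in that order.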


@ikegami's accepted answer is subtly wrong! The correct incantation is ${arr[@]+"${arr[@]}"}:

$ countArgs () { echo "$#"; }
$ arr=('')
$ countArgs "${arr[@]:+${arr[@]}}"
0   # WRONG
$ countArgs ${arr[@]+"${arr[@]}"}
1   # RIGHT
$ arr=()
$ countArgs ${arr[@]+"${arr[@]}"}
0   # Let's make sure it still works for the other case...


Turns out array handling has been changed in recently released (2016/09/16) bash 4.4 (available in Debian stretch, for example).

$ bash --version | head -n1
GNU bash, version 4.4.0(1)-release (x86_64-pc-linux-gnu)

Now expanding an empty array no longer emits an error:

$ set -u
$ arr=()
$ echo "${arr[@]}"

$ # everything is fine


This may be another option for those who prefer not to duplicate arr[@] and are okay with getting an empty string:

echo "foo: '${arr[@]:-}'"

to test:

set -u
arr=()
echo a "${arr[@]:-}" b # note two spaces between a and b
for f in a "${arr[@]:-}" b; do echo $f; done # note blank line between a and b
arr=(1 2)
echo a "${arr[@]:-}" b
for f in a "${arr[@]:-}" b; do echo $f; done
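
The trade-off is easier to see by counting arguments than by eyeballing whitespace. A small sketch (the `count_args` helper is introduced here for illustration):

```shell
#!/usr/bin/env bash
set -u
count_args() { echo "$#"; }

arr=()
empty_count=$(count_args "${arr[@]:-}")   # the default "" becomes one argument

arr=(1 2)
filled_count=$(count_args "${arr[@]:-}")  # elements pass through unchanged

echo "empty array: $empty_count argument(s), two elements: $filled_count"
```

So this form is only suitable where a stray empty argument is harmless.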


@ikegami's answer is correct, but I consider the syntax ${arr[@]+"${arr[@]}"} dreadful. If you use long array variable names, it starts to look spaghetti-ish quicker than usual.

Try this instead:

$ set -u

$ count() { echo $# ; } ; count x y z
3

$ count() { echo $# ; } ; arr=() ; count "${arr[@]}"
-bash: arr[@]: unbound variable

$ count() { echo $# ; } ; arr=() ; count "${arr[@]:0}"
0

$ count() { echo $# ; } ; arr=(x y z) ; count "${arr[@]:0}"
3

It looks like the Bash array slice operator is very forgiving.

So why did Bash make handling the edge case of arrays so difficult? Sigh. I cannot guarantee your version will allow such abuse of the array slice operator, but it works dandy for me.

Caveat: I am using GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu). Your mileage may vary.


"Interesting" inconsistency indeed.

Furthermore,

$ set -u
$ echo $#
0
$ echo "$1"
bash: $1: unbound variable   # makes sense (I didn't set any)
$ echo "$@" | cat -e
$                            # blank line, no error

While I agree that the current behavior may not be a bug in the sense that @ikegami explains, IMO we could say the bug is in the definition (of "set") itself, and/or the fact that it's inconsistently applied. The preceding paragraph in the man page says

... ${name[@]} expands each element of name to a separate word. When there are no array members, ${name[@]} expands to nothing.

which is entirely consistent with what it says about the expansion of positional parameters in "$@". Not that there aren't other inconsistencies in the behaviors of arrays and positional parameters... but to me there's no hint that this detail should be inconsistent between the two.
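
Since "$@" expands to nothing without complaint in the transcript above, one possible workaround is to carry optional arguments in a function's positional parameters instead of a named array. This is a sketch under that assumption; `count_args` and `run_with_optional_args` are hypothetical names.

```shell
#!/usr/bin/env bash
set -u
count_args() { echo "$#"; }

# Hypothetical wrapper: its own "$@" plays the role of the array.
run_with_optional_args() {
    count_args fixed "$@"
}

no_extras=$(run_with_optional_args)            # only "fixed" is passed on
two_extras=$(run_with_optional_args "a b" "")  # spaces and "" are preserved
echo "$no_extras $two_extras"
```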

Continuing,

$ arr=()
$ echo "${arr[@]}"
bash: arr[@]: unbound variable   # as we've observed.  BUT...
$ echo "${#arr[@]}"
0                                # no error
$ echo "${!arr[@]}" | cat -e
$                                # no error

So arr[] isn't so unbound that we can't get a count of its elements (0), or a (empty) list of its keys? To me these are sensible, and useful -- the only outlier seems to be the ${arr[@]} (and ${arr[*]}) expansion.
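
Because the key list expands without error, looping over indices is one way to walk a possibly empty array under set -u. A minimal sketch follows; the `items` array and `join_items` function are invented for this example.

```shell
#!/usr/bin/env bash
set -u

# Join the elements of the global array `items` with commas by walking
# its indices; "${!items[@]}" yields no words when the array is empty.
join_items() {
    local i out='' sep=''
    for i in "${!items[@]}"; do
        out+="$sep${items[i]}"
        sep=','
    done
    echo "$out"
}

items=()
joined_empty=$(join_items)
items=(x "" z)
joined_full=$(join_items)
echo "empty: '$joined_empty', full: '$joined_full'"
```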


I am adding to @ikegami's (accepted) and @kevinarpe's (also good) answers.

You can do "${arr[@]:+${arr[@]}}" to work around the problem. The right-hand side (i.e., after :+) provides an expression that will be used when the left-hand side is set and non-null; otherwise the whole expansion yields nothing.

The syntax is arcane. Note that the right hand side of the expression will undergo parameter expansion, so extra attention should be paid to having consistent quoting.

: example copy arr into arr_copy
arr=( "1 2" "3" )
arr_copy=( "${arr[@]:+${arr[@]}}" ) # good. same quoting. 
                                    # preserves spaces

arr_copy=( ${arr[@]:+"${arr[@]}"} ) # bad. quoting only on RHS.
                                    # copy will have ["1","2","3"],
                                    # instead of ["1 2", "3"]

Like @kevinarpe mentions, a less arcane syntax is to use the array slice notation ${arr[@]:0} (on Bash versions >= 4.4), which expands to all the parameters, starting from index 0. It also doesn't require as much repetition. This expansion works regardless of set -u, so you can use this at all times. The man page says (under Parameter Expansion):

  • ${parameter:offset}

  • ${parameter:offset:length}

    ... If parameter is an indexed array name subscripted by @ or *, the result is the length members of the array beginning with ${parameter[offset]}. A negative offset is taken relative to one greater than the maximum index of the specified array. It is an expansion error if length evaluates to a number less than zero.

This is the example provided by @kevinarpe, with alternate formatting to place the output in evidence:

set -u
function count() { echo $# ; };
(
    count x y z
)
: prints "3"

(
    arr=()
    count "${arr[@]}"
)
: prints "-bash: arr[@]: unbound variable"

(
    arr=()
    count "${arr[@]:0}"
)
: prints "0"

(
    arr=(x y z)
    count "${arr[@]:0}"
)
: prints "3"

This behaviour varies with versions of Bash. You may also have noticed that the length operator ${#arr[@]} will always evaluate to 0 for empty arrays, regardless of set -u, without causing an 'unbound variable error'.


Here are a couple of ways to do something like this, one using sentinels and another using conditional appends:

#!/bin/bash
set -o nounset -o errexit -o pipefail
countArgs () { echo "$#"; }

arrA=( sentinel )
arrB=( sentinel "{1..5}" "./*" "with spaces" )
arrC=( sentinel '$PWD' )
cmnd=( countArgs "${arrA[@]:1}" "${arrB[@]:1}" "${arrC[@]:1}" )
echo "${cmnd[@]}"
"${cmnd[@]}"

arrA=( )
arrB=( "{1..5}" "./*"  "with spaces" )
arrC=( '$PWD' )
cmnd=( countArgs )
# Checks expansion of indices.
[[ ! ${!arrA[@]} ]] || cmnd+=( "${arrA[@]}" )
[[ ! ${!arrB[@]} ]] || cmnd+=( "${arrB[@]}" )
[[ ! ${!arrC[@]} ]] || cmnd+=( "${arrC[@]}" )
echo "${cmnd[@]}"
"${cmnd[@]}"


Now, as technically right the "${arr[@]+"${arr[@]}"}" version is, you never want to use this syntax for appending to an array, ever!

The reason is that this syntax expands the whole array and then builds a new one on every append, which is expensive in both CPU time and memory!

To show this, I made a simple comparison:

# cat array_perftest_expansion.sh
#! /usr/bin/bash

set -e
set -u

loops=$1

arr=()
i=0

while [ $i -lt $loops ] ; do
        arr=( ${arr[@]+"${arr[@]}"} "${i}" )
        #arr[${#arr[@]}]="${i}"

        i=$(( i + 1 ))
done

exit 0

And then:

# timex ./array_perftest_expansion.sh 1000

real           1.86
user           1.84
sys            0.01

But with the second line enabled instead, just setting the last entry directly:

arr[${#arr[@]}]="${i}"



# timex ./array_perftest_last.sh 1000

real           0.03
user           0.02
sys            0.00

If that is not enough, things get much worse when you try to add more entries!

When using 4000 instead of 1000 loops:

# timex ./array_perftest_expansion.sh 4000

real          33.13
user          32.90
sys            0.22

Just setting the last entry:

# timex ./array_perftest_last.sh 4000

real           0.10
user           0.09
sys            0.00

And this gets worse and worse ... I could not wait for the expansion version to finish a loop of 10000!

With the last element instead:

# timex ./array_perftest_last.sh 10000

real           0.26
user           0.25
sys            0.01

Never use such an array expansion for any reason.
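
For plain appends, the `+=` compound assignment avoids re-expanding the array at all, so the empty-array problem does not come up. A minimal sketch follows; the default loop count is arbitrary and no timings are claimed for it.

```shell
#!/usr/bin/env bash
set -e
set -u

loops=${1:-1000}    # default chosen only for this example
arr=()
i=0

while [ "$i" -lt "$loops" ]; do
    arr+=( "$i" )   # append one element in place, no expansion of arr
    i=$(( i + 1 ))
done

echo "${#arr[@]} elements"
```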


Interesting inconsistency; this lets you define something which is "not considered set" yet shows up in the output of declare -p

arr=()
set -o nounset
echo ${arr[@]}
 =>  -bash: arr[@]: unbound variable
declare -p arr
 =>  declare -a arr='()'

UPDATE: as others mentioned, fixed in 4.4 released after this answer was posted.


The most simple and compatible way seems to be:

$ set -u
$ arr=()
$ echo "foo: '${arr[@]-}'"
foo: ''
