开发者

Best way to handle list.index(might-not-exist) in python?

开发者 https://www.devze.com 2022-12-18 12:25 出处:网络
I have code which looks something like this: thing_index = thing_list.index(thing) otherfunction(thing_list, thing_index)

I have code which looks something like this:

thing_index = thing_list.index(thing)
otherfunction(thing_list, thing_index)

ok so that's simplified but you get the idea. Now thing might not actually be in the list, in which case I want to pass -1 as thing_index. In other languages this is what you'd expect index() to return if it couldn't find the element. In fact it throws a ValueError.

I could do this:

try:
    thing_index = thing_list.index(thing)
except ValueError:
    thing_index = -1
otherfunction(thing_list, thing_index)

But this feels dirty, plus I don't know if ValueError could be raised for some other reason. I came up with the following solution based on generator functions, but it seems a little complex:

thing_index = ( [(i for i in xrange(len(thing_list)) if thing_list[i]==thing)] or [-1] )[0]

Is there a cleaner way to achieve the same thing? Let's assume the list 开发者_C百科isn't sorted.


There is nothing "dirty" about using try-except clause. This is the pythonic way. ValueError will be raised by the .index method only, because it's the only code you have there!

To answer the comment:
In Python, easier to ask forgiveness than to get permission philosophy is well established, and no index will not raise this type of error for any other issues. Not that I can think of any.


thing_index = thing_list.index(elem) if elem in thing_list else -1

One line. Simple. No exceptions.


The dict type has a get function, where if the key doesn't exist in the dictionary, the 2nd argument to get is the value that it should return. Similarly there is setdefault, which returns the value in the dict if the key exists, otherwise it sets the value according to your default parameter and then returns your default parameter.

You could extend the list type to have a getindexdefault method.

class SuperDuperList(list):
    def getindexdefault(self, elem, default):
        try:
            thing_index = self.index(elem)
            return thing_index
        except ValueError:
            return default

Which could then be used like:

mylist = SuperDuperList([0,1,2])
index = mylist.getindexdefault( 'asdf', -1 )


If you are doing this often then it is better to stow it away in a helper function:

def index_of(val, in_list):
    try:
        return in_list.index(val)
    except ValueError:
        return -1 


There is nothing wrong with your code that uses ValueError. Here's yet another one-liner if you'd like to avoid exceptions:

thing_index = next((i for i, x in enumerate(thing_list) if x == thing), -1)


What about this:

li = [1,2,3,4,5] # create list 

li = dict(zip(li,range(len(li)))) # convert List To Dict 
print( li ) # {1: 0, 2: 1, 3: 2, 4: 3, 5: 4}
li.get(20) # None 
li.get(1)  # 0 


This issue is one of language philosophy. In Java for example there has always been a tradition that exceptions should really only be used in "exceptional circumstances" that is when errors have happened, rather than for flow control. In the beginning this was for performance reasons as Java exceptions were slow but now this has become the accepted style.

In contrast Python has always used exceptions to indicate normal program flow, like raising a ValueError as we are discussing here. There is nothing "dirty" about this in Python style and there are many more where that came from. An even more common example is StopIteration exception which is raised by an iterator‘s next() method to signal that there are no further values.


What about this:

otherfunction(thing_collection, thing)

Rather than expose something so implementation-dependent like a list index in a function interface, pass the collection and the thing and let otherfunction deal with the "test for membership" issues. If otherfunction is written to be collection-type-agnostic, then it would probably start with:

if thing in thing_collection:
    ... proceed with operation on thing

which will work if thing_collection is a list, tuple, set, or dict.

This is possibly clearer than:

if thing_index != MAGIC_VALUE_INDICATING_NOT_A_MEMBER:

which is the code you already have in otherfunction.


I'd suggest:

if thing in thing_list:
  list_index = -1
else:
  list_index = thing_list.index(thing)


What about like this:

temp_inx = (L + [x]).index(x) 
inx = temp_inx if temp_inx < len(L) else -1


It's been quite some time but it's a core part of the stdlib and has dozens of potential methods so I think it's useful to have some benchmarks for the different suggestions and include the numpy method which can be by far the fastest.

import random
from timeit import timeit
import numpy as np

l = [random.random() for i in range(10**4)]
l[10**4 - 100] = 5

# method 1
def fun1(l:list, x:int, e = -1) -> int:
    return [[i for i,elem in enumerate(l) if elem == x] or [e]][0]

# method 2
def fun2(l:list, x:int, e = -1) -> int:
    for i,elem in enumerate(l):
        if elem == x:
            return i
    else:
        return e

# method 3
def fun3(l:list, x:int, e = -1) -> int:
    try:
        idx = l.index(x)
    except ValueError:
        idx = e
    return idx

# method 4
def fun4(l:list, x:int, e = -1) -> int:
    return l.index(x) if x in l else e

l2 = np.array(l)
# method 5
def fun5(l:list or np.ndarray, x:int, e = -1) -> int:
    res = np.where(np.equal(l, x))
    if res[0].any():
        return res[0][0]
    else:        
        return e


if __name__ == "__main__":
    print("Method 1:")
    print(timeit(stmt = "fun1(l, 5)", number = 1000, globals = globals()))
    print("")
    print("Method 2:")
    print(timeit(stmt = "fun2(l, 5)", number = 1000, globals = globals()))
    print("")
    print("Method 3:")
    print(timeit(stmt = "fun3(l, 5)", number = 1000, globals = globals()))
    print("")
    print("Method 4:")
    print(timeit(stmt = "fun4(l, 5)", number = 1000, globals = globals()))
    print("")
    print("Method 5, numpy given list:")
    print(timeit(stmt = "fun5(l, 5)", number = 1000, globals = globals()))
    print("")
    print("Method 6, numpy given np.ndarray:")
    print(timeit(stmt = "fun5(l2, 5)", number = 1000, globals = globals()))
    print("")

When run as main, this results in the following printout on my machine indicating time in seconds to complete 1000 trials of each function:

Method 1: 0.7502102799990098

Method 2: 0.7291318440002215

Method 3: 0.24142152300009911

Method 4: 0.5253471979995084

Method 5, numpy given list: 0.5045417560013448

Method 6, numpy given np.ndarray: 0.011147511999297421

Of course the question asks specifically about lists so the best solution is to use the try-except method, however the speed improvements (at least 20x here compared to try-except) offered by using the numpy data structures and operators instead of python data structures is significant and if building something on many arrays of data that is performance critical then the author should try to use numpy throughout to take advantage of the superfast C bindings. (CPython interpreter, other interpreter performances may vary)

Btw, the reason Method 5 is much slower than Method 6 is because numpy first has to convert the given list to it's own numpy array, so giving it a list doesn't break it it just doesn't fully utilise the speed possible.


I'll be blunt: the answers here are very bad, and have insane time complexity.

Here's a simple way.

With dict().get('key', 'some_value'), the value at 'key' will be returned, and if the key is not in the dictionary, 'some_value' will be returned.

You can create such a dictionary with your list and its indices.

mylist = ['cat' 'dog', 'bunny']

mapping = {value: index for index, value in enumerate(mylist)}

Then, mapping.get('key', 0) will return the index if found, or None.

mapping.get('penguin', 0)  # returns 0


I have the same issue with the ".index()" method on lists. I have no issue with the fact that it throws an exception but I strongly disagree with the fact that it's a non-descriptive ValueError. I could understand if it would've been an IndexError, though.

I can see why returning "-1" would be an issue too because it's a valid index in Python. But realistically, I never expect a ".index()" method to return a negative number.

Here goes a one liner (ok, it's a rather long line ...), goes through the list exactly once and returns "None" if the item isn't found. It would be trivial to rewrite it to return -1, should you so desire.

indexOf = lambda list, thing: \
            reduce(lambda acc, (idx, elem): \
                   idx if (acc is None) and elem == thing else acc, list, None)

How to use:

>>> indexOf([1,2,3], 4)
>>>
>>> indexOf([1,2,3], 1)
0
>>>


Comparison of implementations

Simple comparison on Python 3.8

TL;DR maybeidx2 is faster in general except for arrays (n<100) with lots of misses

def maybeidx1(l, v):
    return l.index(v) if v in l else None

def maybeidx2(l, v):
    try:
        return l.index(v)
    except ValueError:
        return None

Test cases:

a = [*range(100_000)]
# Case 1: index in list
maybeidx1(a, 50_000)
Out[20]: 50000
maybeidx2(a, 50_000)
Out[21]: 50000
# Case 2: index not in list
maybeidx1(a, 100_000) is None
Out[23]: True
maybeidx2(a, 100_000) is None
Out[24]: True

Timing case 1

%timeit maybeidx1(a, 50_000)
1.06 ms ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit maybeidx2(a, 50_000)
530 µs ± 8.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Timing case 2

%timeit maybeidx1(a, 100_000)
1.07 ms ± 21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit maybeidx2(a, 100_000)
1.07 ms ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Result

Use the maybeidx2 method for larger arrays. This is faster due to maybeidx1 having two scans of the array in search of the value - this is still O(n) time but has a constant multiplier of 2 and thus slower in practice. This holds in the case of the value being present in the list. When the value is not present, these times will be roughly equal; they both have to scan the full array exactly once, then return None. The overhead of the try-except is negligible, even with an array size of 10 - unless case two occurs. Then the try-except overhead is noticeable. Example:

a = [*range(10)]
%timeit maybeidx1(a, 10)
191 ns ± 2.61 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit maybeidx2(a, 10)
566 ns ± 5.93 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This overhead becomes negligible (on my machine) when a has above 100 elements.


Since Python 3.6, there are find and rfind methods for index and rindex which return -1 instead of throwing exception.


I don't know why you should think it is dirty... because of the exception? if you want a oneliner, here it is:

thing_index = thing_list.index(elem) if thing_list.count(elem) else -1

but i would advise against using it; I think Ross Rogers solution is the best, use an object to encapsulate your desiderd behaviour, don't try pushing the language to its limits at the cost of readability.

0

精彩评论

暂无评论...
验证码 换一张
取 消