
Python (or NumPy) equivalent of match in R

Developer https://www.devze.com 2023-01-23 23:09 Source: Internet

Is there an easy way in Python to do what the match function does in R? R's match returns a vector of the positions of the (first) matches of its first argument in its second.

For example, consider the following R snippet:

> a <- c(5,4,3,2,1)
> b <- c(2,3)
> match(a,b)
[1] NA NA  2  1 NA

Translated to Python, what I am looking for is a function that does the following:

>>> a = [5,4,3,2,1]
>>> b = [2,3]
>>> match(a,b)
[None, None, 2, 1, None]

Thank you!


>>> a = [5,4,3,2,1]
>>> b = [2,3]
>>> [ b.index(x) if x in b else None for x in a ]
[None, None, 1, 0, None]

Add 1 if you really need one-based positions (as in R) instead of zero-based ones:

>>> [ b.index(x)+1 if x in b else None for x in a ]
[None, None, 2, 1, None]

You can make this one-liner reusable if you are going to repeat it a lot:

>>> match = lambda a, b: [ b.index(x)+1 if x in b else None for x in a ]
>>> match
<function <lambda> at 0x04E77B70>
>>> match(a, b)
[None, None, 2, 1, None]
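
If the one-liner gets reused a lot, a named function may be easier to read and document than a lambda. The following is only a sketch: the name match_first and the one_based flag are made up for illustration and are not part of the answer above. It relies on list.index returning the first occurrence, which is the same "first match" rule R's match follows.

```python
def match_first(a, b, one_based=True):
    """Position in b of the first occurrence of each element of a, or None."""
    offset = 1 if one_based else 0
    result = []
    for x in a:
        try:
            # list.index returns the position of the first match
            result.append(b.index(x) + offset)
        except ValueError:
            # x does not occur in b
            result.append(None)
    return result


print(match_first([5, 4, 3, 2, 1], [2, 3]))             # [None, None, 2, 1, None]
print(match_first([3, 7], [3, 3, 2], one_based=False))  # [0, None]
```

Using try/except scans b once per element instead of twice (once for "x in b", once for "b.index(x)").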


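A faster approach building on Paulo Scardine's answer, if you don't mind losing the one-liner. It builds a dictionary from b once, so each lookup takes constant time instead of a scan of the list; the difference becomes more meaningful as the size of the arrays increases.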

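from typing import Hashable, List, Optional


def match_list(a: List[Hashable], b: List[Hashable]) -> List[Optional[int]]:
    # Linear scan of b for every element of a: O(len(a) * len(b))
    return [b.index(x) if x in b else None for x in a]


def match(a: List[Hashable], b: List[Hashable]) -> List[Optional[int]]:
    # Map each value to the index of its first occurrence in b;
    # setdefault keeps the earliest index when b contains duplicates
    b_dict = {}
    for i, x in enumerate(b):
        b_dict.setdefault(x, i)
    return [b_dict.get(x) for x in a]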


import random

a = [random.randint(0, 100) for _ in range(10000)]
b = [i for i in range(100) if i % 2 == 0]


%timeit match(a, b)
>>> 580 µs ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit match_list(a, b)
>>> 6.13 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

match(a, b) == match_list(a, b)
>>> True
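
Since the question also mentions numpy, a vectorized variant for array inputs could look roughly like the sketch below. The function name match_np and the nomatch sentinel are assumptions of this sketch, not part of any answer here. It returns zero-based positions and uses -1 for "no match", because an integer array cannot hold None or NA. It assumes one-dimensional inputs.

```python
import numpy as np


def match_np(a, b, nomatch=-1):
    """Zero-based position in b of the first match for each element of a."""
    a = np.asarray(a)
    b = np.asarray(b)
    if b.size == 0:
        return np.full(a.shape, nomatch, dtype=np.intp)
    # A stable sort keeps equal values in their original order, so the
    # leftmost hit in the sorted array is the first occurrence in b.
    order = np.argsort(b, kind="stable")
    sorted_b = b[order]
    pos = np.searchsorted(sorted_b, a, side="left")
    pos = np.clip(pos, 0, b.size - 1)  # keep indices in range for misses
    found = sorted_b[pos] == a
    return np.where(found, order[pos], nomatch)


print(match_np([5, 4, 3, 2, 1], [2, 3]).tolist())  # [-1, -1, 1, 0, -1]
print(match_np([3], [3, 2, 3]).tolist())           # [0]
```

Sorting b once and binary-searching each element of a avoids building a full len(a) by len(b) comparison matrix.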


The match functionality of R can also be reproduced in Python with pandas, returning the matched index labels as a pandas Index (useful for further subsetting):

import numpy as np
import pandas as pd

def match(ser1, ser2):
    """
    Return the index label in ser2 of the first element equal to each
    element of ser1 (or np.nan if there is none).
    Equivalent to the match function of R.
    """
    # Iterate over values so the result does not depend on ser1's index labels
    idx = [ser2.index[ser2 == x][0] if (ser2 == x).any() else np.nan for x in ser1]
    return pd.Index(idx)
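
When the values in the second series are unique, pandas' Index.get_indexer may offer a shorter route to the same positions. The sketch below uses made-up example data (ser1, ser2 and the labels "x" and "y" are illustrative only). get_indexer marks misses with -1 and raises an error if the index it is called on contains duplicates.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([5, 4, 3, 2, 1])
ser2 = pd.Series([2, 3], index=["x", "y"])

# Zero-based position of each ser1 value within ser2's values; -1 means no match.
positions = pd.Index(ser2.to_numpy()).get_indexer(ser1.to_numpy())
print(positions.tolist())  # [-1, -1, 1, 0, -1]

# Translate positions into ser2's index labels, keeping NaN for misses.
labels = [ser2.index[p] if p >= 0 else np.nan for p in positions]
print(labels)  # [nan, nan, 'y', 'x', nan]
```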