Create a new array from numpy array based on the conditions from a list_问答_开发者

Create a new array from numpy array based on the conditions from a list

开发者 https://www.devze.com 2023-01-14 00:26 出处：网络

Suppose that I have an array defined by: data = np.array([(\'a1v1\', \'a2v1\', \'a3v1\', \'a4v1\', \'a5v1\'),

Suppose that I have an array defined by:

data = np.array([('a1v1', 'a2v1', 'a3v1', 'a4v1', 'a5v1'),
       ('a1v1', 'a2v1', 'a3v1', 'a4v2', 'a5v1'),
       ('a1v3', 'a2v1', 'a3v1', 'a4v1', 'a5v2'),
       ('a1v2', 'a2v2', 'a3v1', 'a4v1', 'a5v2'),
       ('a1v2', 'a2v3', 'a3v2', 'a4v1', 'a5v2'),
       ('a1v2', 'a2v3', 'a3v2', 'a4v2', 'a5v1'),
       ('a1v3', 'a2v3', 'a3v2', 'a4v2', 'a5v2'),
       ('a1v1', 'a2v2', 'a3v1', 'a4v1', 'a5v1'),
       ('a1v1', 'a2v3', 'a3v2', 'a4v1', 'a5v2'),
       ('a1v2', 'a2v2', 'a3v2', 'a4v1', 'a5v2'),
       ('a1v1', 'a2v2', 'a3v2', 'a4v2', 'a5v2'),
       ('a1v3', 'a2v2', 'a3v1', 'a4v2', 'a5v2'),
       ('a1v3', 'a2v1', 'a3v2', 'a4v1', 'a5v2'),
       ('a1v2', 'a2v2', 'a3v1', 'a4v开发者_StackOverflow中文版2', 'a5v1')],
      dtype=[('a1', '|S4'), ('a2', '|S4'), ('a3', '|S4'),
             ('a4', '|S4'), ('a5', '|S4')])

How to create a function to list out data elements by row with conditions given in a list of tuples, r.

r = [('a1', 'a1v1'), ('a4', 'a4v1')]

I know that it can be done manually like this:

data[(data['a1']=='a1v1') & data['a4']=='a4v1']

What about removing rows from data that comply with the r.

data[(data['a1']!='a1v1') | data['a4']!='a4v1']

Thanks.

If I'm understanding you correctly, you want to list the entire row, where a given tuple of columns is equal to some value. In that case, this should be what you want, though it's a bit verbose and obscure:

test_cols = data[['a1', 'a4']]
test_vals = np.array(('a1v1', 'a4v1'), test_cols.dtype)
data[test_cols == test_vals]

Note the "nested list" style indexing... That's the easiest way to select multiple columns of a structured array. E.g.

data[['a1', 'a4']]

will yield

array([('a1v1', 'a4v1'), ('a1v1', 'a4v2'), ('a1v3', 'a4v1'),
       ('a1v2', 'a4v1'), ('a1v2', 'a4v1'), ('a1v2', 'a4v2'),
       ('a1v3', 'a4v2'), ('a1v1', 'a4v1'), ('a1v1', 'a4v1'),
       ('a1v2', 'a4v1'), ('a1v1', 'a4v2'), ('a1v3', 'a4v2'),
       ('a1v3', 'a4v1'), ('a1v2', 'a4v2')], 
      dtype=[('a1', '|S4'), ('a4', '|S4')])

You can then test this agains a tuple of the values that you're checking for and get a one-dimensional boolean array where those columns are equal to those values.

However, with structured arrays, the dtype has to be an exact match. E.g. data[['a1', 'a4']] == ('a1v1', 'a4v1') just yields False, so we have to make an array of the values we want to test using the same dtype as the columns we're testing against. Thus, we have to do something like:

test_cols = data[['a1', 'a4']]
test_vals = np.array(('a1v1', 'a4v1'), test_cols.dtype)

before we can do this:

data[test_cols == test_vals]

Which yields what we were originally after:

array([('a1v1', 'a2v1', 'a3v1', 'a4v1', 'a5v1'),
       ('a1v1', 'a2v2', 'a3v1', 'a4v1', 'a5v1'),
       ('a1v1', 'a2v3', 'a3v2', 'a4v1', 'a5v2')], 
      dtype=[('a1', '|S4'), ('a2', '|S4'), ('a3', '|S4'), ('a4', '|S4'), ('a5', '|S4')])

Hope that makes some sense, anyway...