I've recently run into issues when creating Numpy object arrays using e.g.
a = np.array([c], dtype=np.object)
where c is an instance of some complicated class, and in some cases Numpy tries to access some methods of that class. However, doing:
a = np.empty((1,), dtype=np.object)
a[0] = c
solves the issue. I'm curious as to what the difference is between these two internally. Why, in the first case, might Numpy try to access some attributes or methods of c?
EDIT: For the record, here is example code that demonstrates the issue:
import numpy as np
class Thing(object):
    def __getitem__(self, item):
        print("in getitem")
    def __len__(self):
        return 1
a = np.array([Thing()], dtype='object')
This prints "in getitem" twice. Basically, if __len__ is defined on the class, this is when one can run into unexpected behavior.
In the first case, a = np.array([c], dtype=np.object), numpy knows nothing about the shape of the intended array.
For example, when you define
d = range(10)
a = np.array([d])
then you expect numpy to determine the shape, here (1, 10), from the length of d.
So similarly in your case, numpy will attempt to see if len(c) is defined, and if it is, to access the elements of c via c[i].
You can see the effect by defining a class such as
import numpy
class X(object):
    def __len__(self): return 10
    def __getitem__(self, i): return "x" * i
Then
print(numpy.array([X()], dtype=object))
produces
[[ x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx xxxxxxxxx]]
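A variant of the example above that records which methods are touched may make the mechanism easier to see. This is only a sketch: the Recorder class and its calls list are made up for illustration, and __getitem__ raises IndexError past the end (an addition not in the class above) so that plain Python iteration over the object terminates. The exact number and order of calls may depend on the numpy version, so the comments only describe the general outcome.

```python
import numpy as np

class Recorder(object):
    """Hypothetical sequence-like class that logs every special-method call."""
    def __init__(self, n):
        self.n = n
        self.calls = []

    def __len__(self):
        self.calls.append("__len__")
        return self.n

    def __getitem__(self, i):
        self.calls.append("__getitem__(%r)" % (i,))
        if not 0 <= i < self.n:
            raise IndexError(i)   # lets iteration over the object stop
        return "x" * i

r = Recorder(3)
a = np.array([r], dtype=object)

print(a.shape)   # the object was unpacked into a row of length 3
print(r.calls)   # non-empty: numpy inspected the object while building the array
```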
In contrast, in your second case
a = np.empty((1,), dtype=np.object)
a[0] = c
the shape of a has already been determined, so numpy can assign the object directly.
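As a rough check of this, the following sketch (the Seq class and its touched counter are hypothetical names) assigns a sequence-like object into a pre-allocated one-element array and confirms that the very same object ends up stored.

```python
import numpy as np

class Seq(object):
    """Hypothetical sequence-like class that counts special-method calls."""
    def __init__(self):
        self.touched = 0

    def __len__(self):
        self.touched += 1
        return 2

    def __getitem__(self, i):
        self.touched += 1
        if not 0 <= i < 2:
            raise IndexError(i)
        return i

c = Seq()
a = np.empty((1,), dtype=object)   # shape fixed up front
a[0] = c                           # single-element assignment

print(a.shape)       # still (1,)
print(a[0] is c)     # the object itself is stored, not its contents
print(c.touched)     # number of __len__/__getitem__ calls made so far
```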
However, this holds only because a is a vector, so a[0] addresses a single element. If a had been defined with a different shape, so that a[0] selects a sub-array, then the method accesses would still occur. The following, for example, will still call __getitem__ on the object:
a = numpy.empty((1, 10), dtype=object)
a[0] = X()
print(a)
prints
[[ x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx xxxxxxxxx]]
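One way to sidestep this, sketched here under the assumption that every slot should hold the object itself, is to index all the way down to a single element so that numpy never has a sub-array to fill. The helper name pack_objects is made up for illustration and is not a numpy function; the IndexError guard in X is also an addition so that the class behaves as a well-formed sequence.

```python
import numpy as np

class X(object):
    def __len__(self):
        return 10

    def __getitem__(self, i):
        if not 0 <= i < 10:
            raise IndexError(i)   # added so iteration over X terminates
        return "x" * i

def pack_objects(items):
    """Hypothetical helper: store each item as-is in a 1-D object array."""
    out = np.empty((len(items),), dtype=object)
    for k, item in enumerate(items):
        out[k] = item             # one element at a time, no unpacking
    return out

x = X()

a = np.empty((1, 10), dtype=object)
a[0, 0] = x                       # full index: addresses exactly one slot
print(a[0, 0] is x)

b = pack_objects([x, X()])
print(b.shape)
print(b[0] is x)
```

The loop is slower than a single vectorised assignment, but it keeps numpy from guessing at the structure of the objects being stored.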