We've got a set of recarrays of data for individual days - the first attribute is a timestamp and the rest are values.
Several of these:
ts, a, b, c
2010-08-06 08:00, 1.2, 3.4, 5.6
2010-08-06 08:05, 1.2, 3.4, 5.6
2010-08-06 08:10, 1.2, 3.4, 5.6
2010-08-06 08:15, 2.2, 3.3, 5.6
2010-08-06 08:20, 1.2, 3.4, 5.6
We'd like to produce an array of the averages of each of the values (as if you laid all of the day data on top of each other, and averaged all of the values that line up). The timestamp times all match up, so we can do it by creating a result recarray with the timestamps, and the other columns all 0s, then doing something like:
for day in day_data:
    result.a += day.a
    result.b += day.b
    result.c += day.c
result.a /= len(day_data)
result.b /= len(day_data)
result.c /= len(day_data)
It seems like a better way would be to convert each day to a 2d array with just the numbers (lopping off the timestamps), then average them all element-wise in one operation, but we can't find a way to do this - it's always a 1d array of objects.
Does anyone know how to do this?
There are several ways to do this. One way is to select multiple columns of the recarray and cast them as floats, then reshape back into a 2D array:
new_data = data[['a','b','c']].astype(float).reshape((data.size, 3))
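Depending on the NumPy version, a direct cast of a multi-field selection may not work as written. A minimal sketch of the same idea using numpy.lib.recfunctions.structured_to_unstructured (available in NumPy 1.16 and later) follows; the two-row `day` recarray is made-up sample data shaped like the rows in the question, not the asker's real data.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical one-day recarray with the same layout as the question's sample.
day = np.rec.fromrecords(
    [("2010-08-06 08:00", 1.2, 3.4, 5.6),
     ("2010-08-06 08:05", 2.2, 3.3, 5.6)],
    names=["ts", "a", "b", "c"],
)

# Select only the numeric fields (dropping the timestamp) and convert
# them to an ordinary 2D float array of shape (n_rows, 3).
values = rfn.structured_to_unstructured(day[["a", "b", "c"]], dtype=float)
```

Here `values` is a regular float array, so element-wise arithmetic across days works without touching the timestamp column.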
Alternatively, you might consider something like this (negligibly slower, but more readable):
new_data = np.vstack([data[item] for item in ['a','b','c']]).T
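To tie this back to the original question, here is a rough sketch of averaging several days in one operation with the vstack approach. The `make_day` helper and its values are invented for illustration, and it assumes, as the question states, that every day has the same rows in the same order.

```python
import numpy as np

def make_day(date, a_values):
    """Build a hypothetical one-day recarray (made-up data for illustration)."""
    times = ["08:00", "08:05", "08:10"]
    rows = [(f"{date} {t}", a, 3.4, 5.6) for t, a in zip(times, a_values)]
    return np.rec.fromrecords(rows, names=["ts", "a", "b", "c"])

day_data = [
    make_day("2010-08-06", [1.0, 2.0, 3.0]),
    make_day("2010-08-07", [3.0, 4.0, 5.0]),
]
fields = ["a", "b", "c"]

# One (n_rows, n_fields) float array per day, stacked into a single
# (n_days, n_rows, n_fields) array.
stacked = np.stack(
    [np.vstack([day[f] for f in fields]).T for day in day_data]
)

# Average over the day axis: one element-wise mean for all columns at once.
mean_values = stacked.mean(axis=0)

# Optionally write the averages back into a recarray that keeps the timestamps.
result = day_data[0].copy()
for i, f in enumerate(fields):
    result[f] = mean_values[:, i]
```

With this sample data, column a of `result` holds the per-row means of the two days, and the ts column is carried over from the first day.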
Also note that it might be a good idea to look into pandas for operations such as these so that you can easily work with heterogeneous data.
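As a rough illustration of the pandas route, the sketch below concatenates the days and groups by time of day. The sample recarrays are invented, and grouping on a derived `time` column is just one possible design, assuming the ts strings parse as datetimes.

```python
import numpy as np
import pandas as pd

# Hypothetical two days of data with matching times of day.
day1 = np.rec.fromrecords(
    [("2010-08-06 08:00", 1.0, 3.4, 5.6),
     ("2010-08-06 08:05", 2.0, 3.4, 5.6)],
    names=["ts", "a", "b", "c"],
)
day2 = np.rec.fromrecords(
    [("2010-08-07 08:00", 3.0, 3.4, 5.6),
     ("2010-08-07 08:05", 4.0, 3.4, 5.6)],
    names=["ts", "a", "b", "c"],
)

# Each recarray becomes a DataFrame; concat stacks the days vertically.
frames = [pd.DataFrame.from_records(day) for day in (day1, day2)]
combined = pd.concat(frames, ignore_index=True)

# Group rows that share a time of day and average the value columns.
combined["time"] = pd.to_datetime(combined["ts"]).dt.time
averages = combined.groupby("time")[["a", "b", "c"]].mean()
```

Unlike the positional NumPy version, this aligns rows by time of day, so it would still work if a day had its rows in a different order.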