Merging a list of numpy arrays into one array (fast)

Developer https://www.devze.com 2023-03-06 17:25 Source: Internet
What would be the fastest way to merge a list of numpy arrays into one array, given that the length of the list and the size of the arrays (the same for all of them) are known in advance?

I tried two approaches:

  • merged_array = array(list_of_arrays), from the question "Pythonic way to create a numpy array from a list of numpy arrays", and

  • vstack

As you can see, vstack is faster, but for some reason the first run takes about three times longer than the second. I assume this is caused by (missing) preallocation. So how would I preallocate an array for vstack? Or do you know a faster method?

Thanks!

[UPDATE]

I want (25600, 320), not (80, 320, 320), which means merged_array = array(list_of_arrays) won't work for me. Thanks, Joris, for pointing that out!
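
For reference, the two shapes mentioned in this update differ only by a reshape. The following is a minimal sketch, assuming small hypothetical sizes (4 arrays of 3x5 instead of 80 arrays of 320x320) and a hypothetical list name, to show that the 3D result of array() can be flattened into the row-stacked layout:

```python
import numpy

# Hypothetical small sizes for illustration; the question uses 80, 320, 320.
n_matrices, height, width = 4, 3, 5
list_of_arrays = [numpy.full((height, width), i, dtype=numpy.float32)
                  for i in range(n_matrices)]

stacked_3d = numpy.array(list_of_arrays)    # one array per index on axis 0
merged_2d = stacked_3d.reshape(-1, width)   # rows of all arrays, in list order

# The reshaped result has the same layout that vstack produces.
same_as_vstack = numpy.array_equal(merged_2d, numpy.vstack(list_of_arrays))
print(stacked_3d.shape, merged_2d.shape, same_as_vstack)
```

Whether this route is competitive in speed depends on how long array() itself takes, so it is worth timing on the real data.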

Output:

0.547468900681 s merged_array = array(first_list_of_arrays)
0.547191858292 s merged_array = array(second_list_of_arrays)
0.656183958054 s vstack first
0.236850976944 s vstack second

Code:

import numpy
import time

width = 320
height = 320
n_matrices = 80

# Build two independent lists of 80 float32 matrices holding the values 0..9.
secondmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    secondmatrices.append(numpy.round(temp*9))

firstmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    firstmatrices.append(numpy.round(temp*9))

# Approach 1: numpy.array on the whole list, result shape (80, 320, 320).
t1 = time.time()
first1 = numpy.array(firstmatrices)
print(time.time() - t1, "s merged_array = array(first_list_of_arrays)")

t1 = time.time()
second1 = numpy.array(secondmatrices)
print(time.time() - t1, "s merged_array = array(second_list_of_arrays)")

# Approach 2: grow the result with repeated vstack calls, result shape (25600, 320).
# pop() empties the list, and every vstack call copies the partial result again.
t1 = time.time()
first2 = firstmatrices.pop()
for i in range(len(firstmatrices)):
    first2 = numpy.vstack((firstmatrices.pop(), first2))
print(time.time() - t1, "s vstack first")

t1 = time.time()
second2 = secondmatrices.pop()
for i in range(len(secondmatrices)):
    second2 = numpy.vstack((secondmatrices.pop(), second2))
print(time.time() - t1, "s vstack second")


You have 80 arrays 320x320? So you probably want to use dstack:

first3 = numpy.dstack(firstmatrices)

This returns one 320x320x80 array, with the list index on the last axis, whereas numpy.array(firstmatrices) returns an 80x320x320 array:

timeit numpy.dstack(firstmatrices)
10 loops, best of 3: 47.1 ms per loop


timeit numpy.array(firstmatrices)
1 loops, best of 3: 750 ms per loop
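
To make the axis order of the two calls concrete, here is a small sketch. It assumes hypothetical sizes (4 arrays of 3x5) and a hypothetical list name rather than the 80 arrays of 320x320 from the question:

```python
import numpy

# Hypothetical small sizes; the thread uses 80 arrays of 320x320.
arrays = [numpy.full((3, 5), i, dtype=numpy.float32) for i in range(4)]

from_array = numpy.array(arrays)     # list index becomes the first axis
from_dstack = numpy.dstack(arrays)   # list index becomes the last axis

# Moving the last axis of the dstack result to the front recovers the array() layout.
same_data = numpy.array_equal(numpy.moveaxis(from_dstack, -1, 0), from_array)
print(from_array.shape, from_dstack.shape, same_data)
```

Both results hold the same values; only the position of the axis that indexes the list differs.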

If you want to use vstack, it will return a 25600x320 array:

timeit numpy.vstack(firstmatrices)
100 loops, best of 3: 18.2 ms per loop
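
The question also asks how to preallocate. One possible approach, shown here as a sketch and not as a measured recommendation, is to allocate the final array once with numpy.empty and copy each array into its block of rows. The sizes and names below are hypothetical small stand-ins for the 80 arrays of 320x320:

```python
import numpy

# Hypothetical small sizes; the question uses n_matrices=80, height=width=320.
n_matrices, height, width = 4, 3, 5
list_of_arrays = [numpy.full((height, width), i, dtype=numpy.float32)
                  for i in range(n_matrices)]

# Allocate the final (n_matrices*height, width) array once.
merged = numpy.empty((n_matrices * height, width),
                     dtype=list_of_arrays[0].dtype)

# Copy each array into its own block of rows.
for i, block in enumerate(list_of_arrays):
    merged[i * height:(i + 1) * height, :] = block

# A single concatenate call over the whole list gives the same result.
concatenated = numpy.concatenate(list_of_arrays, axis=0)
print(merged.shape, numpy.array_equal(merged, concatenated))
```

Either variant touches each input array once, unlike a loop that calls vstack on a growing result, where every call allocates a new array and copies everything accumulated so far. Which variant is faster on a given machine is best checked with timeit.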