Convolution along one axis only_问答_开发者_运维开发者技术经验分享

I have two 2-D arrays with the same first axis dimensions. In python, I would like to convolve the two matrices along the second axis only. I would like to get C below without computing the convolution along the first axis as well.

import numpy as np
import scipy.signal as sg

M, N, P = 4, 10, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)

C = sg.convolve(A, B, 'full')[(2*M-1)/2]

Is there a fast way开发者_开发百科?

You can use np.apply_along_axis to apply np.convolve along the desired axis. Here is an example of applying a boxcar filter to a 2d array:

import numpy as np

a = np.arange(10)
a = np.vstack((a,a)).T

filt = np.ones(3)

np.apply_along_axis(lambda m: np.convolve(m, filt, mode='full'), axis=0, arr=a)

This is an easy way to generalize many functions that don't have an axis argument.

With ndimage.convolve1d, you can specify the axis...

np.apply_along_axis won't really help you, because you're trying to iterate over two arrays. Effectively, you'd have to use a loop, as described here.

Now, loops are fine if your arrays are small, but if N and P are large, then you probably want to use FFT to convolve instead.

However, you need to appropriately zero pad your arrays first, so that your "full" convolution has the expected shape:

M, N, P = 4, 10, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)

A_ = np.zeros((M, N+P-1), dtype=A.dtype)
A_[:, :N] = A
B_ = np.zeros((M, N+P-1), dtype=B.dtype)
B_[:, :P] = B

A_fft = np.fft.fft(A_, axis=1)
B_fft = np.fft.fft(B_, axis=1)
C_fft = A_fft * B_fft

C = np.real(np.fft.ifft(C_fft))

# Test
C_test = np.zeros((M, N+P-1))
for i in range(M):
    C_test[i, :] = np.convolve(A[i, :], B[i, :], 'full')

assert np.allclose(C, C_test)

for 2D arrays, the function scipy.signal.convolve2d is faster and scipy.signal.fftconvolve can be even faster (depending on the dimensions of the arrays):

Here the same code with N = 100000

import time    
import numpy as np
import scipy.signal as sg

M, N, P = 10, 100000, 20
A = np.random.randn(M, N)
B = np.random.randn(M, P)

T1 = time.time()
C = sg.convolve(A, B, 'full')
print(time.time()-T1)

T1 = time.time()
C_2d = sg.convolve2d(A, B, 'full')
print(time.time()-T1)

T1 = time.time()
C_fft = sg.fftconvolve(A, B, 'full')
print(time.time()-T1)

>>> 12.3
>>> 2.1
>>> 0.6

Answers are all the same with slight differences due to different computation methods used (e.g., fft vs direct multiplication, but i don't know what exaclty convolve2d uses):

print(np.max(np.abs(C - C_2d)))
>>>7.81597009336e-14

print(np.max(np.abs(C - C_fft)))
>>>1.84741111298e-13

Late answer, but worth posting for reference. Quoting from comments of the OP:

Each row in A is being filtered by the corresponding row in B. I could implement it like that, just thought there might be a faster way.

A is on the order of 10s of gigabytes in size and I use overlap-add.

Naive / Straightforward Approach

import numpy as np
import scipy.signal as sg

M, N, P = 4, 10, 20
A = np.random.randn(M, N) # (4, 10)
B = np.random.randn(M, P) # (4, 20)

C = np.vstack([sg.convolve(a, b, 'full') for a, b in zip(A, B)])

>>> C.shape
(4, 29)

Each row in A is convolved with each respective row in B, essentially convolving M 1D arrays/vectors.

No Loop + CUDA Supported Version

It is possible to replicate this operation by using PyTorch's F.conv1d. We have to imagine A as a 4-channel, 1D signal of length 10. We wish to convolve each channel in A with a specific kernel of length 20. This is a special case called a depthwise convolution, often used in deep learning.

Note that torch's conv is implemented as cross-correlation, so we need to flip B in advance to do actual convolution.

import torch
import torch.nn.functional as F

@torch.no_grad()
def torch_conv(A, B):
    M, N, P = A.shape[0], A.shape[1], B.shape[1]
    C = F.conv1d(A, B[:, None, :], bias=None, stride=1, groups=M, padding=N+(P-1)//2)
    return C.numpy()

# Convert A and B to torch tensors + flip B

X = torch.from_numpy(A) # (4, 10)
W = torch.from_numpy(np.fliplr(B).copy()) # (4, 20)

# Do grouped conv and get np array
Y = torch_conv(X, W)

>>> Y.shape
(4, 29)

>>> np.allclose(C, Y)
True

Advantages of using a depthwise convolution with torch:

No loops!
The above solution can also run on CUDA/GPU, which can really speed things up if A and B are very large matrices. (From OP's comment, this seems to be the case: A is 10GB in size.)

Disadvantages: