When I was studying convolutional layers, I implemented img2col_2 myself to handle the convolution operation. img2col_1 is a different implementation that seems less intuitive to me: img2col_1 copies the input pixel by pixel (one filter offset at a time), while img2col_2 copies it stride by stride (one output position at a time). I ran some tests, which show that img2col_1 is usually faster. What makes it perform better, and what are the key differences?
import numpy as np

def img2col_1(data, f_h, f_w, stride, pad):
    # Loop over the f_h*f_w filter offsets; for each offset, copy one
    # strided (out_h, out_w) slab covering every output position at once.
    N, C, H, W = data.shape
    out_h = (H + 2*pad - f_h)//stride + 1
    out_w = (W + 2*pad - f_w)//stride + 1
    img = np.pad(data, [(0,0), (0,0), (pad,pad), (pad,pad)], 'constant')
    col = np.zeros([N, C, f_h, f_w, out_h, out_w])
    for v in range(f_h):
        for u in range(f_w):
            v_max = v + out_h*stride
            u_max = u + out_w*stride
            col[:, :, v, u, :, :] = img[:, :, v:v_max:stride, u:u_max:stride]
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    return col
def img2col_2(data, f_h, f_w, stride, pad):
    # Same layout, built the other way around: loop over the out_h*out_w
    # output positions; for each, copy one contiguous f_h x f_w patch.
    N, C, H, W = data.shape
    out_h = (H + 2*pad - f_h)//stride + 1
    out_w = (W + 2*pad - f_w)//stride + 1
    img = np.pad(data, [(0,0), (0,0), (pad,pad), (pad,pad)], 'constant')
    col = np.zeros([N, C, f_h, f_w, out_h, out_w])
    for u in range(out_w):
        for v in range(out_h):
            u_min = u * stride
            v_min = v * stride
            col[..., v, u] = img[..., v_min:v_min+f_h, u_min:u_min+f_w]
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    return col
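For reference, the kind of harness I used for the tests looks roughly like this. It checks that both versions build the identical matrix and times them on one arbitrary input shape and filter size (the shapes and repeat counts here are just illustrative picks, and the function bodies repeat the definitions above so the snippet runs standalone):

```python
import timeit
import numpy as np

def img2col_1(data, f_h, f_w, stride, pad):
    N, C, H, W = data.shape
    out_h = (H + 2*pad - f_h)//stride + 1
    out_w = (W + 2*pad - f_w)//stride + 1
    img = np.pad(data, [(0,0), (0,0), (pad,pad), (pad,pad)], 'constant')
    col = np.zeros([N, C, f_h, f_w, out_h, out_w])
    for v in range(f_h):          # f_h*f_w iterations (9 for a 3x3 filter)
        for u in range(f_w):
            col[:, :, v, u, :, :] = img[:, :, v:v + out_h*stride:stride,
                                        u:u + out_w*stride:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)

def img2col_2(data, f_h, f_w, stride, pad):
    N, C, H, W = data.shape
    out_h = (H + 2*pad - f_h)//stride + 1
    out_w = (W + 2*pad - f_w)//stride + 1
    img = np.pad(data, [(0,0), (0,0), (pad,pad), (pad,pad)], 'constant')
    col = np.zeros([N, C, f_h, f_w, out_h, out_w])
    for u in range(out_w):        # out_h*out_w iterations (1024 for a 32x32 output)
        for v in range(out_h):
            col[..., v, u] = img[..., v*stride:v*stride + f_h,
                                 u*stride:u*stride + f_w]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)

x = np.random.rand(4, 3, 32, 32)  # arbitrary N, C, H, W
a = img2col_1(x, 3, 3, 1, 1)
b = img2col_2(x, 3, 3, 1, 1)
assert np.array_equal(a, b)       # both build the same im2col matrix

t1 = min(timeit.repeat(lambda: img2col_1(x, 3, 3, 1, 1), number=20, repeat=3))
t2 = min(timeit.repeat(lambda: img2col_2(x, 3, 3, 1, 1), number=20, repeat=3))
print(f"img2col_1: {t1:.4f}s   img2col_2: {t2:.4f}s")
```

Note that for a 3x3 filter on a 32x32 output, img2col_1's Python-level loop runs only 9 times while img2col_2's runs 1024 times, which is the pattern the timing comparison exposes.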