
deep learning - Why does one img2col implementation perform better than the other? - Stack Overflow


While studying convolutional layers, I implemented img2col_2 myself to carry out the convolution. img2col_1 is a different implementation that seems less intuitive. img2col_1 loops over the filter offsets (pixel by pixel), while img2col_2 loops over the output positions (stride by stride). My tests show that img2col_1 is usually faster. What makes it perform better? What are the key differences?

import numpy as np

def img2col_1(data, f_h, f_w, stride, pad):
    # Loops over the f_h * f_w filter offsets; each assignment copies one
    # strided slice that covers every output position at once.
    N, C, H, W = data.shape
    out_h = (H + 2*pad - f_h)//stride + 1
    out_w = (W + 2*pad - f_w)//stride + 1
    img = np.pad(data, [(0,0), (0,0), (pad,pad), (pad,pad)], 'constant')
    col = np.zeros([N, C, f_h, f_w, out_h, out_w])
    for v in range(f_h):
        for u in range(f_w):
            v_max = v + out_h*stride
            u_max = u + out_w*stride
            col[:, :, v, u, :, :] = img[:, :, v:v_max:stride, u:u_max:stride]
    # Rearrange to (N*out_h*out_w, C*f_h*f_w): one row per output position.
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    return col


def img2col_2(data, f_h, f_w, stride, pad):
    # Loops over the out_h * out_w output positions; each assignment copies
    # one f_h x f_w window.
    N, C, H, W = data.shape
    out_h = (H + 2*pad - f_h)//stride + 1
    out_w = (W + 2*pad - f_w)//stride + 1
    img = np.pad(data, [(0,0), (0,0), (pad,pad), (pad,pad)], 'constant')
    col = np.zeros([N, C, f_h, f_w, out_h, out_w])

    for u in range(out_w):
        for v in range(out_h):
            u_min = u * stride
            v_min = v * stride
            col[..., v, u] = img[..., v_min:v_min+f_h, u_min:u_min+f_w]
    # Rearrange to (N*out_h*out_w, C*f_h*f_w): one row per output position.
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    return col
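
One way to look at the difference between the two functions above is to count how many Python-level slice assignments each one performs and how many elements each assignment copies. The sketch below is only an illustration, not a measured benchmark: `loop_stats` is a hypothetical helper, and the input sizes are made-up example values. It suggests that both versions copy the same total number of elements, but img2col_1 does so in a few large vectorized copies, whereas img2col_2 does many small ones, so interpreter overhead per iteration is a plausible reason for the gap.

```python
def loop_stats(N, C, H, W, f_h, f_w, stride, pad):
    """Count slice assignments and elements copied per assignment.

    Hypothetical helper: it mirrors the loop bounds of img2col_1 and
    img2col_2 without running them.
    """
    out_h = (H + 2 * pad - f_h) // stride + 1
    out_w = (W + 2 * pad - f_w) // stride + 1
    return {
        # img2col_1: one assignment per filter offset (v, u)
        "img2col_1": {
            "assignments": f_h * f_w,
            "elements_each": N * C * out_h * out_w,
        },
        # img2col_2: one assignment per output position (v, u)
        "img2col_2": {
            "assignments": out_h * out_w,
            "elements_each": N * C * f_h * f_w,
        },
    }


# Example sizes are assumptions chosen for illustration only.
stats = loop_stats(N=8, C=3, H=32, W=32, f_h=3, f_w=3, stride=1, pad=1)
for name, s in stats.items():
    total = s["assignments"] * s["elements_each"]
    print(name, s["assignments"], "assignments x", s["elements_each"],
          "elements =", total)
```

With a small filter and a large feature map, the filter-offset loop runs far fewer iterations than the output-position loop. If the filter were as large as the output map, the two counts would be similar and the advantage would likely shrink.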

