Note: The most recent cuDNN distribution will be obtained automatically by ... is not supported for the specified algorithm on your GPU. A value -means that this convolution is not supported for the ...
c2.cu optimizes the convolution by loading chunks of the input tensor into shared memory, which reduces the number of high-latency global memory accesses. Shared memory is much faster than global memory but limited in size, so we use tiling to divide ...
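The shared-memory tiling idea described above can be sketched as follows. This is not the code from c2.cu. It is a minimal illustration under several assumptions of our own: a single-channel 2D input, a fixed 3x3 filter kept in constant memory, zero padding at the borders, and a 16x16 output tile. The names `conv2d_tiled`, `conv2d_gpu`, `TILE`, `K`, and `d_filter` are hypothetical. As is common in deep learning code, the kernel computes a cross-correlation (the filter is not flipped).

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 16;          // output tile edge per block (assumed)
constexpr int K    = 3;           // filter edge, odd (assumed)
constexpr int R    = K / 2;       // halo radius around each tile
constexpr int SH   = TILE + 2 * R; // shared tile edge including the halo

__constant__ float d_filter[K * K]; // small filter lives in constant memory

// Each block computes one TILE x TILE patch of the output.
__global__ void conv2d_tiled(const float* in, float* out, int H, int W) {
    __shared__ float tile[SH][SH];

    // Top-left corner of the input region (tile plus halo) for this block.
    const int base_x = blockIdx.x * TILE - R;
    const int base_y = blockIdx.y * TILE - R;

    // Cooperative load: threads stride over the SH x SH region so the halo
    // is filled too. Out-of-range positions are zero padded.
    for (int sy = threadIdx.y; sy < SH; sy += TILE) {
        for (int sx = threadIdx.x; sx < SH; sx += TILE) {
            const int gx = base_x + sx;
            const int gy = base_y + sy;
            const bool inside = gx >= 0 && gx < W && gy >= 0 && gy < H;
            tile[sy][sx] = inside ? in[gy * W + gx] : 0.0f;
        }
    }
    __syncthreads(); // the whole tile must be loaded before anyone reads it

    const int out_x = blockIdx.x * TILE + threadIdx.x;
    const int out_y = blockIdx.y * TILE + threadIdx.y;
    if (out_x < W && out_y < H) {
        float acc = 0.0f;
        for (int ky = 0; ky < K; ++ky)
            for (int kx = 0; kx < K; ++kx)
                acc += tile[threadIdx.y + ky][threadIdx.x + kx] *
                       d_filter[ky * K + kx];
        out[out_y * W + out_x] = acc;
    }
}

// Hypothetical host wrapper: copies data, launches the kernel, returns output.
// Error checking of CUDA calls is omitted to keep the sketch short.
std::vector<float> conv2d_gpu(const std::vector<float>& in,
                              const std::vector<float>& filt, int H, int W) {
    assert(filt.size() == static_cast<std::size_t>(K * K));
    assert(in.size() == static_cast<std::size_t>(H) * W);
    const std::size_t bytes = static_cast<std::size_t>(H) * W * sizeof(float);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_filter, filt.data(), K * K * sizeof(float));

    const dim3 block(TILE, TILE);
    const dim3 grid((W + TILE - 1) / TILE, (H + TILE - 1) / TILE);
    conv2d_tiled<<<grid, block>>>(d_in, d_out, H, W);

    std::vector<float> out(static_cast<std::size_t>(H) * W);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

With these settings each block reads an 18x18 region from global memory once and then serves all nine filter taps per output from shared memory. Calling `conv2d_gpu(input, filter, H, W)` with a row-major `H * W` input and a 9-element filter returns the zero-padded result. A production kernel would also check the return codes of the CUDA calls.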