"Matt J" wrote in message <email@example.com>...
> > Maybe, but only because there are some funny sub-optimal things happening inside conv2 implementation-wise, and because the interpolation kernel here happens to be small. There's no way the 3rd version below should be the slowest.
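Of course. The reason is obvious: a separable (two-pass) convolution costs about m+n multiplications per pixel versus m*n for the full 2-D kernel, and 2+2 = 2*2, whereas 10+10 << 10*10.
Bilinear interpolation, as we discussed here, requires only a kernel of size 2. So there is no need to decompose the convolution into a tensor product of two 1-D passes at the price of passing through memory twice.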
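
To make the counting argument concrete, here is a rough sketch in plain Python (not MATLAB, and not how conv2 works internally). It counts multiplications for a direct 2-D convolution and for the same convolution done as a column pass followed by a row pass. The names conv2_valid and compare, the 40x40 test image, and the ramp-shaped kernel are all made up for illustration. It only counts operations; it says nothing about real timings or memory traffic.

```python
def conv2_valid(img, ker):
    """Direct 2-D convolution, 'valid' region only.

    Returns (result, number_of_multiplications).
    """
    kh, kw = len(ker), len(ker[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    mults = 0
    for i in range(oh):
        for j in range(ow):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    # Flip the kernel so this is a convolution, not a correlation.
                    s += img[i + a][j + b] * ker[kh - 1 - a][kw - 1 - b]
                    mults += 1
            out[i][j] = s
    return out, mults


def compare(n, size=40):
    """Compare a full n-by-n kernel with its separable two-pass form.

    Returns (full_mults, separable_mults, max_abs_difference).
    """
    # Hypothetical deterministic test image.
    img = [[float((7 * i + 3 * j) % 11) for j in range(size)] for i in range(size)]
    # Hypothetical 1-D kernel (a normalized ramp); the 2-D kernel is its outer product.
    w = [float(k + 1) for k in range(n)]
    total = sum(w)
    w = [x / total for x in w]
    ker2d = [[a * b for b in w] for a in w]

    full, full_mults = conv2_valid(img, ker2d)
    tmp, m1 = conv2_valid(img, [[a] for a in w])  # first pass: down the columns
    sep, m2 = conv2_valid(tmp, [w])               # second pass: along the rows
    diff = max(abs(x - y) for rf, rs in zip(full, sep) for x, y in zip(rf, rs))
    return full_mults, m1 + m2, diff


for n in (2, 10):
    full_mults, sep_mults, diff = compare(n)
    print(n, full_mults, sep_mults, diff)
```

With a kernel of size 2 the two-pass version does not save any multiplications (it does slightly more, since the intermediate image is wider than the final one) and it still has to write and re-read that intermediate image. With a kernel of size 10 the two-pass version needs several times fewer multiplications, which is where the decomposition starts to pay off.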