"Bruno Luong" <email@example.com> wrote in message <firstname.lastname@example.org>...
> "Matt J" wrote in message <email@example.com>...
> >
> > Maybe, but only because there are some funny sub-optimal things happening inside conv2 implementation-wise, and because the interpolation kernel here happens to be small. There's no way the 3rd version below should be the slowest.
>
> Of course. The reason is obvious: 2+2 = 2*2, but 10+10 << 10*10.
============
That doesn't explain why the 3rd version was the slowest. The 3rd version uses a 10+10 tensorial operation, so, since 10+10 << 10*10, you would expect the 3rd version to be faster than (or at least comparable to) the others.
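
To make the 10+10 versus 10*10 argument concrete, here is a rough sketch in plain Python (not the MATLAB code from this thread) that counts multiplications for a direct 2-D convolution and for the equivalent pair of 1-D passes. The function names, the 20x20 test image, and the rank-1 kernel are all made up for illustration, and the count says nothing about how conv2 is actually implemented.

```python
def conv2_full(image, kernel):
    """Direct 2-D 'full' convolution. Returns (result, multiplication count)."""
    rows, cols = len(image), len(image[0])
    kr, kc = len(kernel), len(kernel[0])
    out = [[0] * (cols + kc - 1) for _ in range(rows + kr - 1)]
    mults = 0
    for i in range(rows):
        for j in range(cols):
            for a in range(kr):
                for b in range(kc):
                    out[i + a][j + b] += image[i][j] * kernel[a][b]
                    mults += 1
    return out, mults


def conv2_separable(image, col, row):
    """Same result for kernel = col * row', done as two 1-D passes."""
    rows, cols = len(image), len(image[0])
    kr, kc = len(col), len(row)
    # Pass 1: convolve every column of the image with `col`.
    tmp = [[0] * cols for _ in range(rows + kr - 1)]
    mults = 0
    for i in range(rows):
        for j in range(cols):
            for a in range(kr):
                tmp[i + a][j] += image[i][j] * col[a]
                mults += 1
    # Pass 2: convolve every row of the intermediate result with `row`.
    out = [[0] * (cols + kc - 1) for _ in range(rows + kr - 1)]
    for i in range(rows + kr - 1):
        for j in range(cols):
            for b in range(kc):
                out[i][j + b] += tmp[i][j] * row[b]
                mults += 1
    return out, mults


def compare(n, size=20):
    """Return (full count, separable count) for an n-by-n rank-1 kernel."""
    # Hypothetical integer test data, so the two results can be compared exactly.
    image = [[(7 * i + 3 * j) % 5 for j in range(size)] for i in range(size)]
    col = list(range(1, n + 1))
    row = list(range(1, n + 1))
    kernel = [[c * r for r in row] for c in col]
    full, n_full = conv2_full(image, kernel)
    sep, n_sep = conv2_separable(image, col, row)
    assert full == sep  # both approaches must give the same output
    return n_full, n_sep


if __name__ == "__main__":
    for n in (2, 10):
        print(n, compare(n))
```

On this test image the two counts come out about equal for n = 2, while for n = 10 the separable version needs roughly a quarter of the multiplications. That is why, on arithmetic alone, the tensorial version should not be the slowest once the kernel gets larger.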