> Probably better than actually shifting columns would be to simply keep a
> pointer to the next column to use modulo Ncols instead.

Why is it better?

Whether the array is rearranged later (using the modulo pointer) or up front (shifting without it), the data in memory gets moved by the same amount.
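To make the two approaches concrete, here is a minimal sketch. It assumes the array is held as a Python list of columns, and the helper names `shift_insert`, `ring_insert` and `ring_to_linear` are hypothetical, not taken from the thread. `shift_insert` moves every column on each insertion. `ring_insert` writes only the new column and advances a pointer modulo `NCOLS`, which leaves the reordering to `ring_to_linear` whenever an oldest-first layout is actually needed. Both end up with the same layout.

```python
def shift_insert(cols, new_col):
    """Shift every column left by one (dropping the oldest), then store new_col last."""
    ncols = len(cols)
    for i in range(ncols - 1):
        cols[i] = cols[i + 1]
    cols[ncols - 1] = new_col


def ring_insert(cols, nxt, new_col):
    """Overwrite the oldest column in place and return the pointer advanced modulo ncols."""
    cols[nxt] = new_col
    return (nxt + 1) % len(cols)


def ring_to_linear(cols, nxt):
    """Rearrange the ring into oldest-first order, i.e. the layout shift_insert maintains."""
    ncols = len(cols)
    return [cols[(nxt + c) % ncols] for c in range(ncols)]


# Usage: feed the same columns to both approaches (NCOLS = 4, two rows per column).
NCOLS = 4
shifted = [[0.0, 0.0] for _ in range(NCOLS)]
ring = [[0.0, 0.0] for _ in range(NCOLS)]
nxt = 0
for t in range(1, 7):
    col = [float(t), 10.0 * t]
    shift_insert(shifted, col)
    nxt = ring_insert(ring, nxt, col)

# The ring holds the same columns, just rotated; one rearrangement restores the order.
print(ring_to_linear(ring, nxt) == shifted)
```

Here the shifting version performs its moves on every insertion. The ring version performs them once, at the point where the rotated layout is converted.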