(1) I benchmark without multithreading: in Windows Task Manager, right-click the MATLAB process and set its processor affinity to a single CPU.
(2) What does MATLAB use, if not N^3 matrix multiplication?
(3) I am benchmarking the number of useful operations (flops for the actual computation). Memory access and programming inefficiencies cause the measured rate to fall short of the optimum. I am paying close attention to memory access and L0 (register-level caching) in my implementation. The large gap between the theoretical optimum and the empirical runtime suggested to me that the implementation can still be improved.
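
For reference, here is a minimal sketch of the kind of timing loop that the two GFlops formulas quoted below correspond to. It is not my actual benchmark script: the matrix size n, the value of NIter, the random test matrices and the loop layout are assumptions made for illustration. The call to maxNumCompThreads is shown only as a possible in-MATLAB alternative to setting the affinity in Task Manager, and the 5*N*log2(N) count per length-N FFT is the conventional estimate rather than a measured operation count.

```matlab
% Illustrative sketch only: n, NIter and the test data are assumed values.
n = 512;                 % matrix size (assumption)
NIter = 10;              % number of repetitions (assumption)
A = rand(n);
B = rand(n);

% Optional: limit MATLAB to one computational thread, if the function exists.
if exist('maxNumCompThreads') > 0
    maxNumCompThreads(1);
end

% Matrix multiplication: 2*n^3 useful flops per product.
tic;
for k = 1:NIter
    C = A*B;
end
t = toc;
GFlopsMM = NIter*2*n^3/1e9/t;

% Column-wise FFT of an n-by-n matrix: n transforms of length n,
% each counted as 5*n*log2(n) flops.
tic;
for k = 1:NIter
    F = fft(A);
end
t2 = toc;
GFlopsFFT = NIter*5*n*log2(n)*n/1e9/t2;

fprintf('MM:  %.2f GFlops\nFFT: %.2f GFlops\n', GFlopsMM, GFlopsFFT);
```

Both figures count only the nominal arithmetic, so any time spent on memory traffic shows up as a lower GFlops value rather than as extra operations.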
"Bruno Luong" <email@example.com> wrote in message <firstname.lastname@example.org>... > > GFlopsMM(s) = NIter*2*size(A,1)^3/1e9/t; > > GFlopsFFT(s) = NIter*5*N*log2(N)*N/1e9/t2; > > So you assume that: > > (1) the computer does not use any multi-threading during MM or FFT? > (2) MATLAB matrix multiplication is O(n^3) (which is not the optimal)? > (3) memory access data are negligible (usually not right)? > > Bruno