Drexel dragonThe Math ForumDonate to the Math Forum



Search All of the Math Forum:

Views expressed in these public forums are not endorsed by Drexel University or The Math Forum.


Math Forum » Discussions » Software » comp.soft-sys.matlab

Topic: Which fft package does matlab use?
Replies: 4   Last Post: Apr 1, 2013 11:09 PM

Advanced Search

Back to Topic List Back to Topic List Jump to Tree View Jump to Tree View   Messages: [ Previous | Next ]
NIcholas

Posts: 54
Registered: 5/19/10
Which fft package does matlab use?
Posted: Mar 28, 2013 7:27 PM
  Click to see the message monospaced in plain text Plain Text   Click to reply to this topic Reply

I am trying to write (or find code for) a very high performance fft routine (I am specifically interested in the N=256 case).

After benchmarking MATLAB's fft routine I noticed that it runs slower than I would have expected. Using the methodology of fftw webpage (http://www.fftw.org/speed/), and counting flops as 5*N*log2(N) for a complex fft of length power of 2, I benchmarked matlab's fft vs matlab's matrix multiplication as follows:

N=64, GFlopsMM=1.21, GFlopsFFT=2.82
N=128, GFlopsMM=3.99, GFlopsFFT=2.76
N=256, GFlopsMM=7.11, GFlopsFFT=2.89
N=512, GFlopsMM=7.05, GFlopsFFT=2.04
N=1024, GFlopsMM=7.61, GFlopsFFT=2.16
N=2048, GFlopsMM=7.64, GFlopsFFT=2.25

For N>=256, the fft is about 3x slower than matrix multiplication. Does anyone know what package MATLAB uses for fft? does it use fftw, intel mkl or an internal package? Has anyone gotten noticable (factor of 2 or more) improvements using other fft packages within a mex file? - it seems like this should be possible given the above experiment (my cpu is 2.25ghz with support for sse instructions (2 flops in parallel) and 2fpu ports; I calculate the theoretical optimum performance at 2.25*2*2 = 9GFlops).

% code for generating benchmark of fft vs matrix multiplication
Ns=2.^(6:11);
GFlopsMM = nan(1,numel(Ns));
GFlopsFFT = nan(1,numel(Ns));
for s=1:numel(Ns)
N = Ns(s);
A=rand(N,N); B=rand(N,N);

NIter = Ns(end)/N; % run more iterations for smaller N for more accurate timing
tic; for j=1:NIter, A'*B; end; t = toc;

A = A+sqrt(-1)*B; % benchmark complex fft
tic; for j=1:NIter fft(A); end; t2=toc;

GFlopsMM(s) = NIter*2*size(A,1)^3/1e9/t;
GFlopsFFT(s) = NIter*5*N*log2(N)*N/1e9/t2; % cost of N ffts of length N
fprintf('N=%d, GFlopsMM=%3.3g, GFlopsFFT=%3.3g\n',N,GFlopsMM(s),GFlopsFFT(s));
end



Point your RSS reader here for a feed of the latest messages in this topic.

[Privacy Policy] [Terms of Use]

© Drexel University 1994-2014. All Rights Reserved.
The Math Forum is a research and educational enterprise of the Drexel University School of Education.