The Math Forum

Search All of the Math Forum:

Views expressed in these public forums are not endorsed by NCTM or The Math Forum.

Math Forum » Discussions » Software » comp.soft-sys.matlab

Notice: We are no longer accepting new posts, but the forums will continue to be readable.

Topic: Which fft package does matlab use?
Replies: 4   Last Post: Apr 1, 2013 11:09 PM

Advanced Search

Back to Topic List Back to Topic List Jump to Tree View Jump to Tree View   Messages: [ Previous | Next ]

Posts: 54
Registered: 5/19/10
Which fft package does matlab use?
Posted: Mar 28, 2013 7:27 PM
  Click to see the message monospaced in plain text Plain Text   Click to reply to this topic Reply

I am trying to write (or find code for) a very high performance fft routine (I am specifically interested in the N=256 case).

After benchmarking MATLAB's fft routine I noticed that it runs slower than I would have expected. Using the methodology of fftw webpage (, and counting flops as 5*N*log2(N) for a complex fft of length power of 2, I benchmarked matlab's fft vs matlab's matrix multiplication as follows:

N=64, GFlopsMM=1.21, GFlopsFFT=2.82
N=128, GFlopsMM=3.99, GFlopsFFT=2.76
N=256, GFlopsMM=7.11, GFlopsFFT=2.89
N=512, GFlopsMM=7.05, GFlopsFFT=2.04
N=1024, GFlopsMM=7.61, GFlopsFFT=2.16
N=2048, GFlopsMM=7.64, GFlopsFFT=2.25

For N>=256, the fft is about 3x slower than matrix multiplication. Does anyone know what package MATLAB uses for fft? does it use fftw, intel mkl or an internal package? Has anyone gotten noticable (factor of 2 or more) improvements using other fft packages within a mex file? - it seems like this should be possible given the above experiment (my cpu is 2.25ghz with support for sse instructions (2 flops in parallel) and 2fpu ports; I calculate the theoretical optimum performance at 2.25*2*2 = 9GFlops).

% code for generating benchmark of fft vs matrix multiplication
GFlopsMM = nan(1,numel(Ns));
GFlopsFFT = nan(1,numel(Ns));
for s=1:numel(Ns)
N = Ns(s);
A=rand(N,N); B=rand(N,N);

NIter = Ns(end)/N; % run more iterations for smaller N for more accurate timing
tic; for j=1:NIter, A'*B; end; t = toc;

A = A+sqrt(-1)*B; % benchmark complex fft
tic; for j=1:NIter fft(A); end; t2=toc;

GFlopsMM(s) = NIter*2*size(A,1)^3/1e9/t;
GFlopsFFT(s) = NIter*5*N*log2(N)*N/1e9/t2; % cost of N ffts of length N
fprintf('N=%d, GFlopsMM=%3.3g, GFlopsFFT=%3.3g\n',N,GFlopsMM(s),GFlopsFFT(s));

Point your RSS reader here for a feed of the latest messages in this topic.

[Privacy Policy] [Terms of Use]

© The Math Forum at NCTM 1994-2018. All Rights Reserved.