The Math Forum

Search All of the Math Forum:

Views expressed in these public forums are not endorsed by NCTM or The Math Forum.

Math Forum » Discussions » Software » comp.soft-sys.matlab

Topic: Which fft package does matlab use?
Replies: 3   Last Post: Apr 1, 2013 6:03 AM

Advanced Search

Back to Topic List Back to Topic List Jump to Tree View Jump to Tree View   Messages: [ Previous | Next ]

Posts: 54
Registered: 5/19/10
Re: Which fft package does matlab use?
Posted: Apr 1, 2013 6:03 AM
  Click to see the message monospaced in plain text Plain Text   Click to reply to this topic Reply

I am taking the fft of each colomn of the NxN matrix rather than a single Nx1 vector to get more accurate timings.

Calling Nx1 fft N times would be slow in matlab (calling fft with an NxN matrix does this internally). Furthermore, if there are optimizations that can be done because we are computing the same length fft N times (pre-computing the twiddle factors for example), passing in the NxN matrix allows fftw to take advantage of that if it chooses.

I am taking this factor of N into account in my computations of the number of flops (see code).

I wrote my own C++ code to do the fft on powers of 2 (highly optimized using sse instructions and efficient memory access patterns) and noticed about a 1.5-1.7x speedup over MATLAB's fft (for N in 128 to 1024 range), achieving about 4.5GFlops on my cpu (compare to above experiment where matlabs fft achieves 3GFlops at peak). I am working on an assembly implementation for x64 windows right now. It seems that fftw isn't as optimized as it could be (unless the interfacing with MATLAB is innefficient). I also noticed that fftw was written in c++ while most good matrix multiplication routines are written in assembly and achieve much closer to the theoretical optimum flop count.

Does anyone have experience with another fft library (like intel MKL) that shows significant improvements over fftw?

dpb <> wrote in message <kj4mpe$7me$>...
> On 3/29/2013 12:40 PM, dpb wrote:
> ...

> > Your above A array on which you compute the fft is a 2D array of NxN
> > which is an N/2-length column-wise complex by N columns.
> >
> > The ML vectorized FFT routine computes a separate FFT for each column
> > when passed an array. So, you have computed N complex FFTs each of
> > length N/2 complex values.

> ...
> Take it back...wasn't thinking clearly. After the sum it _is_ N complex
> in length but is still N wide. You want only
> A=complex(A(:,1),B(:,1));
> to be of the proper size for FFT to be consistent w/ the link timings.
> Although I didn't look at the actual source code of the benchmark, their
> verbiage indicates they actually compute on a zero-ed out array every
> time, though.
> --

Point your RSS reader here for a feed of the latest messages in this topic.

[Privacy Policy] [Terms of Use]

© The Math Forum at NCTM 1994-2017. All Rights Reserved.