Topic: Performance Difference in CPU and GPU in MATLAB
Replies: 2   Last Post: Nov 9, 2012 7:07 AM

 Edric Ellis Posts: 721 Registered: 12/7/04
Re: Performance Difference in CPU and GPU in MATLAB
Posted: Nov 9, 2012 4:21 AM

"Jerome " <the_rome@hotmail.com> writes:

> I have invoked a CUDA kernel from my MATLAB implementation; however, my
> CPU implementation runs faster than my GPU implementation.
>
> The results are:
>
> CPU: 0.000006
> GPU: 0.00134
>
> My kernel and MATLAB code are below:
>
>
> matrix.cu
>
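> // A and B are const so that MATLAB treats them as inputs only;
> // C, the one non-const pointer, is the kernel's single output.
> __global__ void matrix_mult2(const double *A, const double *B, double *C) {
>     // Each thread multiplies one pair of elements.
>     int x = blockIdx.x * blockDim.x + threadIdx.x;
>     C[x] = A[x] * B[x];
> }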
>
>
>
> main.m
> kernel = parallel.gpu.CUDAKernel( 'matrix_mult2.ptx', ...
> 'matrix_mult2.cu' );
>
>
> kernel.ThreadBlockSize = [25,1,1]; % one thread per element of the 5x5 arrays
> kernel.GridSize = [1,1];
>
>
> A = parallel.gpu.GPUArray.rand(5,5,'double');
> B = parallel.gpu.GPUArray.rand(5,5,'double');
> C = parallel.gpu.GPUArray.zeros(5,5);
>
> C = feval(kernel,A,B,C);

Firstly, to get accurate timings for code that runs on the GPU, you need
to call "wait(gpuDevice)" before you stop the timer. Kernel launches are
asynchronous, so without it you may stop the clock before the GPU has
actually finished its work.
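
A minimal sketch of how the timing might look, reusing the file names
from the post. The warm-up call, the variable names (dev, tGpu) and the
ThreadBlockSize setting are my own assumptions, not something taken from
the original code:

```matlab
% Sketch only: assumes matrix_mult2.ptx / matrix_mult2.cu from the post
% are on the path and a supported GPU is available.
kernel = parallel.gpu.CUDAKernel('matrix_mult2.ptx', 'matrix_mult2.cu');
kernel.ThreadBlockSize = [25, 1, 1];   % assumption: one thread per element
kernel.GridSize = [1, 1];

% Build the data on the host and copy it to the GPU before timing starts.
A = gpuArray(rand(5, 5));
B = gpuArray(rand(5, 5));
C = gpuArray(zeros(5, 5));

dev = gpuDevice;

% Warm-up launch (assumption: keeps one-time setup cost out of the timing).
C = feval(kernel, A, B, C);
wait(dev);

tic;
C = feval(kernel, A, B, C);
wait(dev);      % block until the GPU has finished before reading the clock
tGpu = toc;

fprintf('GPU: %g s\n', tGpu);
```

Without the wait(dev) before toc, the number you get may mostly reflect
how long it takes to queue the kernel rather than how long it takes to
run.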

Secondly, there is a fixed overhead for launching a kernel on the GPU.
With only 25 elements that overhead dominates the run time, which is why
the GPU doesn't pull ahead of the CPU until you get to relatively large
data sizes.

To evaluate GPU performance for a kernel as simple as this one, compare
your measured throughput (i.e. achieved memory bandwidth) with the
theoretical maximum for your device. The kernel does almost no
arithmetic: each element costs two 8-byte reads and one 8-byte write, so
it is limited by memory bandwidth rather than by compute. You should get
close to the peak achievable bandwidth for your device, probably once
numel(A) is around 1e5 or thereabouts.
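
As a rough sketch of that comparison, the loop below times the kernel
over a range of sizes and converts each timing into achieved bandwidth.
The block size of 256, the size range and the repeat count are
assumptions of mine. The sizes are kept to multiples of the block size
because the kernel in the post has no bounds check:

```matlab
% Sketch only: assumes matrix_mult2.ptx / matrix_mult2.cu from the post
% are on the path and a supported GPU is available.
kernel = parallel.gpu.CUDAKernel('matrix_mult2.ptx', 'matrix_mult2.cu');
dev = gpuDevice;

blockSize = 256;                       % assumed threads per block
kernel.ThreadBlockSize = [blockSize, 1, 1];
nReps = 100;                           % assumed repeat count for averaging

sizes = 2.^(10:2:22);                  % every size is a multiple of blockSize
gbPerSec = zeros(size(sizes));

for i = 1:numel(sizes)
    n = sizes(i);
    A = gpuArray(rand(n, 1));
    B = gpuArray(rand(n, 1));
    C = gpuArray(zeros(n, 1));

    % One thread per element: n/blockSize blocks of blockSize threads.
    kernel.GridSize = [n / blockSize, 1];

    C = feval(kernel, A, B, C);        % warm-up launch
    wait(dev);

    tic;
    for k = 1:nReps
        C = feval(kernel, A, B, C);
    end
    wait(dev);                         % make sure every launch has finished
    t = toc / nReps;

    % Two 8-byte reads and one 8-byte write per element.
    bytesMoved = 3 * 8 * n;
    gbPerSec(i) = bytesMoved / t / 1e9;

    fprintf('n = %8d   time = %.3g s   bandwidth = %.2f GB/s\n', ...
            n, t, gbPerSec(i));
end
```

At small n the reported bandwidth should be tiny because the launch
overhead dominates. It should climb as n grows and then level off, and
the level it settles at is the figure to compare against the quoted
memory bandwidth of your card.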

Cheers,

Edric.

