Topic: Performance Difference in CPU and GPU in MATLAB
Replies: 2   Last Post: Nov 9, 2012 7:07 AM

 Edric Ellis Posts: 721 Registered: 12/7/04
Re: Performance Difference in CPU and GPU in MATLAB
Posted: Nov 9, 2012 4:21 AM

"Jerome " <the_rome@hotmail.com> writes:

> I have invoked a CUDA kernel from my MATLAB implementation; however, my
> CPU results are faster than my GPU implementation.
>
> The results are:
>
> CPU: 0.000006
> GPU: 0.00134
> My kernel and MATLAB code are below:
>
>
> matrix.cu
>
> __global__ void matrix_mult2(double *A, double *B, double *C) {
>     int x = blockIdx.x * blockDim.x + threadIdx.x;
>     C[x] = A[x] * B[x];
> }
>
>
>
> main.m
> kernel = parallel.gpu.CUDAKernel( 'matrix_mult2.ptx', ...
>                                   'matrix_mult2.cu' );
>
> kernel.GridSize = [1,1];
>
> A = parallel.gpu.GPUArray.rand(5,5,'double');
> B = parallel.gpu.GPUArray.rand(5,5,'double');
> C = parallel.gpu.GPUArray.zeros(5,5);
>
> C = feval(kernel,A,B,C);

Firstly, to get accurate timing information when running code on the
GPU, you need to call "wait(gpuDevice)" before stopping your timer, to
ensure that everything has finished running there.
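A minimal sketch of that, reusing the kernel, A, B and C from the post
above (the variable names tGPU and dev are just illustrative):

```matlab
% Time the kernel, synchronizing with the device before reading the clock.
dev = gpuDevice;
tic;
C = feval(kernel, A, B, C);
wait(dev);            % block until all queued GPU work has completed
tGPU = toc;
```

Newer releases of Parallel Computing Toolbox also provide gputimeit,
which handles the synchronization (and warm-up) for you:
tGPU = gputimeit(@() feval(kernel, A, B, C)).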

Secondly, there is a fixed overhead to launching a kernel on the GPU,
which is why you don't see a speed-up until you get to relatively large
data sizes.
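One way to see where that launch overhead stops dominating is to time
the same element-wise multiply on CPU and GPU across a range of sizes.
A rough sketch (all names here are illustrative, not from the post):

```matlab
% Compare CPU vs GPU element-wise multiply as the problem size grows.
dev = gpuDevice;
for n = [1e2 1e4 1e6 1e8]
    a  = rand(n, 1);      b  = rand(n, 1);
    ga = gpuArray(a);     gb = gpuArray(b);

    tic; c  = a  .* b;            tCPU = toc;
    tic; gc = ga .* gb; wait(dev); tGPU = toc;

    fprintf('n = %g: CPU %.3g s, GPU %.3g s\n', n, tCPU, tGPU);
end
```

For small n the fixed launch cost dominates tGPU; only at large n does
the GPU's bandwidth advantage show up.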

To evaluate GPU performance for a kernel as simple as this one, you
should compare your measured throughput (i.e. achieved bandwidth) with
the theoretical maximum for your device. You should get close to the
peak achievable bandwidth, probably once numel(A) is around 1e5 or
thereabouts.
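As a sketch of that calculation: each output element of this kernel
moves 24 bytes in double precision (two 8-byte reads of A[x] and B[x],
one 8-byte write of C[x]), so the achieved bandwidth follows directly
from the measured time (n and tGPU are assumed from an earlier timing
run):

```matlab
% Achieved bandwidth for the element-wise multiply kernel.
bytesMoved = 3 * 8 * n;                 % 2 reads + 1 write, 8 bytes each
bwGBps     = bytesMoved / tGPU / 1e9;   % achieved bandwidth in GB/s
% Compare bwGBps against the theoretical peak in your GPU's spec sheet.
```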

Cheers,

Edric.

Date      Author
11/8/12   Jerome
11/9/12   Edric Ellis
11/9/12   Jerome