Topic: Performance Difference in CPU and GPU in MATLAB
Replies: 2   Last Post: Nov 9, 2012 7:07 AM

Edric Ellis

Re: Performance Difference in CPU and GPU in MATLAB
Posted: Nov 9, 2012 4:21 AM

"Jerome " <the_rome@hotmail.com> writes:

> I have invoked a cuda kernel from my MATLAB implementation; however my
> CPU results are faster than my gpu implementation.
>
> The results are:
>
> CPU: 0.000006
> GPU: 0.00134
> My kernel and MATLAB code is below:
>
> Thanks in Advance!
>
> matrix.cu
>
> __global__ void matrix_mult2(double *A, double *B, double * C) {
> int x = blockIdx.x * blockDim.x + threadIdx.x;
>
> C[x] = A[x] * B[x];
>
>
> }
>
>
>
> main.m
> kernel = parallel.gpu.CUDAKernel( 'matrix_mult2.ptx', ...
> 'matrix_mult2.cu' );
>
>
> kernel.ThreadBlockSize = [25,1,1];
> kernel.GridSize = [1,1];
>
>
> A = parallel.gpu.GPUArray.rand(5,5,'double');
> B = parallel.gpu.GPUArray.rand(5,5,'double');
> C = parallel.gpu.GPUArray.zeros(5,5);
>
> C = feval(kernel,A,B,C);


Firstly, GPU operations run asynchronously, so to get accurate timing
information you need to call "wait(gpuDevice)" before stopping the
timer. That ensures everything has finished running on the GPU.
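
A minimal sketch of such a timing, assuming the "kernel", "A", "B" and
"C" variables from main.m above already exist. The names "gd" and "tGpu"
are only illustrative. It also assumes, per the CUDAKernel documentation
as I read it, that every non-const pointer argument comes back as an
output in argument order, so with the kernel signature as posted the
result C is the third output.

```matlab
gd = gpuDevice;                      % handle to the currently selected GPU

% Warm-up call so that one-off costs (e.g. loading the kernel) are not timed.
[~, ~, C] = feval(kernel, A, B, C);
wait(gd);

tic;
[~, ~, C] = feval(kernel, A, B, C);  % launch is asynchronous
wait(gd);                            % block until the GPU has finished
tGpu = toc;

fprintf('GPU: %g s\n', tGpu);
```

Without the "wait" call, "toc" may fire while the kernel is still
running, so the number would not reflect the real execution time.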

Secondly, there is a fixed overhead to launching a kernel on the GPU.
With only 25 elements that overhead dominates the run time, which is
why things don't speed up until you get to relatively large data sizes.

A kernel as simple as this one is limited by memory bandwidth rather
than by arithmetic. To evaluate its GPU performance, you should compare
your measured throughput (i.e. achieved bandwidth) with the theoretical
maximum for your device. You should get close to the peak achievable
bandwidth, probably when numel(A) is around 1e5 or thereabouts.
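
One way to estimate the achieved bandwidth is sketched below. It is only
a rough measurement under several assumptions: the kernel is the one
posted above (no bounds check, all three arguments non-const, so C is
the third output), the block size of 256 and the range of sizes are
arbitrary choices, and each element is counted as two 8-byte reads plus
one 8-byte write. Any extra copying MATLAB does for in/out arguments is
ignored, so treat the figures as approximate.

```matlab
kernel = parallel.gpu.CUDAKernel('matrix_mult2.ptx', 'matrix_mult2.cu');
kernel.ThreadBlockSize = [256, 1, 1];
gd = gpuDevice;
numReps = 20;                          % repetitions to average over

for n = 2.^(10:2:22)                   % multiples of 256: no bounds check in kernel
    kernel.GridSize = [n/256, 1];      % one thread per element
    A = parallel.gpu.GPUArray.rand(n, 1, 'double');
    B = parallel.gpu.GPUArray.rand(n, 1, 'double');
    C = parallel.gpu.GPUArray.zeros(n, 1);

    [~, ~, C] = feval(kernel, A, B, C);    % warm-up, not timed
    wait(gd);

    tic;
    for k = 1:numReps
        [~, ~, C] = feval(kernel, A, B, C);
    end
    wait(gd);                          % make sure all launches have finished
    t = toc / numReps;                 % average time per launch

    bytesMoved = 3 * 8 * n;            % read A and B, write C, 8 bytes each
    fprintf('n = %8d: %10.3g s, %7.2f GB/s\n', n, t, bytesMoved / t / 1e9);
end
```

For small n the time per launch should stay roughly constant (the fixed
overhead), and the GB/s figure should climb with n until it levels off.
That plateau is the number to compare against the device's theoretical
memory bandwidth.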

Cheers,

Edric.


