Re: Performance Difference in CPU and GPU in MATLAB
Posted: Nov 9, 2012 4:21 AM
"Jerome " <the_rome@hotmail.com> writes:
> I have invoked a cuda kernel from my MATLAB implementation; however my
> CPU results are faster than my gpu implementation.
>
> The results are:
>
> CPU: 0.000006
> GPU: 0.00134
> My kernel and MATLAB code is below:
>
> Thanks in Advance!
>
> matrix.cu
>
> __global__ void matrix_mult2(double *A, double *B, double * C) {
>     int x = blockIdx.x * blockDim.x + threadIdx.x;
>
>     C[x] = A[x] * B[x];
>
>
> }
>
>
>
> main.m
> kernel = parallel.gpu.CUDAKernel( 'matrix_mult2.ptx', ...
>                                   'matrix_mult2.cu' );
>
>
> kernel.ThreadBlockSize = [25,1,1];
> kernel.GridSize = [1,1];
>
>
> A = parallel.gpu.GPUArray.rand(5,5,'double');
> B = parallel.gpu.GPUArray.rand(5,5,'double');
> C = parallel.gpu.GPUArray.zeros(5,5);
>
> C = feval(kernel,A,B,C);
Firstly, to get accurate timings for code running on the GPU, you need to call "wait(gpuDevice)" before you stop the timer. Kernel launches are asynchronous, so without it you may stop timing before the GPU has actually finished.
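As a rough sketch of that timing pattern, reusing the kernel files and launch configuration from the quoted post: the variable names "gpu", "tGpu" and "tCpu" are my own, the warm-up call is an extra precaution rather than something the advice above requires, and the inputs are built with gpuArray(rand(...)) instead of the poster's parallel.gpu.GPUArray.rand.

```matlab
% Sketch only: assumes matrix_mult2.ptx / matrix_mult2.cu from the quoted post
% are on the path and a supported GPU is selected.
kernel = parallel.gpu.CUDAKernel('matrix_mult2.ptx', 'matrix_mult2.cu');
kernel.ThreadBlockSize = [25,1,1];
kernel.GridSize = [1,1];

A = gpuArray(rand(5,5));
B = gpuArray(rand(5,5));
C = gpuArray(zeros(5,5));

gpu = gpuDevice;            % handle to the currently selected device

% Warm-up call so one-off initialisation cost is not included in the timing.
C = feval(kernel, A, B, C);
wait(gpu);

% Timed GPU call: wait(gpu) blocks until the kernel has really finished.
tic;
C = feval(kernel, A, B, C);
wait(gpu);
tGpu = toc;

% CPU reference on host copies of the same data.
Ah = gather(A);
Bh = gather(B);
tic;
Ch = Ah .* Bh;
tCpu = toc;

fprintf('CPU: %g s   GPU: %g s\n', tCpu, tGpu);
```

With only 25 elements, I would still expect the CPU figure to be smaller; the point of the sketch is just that the GPU number then covers the whole kernel run.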
Secondly, there is a fixed overhead in launching a kernel on the GPU. With only 25 elements that overhead dominates the measurement, which is why you won't see a speedup until you get to relatively large data sizes.
To evaluate GPU performance for a kernel as simple as this one, you should compare your measured throughput (i.e. achieved memory bandwidth) with the theoretical maximum for your device. You should get close to the peak achievable bandwidth, probably once numel(A) is around 1e5 or thereabouts.
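One way to see both effects is to sweep the problem size and print the achieved bandwidth, along the lines of the sketch below. The block size of 256, the range of sizes and the repeat count are arbitrary choices of mine. The sizes are kept as multiples of the block size because the quoted kernel has no bounds check. The byte count assumes each element costs two double reads and one double write. The peak figure to compare against has to come from your device's specifications.

```matlab
% Sketch only: sweeps N and reports achieved bandwidth for the quoted kernel.
kernel = parallel.gpu.CUDAKernel('matrix_mult2.ptx', 'matrix_mult2.cu');
gpu = gpuDevice;

blockSize = 256;                          % assumed value, not from the post
kernel.ThreadBlockSize = [blockSize,1,1];
nReps = 20;                               % average over several launches

for N = 2.^(10:2:22)
    % One thread per element; N is an exact multiple of blockSize.
    kernel.GridSize = [N/blockSize, 1];

    A = gpuArray(rand(N,1));
    B = gpuArray(rand(N,1));
    C = gpuArray(zeros(N,1));

    C = feval(kernel, A, B, C);           % warm-up
    wait(gpu);

    tic;
    for k = 1:nReps
        C = feval(kernel, A, B, C);
    end
    wait(gpu);                            % make sure all launches are done
    t = toc / nReps;

    bytesMoved = 3 * 8 * N;               % read A, read B, write C (doubles)
    fprintf('N = %8d   time = %.3g s   bandwidth = %.2f GB/s\n', ...
            N, t, bytesMoved / t / 1e9);
end
```

For small N the time per call should stay roughly constant (the launch overhead), and the bandwidth figure should only start to level off once N is large enough to keep the device busy.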
Cheers,
Edric.