Topic: Speedup array arithmetic using GPU
Replies: 0

 Michael Posts: 4 Registered: 8/26/13
Posted: Aug 26, 2013 4:09 PM

I have three arrays, for example:
x_pts = rand(3,1000);
x_pts_sig = rand(3,1000);
s_pts_all = rand(3,10000);

I want to speed up the following code on a GPU, but I have found that running for loops on the GPU gives no performance gain compared with, say, a parfor loop on the CPU over the entire x_pts array. I suspect that I should vectorize (unroll) the for loop, but I don't know how to do that, especially for the line that uses bsxfun.

% make all variables GPU arrays
x_pts = gpuArray(single(x_pts));
x_pts_sig = gpuArray(single(x_pts_sig));
s_pts_all = gpuArray(single(s_pts_all));

% calculate partial likelihood values for each pose on the GPU
posterior_sum_ix = gpuArray.zeros(1,size(s_pts_all,2));
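for ix_pts = 1:size(x_pts,2)
    % diagonal covariance matrix for the current point
    sig = [x_pts_sig(1,ix_pts) 0 0;0 x_pts_sig(2,ix_pts) 0;0 0 x_pts_sig(3,ix_pts)];
    % 3-D Gaussian normalization constant; note the parentheses around 2*pi
    fconst = 1/((2*pi)^(3/2)*sqrt(det(sig)));
    % 3 x N differences between the current point and every column of s_pts_all
    dist_ix_s = bsxfun(@minus,x_pts(:,ix_pts),s_pts_all);
    dist_sq = dist_ix_s.^2;
    % 1 x N squared distances, each coordinate scaled by its entry of sig
    dist_norm = sum(dist_sq'/sig,2)';
    posterior_all = fconst .* exp(-.5*dist_norm);
    posterior_sum_ix = posterior_sum_ix + posterior_all;
end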
posterior_sum_ix = gather(posterior_sum_ix);

I know that I could move the lines 'sig = ...' and 'fconst = ...' out of the for loop, but the profiler says the speedup would be negligible. I also know that I could save GPU memory by combining some of the lines in the for loop. Any suggestions would be helpful!
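
For reference, here is a rough sketch of what removing the loop entirely might look like. It is not a tested GPU solution. It assumes that sig is always diagonal, as in the loop above, and that fconst is meant to be the standard trivariate Gaussian constant 1/((2*pi)^(3/2)*sqrt(det(sig))). The names inv_sig, a, B, C and posterior_sum_vec are made up for this illustration. The idea is to expand (x - s)^2/sig into three terms so that the work becomes two matrix products, which GPUs tend to handle well. The sketch uses plain CPU arrays; wrapping the three inputs in gpuArray should in principle leave the rest unchanged.

```matlab
% Example data as in the post (plain CPU arrays here)
x_pts = rand(3,1000);
x_pts_sig = rand(3,1000) + 0.1;   % assumption: offset keeps the variances away from zero
s_pts_all = rand(3,10000);

% Per-point quantities, computed once for all M points
inv_sig = 1 ./ x_pts_sig;                                  % 3 x M
fconst  = 1 ./ ((2*pi)^(3/2) * sqrt(prod(x_pts_sig, 1)));  % 1 x M, det of a diagonal matrix

% dist_norm(i,j) = sum_k (x(k,i) - s(k,j))^2 / sig(k,i), expanded as
%   sum_k x^2/sig  -  2 * sum_k (x/sig) * s  +  sum_k (1/sig) * s^2
a = sum(x_pts.^2 .* inv_sig, 1)';        % M x 1
B = (x_pts .* inv_sig)' * s_pts_all;     % M x N
C = inv_sig' * (s_pts_all.^2);           % M x N
dist_norm = bsxfun(@plus, a, C - 2*B);   % M x N

% Weighted sum over all M points in a single matrix product
posterior_sum_vec = fconst * exp(-0.5 * dist_norm);   % 1 x N
```

Two caveats with this sketch: dist_norm is a full M x N matrix (1000 x 10000 here), so for larger inputs the points would probably have to be processed in blocks, and the expanded form can lose accuracy in single precision when entries of x_pts_sig are very small.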