I think my processing time is being chewed up by spmd overhead. I have a simulation that looks like this:
    for x = 1:N
        spmd
            D = expensiveFunction(D)
        end
        <gather small bits from workers>
        <fast simple function>
        spmd
            D = expensiveFunction2(D)
        end
        <repeat the above basic idea 3-4 times>
    end
When I watch my processors, they are seldom above 20% utilization, which makes me think I'm losing time going in and out of spmd, and that gathering even small pieces of data may be a problem. Are there any alternatives? For one, is there a way to gather directly worker to worker without exiting spmd and bringing it all back to the master? The data is small and the functions are cheap, so I might come out ahead by letting every worker do the same computations just to keep things flowing.
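To make the idea concrete, here is a minimal sketch of the structure I have in mind, not a tested solution. It assumes the Parallel Computing Toolbox collective `gcat` (named `spmdCat` in R2022b and later, where `labindex` is likewise `spmdIndex`), which concatenates a value from every worker and returns the result on all of them. The function handles are trivial stand-ins for my real routines, the "small bits" are hypothetically just the first element of `D`, and the second argument to `expensiveFunction2` is an assumption about how the gathered result gets used.

```matlab
N = 5;

% Stand-ins for the real routines (hypothetical bodies, for illustration only)
expensiveFunction  = @(D) D + 1;
fastSimpleFunction = @(bits) mean(bits);
expensiveFunction2 = @(D, s) D / s;

spmd
    % Each worker owns its own piece of D for the whole run
    D = labindex * ones(4, 1);

    for x = 1:N
        D = expensiveFunction(D);

        % Small per-worker summary (hypothetical choice of what to share)
        localBits = D(1);

        % Worker-to-worker gather: every worker receives all the bits,
        % so nothing has to go back to the client
        allBits = gcat(localBits, 1);

        % Cheap step, repeated redundantly on every worker
        s = fastSimpleFunction(allBits);

        D = expensiveFunction2(D, s);
    end
end
```

With this shape there is a single spmd block around the whole loop, so the cost of entering and leaving spmd is paid once rather than several times per iteration. Whether that actually recovers the lost utilization would still need to be measured.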