> I am writing a parallel code of a master-slave model for data
> processing. The master worker, 1, is listening to any request from
> slaves, 2-8, and distribute the work load and determine the exit
> condition. The master worker uses [data, source, tag] = labReceive to
> wait until any slave finishes and report. After two communications
> between master and slaves, the master goes to sleep and doesn't wake
> up until all the slaves send report. This makes the earlier finished
> workers idle until all others are done. I am using mpiSettings to
> check the deadlock but it is not the synchronization problem but the
> labReceive somehow does not respond promptly. I suspect if there is
> any buffer to fill for the communication but not clear in any Matlab
> documentations. The code looks like:
>
> spmd
>     if labindex == 1 %Master
>         while (NOT Exit condition)
>             [data, source, tag] = labReceive;
>             % Process the data
>             labSend(processed_data, source);
>             % Update Exit condition
>         end
>     else %Slave
>         while (NOT Exit condition)
>             % Computation
>             processed_data = labSendReceive(1,1,computed_data, 1);
>             % Update Exit condition
>             % Update computation variable
>         end
>     end

It's difficult to know exactly what the problem is here. I'll just point out a few things that might give you some more information:
1. labSendReceive is not really intended to be used in conjunction with separate labSend/labReceive calls as you are doing here - but I don't think this is the problem (and as you've discovered, it should work). In your case, the 'slaves' could quite easily use separate labSend and labReceive calls.
2. The buffering in the labSend/labReceive calls is done entirely by the MPI implementation (MPICH2). In practice, a call to 'labSend' can complete before the corresponding 'labReceive' has been posted only if the message is smaller than around 128 kB; a larger message blocks the sender until the matching receive is posted.
3. Deadlock detection only operates on separate labSend/labReceive calls, so the labSendReceive calls on your 'slaves' are not covered by it.
4. Is there any other communication going on? How do you make sure that the exit condition is in sync on all labs?
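
To make points 1 and 4 concrete, here is a minimal sketch (not your actual code, and not tested against your setup) in which the 'slaves' use separate labSend and labReceive calls, and the master tells each one explicitly when to stop, so the exit condition cannot drift out of sync. The tag values, numItems, and the squaring step are all made up for illustration; only labSend, labReceive, labindex and numlabs are real toolbox functions.

```matlab
% Hypothetical message tags and workload, chosen only for this sketch.
TAG_READY  = 1;   % worker -> master: ready for a first work item
TAG_RESULT = 2;   % worker -> master: result of the previous item
TAG_WORK   = 3;   % master -> worker: here is a work item
TAG_STOP   = 4;   % master -> worker: no more work, leave the loop
numItems   = 20;  % hypothetical number of work items

spmd
    if labindex == 1                      % master
        nextItem      = 1;
        activeWorkers = numlabs - 1;
        results       = [];
        while activeWorkers > 0
            % Blocks until a message arrives from any worker.
            [data, source, tag] = labReceive;
            if tag == TAG_RESULT
                results(end+1) = data;    %#ok<AGROW>
            end
            if nextItem <= numItems
                labSend(nextItem, source, TAG_WORK);
                nextItem = nextItem + 1;
            else
                % Explicit stop message: the master alone owns the exit condition.
                labSend(0, source, TAG_STOP);
                activeWorkers = activeWorkers - 1;
            end
        end
    else                                  % worker
        labSend(0, 1, TAG_READY);         % ask for the first work item
        while true
            [item, ~, tag] = labReceive(1);
            if tag == TAG_STOP
                break
            end
            result = item^2;              % placeholder for the real computation
            labSend(result, 1, TAG_RESULT);
        end
    end
end
```

Because every worker only leaves its loop when it receives TAG_STOP, and the master only leaves its loop once every worker has been sent one, no lab can exit while another is still waiting on it. With separate calls like these, deadlock detection also applies to all of the communication.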