
Math Forum » Discussions » Software » comp.soft-sys.matlab

Topic: Parallel Computing Toolbox - Random numbers generation within tasks - a seed issue...
Replies: 8   Last Post: Apr 16, 2013 11:05 AM

Gabriele

Posted: Mar 25, 2013 6:34 AM

Hi All,
I am having trouble generating random numbers reliably within tasks.
I suspect the problem is that I do not understand how the seed of the random stream is handled by the tasks belonging to a job.

So, to make a long story short, and to simplify the problem: I have a job made up of several tasks, and each task generates random numbers. I would like, of course, that:
1) the generated (pseudo-)random numbers differ from task to task (and also between tasks belonging to different jobs);
2) the generated (pseudo-)random numbers differ if I run the code twice.

Unfortunately, I cannot manage to get both.

If I write "plain code" (simply calling, e.g., "rand" from each task) I achieve 1) but not 2): the outcomes differ from task to task, but if I run the code twice I get exactly the same outcomes.

If I try to force the seed (using, e.g., rng('shuffle')), the problem is that in some cases different tasks (typically 2 tasks) seem to start at the same time, within the resolution of the "shuffle" algorithm, which appears to be 1/100 s judging from randstream.m. As a result, some outcomes are different while others are the same.

I tried putting an rng('shuffle') command in jobStartup.m and in taskStartup.m, but I could not get a robust result fulfilling both 1) and 2) above. It is not clear to me how an rng(something) command in jobStartup.m affects the tasks.

I have also tried passing the seed as a parameter to each task, building the seed for each task from the task's sequential number (say, the task ID). However, this is not very robust: if you start your code twice, the number of tasks combined with the time difference between the two runs can lead to partially identical results. In the first run you use, e.g., seed=time+task_number, in the second run you use seed=time+delta_time+task_number, and for a given delta_time two different task_number values can give the same seed.
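To make the collision concrete, here is a minimal sketch with made-up numbers. The names t1, t2 and Ntasks are hypothetical stand-ins for the integer clock reading of two runs and the number of tasks; they are not taken from the real code.

```matlab
% Hypothetical values: integer clock readings of two runs, 2 ticks apart.
t1 = 1000;
t2 = t1 + 2;                    % delta_time = 2
Ntasks = 4;

seeds_run1 = t1 + (1:Ntasks);   % seed = time + task_number
seeds_run2 = t2 + (1:Ntasks);   % seed = time + delta_time + task_number

% Seeds used by both runs: tasks 3 and 4 of run 1 collide with
% tasks 1 and 2 of run 2.
shared = intersect(seeds_run1, seeds_run2)
```

Whenever delta_time is smaller than the number of tasks, the two ranges of seeds overlap, so part of the second run repeats part of the first.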

So, this is the problem.

Below I post code which reproduces the issue, at least on my hardware. In my case the local profile runs 4 workers (plus one client) because I have a quad-core machine. Note that the issue does not always occur, so it might be necessary to run the code a few times to see a repetition in the generated numbers.

As you will see, there are 4 options in the task creation. Note that:
- option 4: does not lead to repetitions in my case, but the results are the same at each run (it looks like the starting seed for the tasks is always the same, i.e. 0). So this option is not usable.
- option 3: in my case leads to some repetitions in the generated numbers. So this does not work.
- option 2: can potentially lead to repetitions if the operations within the for-loop are faster than the "shuffle" time resolution. In my case I have not noticed any repetition, so this looks like the preferable option, but I am not 100% sure. A possibility would be to add a pause(0.01) command in the loop (just to be sure), but this is not ideal.
- option 1: can potentially lead to repetitions between different runs of the code.

A more general alternative would be to create the seeds for all tasks beforehand, on the client.
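One way this could look, as a sketch only: draw a single clock-based base seed on the client, pass it to every task together with the task index, and let each task select its own substream of the 'mrg32k3a' generator. This uses RandStream substreams rather than rng, which is not what the code in this post does, and the names baseSeed and out are hypothetical. The loop below runs locally so that it can be tried without a cluster; in a real job the loop body would sit inside the task function.

```matlab
% Client side: one clock-based base seed per run
% (assumption: a single shuffle on the client is enough to make runs differ).
rng('shuffle');
baseSeed = randi(2^31 - 1);

Ntests = 30;
out = zeros(3, Ntests);
for k = 1:Ntests
    % In the real job this body would run inside the task, with
    % baseSeed and k passed as task input arguments.
    s = RandStream('mrg32k3a', 'Seed', baseSeed);
    s.Substream = k;            % each task index gets its own substream
    out(:, k) = rand(s, 3, 1);  % draw from this stream only
end

% Number of distinct values in the first row; equal to Ntests if no
% two tasks started from the same point.
numel(unique(out(1, :)))
```

Because every task uses a different substream of the same seeded generator, tasks should not repeat each other within a run, and the shuffled base seed makes two runs differ unless they happen to draw the same base seed.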

ok, the code is below...

%-------------
%Main script
%

%% Identify a cluster:
parallel.defaultClusterProfile('local');
c = parcluster();

%% Create a job
j = createJob(c);

%% Create tasks within a job
%test random number generation
Ntests=6*5;
for jtest = 1:Ntests  % create Ntests tasks

    % option 1: fix the seed from the main script on the basis of the seed of the client
    %t(jtest)=createTask(j, @f_myrand_with_seed, 1, {[3,1],getfield(rng,'Seed')+jtest});
    % option 2: generate the seed using "shuffle" at this moment
    %t(jtest)=createTask(j, @f_myrand_with_seed, 1, {[3,1],getfield(rng(rng('shuffle')),'Seed')});
    % option 3: let the function generate the seed internally, using shuffle
    %t(jtest)=createTask(j, @f_myrand_with_seed, 1, {[3,1],[]});
    % option 4: let the task use the seed it is supposed to use
    t(jtest)=createTask(j, @f_myrand_with_seed, 1, {[3,1],-1});

end

%% Submit the job to the queue
submit(j);

%% Wait for the job to complete:
wait(j)

%% Get results
results = fetchOutputs(j);

%% Delete the job and permanently remove the job from the scheduler's storage location
delete(j)

%% Check the output
% If two columns are equal, the corresponding tasks started from the same
% random seed, which is not wanted!
fprintf('\nIf two columns are equal, this is bad...')
final_data=[results{:}]
if any(diff(sort(final_data(1,:)))==0)  % checking the first row is sufficient in this case
    fprintf('\n...there is a generation problem!\n');
else
    fprintf('\n...this generation seems to be ok!\n');
end

%----------------------------

%---------------------------
%Additional function

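function out=f_myrand_with_seed(dim,sd)
% Return rand(dim); sd selects how the generator is seeded in the task:
%   sd > 0            : seed with sd
%   sd <= 0           : leave the generator as the task found it
%   sd = [] or absent : use the clock-based seed (rng('shuffle'))

if nargin>1 && ~isempty(sd)
    if sd>0  % change the seed to the required value
        rng(sd);
    end  % note that, when sd<=0, we do NOTHING
else
    rng('shuffle');  % use the clock-based seed
end
out=rand(dim);
%out=rng;out=out.Seed; % use this line to return the seed of the present task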
%-----------------------------

thanks for your comments...

bye,
gabriele


