Topic: Parallel Computing Toolbox - Random numbers generation within tasks - a seed issue...
Replies: 8   Last Post: Apr 16, 2013 11:05 AM

 Peter Perkins Posts: 156 Registered: 8/12/11
Re: Parallel Computing Toolbox - Random numbers generation within
tasks - a seed issue...

Posted: Apr 2, 2013 9:24 AM

Gabriele, you're doing large-scale parallel simulations. You should be
using the right tools for that. Setting seeds based on current time or
whatever is like throwing darts at a dartboard. You need something more
controlled.

MATLAB includes two random number generators, mrg32k3a and mlfg6331_64,
that are specifically designed for the kind of thing you're doing. They
both support multiple independent streams and substreams (substreams are
more or less a lighter-weight version of streams). I can't really
follow all of the "topology" that you describe, but by basing the stream
(or substream) index on the tasks, or workers, or runs, you can ensure
that you don't reuse the same random numbers.
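
As a minimal sketch of the idea (not necessarily how any particular job
should be organised), the snippet below creates one mrg32k3a stream per
task and then steps through substreams of a single stream. The names
nTasks and runSeed, and the values given to them, are assumptions made
purely for illustration.

```matlab
% Sketch: one independent stream per task, all sharing a single seed.
nTasks  = 4;       % hypothetical number of tasks
runSeed = 12345;   % hypothetical seed, identical for every task

draws = zeros(3, nTasks);
for k = 1:nTasks
    % Stream k out of nTasks. Streams with different indices are designed
    % to be independent even though the seed is the same.
    s = RandStream.create('mrg32k3a', 'NumStreams', nTasks, ...
                          'StreamIndices', k, 'Seed', runSeed);
    draws(:, k) = rand(s, 3, 1);
end

% Substreams: a single stream repositioned to a different substream,
% e.g. one substream per run or per repetition.
s = RandStream('mrg32k3a', 'Seed', runSeed);
s.Substream = 1;
a = rand(s, 3, 1);
s.Substream = 2;
b = rand(s, 3, 1);
```

Because the seed is fixed here, the numbers are reproducible from run to
run; only the stream or substream index distinguishes one task from
another.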

This is described at length in a couple of blog posts:

http://blogs.mathworks.com/loren/2008/11/05/new-ways-with-random-numbers-part-i
http://blogs.mathworks.com/loren/2008/11/13/new-ways-with-random-numbers-part-ii

I hope this is helpful.

On 3/25/2013 6:34 AM, Gabriele wrote:
> Hi All,
> I am having some problems in consistently generating random numbers
> within tasks.
> I suppose my problems come from the fact that it is not clear to me how
> the seed for the stream is handled by the tasks belonging to a job.
>
> So, to make a long story short, and to simplify the problem, I have a
> job, which comprises some tasks. Each task is generating random numbers.
> I would like, of course, that:
> 1) Generated (pseudo-)random numbers are different from task to task
> (actually also between tasks belonging to different jobs);
> 2) Generated (pseudo-)random numbers are different if I run the code twice.
>
> Unfortunately, I cannot manage to get both.
> If I just write "plain code" (simply calling, e.g., "rand" from each
> task) I achieve 1), but I do not achieve 2), i.e. outcomes from tasks
> are different, but if I run the code twice, I get exactly the same
> outcomes.
> If I try to force the seed (using, e.g., rng('shuffle')) I have the
> problem that, in some cases, different tasks (typically 2 tasks) seem
> to start at the same time (within the accuracy of the "shuffle"
> algorithm, which seems to be 1/100 s looking at randstream.m). As a
> result, some outcomes are different, while others are the same.
>
> I tried putting a rng('shuffle') command in jobStartup.m and in
> taskStartup.m, but I couldn't achieve a robust result fulfilling 1) & 2)
> above. It is not clear to me how an rng(something) command in
> jobStartup.m affects the tasks
>
> I have also tried passing the seed as a parameter to each task, by
> creating the seed for each task on the basis of the progressive task
> number (say the task ID...). However, this is not very robust, because
> if you start your code twice, the number of tasks, combined with the
> time difference between the two runs, can lead to partially identical
> results (this is because in one case you use, e.g.,
> seed=time+task_number, in the second case you use
> seed=time+delta_time+task_number, and for a given delta_time and two
> different task_number you could get the same seed).
>
> So, this is the problem.
> I post below some code which reproduces the issue, at least on my hardware.
> In my case the local profile runs 4 workers (plus one client) because I
> have a quad core. Note that the issue does not always happen, so it
> might be necessary to run the code a few times to see a repetition in
> the generation.
> As you will see, in the task creation there are 4 options. Note that:
> - option 4 does not lead to repetition in my case, but results are the
> same at each run (looks like the starting seed for the tasks is always
> the same...i.e. 0). So this option is not usable.
> - option 3: in my case leads to some repetitions in the generated
> numbers. So this is not working.
> - option 2: can potentially lead to repetitions if the operations within
> the for-loop are faster than the "shuffle time accuracy". In my case I
> have not noticed any repetition, so this looks like the preferable
> option...but I am not 100% sure...a possibility would be to add a
> pause(0.01) command in the loop (just to be sure), but this is not
> fantastic...
> - option 1: can potentially lead to repetitions between different runs
> of the code
>
> a global alternative would be to create a seed beforehand for each task...
>
> ok, the code is below...
>
> %-------------
> %Main script
> %
>
> %% Identify a cluster:
> parallel.defaultClusterProfile('local');
> c = parcluster();
>
> %% Create a job
> j = createJob(c);
>
> %% Create tasks within a job
> % Test random number generation
> Ntests = 6*5;
> for jtest = 1:Ntests  % create Ntests tasks
>
>     % Option 1: fix the seed from the main script, based on the client's seed
>     %t(jtest) = createTask(j, @f_myrand_with_seed, 1, ...
>     %    {[3,1], getfield(rng,'Seed')+jtest});
>     % Option 2: generate the seed using "shuffle" at this moment
>     %t(jtest) = createTask(j, @f_myrand_with_seed, 1, ...
>     %    {[3,1], getfield(rng(rng('shuffle')),'Seed')});
>     % Option 3: let the function generate the seed internally, using shuffle
>     %t(jtest) = createTask(j, @f_myrand_with_seed, 1, {[3,1],[]});
>     % Option 4: let the task use the seed it is supposed to use
>     t(jtest) = createTask(j, @f_myrand_with_seed, 1, {[3,1],-1});
>
> end
>
> %% Submit the job to the queue
> submit(j);
>
> %% Wait for the job to complete:
> wait(j)
>
> %% Get results
> results = fetchOutputs(j);
>
> %% Delete the job
> % This permanently removes the job from the scheduler's storage location.
> delete(j)
>
> %% Check the output
> % If two columns are equal, the corresponding tasks started from the
> % same random seed, which is not wanted!
> fprintf('\nIf two columns are equal, this is bad...')
> final_data = [results{:}]
> % Checking the first row is sufficient in this case
> if any(diff(sort(final_data(1,:)))==0)
>     fprintf('\n...there is a generation problem!\n');
> else
>     fprintf('\n...this generation seems to be ok!\n');
> end
>
> %----------------------------
>
> %---------------------------
> %Additional function
>
> function out=f_myrand_with_seed(dim,sd)
>
> if nargin>1 && ~isempty(sd)
>     if sd>0  % change the seed to the required value
>         rng(sd);
>     end  % note that, when sd<=0, we do NOTHING
> else
>     rng('shuffle');  % use the clock-based seed
> end
> out=rand(dim);
> % Use the following line instead to return the seed of the present task:
> %out=rng;out=out.Seed;
> %-----------------------------
>
> thanks for your comments...
>
> bye,
> gabriele
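
One possible way to meet both requirements 1) and 2) from the quoted
message is to draw a single clock-based seed per run on the client and
give each task its own stream index. The sketch below simulates the tasks
in an ordinary loop so that it runs without the Parallel Computing
Toolbox. The names f_myrand_with_stream, maxStreams and runSeed are
hypothetical and do not appear in the original code.

```matlab
% Sketch: same per-run seed for all tasks, different stream index per task.
maxStreams = 1000;  % hypothetical upper bound on the number of tasks

% Hypothetical replacement for f_myrand_with_seed: builds the stream that
% belongs to this task and draws from it, leaving the global stream alone.
f_myrand_with_stream = @(dim, runSeed, streamIdx) rand( ...
    RandStream.create('mrg32k3a', 'NumStreams', maxStreams, ...
                      'StreamIndices', streamIdx, 'Seed', runSeed), dim);

% On the client: one clock-based seed per run, taken once, so that clock
% resolution cannot make two tasks collide.
tmp     = RandStream('mt19937ar', 'Seed', 'shuffle');
runSeed = tmp.Seed;

Ntests = 30;
out = zeros(3, Ntests);
for jtest = 1:Ntests
    % In a real job this call would presumably become something like
    % createTask(j, f_myrand_with_stream, 1, {[3,1], runSeed, jtest});
    out(:, jtest) = f_myrand_with_stream([3,1], runSeed, jtest);
end
```

Within one run, tasks differ because their stream indices differ; between
runs, results differ because runSeed changes. Tasks in different jobs of
the same run could be kept apart by offsetting the stream index per job,
again as an assumption rather than a tested recipe.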

