Search All of the Math Forum:

Views expressed in these public forums are not endorsed by NCTM or The Math Forum.

Notice: We are no longer accepting new posts, but the forums will continue to be readable.

Topic: Compare two methods of random permutations
Replies: 8   Last Post: Jul 28, 2015 12:29 PM

 Messages: [ Previous | Next ]
 Mok-Kong Shen Posts: 629 Registered: 12/8/04
Re: Compare two methods of random permutations
Posted: May 10, 2013 12:38 PM

Am 10.05.2013 08:12, schrieb Rich Ulrich:
> On Wed, 08 May 2013 11:29:27 +0200, Mok-Kong Shen
> <mok-kong.shen@t-online.de> wrote:
>

>>
>> I want to compare in a practical sense two methods of random
>> permutations -- one theoretically perfect, namely that of Fisher and
>> Yates, and another ad hoc, let's call it X. A way of comparison I could
>> think of is the following:
>>
>> One starts from the standard configuration of n objects [0, 1, 2,
>> ...., n-1] and apply each method a fairly large number of times
>> successively to permute them and each time one computes the Hamming
>> distance of the result from the standard configuaration. One obtains
>> thus the frequency distribution of the Hamming distances for each
>> method. If the frequency distributions are fairly comparable to each
>> other, then X could be practically employed in place of the
>> theoretically perfect one.
>>
>> Is this line of thought of mine correct? Does anyone have an idea of
>> a better method of comparison? Thanks in advance.
>>

>
> I start with guess-work on what you are doing.
>
> I looked up Hamming distance, and it reminded me only
> of some previous questions about matching DNA sequences.
>
> I looked up Fisher and Yates, with Hamming, and came up
> with the algorithm also known as Knuth Shuffle. That does
> make a bit of sense if that is a way of finding how distant
> two DNA strings are from each other.
>
> Since F&Y is "theoretically perfect", I assume that the
> problem is that it is too time-consuming, and you want to
> do a partial enumeration instead of this (apparently
> efficient) full enumeration. This seems logical, whatever
> the ultimate goal.
>
> I will suggest that instead of matching the whole range
> of the distribution of the Hamming distances, you want
> to match the relevant portion -- the matches that are close
> enough to be interesting. One thing that comes to mind is
> that Monte Carlo simulations to get p-levels require large Ns
> if you want "exact p-values" that are very close for p=0.001
> or smaller, compared to getting good estimates for p=0.5 or 0.05.
> - IIRC, I read years ago about computational "swindles" that
> reduced the time by ordering the comparisons so that the tails
> were counted first, and counting could stop when a criterion
> was reached.
>
> And (I think) you want to look at the particular error distribution:
> Matching the shape is no good if you don't select the same cases.

Thanks for the comments. Indeed what I need for my applications is in
the direction that you indicated. In particular I don't need to
generate the whole range of possible permutations but on the other hand
a not too limited small range. Consider the case n=52, one knows that
in-shuffle and out-shuffle have periods of 52 and 8 respectively, which
are negligibly small compared to the whole range (of size (52!)). On the
other hand, the method of Fisher and Yates needs n-1 (pseudo-)random
numbers. In one of my applications a couple of such random numbers are
already available for other reasons, so it would certainly be
conveninent and advantageous, if I could just use them for doing random
permutation without having to invoke a PRNG to acquire the n-1 random
numbers needed for F&Y. This was the motivation of my OP and, as it
turns out, my scheme runs also much faster than F&Y as reported further
below.

For n=52 I took the list [0, 1, 2, 3, ....., 51] (the standard
configuration) and applied diverse variants of my scheme (which are
based on in-shuffle and thus are actually to be considered as its
generalizations) as well as F&Y 100000000 times repeatedly. At each
step the Hamming distance (H-D) of the result with respect to the
standard configuration is computed, with its frequency distribution
listed below, where Scheme0 is just the normal shuffling employed in
card games, Scheme1 to Scheme4 are different variants of my scheme and
F&Y is Fisher and Yates and 'Expected' is the expected values
(rounded) based on combinatorics. I did 3 runs, the results are almost
the same. Below is one typical series, with runtime in seconds on my PC.

H-D Scheme0 Scheme1 Scheme2 Scheme3 Scheme4 F&Y Expected

0 1923076 0 0 0 0 0 0
1-40 0 0 0 0 0 0 0
41 0 0 0 0 0 1 1
42 0 13 11 10 14 9 10
43 0 111 97 111 124 115 101
44 0 860 872 858 867 910 912
45 0 7259 7346 7239 7468 7315 7299
46 0 51096 50790 51085 50983 51183 51094
47 0 306209 306935 307452 307651 306641 306566
48 0 1532985 1529885 1529360 1532516 1530699 1532831
49 0 6132833 6127981 6131083 6130604 6132153 6131324
50 0 18395994 18399139 18395668 18397303 18385756 18393972
51 0 36791857 36780612 36789226 36780703 36799043 36787944
52 98076924 36780783 36796332 36787908 36791767 36786175 36787944

Time 2878 4706 4417 3512 3208 17937

Scheme0 is the same as in normal card games, where one does a cut
(depending on the PRN ur in [0,n-1]) and then performs an (exact)
in-shuffle. Scheme1 does a cut and then divides the pack of cards into
two (depending on the PRN us in [0,n-1]) and performs an in-shuffle to
each and finally does an in-shuffle of the entire pack. Scheme2 is
the same as Scheme1, excepting with ur = us, i.e. one uses only one PRN.
Scheme3 and Scheme4 are the same as Scheme1 and Scheme2, excepting that
the final in-shuffle is absent. As can be seen, Scheme1 to Scheme4 are
all fairly comparable to F&Y and the theoretically values and run much
faster than F&Y.

The computation was done in Python with codes given further below,
using the built-in PRNG of Python. Scheme3 has been used in an
application of mine (see http://s13.zetaboards.com/Crypto/topic/6925232/1/)

It may be noted that e.g. Scheme3 could be employed also in real card
games with a presumably acceptable complication of the common manual
shuffling procedure.

M. K. Shen

---------------------------------------------------------

import random,time

def hamming(a,b,n):
h=0
for i in range(n):
if a[i] != b[i]:
h+=1
return(h)

# in-shuffle
def riffle(cards,n):
newcards=[]
nh=n//2
for i in range(nh):
newcards.append(cards[nh+i])
newcards.append(cards[i])
if nh*2<n:
newcards.append(cards[n-1])
return(newcards)

# This is taken over from what is done in normal card games.
def scheme0(cards,ur,n):
newcards=cards[0:ur]+cards[ur:n]
newcards=riffle(newcards,n)
return(newcards)

# For Scheme1 use 2 PRNs ur and us, for Scheme2 use 1 PRN, i.e. ur = us.
def scheme12(cards,ur,us,n):
newcards=cards[0:ur]+cards[ur:n]
newcards=riffle(newcards[0:us],us)+riffle(newcards[us:n],n-us)
newcards=riffle(newcards,n)
return(newcards)

# For Scheme3 use 2 PRNs ur and us, for Scheme4 use 1 PRN, i.e. ur = us.
def scheme34(cards,ur,us,n):
newcards=cards[0:ur]+cards[ur:n]
newcards=riffle(newcards[0:us],us)+riffle(newcards[us:n],n-us)
return(newcards)

# The theoretically perfect pseudo-random Permuation of Fisher and Yates.
def fisheryates(cards,n):
newcards=cards[:]
nn1=n-1
for i in range(nn1):
k=random.randint(i,nn1)
t=newcards[i]
newcards[i]=newcards[k]
newcards[k]=t
return(newcards)

# Example computation with Scheme3

def comp(n,times):
bb=[i for i in range(n)]
cc=[0 for i in range(n+1)]
bbc=bb[:]
hmin=n+1
imin=-1
start=time.clock()
for i in range(times):
sh=random.randint(0,n)
kn=random.randint(0,n)
bbc=scheme34(bbc,sh,kn,n)
h=hamming(bb,bbc,n)
cc[h]+=1
print("time",time.clock()-start)
print(cc)

# Run the example

comp(52,100000000)

Date Subject Author
5/8/13 Mok-Kong Shen
5/10/13 Ray Koopman
5/10/13 Richard Ulrich
5/10/13 Mok-Kong Shen
5/10/13 Gordon Sande
5/11/13 Mok-Kong Shen
5/12/13 David Jones
5/13/13 Mok-Kong Shen
7/28/15 sean.c4s.vn@gmail.com