
Re: Compare two methods of random permutations
Posted:
May 10, 2013 12:38 PM


Am 10.05.2013 08:12, schrieb Rich Ulrich: > On Wed, 08 May 2013 11:29:27 +0200, MokKong Shen > <mokkong.shen@tonline.de> wrote: > >> >> I want to compare in a practical sense two methods of random >> permutations  one theoretically perfect, namely that of Fisher and >> Yates, and another ad hoc, let's call it X. A way of comparison I could >> think of is the following: >> >> One starts from the standard configuration of n objects [0, 1, 2, >> ...., n1] and apply each method a fairly large number of times >> successively to permute them and each time one computes the Hamming >> distance of the result from the standard configuaration. One obtains >> thus the frequency distribution of the Hamming distances for each >> method. If the frequency distributions are fairly comparable to each >> other, then X could be practically employed in place of the >> theoretically perfect one. >> >> Is this line of thought of mine correct? Does anyone have an idea of >> a better method of comparison? Thanks in advance. >> > > I start with guesswork on what you are doing. > > I looked up Hamming distance, and it reminded me only > of some previous questions about matching DNA sequences. > > I looked up Fisher and Yates, with Hamming, and came up > with the algorithm also known as Knuth Shuffle. That does > make a bit of sense if that is a way of finding how distant > two DNA strings are from each other. > > Since F&Y is "theoretically perfect", I assume that the > problem is that it is too timeconsuming, and you want to > do a partial enumeration instead of this (apparently > efficient) full enumeration. This seems logical, whatever > the ultimate goal. > > I will suggest that instead of matching the whole range > of the distribution of the Hamming distances, you want > to match the relevant portion  the matches that are close > enough to be interesting. One thing that comes to mind is > that Monte Carlo simulations to get plevels require large Ns > if you want "exact pvalues" that are very close for p=0.001 > or smaller, compared to getting good estimates for p=0.5 or 0.05. >  IIRC, I read years ago about computational "swindles" that > reduced the time by ordering the comparisons so that the tails > were counted first, and counting could stop when a criterion > was reached. > > And (I think) you want to look at the particular error distribution: > Matching the shape is no good if you don't select the same cases.
Thanks for the comments. Indeed what I need for my applications is in the direction that you indicated. In particular I don't need to generate the whole range of possible permutations but on the other hand a not too limited small range. Consider the case n=52, one knows that inshuffle and outshuffle have periods of 52 and 8 respectively, which are negligibly small compared to the whole range (of size (52!)). On the other hand, the method of Fisher and Yates needs n1 (pseudo)random numbers. In one of my applications a couple of such random numbers are already available for other reasons, so it would certainly be conveninent and advantageous, if I could just use them for doing random permutation without having to invoke a PRNG to acquire the n1 random numbers needed for F&Y. This was the motivation of my OP and, as it turns out, my scheme runs also much faster than F&Y as reported further below.
For n=52 I took the list [0, 1, 2, 3, ....., 51] (the standard configuration) and applied diverse variants of my scheme (which are based on inshuffle and thus are actually to be considered as its generalizations) as well as F&Y 100000000 times repeatedly. At each step the Hamming distance (HD) of the result with respect to the standard configuration is computed, with its frequency distribution listed below, where Scheme0 is just the normal shuffling employed in card games, Scheme1 to Scheme4 are different variants of my scheme and F&Y is Fisher and Yates and 'Expected' is the expected values (rounded) based on combinatorics. I did 3 runs, the results are almost the same. Below is one typical series, with runtime in seconds on my PC.
HD Scheme0 Scheme1 Scheme2 Scheme3 Scheme4 F&Y Expected
0 1923076 0 0 0 0 0 0 140 0 0 0 0 0 0 0 41 0 0 0 0 0 1 1 42 0 13 11 10 14 9 10 43 0 111 97 111 124 115 101 44 0 860 872 858 867 910 912 45 0 7259 7346 7239 7468 7315 7299 46 0 51096 50790 51085 50983 51183 51094 47 0 306209 306935 307452 307651 306641 306566 48 0 1532985 1529885 1529360 1532516 1530699 1532831 49 0 6132833 6127981 6131083 6130604 6132153 6131324 50 0 18395994 18399139 18395668 18397303 18385756 18393972 51 0 36791857 36780612 36789226 36780703 36799043 36787944 52 98076924 36780783 36796332 36787908 36791767 36786175 36787944
Time 2878 4706 4417 3512 3208 17937
Scheme0 is the same as in normal card games, where one does a cut (depending on the PRN ur in [0,n1]) and then performs an (exact) inshuffle. Scheme1 does a cut and then divides the pack of cards into two (depending on the PRN us in [0,n1]) and performs an inshuffle to each and finally does an inshuffle of the entire pack. Scheme2 is the same as Scheme1, excepting with ur = us, i.e. one uses only one PRN. Scheme3 and Scheme4 are the same as Scheme1 and Scheme2, excepting that the final inshuffle is absent. As can be seen, Scheme1 to Scheme4 are all fairly comparable to F&Y and the theoretically values and run much faster than F&Y.
The computation was done in Python with codes given further below, using the builtin PRNG of Python. Scheme3 has been used in an application of mine (see http://s13.zetaboards.com/Crypto/topic/6925232/1/)
It may be noted that e.g. Scheme3 could be employed also in real card games with a presumably acceptable complication of the common manual shuffling procedure.
M. K. Shen

import random,time
def hamming(a,b,n): h=0 for i in range(n): if a[i] != b[i]: h+=1 return(h)
# inshuffle def riffle(cards,n): newcards=[] nh=n//2 for i in range(nh): newcards.append(cards[nh+i]) newcards.append(cards[i]) if nh*2<n: newcards.append(cards[n1]) return(newcards)
# This is taken over from what is done in normal card games. def scheme0(cards,ur,n): newcards=cards[0:ur]+cards[ur:n] newcards=riffle(newcards,n) return(newcards)
# For Scheme1 use 2 PRNs ur and us, for Scheme2 use 1 PRN, i.e. ur = us. def scheme12(cards,ur,us,n): newcards=cards[0:ur]+cards[ur:n] newcards=riffle(newcards[0:us],us)+riffle(newcards[us:n],nus) newcards=riffle(newcards,n) return(newcards)
# For Scheme3 use 2 PRNs ur and us, for Scheme4 use 1 PRN, i.e. ur = us. def scheme34(cards,ur,us,n): newcards=cards[0:ur]+cards[ur:n] newcards=riffle(newcards[0:us],us)+riffle(newcards[us:n],nus) return(newcards)
# The theoretically perfect pseudorandom Permuation of Fisher and Yates. def fisheryates(cards,n): newcards=cards[:] nn1=n1 for i in range(nn1): k=random.randint(i,nn1) t=newcards[i] newcards[i]=newcards[k] newcards[k]=t return(newcards)
# Example computation with Scheme3
def comp(n,times): bb=[i for i in range(n)] cc=[0 for i in range(n+1)] bbc=bb[:] hmin=n+1 imin=1 start=time.clock() for i in range(times): sh=random.randint(0,n) kn=random.randint(0,n) bbc=scheme34(bbc,sh,kn,n) h=hamming(bb,bbc,n) cc[h]+=1 print("time",time.clock()start) print(cc)
# Run the example
comp(52,100000000)

