Am 10.05.2013 08:12, schrieb Rich Ulrich: > On Wed, 08 May 2013 11:29:27 +0200, Mok-Kong Shen > <firstname.lastname@example.org> wrote: > >> >> I want to compare in a practical sense two methods of random >> permutations -- one theoretically perfect, namely that of Fisher and >> Yates, and another ad hoc, let's call it X. A way of comparison I could >> think of is the following: >> >> One starts from the standard configuration of n objects [0, 1, 2, >> ...., n-1] and apply each method a fairly large number of times >> successively to permute them and each time one computes the Hamming >> distance of the result from the standard configuaration. One obtains >> thus the frequency distribution of the Hamming distances for each >> method. If the frequency distributions are fairly comparable to each >> other, then X could be practically employed in place of the >> theoretically perfect one. >> >> Is this line of thought of mine correct? Does anyone have an idea of >> a better method of comparison? Thanks in advance. >> > > I start with guess-work on what you are doing. > > I looked up Hamming distance, and it reminded me only > of some previous questions about matching DNA sequences. > > I looked up Fisher and Yates, with Hamming, and came up > with the algorithm also known as Knuth Shuffle. That does > make a bit of sense if that is a way of finding how distant > two DNA strings are from each other. > > Since F&Y is "theoretically perfect", I assume that the > problem is that it is too time-consuming, and you want to > do a partial enumeration instead of this (apparently > efficient) full enumeration. This seems logical, whatever > the ultimate goal. > > I will suggest that instead of matching the whole range > of the distribution of the Hamming distances, you want > to match the relevant portion -- the matches that are close > enough to be interesting. One thing that comes to mind is > that Monte Carlo simulations to get p-levels require large Ns > if you want "exact p-values" that are very close for p=0.001 > or smaller, compared to getting good estimates for p=0.5 or 0.05. > - IIRC, I read years ago about computational "swindles" that > reduced the time by ordering the comparisons so that the tails > were counted first, and counting could stop when a criterion > was reached. > > And (I think) you want to look at the particular error distribution: > Matching the shape is no good if you don't select the same cases.
Thanks for the comments. Indeed what I need for my applications is in the direction that you indicated. In particular I don't need to generate the whole range of possible permutations but on the other hand a not too limited small range. Consider the case n=52, one knows that in-shuffle and out-shuffle have periods of 52 and 8 respectively, which are negligibly small compared to the whole range (of size (52!)). On the other hand, the method of Fisher and Yates needs n-1 (pseudo-)random numbers. In one of my applications a couple of such random numbers are already available for other reasons, so it would certainly be conveninent and advantageous, if I could just use them for doing random permutation without having to invoke a PRNG to acquire the n-1 random numbers needed for F&Y. This was the motivation of my OP and, as it turns out, my scheme runs also much faster than F&Y as reported further below.
For n=52 I took the list [0, 1, 2, 3, ....., 51] (the standard configuration) and applied diverse variants of my scheme (which are based on in-shuffle and thus are actually to be considered as its generalizations) as well as F&Y 100000000 times repeatedly. At each step the Hamming distance (H-D) of the result with respect to the standard configuration is computed, with its frequency distribution listed below, where Scheme0 is just the normal shuffling employed in card games, Scheme1 to Scheme4 are different variants of my scheme and F&Y is Fisher and Yates and 'Expected' is the expected values (rounded) based on combinatorics. I did 3 runs, the results are almost the same. Below is one typical series, with runtime in seconds on my PC.
Scheme0 is the same as in normal card games, where one does a cut (depending on the PRN ur in [0,n-1]) and then performs an (exact) in-shuffle. Scheme1 does a cut and then divides the pack of cards into two (depending on the PRN us in [0,n-1]) and performs an in-shuffle to each and finally does an in-shuffle of the entire pack. Scheme2 is the same as Scheme1, excepting with ur = us, i.e. one uses only one PRN. Scheme3 and Scheme4 are the same as Scheme1 and Scheme2, excepting that the final in-shuffle is absent. As can be seen, Scheme1 to Scheme4 are all fairly comparable to F&Y and the theoretically values and run much faster than F&Y.
def hamming(a,b,n): h=0 for i in range(n): if a[i] != b[i]: h+=1 return(h)
# in-shuffle def riffle(cards,n): newcards= nh=n//2 for i in range(nh): newcards.append(cards[nh+i]) newcards.append(cards[i]) if nh*2<n: newcards.append(cards[n-1]) return(newcards)
# This is taken over from what is done in normal card games. def scheme0(cards,ur,n): newcards=cards[0:ur]+cards[ur:n] newcards=riffle(newcards,n) return(newcards)
# For Scheme1 use 2 PRNs ur and us, for Scheme2 use 1 PRN, i.e. ur = us. def scheme12(cards,ur,us,n): newcards=cards[0:ur]+cards[ur:n] newcards=riffle(newcards[0:us],us)+riffle(newcards[us:n],n-us) newcards=riffle(newcards,n) return(newcards)
# For Scheme3 use 2 PRNs ur and us, for Scheme4 use 1 PRN, i.e. ur = us. def scheme34(cards,ur,us,n): newcards=cards[0:ur]+cards[ur:n] newcards=riffle(newcards[0:us],us)+riffle(newcards[us:n],n-us) return(newcards)
# The theoretically perfect pseudo-random Permuation of Fisher and Yates. def fisheryates(cards,n): newcards=cards[:] nn1=n-1 for i in range(nn1): k=random.randint(i,nn1) t=newcards[i] newcards[i]=newcards[k] newcards[k]=t return(newcards)
# Example computation with Scheme3
def comp(n,times): bb=[i for i in range(n)] cc=[0 for i in range(n+1)] bbc=bb[:] hmin=n+1 imin=-1 start=time.clock() for i in range(times): sh=random.randint(0,n) kn=random.randint(0,n) bbc=scheme34(bbc,sh,kn,n) h=hamming(bb,bbc,n) cc[h]+=1 print("time",time.clock()-start) print(cc)