Here's a curiosity I've just come across. I'd be curious if anyone knows anything about it, or knows anything that I don't.
I've been trying to evaluate Spearman's Rho on a large number of (x, y) pairs. The values of x and y are not normally distributed, which is why I am using Spearman's Rho rather than Pearson's r. I spent a couple of hours experimenting.
I have two lists of rectangularly (uniformly) distributed random numbers, each 2000 numbers in length. I treat one as a list of x's and the other as a list of y's. I compute Spearman's Rho and its value is close to zero, as you would expect. Actually its value is 0.00547 to three significant figures.
So I add the numbers in each list in batches of 10, so I now have two lists of 200 numbers and each number is the sum of ten consecutive numbers in the old lists. I compute Spearman's Rho on the two lists and its value is now -0.0266, a bit less close to zero, and negative.
I tried adding the numbers in batches of 25, so now I had two lists of 80 numbers, and the value of Spearman's Rho was -0.120.
Finally I tried batches of 40, giving two lists of 50 numbers, and I got a Spearman's Rho of -0.05741088.
Now I wondered whether this always happened. I tried two lists of 2000 different numbers and I got these results:

Batch size    Spearman's Rho (R)
Individual    -0.00505
10             0.0457
25             0.187
40            -0.119
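One way to check whether this pattern repeats would be to run the same experiment over many seeds and look at how much Spearman's Rho varies at each batch size. The following is only a sketch under stated assumptions: the helper names batch_sum and rho_for, the choice of 200 seeds, and the use of a batch size of 1 to stand for the individual values are all illustrative and not part of the original experiment.

```r
# Hypothetical helper: sum consecutive values in batches of `size`.
# matrix() fills column by column, so each column holds one batch.
batch_sum <- function(v, size) colSums(matrix(v, nrow = size))

# Hypothetical helper: Spearman's rho for one seed and one batch size.
# A batch size of 1 leaves the individual values unchanged.
rho_for <- function(seed, size) {
  set.seed(seed)
  s <- runif(2000, min = 0, max = 100)
  t <- runif(2000, min = 0, max = 100)
  cor(batch_sum(s, size), batch_sum(t, size), method = "spearman")
}

seeds <- 1:200                 # arbitrary choice of seeds
sizes <- c(1, 10, 25, 40)      # 1 = individual values

# One row per seed, one column per batch size
rhos <- sapply(sizes, function(size) sapply(seeds, rho_for, size = size))
colnames(rhos) <- paste0("batch_", sizes)

# Spread of rho across seeds for each batch size
print(round(apply(rhos, 2, sd), 3))
```

Comparing the columns of rhos (for example with summary(rhos) or a boxplot) would show whether the values drift away from zero in any consistent direction as the batches get bigger, or simply scatter more widely.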
Is this anything interesting or was it obvious to people who know more statistics than I do? Is there any reason to prefer to batch-add the x's and y's or to keep them separate?
Thanks,
Ken Johnson
Here is the code in R if you want to copy and paste it - I have no idea whether set.seed( ) works the same way on all implementations.
# First set of numbers
set.seed(42)
s <- runif(2000, min = 0, max = 100)
t <- runif(2000, min = 0, max = 100)
# Second set of numbers
set.seed(342)
s <- runif(2000, min = 0, max = 100)
t <- runif(2000, min = 0, max = 100)
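
The code above only generates the numbers. The batching and the correlation step could be sketched as follows. This is an illustration, not necessarily the exact code behind the figures quoted: the helper name batch_sum is made up here, and it assumes the list length is an exact multiple of the batch size.

```r
# Hypothetical helper (not from the original post): sum consecutive
# values in batches of `size`. matrix() fills column by column, so
# each column holds `size` consecutive values of v.
batch_sum <- function(v, size) {
  stopifnot(length(v) %% size == 0)
  colSums(matrix(v, nrow = size))
}

set.seed(42)
s <- runif(2000, min = 0, max = 100)
t <- runif(2000, min = 0, max = 100)

# Spearman's rho on the individual values
rho_individual <- cor(s, t, method = "spearman")
cat("individual, n = 2000, rho =", signif(rho_individual, 3), "\n")

# Spearman's rho on the batch sums
for (size in c(10, 25, 40)) {
  rho <- cor(batch_sum(s, size), batch_sum(t, size), method = "spearman")
  cat("batch size", size, ", n =", 2000 / size,
      ", rho =", signif(rho, 3), "\n")
}
```

Changing the seed to 342 repeats the run for the second set of numbers.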