The Math Forum




Topic: Behaviour of Spearman's Rho when data are added in batches
Replies: 4   Last Post: Jun 11, 2017 8:31 PM

kenjohnson195105@gmail.com

Posted: Jun 4, 2017 5:41 PM

Here's a curiosity I've just come across. I'd like to know whether anyone has seen it before or can explain it.

I've been trying to evaluate Spearman's Rho on a large number of (x, y) pairs. I'm using Spearman's Rho rather than Pearson's r because the values of x and y are not normally distributed. To see how it behaves, I spent a couple of hours experimenting.

I have two lists of rectangularly (uniformly) distributed random numbers, each 2000 numbers long. I treat one as a list of x's and the other as a list of y's. I compute Spearman's Rho and its value is close to zero, as you would expect: 0.00547 to three significant figures.
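For anyone who wants to see what is being computed, here is a minimal sketch of Spearman's Rho as Pearson's r applied to the ranks of the data. The seed, the sample size of 50, and the variable names are assumptions chosen for illustration and are not the data used in this post.

```r
# Spearman's rho is Pearson's r computed on the ranks of the data.
set.seed(1)                              # arbitrary seed, illustration only
x <- runif(50, min = 0, max = 100)
y <- runif(50, min = 0, max = 100)

rho_builtin <- cor(x, y, method = "spearman")
rho_by_rank <- cor(rank(x), rank(y))     # Pearson's r on the ranks

# With no ties, the classical formula based on rank differences agrees too
d <- rank(x) - rank(y)
n <- length(x)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

print(c(rho_builtin, rho_by_rank, rho_formula))
```

All three values should agree up to floating-point rounding, since continuous uniform draws essentially never tie.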

Next I summed the numbers in each list in batches of 10, giving two lists of 200 numbers, each number being the sum of ten consecutive numbers in the old lists. Spearman's Rho on these two lists is -0.0266, a bit further from zero, and negative.

I tried adding the numbers in batches of 25, so now I had two lists of 80 numbers, and the value of Spearman's Rho was -0.120.

Finally I tried batches of 50, giving two lists of 40 numbers, and I got a Spearman's Rho of -0.0574.

Then I wondered whether this always happens. I tried two different lists of 2000 numbers and got these results:

Batch size        Spearman's Rho
1 (individual)    -0.00505
10                 0.0457
25                 0.187
50 (40 sums)      -0.119

Is this anything interesting or was it obvious to people who know more statistics than I do? Is there any reason to prefer to batch-add the x's and y's or to keep them separate?

Thanks
Ken Johnson

Here is the code in R if you want to copy and paste it - I have no idea whether set.seed( ) works the same way on all implementations.

# First set of numbers

set.seed(42)
s <- runif(2000, min = 0, max = 100)
t <- runif(2000, min = 0, max = 100)

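# Second set of numbers
# (left commented out so it does not overwrite the first set;
#  uncomment these three lines to reproduce the second set of results)

# set.seed(342)
# s <- runif(2000, min = 0, max = 100)
# t <- runif(2000, min = 0, max = 100)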

# Batch add in batches of 10

s1 <- numeric(200)   # 200 batch sums, initialised to zero
t1 <- numeric(200)

for (i in 1:200)
{
  # j runs over the ten consecutive indices belonging to batch i
  for (j in (((i - 1) * 10) + 1):(((i - 1) * 10) + 10))
  {
    s1[i] <- s1[i] + s[j]
    t1[i] <- t1[i] + t[j]
  }
}
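The nested loops can also be written without explicit indexing. The following is a sketch only: `batch_sum` is a hypothetical helper name (it is not part of base R), and it assumes the length of the list is an exact multiple of the batch size.

```r
# batch_sum() is a hypothetical helper, not a base R function.
batch_sum <- function(x, size) {
  stopifnot(length(x) %% size == 0)   # assumes length is a multiple of size
  # matrix() fills column by column, so column i holds the i-th batch
  colSums(matrix(x, nrow = size))
}

# Compare against an explicit loop on the first set of numbers
set.seed(42)
s <- runif(2000, min = 0, max = 100)

s1_loop <- numeric(200)
for (i in 1:200) {
  s1_loop[i] <- sum(s[((i - 1) * 10 + 1):(i * 10)])
}

print(all.equal(batch_sum(s, 10), s1_loop))
```

With a helper like this, the batch sizes of 25 and 50 used further down would become `batch_sum(s, 25)` and `batch_sum(s, 50)`.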

# now try Spearman on the individual values

cor.test(s, t, method = "spearman")

# Spearman's rank correlation rho
#
# data: s and t
# S = 1.326e+09, p-value = 0.8067
# alternative hypothesis: true rho is not equal to 0
# sample estimates:
# rho
# 0.005473995

# and on the sums of batches of 10

cor.test(s1, t1, method = "spearman")

# Spearman's rank correlation rho
#
# data: s1 and t1
# S = 1368700, p-value = 0.7085
# alternative hypothesis: true rho is not equal to 0
# sample estimates:
# rho
# -0.02657766

# same effect - try larger batch size of 25 (80 batch sums)

s2 <- numeric(80)
t2 <- numeric(80)

for (i in 1:80)
{
  for (j in (((i - 1) * 25) + 1):(((i - 1) * 25) + 25))
  {
    s2[i] <- s2[i] + s[j]
    t2[i] <- t2[i] + t[j]
  }
}

cor.test(s2, t2, method = "spearman")

# Spearman's rank correlation rho
#
# data: s2 and t2
# S = 95518, p-value = 0.2903
# alternative hypothesis: true rho is not equal to 0
# sample estimates:
# rho
# -0.1195265

# largest batch size: 50 (40 batch sums)

s3 <- numeric(40)
t3 <- numeric(40)

for (i in 1:40)
{
  for (j in (((i - 1) * 50) + 1):(((i - 1) * 50) + 50))
  {
    s3[i] <- s3[i] + s[j]
    t3[i] <- t3[i] + t[j]
  }
}

cor.test(s3, t3, method = "spearman")

# Spearman's rank correlation rho
#
# data: s3 and t3
# S = 11272, p-value = 0.7242
# alternative hypothesis: true rho is not equal to 0
# sample estimates:
# rho
# -0.05741088

# Second set of numbers (set.seed(342))
#
# Batch size       rho
#  1               -0.005046081
# 10                0.04569264
# 25                0.1874824
# 50 (40 sums)     -0.1193246
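
As a rough check on whether values like these are more than sampling noise, here is a sketch that simulates Spearman's Rho for independent pairs at each list length used above (2000, 200, 80 and 40). Under independence with no ties, the standard deviation of rho is 1/sqrt(n - 1), so shorter lists scatter more widely around zero. Because rho depends only on ranks, plain uniform draws stand in for the batch sums here. The helper name `sim_rho`, the seed, and the number of replicates are assumptions chosen for illustration.

```r
# Sketch: spread of Spearman's rho for independent data at each list length.
# sim_rho() is a hypothetical helper; 2000 replicates is an arbitrary choice.
sim_rho <- function(n, reps = 2000) {
  replicate(reps, cor(runif(n), runif(n), method = "spearman"))
}

set.seed(1)   # arbitrary seed, illustration only
for (n in c(2000, 200, 80, 40)) {
  rho <- sim_rho(n)
  cat(sprintf("n = %4d  simulated SD = %.4f  1/sqrt(n-1) = %.4f\n",
              n, sd(rho), 1 / sqrt(n - 1)))
}
```

Comparing each observed rho with the spread for its list length gives a sense of whether it is unusual for independent data.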




