M. C. Whitlock, Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach, J. Evol. Biol. 18 (2005) 1368-1375, European Society for Evolutionary Biology.
Reporting . . . Fisher's method, however, does have one significant [flaw] in this context. It treats large and small P-values asymmetrically. It is easiest to see this problem with an example (see Rice 1990). Imagine there were two studies on a topic that we would like to combine. One of these studies rejected the null hypothesis with P=0.001 while the other did not, with P=0.999. Clearly, on average there is no consistent effect in these two studies, yet by Fisher's method the P-value is P=0.008.
Rice W. R. (1990) A consensus combined P-value test and the family-wide significance of component tests. Biometrics 46: 303-308.
My analysis focuses exclusively on the text above. In particular, the Rice/Stouffer method does not concern me here. F = -2*(ln 0.001 + ln 0.999) = 13.8175, which exceeds 13.28, the 0.990 quantile of Chi2(4), so the combined P-value is below 0.01. From this I take it that Rice has a two-component experiment in mind. It is then easily shown by simulation that a P-amplitude (the gap between the two P-values) larger than 0.998 for two random numbers is very unlikely: the probability is approximately 4 per million. So the null hypothesis shared by both studies is completely untenable (see the text above). It follows that Rice's argument against Fisher's method falls to the ground . . . Of course, as the number of components gets large, it becomes more and more likely that they reach large amplitudes. A crucial point: the P-values follow a uniform distribution only if we hit *the bull's eye*, that is, only if the proposed (tentative) parameter value under test is exactly equal to the population value. As far as I understand, Whitlock carried out his work along these lines: recovering the population P-values backwards by simulation (resampling the data) and drawing conclusions from them.
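The two numbers used above can be checked with a short sketch in Python. This is only an illustration under stated assumptions, not the code behind the figures in the post: the function names are made up for this example, the two P-values are treated as independent Uniform(0, 1) draws under the shared null hypothesis, and the chi-square tail is taken from the closed form that holds for an even number of degrees of freedom. For two such draws the gap exceeds r with probability (1 - r)^2, which gives 4 per million at r = 0.998. An event that rare needs tens of millions of draws to simulate reliably, so the simulation below checks the same formula at the milder threshold r = 0.9.

```python
import math
import random


def fisher_combined_p(p_values):
    """Fisher's method: F = -2 * sum(ln p_i), chi-square with 2k df under H0."""
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # Closed-form upper tail of a chi-square with an even number (2k) of df.
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
    return stat, tail


def range_tail_exact(r):
    """P(|U1 - U2| > r) for two independent Uniform(0, 1) draws."""
    return (1.0 - r) ** 2


def range_tail_simulated(r, n_draws, seed=0):
    """Monte Carlo estimate of the same probability (seed is arbitrary)."""
    rng = random.Random(seed)
    hits = sum(abs(rng.random() - rng.random()) > r for _ in range(n_draws))
    return hits / n_draws


stat, p_comb = fisher_combined_p([0.001, 0.999])
print(f"F = {stat:.4f}, combined P = {p_comb:.4f}")   # F = 13.8175, combined P = 0.0079
print(f"exact P(gap > 0.998) = {range_tail_exact(0.998):.1e}")   # 4.0e-06
print(f"simulated P(gap > 0.9) = {range_tail_simulated(0.9, 200_000):.4f}")
print(f"exact     P(gap > 0.9) = {range_tail_exact(0.9):.4f}")
```

The combined P of about 0.0079 agrees with the 0.008 quoted from Whitlock, and the exact gap probability agrees with the 4 per million obtained by simulation.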