On Thu, 03 May 2012 08:57:11 +0100, Rod wrote:
> "James Waldby" wrote ...
>> On Wed, 02 May 2012 13:28:47 +0100, Rod wrote:
>>> If I look at a binary variable, say smoking/no smoking, and for each
>>> county in the world or state in a country, calculate the proportion
>>> of adults who smoke and then plot a frequency histogram of these
>>> numbers. Should I expect the histogram to be from a particular
>>> distribution, and if so which?
>>
>> If the binary variable is something that depends on locale or culture,
>> per-county (country?) or per-state histograms will not in any
>> meaningful sense be drawn from a particular distribution. A
>> histogram itself might be thought of as drawn from some
>> multivariate distribution but I don't know which; maybe Wishart?
>> <http://en.wikipedia.org/wiki/Wishart_distribution>. A bit
>> more obviously, the statistics of the histogram will be
>> chi-squared if the binary variable's causes are universal.
>
> thanks for this, but why chi-squared?
I meant that the chi-square statistic (which, for a given histogram, is a single number computed from the counts in the cells of that histogram) will, taken across all the histograms, follow one and the same chi-square distribution, if the binary variable's causes are universal.
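A rough simulation sketch of what I have in mind, in Python, under assumptions of my own that are not from the thread: every county has the same smoking probability p = 0.2 (the "universal causes" case), each county has 200 adults, each histogram covers 200 counties, and the six histogram cells are cut at hypothetical smoker counts 33, 37, 40, 43 and 47. The expected cell counts are taken from the binomial distribution, and the statistic is the usual Pearson sum of (observed - expected)^2 / expected.

```python
import bisect
import math
import random


def binomial_cdf(n, p):
    """Return the list of P(X <= k) for k = 0..n, with X ~ Binomial(n, p)."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
        cdf.append(total)
    return cdf


def chi_square_statistic(observed, expected):
    """Pearson statistic: a single number computed from the cell counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate(num_histograms=500, counties=200, adults=200, p=0.2, seed=0):
    """Build many histograms under one universal p; return their statistics.

    All parameter values and the cell boundaries below are illustrative
    assumptions, not taken from any real data.
    """
    rng = random.Random(seed)
    cdf = binomial_cdf(adults, p)

    # Hypothetical cells, given as upper bounds on the smoker count per county.
    upper = [33, 37, 40, 43, 47, adults]
    cell_prob, prev = [], 0.0
    for u in upper:
        cell_prob.append(cdf[u] - prev)
        prev = cdf[u]
    expected = [counties * q for q in cell_prob]

    stats = []
    for _ in range(num_histograms):
        observed = [0] * len(upper)
        for _ in range(counties):
            # Inverse-CDF draw of the number of smokers in one county.
            smokers = min(bisect.bisect_left(cdf, rng.random()), adults)
            observed[bisect.bisect_left(upper, smokers)] += 1
        stats.append(chi_square_statistic(observed, expected))

    degrees_of_freedom = len(upper) - 1
    return stats, degrees_of_freedom


if __name__ == "__main__":
    stats, df = simulate()
    mean_stat = sum(stats) / len(stats)
    # 11.07 is the 95% point of the chi-square distribution with 5 d.f.
    tail = sum(s > 11.07 for s in stats) / len(stats)
    print("degrees of freedom:", df)
    print("mean of statistics:", mean_stat)
    print("fraction above 11.07:", tail)
```

If the causes really are universal, the mean of the statistics should come out near the degrees of freedom (5 here) and roughly 5% of them should exceed 11.07. If p varies by locale, the statistics computed against a single set of expected counts will tend to be much larger.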
You may find Rich Ulrich's post of 03 May 2012 14:00:25 -0500 in sci.op-research / sci.stat.math / sci.math thread "Re: a combinatorial question" relevant too, in that he notes that for large tables, "each cell can be regarded as a 1 d.f. chisquared. But a chisquared is simply a normal variate, z, squared." etc.
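The last part of that quote, that a 1 d.f. chi-squared is a squared standard normal variate, is easy to check numerically. A small sketch, with a sample size and seed picked arbitrarily by me: the squares should have mean about 1 and variance about 2, and about 5% of them should exceed 3.841, the 95% point of the 1 d.f. chi-square distribution.

```python
import random


def squared_normals(n=20000, seed=1):
    """Draw n standard normal variates z and return the list of z**2."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]


def summarize(values, critical=3.841):
    """Return (mean, variance, fraction of values above `critical`)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    tail = sum(v > critical for v in values) / n
    return mean, variance, tail


if __name__ == "__main__":
    mean, variance, tail = summarize(squared_normals())
    print("mean:", mean)          # chi-square with 1 d.f. has mean 1
    print("variance:", variance)  # and variance 2
    print("fraction above 3.841:", tail)
```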