The Math Forum


Math Forum » Discussions » sci.math.* » sci.stat.math


Topic: An estimation question
Replies: 5   Last Post: May 2, 2012 10:20 PM



Posts: 139
Registered: 4/27/05
Re: An estimation question
Posted: May 2, 2012 7:51 PM

On Apr 30, 9:37 pm, Rich Ulrich <> wrote:
> On Sat, 28 Apr 2012 08:18:48 -0700 (PDT), ""
> <> wrote:

> >This problem occurs a lot in real life.
> >You sample n people, and a proportion p of them are found to be
> >carrying a red flag (supporting some political party, preferring a
> >brand of soap, etc.). Textbooks say that the estimate of the proportion
> >carrying the red flag in the total population is p, with a variance of
> >p.(1-p)/n.
> >This would indicate that p close to 0 or 100 pct can be estimated with
> >smaller samples than p around 50 pct with the same confidence.

> >But suppose we have carried out these samplings repeatedly and past
> >results show that the proportion carrying the red flag always comes in
> >between 0 and say 15 pct.  We can even estimate a histogram
> >distribution of p from past samples.  If we now make a new sampling of
> >n items - and we wish to rely on the past sampling results, how would
> >the mean and variance estimates change?

> >Thanks for any replies.
> I thought you would elicit some sort of Bayesian answer,
> but that hasn't happened.
> Bayesian computation uses a "prior distribution" and
> comes up with a combined, Bayesian estimate -- but that
> is not the same, exactly, as reporting Mean and SD.
> And I'm not a bayesian advocate, nor am I up-to-speed
> on what they are doing, but my impression is that the
> results, in terms of narrowing or modifying the estimates,
> are ordinarily of the magnitude you would get by adding a
> total of 1 case, or very few cases, to the observed sample
> size.
> If you want to make a statement based on a long time-series
> of observations, there are classical techniques that *might*
> be applicable -- What is appropriate would depend on
> whether you are tapping some dimension that is thought
> to be constant, or that might have a slow change, relative
> to the number of census points.
> For the simplest instance -- If there is no change expected
> or suggested by the data, you might decide to pool all the
> available data, and present the overall mean and SD, based
> on the total N.  -- If that comes to a really large N, it will
> produce a SD that is too small, because it will not take into
> account the standard error of the bias of the estimations.
> If there is slow change, you might argue for a time-series
> projection.  That would mainly use the most recent points,
> but it might afford a more precise estimate of the present
> mean than you get by using the latest data alone.
> --
> Rich Ulrich

Thanks, Rich and David. I thought some more about the problem and it
seems to me that we have to specify what is being measured.

(1) The classical problem can be stated in terms of an urn with a
fixed number of black and white balls. You sample n balls (the number
of balls in the urn is >> n so that replacement or non-replacement
doesn't matter) and m of them turn out to be black. m/n is the best
estimate of the proportion of black balls in the urn.
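To make case (1) concrete, here is a minimal sketch of the classical estimate and its standard error (the function name and numbers are just for illustration):

```python
import math

def proportion_estimate(m, n):
    """Classical estimate of the urn's black-ball proportion
    from m black balls observed in n draws."""
    p_hat = m / n
    # Standard error of the proportion estimate: sqrt(p(1-p)/n)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se

# e.g. 30 black balls in a sample of 200 draws
p_hat, se = proportion_estimate(30, 200)
```

Note how the standard error shrinks as p_hat approaches 0 or 1, which is exactly the point quoted above about extreme proportions needing smaller samples.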

(2) In the "Bayesian" version there are k urns with the proportion of
black balls in urn j being p(j), which are all known. You control n
the number of balls sampled, but they all come from a single urn whose
identity is not known to you. The problem here is: if m of them
turn out to be black, what is the probability that they came from
urn j?
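Case (2) is a direct application of Bayes' theorem with a binomial likelihood. A minimal sketch, assuming a uniform prior over the k urns unless one is supplied (the function name is hypothetical):

```python
from math import comb

def posterior_over_urns(m, n, p, prior=None):
    """P(urn j | m black in n draws) via Bayes' theorem.
    p is the list of known black-ball proportions p(j);
    the prior over urns defaults to uniform."""
    k = len(p)
    if prior is None:
        prior = [1 / k] * k
    # Binomial likelihood of observing m black in n draws from urn j
    likelihood = [comb(n, m) * pj**m * (1 - pj)**(n - m) for pj in p]
    joint = [lik * pr for lik, pr in zip(likelihood, prior)]
    total = sum(joint)
    return [j / total for j in joint]

# 3 black in 20 draws, three candidate urns
post = posterior_over_urns(3, 20, [0.05, 0.10, 0.15])
```

With m/n = 0.15, the urn whose proportion matches the observed rate gets the largest posterior weight, as expected.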

In real life - case (1) applies when you are measuring an objective
reality outside your sampling - such as the proportion of women or
left-handed people in a population. In this case the observed
variation arises purely from the finiteness of the sampling.
Successive samples should simply be cumulated to get the best estimate
of the population proportion.
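Cumulating successive samples here just means pooling the counts, a sketch of which (illustrative name and numbers) is:

```python
def pooled_estimate(samples):
    """Pool successive samples, given as (m_i, n_i) pairs,
    into one estimate: sum of successes over sum of draws."""
    m_total = sum(m for m, _ in samples)
    n_total = sum(n for _, n in samples)
    return m_total / n_total

# three successive samples of the same fixed population
pooled_estimate([(12, 100), (9, 80), (15, 120)])
```

Because the underlying proportion is fixed, the pooled estimate behaves like one big sample of size sum(n_i), with a correspondingly smaller standard error.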

Case (2) applies when each sampling is actually a "campaign" of sorts
- you send out n mailings that solicit some action, and the response
rate is not something that is objectively out there, independent of your
measurement. But if all "campaigns" are not too dissimilar from each
other, then past response rates can be used as a guide as to what to
expect. In this case there are two sources of variation: which past
campaign your current campaign is most similar to and, secondarily, the
normal sampling variation from finite sampling.
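Those two sources of variation add, by the law of total variance: the spread of rates across campaigns plus the expected binomial noise within a single campaign. A rough sketch, assuming the past rates are a representative sample of campaign-to-campaign variation (names and numbers are illustrative):

```python
import statistics

def campaign_variance(past_rates, n):
    """Rough variance of a new campaign's observed response rate:
    between-campaign spread of past rates, plus the average
    within-sample binomial variance p(1-p)/n."""
    between = statistics.variance(past_rates)  # which campaign we resemble
    within = statistics.mean(p * (1 - p) / n for p in past_rates)  # finite-sample noise
    return between + within

# four past response rates, new campaign of n = 500 mailings
total_var = campaign_variance([0.05, 0.08, 0.12, 0.10], 500)
```

For large n the within-sample term shrinks toward zero, so the between-campaign spread dominates - which is why a huge mailing does not pin down the next campaign's rate any better than the history of past campaigns allows.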


© The Math Forum at NCTM 1994-2018. All Rights Reserved.