
Re: Combinations with variable probability?
Posted: Jan 19, 2009 6:09 AM


On 17 Jan, 19:34, Matt <matt271829n...@yahoo.co.uk> wrote:
> On Jan 17, 6:40 am, petertwocakes <petertwoca...@googlemail.com>
> wrote:
> > On 16 Jan, 21:47, Matt <matt271829n...@yahoo.co.uk> wrote:
> > > On Jan 16, 7:47 pm, petertwocakes <petertwoca...@googlemail.com>
> > > wrote:
> > > > On 16 Jan, 19:00, Matt <matt271829n...@yahoo.co.uk> wrote:
> > > > > On Jan 16, 5:51 pm, petertwocakes <petertwoca...@googlemail.com>
> > > > > wrote:
> > > > > > A population of N=1000 listen to a particular radio show for one
> > > > > > month. Each person listens for a different amount of time, t,
> > > > > > expressed as a proportion of the total time the show occupied over
> > > > > > the month, and the times are normally distributed between 0.0 and
> > > > > > 1.0.
> > > > >
> > > > > This is not possible. The normal distribution always ranges from
> > > > > -oo to +oo. You may have in mind some sort of "truncated" normal
> > > > > distribution, or you may need to come up with a different model for
> > > > > the listening times.
> > > > >
> > > > > > Each person can tell us exactly how much time, t, they listened for.
> > > > > >
> > > > > > The show's output comprises a playlist of 100 songs, all played a
> > > > > > different number of times, but we know the proportion of airtime, a,
> > > > > > occupied by each song. "Song A" accounted for a = 0.15 of the total
> > > > > > airtime.
> > > > > >
> > > > > > Given a sample of n = 20 people at random, can we estimate the
> > > > > > probability that exactly k of them heard the song?
> > > > >
> > > > > You need more information. For starters, what does "heard" mean? Heard
> > > > > the whole song? Heard any part of it?
> > > > >
> > > > > > Is this even possible without exhaustively calculating p for each
> > > > > > possible combination?
> > > > > >
> > > > > > If instead we ask a random sample of 20 people if they heard the song,
> > > > > > and 5 of them have, what is the probability that that particular
> > > > > > outcome occurred, given that we know a = 0.15 and the value of t for
> > > > > > each person?
> > > > > >
> > > > > > Although at first it looks like a variation on the hypergeometric
> > > > > > distribution, I'm guessing that's no use because of the variable
> > > > > > probabilities?
> > > > > >
> > > > > > What sort of things should I be studying to figure this one out?
> > > >
> > > > Yes, approximately normal, truncated at zero and 1.0, with a mean of
> > > > 0.5, SD 0.34.
> > > > "Heard" means heard any part of it. There is no unmentioned bias in
> > > > any of the conditions.
> > > > If there's any other missing information, please infer ideal default
> > > > values that would make either part solvable.
> > >
> > > Well, the next thing to figure out is what assumptions you want to
> > > make about the fragmentation of the total song time and total
> > > listening time. Just knowing the total proportions isn't enough. For
> > > example, all other things being equal, a short song played frequently
> > > is likely to be heard by more people tuning in (for just some of the
> > > show) than a long song played less frequently. Similarly, all other
> > > things being equal, someone tuning in often for short periods is more
> > > likely to hear a given song than someone tuning in less often for
> > > longer periods.
> >
> > I would like to return to that, but before I can I need to find an
> > even simpler representation, because the amount of detail seems too
> > much for a general class of problems where, instead of binary
> > success/failure, we have a known probability of success.
> >
> > Given a population where each member has an individual known
> > probability p(event) that a certain event has/hasn't happened to them,
> > can we either:
> > (a) take a sample of n of them and estimate the probability of k
> > successes, or
> > (b) take a real sample of n of them, determine the true value of k, and
> > calculate the probability that this outcome occurred?
> >
> > I'm assuming (b) is easier because we know p(event) for each of the
> > actual members.
> >
> > (I'm worried about inventing examples, in case I pick one that can't
> > work without detail, but anyway let's say beating a score of 72 on a
> > golf course, for which we assume the probability is purely a function
> > of their past history of 100 games: 0.0 = never beaten that score,
> > 1.0 = always beaten that score, mean = 0.5, and the distribution is,
> > ahem, bell-shaped. [From my other post I thought 'Has rented DVD x'
> > might work as an example too, where we pretend p is a simple function
> > of that movie's popularity and the number of rentals that person has
> > made?])
>
> If you have a specific sample, with known success probabilities p_1,
> p_2, p_3, ... p_n, then you can calculate Pr(n, k), the probability of
> exactly k successes in the n, recursively using the recipe
>
> Pr(i, j) = Pr(i - 1, j - 1)*p_i + Pr(i - 1, j)*(1 - p_i)
>
> starting with Pr(0, 0) = 1 (and taking Pr(i, j) = 0 whenever j < 0 or
> j > i). One 2D recursion will give you the answer for all the k's.
>
> I'm not certain if this is the most efficient way of doing it; there
> might be a quicker method.
>
> Another approach is to forget about individual known probabilities and
> the "without replacement" condition, and just assume that the
> probability of success, p, for any individual is (independently)
> anything between 0 and 1, where p itself has a known pdf, say f(p)
> (for example, the, ahem, truncated normal distribution). This
> "probability of a probability" resolves to a fixed probability of
> success, E(p) (the expected value of p), where
>
> E(p) = Int_0^1 p*f(p) dp
>
> (i.e., the integral from 0 to 1 of p*f(p) w.r.t. p).
>
> Then you can just use the binomial distribution with E(p).
>
> This will give you "averaged" results; it won't give you the exact
> answer for a given specific sample, but if the sample is of reasonable
> size then we can expect it will conform to the pdf with reasonable
> accuracy.
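For case (b), a specific sample with known p_1, ..., p_n, the recursion Matt gives can be coded in a few lines. The count of successes among independent trials with unequal probabilities is usually called the Poisson binomial distribution. The sketch below is only one way to do it: `poisson_binomial_pmf` is a hypothetical name, and the 20 probabilities in the demo are made-up values, not data from this thread. It builds row i of the Pr(i, j) table from row i - 1 and returns the whole final row, so pmf[k] is Pr(n, k).

```python
import math


def poisson_binomial_pmf(probs):
    """Return [Pr(n, 0), Pr(n, 1), ..., Pr(n, n)] for n independent trials
    whose success probabilities p_1, ..., p_n are given in probs."""
    row = [1.0]  # row i = 0: Pr(0, 0) = 1
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # Build row i from row i - 1; entries outside 0..i - 1 count as 0.
        nxt = [0.0] * (len(row) + 1)
        for j, prev in enumerate(row):
            nxt[j] += prev * (1.0 - p)  # Pr(i - 1, j)*(1 - p_i) -> Pr(i, j)
            nxt[j + 1] += prev * p      # Pr(i - 1, j)*p_i -> Pr(i, j + 1)
        row = nxt
    return row


if __name__ == "__main__":
    # Hypothetical success probabilities for a sample of 20 people (made-up).
    probs = [i / 21 for i in range(1, 21)]
    pmf = poisson_binomial_pmf(probs)
    print("Pr(20, 5) =", pmf[5])
    print("total =", math.fsum(pmf))  # should be 1 up to rounding
```

Only the previous row is kept, so this runs in O(n^2) time and O(n) memory. When all the p_i are equal it reduces to the ordinary binomial distribution, which makes a handy sanity check.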
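The "averaged" approach can be sketched the same way: compute E(p) = Int_0^1 p*f(p) dp numerically and plug it into the binomial distribution. This sketch makes two assumptions of its own. First, the "mean 0.5, SD 0.34" from earlier in the thread is read as the parameters of the normal *before* truncation to [0, 1] (the truncated distribution's own SD is smaller). Second, the integral uses a simple midpoint rule. The helpers `truncated_normal_pdf`, `expected_p` and `binomial_pmf` are hypothetical names for illustration.

```python
import math


def truncated_normal_pdf(p, mu, sigma, lo=0.0, hi=1.0):
    """Normal(mu, sigma) density restricted to [lo, hi], rescaled to integrate to 1."""
    if p < lo or p > hi:
        return 0.0

    def normal_cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    mass = normal_cdf(hi) - normal_cdf(lo)  # mass the untruncated normal puts on [lo, hi]
    density = math.exp(-0.5 * ((p - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return density / mass


def expected_p(pdf, steps=10_000):
    """E(p) = Int_0^1 p*f(p) dp, approximated with the midpoint rule."""
    h = 1.0 / steps
    return h * math.fsum((k + 0.5) * h * pdf((k + 0.5) * h) for k in range(steps))


def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials, each with probability p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


if __name__ == "__main__":
    # Assumption: 0.5 and 0.34 are the mean and SD of the normal before truncation.
    f = lambda p: truncated_normal_pdf(p, mu=0.5, sigma=0.34)
    ep = expected_p(f)  # f is symmetric about 0.5, so this comes out at 0.5
    for k in range(21):
        print(k, binomial_pmf(20, k, ep))
```

With a symmetric f centred on 0.5, E(p) is exactly 0.5, so the averaged answer for n = 20 is just Binomial(20, 0.5). The numerical integral only matters once f is skewed. If each sampled person's p really is an independent draw from f, then k follows Binomial(n, E(p)) exactly; the approximation lies in treating a specific finite sample as if it were such a draw.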
OK, in trying to learn enough to understand what you and Ray have advised, I've come across some things called the multinomial distribution and the beta-binomial distribution (and the ordinary beta distribution), both of which seem to be related to this. I'll know more when I've coded your recursive formula to see how it compares to some real data.
The contents of my skull are just a spinning hourglass at the moment!
Thanks
Steve

