Date: Feb 27, 2013 8:04 AM
Author: Outspan
Subject: Standard deviation over observed sample
First of all, apologies for any errors in my terminology; it's been a while.

I play online poker with the help of a HUD (heads-up display) that collects cumulative statistics on how the opponent plays. For instance, the HUD might tell me about what percentage of hands the opponent raised first-in. I played a bit with Matlab to figure out what the error margin on those statistics could be, based on the number of samples that I have.

I wrote a very simple Monte Carlo simulation. In the figure below, the true value of the statistic (i.e., the value that would be observed if the opponent played an infinite number of hands) is 0.09. I generated 10k sequences of 100 samples each, then computed the standard deviation of each of the resulting 10k-data-point vectors (one vector per sample count), and plotted them below.

http://img198.imageshack.us/img198/6399/variance2.jpg
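For reference, the simulation behind the figure can be sketched roughly as follows. This is a Python/NumPy stand-in rather than my original Matlab code, and it assumes each hand is an independent yes/no trial with a fixed probability of 0.09; the function and variable names are made up for illustration.

```python
import numpy as np

def simulate_running_stat(p_true=0.09, n_samples=100, n_sequences=10_000, seed=0):
    """Simulate the observed statistic as a function of the number of samples.

    Assumption: every hand is an independent Bernoulli trial with
    success probability p_true (a simplification of real play).
    """
    rng = np.random.default_rng(seed)
    # Each row is one sequence of hands; True means the opponent raised first-in.
    hits = rng.random((n_sequences, n_samples)) < p_true
    counts = np.arange(1, n_samples + 1)
    # Running observed frequency after 1, 2, ..., n_samples hands.
    running = np.cumsum(hits, axis=1) / counts
    # Standard deviation across the sequences at each sample count.
    sigma = running.std(axis=0)
    return counts, running, sigma

counts, running, sigma = simulate_running_stat()
# One-sigma and two-sigma bands around the true value, per sample count.
one_sigma_band = (0.09 - sigma, 0.09 + sigma)
two_sigma_band = (0.09 - 2 * sigma, 0.09 + 2 * sigma)
print(sigma[-1])  # spread of the observed statistic after 100 samples
```

Plotting `one_sigma_band` and `two_sigma_band` against `counts` should give curves of the same kind as the red and green lines in the figure.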

This tells me that, given that the true value for a statistic is 0.09, the observed statistic will have approx. a 68% chance of lying between the two red lines (1-sigma) and approx. a 95% chance of lying between the two green lines (2-sigma). Is this correct?
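One way to sanity-check this reading is to count how many simulated observations actually fall inside the bands. The sketch below is again Python/NumPy rather than Matlab, assumes independent trials, and uses made-up names; it does this for a single sample count. Since the observed value can only take discrete steps of 1/n, the fractions need not come out at exactly 68% and 95%.

```python
import numpy as np

def coverage_at_n(p_true=0.09, n=100, n_sequences=10_000, seed=0):
    """Fraction of simulated observations within 1 and 2 sigma of p_true.

    Assumption: hands are independent Bernoulli trials, so the number of
    raises in n hands is binomially distributed.
    """
    rng = np.random.default_rng(seed)
    # Observed statistic after n hands, one value per simulated sequence.
    observed = rng.binomial(n, p_true, size=n_sequences) / n
    sigma = observed.std()
    deviation = np.abs(observed - p_true)
    within_1 = np.mean(deviation <= sigma)
    within_2 = np.mean(deviation <= 2 * sigma)
    return within_1, within_2

within_1, within_2 = coverage_at_n()
print(f"within 1 sigma: {within_1:.3f}, within 2 sigma: {within_2:.3f}")
```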

This visualization is useful in some respects; however, I then realized that it wasn't what I actually needed in the first place. What I need is an answer to this scenario: given that the observed value of the statistic is, say, 0.09 after n samples, what are the one-sigma and two-sigma confidence intervals for the statistic? In other words, how do I calculate the intervals that I am, respectively, 68% and 95% confident that the true value lies in?

I believe (correct me if I'm wrong) that the above graph cannot be used to answer this question at all. So how do I solve/compute this? Is this a trivial problem? Can I solve this analytically or do I have to use a computer simulation?

Thank you