
Re: manipulate data to better fit a Gaussian Distribution
Posted: Mar 19, 2013 10:48 AM


"Torsten" wrote in message <ki9mme$t77$1@newscl01ah.mathworks.com>... > "Torsten" wrote in message <ki9i7v$gjp$1@newscl01ah.mathworks.com>... > > "Francesco Perrone" <francesco86perrone@yahoo.it> wrote in message <ki9fsv$abp$1@newscl01ah.mathworks.com>... > > > "Torsten" wrote in message <ki9fdu$91c$1@newscl01ah.mathworks.com>... > > > > "Francesco Perrone" <francesco86perrone@yahoo.it> wrote in message <ki9dra$56c$1@newscl01ah.mathworks.com>... > > > > > Hi all, > > > > > > > > > > I have got a question concerning normal distribution (with mu = 0 and sigma = 1). > > > > > > > > > > Let say that I firstly call randn or normrnd this way > > > > > > > > > > x = normrnd(0,1,[4096,1]); % x = randn(4096,1) > > > > > > > > > > Now, to assess how good x values fit the normal distribution, I call > > > > > > > > > > [a,b] = normfit(x); > > > > > > > > > > and to have a graphical support > > > > > > > > > > histfit(x) > > > > > > > > > > Now come to the core of the question: if I am not satisfied enough on how x fits the given normal distribution, how can I optimize x in order to better fit the expected normal distribution with 0 mean and 1 standard deviation?? Sometimes because of the few representation values (i.e. 4096 in this case), x fits really poorly the expected Gaussian, so that I wanna manipulate x (linearly or not, it does not really matter at this stage) in order to get a better fitness. > > > > > > > > > > I'd like remarking that I have access to the statistical toolbox. > > > > > > > > > > I thank you all in advance. > > > > > > > > Increase the number of sampling points (4096 in your example) > > > > or > > > > try another random number generator for a normally distributed random variable. > > > > > > > > Best wishes > > > > Torsten. > > > > > > It's quite a simplistic method. > > > > > > Unfortunately, I cannot magnify the number of representations because of some reasons I will not explain here in detail (theory beyond the code I am writing). 
Besides, what else random generator may I use? > > > > > > I do believe that is a way to "force" data better fitting the expected normal distribution. > > > > > > > I'm not an expert in this area, but in my opinion, every deterministic attempt to manipulate the data after their generation will weaken their randomness. > > A random number generator always makes a compromise between performance > > and quality. If speed is not important for your application, there should be random number generators with higher quality than randn. Make a GOOGLE search. > > > > > Regards, > > > Francesco > > > > Best wishes > > Torsten. > > Of course, if randomness of the numbers chosen does not matter, > you can proceed as follows: > > 1. Choose an equidistant grid on [0:1] (e.g. p=[1/4 1/2 3/4]). > 2. Calculate X=norminv(p,0,1) > 3. Between X(i) and X(i+1), place 4096/(n1) equidistant points where n is the length of the vector p (in this case n=3). > 4. The collection of all these points will approximately follow a standardnormal distribution. > > Best wishes > Torsten.
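
For reference, the four quoted steps could be sketched roughly as below. This is only my reading of them: step 3 is taken to mean 4096/(n-1) points per interval, and the grid length nGrid = 65 is an assumed value (finer than the p=[1/4 1/2 3/4] example) chosen so that the points divide evenly. The names N, nGrid, m and seg are illustrative and do not come from the thread.

```matlab
% Sketch of the quoted steps; N and nGrid are illustrative choices.
N     = 4096;                       % number of points wanted
nGrid = 65;                         % assumed length n of the probability grid p
p = (1:nGrid) / (nGrid + 1);        % step 1: equidistant grid strictly inside (0,1)
X = norminv(p, 0, 1);               % step 2: standard-normal quantiles (Statistics Toolbox)
m = N / (nGrid - 1);                % step 3: points per interval, here 4096/64 = 64
x = zeros(N, 1);
for i = 1:nGrid-1
    seg = linspace(X(i), X(i+1), m + 1);   % m+1 equidistant points, both ends included
    x((i-1)*m + (1:m)) = seg(1:m);         % keep the first m so no point is repeated
end
[muHat, sigmaHat] = normfit(x);     % step 4: compare with mu = 0, sigma = 1
```

Because the grid stops at p(1) and p(end), the tails beyond X(1) and X(end) are cut off, so sigmaHat comes out somewhat below 1. A finer grid pushes the cut-off further out, and the points are of course no longer random in any sense.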
I would rather go for a least-squares fit, but I don't really have a clue how to set it up in MATLAB.
In general, I would first generate a reference sample that is large enough to follow the expected normal distribution quite reliably:
x_ref = normrnd(0,1,[400000 1]);
Then comes the actual data I have:
x_act = normrnd(0,1,[4000 1]);
Once these two vectors are generated, I would call lsqcurvefit to minimize the error between the reference and actual values. But I am stuck on how to implement it correctly.
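
One possible way to set this up, as a sketch rather than a definitive answer: the two vectors differ in length and are unordered, so they cannot be compared element by element. The sketch below compares quantiles instead, i.e. the sorted x_act against the quantiles of x_ref at the same probabilities, and lets lsqcurvefit fit a shift and a scale. The linear model and the names pk, model, c and x_adj are assumptions made for illustration. lsqcurvefit needs the Optimization Toolbox, and quantile needs the Statistics Toolbox.

```matlab
% Sketch only: a linear map fitted by least squares on sorted values.
x_ref = normrnd(0, 1, [400000 1]);            % large reference sample
x_act = normrnd(0, 1, [4000 1]);              % the sample to adjust
n  = numel(x_act);
pk = ((1:n)' - 0.5) / n;                      % probabilities strictly inside (0,1)
xdata = sort(x_act);                          % empirical quantiles of x_act
ydata = reshape(quantile(x_ref, pk), n, 1);   % matching quantiles of the reference
model = @(c, xd) c(1) + c(2) * xd;            % assumed model: shift c(1), scale c(2)
c = lsqcurvefit(model, [0 1], xdata, ydata);  % least-squares estimate of c
x_adj = c(1) + c(2) * x_act;                  % adjusted sample, original order kept
[a, b] = normfit(x_adj);                      % a and b should end up close to 0 and 1
```

A linear map like this can only correct the mean and the spread of x_act, not the shape of its histogram. For this particular model, polyfit(xdata, ydata, 1) returns the same two coefficients (in the order slope, intercept) without the Optimization Toolbox.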

