phuong
Posts: 20
Registered: 8/23/13


Re: Problem with 1-step ahead prediction in neural network
Posted: Oct 21, 2013 9:09 AM

"Greg Heath" <heath@alumni.brown.edu> wrote in message <l41e1f$p3v$1@newscl01ah.mathworks.com>...
> "phuong" wrote in message <l3vcom$ecd$1@newscl01ah.mathworks.com>...
> > "Greg Heath" <heath@alumni.brown.edu> wrote in message <l3v4ci$r88$1@newscl01ah.mathworks.com>...
> > > "phuong" wrote in message <l3uis0$mut$1@newscl01ah.mathworks.com>...
> > > > "Greg Heath" <heath@alumni.brown.edu> wrote in message <l3t37j$bkp$1@newscl01ah.mathworks.com>...
> > > > > "phuong" wrote in message <l3s2ha$4qv$1@newscl01ah.mathworks.com>...
> > > > > > Hi everybody,
> > > > > > I am having trouble with 1-step-ahead prediction in a neural network.
> > > > > > When I train the network with fixed parameters, I get different weights (IW, LW, b) each time.
> > > > > > I know the reason is the random initial weights. But why can we believe the 1-step prediction if it changes with every training run? Maybe the network has not converged, because when it converges there should be only one solution (or one approximate solution). So has the network converged?
> > > > > > As a result, testing 100 new predictions from the neural network gives many different results, and sometimes the differences are very large.
> > > > > > Please help me fix these problems.
> > > > > > Thank you very much.
> > > > > > Phuong
> > > > >
> > > > > The only problem is your assumption that there is only one solution. For any I-H-O network configuration with tansig hidden nodes there are (2^H)*H! - 1 other nets that are equivalent. For the default value of H=10, there are (2^10)*factorial(10) = 3,715,891,200
> > > > > equivalent nets.
> > > > > 1. There are H! equivalent nets that only differ by the way they are ordered.
> > > > > 2. Since tansig is an odd function, for each of those orderings there are two equivalent
> > > > > nets that only differ by the polarity of the weights connected to one of the H hidden nodes.
> > > > >
> > > > > To make things worse, there can be local minima that are not global minima. The corresponding solutions range from excellent to very poor. Finally, there are other reasons
> > > > > (e.g., maximum mu in trainlm) that minimization searches fail.
> > > > >
> > > > > That is why I now use Ntrials = max(10,30/Ntst) random weight initializations for each candidate value of H.
> > > > >
> > > > > Hope this helps.
> > > > >
> > > > > Greg
> > > > Sorry, I don't understand your approach. As I understand it, you train the network Ntrials times, OK? And what comes next: do you compute the mean of the results, or something else? Please explain in more detail.
> > > > One more thing: I agree that we have H! nets, but I think the weight set is the same and only the order changes, right? If that is right, I think the MSE does not change.
> > >
> > > There are several main points
> > >
> > > 1. No I-H-O net with H > 1 tansig hidden units is unique. Any such I-H-O net will have the same input-output function as, at least, the other (2^H)*H! - 1 equivalent nets.
> > > 2. Therefore, given a set of design data, there is no set of weights that is "the" optimum solution.
> > > 3. Given a trial value for H and a random set of initial weights, there is no guarantee that the subsequent training will optimize the training objective function. Even if H is acceptable, the training may converge to a local non-global minimum resulting in a range of results from excellent to very poor. In addition, the training may be aborted because the maximum mu limit or maximum epoch is reached.
> > > 4. Therefore, to have a high probability of obtaining an acceptable solution, my recommendation is to design Ntrials nets for each candidate value of H.
> > > 5. What you do with the Ntrials*numH resulting designs depends on your personal
> > > goal.
> > > a. The acceptable solution (e.g., R2a >= 0.99) with the smallest H?
> > > b. The 10 best solutions to combine in an ensemble or committee net?
> > > c. Statistical characterization (e.g., min, med, mean, stdv, max) of performance
> > > estimates on unseen data?
> > >
> > > Bottom line: There is absolutely no reason why you should expect acceptable designs to
> > > have similar final weight distributions.
> > >
> > > HTH
> > >
> > > Greg
> > After Ntrials, we have an acceptable H (e.g., one with R2a > 0.99). How do I know whether the 1-step prediction is right or wrong? (It may be a local minimum, not the global one, for any random initial weights.)
> > If there is only one set of weights (with just the order changed), I think the MSE of the network should not change. But in practice I see a different MSE value every time I retrain. With small data (e.g., in the range [-1,1]) the difference is small, but with large data (e.g., greater than 100) the difference is very large.
> > If I use fixed initial weights for the network like this:
> > rng(0);
> > IW = 0.01*randn(hidden,delay);
> > b1 = 0.01*randn(hidden,1);
> > LW = 0.01*randn(1,hidden);
> > b2 = 0.01*randn(1,1);
> > net = configure(net,inputs,targets);
> > net.IW{1,1} = IW;
> > net.b{1,1} = b1;
> > net.LW{2,1} = LW;
> > net.b{2,1} = b2;
> > Is it OK for my network? Maybe it reaches a local minimum. Can we use some method to avoid that?
>
> Avoid what? I really do not understand your problem.
>
> I have posted many design examples over the years. What problem do you have with
> the approach? What do you want that is not available via the Ntrials*numH tabulation?
>
> Very puzzled,
>
> Greg

Sorry if I made you uncomfortable. I'm just trying to understand your answer and use it in my problem. I really do not understand and need your help.

First of all, I would like to present my understanding of your answer. If it is not correct, please explain it to me again.
1. There is only one weight set for each neural network; changing the order of the weights just gives a different network.
2. You use Ntrials to find an acceptable H.

If my understanding is right, I wonder about the following:
1. With only one weight set, I think there is only one MSE for the network in training. It should not even change when predicting 1 step ahead over a fixed time range (applying the weights to new inputs and calculating the MSE again).
2. As above, you have Ntrials results for R2a. So which one is the best, and which initial weights should we use?

My goal is to find a network with a stable (minimized) MSE that follows the correct trend (up and down, corresponding to the real data).

Forgive me if I am bothering you. Please help me.
Thank you so much.
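
To make the equivalence count quoted above concrete, here is a minimal sketch in plain Python (not MATLAB, and not anyone's actual design code). It assumes a hypothetical 1-input, H-hidden, 1-output net with tanh hidden units standing in for tansig; the helper names `forward` and `permute_and_flip` are made up for illustration. It checks that reordering the hidden nodes and flipping the sign of all weights attached to some of them leaves the input-output function unchanged, which is why such reordered or sign-flipped nets all have the same MSE, while two separately trained nets need not be related this way.

```python
import math
import random

def forward(x, IW, b1, LW, b2):
    """Output of a 1-input, H-hidden (tanh), 1-output net for a scalar x."""
    hidden = [math.tanh(w * x + b) for w, b in zip(IW, b1)]
    return sum(v * h for v, h in zip(LW, hidden)) + b2

def permute_and_flip(IW, b1, LW, order, signs):
    """Reorder hidden units by `order`; multiply each unit's weights by +1 or -1."""
    IW2 = [signs[k] * IW[j] for k, j in enumerate(order)]
    b12 = [signs[k] * b1[j] for k, j in enumerate(order)]
    LW2 = [signs[k] * LW[j] for k, j in enumerate(order)]
    return IW2, b12, LW2

rng = random.Random(0)  # fixed seed so the illustration is repeatable
H = 4                   # small hypothetical hidden-layer size
IW = [rng.gauss(0, 1) for _ in range(H)]
b1 = [rng.gauss(0, 1) for _ in range(H)]
LW = [rng.gauss(0, 1) for _ in range(H)]
b2 = rng.gauss(0, 1)

# One of the (2^H)*H! equivalent parameterizations: shuffle nodes, flip two of them.
order = [2, 0, 3, 1]
signs = [-1, 1, -1, 1]
IW2, b12, LW2 = permute_and_flip(IW, b1, LW, order, signs)

# Compare both nets on a grid of inputs; tanh is odd, so the outputs agree
# up to floating-point rounding from the changed summation order.
xs = [i / 10 for i in range(-20, 21)]
max_diff = max(abs(forward(x, IW, b1, LW, b2) - forward(x, IW2, b12, LW2, b2))
               for x in xs)

# Size of the equivalence class for the H = 10 case mentioned in the thread.
n_equivalent = (2 ** 10) * math.factorial(10)
print(max_diff, n_equivalent)
```

The different MSE values seen after retraining therefore cannot be explained by reordering alone; they point to runs ending at genuinely different solutions.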
Phuong
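
For the question of which of the Ntrials results to keep, here is a rough sketch in plain Python of one possible reading of the multi-start idea from the thread: train Ntrials nets from different seeds, tabulate the error on held-out data, and keep the seed of the best net so it can be reproduced. Everything in it is an assumption for illustration: the toy sine series, the two-delay input, H = 3, plain stochastic gradient descent in place of trainlm, and the helper names `make_data`, `predict`, `mse`, and `train`.

```python
import math
import random
import statistics

def make_data(n=80, delays=2):
    """Toy series y[t] = sin(0.3 t); inputs are the previous `delays` values."""
    y = [math.sin(0.3 * t) for t in range(n)]
    X = [y[t - delays:t] for t in range(delays, n)]
    T = [y[t] for t in range(delays, n)]
    return X, T

def predict(net, x):
    """Return (output, hidden activations) for one input vector x."""
    IW, b1, LW, b2 = net
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(IW, b1)]
    return sum(v * hj for v, hj in zip(LW, h)) + b2, h

def mse(net, X, T):
    return sum((predict(net, x)[0] - t) ** 2 for x, t in zip(X, T)) / len(T)

def train(seed, X, T, H=3, epochs=200, lr=0.05):
    """Plain SGD on squared error; the seed fixes the initial weights."""
    rng = random.Random(seed)
    d = len(X[0])
    IW = [[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(H)]
    b1 = [rng.gauss(0, 0.5) for _ in range(H)]
    LW = [rng.gauss(0, 0.5) for _ in range(H)]
    b2 = rng.gauss(0, 0.5)
    for _ in range(epochs):
        for x, t in zip(X, T):
            out, h = predict((IW, b1, LW, b2), x)
            err = out - t
            for j in range(H):
                g = err * LW[j] * (1 - h[j] ** 2)  # backprop through tanh
                LW[j] -= lr * err * h[j]
                b1[j] -= lr * g
                for i in range(d):
                    IW[j][i] -= lr * g * x[i]
            b2 -= lr * err
    return IW, b1, LW, b2

X, T = make_data()
n_trn = 60  # first 60 samples for training, the rest held out
Xtrn, Ttrn, Xval, Tval = X[:n_trn], T[:n_trn], X[n_trn:], T[n_trn:]

Ntrials = 10  # assumed value; the thread suggests max(10, 30/Ntst)
results = []
for seed in range(Ntrials):
    net = train(seed, Xtrn, Ttrn)
    results.append((mse(net, Xval, Tval), seed, net))

val_mses = sorted(r[0] for r in results)
best_mse, best_seed, best_net = min(results, key=lambda r: r[0])
print("held-out MSE min/median/max:",
      val_mses[0], statistics.median(val_mses), val_mses[-1])
print("best seed:", best_seed)
```

Fixing a single seed, as with `rng(0)` in the quoted snippet, makes one run repeatable but says nothing about whether that run is a good one. Recording the seed of the best of several runs gives repeatability and a measure of how much the results vary.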

