Topic: Problem with 1-step ahead prediction in neural network
Replies: 9   Last Post: Oct 23, 2013 6:23 AM

 Greg Heath Posts: 6,387 Registered: 12/7/04
Re: Problem with 1-step ahead prediction in neural network
Posted: Oct 20, 2013 4:22 PM

"phuong" wrote in message <l3vcom$ecd$1@newscl01ah.mathworks.com>...
> "Greg Heath" <heath@alumni.brown.edu> wrote in message <l3v4ci$r88$1@newscl01ah.mathworks.com>...
> > "phuong" wrote in message <l3uis0$mut$1@newscl01ah.mathworks.com>...
> > > "Greg Heath" <heath@alumni.brown.edu> wrote in message <l3t37j$bkp$1@newscl01ah.mathworks.com>...
> > > > "phuong" wrote in message <l3s2ha$4qv$1@newscl01ah.mathworks.com>...
> > > > > Hi everybody,
> > > > > I am having trouble with 1-step-ahead prediction using a neural network.
> > > > > When I train the network with fixed parameters, I get different weights (IW, LW, b) each time.
> > > > > I know the reason is the random initial weights. But why can we trust the 1-step prediction if it changes with every training run? Maybe the network has not converged, because if it had converged we would have only one solution (or an approximate solution). So has the network converged?
> > > > > As a result, the network's predictions for 100 new test inputs vary from run to run, and sometimes the differences are very large.
> > > > > Thank you very much.
> > > > > Phuong

> > > >
> > > > The only problem is your assumption that there is only one solution. For any I-H-O network configuration with tansig hidden nodes there are (2^H)*H!-1 other nets that are equivalent. For the default value of H=10, there are (2^10)*factorial(10) = 3,715,891,200 equivalent nets.
> > > > 1. There are H! equivalent nets that only differ by the way the hidden nodes are ordered.
> > > > 2. Since tansig is an odd function, the polarity of all weights connected to any one hidden node can be reversed without changing the output. With H hidden nodes, that gives 2^H sign patterns for each of those orderings.
> > > >
> > > > To make things worse, there can be local minima that are not global minima. The corresponding solutions range from excellent to very poor. Finally, there are other reasons (e.g., reaching the maximum mu in trainlm) why minimization searches fail.
> > > >
> > > > That is why I now use Ntrials = max(10,30/Ntst) random weight initializations for each candidate value of H.
> > > >
> > > > Hope this helps.
> > > >
> > > > Greg
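
The equivalence described in that reply can be checked numerically. The following is a minimal sketch, not code from the thread: the layer sizes, the random weights, and all variable names are illustrative assumptions, and plain tanh stands in for tansig (the two are mathematically the same function). It permutes the hidden nodes, flips the sign of a random subset of them, and compares the outputs of the original and transformed networks.

```matlab
% Sketch: permuting hidden nodes and flipping their polarity leaves the
% input-output map of an I-H-O tanh network unchanged.
rng(0);                          % fixed seed so the run is repeatable
I = 3; H = 4; O = 1; N = 5;      % illustrative sizes (assumptions)

IW = randn(H, I);  b1 = randn(H, 1);   % input-to-hidden weights and biases
LW = randn(O, H);  b2 = randn(O, 1);   % hidden-to-output weights and bias
x  = randn(I, N);                      % N random input column vectors

% Forward pass of a single-hidden-layer net with a linear output layer
forward = @(IW, b1, LW, b2) ...
    LW * tanh(IW * x + repmat(b1, 1, N)) + repmat(b2, 1, N);
y0 = forward(IW, b1, LW, b2);

p = randperm(H);                 % one of the H! orderings
s = 2 * (rand(H, 1) > 0.5) - 1;  % one of the 2^H sign patterns (+1 or -1)

IW2 = diag(s) * IW(p, :);        % reorder rows, then flip selected rows
b12 = s .* b1(p);                % same reordering and flips for the biases
LW2 = LW(:, p) * diag(s);        % matching reordering and flips on the output side
y1  = forward(IW2, b12, LW2, b2);

maxDiff       = max(abs(y0(:) - y1(:)));   % should be at round-off level
numEquivalent = 2^H * factorial(H);        % size of the equivalence class
fprintf('max output difference: %g\n', maxDiff);
fprintf('equivalent weight sets for H = %d: %d\n', H, numEquivalent);
```

The two weight sets look completely different when printed, yet they produce the same outputs, which is why comparing IW, LW, and b across training runs says nothing about whether the runs found equally good solutions.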

> > > Sorry, I don't understand your approach. As I understand it, you train the network Ntrials times, right? And what comes next: do you compute the mean of the results, or something else? Please explain in more detail.
> > > One more thing: I agree that there are H! nets, but I think the weight set is the same and only the order changes, right? If that is correct, I think the MSE should not change.

> >
> > There are several main points
> >
> > 1. No I-H-O net with H > 1 tansig hidden units is unique. Any such I-H-O net will have the same input-output function as, at least, the other (2^H)*H!-1 equivalent nets.
> > 2. Therefore, given a set of design data, there is no set of weights that is "the" optimum solution.
> > 3. Given a trial value for H and a random set of initial weights, there is no guarantee that the subsequent training will optimize the training objective function. Even if H is acceptable, the training may converge to a local non-global minimum resulting in a range of results from excellent to very poor. In addition, the training may be aborted because the maximum mu limit or maximum epoch is reached.
> > 4. Therefore, to have a high probability of obtaining an acceptable solution, my recommendation is to design Ntrials nets for each candidate value of H.
> > 5. What you do with the Ntrials*numH resulting designs depends on your personal
> > goal.
> > a. The acceptable solution (e.g., R2a >= 0.99) with the smallest H?
> > b. The 10 best solutions to combine in an ensemble or committee net ?
> > c. Statistical characterization (e.g., min, med, mean, stdv, max ) of performance
> > estimates on unseen data ?
> >
> > Bottom line: There is absolutely no reason why you should expect acceptable designs to
> > have similar final weight distributions.
> >
> > HTH
> >
> > Greg
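
The Ntrials*numH tabulation recommended in that reply might look roughly like the sketch below. It is a hedged illustration, not Greg Heath's posted code: it assumes the Neural Network Toolbox, uses fitnet on the toolbox demo data simplefit_dataset instead of the poster's time-series net, and scores each design with a validation-subset R^2 normalized by the variance of all targets rather than the adjusted training statistic R2a mentioned in the thread. The candidate list Hvec, the seed scheme, and the variable names are assumptions.

```matlab
% Sketch of an Ntrials-by-numH multi-start design (Neural Network Toolbox).
[x, t]  = simplefit_dataset;     % toolbox demo data, a stand-in for real data
Hvec    = 2:2:10;                % candidate hidden-layer sizes (assumption)
Ntrials = 10;                    % random initializations per candidate H
MSE00   = mean(var(t', 1));      % MSE of the constant model y = mean(t)

R2val = zeros(Ntrials, numel(Hvec));
best  = struct('R2', -Inf, 'net', [], 'H', NaN, 'seed', NaN);

for j = 1:numel(Hvec)
    for i = 1:Ntrials
        seed = 1000 * j + i;     % recorded so any design can be reproduced
        rng(seed);               % controls data division and initial weights
        net = fitnet(Hvec(j));   % tansig hidden layer, trainlm by default
        net.trainParam.showWindow = false;   % no training GUI
        [net, tr] = train(net, x, t);

        % Score on the validation subset chosen by the default random division
        err = t(:, tr.valInd) - net(x(:, tr.valInd));
        R2val(i, j) = 1 - mean(err(:).^2) / MSE00;

        if R2val(i, j) > best.R2
            best = struct('R2', R2val(i, j), 'net', net, ...
                          'H', Hvec(j), 'seed', seed);
        end
    end
end

% Rows: H, then min, median, mean, std, max of R2val for each candidate H
stats = [min(R2val); median(R2val); mean(R2val); std(R2val); max(R2val)];
disp([Hvec; stats]);
```

From the resulting table one can pick the smallest H whose trials reach the acceptance threshold, keep the top few nets for an ensemble, or report the spread as a performance estimate. Because each trial's seed is stored, a chosen design can be retrained identically later.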

> After Ntrials runs, we have an acceptable H (e.g., one with R2a > 0.99). How do I know whether the 1-step prediction is right or wrong? (It may come from a local minimum rather than the global one, depending on the random initial weights.)
> If the weight sets were the same with only the order changed, I think the MSE of the network should not change. But in practice I see a different MSE value every time I retrain. With small-valued data (e.g., in the range [-1,1]) the difference is small, but with large-valued data (e.g., greater than 100) the difference is very large.
> If I use fixed initial weights for the network, like this:
> rng(0);
> IW = 0.01*randn(hidden,delay);
> b1 = 0.01*randn(hidden,1);
> LW = 0.01*randn(1,hidden);
> b2 = 0.01*randn(1,1);
> net = configure(net,inputs,targets);
> net.IW{1,1} = IW;
> net.b{1,1} = b1;
> net.LW{2,1} = LW;
> net.b{2,1} = b2;
> Is this OK for my network? It may reach a local minimum. Are there methods to avoid that?

Avoid what? I really do not understand your problem.

I have posted many design examples over the years. What problem do you have with
the approach? What do you want that is not available via the Ntrials*numH tabulation?

Very puzzled,

Greg
