
NEURAL NET DATA DIVISION BUGS
Posted: Mar 1, 2013 1:31 AM


I tried to design a timeseries neural net with no validation set, using a straightforward 80/0/20 percent train/val/test data division. For a timeseries data set of size 100, the corresponding indices should be
[ trainInd, valInd, testInd] = [ 1:80 , [] , 81:100 ]
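For reference, the intended split can be computed by hand in base MATLAB, with no toolbox call involved. This is only a sketch of what a sequential (block) division should return; the variable names N, ratios, nTrain, nVal and nTest are illustrative and are not toolbox identifiers.

```matlab
% Sketch: sequential (block) train/val/test split computed by hand.
% N, ratios, nTrain, nVal, nTest are illustrative names, not toolbox names.
N = 100;                          % number of timesteps
ratios = [0.8 0.0 0.2];           % train/val/test fractions
ratios = ratios / sum(ratios);    % normalize in case they do not sum to 1

nTrain = round(ratios(1) * N);
nVal   = round(ratios(2) * N);
nTest  = N - nTrain - nVal;       % remainder goes to the test set

trainInd = 1:nTrain;                    % first block
valInd   = nTrain + (1:nVal);           % empty when nVal is 0
testInd  = nTrain + nVal + (1:nTest);   % last block
```

With the 80/0/20 ratios above, valInd comes out empty and the train and test blocks are contiguous.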
I started with the most obvious command
[ trainInd, valInd, testInd] = divideblock(100, 0.8, 0.0, 0.2);
which bombed with an error. I then tried the following variations. The resulting train/val/test splits are given in the comments:
[trainInd,valInd,testInd] = divideblock(100) % 70/15/15
[trainInd,valInd,testInd] = divideblock(100, 0.8) % 74/13/13
[trainInd,valInd,testInd] = divideblock(100, 0.8, 0.0) % ERROR
[trainInd,valInd,testInd] = divideblock(100, 0.8, 0.0, 0.2) % ERROR
[trainInd,valInd,testInd] = divideblock(100, 0.8, [], 0.2) % ERROR
Curious, I tried the same inputs on the other divide functions, with these results:
1. 'divideind' never worked
2. 'dividetrain' cannot produce val or test indices
3. 'dividerand' does not produce sequential indices
4. 'divideint' produced the following results
[trainInd,valInd,testInd] = divideint(100) % 70/15/15
[trainInd,valInd,testInd] = divideint(100, 0.8) % 72/14/14
[trainInd,valInd,testInd] = divideint(100, 0.8, 0.0) % 87/0/13
[trainInd,valInd,testInd] = divideint(100, 0.8, 0.0, 0.2) % 87/0/13
[trainInd,valInd,testInd] = divideint(100, 0.8, [], 0.2) % ERROR
However, none of these resulted in sequential indices.
It is obvious that these functions need to be debugged.
Meanwhile, what is the best way to impose the 80/0/20 sequential index split?
My guess at a quick fudge is to use 'divideblock' with a 79/1/20 or 80/1/19 split and keep the one-point validation set from ever stopping training by setting
net.trainParam.max_fail = net.trainParam.epochs % or just a large number
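An alternative to the fudge that may be worth trying is to hand the indices to the net directly. The sketch below assumes the Neural Network Toolbox is installed and that 'divideind' takes explicit index vectors (not ratios) through net.divideParam, as its documentation describes. It has not been checked against the release that produced the errors above, and timedelaynet(1:2, 10) is only a placeholder network.

```matlab
% Sketch only: assumes the Neural Network Toolbox and that 'divideind'
% reads explicit index vectors from net.divideParam.
net = timedelaynet(1:2, 10);        % placeholder net; delays and size are arbitrary

net.divideFcn = 'divideind';        % divide by explicit indices, not ratios
net.divideParam.trainInd = 1:80;    % first 80 timesteps for training
net.divideParam.valInd   = [];      % no validation set
net.divideParam.testInd  = 81:100;  % last 20 timesteps for testing
```

If this works on a given release, validation stopping has no validation data to act on, so the max_fail adjustment should not be needed.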
Hope this helps.
Greg

