Hi everyone, I'm new to this forum, and first of all, I would like to know if there are any rules to follow here, either written or generally known, I don't want to screw up in my first post hehe.
Well now, I'm analyzing the randomness in various pseudo random number generators (From now on, PRNG). I have some knowledge of descriptive statistics, probabilistics and inferential statistics (From Statistics I on college). I also program in C++ and know my way around computers, so I've created a program that analizes some characterstics of the PRNG output, and I compared them to the desired values (the ones between parenthesis). This is what I have so far. Descriptive analysis of 1 byte: - Mean. (127.5) - Standard Deviation. (83.9) - Skewness coefficient. (0) - Mode (Not a significantly defined value). - Number that appeared the least (Not a significantly defined value). - Diference of the two above (0). - Kurtosis. (-1.2) - Entropy (base 2). (8) - Deciles[k = 1 to 10]. (25.5 * k). - Distribution graph. - Distribution of the differences between values (to watch for obvious correlation).
Descriptive analysis of 2 and 3 bytes: - Mean (32767,5 and 8388607,5) - Standard deviation. - Skewness (0 and 0). - Kurtosis (-1.2 and -1.2). - Entropy (8 and 8).
Periodicity analysis. Here I analyze the PRNG output to watch for short periodicity which means that the output resets itself and starts all over again, for example in a LCG the periodcity is defined by the modulus (usually 2^32).
LCG Checking. I implement an algorithm to look for an LCG, they are very insecure for crytographic implementations, since with only three or four values, you can adquire all of the parameters (an lcg is defined by the formula x[n+1] = (x[n]*a+k) mod m).
FIPS-140-2 randomness tests. This is a randomness test that needs a 2500 byte sample, I usually split the main sample in chunks of 2500 bytes and run it on every chunk and then plot it on a graph, with control bars, so I can see where the PRNG fails one of this tests. It consists of four different tests: Monobit Test. Poker Test. Runs Test. Long Run Test.
So anyway, this is what I have so far, but the problem is that none of this tests perform a true analysis on correlation, If I input a simple linear discrete function like this x[n+1] = (x[n] + 1) mod 256, I get a perfect score on every single one of the tests (except on the periodicity analysis, but that can be easily solved). The only way I can detect this is with the differences charts, but if the function is non linear or has some scrambling process, it won't detect it and it still is insecure for cryptographic implementations.
So, in conclusion I need this two things:
1) How to analize the correlation between consecutive and non consecutive values (I was thinking a regression analysis but I'm open to new Ideas).
2) How to implement a control limit for each of the descriptive tests, for example, the mean on a PRNG will never be perfectly 127.5, but I need to know when a deviation from the perfect value is acceptable and when it's not.
Well guys, thanks a lot! Hugs, APOKLIPTICO.
PS: I'm sure some of you might be a bit of a grammar nazi (like me on my mother tongue) but I want you guys to know that I'm from Argentina and as much as I would like my english to be perfect, some times it isn't. Thanks for understanding.