Let me encourage you to ask me directly when I post something confusing.
> Is there a good documentation on JIT? Will there be?

No, unfortunately there is no documentation, and TMW has stated repeatedly that this is intended. They do not want users to adjust their code to a certain JIT version, because the JIT is subject to change. For users like me, who do not install every new release for reasons of reliability and stability, a version-dependent adjustment would not be a drawback, so I'm rather disappointed about the missing documentation.

The optimizer of the MSVC compilers is not documented exhaustively either, and it is subject to change, too. Sometimes `for (i=0; i<n; i++) x[i]=...` is faster than `for (i=0; i<n; i++) *x++=...`, but sometimes it is the other way around. Sometimes computations with doubles are performed in the floating point unit (with correct handling of `x<NaN`), sometimes in the SSE units (with wrong handling of comparisons with NaN, although the IEEE 754 standard defines them unequivocally). But in spite of this incomplete documentation and changing optimizer, it would be absurd not to use all available information about the optimizer to write faster code.
The expression "x(z,i,t)" locates the required element at: X + (z + size(x, 1) * i + size(x, 1) * size(x, 2) * t) (I do not care about 1- or 0-based indexing here). These are 3 multiplications, not two as I claimed in the linked discussion. Calculating "i+z+t+2" on the fly instead would still be much cheaper than creating a large array filled with such redundant values.
Bruno has already explained important facts about the JIT:

1. The JIT operates on functions, but not (or not completely) on scripts and in the command line.
2. The JIT needs the freedom to reorder the code on demand. Therefore it cannot work when several commands appear in one line. Debugging and profiling also require line-by-line access, so the JIT is disabled there. (This is really crude: e.g. finding JIT bugs with the debugger is not possible, and time measurements with the profiler are meaningless if the powerful acceleration tool is disabled.)
3. One method of the JIT is to use variables "directly": Matlab stores the pointer to the data and does not have to look it up in the symbol table again in each iteration. This fails if the type of a variable is changed inside a function, or if variables are created or modified dynamically by EVAL, ASSIGNIN and similar "evil" methods. I guess this is why using "sum" as a variable in the command line decreases the speed, while inside a function the JIT knows that the symbol "sum" means the variable. (Anyhow, avoiding the overloading of built-in functions is good programming practice.)
But to come back to your original question: I do not understand why accumulating "sum+thing" inside a FOR loop is so much slower in Matlab. I will run some experiments later, but I think the cause is that this example simply does not match Matlab's strengths. It's like riding a mountain bike in a street race, dancing in sport shoes, or starting a computer only to calculate 17*4 in Excel. Another example would be writing platform-independent C++ code to simulate the Matlab command "plot(1:10)" - under Win3.1/95/98/2k/NT/ME/XP/Vista/7 and 8, as well as Linux, MacOS on PPC and Intel, Solaris Sparc, ...