"David Young" wrote in message <email@example.com>...
> Timing shows that this
>
> fid = fopen('bigfile', 'r', 'b'); % reading in big-endian mode
> d1 = fread(fid, inf, '*uint32');
> fclose(fid);
>
> is much slower than this
>
> fid = fopen('bigfile', 'r', 'l'); % reading in little-endian mode
> d2 = swapbytes(fread(fid, inf, '*uint32')); % then swap afterwards
> fclose(fid);
>
> typically by a factor of 2 for a 0.5 GB file (taking minimum times over a number of trials). The effect seems to be even bigger for bigger files. The results d1 and d2 are, of course, identical. I'm using R2013b under 64-bit Windows.
>
> In practical terms, there's no problem: I just use the second method. (That's why I'm writing here, rather than on Answers.) But I'm puzzled as to how something as basic as fread can end up being less efficient than a call to another function, with the overhead of the intermediate array that that entails.
>
> I wondered if anyone can shed any light on this? If anyone from The MathWorks is watching, could you comment? Is it worth putting in a bug report?
I believe this is not a bug but an expected outcome: you are most probably running on a little-endian platform. In that case, running the vectorized swapbytes() function on the result is faster than asking fread() for the non-default endianness, which in all likelihood performs the same byte swap internally, element by element, as it reads. fread is a built-in function implemented in C, so swapping bytes inside its main read loop is still fast. But even so, a multi-threaded, vectorized swapbytes() applied just once to the returned data, en bloc, is more efficient than the single-threaded swap loop inside fread().
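To make the equivalence of the two routes concrete, here is a small Python sketch using only the standard-library struct module (Python rather than MATLAB so it runs without a real file on disk; the sample values and the helper name swapbytes32 are my own, not from the thread). It shows that interpreting bytes as big-endian directly, and interpreting them as little-endian and then swapping each value's bytes afterwards, yield identical results, just like d1 and d2 above:

```python
import struct

# Hypothetical file contents: four uint32 values stored big-endian.
raw = struct.pack('>4I', 1, 2, 0x12345678, 0xFFFFFFFF)

# Route 1 (analogous to fopen(..., 'b')): read as big-endian directly.
d1 = list(struct.unpack('>4I', raw))

# Route 2 (analogous to fopen(..., 'l') followed by swapbytes):
# read as little-endian, then swap each value's bytes afterwards.
little = struct.unpack('<4I', raw)

def swapbytes32(x):
    """Reverse the byte order of one 32-bit unsigned integer."""
    return struct.unpack('<I', struct.pack('>I', x))[0]

d2 = [swapbytes32(x) for x in little]

assert d1 == d2  # the two routes agree, as in the original post
print(d1)        # [1, 2, 305419896, 4294967295]
```

The per-element swapbytes32 loop here mimics what fread presumably does internally; MATLAB's swapbytes instead does the whole swap in one vectorized pass over the array, which is where the speed difference comes from.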