I'm doing some work with EEGLAB where I would like to implement a wavelet transform on CUDA. One of our concerns is memory allocation. The input and output of the function are large, so large in fact that we would like to avoid creating a new output matrix for each call. Instead, we would like to preallocate the output array once and reuse it on subsequent calls. I have been looking around and have been unable to find any documentation on how this can be done, or on why it can't.
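
For concreteness, the calling pattern we have in mind looks roughly like the sketch below. It is plain host-side C++ rather than CUDA or MATLAB, the single-level Haar step is only a stand-in for the real wavelet transform, and `haar_step_into` and `process_all` are hypothetical names for illustration, not EEGLAB or CUDA functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the GPU wavelet kernel: one level of an
// unnormalized Haar transform written into a caller-supplied buffer.
// The function never allocates; `out` must already hold in.size() elements.
void haar_step_into(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() % 2 == 0 && out.size() == in.size());
    const std::size_t half = in.size() / 2;
    for (std::size_t i = 0; i < half; ++i) {
        out[i]        = 0.5f * (in[2 * i] + in[2 * i + 1]);  // pairwise averages
        out[half + i] = 0.5f * (in[2 * i] - in[2 * i + 1]);  // pairwise details
    }
}

// Allocates the output buffer once, then reuses it for every call.
// Assumes all epochs have the same (even) length.
// Returns the buffer as left by the last call.
std::vector<float> process_all(const std::vector<std::vector<float>>& epochs) {
    std::vector<float> out(epochs.empty() ? 0 : epochs.front().size());  // single allocation
    for (const auto& epoch : epochs) {
        haar_step_into(epoch, out);  // overwrites the previous result in place
        // ... consume `out` here before the next iteration ...
    }
    return out;
}
```

The point is only the ownership model: the caller owns the output storage and the transform writes into it, so no new output array is created per call. Whether the same model can be expressed across the MATLAB/CUDA boundary is exactly what we are asking.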