Drexel dragonThe Math ForumDonate to the Math Forum



Search All of the Math Forum:

Views expressed in these public forums are not endorsed by Drexel University or The Math Forum.


Math Forum » Discussions » Software » comp.soft-sys.matlab

Topic: Replacing gaps with NaN
Replies: 2   Last Post: Apr 19, 2013 6:31 PM

Advanced Search

Back to Topic List Back to Topic List Jump to Tree View Jump to Tree View   Messages: [ Previous | Next ]
dpb

Posts: 8,168
Registered: 6/7/07
Re: Replacing gaps with NaN
Posted: Apr 19, 2013 2:39 PM
  Click to see the message monospaced in plain text Plain Text   Click to reply to this topic Reply

On 4/19/2013 10:34 AM, Pawel wrote:
> My data looks like this:
...

> Is it for 62 days (2012-12-01 to 2013-01-31) for 24 hours/day and should
> be every 1 hour. But there are always some gaps in time series (ex.
> between 2 and 4hr), so I'd like to replace them by NaN's. My idea was to
> convert dates to single number (datenum) so I probably need to erase all
> '-' and ':' in date column to get this
>
> 12105 2012 12 01 00 00 00.000 2,2 87 1029,9
> 12105 2012 12 01 01 00 00.000 1,3 88 1029,8
>
> and finally create a matrix like below, with NaN's for every hour:
>
> 12105 735204 NaN NaN NaN
> 12105 735204.041666667 NaN NaN NaN
> 12105 735204.083333333 NaN NaN NaN
> 12105 735204.125 NaN NaN NaN
>
> Now my problem. Is it possible to update somehow this matrix with NaN's
> with data from first matrix to get something like this:
>
> ID Date Temperature Humidity Pressure
> 12100 735204 0,3 96 1029,1
> 12100 735204.041666667 0,9 98 1029,3
> 12100 735204.083333333 1,2 91 1029,2
> 12100 735204.125 NaN NaN NaN
> 12100 735204.20833333 2,8 80 1029,6
>
> and so on for every day and every ID? Or maybe you have different and
> better idea. It will be helpful for me because data file has many
> thousands of line and I don't really want to manually find these gaps.


A few ideas...flesh out and simplify as suits...

Don't need to do anything w/ the dates but use the proper formatting in
datenum...I made a file that holds just the first section...

>> type papkin.dat

ID Date Temperature Humidity Pressure
12100 2012-12-01 00:00:00.000 0.3 96 1029.1
12100 2012-12-01 01:00:00.000 0.9 98 1029.3
12100 2012-12-01 02:00:00.000 1.2 91 1029.2
12100 2012-12-01 04:00:00.000 2.8 80 1029.6

Read it in...

>> fid=fopen('papkin.dat','rt');
>> fmt=['%d %s %s' repmat('%f',1,3)];
>> d=textscan(fid,fmt,'headerlines',1,'collectoutput',1);
>> fid=fclose(fid);


Convert to datenum the datestamp strings..
>> dn=datenum([char(d{2}{:,1}) char(d{2}{:,2})], ...
'yyyy-mm-ddHH:MM:SS.FFF');
Make a vector of hourly datenums for insertion purposes...
>> dhr=datenum(2012,12,1,[0:4]',0,0);
And the data array to fill...
>> data=nan(length(dhr),5);
>> data(ismember(dhr,dn),1)=d{1}; %The ID column
>> data(:,2)=dhr; % stuck in datenum; use timestamp instead

And now paste on the floating point array at the end...
>> data(ismember(dhr,dn),3:end)=d{3}(1:4,:)
data =
1.0e+05 *
0.1210 7.3520 0.0000 0.0010 0.0103
0.1210 7.3520 0.0000 0.0010 0.0103
0.1210 7.3520 0.0000 0.0009 0.0103
NaN 7.3520 NaN NaN NaN
0.1210 7.3520 0.0000 0.0008 0.0103
>>

That's in a bunch of steps to give you ideas/practice in manipulating.
It could be done in just a couple of steps; the question I didn't think
too much about is the multiple ID sections.

The only real "trick" in any of the above is the use of ISMEMBER to get
the locations of the right rows that match the dates w/ the logical
addressing vector--

--




Point your RSS reader here for a feed of the latest messages in this topic.

[Privacy Policy] [Terms of Use]

© Drexel University 1994-2014. All Rights Reserved.
The Math Forum is a research and educational enterprise of the Drexel University School of Education.