Topic: forget 'strtok' how about 'strtoks'?
Replies: 3   Last Post: Jun 18, 2013 5:27 PM

 Leo Simon Posts: 10 Registered: 5/23/10
Re: forget 'strtok' how about 'strtoks'?
Posted: Jun 18, 2013 4:11 PM

Art's code worked for me only when I passed a single argument and let the second argument, delim, take its default value. I didn't really understand the structure of the code, but I made it work with an arbitrary delim. For example, strtok_apl(pwd,'/') now returns each element of the path as an element of a cell array.

function toks=strtok_apl(str,delim)
%toks=strtok_apl(str,delim)
%
%to extract all tokens from a string within a human lifespan
%replacement to my STRTOKS which iteratively calls the iterative STRTOK
%
%str: a string to extract tokens from
%delim: an array of delimiters (char or byte), defaults to STRTOK defaults
%
%test cases: s={'',' ','a','a ',' a ','a b','a b c',' a b c '}
% for i=1:length(s),toks=strtok_apl(s{i}),end
%
%A.R. Croucher 8/00 - Johns Hopkins University Applied Physics Laboratory
%Modified by Leo Simon, 6/13, U.C. Berkeley
%check for scalar string
if ~ischar(str)
fprintf(1,'string must be character array\n');
toks={};
return
end
if isempty(str)
toks={};
return
end
strsiz=size(str);
if sum(strsiz>1)>1
fprintf(1,'String argument must be scalar \n');
toks={};
return
end
%if no delimiters are passed, use the default (as STRTOK)
if nargin<2
delim=[9:13 32];
else
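if ischar(delim)
delim = double(delim);
end
if sum(size(delim)>1)>1
fprintf(1,'Delimiter array must be scalar \n');
toks={};
return
end
%prepend tab (9) so there are always at least two delimiters;
%note that tab is then always treated as a delimiter
delim=[9 delim(:)'];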
end
delsiz=size(delim);
%repmat to make str and delim 2D and same size
if strsiz(1)>1,s=str';else;s=str;end
if delsiz(1)>1,d=delim;else;d=delim';end
ns=length(s);
nd=length(d);
sn=double(s);
s=sn(ones(1,nd),:);
d=d(:,ones(ns,1));
%find the non-delimiter characters in s
good=all(s~=d); %1 if a good char, 0 if a delimiter
%need to find the start chars and stop chars of each token
%calc diff(good), +1 transitions are starts, -1 are stops
dif=diff(good);
start=find(dif==1)+1;
stop=find(dif==-1);
%need to set start/end states (is first char a delim or token);
%the first state transition MUST be +1 and the last MUST be -1
%if this is not so, then the string must have begun/ended on a token
if good(1)==1, start=[1 start];end
if good(end)==1 stop=[stop length(good)];end
%extract the tokens
ntoks=length(start);
if ntoks==0
toks={};
else
toks=cell(ntoks,1);
for i=1:length(start),toks{i}=str(start(i):stop(i));end
end
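
The masking and edge-detection steps in the function above can be traced on a short string. The following is only a minimal sketch: the input string is made up, bsxfun stands in for the index-replication trick, and the explicit dimension passed to all is an assumption about why single-delimiter calls failed in the original (with one delimiter the comparison is a row vector, and all without a dimension collapses a row vector to one scalar). The thread does not confirm that diagnosis.

```matlab
% Sketch: trace the mask and edge detection on a made-up input.
str   = ' ab c';                        % hypothetical test string
delim = double(' ');                    % a single delimiter (space)

% Compare every character with every delimiter: nd-by-ns logical matrix.
neq = bsxfun(@ne, double(str), delim(:));

% Pass the dimension explicitly: with one delimiter neq is a row vector,
% and all(neq) would reduce it to a single scalar instead of one per char.
good = all(neq, 1);                     % [0 1 1 0 1]

% +1 steps mark token starts, -1 steps mark token ends.
dif   = diff(double(good));             % [1 0 -1 1]
start = find(dif == 1) + 1;             % [2 5]
stop  = find(dif == -1);                % 3

% Fix up the boundaries when the string begins or ends on a token.
if good(1),   start = [1 start];          end
if good(end), stop  = [stop numel(good)]; end   % stop becomes [3 5]

toks = cell(numel(start), 1);
for k = 1:numel(start)
    toks{k} = str(start(k):stop(k));    % 'ab', then 'c'
end
```

With the dimension given explicitly, the delimiter list would not need padding to two or more entries, so tab would only act as a delimiter when the caller asks for it.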

"Art Croucher" <art.croucher@jhuapl.edu> wrote in message <8v3e93$a49$1@houston.jhuapl.edu>...
> I wrote the attached function to do this. The two design objectives were to
> clean up the code where I had to put loops, and to speed up the process.
> STRTOK is not exactly vectorized.
>
> It runs about 8x faster than STRTOK in a loop. If anybody can vectorize the
> assignment at the end, I'd appreciate it.
>
> ---------------------------------------------------------------------------------
>
> function toks=strtok_apl(str,delim)
> %toks=strtok_apl(str,delim)
> %
> %to extract all tokens from a string within a human lifespan
> %replacement to my STRTOKS which iteratively calls the iterative STRTOK
> %
> %str: a string to extract tokens from
> %delim: an array of delimiters (char or byte), defaults to STRTOK defaults
> %
> %test cases: s={'',' ','a','a ',' a ','a b','a b c',' a b c '}
> % for i=1:length(s),toks=strtok_apl(s{i}),end
> %
> %A.R. Croucher 8/00 - Johns Hopkins University Applied Physics Laboratory
>
> %check for scalar string
>
> if ~ischar(str);
> fprintf(1,'string must be character array\n');
> end
>
> if isempty(str)
> toks={};
> return
> end
>
> strsiz=size(str);
> if sum(strsiz>1)>1
> fprintf(1,'String argument must be scalar \n');
> return
> end
>
> %if no delimiters are passed, use the default (as STRTOK)
>
> if nargin<2
> delim=[9:13 32];
> else
> if sum(size(delim)>1)>1
> fprintf(1,'Delimiter array must be scalar \n');
> return
> end
> end
> delsiz=size(delim);
>
> %repmat to make str and delim 2D and same size
>
> if strsiz(1)>1,s=str';else;s=str;end
> if delsiz(1)>1,d=delim;else;d=delim';end
> ns=length(s);
> nd=length(d);
>
> %Tony's trick instead of repmat(s,nd,1)...
> sn=double(s);
> s=sn(ones(1,nd),:);
> d=d(:,ones(ns,1));
>
> %find the non-delimiter characters in s
>
> good=all(s~=d); %1 if a good char, 0 if a delimiter
>
> %need to find the start chars and stop chars of each token
> %calc diff(good), +1 transitions are starts, -1 are stops
>
> dif=diff(good);
> start=find(dif==1)+1;
> stop=find(dif==-1);
>
> %need to set start/end states (is first char a delim or token);
> %the first state transition MUST be +1 and the last MUST be -1
> %if this is not so, then the string must have begun/ended on a token
>
> if good(1)==1, start=[1 start];end
> if good(end)==1 stop=[stop length(good)];end
>
> %extract the tokens
>
> ntoks=length(start);
> if ntoks==0
> toks={};
> else
> toks=cell(ntoks,1);
> for i=1:length(start),toks{i}=str(start(i):stop(i));end
> end
>
> return
>
>
> <wchall01@my-deja.com> wrote in message news:8v1os4$79v$1@nnrp1.deja.com...

> >
> >
> > any body know a faster way to get all the string tokens in a string
> > other than using strtok in a while loop? currently i use:
> >
> > <--- code --->
> >
> > function [tok] = strtoks(instr)
> > idx = 1;
> > while( ~isempty(strtok(instr)) )
> > [tok(idx).token,instr] = strtok(instr,' ');
> > idx = idx + 1;
> > end;
> >
> > <--- end code --->
> >
> > also is there a more 'universal' or 'standard' way to return the
> > multiple tokens instead of the tok.token structure? are cell arrays
> > multi-dim string arrays more commonly used or typical?
> >
> >
> > Sent via Deja.com http://www.deja.com/
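
On the original question of how to return a variable number of tokens: built-in functions such as regexp hand them back as a cell array of strings, which is also what strtok_apl does. The following is a minimal sketch of that alternative, not what either function in this thread does; the path is a made-up example.

```matlab
% Sketch: collect tokens with regexp instead of the mask approach.
str  = '/home/user/data';               % hypothetical path
toks = regexp(str, '[^/]+', 'match');   % 1-by-3 cell: 'home', 'user', 'data'
toks = toks(:);                         % column cell, the shape strtok_apl returns
```

The pattern matches runs of characters that are not '/', so leading, trailing, and repeated delimiters produce no empty tokens.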