Search All of the Math Forum:

Views expressed in these public forums are not endorsed by NCTM or The Math Forum.

Notice: We are no longer accepting new posts, but the forums will continue to be readable.

Topic: porting Python fn to MATLAB
Replies: 1   Last Post: Jan 22, 2013 8:02 AM

 Messages: [ Previous | Next ]
 kj Posts: 197 Registered: 8/17/09
porting Python fn to MATLAB
Posted: Jan 22, 2013 7:16 AM

I have a Python function, called "collect", that basically groups
records according to some user-specified criterion, computes some
user-specified aggregate function of each of these groups, and returns
the sequence of these aggregate values.

I would like to port this "collect" function to "idiomatic" MATLAB.

An important aspect of "collect" is that it operates on sequences
of *heterogeneous* tuples, aka "records", containing both string
and numeric fields. (The number of fields and their datatypes are
constant across all records.) This means that the MATLAB version
of "collect" needs to operate either on arrays of <1xN cell>s or
on an <MxN cell>. (I'm agnostic on which of these two alternatives
would be preferable, and open to suggestions.)

In slightly more detail, the Python version of "collect" takes as
arguments a sequence of records, and two functions, called "key"
and "aggregate", respectively. The "key" function, as the name
suggests, computes a key value for each record. The records are
then grouped into the "equivalence classes" induced by this "key"
function. (IOW, two records r and r' are in the same equivalence
class if and only if key(r) == key(r').)

(By far, the most common type of "key" function is one that extracts
from the record the value of some specific *string* field, possibly
normalizing it in some way, e.g. converting it to upper case,
removing leading and trailing white space, etc.)

Once all the equivalence classes are populated, the value that gets
returned by "collect" is the sequence obtained by applying the
"aggregate" function to each of these equivalence classes.

One final detail is that the order in which records appear in the
input determines both the ordering of the records in each individual
equivalence class as well as the ordering of the set of all the
equivalence classes. This ordering, in turn, is preserved in the
value returned by the "collect" function. (The "aggregate" function,
also, may conceivably make use of the ordering of the records in each
equivalence class, although in practice I have not run into this
use-case yet.)

The question is how to implement something like "collect" in MATLAB.

My Python implementation (see below) uses a hash to implement the
equivalence classes. I know that MATLAB also has support for hash
tables, but I don't know if, in the case of MATLAB, a hash table is
the best approach.

(NB: Python's syntax and standard data structures happen to be
particularly well optimized for this sort of function. The MATLAB
version of this function need not be equally succinct, and that's
perfectly OK.)

Any suggestions would be much appreciated.

--------------------------------------------------------------------

(Unless the description above was not sufficiently clear, there's no
need to read the rest of this post.)

PS: FWIW, here's the implementation of the function in Python

import collections as co

def collect(records, key, aggregate):

coll = co.OrderedDict()
for rec in records:
coll.setdefault(key(rec), []).append(rec)

return tuple(aggregate(accum) for accum in coll.values())

There are only a couple of non-obvious lines in this code. One of
them is the line

coll.setdefault(key(rec), []).append(rec)

Here, the possibly obscure bit is the subexpression

coll.setdefault(key(rec), [])

It returns the element coll[key(rec)]; if this element does not
already exist in the coll dictionary, it gets initialized to the empty
list ([]).

The other somewhat strange-looking bit is the expression

tuple(aggregate(accum) for accum in coll.values())

It represents the tuple whose elements correspond to applying the
aggregate function to each value of the coll dictionary. (Since the
coll dict is an OrderedDict, these values are visited in the order of
"first appearance" of their corresponding keys.)

And here is an example of the function above in action:

data = (
('C', 70), ('I', 42), ('m', 29), ('M', 69), ('A', 62), ('a', 15),
('A', 61), ('A', 67), ('i', 64), ('h', 71), ('H', 40), ('u', 68),
('U', 60), ('u', 70), ('O', 26), ('H', 20), ('c', 20), ('T', 57),
('T', 69), ('t', 69), ('t', 69), ('T', 60), ('s', 70), ('S', 69),
('s', 4), ('h', 11), ('a', 34)
)

for r in collect(data,
key=lambda r: r[0].upper(),
aggregate=lambda recs: (recs[0][0],
sum(r[1] for r in recs))):
print r

In this example, the input records are grouped case-insensitively by
their first element, and aggregated by summing together all the second
elements in each group.

Thus, the output of the above is:

('C', 90)
('I', 106)
('m', 98)
('A', 239)
('h', 142)
('u', 198)
('O', 26)
('T', 324)
('s', 143)

Here, the key function extracts the string stored as the first element
(r[0]) of each record r and returns its uppercase transform
(r[0].upper()).

The aggregate function returns a new "summary" record whose first
element is the same as the first element of the first record in its
argument, and whose second element is the sum of the second elements
of all the records in its argument.

Date Subject Author
1/22/13 kj
1/22/13 Bruno Luong