Math Forum » Discussions » sci.math.* » sci.math.independent

Topic: A tradeoff scheme for optimum multiple records hierarchies
Replies: 3   Last Post: Apr 26, 2012 2:08 PM

Fabrizio J Bonsignore

Posted: Apr 22, 2012 11:29 PM

Assume you have a record-oriented database where fields from each
record are to be extracted and processed. The motivation is email
records. For a complex field structure there is a standard tradeoff
between memory and runtime processing. Not all fields have to be
retrieved/extracted up front, which takes time, for a multiplicity of
future processes, but a single hierarchy is desirable. The best
embodiment for the solution is a hierarchy code generator (CG) where a
base class (virtual) interface is generated with get(/set) methods for
each field in the record. The base class is generated from a
descriptive file, possibly XML, describing the fields of each record;
in this case a simple text file naming each field. The CG then goes
ahead and produces a derived class for each combination of fields.

Say that each record contains fields A, B and C. The CG reads the
descriptive file and produces a base (virtual) class with get(/set)
methods for each of the A, B, C fields (with additional support methods
read from another file). Then the CG produces classes for each possible
combination of fields to be processed: A, B, C, AB, AC, BC and ABC, as
derivations from the base class. For each field mentioned in the
derivation, a member for holding its data is generated in the class.
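
A minimal sketch of such a generator, in Python rather than a
compiled language, may make the idea concrete. It builds the classes
at runtime instead of emitting source files, and the names
read_fields, make_base, make_derived, generate and the Record prefix
are assumptions made for illustration only:

```python
from itertools import combinations


def read_fields(text):
    """Parse the descriptive file: one field name per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def make_base(fields):
    """Base interface: one get_<field> method per field, none implemented."""
    def unimplemented(name):
        def getter(self):
            raise NotImplementedError(name)
        return getter
    return type("Record", (object,),
                {"get_" + f: unimplemented(f) for f in fields})


def make_derived(base, stored):
    """Derived class with a data member for each field named in `stored`."""
    def init(self, **values):
        for f in stored:
            setattr(self, "_" + f, values[f])

    def stored_getter(name):
        return lambda self: getattr(self, "_" + name)

    body = {"__init__": init, "stored": tuple(stored)}
    body.update({"get_" + f: stored_getter(f) for f in stored})
    return type("Record" + "".join(stored), (base,), body)


def generate(text):
    """Produce the base class and one derived class per non-empty combination."""
    fields = read_fields(text)
    base = make_base(fields)
    classes = {}
    for k in range(1, len(fields) + 1):
        for combo in combinations(fields, k):
            cls = make_derived(base, combo)
            classes[cls.__name__] = cls
    return base, classes


base, classes = generate("A\nB\nC\n")
record = classes["RecordAC"](A="to-value", C="content-value")
print(sorted(classes), record.get_A(), record.get_C())
```

In this sketch the getters of fields that are not stored simply raise
NotImplementedError; a runtime extractor would replace them.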

Constructor parsers can be generated to read and _save_ each mentioned
field into the derived class, or be handcrafted if a single base class
constructor (parser/extractor) cannot be devised. For email the issue
is relatively trivial: per the specification, a single method can be
devised to accumulate data from each field by simply specifying the
field (header) title.
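
As an illustration of such a single method, here is a small sketch
using Python's standard email package. The function name
extract_field, the sample message RAW and the pseudo-title "content"
for the message body are assumptions, not part of any specification:

```python
from email import message_from_string

# Hypothetical sample record.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
    "\n"
    "Body text.\n"
)


def extract_field(raw, title):
    """Return every value of header `title`, joined by ', '.

    The pseudo-title "content" (an assumption of this sketch) returns
    the message body instead of a header.
    """
    msg = message_from_string(raw)
    if title == "content":
        return msg.get_payload()
    # get_all returns the given default when the header is absent.
    return ", ".join(msg.get_all(title, []))


print(extract_field(RAW, "To"))
```

Header lookup in the email package is case-insensitive, so "to" and
"To" name the same field here.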

For the remaining fields, a standard algorithm is optionally devised
to extract at runtime the fields that were not mentioned in the class
derivation definition. E.g., class AC constructs fields A and C at
construction time, reading from a file for instance, then it includes
a _costlier_ standard method to extract field B from the same source.
Note that construction and runtime extraction algorithms need not be
the same.
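
A hand-written sketch of what a class like AC could look like follows.
The "NAME: value" source layout and the helper _parse are assumptions,
and for brevity the same parser serves both construction and runtime
extraction, although the two need not coincide:

```python
class RecordAC:
    """Stores fields A and C at construction; extracts B on demand."""

    def __init__(self, source):
        self._source = source                # kept so B stays available
        self._A = self._parse(source, "A")   # construction-time extraction
        self._C = self._parse(source, "C")

    @staticmethod
    def _parse(source, name):
        """Hypothetical extractor: source lines look like 'NAME: value'."""
        for line in source.splitlines():
            key, _, value = line.partition(":")
            if key.strip() == name:
                return value.strip()
        return None

    def get_A(self):
        return self._A   # constant-time access to the stored member

    def get_C(self):
        return self._C

    def get_B(self):
        # Runtime extraction: rescans the source on every call, no caching.
        return self._parse(self._source, "B")


record = RecordAC("A: bob\nB: alice\nC: hello world")
print(record.get_A(), record.get_C(), record.get_B())
```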

Under this scheme a programmer or expert system (statistical AI) can
choose at compilation time which optimum class to use according to
memory/runtime constraints. Say field C (content) will be the primary
field to be processed, while field A (to:) will be customarily invoked
to complete processing. Then the class to choose is AC. In the rare
cases where the B (from:) field is required, the class extracts the
datum from the source at runtime cost (possibly very costly), while
fields A and C can be reused several times at fixed (construction)
cost and constant addressing time. Since only the A and C fields
occupy addressable memory, program construction becomes _optimum_ for
memory constraints given the processing algorithm (statistical)
constraints, WITHOUT SACRIFICING AVAILABILITY for the B field and
other fields that may be necessary for the given process.

It can be shown that this scheme minimizes total programming time for
a multiplicity of processing options. It also minimizes total program
size, addressing time (each heavily used field is extracted only
once), and memory expenditure while maximizing AVAILABILITY.
Regarding memory expenditure, analysis of the processing algorithm can
lead to further optimum choices between classes. Say it will invoke
field C with an average use of P(C) and a given expected spread of
cases. An expression can be derived to choose optimally between class
A (runtime C) and class AC (construction time C).
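
One possible form of such an expression is sketched below under a
simple linear cost model that is an assumption of this sketch, not a
result from the post: storing a field costs one construction plus a
small per-access cost plus a memory charge, while runtime extraction
costs the full extraction on every use. Only the average number of
uses is modelled; the spread of cases is ignored.

```python
def prefer_stored(expected_uses, runtime_cost, construction_cost,
                  access_cost=0.0, memory_cost=0.0):
    """True if storing the field at construction beats runtime extraction."""
    stored = construction_cost + expected_uses * access_cost + memory_cost
    on_demand = expected_uses * runtime_cost
    return stored < on_demand


def break_even_uses(runtime_cost, construction_cost,
                    access_cost=0.0, memory_cost=0.0):
    """Expected number of uses above which storing the field is cheaper.

    Assumes runtime_cost > access_cost; otherwise storing never pays off.
    """
    return (construction_cost + memory_cost) / (runtime_cost - access_cost)


# Hypothetical cost units for field C.
print(break_even_uses(runtime_cost=10, construction_cost=10, memory_cost=5))
print(prefer_stored(1, runtime_cost=10, construction_cost=10, memory_cost=5))
print(prefer_stored(2, runtime_cost=10, construction_cost=10, memory_cost=5))
```

With these made-up numbers class AC would be preferred over class A
once field C is expected to be used more than 1.5 times per record.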

Several other similar expressions can be derived to provide metrics
and decision rules. This analysis can be automated for any possible
processing algorithm by counting the accesses to each field and
estimating average use and spread of use, in order to choose the best
combination of memory/runtime constraints over a (possibly very) large
space of data instances.
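
A rough sketch of that automation: count the field accesses in a
trace of the processing algorithm, then search the combinations for
the one with the lowest runtime extraction cost that fits a memory
budget. The trace, the cost tables and the budget are invented for
illustration, construction cost is ignored, and the exhaustive search
is only practical for a small number of fields:

```python
from collections import Counter
from itertools import combinations


def count_accesses(trace):
    """Count how often each field is addressed in a trace of field names."""
    return Counter(trace)


def best_class(fields, counts, runtime_cost, memory, budget):
    """Pick the stored-field subset with the lowest runtime cost in budget."""
    best, best_cost = (), None
    for k in range(len(fields) + 1):
        for combo in combinations(fields, k):
            if sum(memory[f] for f in combo) > budget:
                continue  # this class would not fit in memory
            # Fields left out of the class are extracted on every access.
            cost = sum(counts[f] * runtime_cost[f]
                       for f in fields if f not in combo)
            if best_cost is None or cost < best_cost:
                best, best_cost = combo, cost
    return "".join(best), best_cost


trace = ["C", "C", "A", "C", "A", "C", "B"]   # hypothetical access trace
counts = count_accesses(trace)
choice = best_class(
    ["A", "B", "C"], counts,
    runtime_cost={"A": 5, "B": 5, "C": 20},
    memory={"A": 1, "B": 1, "C": 8},
    budget=9,
)
print(choice)
```

With this trace and budget the search settles on class AC and leaves
the rarely used B to runtime extraction.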

INTERNET IS NOT FREE, YOU PAY IT SEVERAL WAYS, SEVERAL TIMES. THIS IS
AN ECONOMICS STATEMENT not subject to opinion.

Copyright (C) by Danilo J Bonsignore. Patent pending. &8|D}

Danilo J Bonsignore



