A tradeoff scheme for optimum multiple records hierarchies
Posted: Apr 22, 2012 11:29 PM
Assume you have a record-oriented database where fields from each record are to be extracted and processed. The motivation is email records. For a complex field structure, a standard tradeoff between memory and runtime processing is established. Not all records have to be retrieved or extracted (which takes time) for a multiplicity of future processes, but a single hierarchy is desirable. The best embodiment of the solution is a hierarchy code generator (CG), in which a base class (virtual) interface is generated with get(/set) methods for each field in the record. The base class is generated from a descriptive file, possibly XML, describing the fields of each record; in this case, a simple text file naming each field. The CG then produces a derived class in the hierarchy for each combinatorial possibility.
Say that each record contains fields A, B and C. The CG reads the descriptive file and produces a base (virtual) class with get(/set) methods for each of the fields A, B and C (with additional support methods read from another file). The CG then produces a class for each possible combination of fields to be processed: A, B, C, AB, AC, BC, each derived from the base class. For each field named in a derived class, a member for holding its data is generated in that class.
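A minimal sketch of this generation step in Python follows. It builds the classes dynamically instead of emitting source files, and it is an illustration under assumptions, not the CG described in the post: the names `DESCRIPTION`, `generate_hierarchy`, `RecordBase` and the helper functions are all hypothetical, and the support methods read from a second file are left out.

```python
from itertools import combinations

# Stand-in for the descriptive text file: one field name per line (hypothetical content).
DESCRIPTION = "A\nB\nC\n"


def _unimplemented_getter(field):
    # Base-class getter: a placeholder that derived classes may override.
    def getter(self):
        raise NotImplementedError("get_" + field + " is not provided by this class")
    return getter


def _stored_getter(field):
    # Derived-class getter: returns the member that holds the field's data.
    def getter(self):
        return getattr(self, "_" + field)
    return getter


def _constructor(stored):
    # Constructor that saves a value for every field named in the derived class.
    def init(self, **values):
        for field in stored:
            setattr(self, "_" + field, values[field])
    return init


def generate_hierarchy(description):
    """Return (base, classes); classes maps names such as 'AC' to generated classes."""
    fields = description.split()

    # Base interface with one getter per field.
    base_body = {"__slots__": ()}
    for field in fields:
        base_body["get_" + field] = _unimplemented_getter(field)
    base = type("RecordBase", (object,), base_body)

    # One derived class per non-empty proper subset of the fields:
    # A, B, C, AB, AC, BC for three fields.
    classes = {}
    for size in range(1, len(fields)):
        for stored in combinations(fields, size):
            body = {
                "__slots__": tuple("_" + f for f in stored),  # one member per stored field
                "__init__": _constructor(stored),
            }
            for field in stored:
                body["get_" + field] = _stored_getter(field)
            name = "".join(stored)
            classes[name] = type(name, (base,), body)
    return base, classes


base, classes = generate_hierarchy(DESCRIPTION)
record = classes["AC"](A="value of A", C="value of C")
print(sorted(classes), record.get_A(), record.get_C())
```

In this sketch a getter for a field that the class does not hold simply raises `NotImplementedError`; a real generator would emit the runtime extraction path there instead.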
Constructor parsers can be generated to read and _save_ each mentioned field into the derived class, or they can be handcrafted if a single base class constructor (parser) extractor cannot be devised. For email the issue is relatively trivial: per the specification, a single method can be devised to accumulate data from each field simply by specifying the field (header) title.
For the remaining fields, a standard algorithm is optionally devised to extract at runtime the fields that were not mentioned in the derived class definition. E.g., class AC builds fields A and C at construction time, reading from a file for instance, and then includes a _costlier_ standard method to extract field B from the same source. Note that the construction and runtime extraction algorithms need not be the same.
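The class AC case can be sketched in Python for an email record, using the standard library `email` parser. This is only an illustration under assumptions: `RAW`, `extract`, `RecordAC` and the `load` callable (a stand-in for re-reading the source file) are hypothetical names, and A, B, C are mapped to the To header, the From header and the message body.

```python
from email import message_from_string

# Hypothetical sample record; a real source would be a file or mailbox entry.
RAW = (
    "To: alice@example.com\n"
    "From: bob@example.com\n"
    "Subject: hello\n"
    "\n"
    "Hello Alice.\n"
)


def extract(text, field):
    """Single extraction method: fetch a field by its header title, or the body."""
    message = message_from_string(text)
    if field == "content":
        return message.get_payload()
    return message[field]


class RecordAC:
    """Holds A (To) and C (content) as members; B (From) is extracted at runtime."""

    def __init__(self, load):
        self._load = load                        # callable that re-reads the source
        text = load()                            # one read at construction time
        self._to = extract(text, "To")           # field A, stored
        self._content = extract(text, "content") # field C, stored

    def get_to(self):
        return self._to                          # constant-time access

    def get_content(self):
        return self._content                     # constant-time access

    def get_from(self):
        # Field B is not stored: every call re-reads and re-parses the source.
        return extract(self._load(), "From")


record = RecordAC(lambda: RAW)
print(record.get_to(), record.get_from())
```

Here the construction path and the runtime path happen to share `extract`; as the post notes, they need not be the same algorithm.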
Under this scheme a programmer or expert system (statistical AI) can choose at compilation time which class is optimal according to memory/runtime constraints. Say field C (content) will be the primary field to be processed, while field A (to:) will customarily be invoked to complete processing. Then the class to choose is AC. In the rare cases where the B (from:) field is required, the class extracts the datum from the source at runtime cost (possibly very costly), while fields A and C can be reused several times at fixed (construction) cost and constant addressing time. Since fields A and C each occupy addressable memory, program construction becomes _optimum_ for memory constraints given the processing algorithm (statistical) constraints, WITHOUT SACRIFICING AVAILABILITY of the B field and other fields that may be necessary for the given process.
It can be shown that this scheme minimizes total programming time for a multiplicity of processing options. It also minimizes total program size, addressing time (each heavily and repeatedly used field is extracted once), and memory expenditure while maximizing AVAILABILITY. Regarding memory expenditure, analysis of the processing algorithm can lead to further optimal choices between classes. Say the algorithm invokes field C with an average use of P(A) and a given expected spread of cases. An expression can be derived to choose optimally between class A (runtime C) and class AC (construction time C).
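One possible form of such an expression is sketched here. The post gives no cost model, so every symbol is an assumption: $n_C$ is the expected number of accesses to field C per record, $t_x$ the cost of one runtime extraction of C, $t_c$ the one-time cost of building C at construction, $m_C$ the memory C occupies, and $\lambda$ a weight converting memory into the same units as time. Counting only the costs attributable to field C:

$$
\mathrm{cost}_{A}(C) = n_C \, t_x, \qquad \mathrm{cost}_{AC}(C) = t_c + \lambda \, m_C
$$

Under these assumptions class AC is the better choice when $n_C \, t_x > t_c + \lambda \, m_C$, and class A otherwise. The spread of $n_C$ across records could be handled by comparing expected costs instead of a single average.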
Several other similar expressions can be derived to provide metrics and decision rules. This analysis can be automated for any processing algorithm by counting how often each field is addressed and estimating its average use and spread of use, in order to choose the best combination of memory/runtime constraints over a (possibly very) large space of data instances.
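A rough Python sketch of such an automated choice follows. The linear cost model and all names (`average_use`, `total_cost`, `best_class`, the cost tables, `mem_weight`) are assumptions made for illustration; the post does not specify them, and the spread of use is ignored here.

```python
from collections import Counter
from itertools import combinations


def average_use(trace, fields, n_records):
    """Average number of accesses per record for each field, from an access trace."""
    counts = Counter(trace)
    return {field: counts[field] / n_records for field in fields}


def total_cost(stored, use, build_cost, extract_cost, mem_cost, mem_weight):
    """Assumed model: stored fields pay construction plus weighted memory once;
    all other fields pay the runtime extraction cost on every access."""
    cost = 0.0
    for field, n in use.items():
        if field in stored:
            cost += build_cost[field] + mem_weight * mem_cost[field]
        else:
            cost += n * extract_cost[field]
    return cost


def best_class(use, build_cost, extract_cost, mem_cost, mem_weight=1.0):
    """Try every non-empty proper subset of fields and return the cheapest one."""
    fields = sorted(use)
    candidates = [
        combo
        for size in range(1, len(fields))
        for combo in combinations(fields, size)
    ]
    return min(
        candidates,
        key=lambda combo: total_cost(
            set(combo), use, build_cost, extract_cost, mem_cost, mem_weight
        ),
    )


# Hypothetical trace over 20 records: C is used heavily, A often, B rarely.
fields = ["A", "B", "C"]
trace = ["C"] * 200 + ["A"] * 60 + ["B"] * 1
use = average_use(trace, fields, n_records=20)

build = {f: 1.0 for f in fields}     # assumed construction cost per field
runtime = {f: 5.0 for f in fields}   # assumed cost of one runtime extraction
memory = {f: 2.0 for f in fields}    # assumed memory footprint per field

choice = best_class(use, build, runtime, memory)
print("".join(choice))
```

With these made-up numbers the search selects the class that stores A and C and leaves B to runtime extraction, which matches the email example in the post. Exhaustive enumeration grows exponentially with the number of fields, so a larger record would need a smarter search.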
INTERNET IS NOT FREE, YOU PAY IT SEVERAL WAYS, SEVERAL TIMES. THIS IS AN ECONOMICS STATEMENT not subject to opinion.
Copyright (C) by Danilo J Bonsignore. Patent pending. &8|D}
Danilo J Bonsignore