
Re: Approximating GCF
Posted:
Jul 20, 2014 11:28 PM


On Thu, 17 Jul 2014 04:46:36 0700, ryan.mitchley wrote: > On Thursday, July 17, 2014 11:46:53 AM UTC+2, William Elliot wrote: >> >> >> Let n1, n2,.. nj be j integers, r a real number and d1, d2, .. dj small >> numbers. You are given what? n1,.. nj & d1,.. dj and are looking for >> some r for which the sequence r.n1 + d1, r.n2 + d2,.. r.nj + sj does what?
> Sorry for being unclear. I want the best estimate of r, given a number of > samples of the sums r.ni+di > e.g. given [1.3375, 2.0776, 4.7121, 2.0996, 3.8090] > we notice that 1.7 is the "closest" common factor  i.e. after subtracting > 0.425 and dividing by 1.7 the sequence is nearly a set of integers: > [1.0368, 0.9721, 3.0218, 0.9851, 1.9906]
Suppose you are given n values y_1 ... y_n. By knowledge about some process, you know that each y_i is some random noise d_i plus r * x_i + w, where r, x_i, w, and d_i are unknowns except that all the x_i are integers. You want to compute r and w (where w is an offset).
Given a set of 4 values e,f,g,h from among the n values, ratios like (fe)/(hg), (ge)/(hf), and (he)/(gf) will be estimates of numbers of the form (a*r)/(b*r) = a/b for some integers a, b. It is then straightforward to use continued fractions to find the least values a', b' that may apply, and then divide the value ratio's numerator by a' and its denominator by b'. The following array shows the intervalsize results of this procedure applied to your test vector [1.3375, 2.0776, 4.7121, 2.0996, 3.8090]: [0.02278, 0.02278, 0.04439, 0.0444, 0.07161, 0.07161, 0.1095, 0.10951, 0.15137, 0.15137, 1.6873, 1.69742, 1.70422, 1.70755, 1.70755, 1.7094, 1.7094, 1.7094, 1.71855, 1.7314, 3.39485, 3.40585, 3.4151, 3.4371] in which the first 10 numbers are from irrelevant ratios (eg a nearzero numerator), the next 10 are in the neighborhood of 1.7, and the last 4 numbers are twicetoobig, from cases where ratios like 1/2 or 2/3 that should have been 2/4 or 4/6. All of which suggests an interval size between 1.7066 (half of the average for the last 4 numbers) and 1.7082 (the average for the 10 numbers near 1.7).
Anyhow, you could take a random sample of quads of values to get a vector of possible intervalsizes, then for likely possibilities, iteratively improve the intervalsize and dataoffset values r and w. If your measure of quality of fit is a sum of squared error it is easy to compute an improved iterate in O(1) time whenever the change in iterated value of w or r is less than the minimum difference between any data point and its nearest w+k*r. Else an iteration costs O(n).
 jiw

