


Re: binary search with arbitrary or random split
Posted:
Jul 24, 2012 10:27 AM


pam <pamelafluente@libero.it> writes:
> On Jul 24, 1:05 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote: >> pam <pamelaflue...@libero.it> writes: > https://groups.google.com/group/comp.theory/browse_thread/thread/f712492127047301/ffb8397007f4abf9#ffb8397007f4abf9 > >> > ... And actually splitting deterministically at 1 (or n) we would >> > somehow be out of a "divide and conquer" scenario >> > (as indicated above by John), as in effect, as Patricia noted, we are >> > examining the elements one by one. >> >> It's not "splitting deterministically at 1 or n" is splitting that >> happens, by outrageous fortune, at 1 or n all n1 times. If that can >> happen, it forms the worst case for random splitting. > > I agree with that. I have a problem with the probability being 0, in > the asymptotic analysis.
Where, in the standard definitions of worst-case complexity, does the limit of some probability get used? As far as I know, it does not.
Here's an example worked out in full:
// search for t in d[1]..d[n]:
binary_search(t, d, n) {
    l := 1; h := n
    while h >= l do
        m := (l + h) / 2
        if d[m] = t then return m
        else if d[m] > t then h := m - 1
        else l := m + 1
    done
    return false
}
Unless I've got something wrong (I probably have[1]), the number of comparisons done for input size n is bounded above by log2(n)+1. This is an exact and tight bound. No case ever takes more, and there is always some input of size n which takes this number of compares.
Here there is no need to use big O, but we might as well. The algorithm is O(log(n)) because the base of the log function and the +1 are lost.
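For concreteness, here is that pseudocode transcribed into runnable Python (the comparison counter and the empirical check are my additions, just to test the bound; they are not part of the algorithm):

```python
import math

def binary_search(t, d):
    """Search for t in the sorted list d (0-indexed).
    Returns (index or None, number of three-way comparisons made)."""
    l, h = 0, len(d) - 1
    comparisons = 0
    while h >= l:
        m = (l + h) // 2
        comparisons += 1
        if d[m] == t:
            return m, comparisons
        elif d[m] > t:
            h = m - 1
        else:
            l = m + 1
    return None, comparisons

# Empirical check of the log2(n)+1 claim: for each size, try every
# possible hit plus a too-small and a too-large miss.
for n in range(1, 300):
    d = list(range(n))
    bound = math.floor(math.log2(n)) + 1
    worst = max(binary_search(t, d)[1] for t in range(-1, n + 1))
    assert worst <= bound
```

The bound is hit by the misses: for n = 100 a search for a too-large target takes exactly floor(log2(100)) + 1 = 7 comparisons.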
What changes if we write m := uniform(l, h)? What is the upper bound on the cost now? It's n - again exactly and tightly. There is always some execution that takes n compares, and the probability that this occurs is nonzero - for all n. We never consider an "infinite" case when the probability is zero.
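To see that linear worst case concretely, here is a sketch of the uniform-split variant (the function and the injectable `pick` parameter are mine, so the unluckiest run - every split landing on the leftmost index, which a genuine uniform pick produces with nonzero probability - can be forced deterministically):

```python
import random

def random_split_search(t, d, pick=random.randint):
    """Like binary search, but the split index is drawn from [l, h].
    `pick(l, h)` is injectable so specific runs can be reproduced."""
    l, h = 0, len(d) - 1
    comparisons = 0
    while h >= l:
        m = pick(l, h)
        comparisons += 1
        if d[m] == t:
            return m, comparisons
        elif d[m] > t:
            h = m - 1
        else:
            l = m + 1
    return None, comparisons

# Worst possible sequence of "random" picks: always the leftmost index,
# searching for something larger than every element. One element is
# eliminated per comparison, so the cost is exactly n.
n = 100
_, c = random_split_search(n + 1, list(range(n)), pick=lambda l, h: l)
assert c == n
```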
[1] See Bentley, Jon, 1986. "Programming Pearls", Addison-Wesley. Lots of professional programmers got some detail of binary search wrong when it was set as an exercise.
>> > So i think this can be legitimately ruled out, as a real "degenerate"
>> > case. No? This for the deterministic split.
>>
>> Now I can answer your split at 2 or n-1 question. That also gives O(n)
>> search so you'd have to rule that case out as well. In fact, splitting
>> at k and n-k for any constant k (with some other rule when n <= k) gives
>> O(n) search so they all have to be ruled out too!
>
> Here i am getting lost. Assume that M is the middle index in an
> arbitrarily large array.
> Are you saying that if i split, for instance, deterministically at M-1
> i suddenly get a O(n) search
> instead of O(log n)?
No. Splitting at M+k for some (possibly negative) constant k, as well as splitting at M*k for some constant k in (0,1), produces logarithmic bounds.
I was talking about splitting at k and/or n-k. That's generalising from the pathological case of splitting at 1 and/or n. Eliminating only one element at a time is not, asymptotically, worse than eliminating 2 or 3 or 4 or... They are all O(n).
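A quick sketch of that split-at-constant-k rule (my transcription, with the constant clamped so the split index stays inside the range): eliminating at most k elements per comparison costs about n/k comparisons for a too-large missing target, which is linear for any fixed k.

```python
def split_at_k_search(t, d, k=3):
    """Search sorted list d, splitting at a fixed offset k from the low
    end (clamped to the range). Returns (index or None, comparisons)."""
    l, h = 0, len(d) - 1
    comparisons = 0
    while h >= l:
        m = min(l + k - 1, h)
        comparisons += 1
        if d[m] == t:
            return m, comparisons
        elif d[m] > t:
            h = m - 1
        else:
            l = m + 1
    return None, comparisons

# A too-large target forces the slow path: exactly n/k comparisons here.
n = 90
_, c = split_at_k_search(n + 1, list(range(n)), k=3)
assert c == n // 3
```

Tripling k from 1 to 3 only changes the constant, not the linear growth - exactly the point about all constant-k rules being O(n).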
>> But to answer... Yes, anything with probability zero
>> is not considered because it can't happen.
>
> I thought so.
Just to be sure: a zero probability means that it can't happen, so it has no place in a worst-case analysis.
>> The asymptotic bound is not what happens when n gets very large...
>
> well it seems to me that there is a limit in the definition.
You can define O(f(x)) in terms of limits but that's got nothing to do with any algorithm. Remember, we work out the worst-case cost first as a function of n. We may then choose to summarise that by using big O notation. If we do, we don't revisit the code to see what happens "in the limit". We already know, exactly, the cost of the algorithm for all n, and the big O step is a technical rewriting of this function in a simpler form.
In my examples above, the worst-case costs are log2(n) + 1 and n - exactly. No need to use big O, or take any limits. One is logarithmic and the other is linear.
> So, to recap, what would be the answer? That only a deterministic
> search using exactly the middle point is O(log n), while any other
> deterministic and the random-split search is O(n)?
No. For example, a random split based on
m := (l+h)/uniform(2,5)
would be logarithmic. The key distinction is between reducing the range by a constant fraction of its size and reducing it by a fixed amount independent of n.
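One caveat about that one-liner: taken literally, (l+h)/uniform(2,5) can fall below l once l > 0, so the sketch below splits at l + (h-l)//c with c drawn from {2,...,5}, which I take to be the intended reading (the function name and injectable divisor are mine). Even if an adversary fixes c = 5 on every step, at least a fifth of the range is discarded per comparison, so the cost stays logarithmic:

```python
import math
import random

def fraction_split_search(t, d, pick=lambda: random.randint(2, 5)):
    """Search sorted list d, splitting at l + (h-l)//c for a divisor
    c = pick() in {2..5}. Returns (index or None, comparisons)."""
    l, h = 0, len(d) - 1
    comparisons = 0
    while h >= l:
        m = l + (h - l) // pick()
        comparisons += 1
        if d[m] == t:
            return m, comparisons
        elif d[m] > t:
            h = m - 1
        else:
            l = m + 1
    return None, comparisons

# Adversarial divisor (always 5) plus a too-large missing target: the
# range shrinks by a factor of at least 4/5 each step, so the cost is
# about log base 5/4 of n - nowhere near the linear n of uniform(l, h).
n = 10_000
_, c = fraction_split_search(n + 1, list(range(n)), pick=lambda: 5)
assert c <= math.log(n, 1.25) + 10
```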
-- 
Ben.



