Search All of the Math Forum:

Views expressed in these public forums are not endorsed by NCTM or The Math Forum.

Notice: We are no longer accepting new posts, but the forums will continue to be readable.

Topic: binary search with arbitrary or random split
Replies: 4   Last Post: Jul 24, 2012 12:25 PM

 Messages: [ Previous | Next ]
 Ben Bacarisse Posts: 1,972 Registered: 7/4/07
Re: binary search with arbitrary or random split
Posted: Jul 24, 2012 10:27 AM

pam <pamelafluente@libero.it> writes:

> On Jul 24, 1:05 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>> pam <pamelaflue...@libero.it> writes:
>

>> > ... And actually splitting deterministically at 1 (or n) we would
>> > somehow be out of a "divide and conquer" scenario
>> > (as indicated above by John), as in effect, as Patricia noted, we are
>> > examining the elements one by one.

>>
>> It's not "splitting deterministically at 1 or n" is splitting that
>> happens, by outrageous fortune, at 1 or n all n-1 times.  If that can
>> happen, it forms the worst case for random splitting.

>
> I agree with that. I have a problem with the probability being 0, in
> the asymptotic analysis.

Where, in the standard definitions of worst-case complexity, does the
limit of some probability get used? As far as I know, it does not.

Here's an example worked out in full:

// search for t in d[1]..d[n]:
binary-search(t, d, n)
{
l := 1; h := n
while h >= l do
m := (l + h) / 2
if d[m] = t then
return m
else if d[m] > t then
h := m - 1
else l = m + 1
done
return false
}

unless I've got something wrong (I probably have[1]), the number of
comparisons done for input size is bounded above by log2(n)+1. This is
an exact and tight bound. No case ever takes more, and there is always
some input of size n which takes this number of compares.

Here there is no need to use big O, but we might as well. The algorithm
is O(log(n)) because the base of the log function and the +1 are lost.

What changes if we write m := uniform(l, h)? What is the upper bound on
the cost now? It's n -- again exactly and tightly. There is always
some execution that takes n compares, and the probability that this
occurs is non-zero -- for all n. We never consider an "infinite" case
when the probability is zero.

[1] See Bentley, Jon, 1986. "Programming Pearls", Addison-Wesley. Lots
of professional programmer got some detail of binary search wrong when
set as an exercise.

>> > So i think this can be legitimately ruled out, as a real "degenerate"
>> > case. No? This for the deterministic split.

>>
>> Now I can answer your split at 2 or n-1 question.  That also gives O(n)
>> search so you'd have to rule that case out as well.  In fact, splitting
>> at k and n-k for any constant k (with some other rule when n <= k) gives
>> O(n) search so they all have to be ruled out. too!

>
>
> Here i am getting lost. Assume that M is the middle index in an
> arbitrarily large array.
> Are you saying that if i split, for instance, deterministically at M-1
> i suddenly get a O(n) search
> instead of O(log n) ?

No. Splitting at M+k for some (possibly negative) constant k as well as
splitting at M*k for some constant k in (0,1) all produce logarithmic
bounds.

I was talking about splitting at k and/or n-k. That's generalising from
the pathological case of splitting at 1 and/or n. Eliminating only one
element at a time is not, asymptotically, worse than eliminating 2 or
3 or 4 or... They are all O(n).

>> But to answer...  Yes, anything with probability zero
>> is not considered because it can't happen.

>
> I thought so.

Just be sure: because a zero probability means that it can't happen so
it has no place in a worst case analysis.

>> The asymptotic bound is not what happens when n gets very large...
>
> well it seems to me that there is a limit in the definition.

You can define O(f(x)) in terms of limits but that's got nothing to do
with any algorithm. Remember, we work out the worst-case cost first as
a function of n. We may then choose to summarise that by using big O
notation. If we do, we don't re-visit the code to see what happens "in
the limit". We already know, exactly, the cost of the algorithm for all
n and the big O step is a technical re-writing of this function in a
simpler form.

In my examples above, the worst-case costs are log2(n) + 1 and n --
exactly. No need to use big O, or take any limits. One is logarithmic
and the other is linear.

> So, to recap, what would be the answer? That only a deterministic
> search
> using exactly the middle point is O(logn), while any other
> deterministic and the random-split search
> is O(n) ?

No. For example, a random split based on

m := (l+h)/uniform(2,5)

would be logarithmic. The key distinction is between reducing the
range by a factor of n or by a fixed amount independent of n.

--
Ben.

Date Subject Author
7/24/12 Ben Bacarisse
7/24/12 pamela fluente
7/24/12 Patricia Shanahan