Topic: binary search with arbitrary or random split
Replies: 2   Last Post: Jul 26, 2012 3:50 PM

Ben Bacarisse (Posts: 1,972; Registered: 7/4/07)
Re: binary search with arbitrary or random split
Posted: Jul 26, 2012 12:56 PM

pam <pamelafluente@libero.it> writes:

> On Jul 26, 3:55 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>

>> You did not answer my key question (maybe I forgot to ask it!). Do you
>> agree that the cost function (in terms of compares) of your uniform
>> splitting algorithm is C(n) = n?  I can say that this is not in
>> O(log(n)) (but it is in O(n)) without taking any limits, and without any
>> reference to probability.

>
>
> Sure it is.
> You have understood perfectly my point, though.

Did you mean "not understood"?

> Considering the formulation with random splitting point n/k ( not n-
> k ! ) (again: "divide", not subtract!)

Do you see why that makes no difference? uniform(1,n), n-uniform(1,n)
and n/uniform(1,n) all produce splits in [1,n] but with different
distributions.
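A quick simulation makes the distributional difference concrete. This is a hypothetical illustration (the function names and parameters are mine, not from the thread): uniform(1,n) and n-uniform(1,n) both spread splits evenly over [1,n], while n/uniform(1,n) piles almost all of its mass on small split points.

```python
import random

random.seed(0)
n = 1000
trials = 100_000

def fraction_in_low_decile(draw):
    """Fraction of sampled split points landing in [1, n/10]."""
    return sum(1 for _ in range(trials) if draw() <= n // 10) / trials

flat     = fraction_in_low_decile(lambda: random.randint(1, n))
mirrored = fraction_in_low_decile(lambda: n - random.randint(1, n) + 1)  # still flat
quotient = fraction_in_low_decile(lambda: n // random.randint(1, n))     # skewed small

print(flat, mirrored, quotient)
```

The first two fractions come out near 0.1, as a flat distribution should; the quotient form puts over 90% of its splits in the lowest tenth of the range, yet all three still produce values in [1,n].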

> IF the only cases where i get cost n are those "degenerate", where by
> mere (insignificant, for n large) chance
> the binary search happens to be a linear search, i would still object
> that in a probabilistic
> setting, using sure convergence has little meaning.

What would you object to? The (worst case) cost function is n. That n
occurs rarely does not alter the big-O class to which this function
belongs.

Look at it another way: do you agree that "f(n) = n is in O(n)" is a
theorem of mathematics? Do you think the truth or falsehood of this
theorem depends on where f(n) comes from -- that if it's the distance
travelled by a car it's true, but if it's the cost of an algorithm
with some random component it's false?

If you mean that the complexity class of this algorithm is not
interesting to you, that you find other measures more intuitive and
practical, then we can stop this discussion now. I can't object to
such a preference, but you seem to agree with everything except this
last point. Unfortunately it's not a matter of opinion or intuition;
the identity function is O(n).

> So considering that O() is defined using limits, as far i can see
> everywhere,
> *in a probabilistic setting * (such is what we are considering here)
> i would consider *probabilistic forms* of convergence.

O() can be defined in terms of limits, but it usually is not. Even when
it is defined in terms of limits, these are the limits of analysis, not
of probability theory. Membership of the class O(n) does not depend on
the frequency with which a function takes certain values, but on whether
there is a point beyond which f(x) <= k*x for some constant k.
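That no-limits definition can be sketched directly as a finite check. This is only an illustration under my own naming (checking finitely many points can support membership, never prove it, and can only ever disprove it up to the horizon tested):

```python
import math

def witnesses_big_o(f, g, k, x0, upto=10_000):
    """Check f(x) <= k*g(x) for all x in [x0, upto) -- a finite
    probe of the textbook big-O condition: exists k, x0 such that
    f(x) <= k*g(x) for every x >= x0."""
    return all(f(x) <= k * g(x) for x in range(x0, upto))

identity = lambda x: x
linear   = lambda x: x
log2     = lambda x: math.log2(x)

# x <= 1*x everywhere, so the identity is in O(n):
print(witnesses_big_o(identity, linear, k=1, x0=1))   # True

# but x eventually exceeds 100*log2(x) (around x ~ 1000),
# so no such witness exists for O(log n):
print(witnesses_big_o(identity, log2, k=100, x0=2))   # False
```

Note there is no limit, and no probability, anywhere in the condition: it is a statement about the existence of the constants k and x0.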

> You say that this is not the way this is done in CS and they always
> consider "sure" convergence.
> I can and have to accept that for sure. But i also feel to say it
> makes no sense, in a probabilistic setting, nor intuitively.

We may be nearly done here. I don't think there is any point in trying
to change your mind about what makes sense. It makes sense to me because
I want an algorithm whose cost function is n to be in O(n) regardless of
any further knowledge of the algorithm, but we can safely disagree about
that.

> I have in mind large problem, like n bigger than 5000 digits. In this
> case i only care of asymptotics
> and to me if the cost n reduces to O(logn) asymptotically, because the
> events causing cost n are
> practically impossible, i think it's right to see it as O(logn), no
> matter what happens for small n.

All fine except for the notation. You can't "see it as O(log(n))" if it
isn't. You can see it as Opr(log(n)) for some new measure, Opr, that
uses some kind of almost sure convergence. You'd still have to define
Opr, and prove that your chosen algorithm is in Opr(log(n)), though.

> Clearly, IF there are other situations which causes the cost n (apart
> the linear searches and having
> a probability which does not tend to 0 then ok, i could not,
> obviously, object anything).

It's so much more complicated than that. The relationship between split
choices and cost is an almost smooth one. Choosing n (always) *reduces*
the cost in many cases but increases it in others. Capturing that in
your new Opr measure will be hard (but I am sure it's possible).

The trouble with thinking about "cases" is that most cases have a
probability that tends to zero. For example, the set of executions
where the random algorithm chooses n/2 every time is vanishingly small in
the long run. There are also an unbounded number of cases that give
pathological results (always choosing 1, always choosing 2, etc.).
Don't get me wrong, I know that probability theory can resolve all these
issues, but I think the details will be hard for an actual algorithm.
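A simulation shows the gap between the two notions of cost being argued about here. This sketch (my own construction, not code from the thread) runs a binary search whose split point is drawn uniformly from the current interval: the worst case remains n comparisons, yet the observed average grows like log n.

```python
import random

random.seed(1)

def random_split_search(n, target):
    """Search for index `target` in [0, n), splitting at a uniformly
    random point of the current interval instead of the midpoint.
    Returns the number of three-way comparisons made."""
    lo, hi, comparisons = 0, n, 0
    while lo < hi:
        comparisons += 1
        mid = random.randrange(lo, hi)   # arbitrary split, not (lo+hi)//2
        if mid == target:
            return comparisons
        elif mid < target:
            lo = mid + 1
        else:
            hi = mid
    return comparisons

n = 100_000
costs = [random_split_search(n, random.randrange(n)) for _ in range(2_000)]
avg = sum(costs) / len(costs)
print(avg, max(costs))   # average is roughly 2*ln(n), far below n
```

The average lands around 23 for n = 100,000 (roughly 2 ln n), while the worst case over all possible random choices is still n -- which is exactly why the big-O class of the worst-case cost function and the typical behaviour can disagree so sharply.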

>> O(log(n)) (but it is in O(n)) without taking any limits
>
> This part is not completely clear to me yet. I always understood O()
> as a short hand
> for an event involving a limit. A description of an asymptotic
> behavior.
> But maybe you are referring to complexity classes definition and they
> don't use limits (?).

Discussed above. You agree that the cost function is n. n is in O(n)
and not in O(log(n)); these are theorems of mathematics. In those cases
where O() is defined using limits, these are not probabilistic limits,
they are normal analytic limits.

--
Ben.
