
Topic: binary search with arbitrary or random split
Replies: 1   Last Post: Jul 25, 2012 6:40 PM

 Ben Bacarisse Posts: 1,972 Registered: 7/4/07
Re: binary search with arbitrary or random split
Posted: Jul 25, 2012 6:40 PM

Life intervened, and I had to postpone posting this after I'd already
spent some time on it. I see now that there are more messages so the
debate may have moved on, but I don't want to lose the time invested so
I post this with apologies if its content duplicates other people's
contributions.

pam <pamelafluente@libero.it> writes:

> On Jul 25, 4:08 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>

>>
>> No, you can't be in any doubt about what my answer is.  I said it's
>> linear and not logarithmic -- O(n) and not O(log(n)) -- and I've said
>> it more than once.  I don't think you have any trouble understanding
>> that this is the case.
>>
>> You might have trouble understanding why this is my answer.  Here I can
>> only say that I've been as clear as I can be.  If you could say what
>> parts are not clear to you, I could say more.

>
> Well here the question is another.
>
> When you answered O(n) you were considering a setting
> where the split point does not depend on n.

I didn't make any general remark like that. I made a very specific one:
"the key distinction is between reducing the range by a *factor of n* or
by a *fixed amount* independent of n" (emphasis added). It's about only
two cases, and it's about the reduction in the range, not the formula
for the split point.
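
To make the two cases concrete, here is a minimal Python sketch (the
function and the split strategies are my own illustration, not anything
from the thread): the same search loop run twice, once with a split that
reduces the range by a factor, and once with a split that removes only a
fixed amount per comparison.

```python
def search_with_split(xs, target, pick_split):
    """Search a sorted list, narrowing [lo, hi) at a split point
    chosen by pick_split; returns the number of comparisons used."""
    lo, hi = 0, len(xs)
    comparisons = 0
    while lo < hi:
        mid = pick_split(lo, hi)          # any index in [lo, hi)
        comparisons += 1
        if xs[mid] == target:
            return comparisons
        elif xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return comparisons

xs = list(range(1024))

# Halving split: the range shrinks by a factor of 2 each time.
half = lambda lo, hi: (lo + hi) // 2
# Leftmost split: the range shrinks by a fixed amount (1) each time.
first = lambda lo, hi: lo

print(search_with_split(xs, 1023, half))   # -> 10 comparisons (~log2 n)
print(search_with_split(xs, 1023, first))  # -> 1024 comparisons (n)
```

The formula for the split point is beside the point; what matters is
how much of the range each comparison is guaranteed to discard.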

> Now we have "refined" the formulation making clear the dependence
> ("divide and conquer").

You once proposed splitting at n/uniform(1, n) but that does not always
reduce the range by anything more than a constant.
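
For instance (the numbers here are mine, chosen for illustration): a
draw of uniform(1, n) that lands just above 1 puts the split point
n/u just below n, so only a constant number of elements is discarded
by that step.

```python
n = 1_000_000
u = 1.000001            # a legal draw from uniform(1, n), close to 1
split = int(n / u)      # the split point lands just below n
discarded = n - split   # only a constant number of elements removed
print(discarded)
```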

> So, now the question is (and it has always been in essence):
>
> Should one singularity which causes the least upper bound to rise
> from O(logn) to "no better than O(n)" be considered in the
> computational complexity assessment, considered that its probability
> tends to 0 ?
>
> Patricia was saying yes.

And I agree. If you don't follow what I say below in clarification,
just ignore it and assume that I'm a terrible communicator. You haven't
been objecting to Patricia's argument so I presume you accept it. She's
right, and if I seem to be saying anything else, it's my bad writing
skills.

> You were saying, at one point, that stuff with probability 0 should
> not be considered in the asymptotic analysis.

Every case that can occur must be considered in a worst-case analysis.
There is no n so large that the probability of a sequence of bad splits
is zero. These cases must be included.
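
You can check this with exact arithmetic (a sketch of mine, which for
simplicity counts just one maximally bad split index per step): the
probability that every uniformly chosen split is a worst one shrinks
very fast, but it never reaches zero for any finite n.

```python
from fractions import Fraction

def p_all_worst(n):
    """Probability that a uniformly chosen split index is a worst one
    (leaving n-1 elements) at every step -- counting a single bad
    index per step, a deliberate simplification."""
    p = Fraction(1)
    for size in range(n, 1, -1):
        p *= Fraction(1, size)   # one bad split among `size` choices
    return p

print(p_all_worst(20))  # astronomically small, but strictly positive
```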

> So, in this sense, i was saying that you have not answered to the
> corrected formulation of the problem.

I hope I have now. The cost for all n-sized cases can be calculated and
the worst ones, taking linear time, all have non-zero probabilities.

If you disagree, what is the worst-case cost function? I don't mean
what is it asymptotically similar to (i.e. I don't want the big O
"summary", I want the actual cost in comparisons of a search of n
elements).

> Now, as Patricia recalled, we say that the complexity C(n) is O(logn)
> if:
>
> limit for n to infinity of C(n) / logn is less than some
> constant, say K.

But you've forgotten something else she said (twice). You don't find
an O(f(n)) formula for C(n) until you have found C(n). The big O stuff
is throwing you right off -- it's a way of simplifying your result, not
the result itself. Forget it for the moment: what is C(n) for your
algorithm? Do you agree that it's just n?
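
If it helps, the worst case can be checked mechanically. When each
comparison is guaranteed to remove only one element, the recurrence is
C(1) = 1, C(n) = C(n-1) + 1, whose solution is exactly C(n) = n (a
sketch under that assumption, written iteratively to avoid deep
recursion):

```python
def worst_case_cost(n):
    """C(1) = 1, C(n) = C(n-1) + 1: one comparison per step, each
    step shrinking the range by a single element."""
    cost = 0
    size = n
    while size > 1:
        cost += 1
        size -= 1
    return cost + 1   # final comparison on the last remaining element

assert all(worst_case_cost(n) == n for n in range(1, 100))
```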

> Now in a probabilistic setting, limits are taken "in probability":
> What we are interested in is to see if:
>
> Prob [ limit for n to infinity of C(n) / logn is less than K ] = 1
>
> (this is an "almost sure" convergence (a.s.), as opposed to the
> "sure" convergence used in a nonprobabilistic setting)

Well, that's just not how complexity is defined. You can't use big
O to mean something new without alerting everyone to this fact.

As for the probabilistic convergence, I have to leave that to you. The
formula you have inside Prob[...] does not look like the kind of thing
I've seen before in such contexts, but since I won't be able to help, it
hardly matters that I don't understand your formula.

> or if, at least, the above probability is converging to 1 (a.a.s.,
> asymptotically almost sure).
>
> In our case, from what i gathered so far we have a generally O(logn)
> problem which only on singularities might blow up to O(n).

The *problem* is O(log(n)) because there is an O(log(n)) worst-case
solution. Your *algorithm* (random splitting) is O(n) in the worst case.
I have no idea if these worst cases can be described as singularities
but I think (talking off the top of my head) you believe that cost is
greater than k*log(n) only on a set of cases with measure zero, and that
you can formalise this with one or other of the standard forms of
probabilistic convergence. Obviously I can't say if this is true or
not.

> So when we consider C(n) above we can split it into 2 parts: one of
> order O(logn) and another ("atomic") of order O(n). But the second
> part, as n goes to infinity, has a probability going (very quickly)
> to 0.

I don't think it's that simple. You can't describe cases as being
O(log(n)) or O(n). If you run your algorithm with n=42 and it takes 36
compares, is that an O(log(n)) or an O(n) case? The probabilistic
analysis must take the distribution of splits (and inputs if need be)
into account to give a random variable in place of the cost function.
You may then be able to determine that this variable has a limit as
n->oo using one of the limit definitions ("almost surely" is only one).

Alternatively, you might be able to derive the expectation etc. of this
variable to give yet another measure of the algorithm's performance.
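
A Monte Carlo sketch of that idea (entirely my own illustration, not a
formal analysis): treat the comparison count as a random variable and
estimate its mean, pessimistically following the larger side of each
uniformly random split to bound the search cost from above.

```python
import random

def random_split_cost(n, rng):
    """Comparisons used when the range is cut at a uniformly random
    point and we always follow the worse (larger) side."""
    cost = 0
    while n > 1:
        cost += 1
        k = rng.randrange(1, n)   # split into parts of size k and n - k
        n = max(k, n - k)         # descend into the larger part
    return cost

rng = random.Random(0)
samples = [random_split_cost(10**6, rng) for _ in range(2000)]
print(sum(samples) / len(samples))   # grows like log n, not like n
```

The estimated mean says nothing about the worst case, of course; it is
one of the "other measures" of performance, not the worst-case cost.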

This is not something I know much about, but it does seem clear that you
can't partition the cases into two. Every set of split points
determines a cost and has a distinct probability of occurring. They
must all be considered.

> So we have that the above limit holds with probability 1.

That needs to be established. It does not seem unlikely but
probabilities are notoriously counter-intuitive.

> In probabilistic contexts i have always seen considering forms of
> convergence a.s. or a.a.s., or anyway involving
> probability.

Right, but in case it has not yet been said enough, in complexity
theory, you don't have to take any limits. If the worst-case cost
function is n, you can call it O(n) without using a limit argument and
without using any probability theory. Whatever measure you end up
deciding to use, you can't call it worst-case complexity.

> Also, from an intuitive point of view, it would be ridiculous, imho,
> to consider the problem O(n). Just think:
> imagine n is as big as an RSA modulus. The probability of the
> "singularity" causing O(n)
> is a number inconceivably small, that no computer on earth could even
> ever represent.

I can't stop you thinking it ridiculous, but complexity classes are
defined a particular way and you can't alter that on your own. You can
choose to use what you consider a more intuitive measure, but an O(n)
algorithm will remain an O(n) algorithm.

> This was in essence my argument. Clearly, i dont expect to be right,
> and actually i would like to be corrected.
>

--
Ben.