Search All of the Math Forum:
Views expressed in these public forums are not endorsed by
Drexel University or The Math Forum.



Re: binary search with arbitrary or random split
Posted:
Jul 26, 2012 12:56 PM


pam <pamelafluente@libero.it> writes:
> On Jul 26, 3:55 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote: > >> You did not answer my key question (maybe I forgot to ask it!). Do you >> agree that the cost function (in terms of compares) of your uniform >> splitting algorithm is C(n) = n? I can say that this is not in >> O(log(n)) (but it is in O(n)) without taking any limits, and without any >> reference to probability. > > > Sure it is. > You have understood perfectly my point, though.
Did you mean "not understood"?
> Considering the formulation with random splitting point n/k ( not n > k ! ) (again: "divide", not subtract!)
Do you see why that makes no difference? uniform(1,n), nuniform(1,n) and n/uniform(1,n) all produce splits in [1,n] but with different distributions.
> IF the only cases where i get cost n are those "degenerate", where by > mere (insignificant, for n large) chance > the binary search happens to be a linear search, i would still object > that in a probabilistic > setting, using sure convergence has little meaning.
What would you object to? The (worst case) cost function in n. That n occurs rarely does not alter the big O class to which this function belongs.
Look at it another way, do you agree that "f(x) = x is in O(n)" is a theorem of mathematics? Do you think the truth or falsehood of this theorem depends on where f(x) comes from  if it's the distance travelled by a car it's true but, it if it's the cost of an algorithm with some random component it's false?
If you mean that the complexity class of this algorithm is not interesting to you, that you find other measures more intuitive and practical, then we can stop this discussion now. I can't object to such a preference, but you seem to agree with everything except this last point. Unfortunately it's not a matter of opinion or intuition; the identity function is O(n).
> So considering that O() is defined using limits, as far i can see > everywhere, > *in a probabilistic setting * (such is what we are considering here) > i would consider *probabilistic forms* of convergence.
O() can be defined in terms of limits but it usually is not. Even when it is defined in terms of limits, these are the limits from analysis not from probability theory. Membership of the equivalence class O(n) does not depend on the frequency with which a function takes certain values, but on weather there is a point beyond which f(x) <= n*k for some constant k.
> You say that this is not the way this is done in CS and they always > consider "sure" convergence. > I can and have to accept that for sure. But i also feel to say it > makes no sense, in a probabilistic setting, nor intuitively.
We may be nearly done here. I don't think there is any point in trying to change your mind about what makes sense. It makes sense to me because I want an algorithm witch cost function n to be in O(n) regardless of any further knowledge of the algorithm, but we can safely disagree about that.
> I have in mind large problem, like n bigger than 5000 digits. In this > case i only care of asymptotics > and to me if the cost n reduces to O(logn) asymptotically, because the > events causing cost n are > practically impossible, i think it's right to see it as O(logn), no > matter what happens for small n.
All find except for the notation. You can't "see it as O(log(n))" if it isn't. You can see it as Opr(log(n)) for some new measure, Opr, that uses some kind of almost sure convergence. You'd still have to define Opr, and prove that your chosen algorithm is in OPr(log(n)) though.
> Clearly, IF there are other situations which causes the cost n (apart > the linear searches and having > a probability which does not tend to 0 then ok, i could not, > obviously, object anything).
It so much more complicated than that. The relationship between split choices and cost is an almost smooth one. Choosing n (always) *reduces* the cost in many cases but increases it in others. Capturing that in your new OPr measure will be hard (but I am sure it's possible).
The trouble with thinking about "cases" is that most cases have a probability that tends to zero. For example, the set of executions where the random algorithm choose n/2 every time is vanishingly small in the long run. There are also an unbounded number of cases that give pathological results (always choosing 1, always choosing 2 etc...). Don't get me wrong, I know that probability theory can resolve all these issues but I think the details will be hard for and actual algorithm.
>> O(log(n)) (but it is in O(n)) without taking any limits > > This part is not completely clear to me yet. I always understood O() > as a short hand > for an event involving a limit. A description of an asymptotic > behavior. > But maybe you are referring to complexity classes definition and they > don't use limits (?).
Discussed above. You agree that the cost function in n. n is in O(n) and not in O(log(n)) these are theorems of mathematics. In those cases where O() is defined using limits, these are not probabilistic limits, they are normal analytic limits.
 Ben.



