Life intervened, and I had to postpone posting this after I'd already spent some time on it. I see now that there are more messages, so the debate may have moved on, but I don't want to lose the time invested, so I post this with apologies if its content duplicates other people's contributions.
pam <email@example.com> writes:
> On Jul 25, 4:08 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>
>> No, you can't be in any doubt about what my answer is.  I said it's
>> linear and not logarithmic -- O(n) and not O(log(n)) -- and I've said
>> it more than once.  I don't think you have any trouble understanding
>> that this is my answer.
>>
>> You might have trouble understanding why this is my answer.  Here I
>> can only say that I've been as clear as I can be.  If you could say
>> what parts are not clear to you, I could say more.
>
> Well here the question is another.
>
> When you answered O(n) you were considering a setting
> where the split point does not depend on n.
I didn't make any general remark like that. I made a very specific one: "the key distinction is between reducing the range by a *factor of n* or by a *fixed amount* independent of n" (emphasis added). It's about only two cases, and it's about the reduction in the range, not the formula for the split point.
> Now we have "refined" the formulation making clear the dependence
> ("divide and conquer").
You once proposed splitting at n/uniform(1, n) but that does not always reduce the range by anything more than a constant.
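To be concrete about what I think we're discussing, here is a sketch in Python. It is my reconstruction, not your code, and the uniform choice of pivot over the current range is an assumption on my part:

```python
import random

def random_split_search(xs, target, rng=random.Random(0)):
    """Search a sorted list, choosing the split point uniformly at
    random in the current range instead of at the midpoint.
    Returns (index_or_None, number_of_comparisons)."""
    lo, hi = 0, len(xs)               # half-open range [lo, hi)
    comparisons = 0
    while lo < hi:
        mid = rng.randrange(lo, hi)   # uniform split point (assumed)
        comparisons += 1
        if xs[mid] == target:
            return mid, comparisons
        elif xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return None, comparisons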
> So, now the question is (and it has always been in essence):
>
> Should one singularity which causes the least upper bound to raise
> from O(logn) to "no better than O(n)" be considered in the
> computational complexity assessment, considered that its probability
> tends to 0 ?
>
> Patricia was saying yes.
And I agree. If you don't follow what I say below in clarification, just ignore it and assume that I'm a terrible communicator. You haven't been objecting to Patricia's argument so I presume you accept it. She's right, and if I seem to be saying anything else, it's my bad writing skills.
> You were saying, at one point, that stuff with probability 0 should
> not be considered in the asymptotic analysis.
Every case that can occur must be considered in a worst-case analysis. There is no n so large that the probability of a sequence of bad splits is zero. These cases must be included.
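In the uniform-split model (again, my assumption about your algorithm) the probability of the fully bad run is easy to write down: every pivot must land on the single worst element of the current range, which happens with probability 1/n * 1/(n-1) * ... * 1/1 = 1/n!. Tiny, but never zero:

```python
from fractions import Fraction
from math import factorial

def prob_all_worst_splits(n):
    """Probability that every uniform split lands on the single worst
    element of the remaining range: 1/n!.  It vanishes as n grows,
    but it is strictly positive for every n."""
    return Fraction(1, factorial(n))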
> So, in this sense, i was saying that you have not answered to the
> corrected formulation of the problem.
I hope I have now. The cost for all n-sized cases can be calculated and the worst ones, taking linear time, all have non-zero probabilities.
If you disagree, what is the worst-case cost function? I don't mean what is it asymptotically similar to (i.e. I don't want the big O "summary", I want the actual cost in comparisons of a search of n elements).
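If it helps, here is that worst case written out as code (still under my assumed uniform-split model): when the target is larger than everything and every pivot lands at the low end of the range, each comparison removes exactly one element, so the cost is n:

```python
def worst_case_cost(n):
    """Comparisons used to search n sorted elements for a too-large
    target when every 'random' split lands at the low end of the
    range: each comparison removes exactly one element."""
    lo, hi = 0, n
    comparisons = 0
    while lo < hi:
        mid = lo              # the unluckiest possible split
        comparisons += 1
        lo = mid + 1          # target > every element, so go right
    return comparisons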
> Now, as Patricia recalled, we say that the complexity C(n) is O(logn)
> if:
>
>     limit for n to infinity of C(n) / logn is less than some
>     constant, say K.
But you've forgotten something else she said (twice). You don't find an O(f(n)) formula for C(n) until you have found C(n). The big O stuff is throwing you right off -- it's a way of simplifying your result, not the result itself. Forget it for the moment: what is C(n) for your algorithm? Do you agree that it's just n?
> Now in a probabilistic setting, limits are taken "in probability".
> What we are interested in is to see if:
>
>     Prob [ limit for n to infinity of C(n) / logn is less than
>     K ] = 1
>
> (this is an "almost sure" convergence (a.s.), as opposed to the "sure"
> convergence used in nonprobabilistic setting)
Well, that's just not how complexity is defined. You can't use big O to mean something new without alerting everyone to this fact.
As for the probabilistic convergence, I have to leave that to you. The formula you have inside Prob[...] does not look like the kind of thing I've seen before in such contexts, but since I won't be able to help, it hardly matters that I don't understand your formula.
> or if, at least, the above probability is converging to 1 (a.a.s.,
> asymptotically almost sure).
>
> In our case, from what i gathered so far we have a generally O(logn)
> problem which only on singularities might blow up to O(n).
The *problem* is O(log(n)) because there is an O(log(n)) worst-case solution. Your *algorithm* (random splitting) is O(n) in the worst case. I have no idea if these worst cases can be described as singularities, but I think (talking off the top of my head) you believe that cost is greater than k*log(n) only on a set of cases with measure zero, and that you can formalise this with one or other of the standard forms of probabilistic convergence. Obviously I can't say if this is true or not.
> So when we consider C(n) above we can split it into 2 parts: one of
> order O(logn) and another ("atomic") of order O(n).  But the second
> part, as n goes to infinity, has a probability going (very quickly)
> to 0.
I don't think it's that simple. You can't describe cases as being O(log(n)) or O(n). If you run your algorithm with n=42 and it takes 36 compares, is that an O(log(n)) or an O(n) case? The probabilistic analysis must take the distribution of splits (and inputs if need be) into account to give a random variable in place of the cost function. You may then be able to determine that this variable has a limit as n->oo using one of the limit definitions ("almost surely" is only one).
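A quick simulation makes the point (a sketch under my assumed uniform-split model; the too-large absent target just forces a decision at every step):

```python
import random
from collections import Counter

def cost_of_one_run(n, rng):
    """Comparisons used by one run of the random-split search of n
    sorted elements for an absent, too-large target: every comparison
    discards the pivot and everything below it."""
    lo, hi, comparisons = 0, n, 0
    while lo < hi:
        mid = rng.randrange(lo, hi)   # uniform split point (assumed)
        comparisons += 1
        lo = mid + 1
    return comparisons

rng = random.Random(1)
counts = Counter(cost_of_one_run(42, rng) for _ in range(10_000))
# The observed costs form a single distribution over 1..42; there is
# no natural line separating "O(log n) runs" from "O(n) runs".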
Alternatively, you might be able to derive the expectation etc. of this variable to give yet another measure of the algorithm's performance.
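For what it's worth, in the same assumed model the expectation can be computed exactly. With a uniform pivot and an absent too-large target, the remaining range size is uniform over 0..n-1, so E(0) = 0 and E(n) = 1 + (1/n) * sum of E(k) for k < n, which telescopes to the harmonic number H_n, roughly ln n:

```python
def expected_cost(n):
    """Exact expected comparison count for the random-split search of
    n elements (absent, too-large target), via the recurrence
    E(0) = 0,  E(n) = 1 + (1/n) * sum_{k<n} E(k),
    whose solution is the harmonic number H_n ~ ln(n)."""
    E = [0.0]
    running = 0.0                # sum of E(0)..E(m-1)
    for m in range(1, n + 1):
        E.append(1.0 + running / m)
        running += E[-1]
    return E[n]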
This is not something I know much about, but it does seem clear that you can't partition the cases into two. Every set of split points determines a cost and has a distinct probability of occurring. They must all be considered.
> So we have that the above limit holds with probability 1.
That needs to be established. It does not seem unlikely but probabilities are notoriously counter-intuitive.
> In probabilistic contexts i have always seen considering forms of
> convergence a.s. or a.a.s., or anyway involving probability.
Right, but in case it has not yet been said enough, in complexity theory, you don't have to take any limits. If the worst-case cost function is n, you can call it O(n) without using a limit argument and without using any probability theory. Whatever measure you end up deciding to use, you can't call it worst-case complexity.
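For contrast, the usual midpoint rule has a worst-case cost you can state without any probability at all, since the surviving range is at most half the old one after every comparison (a sketch):

```python
def binary_search_worst_cost(n):
    """Worst-case comparisons for ordinary midpoint binary search of
    n elements: the range at least halves after each comparison, so
    the count is floor(log2(n)) + 1, not n."""
    size, comparisons = n, 0
    while size > 0:
        comparisons += 1
        size //= 2          # even the worst branch halves the range
    return comparisons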
> Also, from an intuitive point of view, it would be ridiculous, imho,
> to consider the problem O(n).  Just think: imagine n is as big as an
> RSA modulus.  The probability of the "singularity" causing O(n) is a
> number inconceivably small, that no computer on earth could even
> ever represent.
I can't stop you thinking it ridiculous, but complexity classes are defined a particular way and you can't alter that on your own. You can choose to use what you consider a more intuitive measure, but an O(n) algorithm will remain an O(n) algorithm.
> This was in essence my argument.  Clearly, i dont expect to be right,
> and actually i would like to be corrected.