Random Variables and Order Statistics
Date: 08/16/2001 at 00:42:11
From: Sam
Subject: Random variables

Can you give me some hints concerning the following problem?

Let X1, X2, ..., Xn be n independent and continuous random variables with the same distribution. Let X denote the maximum and Y the minimum of these random variables. Let U = X/Y. Find the distribution of X, Y, and U.
Date: 08/16/2001 at 11:57:35
From: Doctor Jordi
Subject: Re: Random variables

Hello, Sam - thanks for writing to Ask Dr. Math.

The problem you propose is an exercise in order statistics and transformations. Let us first look at the distribution of the maximum and of the minimum. I will denote the minimum observation by X_min and the maximum by X_max.

Suppose you have ordered your sample of n observations Xi. That is, say we have a sequence

   W1 < W2 < ... < Wn

where every Wi corresponds to some Xi, and

   X_min = W1 = min(X1, X2, ..., Xn)
   X_max = Wn = max(X1, X2, ..., Xn).

Let F_X(x) denote the common cumulative distribution function of the Xi. Given that the Xi are i.i.d., the distributions of X_max and X_min may be found according to the following argument.

We will proceed by first finding the distribution function of X_max. That is, we seek

   F_max(x) = P(X_max < x).

Now, think about the problem this way: the event that the maximum is less than a given number is not independent of the other observations. In fact, X_max will be less than a given x if and only if every one of the observations X1, X2, ..., Xn is less than x.

What is the probability that X1 is less than x (hint: what is the meaning of the cumulative distribution function F_X(x))? What is the probability that X2 is also less than x? What is the probability that X1 AND X2 are less than x, given that X1 and X2 are independent (how do you find the probability of two _independent_ events)? Therefore, what is the probability that X1 AND X2 AND X3 AND ... AND Xn are all less than x?

Note that this argument has to be carried out with the original observations Xi, which are independent. The ordered values Wi are not independent of one another.

I hope the above hints are enough for you. Once you have found the cumulative distribution function of X_max, remember to differentiate to obtain its density function.

The argument for finding the distribution of the minimum proceeds similarly.
That is, we seek

   F_min(x) = P(X_min < x) = 1 - P(X_min > x)

and we know that the minimum will be greater than a given x if and only if all of the observations are greater than this x. Remember that

   P(Xi > x) = 1 - P(Xi < x) = 1 - F_X(x).

The last step is to find the distribution function of the quotient of these two random variables. This can be done with similar reasoning. We seek

   F_U(u) = P(U < u) = P(X_max / X_min < u).

That is, the probability of the set of pairs (x_max, x_min) such that x_max / x_min < u. If x_min is greater than 0, this is the set x_max < u*x_min. If x_min is less than 0, multiplying by x_min reverses the inequality, and the set in question is x_max > u*x_min. This allows us to set up the following double integrals to evaluate the probability in question:

   F_U(u) = Int[-oo, 0] Int[u*x_min, oo] f(x_max, x_min) dx_max dx_min
          + Int[0, oo] Int[-oo, u*x_min] f(x_max, x_min) dx_max dx_min

where the symbol Int[a, b] f(x) dx denotes the definite integral of f(x) evaluated from a to b. (I've omitted the appropriate subscripts of the density functions for better readability.)

To evaluate these integrals, it is helpful to make the change of variable x_max = x_min*v in the inner integrals.

Does this help? Please write back if you feel I need to explain myself better or if you would like to talk about anything else.

- Doctor Jordi, The Math Forum
  http://mathforum.org/dr.math/
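Once you have worked through the hints, a quick simulation is one way to check the answers you get for the maximum and the minimum. The sketch below is only an illustration and makes assumptions that are not part of the original problem: it takes the Xi to be Uniform(0, 1), so that F_X(x) = x, and it compares the empirical values with the candidate formulas [F_X(x)]^n and 1 - [1 - F_X(x)]^n. The function names are made up for this example.

```python
import random

def simulate_extremes(n, trials, seed=0):
    """Draw `trials` samples of size n from Uniform(0, 1) and
    return the list of sample maxima and the list of sample minima."""
    rng = random.Random(seed)
    maxima, minima = [], []
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        maxima.append(max(sample))
        minima.append(min(sample))
    return maxima, minima

def empirical_cdf(values, x):
    """Fraction of the values that are strictly less than x."""
    return sum(v < x for v in values) / len(values)

n, x = 5, 0.7
maxima, minima = simulate_extremes(n, trials=100_000)

F = x  # assumed cdf of Uniform(0, 1) evaluated at x
print("max:", empirical_cdf(maxima, x), "vs", F ** n)
print("min:", empirical_cdf(minima, x), "vs", 1 - (1 - F) ** n)
```

The two numbers on each printed line should agree up to simulation noise. Swapping in another distribution only requires changing how the sample is drawn and what F is.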
Date: 08/16/2001 at 18:39:16
From: Sam
Subject: Re: Random variables

I am extremely thankful for the answer you provided, and I think I got the idea concerning the calculation of the first part. However, concerning the part that deals with the ratio (max / min), I am still wondering how I can calculate the joint p.d.f. f(x_max, x_min) you used in the double integrals. Is there any general formula that allows me to calculate it for the ordered set of random variables you talked about at the beginning? If yes, I would be glad to receive your answer.

Thank you again.

Sincerely,
Sam
Date: 08/17/2001 at 21:02:36
From: Doctor Jordi
Subject: Re: Random variables

Hello again.

I will treat in more detail the general case for the transformation of the quotient of two continuous random variables (without assuming independence). I hope you have no difficulty applying it to the problem at hand.

Suppose we have two random variables, X and Y, with joint density f(x,y), and we seek the density function of Z = Y/X. We proceed first by finding the cdf of Z.

   F_Z(z) = P(Z < z) = P(Y/X < z).

Now, the event Y/X < z is the union of two disjoint events:

   Y < Xz and X > 0,
   Y > Xz and X < 0.

(Since I am assuming continuous random variables, it is okay to use strict inequalities.) By the additivity of the probability function over disjoint events,

   P(Z < z) = P(Y < Xz, X > 0) + P(Y > Xz, X < 0).

It is now simple to set up the appropriate integrals to evaluate this probability. In the first summand, y ranges from -oo to xz, while x ranges from 0 to oo. In the other summand, y ranges from xz to oo and x from -oo to 0. The resulting double integrals look as follows:

   F_Z(z) = Int[0,oo] Int[-oo,xz] f(x,y) dy dx
          + Int[-oo,0] Int[xz,oo] f(x,y) dy dx

In order to remove the dependence on x in the inner limits of integration, we make the change of variable v = y/x, dy = x dv, which gives

   Int[0,oo] Int[-oo,z] x f(x,xv) dv dx
 + Int[-oo,0] Int[z,-oo] x f(x,xv) dv dx

(take careful note of the changes in the limits of integration and make sure you understand why they changed as they did: when x is negative, y = xz corresponds to v = z and y -> oo corresponds to v -> -oo).

Flipping the inner limits of integration in the second integral changes its sign:

   Int[0,oo] Int[-oo,z] x f(x,xv) dv dx
 + Int[-oo,0] Int[-oo,z] (-x) f(x,xv) dv dx

Since x is positive in the first integral and -x is positive in the second, the absolute value function allows us to express the sum more neatly as

   F_Z(z) = Int[-oo,oo] Int[-oo,z] |x| f(x,xv) dv dx

Now, if we can use the assumption that f(x,xv) is continuous, we can exchange the order of integration and differentiate with respect to z to obtain the appropriate density function

   f_Z(z) = Int[-oo,oo] |x| f(x,xz) dx

as required.
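To see the quotient formula in action, here is a small numerical sketch. It is an illustration under assumptions of my own choosing, not part of the original problem: X and Y are taken to be independent standard normal variables, for which the ratio Y/X is known to follow the standard Cauchy density 1/(pi*(1 + z^2)). The integral of |x| f(x, xz) over the whole real line is approximated with a midpoint rule on [-10, 10]; the function names and the truncation interval are hypothetical choices.

```python
import math

def normal_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def joint_pdf(x, y):
    """Assumed joint density: X and Y independent standard normals."""
    return normal_pdf(x) * normal_pdf(y)

def ratio_density(z, joint, lo=-10.0, hi=10.0, steps=20_000):
    """Midpoint-rule approximation of Int |x| * joint(x, x*z) dx over [lo, hi].
    The interval is a truncation of the real line; it is wide enough here
    because the normal density is negligible beyond |x| = 10."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += abs(x) * joint(x, x * z)
    return total * h

def cauchy_pdf(z):
    """Standard Cauchy density, the known answer for this special case."""
    return 1.0 / (math.pi * (1.0 + z * z))

for z in (0.0, 1.0, 2.5):
    print(z, ratio_density(z, joint_pdf), cauchy_pdf(z))
```

For each z the two printed densities should agree to several decimal places. Note that the integration has to cover negative as well as positive x; integrating over positive x only would recover just half of the density in this symmetric example.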
I will now address your question about the distribution of the kth order statistic. The setup, recall, is an i.i.d. sample of n observations. We take these observations and order them from least to greatest. Now we want their distribution.

Let X1, X2, ..., Xn be independent, identically distributed random variables with common density f_X(x), and let Y1, Y2, ..., Yn be such that every Yi corresponds to some Xi and Y1 < Y2 < ... < Yn. (Note: since I am restricting myself to the continuous case, the previous inequalities are strict, although to be perfectly rigorous and general, they should be "less than or equal to".)

We have already considered the distribution of Y1 and Yn (the min and the max). Let us now concentrate on the case n = 2 and turn to the joint distribution of Y1 and Y2. It is apparent that no two order statistics will be independent. So whatever the joint distribution of Y1 and Y2 is, it will not be the product of their marginal distributions.

The compound event (Y1 < y1, Y2 < y2) means that either (X1 < y1, X2 < y2) or (X2 < y1, X1 < y2). Notice that Y1 can be X1 or X2, whichever one is smaller. This means that the event (Y1 < y1 and Y2 < y2) is the union of two events. Recall the additive law of probability, which states that for two events A and B we have

   P(A union B) = P(A) + P(B) - P(A intersection B).

Take y1 < y2. The intersection of the two events requires X1 < y1, X2 < y2, X2 < y1 and X1 < y2 all at once, which, since y1 < y2, reduces to (X1 < y1, X2 < y1). So for F_Y1,Y2(y1, y2) = P(Y1 < y1, Y2 < y2) we have the following:

   P(Y1 < y1, Y2 < y2) = P(X1 < y1, X2 < y2)
                       + P(X2 < y1, X1 < y2)
                       - P(X1 < y1, X2 < y1)

Since X1 and X2 are independent with common distribution function F(x), we can calculate

   F_Y1,Y2(y1, y2) = F(y1)F(y2) + F(y2)F(y1) - F(y1)F(y1)
                   = 2F(y1)F(y2) - [F(y1)]^2

To obtain the joint density f_Y1,Y2(y1, y2), we differentiate first with respect to y1 and then with respect to y2 to obtain simply

   f_Y1,Y2(y1, y2) = 2f(y1)f(y2)   for y1 < y2
                     and 0 elsewhere

This method extends to any number n of order statistics.
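As a sanity check on the n = 2 joint distribution function 2F(y1)F(y2) - [F(y1)]^2, the following sketch estimates P(Y1 < y1, Y2 < y2) by simulation. It assumes, purely for illustration, that X1 and X2 are Uniform(0, 1), so that F(y) = y; the function names are made up for this example.

```python
import random

def empirical_joint_cdf(y1, y2, trials=200_000, seed=1):
    """Estimate P(Y1 < y1, Y2 < y2), where Y1 < Y2 are the ordered values
    of two independent Uniform(0, 1) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = rng.random(), rng.random()
        smaller, larger = (a, b) if a < b else (b, a)
        if smaller < y1 and larger < y2:
            hits += 1
    return hits / trials

def joint_cdf_formula(y1, y2):
    """2 F(y1) F(y2) - F(y1)^2 with F(y) = y, for 0 <= y1 < y2 <= 1."""
    return 2 * y1 * y2 - y1 ** 2

y1, y2 = 0.3, 0.8
print(empirical_joint_cdf(y1, y2), joint_cdf_formula(y1, y2))
```

The formula gives 2(0.3)(0.8) - 0.09 = 0.39 here, and the simulated value should land close to it.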
In general, we obtain the following joint density of all the order statistics:

   f_Y1,Y2,...,Yn(y1, y2, ..., yn) = n! f(y1)f(y2)...f(yn)
                                     for y1 < y2 < ... < yn
                                     and 0 elsewhere

From this joint density, the marginal of the kth order statistic can be found by integration. As an example, say n = 3 and we seek the marginal density function of Y2. We set up the following double integral and "integrate out" y1 and y3. The limits of integration are set by the requirement that y1 < y2 < y3.

   f_Y2(y2) = Int[-oo,y2] Int[y2,oo] 3! f(y1)f(y2)f(y3) dy3 dy1
            = 3! f_X(y2) F_X(y2) [1 - F_X(y2)]

Although this procedure can be generalized to obtain the density of the kth order statistic, it is easier to proceed by a heuristic differential argument, which can be made rigorous if desired.

Think of the density function of a continuous random variable as an approximate measure of the probability that the random variable realizes a value in a small neighbourhood of a point:

   P(x < X < x + dx) is approximately equal to f(x)dx for small dx.

These approximations become exact in the limit as dx -> 0.

Say now we seek the marginal density function f_Yk(yk) of the kth order statistic. This can be done if we consider the problem as an exercise in applying the multinomial distribution. We have n observations. We want to know the probability that k-1 of those observations are less than yk, that exactly one of them is close to yk, and that the remaining n-k are greater than yk. The following table summarizes this information.

   Class                   How many of them   Associated probability
   ------------------------------------------------------------------
   Yi's less than yk       k-1                [F_X(yk)]^(k-1)
   Yi's close to yk        1                  f_X(yk)
   Yi's greater than yk    n-k                [1 - F_X(yk)]^(n-k)

From the multinomial distribution, we now compute the density f_Yk(yk), which measures the probability of the event "Yk is close to yk":

                      n!
   f_Yk(yk) = ----------------- [F_X(yk)]^(k-1) * f_X(yk) * [1 - F_X(yk)]^(n-k)
               (k-1)! 1! (n-k)!
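The formula for the density of the kth order statistic is easy to turn into a small function. The sketch below is an illustration only: it assumes Uniform(0, 1) observations (so f_X = 1 and F_X(y) = y on the unit interval), and it checks numerically that the resulting density integrates to 1 and has mean k/(n+1), which is the known mean of the kth order statistic of a uniform sample. The function names and the midpoint-rule integrator are hypothetical helpers written for this example.

```python
import math

def order_stat_density(y, k, n, pdf, cdf):
    """Density of the kth order statistic of n i.i.d. observations:
    n! / ((k-1)! 1! (n-k)!) * F(y)^(k-1) * f(y) * (1 - F(y))^(n-k)."""
    coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    F = cdf(y)
    return coef * F ** (k - 1) * pdf(y) * (1.0 - F) ** (n - k)

def integrate(g, lo, hi, steps=10_000):
    """Simple midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

def uniform_pdf(y):
    return 1.0  # assumed density on (0, 1)

def uniform_cdf(y):
    return y    # assumed cdf on (0, 1)

n, k = 5, 2

def density(y):
    return order_stat_density(y, k, n, uniform_pdf, uniform_cdf)

total = integrate(density, 0.0, 1.0)
mean = integrate(lambda y: y * density(y), 0.0, 1.0)
print(total, mean, k / (n + 1))
```

With n = 3 and k = 2 the same function reduces to 3! * y * (1 - y), matching the double-integral example for Y2.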
An entirely analogous argument allows us to find the joint density of any two order statistics. You will have to use this when you find the joint density of the maximum and the minimum.

HINT: If two order statistics are under consideration, then there will be 5 classes for the multinomial distribution instead of 3. What are these classes?

This should be enough to help you answer your question. If you still need anything else, please let us know.

- Doctor Jordi, The Math Forum
  http://mathforum.org/dr.math/
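When you have a candidate formula for the distribution of U = X_max / X_min, you can compare it against a simulated distribution. The following sketch only produces the empirical side of that comparison and does not assume any closed form for the answer. The choice of Uniform(1, 2) observations is a hypothetical one made so that the minimum is never zero and U always lies between 1 and 2; the function names are invented for this example.

```python
import random

def simulate_ratio(n, trials, draw, seed=2):
    """Simulate U = max / min for `trials` samples of size n.
    `draw` is a function taking a random.Random instance and
    returning one observation."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        ratios.append(max(sample) / min(sample))
    return ratios

def empirical_cdf(values, u):
    """Fraction of the values that are strictly less than u."""
    return sum(v < u for v in values) / len(values)

def draw_uniform_1_2(rng):
    # Assumed distribution for illustration: Uniform(1, 2)
    return rng.uniform(1.0, 2.0)

ratios = simulate_ratio(n=4, trials=50_000, draw=draw_uniform_1_2)
for u in (1.1, 1.5, 1.9):
    print(u, empirical_cdf(ratios, u))
```

Evaluating your own F_U(u) at the same points and comparing with the printed values gives a quick check of the joint density and the double integral.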
© 1994-2013 The Math Forum