The essence of the Cantor diagonal argument is quite simple, namely:
Given a square matrix F one may construct a row-vector different from those in F by simply taking the diagonal of F and changing each element.
In detail: suppose matrix F(i,j) has entries from a set B with two or more elements (so there is a "change" map ~ on B: ~b differs from b). Let ~F(i,i) denote the "changed diagonal" of F, i.e. the row-vector obtained by applying ~ to each element of the diagonal F(i,i) of F. The changed diagonal of F differs from each row-vector of F because it differs on diagonal elements: the i'th row-vector f_i of F has i'th entry = F(i,i) versus ~F(i,i) = i'th entry of changed diagonal:
[ f1(1) f1(2) f1(3) . . . ] row i = f_i(j) maps A to B | | | f2(1) f2(2) f2(3) | | | --> [~f1(1) ~f2(2) ~f3(3) ... ] | f3(1) f3(2) f3(3) | | . . | = "changed diagonal" of F | . . | [ . . ] where ~x differs from x
Notice that the above proof required no hypotheses on the index set of the rows and columns of F. Although classically one thinks of a square matrix as indexed by a finite ordered set {1 2 ... n}, in fact the above proof holds for any abstract index set A; then the rows are simply functions f(j) from A to B, and the matrix F is just a set of such row-functions {f_i(j)}, indexed by i in A. This matrix viewpoint is adopted in the proof given below that nontrivial exponentiation B^A increases the cardinality |A| of A.
THEOREM Let A, B be sets, and B^A = set of functions from A to B. If |B| > 1 then |B^A| > |A|, i.e. B^A has more than |A| functions.
Proof: Given |A| functions f_i : A -> B, construct the |A| x |A| "matrix" F(i,j) = f_i(j). The proof presented above shows that the changed diagonal function ~F(i,i) differs from the given |A| functions (= row-vectors of F); hence |B^A| > |A| when |B| > 1.
In particular, for B = 2 = {0 1} the theorem implies |2^A| > |A|, i.e. the cardinality of the powerset of A exceeds that of A; here we've employed the well-known fact that the powerset of A is the same as 2^A (subsets S of A are in 1-1 correspondence with functions f : A -> 2 via: i in S <=> f(i) = 1; in other words consider f as the characteristic function (bit-vector) for the subset S of A). A close examination of the classical proof [1] will show that it is essentially identical to the above "matrix-theoretic" proof.
By the way, it's a historical mistake that the diagonal construct is attributed to Cantor. In fact it dates to at least 1871 when du Bois-Reymond diagonalized on the growth rates of functions in his pioneering studies on "orders of infinity" (see my prior post [2]); in fact such a diagonalization on growth rates is a simple way of proving that Ackermann function is not primitive recursive (this problem was not considered until almost 60 years later [3]).