### Ask Dr. Math: A Mathematical Essay


## Nonstandard Analysis and the Hyperreals, by Jordi Gutierrez Hermoso
This introduction to the most basic ideas of nonstandard analysis and the hyperreal number system was written in answer to a number of questions first directed to Ask Dr. Math. I have based almost all of it on what I have read in two books, one of which I cite below as [1].

A word about the presentation of nonstandard analysis I chose to use: I had to make a compromise between clarity of exposition and mathematical rigour. Nonstandard analysis is a very rich and intricate topic, and I want to give as much of the complete story as possible, but I do not want to bog you down in technicalities. I will have to ask you to take my word for certain claims I make. I use a little set theory, but it should be familiar to you (if it's not, then perhaps a quick review is in order). As is common practice in the mathematical literature, I will leave out a few details and hope that you can provide them yourself. I have faith, nevertheless, that these omissions are not reason enough to make you doubt the validity of the general structure of the hyperreal number system, and that the description of the hyperreals seems like a natural way to answer the questions that I pose.

A little history of the events leading up to the development of nonstandard analysis is in order. The following section is adapted from reference [1].

In Western mathematical tradition, the driving notions of differential and infinitesimal calculus date back at least as far as the ancient Greeks. Archimedes deemed them necessary for finding the formulae of areas and volumes of curved objects, particularly the circle. His use of the method of exhaustion, first introduced by Eudoxus a century earlier, allowed him to give a reductio ad absurdum proof that the area of a circle must be equal to half the product of its radius and its circumference. The process is strongly reminiscent of integration.
By similar reasoning, using arguments of motion and thinking of geometrical figures as infinitely many nothings together, he was able to accurately derive many other formulae. He was careful to note that his method does not give proof, but only seems to suggest the truthfulness of his conclusions. We commonly attribute to Eudoxus and Archimedes a fundamental principle I have quoted before to you: for every real number, a larger real number exists.

In the seventeenth century came Newton and Leibniz, the two founders of infinitesimal calculus. Although their results were the same, their motivations and interpretations were quite different. This is a very natural occurrence in mathematics: the same ideas are treated in a different manner because they are used for different purposes. Leibniz developed his calculus based on differential quantities, their ratios (derivatives), and their infinite sums (integrals). Newton, who required new mathematical tools to advance his physical ideas, instead used notions of speed and motion, as is apparent from his terminology of "fluents" for functions and "fluxions" for derivatives.

Using such ideas, sometimes in an unwieldy manner, sometimes masterfully, both demonstrated an excellent intuition of the infinitesimals and obtained consistent rules, lemmas, and theorems, by thinking of notions such as dy/dx in the context of "a change in y less than any assignable quantity divided by a change in x less than any other assignable quantity." In fact, their reasoning, although sometimes seemingly shaky, was adequate enough to be further advanced by Euler, probably the most prolific mathematician ever. With a rigour that most would describe as at best slightly lax, Euler made significant advances in infinitesimal calculus in his book.

These two concepts, infinitesimals and infinite quantities, however, were stirring great philosophical dilemmas. Simply put, the ideas are a little tough to chew on.
The situation was worsened when other mathematicians, with perhaps a less intuitive grasp of the infinitesimal calculus, tried to work out further results with the use of infinitesimals, imitating the logic in vogue until then, and produced sheer nonsense. I can't resist quoting one of the most famous attacks against infinitesimals, by Berkeley in 1734:

"But what are these fluxions? The velocities of evanescent increments? And what are these same evanescent increments? They are neither finite quantities, nor quantities infinitely small, nor yet nothing. May we not call them the ghosts of departed quantities?"

Objections such as these, which I confess myself guilty of professing as well, eventually led to the famed epsilon-delta definition of a limit and the formal arithmetization of infinitesimal calculus by way of the work of Dedekind, Cantor, Cauchy ("the father of modern analysis"), and Weierstrass. And thus, circa 1872, infinitesimals were purged from the accepted mathematical literature and epsilon-delta limits were used to define integrals, derivatives, convergent sequences, and all similar notions. "Infinitely small" was formally dead and arithmetic was queen of mathematics once more...

... until 1966. That was the year that Abraham Robinson published his work Non-standard Analysis, in which he wrote:

"In the fall of 1960 it occurred to me that the concepts and methods of contemporary Mathematical Logic are capable of providing a suitable framework for the development of the Differential and Integral Calculus by means of infinitely small and infinitely large numbers."

This framework is probably a new development in logic and set theory. Below I will try to briefly outline his development of calculus with infinitely large and small quantities.

We wish to introduce the notion of two new types of numbers: infinite numbers (following the suggestions in [1] I will refer to these as unlimited numbers) and infinitesimal numbers.
That is, numbers whose absolute value is greater than any positive real number, and numbers whose absolute value is less than any positive real number. Of the first kind, none exist in the real number system. Of the second kind, only one exists, namely, zero (yes, it is best to think of zero as the only infinitesimal real number). What we need here is an extension from R, the real numbers, to *R, the hyperreals.

There are a few properties we would like this extension to have. First, everything that is true in *R should be true in R as well, and vice versa (under a loose interpretation of "everything," pertaining only to well-formed statements). Specifically, R should be a subset of *R and still preserve all the properties we have known and loved. The next point is to make sure that our new field *R is ordered (has a trichotomy law), that is to say, for every a, b in *R, one and only one of the following is true: a = b, a < b, or a > b.

Guided by our previous experiences with sequences, our inclination is to use an infinite sequence of real numbers to define a hyperreal number. In fact, one way to construct the field of real numbers out of the field of rational numbers is precisely this: as limits of converging (Cauchy) sequences. We will try, however, to do things a little differently. In this construction of the real numbers, the following two sequences are associated to the same real number, namely zero.

a = (1, 1/2, 1/3, ..., 1/n, ...)

b = (1, 1/4, 1/9, ..., 1/n^2, ...)

Yet it somehow fits our intuition to say that sequence b is smaller than sequence a, since almost all of the b terms (that is, all except the first) are less than the corresponding a terms. To get more ideas of what sort of construction we are after, consider these other sequences:

c = (0, 0, 0, 1/16, 1/25, ..., 1/n^2, ...)
We would like to say that sequence c denotes the same hyperreal as sequence b, since changing three isolated terms to zero out of an infinite number of terms is not enough to say that we have a completely different number altogether. Also, it would be desirable to be able to safely claim that d is smaller than b, since it has a bit of a head start as it goes to zero. We will eventually want to be able to say that a, b, c, and d are infinitesimals, that N is an unlimited hyperreal, and that a > b, since "almost all" (we will have to define what "almost all" means) of the corresponding terms in a are greater than those in b, just as b > d.

Furthermore, we do not want to rely on the notion of a limit, which is a confusing way to think about infinitesimals, since the entire objection to infinitesimals is that thinking about them as quantities that "tend to zero" is meaningless (which is why infinitesimals are not formally defined in standard analysis, except sometimes only as a notational device). A quantity should not tend to anything, for that implies some sort of motion. It should be quite fixed. Note how no mention of the "infinitieth term" is made.

There are a few problems we can see right now with this approach. We would like ANY infinite sequence to be a hyperreal number, even goofy examples like the ones below.

(1, 0, 1, 0, 1, 0, ...), an alternating sequence of zeroes and ones

What kind of definition would allow us to decide which of the above sequences are equal? What sort of order relation can we apply that will work on ALL infinite sequences of real numbers, no matter how crazy we can make them? The answer is that we will think of two sequences as being equal if their agreement set (the set of positions at which their corresponding terms are equal) is large. Similarly, a sequence is greater than another sequence if almost all of its corresponding terms are greater.
It seems that one of our key points in the construction of the hyperreals will be to define exactly, in set-theoretic language, what "large" and "almost all" mean.

This still leaves the question of the laws of hyperreal arithmetic. We want *R to be a field; that is, for all numbers a, b, c in *R, with two binary operations + and x (addition and multiplication), the following field axioms should hold:
- Closure
a + b and a x b are both in *R
- Commutativity
a + b = b + a and a x b = b x a
- Associativity
(a + b) + c = a + (b + c) and (a x b) x c = a x ( b x c)
- Distributivity
a x (b + c) = (a x b) + (a x c)
- Existence of identity or neutral elements
there exist elements z and e in *R such that a + z = a and a x e = a
- Existence of inverses
there exist an element -a for every a, and an element a^-1 for every a other than z, such that a + (-a) = z and a x (a^-1) = e
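The axioms that involve no choices (commutativity, associativity, distributivity, and the identities) can be spot-checked on term-by-term operations. The following is only a minimal sketch in Python, assuming finite prefixes of three sample sequences; the names `add`, `mul`, `u`, `v`, `w`, `z`, and `e` are illustrative and do not come from the essay, and a finite check is of course not a proof.

```python
from fractions import Fraction

# A "sequence" here is just a finite prefix, stored as a tuple of exact fractions.
def add(p, q):
    """Term-by-term sum of two prefixes of equal length."""
    return tuple(x + y for x, y in zip(p, q))

def mul(p, q):
    """Term-by-term product of two prefixes of equal length."""
    return tuple(x * y for x, y in zip(p, q))

LENGTH = 8  # assumed prefix length, chosen arbitrarily
u = tuple(Fraction(1, n) for n in range(1, LENGTH + 1))   # (1, 1/2, 1/3, ...)
v = tuple(Fraction(n) for n in range(1, LENGTH + 1))      # (1, 2, 3, ...)
w = tuple(Fraction(2 * n) for n in range(1, LENGTH + 1))  # (2, 4, 6, ...)
z = tuple(Fraction(0) for _ in range(LENGTH))             # additive identity
e = tuple(Fraction(1) for _ in range(LENGTH))             # multiplicative identity

commutative = add(u, v) == add(v, u) and mul(u, v) == mul(v, u)
associative = (add(add(u, v), w) == add(u, add(v, w))
               and mul(mul(u, v), w) == mul(u, mul(v, w)))
distributive = mul(u, add(v, w)) == add(mul(u, v), mul(u, w))
identities = add(u, z) == u and mul(u, e) == u

print(commutative, associative, distributive, identities)
```

Exact fractions are used instead of floating-point numbers so that the equalities are tested without rounding error.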
It turns out that it is easy to make sure that *R is a field once we have clearly defined the order and equivalence relations. From the above rules, for example, an unlimited number such as

(10, 100, 1000, ..., 10^n, ...)

should have an inverse. As we shall see, one way to express its (infinitesimal) inverse will be, quite naturally, as

(1/10, 1/100, 1/1000, ..., 10^-n, ...)

This is consistent with our intuition that 100...000 x 0.000...0001 = 1.

Remember why we want hyperreals, after all: we want a novel approach to analysis that is more consistent with our intuition. In what follows, I hope I may be able to convince you that the intuitive ideas of Leibniz, Newton, and Euler by which they developed so much of their calculus CAN be formalized and put on a solid basis, and that the intuitive arguments schoolteachers give to their calculus students can in fact be made rigorous, with a little bit of work and setup. All we need is a new kind of number. We now turn to the construction of this ordered field.

The first point in our agenda is to establish some measure of equality and order between two hyperreal numbers. When comparing two hyperreal numbers, a and b, we can form three disjoint sets: the agreement set (the set of indices at which the corresponding terms are equal), the "a-greater set" (the set of indices at which the term of a is greater than the term of b), and the "b-greater set" (whatever is left over). We want a definition of largeness which will allow us to choose exactly one of these three sets as the one and only "large set" (notice how we are not comparing the size of two sets: a set either is or isn't large, but one set is not "larger" than the other, at least not in our sense).

Let us first consider when we want to say that two infinite sequences are in fact the same hyperreal number. Since we want a measure of equality (an equivalence relation), we want reflexivity, symmetry, and transitivity to hold.
That is,

(1) a [=] a (reflexivity)

(2) if a [=] b, then b [=] a (symmetry)

(3) if a [=] b and b [=] c, then a [=] c (transitivity)

(I have used a bracketed = to indicate that this is an equality in a more general sense: some relation between two hyperreals that satisfies the above axioms. I will use the same symbol again below.)

Let us denote the agreement set between two sequences by giving the numbers of the positions that are agreed upon. For example, take two finite sequences r = (3, 4, 7, 8, 9) and s that agree on the second, fourth, and fifth terms, with r greater in the first and third. Their agreement set Ers is {2, 4, 5}, the "r-greater" set is {1, 3}, and the "s-greater" set is empty.

With this notation, the agreement set of a sequence with itself will be the set of natural numbers. This is our first requirement of "largeness":

1) The set of natural numbers N = {1, 2, 3, 4, ...} is large.

This ensures that a [=] a. Symmetry will be satisfied trivially by the method we are employing (the agreement set of two sequences does not depend on the order in which we compare the sequences, so Ers = Esr), but transitivity requires a bit more thought. Say we know that a [=] b and that b [=] c. This means that the agreement set between a and b, Eab, is large, and so is the agreement set Ebc. The intersection of Eab and Ebc is a subset of Eac (wherever a agrees with b and b agrees with c, a agrees with c, but a and c may also agree at positions where neither agrees with b). Thus we want Eac to be large, so that we may say that a [=] b and b [=] c implies a [=] c. This is our next requirement of largeness.

2) If two subsets of N are large, then all supersets of their intersection are also large.

In particular, this condition entails that if A and B are large, then so is their intersection. Also, any superset of a large set is large. Basically, this means that if any elements are added to a large set, it will still be large.
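The three comparison sets, and the inclusion that transitivity relies on, can be computed directly for short finite sequences. This is only a sketch in Python: the function name `compare_sets` is illustrative, and the second sequence `s` is an assumed example chosen so that the sets match the ones quoted above for r = (3, 4, 7, 8, 9).

```python
def compare_sets(p, q):
    """Split the positions 1..len(p) into (agreement, p-greater, q-greater)."""
    pairs = list(enumerate(zip(p, q), start=1))
    agree = {i for i, (x, y) in pairs if x == y}
    p_greater = {i for i, (x, y) in pairs if x > y}
    q_greater = {i for i, (x, y) in pairs if x < y}
    return agree, p_greater, q_greater

r = (3, 4, 7, 8, 9)
s = (1, 4, 2, 8, 9)  # assumed values: agrees with r at positions 2, 4, 5
E_rs, r_greater, s_greater = compare_sets(r, s)
print(E_rs, r_greater, s_greater)

# Transitivity: positions where a agrees with b AND b agrees with c
# always lie inside the agreement set of a and c.
a = (1, 2, 3, 4, 5, 6)
b = (1, 2, 0, 4, 5, 0)
c = (9, 2, 3, 4, 5, 0)
E_ab = compare_sets(a, b)[0]
E_bc = compare_sets(b, c)[0]
E_ac = compare_sets(a, c)[0]
print((E_ab & E_bc) <= E_ac)  # subset test
print(E_ac - (E_ab & E_bc))   # agreements of a and c not shared with b
```

In this example a and c also agree at position 3, where neither agrees with b, which is why condition 2 is phrased in terms of supersets of the intersection.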
The next point seems rather natural to consider as well. We do not want to consider the empty set as large (otherwise all hyperreals would be equal, since we would have to define all supersets of the empty set to be large as well, that is, all subsets of N including N itself). It follows that at most one of two disjoint sets can be large, because the intersection of two disjoint sets is empty. In particular, we will consider the case of the two disjoint sets B and its complement in N, Bc.

3) The empty set is not large.

4) Set B is large if and only if its complement Bc is not large.

Condition 4 may seem rather artificial (after all, why shouldn't we say that neither a set nor its complement is large?) but it becomes necessary in order to satisfy a trichotomy law. Remember how when comparing two hyperreal numbers we form three disjoint sets: the agreement set, and two order relation sets. We need one and only one of those sets to be large, and the other two (whose union is its complement) not to be.

It is now apparent that we will need some sort of rigorous criterion to be able to filter out which subsets of N are large and which are not. This, in fact, is what motivates the definition of a filter. A filter F is a collection of subsets of N, the large sets, that satisfies conditions 1 and 2 above. If the filter also excludes the empty set (i.e. it satisfies condition 3) it is called a proper filter. If it satisfies all of conditions 1-4 above, it is a maximal proper filter (since it cannot contain any more subsets without including the empty set and becoming improper) and is called an ultrafilter.

We are almost ready to define an equivalence relation between hyperreal numbers. The last requirement is perhaps intuitively plausible, but requires some explanation and formal justification.

5) A finite set is not large.

An ultrafilter that also satisfies condition 5 (i.e. it does not contain any finite sets, and therefore contains the complements of all finite sets, which are called cofinite sets) is called a nonprincipal ultrafilter.

Here I will make a digression as to why we need condition 5. Suppose, for some reason, we decided to make an ultrafilter that contains a finite set. Without too much loss of generality, imagine that we arbitrarily say that ultrafilter F contains the set

B = {1, 2, 3, 4, 5}

Since F is an ultrafilter, it cannot contain the cofinite complement of B (i.e. all natural numbers except 1 through 5). I claim that under these conditions F must also contain at least one of the following one-element sets: {1}, {2}, {3}, {4}, {5}. For if it didn't contain any of these sets, then it would have to contain all of their cofinite complements (all the naturals except 1, all the naturals except 2, etc). Since it is a filter, it would also contain the intersection of these five cofinite sets (an extension of condition 2). This intersection is the complement of B. But this contradicts the fact that F is an ultrafilter (since it would contain both B and Bc). Hence, our assumption that it doesn't contain any of the above one-element sets must be false, so it must contain at least one. Wrapping it up, if an ultrafilter contains a finite set, then it contains a one-element set and is called a principal ultrafilter.

"All right," you may be thinking. "So what's the big problem if an ultrafilter contains a one-element set?" Well, things get awkward if we allow that to happen. First, we would have to say that two hyperreals are equal if they agree on a single entry, which really seems to counter our intuition of largeness. Second, and this is getting a bit ahead of myself, we will soon be seeing that the real numbers are a subset of the hyperreals and that they can be described as constant sequences (i.e. a hyperreal where every term is equal to some real number).
If we allow one-element sets to be large, then EVERY hyperreal is equal to a real number (namely, the constant sequence that repeats its entry at that one position), and we haven't really constructed the extension of R we wanted to build. Hence, we need a nonprincipal ultrafilter to construct *R.

I will summarize the conditions that define a nonprincipal ultrafilter below. A nonprincipal ultrafilter F is a set of subsets of the natural numbers that satisfies the following:
- F contains the set of natural numbers N (this condition is redundant, since it is a consequence of the third condition below).
- If F contains the subsets B and C, then it also contains their intersection.
- If F contains the subset B, then it contains all of the supersets of B as well.
- F does not contain a subset B if and only if it contains its complement Bc.
- F does not contain any finite subsets of N.
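The digression on condition 5 can be checked by brute force on a small finite set, where every subset is finite and so every ultrafilter ought to be principal. The sketch below, in Python, enumerates all families of subsets of S = {1, 2, 3} and keeps those satisfying the first four conditions; the names `powerset`, `is_ultrafilter`, and `generators` are illustrative, and the three-element set is an assumption made only to keep the search tiny (256 candidate families).

```python
from itertools import combinations

S = frozenset({1, 2, 3})

def powerset(items):
    """All subsets of `items`, each returned as a frozenset."""
    items = list(items)
    return [frozenset(c)
            for k in range(len(items) + 1)
            for c in combinations(items, k)]

SUBSETS = powerset(S)  # the 8 subsets of S

def is_ultrafilter(family):
    """Check the ultrafilter conditions for a family of subsets of S."""
    if S not in family or frozenset() in family:
        return False
    for A in family:
        # closed under intersection
        if any((A & B) not in family for B in family):
            return False
        # closed under supersets
        if any(A <= B and B not in family for B in SUBSETS):
            return False
    # exactly one of B and its complement belongs to the family
    return all((B in family) != ((S - B) in family) for B in SUBSETS)

ultrafilters = [f for f in powerset(SUBSETS) if is_ultrafilter(f)]

# The smallest member of each ultrafilter found is its generating set.
generators = sorted(sorted(min(f, key=len)) for f in ultrafilters)
print(len(ultrafilters), generators)
```

Each family that survives consists of exactly the subsets containing one fixed element, which is the finite analogue of the principal ultrafilters that condition 5 rules out on N.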
Thus, a nonprincipal ultrafilter contains all cofinite sets, and basically contains all the sets we could ever need to define largeness (and hence, an order/equivalence relation between two hyperreals). Let me give a couple of examples of how this will work. Consider the following two sequences:

a = (0, 0, 0, 9, 9, 6, 7, 8, ..., n, ...)

b = (1, 2, 3, 4, 5, 6, 7, 8, ..., n, ...)

The "a-greater" set is {4, 5} and the "b-greater" set is {1, 2, 3}. Both of these sets are finite, so neither can be large. Their agreement set, on the other hand, is {all natural numbers except 1-5}. Since this is a cofinite set, it is in every nonprincipal ultrafilter and it is large. Hence, a [=] b.

Let us now turn to two of our goofy sequences from before:

r = (1, 0, 1, 0, ...)

s = (0, 1, 0, 1, ...)

Their agreement set is empty. The "r-greater" set is the set of odd numbers, and the "s-greater" set is the set of even numbers. We know that the empty set is not in any ultrafilter (i.e. the empty set is not large), so r cannot be equal to s. So, all we have to decide is which of the two sets, the odd numbers or the even numbers, is large. Both are infinite sets, disjoint, complements of each other. It really does not matter which one we decide to include in our ultrafilter. At this point we can make an arbitrary choice and include either of these sets in our ultrafilter, but we'll have to stick to our choice. That is all.

Just two questions remain, the mathematician's favourite questions, existence and uniqueness: Is there a nonprincipal ultrafilter on N? Is it the only one, or can we (or would we want to) form a different nonprincipal ultrafilter? To answer them we need to make two assumptions in set theory.

Let us consider existence. First, a definition and an observation. A set of sets H has the finite intersection property, fip, if it is not possible to form the empty set by intersecting finitely many of the elements of H. Then, a filter generated by H (i.e.
the filter that contains all the elements of H, plus the intersections and supersets as well, of course) is proper if and only if H has the fip.

Next, we must invoke the axiom of choice, or a version of it that is more amenable to this case, Zorn's Lemma. Zorn's Lemma makes the rather "obvious" claim that if we have a partially ordered set B in which every linearly ordered chain of elements of B has an upper bound, then B contains a maximal element. Allow me to explain the terminology, which should point out the "obviousness" of the lemma.

A partial ordering in a set is analogous to the ordering of "less than or equal to" in the real numbers. In terms of sets, a partial ordering can be obtained by using set inclusion, the relation "is a subset of," where subset also includes the meaning that a set is a subset of itself. For example, for the set S = {1, 2, 3}, the following is a partially ordered chain of elements in P(S), the set of subsets of S (called the power set of S).

{}; {1}; {1}; {1, 2}; {2, 1}; {2, 3, 1}

A linear ordering, on the other hand, is analogous to "is less than" in the real numbers, or "is a proper subset of" (proper subsets of B are all subsets of B except B itself) in terms of set inclusion. The following is a linearly ordered chain in P(S):

{}; {1}; {1, 3}; {1, 2, 3}

Now, let's see if we can apply Zorn's Lemma to the power set P(S). We notice that P(S) is partially ordered by set inclusion, and that every linearly ordered chain in P(S) has an upper bound (since the union of all elements in any chain is a subset of S and hence is in P(S) itself). Hence, P(S) contains a maximal element, namely S. Well, that's rather obvious, isn't it? Zorn's Lemma tells us that there is a greatest subset of S, where "greater than" is defined by set inclusion.
Although in this case we know that this maximal element of P(S) is S itself, we won't always be so lucky as to pinpoint exactly what the maximal element is, or even whether there is only one maximal element; sometimes we have to be content with the knowledge of the existence of a maximal element and move on. This is exactly how we will prove the following

THEOREM: There exists a nonprincipal ultrafilter on N.

Proof: Consider the filter H that consists of all cofinite sets. H is proper, since it has the finite intersection property. Next, consider the set M of all proper filters that contain H. One of the elements in M, for example, could be the set of all cofinite sets, plus the set of even integers, plus all supersets and intersections. No element in M is principal: since an element B of M contains all cofinite sets, B cannot contain a one-element set, for then B would contain the empty set (the intersection of the one-element set and its cofinite complement), and this would contradict the fact that B is proper.

Next, we must see if M has a maximal element (a nonprincipal maximal proper filter, i.e. a nonprincipal ultrafilter!). To this effect, we use Zorn's Lemma. We notice that M is partially ordered by set inclusion, the "is a subset of" relation. We see that every linearly ordered chain of elements of M (ordered by the "is a proper subset of" relation) has an upper bound in M, because the union U of all elements in said chain is again a proper filter that contains all cofinite sets, so U is in M, and all elements in the chain are subsets of U (i.e. they are all "less than or equal to U"), so U is an upper bound of the chain. Hence, M contains a maximal element F, by Zorn's Lemma, and the theorem is proved.

Lastly, uniqueness. Let me remark that the above proof by no means implies that we have found the one and only maximal nonprincipal proper filter, but only shows that a maximal element exists.
Maximal, in this case, means that if we have any other set D in M which we can order according to the relation "F is a subset of D," then F's maximality implies that F and D are the same. In other words, we cannot add any more elements to a maximal filter in M and get another filter in M. The fact that our nonprincipal ultrafilter F is not unique should be apparent: there are many ultrafilters we could have chosen; arbitrary choices such as either including the odds or including the evens in F abound.

The interesting question is: does this complicate our lives or not? It depends. Do you remember the continuum hypothesis? It just so happens that if we allow ourselves to treat the continuum hypothesis as another axiom of set theory (we are allowed to do this consistently, since the continuum hypothesis is independent of the usual axioms of set theory), then even if we choose a different nonprincipal ultrafilter for our construction, the system we produce in the end turns out to be isomorphic; that is, we get identical systems in which we are merely using different symbols to denote the same thing. If, on the other hand, we decide not to include the continuum hypothesis, then the situation is undetermined.

With these remarks, the most important task of our construction is complete.

Let F be a fixed nonprincipal ultrafilter on N. Define the following three relations modulo F, for any infinite sequences of real numbers a and b.

a [=] b if the agreement set between a and b is in F

a [<] b if the set of positions at which the term of a is less than the term of b is in F

a [>] b if the set of positions at which the term of a is greater than the term of b is in F

From now on, saying that a subset of N is "large" means that it is included in our ultrafilter F, and saying that "almost all" of the terms of an infinite sequence have a particular property means that the set of positions of the terms with that property is a subset of N that is in F.
Define the following equivalence class on an infinite real sequence r:

C[r] = {all real infinite sequences s such that s [=] r}

And at last, make the following definition: A hyperreal number r is the equivalence class C[r]. The set of all hyperreal numbers is denoted by *R. The fact that there are infinitely many sequences we can use to denote the same hyperreal should not bother us, just as 1/2 and 17/34 can be used to denote the same rational number.

Hyperreal arithmetic is all done term-by-term. To add two hyperreals, we add all of the corresponding entries. To multiply them, we multiply corresponding terms. The additive identity is the equivalence class identified with a constant sequence of real numbers all equal to 0. The multiplicative identity is identified with the constant sequence 1. The set R of real numbers is a subset of *R, and a member r of R is the equivalence class identified by the constant sequence r. A hyperreal a > b if any sequence in the equivalence class of a is greater than any other sequence in the equivalence class of b, by the almost-all criterion (inclusion or not in F). The analogous reverse relation also holds.

Here are a few examples in which the choice of F is irrelevant, since I will only use the fact that it contains all cofinite sets. Although I will designate a hyperreal r by a sequence, it should be understood that this number can in fact be designated by ANY sequence which is in its equivalence class C[r].
- 0 = (0, 0, 0, 0, ...)
- 1 = (1, 1, 1, 1, ...)
- 2 = (2, 2, 2, 2, ...)
- 1/4 = (1/4, 1/4, 1/4, ...)
- pi = (pi, pi, pi, ...)
- a = (1, 1/2, 1/3, ..., 1/n, ...)
- b = (1, 2, 3, ..., n, ...)
- c = (2, 4, 6, ..., 2n, ...)
- d = (0, 0, 0, ... fifty more zeroes, 54, 55, 56, ..., n, ...)
- f = (3, 6, 9, ..., 3n, ...)
- g = (pi/2, pi, pi/4, pi/2, pi/2, pi/4, pi/4, pi/4, pi/4, ..., pi/4, ...)
- h = (-1, 1, -1, 1, ..., (-1)^n, ...)
- j = (1, 0, 1, 0, ...)
- k = (0, 1, 0, 1, ...)

Order and equality:

- 0 < 1
- d = b
- f > c > b > a
- pi > 1
- g < b

Arithmetic:

- a + 0 = (1 + 0, 1/2 + 0, ..., 1/n + 0, ...) = a
- d x 1 = (0 x 1, 0 x 1, 0 x 1, ..., fifty more times, 54, ..., n, ...) = d
- c / 2 = (2/2, 4/2, 6/2, ..., 2n/2, ...) = (1, 2, 3, ..., n, ...) = b
- 0 - 1 = (0-1, 0-1, 0-1, ...) = (-1, -1, -1, ...) = -1
- b + c = (1+2, 2+4, 3+6, ..., n+2n, ...) = (3, 6, 9, ..., 3n, ...) = f
- a x pi = (pi, pi/2, pi/3, pi/4, ..., pi/n, ...)
- g / pi = (1/2, 1, 1/4, 1/2, 1/2, 1/4, 1/4, ..., 1/4, ...) = 1/4, since almost all the terms are in agreement
- pi x 1/4 = g, by the almost-all criterion again (the agreement set is in the ultrafilter), so pi/4 = g
- a x b = (1 x 1, 2 x 1/2, ..., n x 1/n, ...) = (1, 1, 1, ...) = 1; thus a and b are inverses of each other
- 1 + 1 = (1 + 1, 1 + 1, ...) = (2, 2, 2, ...) = 2; it is reassuring to see this old equality again

For the next three examples, I make the arbitrary assumption that the set of odd numbers is large:

- h = -1
- j = 1
- k = 0

If I had instead assumed that the evens are large, this would have been the result:

- h = 1
- j = 0
- k = 1

It doesn't matter which assumption we make; as long as we stick to it, we get consistent results.

Interesting observations
Let r be any positive real number (constant sequence). Then a < r for every such r, by the almost-all criterion: the terms of a converge to 0 in R, so all but finitely many of them are less than r. But a > 0, since every one of a's terms is positive. Thus, we have a number that is less than every positive real number, but greater than 0. We say that a is a positive infinitesimal.

Similarly, b > r for every real r, by the almost-all criterion again: by the Eudoxus-Archimedes principle a greater number always exists in R, so all but finitely many terms of b exceed r. Hence b is greater than every real number. We say that b is positive unlimited.

Notice how the multiplicative inverse of an infinitesimal is an unlimited number. This is true in general. Let us now make a few more definitions and remarks, since it appears that we finally have the kind of numbers we were after.
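The almost-all argument can be made concrete by listing the finitely many positions at which the comparison fails. The following is a sketch in Python under stated assumptions: the helper names (`a_term`, `b_term`, `a_exceptions`, `b_exceptions`) are illustrative, r is restricted to exact fractions, and the brute-force cross-check only looks at the first 1000 positions.

```python
from fractions import Fraction
from math import floor

def a_term(n):
    """n-th term of a = (1, 1/2, 1/3, ...)."""
    return Fraction(1, n)

def b_term(n):
    """n-th term of b = (1, 2, 3, ...)."""
    return Fraction(n)

def a_exceptions(r):
    """Positions n where a_n < r fails, i.e. 1/n >= r, i.e. n <= 1/r (r > 0)."""
    return set(range(1, floor(1 / r) + 1))

def b_exceptions(r):
    """Positions n where b_n > r fails, i.e. n <= r."""
    return set(range(1, floor(r) + 1))

small = Fraction(1, 10)
big = Fraction(7, 2)

# Cross-check the closed forms against a direct scan of the first 1000 positions.
brute_a = {n for n in range(1, 1001) if not a_term(n) < small}
brute_b = {n for n in range(1, 1001) if not b_term(n) > big}

print(sorted(a_exceptions(small)))
print(sorted(b_exceptions(big)))

# Term by term, a x b is the constant sequence 1.
product_is_one = all(a_term(n) * b_term(n) == 1 for n in range(1, 1001))
```

Because each exception set is finite, its complement is cofinite and therefore belongs to every nonprincipal ultrafilter, whichever one was fixed.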
A hyperreal number r is
- limited if its absolute value is less than some real number,
- unlimited if its absolute value is greater than every real number,
- infinitesimal if its absolute value is less than every positive real number,
- appreciable if it is limited but not infinitesimal.

Members of *R - R (they are in *R but not in R) are called nonstandard numbers, and they include all unlimited numbers and all nonzero infinitesimals. Members of R are called real numbers, and sometimes also called standard. Also, notice that all real numbers and all infinitesimals are limited, and that all real numbers except 0 are appreciable. It is good to think of appreciable numbers as those that are neither infinitely small nor infinitely big, but they are not necessarily real numbers. For example, the following sequence denotes an appreciable nonstandard hyperreal:

(65, 64 1/2, 64 1/4, 64 1/8, ..., 64 + 1/2^n, ...)

Ever wanted to say that one over infinity is infinitesimal? Or how about that infinity plus infinity is infinity? Now is the time to say this again in a more precise language, using the tools we have built. Let e, d be infinitesimal; a, b appreciable; and K, J unlimited. Then,
Next come a few more definitions that are crucial for bridging the gap between hyperreals and reals. Let r, s be in *R. Then,
- r is said to be infinitely close to s, denoted here by r '=' s, if r - s is infinitesimal. The set of all hyperreals infinitely close to r is its halo, hal(r).
- r and s are of limited distance apart if their difference is limited. The set of all hyperreals at a limited distance from r is its galaxy, gal(r).

With these two definitions, we have another way to denote In, the set of infinitesimals, and Li, the set of limited hyperreals. Specifically, In = hal(0) (the set of infinitesimals is all the numbers that are infinitely close to zero) and Li = gal(0).

The next one is very important: Every limited hyperreal r is infinitely close to exactly one real number, called its shadow and denoted by sh(r). Another common term for the shadow of a hyperreal is the standard part of r.

The terminology "standard part" is particularly suggestive. It is reminiscent of when one talks about complex numbers and says that these numbers have a real part and an imaginary part. Similarly, limited hyperreals can be thought of as having a real, standard part, and perhaps an infinitesimal nonstandard part. In the case when the infinitesimal part is 0, the hyperreal number is just a real number. For example, for the sequence

r = (65, 64 1/2, 64 1/4, ..., 64 + 1/2^n, ...)

the standard part is 64, since r '=' 64, i.e. r - 64 gives the following infinitesimal number, which is less than every positive real number:

(1, 1/2, 1/4, ..., 1/2^n, ...)

The terminology "shadow" comes from a French school of nonstandard analysis founded by Georges Reeb, and I personally prefer it because it sounds more aesthetic and evocative, like halo and galaxy. I like to think of the hyperreals as a jagged line with peaks and valleys somewhere above the real line, and that these peaks cast shadows onto the real number line, which are their standard parts. The line also extends beyond the real number line, which corresponds to the unlimited hyperreals.

We are halfway done with most of the important points of the basis of nonstandard analysis, and further down I will give you standard and nonstandard proofs of familiar facts, and let you decide if you prefer one over the other. Before that, however, we must still talk about one more important point, and a few consequences.
Specifically, the time has come to talk about functions, which play a prominent role in calculus and standard analysis, and we will not neglect them here either.

Say we had r in *R to be

r = (1, 1/2, 1/4, ..., 1/2^n, ...)

What is a natural way to define, say, sin(r)? We could do it as we have been doing everything else, term by term. In this case, this is not a problem, since the domain of the sine function is all real numbers. Thus,

*sin(r) = (sin 1, sin(1/2), sin(1/4), ..., sin(1/2^n), ...)

In general, this definition will work fine for any function whose domain is all of R.

*f(r) = (f(r1), f(r2), ..., f(rn), ...) where rn denotes the nth term of the sequence r.

Essentially, this allows us to extend any function f(r), defined on R, to *f(r) defined on *R. The star in front of f(r) is supposed to remind us that this is a nonstandard function since its domain has been extended to include nonstandard hyperreals. However, since it normally causes no problems to omit the star, and it is clear from context whether one is talking about f(r) or *f(r), I will usually just refer to a function by f(r).

Now, what if the domain of the function we want to extend is a subset of R, say B? We'll run into a bit of a problem if we again try to define everything term by term for all hyperreals, since some of the terms may be undefined. For example, consider the factorial function, only defined for r in the natural numbers N and 0. (I will write the factorial function as !(r) instead of r!, in an effort to maintain a consistent ASCII symbolism.)

!(r) = r(r - 1)(r - 2)(r - 3)...(3)(2)(1)

If r is a hyperreal, our inclination may be to define

*!(r) = (!(r1), !(r2), ..., !(rn), ...)

but a lot of the rn's may not be natural numbers or 0, so how do we proceed? The answer again lies in the almost-all criterion. This motivates the extension of any subset B of R to *B of *R. Another definition: Let B be a subset of R.
We define the extension of B, *B, to be the set of the equivalence classes of all infinite sequences for which almost all of the terms are in B. In our particular example concerning the factorial function defined on N or 0, *N will be the set of hypernaturals. A few hypernaturals are given below by sequences.

(1, 1, 1, 1, ...)
(1, pi, 3, pi, 5, pi, 7, pi, ...)

(The second sequence is a hypernatural provided the set of odd indices belongs to the ultrafilter F, so that almost all of its terms are natural numbers.)

Now, we can say that the hyperextension of !(r) to *!(r) will quite simply be done term by term for all elements r in *N. If a few terms of r are not in N, we have a slight nuisance of defining what !(rn) should be for rn not in N. These isolated terms really have no impact on what *!(r) will be, so we might as well define them to be 0. This definition will work just fine in general. Let f be a function whose domain B is a subset of R. Then *f(r) is defined for all r in *B as follows:

*f(r) = (f(r1), f(r2), ..., f(rn), ...) where f(rn) = 0 if rn is not in B

There are a few very interesting consequences of this effort to define functions over all the hyperreals. First, we now have hypernaturals *N, hyperintegers *Z, and hyperrationals *Q, all subsets of the hyperreals *R. We might even feel tempted to expand this definition and talk about *C, the hypercomplexes, but we will not go into that here. Second, there is a special case of a function that we have also defined: a sequence! A sequence, recall, is a function of the natural numbers. Since now we have functions defined for hypernatural numbers, our result will be hypersequences, to be explored further on. Lastly, this is starting to introduce the importance of the star as an operator that defines a hypersomething. We shall soon take a closer look at the meaning of this *-transform.

Before that, let us think a bit more about the meaning of *B, for any subset B of R. Is *B really different from B? This depends. Let's first consider a trivial case. Say B was the one-element set {0}.
Then *B contains all infinite sequences for which almost all of the terms are 0. Well, according to our definition of a real and a hyperreal number, *B consists only of {0} as well. That is, B = *B and the *-transform hasn't produced a new set at all. This phenomenon happens whenever B is any finite subset of R. If B is a finite subset of R, then B = *B. In fact, a set B is infinite if and only if *B - B is not empty, i.e. if and only if *B contains nonstandard members. Because I want to use this again, let us treat this remark with a little more respect and call it a THEOREM: Subset B of R is infinite if and only if *B contains nonstandard members. Proof: If B is infinite, then there exists a sequence r of elements of B where every entry is distinct. This sequence cannot be in B, which contains all equivalence classes of constant sequences, since the agreement set with every element of B will either be empty or a single entry, neither of which is large by the choice of ultrafilter F. Thus, r must be in *B - B, and is a nonstandard member of *B. Although the above proof only proves one direction of the "if and only if" biconditional, the converse can be proven by showing that finite subsets are equal to their *-transform. That other proof is just an adaptation of the above argument, and I have already hinted above as to the plausibility of that result. I cannot resist giving a first taste here of the usefulness of this theorem, and of nonstandard analysis. The nonstandard proof contains a little point about the transfer principle we have not yet talked about, but that will come in the next section. THEOREM: The set P = {all prime numbers} is infinite. Classic proof: Assume that the number of primes is finite. Then, we could form a list of all n prime numbers, say P = {p1, p2, ..., pn} where pn is the greatest prime number. Form the number K + 1, where K is the product of all n prime numbers, i.e. K = p1p2p3...pn. 
Then K + 1 is greater than the largest prime number, pn. Furthermore, K + 1 is not divisible by any member of P, since a remainder of 1 results every time. Thus K + 1 must either be prime itself or have a prime factor that is not in P. Either way, there is a prime missing from our supposedly complete list. This contradiction shows our assumption must be false and the theorem is proved.

Nonstandard proof: We must show that *P contains nonstandard members. To this effect, consider a hypernatural K in *N that is divisible by every natural number. One such K could be (1, 2, 3, ..., n, ...) if the ultrafilter F contains each of the sets {all even numbers}, {all multiples of 3}, {all multiples of 4}, {all multiples of 5}, and so on. Next, consider a hyperprime number p in *P that divides K + 1. Such p exists because every hypernatural number greater than 1 has at least one hyperprime factor, by the transfer principle (next section). Then p must be nonstandard, for if it weren't, it would divide K, by assumption on K, and since p would then divide K and K + 1, it would also divide their difference, 1, which is impossible for a prime. This proves the theorem.

Neat little proof, isn't it? There is only a mention there of the transfer principle, a fundamental cornerstone of nonstandard analysis, which I have managed to avoid until now. It is time to talk about it.

I am not sure if you remember, but in the section where we were talking about the properties that we wanted hyperreal numbers to have, I mentioned that we would like everything that is true about R to be true in *R as well, with a strict definition of "everything" pertaining only to well-formed statements. This condition is satisfied by something called the transfer principle, a very powerful result of mathematical logic that is in turn a consequence of another theorem by a Polish mathematician, Łoś's theorem.
So powerful is this principle that sometimes nonstandard analysis is considered a branch of mathematical logic, because it is possible to bypass practically all of the ultrafilter construction of the hyperreals and instead jump in with transfer. This is the approach reference [2] takes, actually, but I thought that giving you the ultrafilter construction would be a better way to introduce the topic because it gives us "something we can touch." The usefulness of the transfer principle is that it allows us to stop thinking of hyperreal numbers as infinite sequences and instead treat them almost in the same way as we treat standard real numbers, much as we can forget Cauchy sequences once we have finished the construction of the real numbers (nobody wants to think about converging sequences every time one needs to add 1 + 1, say).

A proper discussion of the transfer principle requires a bit of a digression into symbolic logic and a description of the precise mathematical language one must use to determine what properties of R can be *-transformed into *R. I will try not to get too technical with the details of how this is done, but I do need to introduce a little terminology that may be familiar to you, and a refresher here may be useful.

In formal mathematical logic, one makes frequent use of the existential and universal quantifiers. It is common to make statements such as

Eudoxus-Archimedes Principle (EA): For every r in N, there exists s in N such that s is greater than r.

Let us analyze the logical components of statement EA. First, the statement defines two variables, r and s, and tells us the set in which these variables are free to roam. The statement claims "for every r." We say that r is universally quantified in statement EA. Further, the statement also talks of the existence of s, "there exists s." It doesn't state how many s's there are, but only makes the claim that at least one s exists. We say that s is existentially quantified.
In symbolic logic, the phrase "for every," called the universal quantifier, is usually abbreviated by a symbol that looks like an upside-down A, and the existential quantifier "there exists" is abbreviated by a symbol that looks like a mirror image of E. In this section, I will use the symbol A# for the universal quantifier and the symbol E# for the existential quantifier. Although not a formal requirement, it is customary to put parentheses around the core of a logical sentence, after all variables have been quantified. Thus, the Eudoxus-Archimedes Principle of the natural numbers can be written as

A#r in N, E#s in N, (s > r)

I added commas for better readability, but they are not a standard. Also, the "in" to denote membership of a set is usually written by a symbol that looks somewhat like an epsilon, but I find that in ASCII symbolism, "in" is already symbolic and abbreviated enough.

I should remark that because of an additional convention when translating symbolic logic to English that introduces the "such that" phrase immediately after existentially quantified variables, the order of quantifiers is crucial. For example, the following sentence

E#s in N, A#r in N, (s > r)

translates to English as

There exists s in N such that for every r in N, s is greater than r

and it is FALSE (there does not exist a natural number greater than every other natural number).

It is also common that if two variables have the same type of quantification, we place them both under a single quantifier symbol and separate them by commas, to abbreviate matters a bit. The following statement, about the closure of the natural numbers under subtraction, can be written in the following two ways:

A#r in N, A#s in N, (r - s in N)
A#r, s in N, (r - s in N)

The statement is false, by the way. That's why we had to invent negative integers, to introduce a new set Z for which the statement

A#r, s in Z, (r - s in Z)

is true.
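The effect of quantifier order can be checked exhaustively on a small finite domain, where Python's built-in all and any play the roles of A# and E#. This is only a sketch under stated assumptions: the domain D and the relation "s differs from r" are hypothetical choices made so that both orderings can be evaluated completely (the statements about N above range over an infinite set and cannot be checked this way). The last lines search a few small natural numbers for a counterexample to closure under subtraction.

```python
# Hypothetical finite domain, chosen only for illustration.
D = [0, 1, 2]

# A#r in D, E#s in D, (s != r): every r has some s that differs from it.
forall_exists = all(any(s != r for s in D) for r in D)

# E#s in D, A#r in D, (s != r): one single s differs from every r,
# including itself, which is impossible.
exists_forall = any(all(s != r for r in D) for s in D)

# Counterexample search for A#r, s in N, (r - s in N), with N the
# positive integers: look for r, s whose difference is below 1.
naturals = range(1, 6)
counterexample = next(
    ((r, s) for r in naturals for s in naturals if r - s < 1),
    None,
)

print(forall_exists, exists_forall, counterexample)
```

Swapping the two quantifiers changes the truth value from True to False on this domain, and the search stops at the first failing pair, r = 1 and s = 1, whose difference 0 is not a natural number.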
A statement that quantifies a variable need not necessarily give the set from which the variable is to be taken. For example, we could very well write the following statement:

E#s, A#r (s > r)

We cannot decide on the truthfulness of the above statement because we do not know what sort of creatures s and r are. For example, the statement is true if s is a hyperreal number and if r is real (since then s could be an unlimited hyperreal) but is false if both r and s are hyperreals. When variables are specified to belong to a particular set, we say that these variables are bound.

Another sort of logical symbol we will need is the logical connective. Logical connectives are the symbols that are used to make statements with more than one atomic simple logical sentence, and I believe that you should be familiar with them. Let me give a list with the ASCII symbols I will use for these symbols.

/\    and
\/    or (Not an exclusive or, mind you. The statements "1=1 \/ 2=2" and "1=2 \/ 1=1" are both true.)
~     not
->    implies (used for if-then sentences)
<->   if and only if

A few examples of statements we can make in standard analysis. I hope you have no trouble translating into English and convincing yourself of the truth value I have assigned to each statement. This should be an interesting exercise.
DPR: A#r, s in R, (r < s -> E#q in R (r < q < s)) TRUE
DPQ: A#r, s in Q, (r < s -> E#q in Q (r < q < s)) TRUE
DPZ: A#r, s in Z, (r < s -> E#q in Z (r < q < s)) FALSE
Let B = {b1, b2, ..., bn} be a finite subset of R. Then
A#r in B, (r = b1 \/ r = b2 \/ ... \/ r = bn) TRUE
EXP: A#r, s in R, (exp(r + s) = exp(r)exp(s)) TRUE
SIN: A#r, s in R, (sin(r + s) = sin(r)cos(s) + sin(s)cos(r)) TRUE
LOG: A#r, s in R, (r > 0 /\ s > 0 -> ln(r^s) = s x ln(r)) TRUE

Here's a little result about prime factors that I had to use indirectly in the last proof of the previous section.

PF: A#r in N, (r > 1 -> E#p in P, r/p in N) TRUE

Let B = {all positive a such that a^2 < 2} and C = {all positive a such that a^2 > 2} be subsets of Q. Then

S2Q: E#r in Q, A#s in B, A#t in C, (s < r < t) FALSE
S2R: E#r in R, A#s in B, A#t in C, (s < r < t) TRUE

The r above is determined by what is called a Dedekind cut, and in this particular example it is equal to Sqrt(2). These statements are a fancy way of claiming that Sqrt(2) is not rational. Dedekind cuts, in fact, are another way to construct the real numbers instead of using Cauchy sequences.
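The idea that the rationals whose square is below 2 and those whose square is above 2 crowd together without any rational number sitting between them can be explored numerically. The following is a minimal sketch using exact rational arithmetic; the function name squeeze_sqrt2 and the starting interval from 1 to 2 are assumptions made for this illustration, and a finite computation like this only suggests, and does not prove, that Sqrt(2) is irrational.

```python
from fractions import Fraction


def squeeze_sqrt2(steps):
    """Bisect with exact rationals, keeping lo*lo < 2 < hi*hi throughout."""
    lo, hi = Fraction(1), Fraction(2)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid  # mid belongs to the "square below 2" side
        else:
            hi = mid  # mid belongs to the "square above 2" side
    return lo, hi


lo, hi = squeeze_sqrt2(20)
print(lo, hi, hi - lo)
```

After 20 steps the two rationals differ by exactly 1/2^20, yet neither has square equal to 2. The else branch never has to handle the case mid * mid == 2, precisely because no rational number has square 2.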
Let P(Q) denote the set of all subsets of Q (power set of Q) and B+ denote {all rational numbers greater than all elements of B}. Then
DCQ: A#B in P(Q), (B and B+ both nonempty -> E#l in Q, A#b in B, A#b' in B+, (b <= l /\ l <= b')) FALSE

Let P(R) denote the set of all subsets of R (power set of R) and B+ denote {all real numbers greater than all elements of B}. Then
DCR: A#B in P(R), (B and B+ both nonempty -> E#l in R, A#b in B, A#b' in B+, (b <= l /\ l <= b')) TRUE

The statement DCR is the Dedekind Completeness of the real numbers, and in fact uniquely characterizes the real numbers as the only Dedekind complete ordered field (any other Dedekind complete ordered field is isomorphic to R). It is usually stated as "every nonempty subset of R that is bounded above has a least upper bound," but perhaps the symbolic logic is a little more difficult to read here. Why the statement is false for another ordered field, such as the rationals, should be clear now from the counterexample I gave with Sqrt(2).

Statement DCR provides a very important instance of the kind of properties that are NOT transferable between R and *R (and thank goodness that some properties are not transferable, because we do not want *R to be isomorphic to R). The introduction of infinitesimals makes *R Dedekind incomplete because a set with an upper bound such as

S = {all r in *R that are less than all the positive real numbers}

has no least upper bound. (Why not? Introduction of infinitesimals destroys Dedekind completeness!)

I have already given implicitly in the sections before this one all the necessary ingredients for creating a logical language in which we can apply the *-transform. When the *-transform is applied to a well-formed logical statement that has a certain truth value in standard analysis, the transfer principle allows us to rest assured that the *-transformed sentence has the same truth value.
Basically, and without going into more detail and complicating matters, a "well-formed statement" is one in which all variables are quantified and bound, the bounds of the quantified variables are only subsets of R (which we saw how to extend, or *-transform, in the previous section), everything is well-defined, and a "true-or-false" truth value can be unambiguously assigned to the statement. All of the above examples are well-formed statements, except for the one about Dedekind completeness because it has a quantified variable where the bound is not a subset of R, but a set of subsets of R, namely P(R). This statement cannot be *-transformed.

More or less formally or informally, the transfer principle is the following:

TRANSFER PRINCIPLE: Let phi be any well-formed statement in standard analysis and *phi its *-transform. Then phi is true if and only if *phi is true.

The transfer principle has a proof, but the proof is rather intricate, long, complicated, and a little boring. It is actually a special case of Łoś's Theorem, remember. The reason the transfer principle holds is largely due to the method we used to construct the hyperreal numbers, our "almost-all" criterion with ultrafilters. We constructed a field of numbers deliberately in such a manner that it would look almost like the real numbers, except for a few special properties that we wanted to be different. One such property that characterizes *R, for example, is the existence of unlimited numbers

EUN: E#r in *R, A#s in R (r > s)

The *-transform of a well-formed statement, basically, is simply obtained by replacing every instance of a bound by its extension and by extending every instance of a function. Actually, since the domain of a function is usually implicit from context, putting the * in front of an extended function is slightly redundant and thus usually omitted. Allow me to exemplify the *-transform by applying it to all of the statements in the last set of examples.
By the transfer principle, remember, the truth value of all the statements remains the same under transfer. You may convince yourself of this by giving proofs of the following statements in nonstandard analysis.
*DPR: A#r, s in *R, (r < s -> E#q in *R (r < q < s)) TRUE
*DPQ: A#r, s in *Q, (r < s -> E#q in *Q (r < q < s)) TRUE
*DPZ: A#r, s in *Z, (r < s -> E#q in *Z (r < q < s)) FALSE
Let *B = {b1, b2, ..., bn} be a finite subset of *R. Then
A#r in *B, (r = b1 \/ r = b2 \/ ... \/ r = bn) TRUE
(note how B = *B and thus finite sets have no nonstandard members)
*EXP: A#r, s in *R, (*exp(r + s) = *exp(r)*exp(s)) TRUE
*SIN: A#r, s in *R, (*sin(r + s) = *sin(r)*cos(s) + *sin(s)*cos(r)) TRUE
*LOG: A#r, s in *R, (r > 0 /\ s > 0 -> *ln(r^s) = s x *ln(r)) TRUE

or, quite equivalently, if we get systematically lazy about extended functions,

*EXP: A#r, s in *R, (exp(r + s) = exp(r)exp(s)) TRUE
*SIN: A#r, s in *R, (sin(r + s) = sin(r)cos(s) + sin(s)cos(r)) TRUE
*LOG: A#r, s in *R, (r > 0 /\ s > 0 -> ln(r^s) = s x ln(r)) TRUE

*PF: A#r in *N, (r > 1 -> E#p in *P, r/p in *N) TRUE
(the proof of this statement without appealing to transfer is not obvious!)
Let *B = {all positive a such that a^2 < 2} and *C = {all positive a such that a^2 > 2} be subsets of *Q. Then
*S2Q: E#r in *Q, A#s in *B, A#t in *C, (s < r < t) FALSE
*S2R: E#r in *R, A#s in *B, A#t in *C, (s < r < t) TRUE
It is pleasing to see that the set of hyperrationals doesn't contain any hyperirrational numbers.
DCR can NOT be *-transformed because one of the variables has a bound that is not a subset of R, i.e. DCR is not a "well-formed statement."

One last remark. EUN is a bit of a special statement because it is a crossbreed with standard and nonstandard bounds in the quantifiers. It is not meaningful to apply the *-transform to that statement.

The real power of transfer lies in the "inverse *-transform." We can employ methods of nonstandard analysis, freely using infinitesimals and unlimited numbers to arrive at a nonstandard theorem. If we can phrase the conclusion of the theorem as a statement about the hyperreals that is the *-transform of some statement about the real numbers, then we may remove the stars and have a theorem in standard analysis, which was proven by nonstandard techniques! There is great potential here to be exploited. Indeed, it has been exploited for the past thirty-something years, and many novel developments that seem radically different to those of standard analysis have been achieved by the mathematical community.

So there you have it. These are the basic elements of nonstandard analysis and hyperreal numbers. With these tools, it is possible to arrive at all of the same conclusions as in standard analysis, but using intuitively pleasing methods. I will mention a few examples only briefly, alas, for going into details would require a new essay of length comparable to this one.
The limit L of a function f(r) as r approaches s can now be defined (or derived from standard analysis, if desired) as the real number L for which L '=' f(r) for every r in hal(s) with r not equal to s.
We usually think intuitively of a function being continuous if moving slightly within the domain of the function implies another slight movement within the function's range. Continuity of a function at a point can now be defined with the help of halos. A function f(r) is continuous at a real number s if f(s + e) is in hal(f(s)), i.e. f(s + e) '=' f(s), for all infinitesimal e.
In standard analysis, a sequence is merely a function of the natural numbers. The nth term of a sequence is some real number assigned to n. With nonstandard analysis it is now possible to talk of functions of hypernatural numbers, i.e. hypersequences. We can now say that a sequence converges to a real limit L if, for every unlimited n, the nth term of its hypersequence is infinitely close to L (and you may extend this definition to talk seriously about infinite sums).
This is one of my favourites. With nonstandard analysis, the Leibniz notation for derivatives recovers the meaning its creator intended it to have (in standard analysis the Leibniz notation is used very sparingly because it is considered to be misleading). We can define (or derive, again, if desired) df/dr as the shadow of an infinitesimal increment in f divided by the corresponding increment in r, i.e.

df/dr = sh( df(r) / dr )

where df(r) = f(r + dr) - f(r) for all nonzero infinitesimal dr.

Let me give quickly a delicious nonstandard proof of the product rule. Compare to the standard proof given in any calculus textbook. Which proof do you prefer?

d(f(r)g(r))/dr = sh( [f(r + dr)g(r + dr) - f(r)g(r)] / dr )
               = sh( [[f(r) + df(r)][g(r) + dg(r)] - f(r)g(r)] / dr )
               = sh( f(r) dg(r)/dr + g(r) df(r)/dr + df(r)dg(r)/dr )
               = f(r) dg/dr + g(r) df/dr

since the last term above is infinitesimal, its shadow is 0. It is almost as if nonstandard analysis allows us to replace limits and epsilon-delta definitions with plain arithmetic.
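The product-rule computation above can be imitated term by term in the sequence picture of the hyperreals. In this minimal sketch the increment dr is represented by the sequence (1, 1/2, 1/4, ..., 1/2^n, ...) used earlier in the essay, and exact fractions avoid rounding error. The functions f and g, the point r = 2, and the helper name difference_quotients are arbitrary choices for illustration, and only a finite prefix of each sequence is computed, so the output suggests rather than proves that the leftover term is infinitesimal.

```python
from fractions import Fraction


def difference_quotients(func, r, n_terms):
    """Terms of (func(r + dr) - func(r)) / dr for dr = (1, 1/2, 1/4, ...)."""
    terms = []
    for n in range(n_terms):
        dr = Fraction(1, 2**n)
        terms.append((func(r + dr) - func(r)) / dr)
    return terms


def f(r):
    return r * r


def g(r):
    return 3 * r + 1


def fg(r):
    return f(r) * g(r)


r = Fraction(2)

# Value predicted by the product rule, using df/dr = 2r and dg/dr = 3.
product_rule_value = f(r) * 3 + g(r) * (2 * r)

# What remains of each difference quotient after removing that value.
residuals = [q - product_rule_value
             for q in difference_quotients(fg, r, 12)]

print(product_rule_value)
print(residuals[:4], residuals[-1])
```

For these particular choices each residual works out to dr x (19 + 3 x dr), so it shrinks in step with dr itself. This matches the role of the last term in the derivation, whose shadow is 0.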
Integrals require a bit more work, and the transfer principle is used much more heavily than with other nonstandard structures. The idea is basically exactly what we would like it to be: we define sums where we add an unlimited number of terms and each term is the infinitesimal area f(r)dr. The fundamental theorem of calculus becomes almost obvious once the nonstandard terminology is invoked and interpreted in its full literality.

And there is much, much more. Nonstandard analysis is a fascinating topic. As a closing remark, let me say that standard analysis works just as well as its nonstandard counterpart, although the latter introduces interesting structures that deserve attention and development in their own right. Nonstandard analysis has the great appeal of allowing us to rigorously talk of intuitive topics, and perhaps, as Kurt Gödel says in the opening quote of this essay, it may some day replace standard analysis as the commonly accepted method of proof. Standards change with time, and what is accepted today may be shunned tomorrow. In mathematics, as in any other discipline or study, trends and customs change in accordance with the times, but most people resist change for various reasons.

In general, I was rather liberal with the ASCII symbols I employed in this e-mail, but I attempted to be consistent and to pick meaningful, intuitive names as far as possible. Every time I introduced a new symbol I described its use. In case you run across a symbol you cannot recognize and cannot find the specific instance where I defined it before I started to use it, here is a quick reference list.
{}      Braces are used to list or describe the elements of a set
R       = {All real numbers}
Q       = {All rational numbers}
Z       = {All integers}
N       = {All positive integers, or natural numbers}
C       = {All complex numbers}
P       = {All prime numbers}

I am a little loose on symbolism and sometimes use R to denote the set of real numbers and sometimes the field of real numbers. It's a slight point to take into account and normally causes no confusion.

*R      = {All hyperreal numbers}
A       = Any old set (not to be confused with the universal quantifier)
*A      = The extension of set A
P(A)    = {All subsets of A}, called the power set of A
...     "and so on." Sometimes this means "continue the pattern indefinitely" or "continue until you reach the next specified term." The particular usage should be clear from context.
x       Multiplication (never used as a variable). Occasionally, when I feel that the context is clear, juxtaposition will also be used to denote multiplication.
a <= b  "a is less than or equal to b." Really shorthand for the statement (a < b \/ a = b)
E#      Existential quantifier. "There exists (...) such that"
A#      Universal quantifier. "For all"
rn      The nth term of sequence r
hal(r)  The halo around r
gal(r)  The galaxy around r
sh(r)   The shadow (standard part) of r
'='     "Is infinitely close to"
f(r)    Standard function of r
*f(r)   Extended nonstandard function of r

© 1994-2015 Drexel University. All rights reserved.

http://mathforum.org/

The Math Forum is a research and educational enterprise of the Drexel University School of Education.