[I've posted this message to the sci.stat.edu UseNet newsgroup and to four mail lists where there may be interest. Apologies to recipients who receive multiple copies.]
Consider three questions:
1. Is the concept of "entity" the most fundamental concept of human reality?
2. Are the concepts of "entity" and "property of an entity" appropriate concepts to teach at the beginning of an introductory statistics course?
3. What is a variable?
Consider a condensed version of how the concepts of entities, properties, and variables could be presented to students:
ENTITIES
If you stop and observe your train of thought at this moment, you will probably agree that you think about "things". For example, during the course of a minute or so, you might think about, among other things, a friend, an appointment, today's weather, and an idea. Each of these things is an example of an "entity".
Many different types of entities exist, for example,
- physical objects
- processes
- organisms
- events
- ideas
- societal entities (e.g., educational institutions)
- symbols
- forces
- waves
- mathematical entities (e.g., sets, numbers, vectors).
People usually view entities as existing in two different places: in the external world and in their minds. We use entities in our minds mainly to stand for entities in the external world, much as we use a map to stand for its territory.
Most people begin to use the concept of "entity" when they are very young. Most of us use the concept automatically as a way of organizing the multitude of stimuli that enter our minds minute by minute when we are awake.
Since everything (every thing) can be usefully viewed as being an entity, the concept of "entity" may be the most fundamental concept of human reality.
Because people use the concept of "entity" almost entirely at an unconscious level, some people have difficulty grasping the fundamental role that the concept plays in their thought.
The concept of "entity" is further concealed because it does not often appear directly in discussion in any of (1) everyday life, (2) empirical research, or (3) statistics. Direct discussion is usually omitted because, when dealing with specific issues, it is usually not necessary to drill down all the way to the foundational concept and discuss "things" at such a basic level. Instead, discussions (of specific issues) usually concern one or more particular *types* of entities, which are best referred to by their type names. For example, medical researchers often study a type of entity called "human beings".
However, as I argue below, the concept of "entity" serves as a foundation for most other concepts in statistics and empirical research. Therefore, discussion of the concept of "entity" at the beginning of an introductory statistics course is invaluable.
PROPERTIES OF ENTITIES
Every entity has associated with it a set of attributes or "properties". For example, all human beings have thousands of different properties, two of which are "height" and "blood group".
For any particular entity, each of its properties has a "value". We usually report the value of a property with words, with symbols, or with numbers. For example, your height might be 5 feet 9 inches.
If we need to *know* the value of a property of an entity, we can apply an appropriate measuring instrument *to* the entity. If the instrument is measuring properly, it will return a measurement to us that is an estimate of the value of the property in the entity at the time of the measurement. For example, if we need to know the (value of the) height (property) of a person, we can apply a height-measuring instrument (e.g., a tape measure) to the person, and the instrument will give us a number that is an estimate of the person's height.
VARIABLES
Empirical researchers and statisticians usually refer to properties of entities as *variables*. That is, when researchers or statisticians refer to a variable, they are usually referring (either specifically or generally) to some property of some type of entity.
Thus the important statistical concept of "variable" can be defined in terms of the three more fundamental concepts of "entity", "property of an entity", and "value of a property of an entity". A simple version of the definition is
   A "variable" is equivalent to a property of an entity.
I discuss other definitions of the concept of "variable" in the appendix.
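The definition above can be illustrated with a minimal sketch in Python. The class name Entity, the function name variable, the property names, and the three people with their values are all assumptions made up for the illustration; nothing here is taken from a particular statistical package.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One entity of some type, together with its properties and their values."""
    kind: str                                        # type of entity, e.g. "human being"
    properties: dict = field(default_factory=dict)   # property name -> value


def variable(entities, name):
    """Return the values of one property, one value per entity.

    In the terms of this note, the property `name` of this type of
    entity is the variable; the returned list holds its values.
    """
    return [e.properties[name] for e in entities]


# Three hypothetical entities of type "human being" (illustrative values only).
people = [
    Entity("human being", {"height_cm": 175, "blood_group": "A"}),
    Entity("human being", {"height_cm": 162, "blood_group": "O"}),
    Entity("human being", {"height_cm": 180, "blood_group": "B"}),
]

heights = variable(people, "height_cm")
blood_groups = variable(people, "blood_group")
print(heights)
print(blood_groups)
```

In this sketch a value never exists on its own: each one is reached only by going through an entity and one of its properties.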
USEFULNESS OF THE APPROACH
I have argued that we can use the concepts of "entity" and "property of an entity" and "value of a property of an entity" to define the concept of "variable". The three defining concepts are simple, intuitive, and fundamental. Thus I maintain that it is useful to introduce the three concepts at the beginning of an introductory statistics course, as a way of helping students to understand the concept of "variable".
I invite readers who disagree to present their views in the sci.stat.edu UseNet newsgroup (= EdStat-L).
LINK
The above points are part of a broader discussion of an approach to the introductory statistics course available at
-----------------------------------------------------------
Donald B. Macnaughton     MatStat Research Consulting Inc.
firstname.lastname@example.org     Toronto, Canada
Joint Statistical Mtgs, Session 201, Tuesday August 6, 2 PM
-----------------------------------------------------------
APPENDIX: SOME DEFINITIONS OF THE CONCEPT OF "VARIABLE"
To help evaluate the above characterization of a variable, it is useful to consider definitions of the concept that have been proposed by others.
Kruskal and Tanur (1978) have no entry for either "variable" or "random variable".
Kotz and Johnson (1982-1988) also do not define the term "variable". Their entry for "random variable" consists of "See PROBABILITY THEORY". In the entry for "probability theory" Heyde (1986) defines a random variable as a member of a certain class of real-valued functions of points in the sample space. Of course, the "points" are equivalent to entities.
Marriott (1990) gives the following definition:
   variable  Generally any quantity which varies. More precisely,
   a variable in the mathematical sense, i.e. a quantity which may
   take any one of a specified set of values. It is convenient to
   apply the same word to denote non-measurable characteristics,
   e.g. 'sex' is a variable in this sense since any human
   individual may take one of two 'values', male or female.
Marriott defines variables in terms of the concepts of "quantity" and "characteristic". These two somewhat abstract concepts are equivalent to the more tightly delineated concept of "property of an entity".
Marriott makes no reference to the general concept of "entity". However, it is clear that entities lurk in the background of his definition. For example, whenever a "sex" variable has a value, an entity, a particular organism whose sex has been determined ("measured") is somewhere about. In fact, for virtually *all* variables, it is reasonable to see entities existing behind the variables. (The entities are whatever are associated with the rows in a standard computer-package data table, in which the columns represent variables.)
I believe that we should move the entities in statistical analysis to the foreground since, from the point of view of empirical researchers, the entities are an important and tangible aspect of the research, and as such should not be left lurking in the background.
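The parenthetical remark about data tables can be made concrete with a small sketch in Python. The table contents, the "entity" label column, and the helper names variable_names and values_with_entities are hypothetical, chosen only for illustration; a real statistical package would have its own table type.

```python
# A hypothetical data table: each row stands for one entity (here, a person),
# and each column other than the "entity" label is a variable.
table = [
    {"entity": "person 1", "sex": "female", "height_cm": 162},
    {"entity": "person 2", "sex": "male", "height_cm": 175},
    {"entity": "person 3", "sex": "female", "height_cm": 170},
]


def variable_names(table):
    """List the variables (columns), leaving out the entity label."""
    return [name for name in table[0] if name != "entity"]


def values_with_entities(table, name):
    """Pair each value of a variable with the entity behind it."""
    return [(row["entity"], row[name]) for row in table]


names = variable_names(table)
sex_values = values_with_entities(table, "sex")
print(names)
print(sex_values)
```

Every value of the "sex" variable comes paired with a row, which is one way of seeing the entity that sits behind each value.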
Vogt (1993) defines a variable as
   Variable  Any finding (an attribute or characteristic) that can
   change, that can _vary_, or that can be expressed as more than
   one value or in _various_ values or categories. The opposite of
   a variable is a constant. For example, height: 5'7", 5'8", and
   so on; or religion: Catholic, Protestant, Jewish, Other; or
   experimental treatment: Drug A, Drug B, Drug C.
Vogt uses the concepts of "finding", "attribute", and "characteristic" to define variables. As with the defining concepts in Marriott's definition, these three somewhat abstract concepts are all equivalent to the more tightly delineated concept of "property of an entity".
Like Marriott, Vogt makes no direct reference to the concept of "entity" although there are entities lurking in the background in each of his three examples.
Modern definitions of the concept of "variable" are beginning to embrace the concept of "entity", although entities, once referred to in a definition, are still often given short shrift in the rest of the discussion.
For example, Freedman, Pisani, Purves, and Adhikari define a variable as
   A _variable_ is a characteristic which changes from person to
   person in a study (1991, 40).
These writers use the concept of "entity" in their definition but they seem to assume that only people can be entities. (One suspects, however, that this limitation is not their actual intent, and is instead an editing error.)
Moore, in his exemplary introductory statistics textbook, begins by defining "individuals" as the objects described by a set of data. Individuals may be people, but they may also be animals or things. He then defines a "variable" as any characteristic of an individual. A variable can take different values for different individuals (1995, 10).
Moore defines the concept of "individuals" (= "entities") in terms of the concepts of "objects", "people", "animals", and "things". Similarly, he defines the concept of "variable" in terms of the concept of "characteristic" (= "property").
(The choice of which *names* to use for the concepts "entity" and "property" is of some importance, with the choice perhaps being dictated by considerations of generality and ease of understanding. However, the present discussion is not about the choice of names for the concepts, but is about the concepts themselves, regardless of what we decide to call them.)
Note that Moore defines "individuals" in terms of the concept of a "set of data". If we assume that Moore is following the convention of defining each term in a conceptual system in terms of other more fundamental terms, his definition suggests that he views the concept of "set of data" as being more fundamental than the concept of "individual". Thus Moore appears to be taking a phenomenalistic approach.
My approach to defining the concept of "variable" is similar to Moore's except I suggest that it is useful to view the concept of "entity" (= "individual") as being more fundamental than the concept of a "set of data". In fact, I suggest we leave the concept of "entity" as a primitive. And although we can *illustrate* the concept of "entity" for students by discussing many examples of entities, we should tell students that the concept itself will, to avoid circularity, be left verbally undefined.
[As noted above, humans acquire the concept of "entity" as young children through non-verbal linking of consistent sets of stimuli. Thus one could argue that the stimuli (sense data) that one receives are the fundamental units of reality. At a preconscious level this approach seems quite reasonable. But at the conscious level, which is the level at which all human discussion about statistics must operate, the concept of "entity" seems to hold sway as the concept that is the basis of all other concepts. (After all, even properties and sets of data are entities.) Thus at the discussion level it makes sense to designate the concept of "entity" as fundamental and therefore verbally undefined.]
Similarly, I believe that the concepts of "properties of entities" and "values of properties of entities" are best left as primitives, defined solely through human experience and through discussion of examples.
On the other hand, I agree with Moore that we can give the concept "variable" a formal or informal verbal definition in terms of the concepts of entities, properties, and values.
In their comprehensive unified view of many of the main statistical topics, Kendall, Stuart, and Ord characterize variables in a way that is similar to the approach described in this note although they use somewhat different underpinnings. In particular, in volume 1 in the first sentence of chapter 1 they assert that the concept of "population" is "the fundamental notion in statistical theory" (1987, 1994). They then give five examples of different types of entities to illustrate what a population can be made of. However, although Kendall et al. populate their populations with entities, they do not recognize the concept of "entity" as being a concept in its own right, more fundamental than the concept of "population". Thus they seem to want to start in the ball game at second base.
In the second paragraph of chapter 1 Kendall et al. prepare for discussion of the concept of "variable" by discussing the concept of "properties". However, they concentrate on "properties of populations" as opposed to the more general concept of "properties of entities". (Entities are more general than populations because all populations are also entities, but not vice versa.) I believe that we should first introduce students to the fundamental concepts of "entity" and "property of an entity". Then we can define the concepts of "population" and "variable" in terms of those concepts. By building the discussion around what appear to be the most fundamental concepts of human reality, I believe we make the field of statistics substantially easier for students to understand.
REFERENCES
Freedman, D., Pisani, R., Purves, R., and Adhikari, A. (1991), _Statistics_ (2nd ed.), New York: Norton.
Heyde, C. C. (1986), "Probability Theory (Outline)" in _Encyclopedia of Statistical Sciences_ (Vol. 7), ed. S. Kotz and N. L. Johnson, New York: John Wiley, pp. 248-252.
Kendall, M., Stuart, A., and Ord, J. K. (1987, 1994), _Kendall's Advanced Theory of Statistics,_ (5th and 6th eds, 3 vols), London: Charles Griffin, Edward Arnold.
Kotz, S. and Johnson, N. L., eds. (1982-1988), _Encyclopedia of Statistical Sciences_ (9 vols), New York: John Wiley.
Kruskal, W. H. and Tanur, J. M., eds. (1978), _International Encyclopedia of Statistics_ (2 vols), New York: Free Press.
Marriott, F. H. C. (1990), _A Dictionary of Statistical Terms_ (5th ed.), Harlow, UK: Longman Scientific and Technical.
Moore, D. S. (1995), _The Basic Practice of Statistics,_ New York: Freeman.
Vogt, W. P. (1993), _Dictionary of Statistics and Methodology,_ Newbury Park, CA: Sage.