On Fri, 20 Jul 2007 14:55:28 -0000, loom91 <firstname.lastname@example.org> wrote:
>Hi,
>
>The students of statistics in my high-school have to do a project. The
>topic we have chosen is Educational Performance in Kolkata Schools and
>Its Relation to Gender and Gender Interactions. I outline our goals
>below:
>
>1)Collect data about Madhyamik results (a centralised exam organised
>by the state education board to pass the 10th grade). We will sample
>marks from all-boys schools, all-girls schools and co-educational
>schools. We have selected the schools to be of similar average
>educational performance. The average socio-economic condition of the
>students is also approximately the same, hopefully eliminating that
>variable.
>
>2)We will examine various parameters of the marks, with emphasis on
>the difference between the boys schools and the girls schools. These
>include central tendencies, standard deviations, skewness etc. We will
>also examine the myth that girls are relatively stronger at humanities
>subjects while boys are stronger at science subjects. We may also
>examine if the population is heterogeneous (that is, whether 'good
>students' and 'bad students' form two distinctly distributed
>populations).
>
>3)We will then analyse the data obtained from the co-educational
>schools and try to determine if the co-educational environment lessens
>the difference between boys and girls.
>
>Now, we are all grossly inexperienced and this is the first time we
>will attempt a statistical study instead of working in out classrooms
>with provided data. I'm eagerly seeking suggestions from experienced
>persons about possible pitfalls and how to make our results
>statistically meaningful.
>
>I'm also looking for specific help on the following topics:
>
>i)What is a suitable measure of whether girls are stronger at some
>subjects while boys at other subjects? I'm thinking of comparing the
>percentage of total marks obtained in one subject, standardised
>against the whole population. For example, consider the variable X =
>percentage of total marks earned in History+Geography. Next, we define
>the standardised (wrt the entire population) variate corresponding to
>X, let it be Z. Now we compute the mean of Z over the girls schools
>([itex]E_1(Z)[/itex]) and the mean over the boys schools ([itex]E_2(Z)
>[/itex]).
>
>If the first value is larger than the second value (it seems one will
>have to be positive and the other negative), then we may say that
>girls prefer humanities more over other subjects than boys. Next we
>can do the same analysis on the boys vs girls population in coed
>schools and see if the difference is less. By using the absolute marks
>instead of expressing it as percentage of total marks, we can also
>compare the relative performance (as opposed to preference) of boys
>and girls in humanities. The same can be done for languages and
>sciences. Is this a statistically sound measure (unlikely, since I
>just made it up)? What are the alternatives?
>
>ii) What is a good way of identifying whether the population in a
>school indeed consists of discreet stratas? This could be good
>students/bad students (there is indication from previous results that
>this may be the case) or in coed schools boys/girls (very likely the
>case). In case of coed schools, there may even be four stratas: good
>boys, good girls, bad boys, bad girls. It will be interesting to study
>whether bad boys vs girls show more difference than good boys vs
>girls. All this sounds very pretty, but I don't know how to separate
>the population into stratas.
>
>iii)Is there some easily available (preferably free) software that
>will let me do all this analysis (brownie points for fitting
>probability distributions and graphing)? It would be a nightmare to do
>this by hand since we usually work with less than 50 data points
>instead of several hundred.
>
>iv)As it stand right now, we will sample two boys schools, two girls
>schools and one coed school. Is this enough to be statistically
>significant? How many data points should we sample from each school?
>Should this be a constant or proportional to the total number of
>students?
>
>v)Finally, is the whole proposition so glaringly ridiculous that all
>serious statisticians will simply laugh at it? I hope not :redface:
>
>I hope you will help out. We have in all probability bitten off more
>than we can chew. But we are hoping to do some meaningful work
>publishable in a journal, so we need all the help we can get. I will
>also be very grateful if you give me the email of someone who may be
>able and willing to help. We will be marked for this in our school-
>finishing (and career determining) central exams, so this is very
>important to our whole class. Thanks a lot.
>
>Molu
Interesting project. You are so right to note that a real-world project is much more complex than doing some calculations.
An important point is what I might call "impeccable honesty". Report what you do, and report it honestly, without exaggeration. There are limits to what you can do, for various reasons (obviously including time constraints). To the extent that you accurately describe what you do and its limitations, you have made a contribution. It does not matter whether you do or don't find some statistically significant effect. Even if the stats suggest that you do, the conclusion may be undermined by hidden assumptions; if those hidden assumptions are fairly obvious concerns, then it really is your obligation to point them out.
As a small example from what you wrote above... You said that a certain comparison of scores would allow you to say "... then we may say that girls prefer humanities more over other subjects than boys." No, not at all. It is about "scores" or "performance", not about "preference". You have given an example of how you tried to eliminate one "confounding" variable. But for all you know, some of what you see will be due to one teacher who is different. It is almost impossible for you to deal with that. But it is a reminder of the issue of hidden variables.
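To make the score comparison you describe concrete, here is a rough sketch in Python of standardising X against the pooled population and then averaging Z within each group. The function names, the group labels and the numbers are all made up for illustration (this is not real Madhyamik data), and it is only one way of setting up the calculation. Whatever difference it shows is a difference in scores, nothing more.

```python
from statistics import mean, pstdev

def standardise(values):
    """Z-scores of the values relative to the whole pooled population."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the pooled data
    return [(v - mu) / sigma for v in values]

def group_mean_z(records):
    """records: list of (group, x) pairs, where x is the percentage of a
    student's total marks earned in the subject group of interest.
    Returns the mean z-score for each group."""
    z = standardise([x for _, x in records])
    by_group = {}
    for (group, _), zi in zip(records, z):
        by_group.setdefault(group, []).append(zi)
    return {g: mean(zs) for g, zs in by_group.items()}

# Hypothetical illustrative numbers only.
records = [("girls", 24.0), ("girls", 26.0), ("girls", 25.0),
           ("boys", 22.0), ("boys", 23.0), ("boys", 24.0)]
result = group_mean_z(records)
print(result)
```

Because Z is standardised over the pooled data, the group means weighted by group size sum to zero, so with two groups one mean cannot be positive unless the other is negative. The sign alone therefore tells you little; the size of the gap relative to the spread within each group is what matters.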
What you are doing is science. A well done statistical survey is a good scientific project. One asks a question, tries to formulate an approach, and then collects some data. In the real world, it is common that the initial effort leads to more questions than answers; it is an iterative process. You won't have time for that, but you can remember that how you present the work and how you analyze the results you obtain are as important as (more important than!) whether you find any particular result. So emphasize good clear thinking at all steps. Ending with a list of questions that come up is probably a sign of success.
There are some web sites that offer calculation tools online. They also offer various degrees of explanation. I have no direct experience with them, and thus no advice about which are good or easy to use, or whatever. So you might just explore. Here are some I am aware of: