On Jul 20, 10:55 am, loom91 <loo...@gmail.com> wrote:
> Hi,
>
> The students of statistics in my high-school have to do a project. The
> topic we have chosen is Educational Performance in Kolkata Schools and
> Its Relation to Gender and Gender Interactions. I outline our goals
> below:
>
> 1)Collect data about Madhyamik results (a centralised exam organised
> by the state education board to pass the 10th grade). We will sample
> marks from all-boys schools, all-girls schools and co-educational
> schools. We have selected the schools to be of similar average
> educational performance. The average socio-economic condition of the
> students is also approximately the same, hopefully eliminating that
> variable.
>
> 2)We will examine various parameters of the marks, with emphasis on
> the difference between the boys schools and the girls schools. These
> include central tendencies, standard deviations, skewness etc. We will
> also examine the myth that girls are relatively stronger at humanities
> subjects while boys are stronger at science subjects. We may also
> examine if the population is heterogeneous (that is, whether 'good
> students' and 'bad students' form two distinctly distributed
> populations).
>
> 3)We will then analyse the data obtained from the co-educational
> schools and try to determine if the co-educational environment lessens
> the difference between boys and girls.
>
> Now, we are all grossly inexperienced and this is the first time we
> will attempt a statistical study instead of working in out classrooms
> with provided data. I'm eagerly seeking suggestions from experienced
> persons about possible pitfalls and how to make our results
> statistically meaningful.
>
> I'm also looking for specific help on the following topics:
>
> i)What is a suitable measure of whether girls are stronger at some
> subjects while boys at other subjects?
Assuming you have the individual subject-test scores for each person, let mu_bi be the population mean for boys in subject i, and mu_gi be the population mean for girls in subject i. For each subject, test two one-sided alternatives against the null hypothesis mu_bi = mu_gi: (1) mu_bi > mu_gi and (2) mu_gi > mu_bi. Performing this sort of hypothesis test is one of the first things that most statistics texts cover when comparing two populations. If exactly one of the tests comes out significant, then your evidence supports a gender bias in that subject, in the direction of that test. If neither test is significant, then the evidence indicates no bias (or, more correctly stated, fails to show that there is a bias). If this is a beginning statistics class, then you should be certain to explain to them the difference between those two phrasings.
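For what it's worth, here is a minimal sketch of that pair of one-sided tests in Python, using only the standard library. It assumes large samples, so that a normal (z) approximation is reasonable; with small samples a t-test would be the usual choice. The function names and the 0.05 significance level are my own choices for illustration, not anything prescribed by the project.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def one_sided_p_value(sample_a, sample_b):
    """P-value for the alternative mean(A) > mean(B), large-sample z approximation."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference in sample means (variances may differ).
    se = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    z = (mean(sample_a) - mean(sample_b)) / se
    # Upper-tail probability under the standard normal distribution.
    return 1.0 - NormalDist().cdf(z)


def compare_subject(boys, girls, alpha=0.05):
    """Run both one-sided tests for one subject and summarise the outcome."""
    p_boys_higher = one_sided_p_value(boys, girls)   # hypothesis (1)
    p_girls_higher = one_sided_p_value(girls, boys)  # hypothesis (2)
    if p_boys_higher < alpha:
        return "evidence that boys score higher"
    if p_girls_higher < alpha:
        return "evidence that girls score higher"
    # Neither test is significant: we have failed to show a difference,
    # which is not the same as showing there is none.
    return "no significant difference shown"
```

You would call `compare_subject(boys_marks, girls_marks)` once per subject, passing the two lists of marks. Note that running both one-sided tests at level alpha amounts to a two-sided test at level 2*alpha, so you may prefer alpha = 0.025 for each.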
>
> iii)Is there some easily available (preferably free) software that
> will let me do all this analysis (brownie points for fitting
> probability distributions and graphing)? It would be a nightmare to do
> this by hand since we usually work with less than 50 data points
> instead of several hundred.

Offhand, I don't know of any free software, but it is likely that your school already has one or more statistics packages on its computers. For that matter, at this level even Excel will have sufficient tools (although you may need to enable its Analysis ToolPak add-in).
> iv)As it stand right now, we will sample two boys schools, two girls
> schools and one coed school. Is this enough to be statistically
> significant? How many data points should we sample from each school?
> Should this be a constant or proportional to the total number of
> students?
To compare the individual schools, of course, this is fine (as long as you have a decent-sized sample from each). To compare TYPES of schools, no, it is insufficient, because each school is then effectively a single data point and you only have one or two per type. As for the sample size from each school, I would suggest a minimum of the larger of 30 students or 5% of the student population (these numbers are the same at 600 students). This way you can likely use a normal approximation for the distribution of the sample mean, even if the individual scores are not normally distributed. Many statistical tests have simpler forms under a normal approximation, which may allow you to have the class do the analysis by hand. If you use a software package, then this is obviously less important. In any event, larger samples will make your hypothesis tests more sensitive and your estimates more precise.
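The rule of thumb above works out as follows. This is just a sketch: the function name is made up, and capping the result at the school's enrollment is my own assumption for very small schools.

```python
def suggested_sample_size(enrollment, floor=30, percent=5):
    """Larger of `floor` students or `percent`% of enrollment (rounded up)."""
    # Integer arithmetic avoids floating-point surprises when rounding up.
    share = (enrollment * percent + 99) // 100
    # Assumption: you cannot sample more students than the school has.
    return min(enrollment, max(floor, share))
```

For a school of 600 students both criteria give 30; below that the floor of 30 applies, and above it the 5% share takes over (50 students for a school of 1000).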
> v)Finally, is the whole proposition so glaringly ridiculous that all
> serious statisticians will simply laugh at it? I hope not :redface:

Not at all. It is great if you can use an example like this (as opposed to textbook work). This should be a very good problem for a first-year statistics class. Of course, if your results show a significant difference between the schools in your district, there may be some bruised egos in the administration(s), but that is a problem outside the scope of statistics ;-)