Education
Who was Student and why do we care so much about his t-test?
Detecting differences
The fundamental question asked in most scientific investigations is whether some intervention has an effect on a measurable, biologically important parameter. Alternatively, observed populations are tested to determine whether their characteristics are the same or different. As an example, we have asked the question: when evaluating a cohort of individuals about to undergo weight loss surgery, are the males heavier than the females? For continuous data such as body weight, the
Student
Student was the nom de plume of William Sealy Gosset, who worked as the statistician for the Guinness brewery [1]. Neither well known nor an academic, he was reluctant to publish under his actual name. Gosset felt insecure about his analysis of sample size effects on significance testing. He chose to remain anonymous until R.A. Fisher, one of the best-known statisticians of the day, publicly validated and refined Student’s analysis [2].
The Guinness brewery made an effort to hire highly
Central limit theorem
Sample size determines how closely a sample mean is likely to match the population mean. Figure 2 illustrates this concept. Figures 2A and 2C show the scattergram and frequency distribution of 100 repeated calculations of the mean, each using five randomly selected values from a data set of 1067 patients undergoing weight loss surgery. Figures 2B and 2D show the effect seen when 100 values are included in each mean. These figures demonstrate the effect of the central limit
t-values
Given the uncertainties present when sample sizes are small, determining statistical significance requires accounting for the difference between mean values, the scatter inherent in the data (i.e., its standard deviation), and the sample sizes. Student combined these concepts into a single equation that calculated a t-value from which statistical significance could be determined [3]: t is the t-statistic whose numerical value is proportional to the probability that the
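The equation itself is not reproduced in this excerpt. As a rough illustration of how the three ingredients named above (difference between means, scatter, and sample sizes) can be combined, the sketch below computes the standard pooled-variance two-sample t-statistic. This form assumes equal variances in the two groups and may differ from the equation the article presents; the function name `pooled_t_statistic` and the body weights are hypothetical, made up for illustration only.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for two independent samples.

    Assumption: both groups share a common variance (pooled-variance form).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # Sample variances (n - 1 in the denominator) capture the scatter.
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)

    df = n_a + n_b - 2
    # Weight each group's variance by its own degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # The standard error of the difference shrinks as sample sizes grow.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Hypothetical body weights in kg, NOT data from the article.
males = [142.0, 155.5, 138.2, 160.1, 149.7, 151.3]
females = [128.4, 135.0, 122.9, 140.2, 131.6, 126.8]

t_value, df = pooled_t_statistic(males, females)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

The resulting t-value is then compared against the t distribution with the reported degrees of freedom to obtain a p-value. A larger difference between means, less scatter, or larger samples all push the absolute value of t upward.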
Central limit theorem
The means of random samples from any distribution with finite variance will themselves have an approximately normal distribution, and the approximation improves as the sample size grows. Increasing the number of values increases the probability that the calculated mean value is close to the real one for the population. Stated another way: the probability that any calculated mean value is close to the actual population mean decreases as the sample size becomes smaller.
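A small simulation can make this definition concrete. The sketch below repeatedly draws samples from a deliberately skewed, made-up population and compares the spread of the sample means for small and large samples. The population of 1067 values, the sample sizes of 5 and 100, and the 100 repeats are chosen to echo the setup described for Figure 2, but the values themselves are synthetic, and the helper `sample_means` is a hypothetical name.

```python
import random
import statistics


def sample_means(population, sample_size, repeats, rng):
    """Mean of `sample_size` values drawn without replacement, repeated `repeats` times."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Synthetic, right-skewed "population" of 1067 values (an assumption,
# not the article's patient data); exponential noise makes it non-normal.
population = [100 + rng.expovariate(1 / 30) for _ in range(1067)]
population_mean = statistics.fmean(population)

means_n5 = sample_means(population, 5, 100, rng)
means_n100 = sample_means(population, 100, 100, rng)

# The standard deviation of the sample means measures how far a single
# calculated mean typically lands from the population mean.
spread_n5 = statistics.stdev(means_n5)
spread_n100 = statistics.stdev(means_n100)

print(f"population mean:          {population_mean:.1f}")
print(f"spread of means (n=5):    {spread_n5:.2f}")
print(f"spread of means (n=100):  {spread_n100:.2f}")
```

With larger samples the calculated means cluster much more tightly around the population mean, which is the behavior the definition describes.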
Gaussian distribution
The familiar bell-shaped curve.
Homoskedasticity
Variances between two groups to be compared are equal (similarly,
References (4)
“Student” as a man. Biometrika (1939)
Studies in the history of probability and statistics. XX. Some early correspondence between W. S. Gosset, R. A. Fisher and Karl Pearson, with notes and comments. Biometrika (1968)