The Question¶
“We measured 9,148 babies. But we want to say something about ALL babies. How good is our estimate?”
Everything computed so far is a sample statistic — a number computed from the data we have. What we really want is a population parameter — the true value in the real world.
The sample mean $\bar{x}$ is our best guess for the population mean $\mu$. But how far off is it likely to be?
The Estimation Game¶
Suppose you know a population has a normal distribution with unknown $\mu$ and $\sigma$. You draw a random sample of $n$ observations. What is your best estimate of $\mu$?
Answer: the sample mean $\bar{x}$. This is the maximum likelihood estimator of $\mu$ for a normal distribution.
But there’s a subtlety for variance. The naive estimate is:

$$S^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

This is biased — it systematically underestimates the true $\sigma^2$. The unbiased estimator divides by $n-1$ instead:

$$S_{n-1}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Why $n-1$? Because we estimated $\bar{x}$ from the data. That used up one “degree of freedom” — the deviations $x_i - \bar{x}$ sum to zero, so once $n-1$ of them are known, the last one is not free. Dividing by $n-1$ corrects for this.
For large $n$, the difference is negligible. For small $n$, it matters.
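This bias is easy to check by simulation. A minimal sketch with NumPy (the sample size and parameters here are arbitrary choices): averaging the divide-by-$n$ estimate over many samples falls short of the true variance, while the divide-by-$(n-1)$ version hits it.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 0.0, 1.0, 5               # small n, true variance = 1
biased, unbiased = [], []
for _ in range(100_000):
    x = rng.normal(mu, sigma, size=n)
    biased.append(np.var(x))             # ddof=0: divides by n
    unbiased.append(np.var(x, ddof=1))   # ddof=1: divides by n - 1
print(np.mean(biased))    # close to (n-1)/n * sigma^2 = 0.8
print(np.mean(unbiased))  # close to sigma^2 = 1.0
```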
Sampling Distributions¶
If you draw many samples and compute the mean of each, the distribution of those means is called the sampling distribution of the mean.
Key result: for a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the mean has:

$$\text{mean} = \mu, \qquad \text{standard deviation} = \frac{\sigma}{\sqrt{n}}$$
The standard error shrinks as $1/\sqrt{n}$ — doubling the sample size cuts the error by a factor of $\sqrt{2}$.
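A quick simulation (a sketch; the population parameters are arbitrary) confirms both claims: the sample means cluster around $\mu$, with spread close to $\sigma/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 100.0, 15.0, 36
# 10,000 samples of size n; one mean per row
means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
print(means.mean())  # close to mu = 100
print(means.std())   # close to sigma / sqrt(n) = 2.5
```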
Standard Error¶
The standard error (SE) measures how variable your estimate is:

$$\text{SE} = \frac{\sigma}{\sqrt{n}}$$
For birth weight: $\sigma \approx 1.4$ lbs and $n = 9148$, so $\text{SE} \approx 1.4 / \sqrt{9148} \approx 0.015$ lbs. Our estimate of the mean is likely within about 0.015 lbs of the true mean.
For the pregnancy length difference (13 hours), we need to compute SE for a difference of means — this is what determines whether 13 hours is detectable or noise.
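For two independent samples, the standard errors combine in quadrature: $\text{SE}_{\text{diff}} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. A sketch with illustrative numbers (the spreads and group sizes below are made up, not the actual NSFG values):

```python
import math

def se_diff(sigma1, n1, sigma2, n2):
    """Standard error of the difference of two independent sample means."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Hypothetical pregnancy-length spreads (weeks) and group sizes
se = se_diff(2.7, 4413, 2.9, 4735)
print(se)           # SE of the difference, in weeks
print(se * 7 * 24)  # converted to hours, for comparison with the 13-hour gap
```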
Sampling Bias¶
Not all estimation errors are random. Bias is a systematic error that doesn’t average away with more data.
Example: the class size paradox from Chapter 3. If you sample students and ask their class size, you oversample large classes. The estimate of mean class size is biased upward — taking more data makes it more precise but not more accurate.
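A toy simulation (class sizes invented for illustration) makes the bias concrete: sampling students rather than classes picks each class with probability proportional to its size, and the inflation does not shrink as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(4)
classes = np.array([10] * 8 + [50] * 2)  # hypothetical: many small, few large
true_mean = classes.mean()               # 18.0

# Sampling by student weights each class by its enrollment
students = np.repeat(classes, classes)   # one entry per enrolled student
reported = rng.choice(students, size=100_000)
print(true_mean)
print(reported.mean())  # close to sum(size^2) / sum(size), about 32.2
```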
In NSFG: the dataset uses survey weights (finalwgt) to correct for
deliberate oversampling of minority groups. If you ignore the weights, your
estimates of national statistics will be biased.
Bootstrap: Simulation-Based Standard Errors¶
We often can’t compute SE analytically. The bootstrap gives us a simulation-based alternative:
Draw bootstrap samples: resample with replacement from your data
Compute the statistic (mean, median, correlation, anything) for each sample
The standard deviation of the estimates is the bootstrap SE
```python
import numpy as np

def bootstrap_mean(data, n_boot=1000):
    """Bootstrap standard error of the mean."""
    estimates = []
    for _ in range(n_boot):
        # Resample with replacement, same size as the original data
        sample = np.random.choice(data, size=len(data), replace=True)
        estimates.append(sample.mean())
    return np.std(estimates)
```

The bootstrap works because resampling with replacement simulates what would happen if you repeated the study with new data from the same population.
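The same loop works for any statistic, not just the mean. A generalized sketch (the function name and synthetic data below are mine, for illustration only):

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Bootstrap standard error of an arbitrary statistic."""
    rng = np.random.default_rng(seed)
    estimates = [statistic(rng.choice(data, size=len(data), replace=True))
                 for _ in range(n_boot)]
    return np.std(estimates)

# Synthetic stand-in for birth weights (lbs)
data = np.random.default_rng(2).normal(7.3, 1.4, size=500)
print(bootstrap_se(data, np.mean))
print(bootstrap_se(data, np.median))
```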
Exponential Distribution Estimation¶
For an exponential distribution, the MLE of $\lambda$ is $\hat{\lambda} = 1/\bar{x}$.
But this estimator is biased! An unbiased estimator is:

$$\hat{\lambda}_{\text{unbiased}} = \frac{n-1}{n} \cdot \frac{1}{\bar{x}}$$

The bias disappears as $n \to \infty$, but for small samples it matters.
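A simulation sketch (parameters arbitrary) shows the effect: for $n = 5$, the raw MLE $1/\bar{x}$ averages about $\frac{n}{n-1}\lambda = 1.25\lambda$, while the corrected estimator averages $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n = 2.0, 5
mle, corrected = [], []
for _ in range(200_000):
    x = rng.exponential(scale=1 / lam, size=n)
    lhat = 1 / x.mean()                    # MLE: 1 / xbar
    mle.append(lhat)
    corrected.append((n - 1) / n * lhat)   # bias-corrected
print(np.mean(mle))        # close to n/(n-1) * lam = 2.5
print(np.mean(corrected))  # close to lam = 2.0
```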
Exercises¶
Simulate drawing 100 samples of size $n$ from the birth weight distribution. Plot the sampling distribution of the mean.
Compute the standard error of the mean analytically and compare to the simulated std of sample means.
Bootstrap the median birth weight. What is the bootstrap SE?
Compute the mean with and without survey weights (finalwgt). How different are they?
Show that the $1/n$ variance estimator is biased: simulate many samples, compute the variance each time, and average. Does it equal the true variance?
Glossary¶
sample statistic — a value computed from data (e.g., the sample mean $\bar{x}$)
population parameter — the true value in the underlying population (e.g., $\mu$)
estimator — a rule or formula for estimating a population parameter from sample data
bias — systematic error; $\text{bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$
unbiased estimator — one with $E[\hat{\theta}] = \theta$; right on average
sampling distribution — distribution of a statistic over many repeated samples
standard error (SE) — standard deviation of the sampling distribution; $\text{SE} = \sigma/\sqrt{n}$ for the mean
bootstrap — resample with replacement to simulate the sampling distribution
degrees of freedom — number of values free to vary; affects bias in variance estimation