Statistics — From Data to Insight

DhvaniAI

A complete 14-chapter statistics course following Allen Downey’s Think Stats structure, built from scratch using NumPy and Pandas — no black-box helper classes.

Where This Fits in the Curriculum

This folder is not the starting point. See math/README.md for the full sequence.

probability/part0 → part1          ← start here (randomness, Bernoulli)
        ↓
statistics/ch01 → ch02             ← jump here (real data + histograms)
        ↓
probability/part2 → part6          ← return here (distributions, motivated now)
        ↓
statistics/ch03 → ch14             ← finish here (inference, modeling, testing)

The pivot: after ch02 (histograms), students ask “what math model fits this shape?” That question drives them back into probability to learn the Normal and Poisson distributions and the Central Limit Theorem (CLT). Then they return here with the tools to go deeper.
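To make the pivot question concrete, here is a minimal sketch that compares a histogram with a candidate Normal model. It uses synthetic numbers rather than NSFG data, and the variable names are illustrative assumptions, not taken from the course files.

```python
import numpy as np

# Synthetic stand-in for a real column such as birth weight (assumption, not NSFG data).
rng = np.random.default_rng(0)
weights = rng.normal(loc=7.3, scale=1.2, size=5_000)

# Empirical shape: a density-normalized histogram.
density, edges = np.histogram(weights, bins=30, density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Candidate model: Normal density using the sample mean and standard deviation.
mu, sigma = weights.mean(), weights.std(ddof=1)
model = np.exp(-0.5 * ((centers - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Largest gap between histogram height and model density across the bins.
max_gap = np.abs(density - model).max()
print(f"mu={mu:.2f}, sigma={sigma:.2f}, max gap={max_gap:.3f}")
```

A small gap suggests the two-parameter model describes the shape reasonably well; a large gap is a hint to try a different distribution.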

Dataset

NSFG: National Survey of Family Growth (2002). A US government survey of women covering pregnancies, births, and health; the 2002 cycle has roughly 7,600 female respondents reporting about 13,600 pregnancies. We use it to ask increasingly sophisticated questions across all 14 chapters.

python data/download_nsfg.py   # run once to download raw data files

Chapter Map

| Chapter | File | Core Question | Tool Built |
|---|---|---|---|
| 1 | ch01_eda.py | Are first babies born late? | Data loading, cleaning |
| 2 | ch02_distributions.py | What does the distribution look like? | Histogram |
| 3 | ch03_pmf.py | How do we compare unequal groups? | PMF from scratch |
| 4 | ch04_cdf.py | What percentile is your birth weight? | CDF from scratch |
| 5 | ch05_modeling.py | Can 2 numbers describe this data? | Normal, Exponential, Pareto |
| 6 | ch06_pdf.py | What is the true shape? | KDE, moments, skewness |
| 7 | ch07_relationships.py | Does age predict birth weight? | Correlation, Covariance |
| 8 | ch08_estimation.py | How good is your guess? | Bootstrap, sampling dist |
| 9 | ch09_hypothesis_testing.py | Is the first-baby effect real? | Permutation test |
| 10 | ch10_least_squares.py | What’s the best line? | OLS from scratch |
| 11 | ch11_regression.py | What predicts preterm birth? | Multiple + logistic regression |
| 12 | ch12_time_series.py | Has birth weight changed over time? | Autocorrelation, moving avg |
| 13 | ch13_survival.py | How long until next pregnancy? | Kaplan-Meier |
| 14 | ch14_analytic_methods.py | When can we skip simulation? | CLT, normal approximation |
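As a rough illustration of what "from scratch" might look like for the PMF and CDF tools in chapters 3 and 4, the sketch below builds both with plain Pandas and NumPy. The function names (`make_pmf`, `make_cdf`, `percentile_rank`) and the toy sample are hypothetical and may differ from what the chapter files actually define.

```python
import numpy as np
import pandas as pd

def make_pmf(values):
    """Map each distinct value to its relative frequency."""
    counts = pd.Series(values).value_counts().sort_index()
    return counts / counts.sum()

def make_cdf(values):
    """Map each distinct value to the fraction of data less than or equal to it."""
    return make_pmf(values).cumsum()

def percentile_rank(values, x):
    """Percentage of observations less than or equal to x."""
    return 100.0 * np.mean(np.asarray(values) <= x)

# Toy pregnancy lengths in weeks (made-up numbers, not NSFG data).
sample = [38, 39, 39, 40, 40, 40, 41, 41, 42, 44]
pmf = make_pmf(sample)
cdf = make_cdf(sample)
print(pmf)
print(cdf)
print(percentile_rank(sample, 40))
```

Because a PMF is normalized by group size, two groups of unequal size can be compared on the same scale, which is the question chapter 3 poses.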

Learning Flow

Every chapter follows the same structure:

Practical Question → Intuition → Math → Code (from scratch) → Simulation → Interpretation
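The flow above can be sketched in miniature with a permutation test, the tool listed for chapter 9. This is only an assumed outline of how a chapter script might be organized; the data are synthetic and the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Practical question: do two groups differ in mean pregnancy length?
# Synthetic stand-ins (assumption): these are NOT real NSFG values.
first = rng.normal(38.6, 2.7, size=400)
others = rng.normal(38.5, 2.6, size=450)

# Math: the test statistic is the absolute difference in group means.
def abs_mean_diff(a, b):
    return abs(a.mean() - b.mean())

observed = abs_mean_diff(first, others)

# Simulation: shuffle the pooled data to mimic "no difference between groups".
pooled = np.concatenate([first, others])
n_first = len(first)
n_iter = 2_000
null_stats = np.empty(n_iter)
for i in range(n_iter):
    shuffled = rng.permutation(pooled)
    null_stats[i] = abs_mean_diff(shuffled[:n_first], shuffled[n_first:])

# Interpretation: the p-value is the share of shuffles at least as extreme as observed.
p_value = np.mean(null_stats >= observed)
print(f"observed diff = {observed:.3f} weeks, p-value = {p_value:.3f}")
```

A large p-value would mean a difference of this size shows up often by chance alone, so the data give little evidence of a real effect.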

Running a Chapter

cd math/statistics   # from the repository root
python ch01_eda.py

Philosophy