Why Direct Pixel Comparison Fails — and Where Normalisation Hits Its Ceiling¶
Part I built a complete picture of everything that corrupts a pixel value. Part II asks: given that corruption, can we still compare images reliably? The answer is: sometimes yes, sometimes no — and the boundary between those two cases is the most important concept in classical computer vision.
6.1 The Experiment¶
Pixel comparison means: given a reference patch $T$ and a query image $I$, find the location $(x, y)$ in $I$ that best matches $T$.
The simplest metric is Sum of Squared Differences (SSD):

$$\mathrm{SSD}(x, y) = \sum_{i,j} \bigl( I_{x,y}(i,j) - T(i,j) \bigr)^2$$

where $I_{x,y}$ is the patch at position $(x, y)$. A perfect match gives SSD = 0.
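A brute-force sketch of this search, assuming grayscale NumPy arrays (the double loop is written for clarity; production code would use `cv2.matchTemplate` with `TM_SQDIFF`):

```python
import numpy as np

def ssd(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared differences between two equal-sized patches."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float((d ** 2).sum())

def match_ssd(image: np.ndarray, patch: np.ndarray) -> tuple:
    """Slide `patch` over `image`; return (row, col) of the minimal SSD."""
    ph, pw = patch.shape
    scores = np.array([
        [ssd(image[y:y + ph, x:x + pw], patch)
         for x in range(image.shape[1] - pw + 1)]
        for y in range(image.shape[0] - ph + 1)
    ])
    return np.unravel_index(np.argmin(scores), scores.shape)
```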
Test it under three real-world transforms:
| Transform | Real-world cause | What changes |
|---|---|---|
| Brightness ×0.7 | Lighting change, cloud cover | All pixel values scale |
| Rotation 5° | Camera tilt, part misalignment | Pixel grid shifts |
| Scale 90% | Different camera distance | Object size in pixels |
SSD fails on all three. The question is: which failures are fixable by math, and which are not?
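A hedged sketch of that experiment using OpenCV; the filename `scene.png` and the patch coordinates are placeholders, not fixtures from this tutorial's code:

```python
import cv2
import numpy as np

# Placeholder image and patch; any textured grayscale image works.
img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
patch = img[100:164, 100:164].copy()

h, w = img.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), 5, 1.0)  # 5-degree rotation
variants = {
    "identity":        img,
    "brightness x0.7": np.clip(img * 0.7, 0, 255).astype(np.uint8),
    "rotation 5 deg":  cv2.warpAffine(img, M, (w, h)),
    "scale 90%":       cv2.resize(img, None, fx=0.9, fy=0.9),
}
for name, variant in variants.items():
    scores = cv2.matchTemplate(variant, patch, cv2.TM_SQDIFF)
    print(f"{name}: best SSD = {scores.min():.0f}")
# Expected: ~0 for the identity, large for all three transforms.
```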
6.2 Fixable: The Affine Lighting Model¶
From Chapter 4, different lighting produces $I' = \alpha I + \beta$. This is a mathematical relationship between pixel values — the pixels are still in the same positions, just with different values. Math can compensate.
Step 1 — L2 normalisation (handle contrast, $\alpha$)¶
Divide each patch by its L2 norm:

$$\hat{T} = \frac{T}{\|T\|_2}$$

If $T' = \alpha T$ (with $\alpha > 0$), then $\hat{T}' = \hat{T}$ — the scale factor cancels. Contrast changes are removed.
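A two-line check of this cancellation (the patch is random, standing in for any image data):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random((8, 8))          # stand-in patch
unit = lambda p: p / np.linalg.norm(p)

# alpha = 0.7 contrast change: the unit vectors coincide.
print(np.allclose(unit(T), unit(0.7 * T)))   # True (for alpha > 0)
```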
Step 2 — Mean subtraction (handle brightness offset, $\beta$)¶
Subtract the patch mean before normalising:

$$\tilde{T} = T - \bar{T}$$

If $T' = T + \beta$, then $\tilde{T}' = \tilde{T}$ — the offset cancels. Brightness changes are removed.
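The same kind of check for the offset:

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random((8, 8))          # stand-in patch
centre = lambda p: p - p.mean()

# beta = 30 brightness offset: the centred patches coincide.
print(np.allclose(centre(T), centre(T + 30.0)))   # True
```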
Step 3 — Pearson correlation (handle full affine $\alpha I + \beta$)¶
Combine both: subtract mean, then normalise. The result is the Pearson correlation coefficient, which equals OpenCV's TM_CCOEFF_NORMED:

$$\rho(T, P) = \frac{\sum_{i,j} \bigl(T(i,j) - \bar{T}\bigr)\,\bigl(P(i,j) - \bar{P}\bigr)}{\sqrt{\sum_{i,j} \bigl(T(i,j) - \bar{T}\bigr)^2}\,\sqrt{\sum_{i,j} \bigl(P(i,j) - \bar{P}\bigr)^2}}$$

$\rho = 1$ for a perfect match under any affine lighting transform $\alpha T + \beta$ (with $\alpha > 0$). This is the best pixel-level comparator possible under the affine lighting model.
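A quick sanity check that the hand-computed coefficient agrees with OpenCV; the template here is a random array with a synthetic affine copy:

```python
import cv2
import numpy as np

rng = np.random.default_rng(1)
T = rng.random((16, 16)).astype(np.float32)   # stand-in template
P = 0.5 * T + 0.2                             # affine lighting copy

def pearson(a, b):
    """Pearson correlation: mean-subtract, then normalise, then dot."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

# A same-sized image gives a 1x1 response map: one correlation value.
score = cv2.matchTemplate(P, T, cv2.TM_CCOEFF_NORMED)[0, 0]
print(f"hand-rolled: {pearson(T, P):.6f}, OpenCV: {score:.6f}")  # both ~1.0
```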
6.3 The Normalisation Ceiling¶
Pearson correlation handles $\alpha I + \beta$ perfectly. But the real world produces transforms that move which pixel is where:
| Transform | What breaks | Fixable by normalisation? |
|---|---|---|
| Brightness change ($\beta$) | Pixel values | ✓ Yes — mean subtraction |
| Contrast change ($\alpha$) | Pixel values | ✓ Yes — L2 normalisation |
| Rotation | Pixel positions | ✗ No |
| Scale | Pixel positions | ✗ No |
| Viewpoint change | Pixel positions | ✗ No |
| Occlusion | Some pixels missing | ✗ No |
Normalisation operates on pixel values at fixed positions. Once the grid shifts — due to rotation, scale, or viewpoint — there is no per-pixel math that can compensate. You need something that is invariant to spatial transforms, not just intensity transforms.
This is the normalisation ceiling.
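A sketch that probes the ceiling directly: the same `TM_CCOEFF_NORMED` peak under an affine lighting change and under a 5° rotation (again with the placeholder `scene.png`):

```python
import cv2
import numpy as np

# Placeholder image and patch; any textured grayscale image works.
img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
patch = img[100:164, 100:164].copy()

h, w = img.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), 5, 1.0)   # 5-degree rotation
variants = {
    "affine lighting (0.7*I + 20)": 0.7 * img + 20,
    "rotation 5 deg": cv2.warpAffine(img, M, (w, h)),
}
for name, variant in variants.items():
    peak = cv2.matchTemplate(variant, patch, cv2.TM_CCOEFF_NORMED).max()
    print(f"{name}: peak correlation = {peak:.3f}")
# Expected: lighting stays at ~1.0, rotation falls well below it.
```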
6.4 What Lies Beyond the Ceiling¶
The normalisation ceiling is where classical template matching ends and feature-based methods begin.
Hand-crafted features (SIFT, HOG): describe local structure in a way that is explicitly invariant to rotation and scale — by design.
Learned features (CNNs): learn to extract representations that are invariant to whatever transforms appear in the training data — by optimisation.
Both approaches abandon the pixel value as the fundamental unit of comparison. Instead they compute a descriptor — a vector of derived quantities that encodes what is at a location, not what numerical value the pixel happens to have.
This transition — from pixels to features — is the subject of Parts IV and V.
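As a preview of where those parts go, here is a minimal descriptor-based sketch, assuming an OpenCV build (≥ 4.4) that ships `cv2.SIFT_create` and the placeholder image from earlier:

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # placeholder image
h, w = img.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 1.0)  # 30-degree rotation
rotated = cv2.warpAffine(img, M, (w, h))

# Descriptors, not pixels, are now the unit of comparison.
sift = cv2.SIFT_create()
kp1, desc1 = sift.detectAndCompute(img, None)
kp2, desc2 = sift.detectAndCompute(rotated, None)

matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = matcher.match(desc1, desc2)
print(f"{len(matches)} SIFT matches survive a rotation that defeats Pearson")
```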

Run:

`uv run python tutorials/02_why_not_pixels/`

This runs all 7 parts covering pixel failure → normalisation → Pearson correlation → ceiling → feature motivation.
Summary¶
| Concept | Key fact |
|---|---|
| SSD | Fails under any illumination or spatial change |
| L2 normalisation | Removes contrast ($\alpha$) |
| Mean subtraction | Removes brightness offset ($\beta$) |
| Pearson / TM_CCOEFF_NORMED | Full affine invariance — the best pixel-level comparator |
| Normalisation ceiling | Rotation, scale, viewpoint cannot be fixed by per-pixel math |
| Beyond the ceiling | Features — hand-crafted (SIFT) or learned (CNNs) |
Next → Chapter 9 — Convolutions and Filtering: Part IV begins — moving from comparing patches to learning features that are invariant to spatial transforms.
Want the math first? Two applied capstones in Part I deepen the material from Chapters 2 and 6:
`probability/applied_sensors.md` — derives the sensor noise model from Bernoulli → Binomial → Poisson → Normal → CLT.
`linear_algebra/applied_images.md` — the geometry behind L2 normalisation, mean subtraction, and Pearson correlation.