
Chapter 6 — The Problem with Raw Pixels

DhvaniAI

Why Direct Pixel Comparison Fails — and Where Normalisation Hits Its Ceiling

Part I built a complete picture of everything that corrupts a pixel value. Part II asks: given that corruption, can we still compare images reliably? The answer is: sometimes yes, sometimes no — and the boundary between those two cases is the most important concept in classical computer vision.


6.1 The Experiment

Pixel comparison means: given a reference patch $T$ and a query image $I$, find the location in $I$ that best matches $T$.

The simplest metric is Sum of Squared Differences (SSD):

$$\text{SSD}(T, I_p) = \sum_{i,j} \big(T(i,j) - I_p(i,j)\big)^2$$

where $I_p$ is the patch at position $p$. A perfect match gives SSD = 0.
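A brute-force sketch of this search in NumPy (function and variable names are illustrative, not the tutorial's own code):

```python
import numpy as np

def ssd_match(image: np.ndarray, template: np.ndarray) -> tuple[int, int]:
    """Slide `template` over `image`; return the (row, col) with minimum SSD."""
    h, w = template.shape
    H, W = image.shape
    t = template.astype(np.float64)
    best, best_pos = np.inf, (0, 0)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            patch = image[r:r + h, c:c + w].astype(np.float64)
            ssd = np.sum((t - patch) ** 2)
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos
```

OpenCV's `cv2.matchTemplate` with `TM_SQDIFF` computes the same quantity far more efficiently; the loop above just makes the definition explicit.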

Test it under three real-world transforms:

| Transform | Real-world cause | What changes |
| --- | --- | --- |
| Brightness ×0.7 | Lighting change, cloud cover | All pixel values scale |
| Rotation 5° | Camera tilt, part misalignment | Pixel grid shifts |
| Scale 90% | Different camera distance | Object size in pixels |

SSD fails on all three. The question is: which failures are fixable by math, and which are not?
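The brightness case is easy to see concretely; a minimal sketch with synthetic values (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
template = rng.integers(0, 256, (40, 50)).astype(np.float64)

dimmed = 0.7 * template                     # brightness ×0.7: same scene, dimmer

print(np.sum((template - template) ** 2))   # 0.0: identical patch, perfect match
print(np.sum((template - dimmed) ** 2))     # huge: SSD calls the same scene a mismatch
```

The scene content is unchanged, yet SSD reports an enormous difference, purely from lighting.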


6.2 Fixable: The Affine Lighting Model

From Chapter 4, different lighting produces $I_2 = aI_1 + b$. This is a mathematical relationship between pixel values — the pixels are still in the same positions, just with different values. Math can compensate.

Step 1 — L2 normalisation (handle contrast, $a$)

Divide each patch by its L2 norm:

$$\hat{T} = \frac{T}{\|T\|}, \quad \hat{I}_p = \frac{I_p}{\|I_p\|}$$

If $I_2 = aI_1$ with $a > 0$, then $\hat{I}_2 = \hat{I}_1$ — the scale factor $a$ cancels. Contrast changes are removed.
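A quick numerical check of the cancellation (illustrative values):

```python
import numpy as np

patch = np.array([[10., 20.], [30., 40.]])
scaled = 0.7 * patch                         # contrast change: I2 = a * I1, a > 0

unit = patch / np.linalg.norm(patch)
unit_scaled = scaled / np.linalg.norm(scaled)

print(np.allclose(unit, unit_scaled))        # True: the factor a has cancelled
```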

Step 2 — Mean subtraction (handle brightness offset, $b$)

Subtract the patch mean before normalising:

$$\tilde{T} = T - \bar{T}, \quad \tilde{I}_p = I_p - \bar{I}_p$$

If $I_2 = I_1 + b$, then $\tilde{I}_2 = \tilde{I}_1$ — the offset $b$ cancels. Brightness changes are removed.
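And the corresponding check for the offset (same illustrative style):

```python
import numpy as np

patch = np.array([[10., 20.], [30., 40.]])
brighter = patch + 25.0                      # brightness change: I2 = I1 + b

print(np.allclose(patch - patch.mean(),
                  brighter - brighter.mean()))  # True: the offset b has cancelled
```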

Step 3 — Pearson correlation (handle the full affine $aI + b$)

Combine both: subtract mean, then normalise. The result is the Pearson correlation coefficient, which equals OpenCV’s TM_CCOEFF_NORMED:

$$r = \frac{\sum_{i,j}(T - \bar{T})(I_p - \bar{I}_p)}{\sqrt{\sum_{i,j}(T-\bar{T})^2 \cdot \sum_{i,j}(I_p-\bar{I}_p)^2}}$$

$r = 1$ for a perfect match under any $aI + b$ transform with $a > 0$. This is the best pixel-level comparator possible under the affine lighting model.
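A sketch of the full pipeline via OpenCV's `cv2.matchTemplate` (synthetic data so the example is self-contained; the exact score is illustrative):

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
scene = rng.integers(0, 256, (200, 200)).astype(np.float32)
template = scene[50:90, 60:110].copy()       # true location: row 50, col 60
lit_scene = 0.7 * scene + 30.0               # affine lighting change: a*I + b

result = cv2.matchTemplate(lit_scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
print(max_loc, round(max_val, 4))            # (60, 50) ~1.0; note maxLoc is (x, y)
```

Despite the lighting change, the peak correlation is ≈ 1 at the true location.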


6.3 The Normalisation Ceiling

Pearson correlation handles $I_2 = aI_1 + b$ perfectly. But the real world also produces transforms that change which pixel is where:

| Transform | What breaks | Fixable by normalisation? |
| --- | --- | --- |
| Brightness change ($b$) | Pixel values | ✓ Yes — mean subtraction |
| Contrast change ($a$) | Pixel values | ✓ Yes — L2 normalisation |
| Rotation | Pixel positions | ✗ No |
| Scale | Pixel positions | ✗ No |
| Viewpoint change | Pixel positions | ✗ No |
| Occlusion | Some pixels missing | ✗ No |

Normalisation operates on pixel values at fixed positions. Once the grid shifts — due to rotation, scale, or viewpoint — there is no per-pixel math that can compensate. You need something that is invariant to spatial transforms, not just intensity transforms.

This is the normalisation ceiling.
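To make the ceiling tangible, a small sketch (synthetic patch; the 5° rotation matches the transform table in 6.1):

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, (64, 64)).astype(np.float32)

# Rotate 5° about the centre: same content, shifted pixel grid.
M = cv2.getRotationMatrix2D((31.5, 31.5), 5.0, 1.0)
rotated = cv2.warpAffine(patch, M, (64, 64))

def pearson(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a - a.mean(), b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))

print(pearson(patch, patch))     # 1.0: identical grids
print(pearson(patch, rotated))   # far below 1: no per-pixel math recovers it
```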


6.4 What Lies Beyond the Ceiling

The normalisation ceiling is where classical template matching ends and feature-based methods begin.

- Hand-crafted features (SIFT, HOG): describe local structure in a way that is explicitly invariant to rotation and scale — by design.

- Learned features (CNNs): learn to extract representations that are invariant to whatever transforms appear in the training data — by optimisation.

Both approaches abandon the pixel value as the fundamental unit of comparison. Instead they compute a descriptor — a vector of derived quantities that encodes what is at a location, not what numerical value the pixel happens to have.
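As a first taste of what that looks like in practice, OpenCV ships a SIFT implementation; a minimal sketch, assuming a recent OpenCV build that exposes `cv2.SIFT_create` (the synthetic image is purely illustrative):

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (200, 200)).astype(np.uint8)
img = cv2.GaussianBlur(img, (9, 9), 2)       # give the detector smooth structure

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each descriptor row is a 128-D vector encoding local structure around a
# keypoint: the unit of comparison is no longer a raw pixel value.
print(len(keypoints), None if descriptors is None else descriptors.shape)
```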

This transition — from pixels to features — is the subject of Parts IV and V.

*Figure: interpolation under rotation — pixel values change even when the scene content is the same.*

Run `uv run python tutorials/02_why_not_pixels/` to execute all 7 parts: pixel failure → normalisation → Pearson correlation → ceiling → feature motivation.


Summary

| Concept | Key fact |
| --- | --- |
| SSD | Fails under any illumination or spatial change |
| L2 normalisation | Removes contrast ($a$) |
| Mean subtraction | Removes brightness offset ($b$) |
| Pearson / TM_CCOEFF_NORMED | Full affine invariance — the best pixel-level comparator |
| Normalisation ceiling | Rotation, scale, viewpoint cannot be fixed by per-pixel math |
| Beyond the ceiling | Features — hand-crafted (SIFT) or learned (CNNs) |

Next → Chapter 9 — Convolutions and Filtering: Part IV begins — moving from comparing patches to learning features that are invariant to spatial transforms.

Want the math first? Two applied capstones in Part I deepen the material from Chapters 2 and 6: