Chapter 6 (Part III) introduced L2 normalisation and mean subtraction as tools for invariant matching. This page closes the linear algebra track of Part I by explaining the geometry behind those operations — and reveals that Pearson correlation is just a dot product between mean-subtracted unit vectors.
This is the applied capstone of the linear algebra series. The simulation-first build-up across part1–part4 is what this page summarises and applies to image-patch comparison.
1. An Image Patch Is a Vector¶
A $3 \times 3$ grayscale patch has 9 pixel values. Unroll them into a column vector:

$$\mathbf{x} = (x_1, x_2, \dots, x_9)^\top \in \mathbb{R}^9$$
This is a point in 9-dimensional space. Every possible patch is a different point. Comparing two patches means measuring the distance or angle between two points in this space.
For an $n \times n$ patch: $\mathbf{x} \in \mathbb{R}^{n^2}$. High-resolution patches live in high-dimensional spaces — a $64 \times 64$ patch is a point in $\mathbb{R}^{4096}$.
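A minimal NumPy sketch of the unrolling step, using a hypothetical $3 \times 3$ patch with made-up pixel values:

```python
import numpy as np

# A hypothetical 3x3 grayscale patch (values 0-255).
patch = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]], dtype=float)

# Unroll into a 9-dimensional vector -- one point in R^9.
x = patch.reshape(-1)
print(x.shape)  # (9,)
```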
2. Dot Product — Measuring Agreement¶
The dot product $\mathbf{x} \cdot \mathbf{y} = \sum_i x_i y_i$ measures pixel-by-pixel agreement. But it depends on the magnitude of both vectors — a brighter patch has a larger dot product with everything, regardless of shape similarity.
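The magnitude dependence is easy to see numerically — a sketch with random stand-in patches (all values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(9)       # a random "patch" vector
bright = 2.0 * x        # same pixel pattern, twice as bright
ref = rng.random(9)     # an arbitrary reference patch

# The brighter copy scores exactly 2x against any reference,
# even though the pattern is identical -- the raw dot product
# conflates brightness with shape similarity.
print(np.dot(x, ref), np.dot(bright, ref))
```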
3. L2 Norm and Unit Vectors¶
Dividing by the norm gives a unit vector $\hat{\mathbf{x}} = \mathbf{x} / \|\mathbf{x}\|_2$ with $\|\hat{\mathbf{x}}\|_2 = 1$.
All unit vectors lie on the surface of the unit hypersphere. Their dot product is:

$$\hat{\mathbf{x}} \cdot \hat{\mathbf{y}} = \cos\theta$$

where $\theta$ is the angle between them. Cosine similarity measures the angle — independent of magnitude. A contrast change ($\mathbf{x} \mapsto \alpha\mathbf{x}$, $\alpha > 0$) does not change the angle, so it does not change cosine similarity.
This is the geometric interpretation of L2 normalisation from Chapter 6.
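A sketch of the contrast invariance, again with random stand-in patches and an assumed scale factor of 3:

```python
import numpy as np

def cosine(a, b):
    # cos(theta) = a.b / (|a| |b|) -- angle only, magnitude cancels
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(1)
x = rng.random(9)
y = rng.random(9)

# A contrast change alpha*x (alpha > 0) leaves the angle unchanged.
print(np.isclose(cosine(x, y), cosine(3.0 * x, y)))  # True
```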
4. Mean Subtraction — Projecting Out the Brightness Direction¶
The vector $\mathbf{1} = (1, 1, \dots, 1)^\top$ points in the “uniform brightness” direction. Projecting $\mathbf{x}$ onto $\mathbf{1}$ gives the mean; subtracting it removes the mean:

$$\tilde{\mathbf{x}} = \mathbf{x} - \bar{x}\,\mathbf{1}, \qquad \bar{x} = \tfrac{1}{n}\,\mathbf{1}^\top \mathbf{x}$$

Geometrically, mean subtraction projects $\mathbf{x}$ onto the hyperplane orthogonal to $\mathbf{1}$. A brightness offset ($\mathbf{x} \mapsto \mathbf{x} + \beta\mathbf{1}$) adds a component along $\mathbf{1}$ — mean subtraction removes it.
After mean subtraction, the dot product is unaffected by uniform brightness changes. This is the geometric interpretation of mean subtraction from Chapter 6.
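A sketch of the projection view, with an assumed brightness offset of 0.5 on a random stand-in patch:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.random(9)
ones = np.ones(9)

# Projecting onto the uniform direction gives the mean ...
mean_component = np.dot(x, ones) / len(x)   # equals x.mean()
centered = x - mean_component               # ... subtracting removes it

# A brightness offset beta adds a component along 1; after
# centering, the offset and non-offset patches are identical.
shifted = x + 0.5 * ones
print(np.allclose(centered, shifted - shifted.mean()))  # True
```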
5. Pearson Correlation = Cosine of Mean-Subtracted Vectors¶
Combine both operations:

$$r(\mathbf{x}, \mathbf{y}) = \frac{(\mathbf{x} - \bar{x}\mathbf{1}) \cdot (\mathbf{y} - \bar{y}\mathbf{1})}{\|\mathbf{x} - \bar{x}\mathbf{1}\|_2\,\|\mathbf{y} - \bar{y}\mathbf{1}\|_2}$$

Pearson correlation is the cosine similarity of mean-subtracted vectors. It is invariant to any affine transform $\mathbf{x} \mapsto \alpha\mathbf{x} + \beta\mathbf{1}$ (with $\alpha > 0$) because:

Mean subtraction removes $\beta$ (projects out the $\mathbf{1}$ component)

L2 normalisation removes $\alpha$ (cancels magnitude)
This geometric view makes the invariance obvious — and makes the ceiling obvious: once the pixel grid shifts (rotation, scale), the components of $\mathbf{x}$ and $\mathbf{y}$ are shuffled relative to each other, and no amount of normalisation restores alignment.
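Both the invariance and the ceiling can be checked directly — a sketch with random stand-in patches and an assumed affine change $2\mathbf{x} + 0.3\mathbf{1}$:

```python
import numpy as np

def pearson(a, b):
    # Pearson = cosine similarity of mean-subtracted vectors.
    a = a - a.mean()
    b = b - b.mean()
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(3)
x = rng.random(9)
y = rng.random(9)

# Invariant to an affine change alpha*x + beta (alpha > 0) ...
print(np.isclose(pearson(x, y), pearson(2.0 * x + 0.3, y)))  # True
# ... and it agrees with NumPy's built-in correlation.
print(np.isclose(pearson(x, y), np.corrcoef(x, y)[0, 1]))   # True

# The ceiling: shuffling components (a stand-in for a grid shift)
# changes the correlation -- normalisation cannot restore alignment.
print(pearson(x, y), pearson(np.roll(x, 1), y))  # generally differ
```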
6. Orthogonality and Transforms¶
Two vectors are orthogonal when $\mathbf{x} \cdot \mathbf{y} = 0$ — they are geometrically perpendicular, carrying completely independent information.
An orthogonal transform $Q$ (one with $Q^\top Q = I$) preserves norms and dot products:

$$\|Q\mathbf{x}\|_2 = \|\mathbf{x}\|_2, \qquad (Q\mathbf{x}) \cdot (Q\mathbf{y}) = \mathbf{x} \cdot \mathbf{y}$$
The Fourier transform is a unitary transform (the complex analogue of orthogonal) — it decomposes an image into orthogonal frequency components (the basis of Chapter 1’s frequency analysis) while preserving energy (Parseval’s theorem).
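The energy preservation can be checked with NumPy's unitary DFT (`norm="ortho"`) on random stand-in signals — a sketch, not tied to any particular image:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.random(64)
y = rng.random(64)

# The unitary DFT preserves norms (Parseval's theorem) ...
X = np.fft.fft(x, norm="ortho")
Y = np.fft.fft(y, norm="ortho")
print(np.isclose(np.linalg.norm(X), np.linalg.norm(x)))  # True

# ... and inner products (vdot uses the complex inner product,
# conjugating its first argument; the result is real here).
print(np.isclose(np.vdot(X, Y).real, np.dot(x, y)))      # True
```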
7. The Manifold Hypothesis — Why High-Dimensional Pixel Space Is Nearly Empty¶
An image is a point in $\mathbb{R}^{n^2}$. For a $64 \times 64$ image that is $\mathbb{R}^{4096}$. The number of possible 8-bit images is $256^{4096}$ — astronomically large.
But natural images occupy a tiny, thin slice of this space. Most points in $\mathbb{R}^{4096}$ are random noise — not images of anything real. The set of natural images forms a low-dimensional manifold embedded in the high-dimensional pixel space.
This is the manifold hypothesis, and it explains why learned features work: CNNs learn to map the high-dimensional pixel space to a lower-dimensional representation that captures where you are on the natural image manifold — not where you are in the raw pixel cube.
Simulation: the linear algebra series — part1 through part4 — develops every step on this page from scratch with vector/matrix code. Read those first if you want the build-up; this page is the destination.
Summary¶
| Concept | Geometric meaning | Connection to Ch 6 |
|---|---|---|
| Pixel patch as vector | Point in $\mathbb{R}^{n^2}$ | Enables geometric comparison |
| Dot product | Pixel-by-pixel agreement; magnitude-dependent | Raw SSD without normalisation |
| L2 norm | Vector length $\|\mathbf{x}\|_2$ | Divisor in L2 normalisation |
| Cosine similarity | Angle between vectors; magnitude-independent | Handles contrast ($\alpha$) |
| Mean subtraction | Project out $\mathbf{1}$ component | Handles brightness offset ($\beta$) |
| Pearson correlation | $\cos\theta$ of mean-subtracted vectors | Full affine invariance |
| Manifold hypothesis | Natural images = thin slice of pixel space | Motivates learned features |
Next → Chapter 9 — Convolutions and Filtering: we move from comparing patches to learning features that are invariant to spatial transforms.