
Linear Algebra Applied — Images as Vectors, Patches as Points

DhvaniAI

Chapter 6 (Part III) introduced L2 normalisation and mean subtraction as tools for invariant matching. This page closes the linear algebra track of Part I by explaining the geometry behind those operations — and reveals that Pearson correlation is simply the dot product of mean-subtracted unit vectors.

This is the applied capstone of the linear algebra series. The simulation-first build-up across parts 1 through 4 is what this page summarises and applies to image-patch comparison.


1. An Image Patch Is a Vector

A $3 \times 3$ grayscale patch has 9 pixel values. Unroll them into a column vector:

$$\mathbf{x} = [x_1, x_2, \ldots, x_9]^\top \in \mathbb{R}^9$$

This is a point in 9-dimensional space. Every possible $3 \times 3$ patch is a different point. Comparing two patches means measuring the distance or angle between two points in this space.

For an $m \times n$ patch: $\mathbf{x} \in \mathbb{R}^{mn}$. High-resolution patches live in high-dimensional spaces — a $64 \times 64$ patch is a point in $\mathbb{R}^{4096}$.
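The unrolling step is a one-liner in NumPy. A minimal sketch (the pixel values here are made up for illustration):

```python
import numpy as np

# A hypothetical 3x3 grayscale patch (values in [0, 255]).
patch = np.array([[10., 20., 30.],
                  [40., 50., 60.],
                  [70., 80., 90.]])

# Unroll row by row into a vector in R^9 -- a single point in 9-D space.
x = patch.reshape(-1)
print(x.shape)  # (9,)
```

A $64 \times 64$ patch unrolls the same way into a length-4096 vector.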


2. Dot Product — Measuring Agreement

$$\mathbf{x} \cdot \mathbf{y} = \sum_i x_i y_i = \|\mathbf{x}\| \|\mathbf{y}\| \cos\theta$$

The dot product measures pixel-by-pixel agreement. But it depends on the magnitude of both vectors — a brighter patch has a larger dot product with everything, regardless of shape similarity.
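The magnitude dependence is easy to see numerically. A sketch with toy values: scaling one patch by 10 (same shape, brighter) scales the dot product by 10 as well.

```python
import numpy as np

x = np.array([1., 2., 3., 4.])
bright = 10.0 * x          # same pattern, uniformly brighter

# The dot product grows with magnitude even though the shape is identical.
print(np.dot(x, x))        # 30.0
print(np.dot(x, bright))   # 300.0
```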


3. L2 Norm and Unit Vectors

$$\|\mathbf{x}\| = \sqrt{\sum_i x_i^2}$$

Dividing by the norm gives a unit vector $\hat{\mathbf{x}} = \mathbf{x} / \|\mathbf{x}\|$ with $\|\hat{\mathbf{x}}\| = 1$.

All unit vectors lie on the surface of the unit hypersphere. Their dot product is:

$$\hat{\mathbf{x}} \cdot \hat{\mathbf{y}} = \cos\theta$$

where $\theta$ is the angle between them. Cosine similarity measures the angle — independent of magnitude. A contrast change ($\mathbf{y} = a\mathbf{x}$ with $a > 0$) does not change the angle, so it does not change cosine similarity.

This is the geometric interpretation of L2 normalisation from Chapter 6.
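The invariance can be checked directly. A minimal sketch, with a hypothetical `cosine` helper and arbitrary toy vectors:

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity: dot product of the unit vectors."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1., 2., 3.])
y = np.array([2., 1., 0.])

# A contrast change y -> a*y (a > 0) rescales the vector but not the angle.
print(np.isclose(cosine(x, y), cosine(x, 5.0 * y)))  # True
```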


4. Mean Subtraction — Projecting Out the Brightness Direction

The vector $\mathbf{1} = [1, 1, \ldots, 1]^\top$ points in the “uniform brightness” direction. Projecting $\mathbf{x}$ onto $\mathbf{1}$ gives the mean; subtracting that projection removes the mean:

$$\tilde{\mathbf{x}} = \mathbf{x} - \bar{x}\mathbf{1}$$

Geometrically, mean subtraction projects $\mathbf{x}$ onto the hyperplane orthogonal to $\mathbf{1}$. A brightness offset ($\mathbf{y} = \mathbf{x} + b\mathbf{1}$) adds a component along $\mathbf{1}$ — mean subtraction removes it.

After mean subtraction, the dot product is unaffected by uniform brightness changes. This is the geometric interpretation of mean subtraction from Chapter 6.
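A sketch of this, with made-up values: centring both vectors erases a uniform offset, and the centred vector is orthogonal to $\mathbf{1}$ (its components sum to zero).

```python
import numpy as np

x = np.array([10., 20., 30., 40.])
y = x + 7.0                      # uniform brightness offset b = 7

center = lambda v: v - v.mean()  # project out the 1-direction

# After mean subtraction the two patches are identical.
print(np.allclose(center(x), center(y)))  # True

# The centred vector is orthogonal to the all-ones vector.
print(np.isclose(center(x).sum(), 0.0))   # True
```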


5. Pearson Correlation = Cosine of Mean-Subtracted Vectors

Combine both operations:

$$r(\mathbf{x}, \mathbf{y}) = \frac{\tilde{\mathbf{x}} \cdot \tilde{\mathbf{y}}}{\|\tilde{\mathbf{x}}\| \|\tilde{\mathbf{y}}\|} = \cos\theta_{\tilde{\mathbf{x}}, \tilde{\mathbf{y}}}$$

Pearson correlation is the cosine similarity of mean-subtracted vectors. It is invariant to any $\mathbf{y} = a\mathbf{x} + b\mathbf{1}$ transform (for $a > 0$) because:

- mean subtraction projects out the offset $b\mathbf{1}$, and
- dividing by the norm cancels the contrast factor $a$.

This geometric view makes the invariance obvious — and makes the ceiling obvious: once the pixel grid shifts (rotation, scale), the vectors x~\tilde{\mathbf{x}} and y~\tilde{\mathbf{y}} have different components shuffled, and no amount of normalisation restores alignment.
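The full invariance is a short experiment. A minimal sketch, with a hypothetical `pearson` helper and arbitrary transform parameters ($a = 2.5$, $b = 0.3$):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation as the cosine of mean-subtracted vectors."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

x = np.random.default_rng(0).random(9)  # a random 3x3 patch, unrolled
y = 2.5 * x + 0.3                       # contrast change + brightness offset

# Any a*x + b transform (a > 0) leaves the correlation at exactly 1.
print(np.isclose(pearson(x, y), 1.0))   # True
```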


6. Orthogonality and Transforms

Two vectors are orthogonal when $\mathbf{x} \cdot \mathbf{y} = 0$ — they are geometrically perpendicular, carrying completely independent information.

An orthogonal transform $Q$ preserves norms and dot products:

$$\|Q\mathbf{x}\| = \|\mathbf{x}\|, \quad (Q\mathbf{x}) \cdot (Q\mathbf{y}) = \mathbf{x} \cdot \mathbf{y}$$

The Fourier transform is unitary — the complex analogue of orthogonal. It decomposes an image into orthogonal frequency components (the basis of Chapter 1’s frequency analysis) while preserving energy (Parseval’s theorem).
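Energy preservation can be verified with NumPy’s FFT. A sketch: with `norm="ortho"` the transform matrix is unitary, so the norm of the signal equals the norm of its spectrum (Parseval’s theorem). The random test signal is arbitrary.

```python
import numpy as np

x = np.random.default_rng(1).standard_normal(64)

# norm="ortho" makes the DFT unitary, so energy is preserved exactly.
X = np.fft.fft(x, norm="ortho")
print(np.isclose(np.linalg.norm(x), np.linalg.norm(X)))  # True
```

Without `norm="ortho"`, NumPy’s default FFT scales the spectrum by $\sqrt{N}$, so the norms differ by that factor.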


7. The Manifold Hypothesis — Why High-Dimensional Pixel Space Is Nearly Empty

An $m \times n$ image is a point in $\mathbb{R}^{mn}$. For a $64 \times 64$ image that is $\mathbb{R}^{4096}$. The number of possible 8-bit images is $256^{4096}$ — astronomically large.

But natural images occupy a tiny, thin slice of this space. Most points in $\mathbb{R}^{4096}$ are random noise — not images of anything real. The set of natural images forms a low-dimensional manifold embedded in the high-dimensional pixel space.

This is the manifold hypothesis, and it explains why learned features work: CNNs learn to map the high-dimensional pixel space to a lower-dimensional representation that captures where you are on the natural image manifold — not where you are in the raw pixel cube.

Simulation: the linear algebra series (parts 1 through 4) develops every step on this page from scratch with vector/matrix code. Read those first if you want the build-up; this page is the destination.


Summary

| Concept | Geometric meaning | Connection to Ch 6 |
| --- | --- | --- |
| Pixel patch as vector | Point in $\mathbb{R}^{mn}$ | Enables geometric comparison |
| Dot product | Pixel-by-pixel agreement; magnitude-dependent | Raw SSD without normalisation |
| L2 norm | Vector length | Divisor in L2 normalisation |
| Cosine similarity | Angle between vectors; magnitude-independent | Handles contrast ($a$) |
| Mean subtraction | Project out $\mathbf{1}$ component | Handles brightness offset ($b$) |
| Pearson correlation | $\cos\theta$ of mean-subtracted vectors | Full affine invariance |
| Manifold hypothesis | Natural images = thin slice of pixel space | Motivates learned features |

Next → Chapter 9 — Convolutions and Filtering: we move from comparing patches to learning features that are invariant to spatial transforms.