
Chapter 5 — Colour and the Imaging Pipeline

DhvaniAI

From Scene Light to Stored File

Every concept from Chapters 1–4 applies independently to each colour channel. This chapter adds colour, traces the full pipeline from photon to JPEG, and counts every transformation that corrupts the pixel value along the way.


5.1 From Grayscale to Colour

A grayscale image has one value per pixel. A colour image has three — one per channel:

Grayscale:  I[i, j]          — 2D array, shape (H, W)
Colour:     I[i, j, c]       — 3D array, shape (H, W, 3),  c ∈ {R, G, B}

Each channel is an independent grayscale image. The perceived colour comes from the ratio of R, G, B intensities at each pixel.
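A minimal NumPy sketch of this layout (the image values here are random, purely for illustration):

```python
import numpy as np

# A hypothetical small colour image: a 3D array of shape (H, W, 3).
H, W = 4, 4
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(H, W, 3), dtype=np.uint8)

# Each channel is an independent 2D grayscale image of shape (H, W).
R, G, B = img[..., 0], img[..., 1], img[..., 2]

# The perceived colour at a pixel comes from the ratio of the three
# channel intensities stored there.
print("RGB triple at (1, 2):", img[1, 2])
print("R channel shape:", R.shape)  # (4, 4) -- a plain grayscale image
```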

All physics from Chapters 1–4 — sampling, noise, quantization, and the affine lighting model — applies to each channel independently.


5.2 The Bayer Filter — How Sensors Capture Colour

Camera sensors are physically monochrome. To capture colour, a Bayer colour filter array (CFA) is placed over the photosites:

R  G  R  G  R  G
G  B  G  B  G  B
R  G  R  G  R  G
G  B  G  B  G  B

Each photosite records only one channel. The pattern has 2× more green photosites than red or blue — matching the human visual system’s higher sensitivity to green.

The missing two channels at each pixel are interpolated from neighbouring photosites. This is demosaicing — itself a form of reconstruction, and itself subject to error at sharp edges (colour fringing).
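The interpolation step can be sketched with classic bilinear demosaicing. This is a simplified illustration, not a production ISP algorithm: each missing channel value is the average of the nearest same-colour photosites, implemented here by convolving masked channels with small kernels (the RGGB layout and kernel weights follow the standard bilinear scheme):

```python
import numpy as np
from scipy.ndimage import convolve

def bilinear_demosaic(raw):
    """Bilinear demosaicing sketch for an RGGB Bayer mosaic.

    `raw` is a 2D array where each photosite recorded only one channel:
        R G R G ...
        G B G B ...
    Missing values are averaged from the nearest same-colour neighbours;
    this averaging across edges is what causes colour fringing.
    """
    H, W = raw.shape
    rows = np.arange(H)[:, None]
    cols = np.arange(W)[None, :]
    # Boolean masks marking which photosites recorded each channel.
    r_mask = (rows % 2 == 0) & (cols % 2 == 0)
    b_mask = (rows % 2 == 1) & (cols % 2 == 1)
    g_mask = ~(r_mask | b_mask)

    # Classic bilinear interpolation kernels.
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0

    out = np.zeros((H, W, 3))
    for c, (mask, k) in enumerate([(r_mask, k_rb), (g_mask, k_g), (b_mask, k_rb)]):
        out[..., c] = convolve(raw * mask, k, mode="mirror")
    return out
```

On a uniform scene the reconstruction is exact; the errors appear only where neighbouring photosites disagree, i.e. at edges.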


5.3 Luminance — Converting to Grayscale Correctly

When colour is not needed, converting to grayscale with a perceptual luminance formula preserves the appearance of brightness:

$$L = 0.2126 \cdot R + 0.7152 \cdot G + 0.0722 \cdot B$$

This is the ITU-R BT.709 standard. The large green coefficient reflects the eye’s sensitivity peak. Simple averaging ($L = (R+G+B)/3$) gives wrong perceptual brightness — a fully saturated red looks darker than it should.

For most shape, texture, and edge detection tasks, luminance is all you need. Colour adds value when distinguishing objects that differ in hue but not brightness.
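The difference between the two formulas shows up immediately on a saturated red pixel (a minimal sketch using float RGB in [0, 1]):

```python
import numpy as np

def luminance_bt709(img):
    """ITU-R BT.709 luminance from a float RGB image in [0, 1]."""
    return 0.2126 * img[..., 0] + 0.7152 * img[..., 1] + 0.0722 * img[..., 2]

# A single fully saturated red pixel.
red = np.array([[[1.0, 0.0, 0.0]]])

print(luminance_bt709(red))   # ~0.2126: red contributes little brightness
print(red.mean(axis=-1))      # naive average gives ~0.333 -- too bright
```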


5.4 The Full Imaging Pipeline

Every digital image passes through this chain from scene to stored file:

Scene (continuous light)
  │
  ▼  [Lens] — focuses light onto sensor plane
Photosites (spatial sampling at grid Δx × Δy)
  │  + Shot noise added (Poisson, per photosite)
  │  + Dark current accumulates
  ▼
Bayer filter (one channel recorded per photosite)
  │
  ▼  [ADC] — voltage → integer  (quantization)
Raw image (one channel per photosite, 12–16 bit)
  │
  ▼  [ISP] — demosaicing, white balance, tone mapping, sharpening
RGB image (3 channels, 8–16 bit)
  │
  ▼  [Codec] — JPEG/PNG compression
Stored image (uint8, 3 channels)

Every arrow is a transformation that changes pixel values without changing the scene. The final stored pixel value reflects:

  1. Scene content — the signal we want

  2. Lighting conditions ($a$ and $b$ in the affine model)

  3. Sensor noise (shot, read, dark current)

  4. ADC quantization

  5. Demosaicing interpolation

  6. ISP choices (white balance, tone curve, sharpening)

  7. Compression artefacts

Items 2–7 are nuisances for any pixel-comparison algorithm.
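A toy end-to-end simulation makes the point concrete. This is a sketch with made-up parameters (a constant scene of 500 expected photons, affine lighting $a = 0.8$, $b = 20$, an assumed 8-bit ADC full scale of 600), not a calibrated sensor model:

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy "scene": constant true signal, in expected photon counts.
scene = np.full((64, 64), 500.0)

# Shot noise: photon arrival is Poisson, so variance equals the mean.
photons = rng.poisson(scene).astype(float)

# Lighting as a global affine change: I' = a * I + b (toy values).
a, b = 0.8, 20.0
lit = a * photons + b

# ADC quantization to 8 bits, with an assumed full-scale value of 600.
quantized = np.clip(np.round(lit / 600.0 * 255.0), 0, 255).astype(np.uint8)

# The stored values vary pixel to pixel even though the scene is
# perfectly constant -- the spread comes entirely from the pipeline.
print("stored mean:", quantized.mean())
print("stored std :", quantized.std())  # nonzero purely from noise
```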


5.5 Shading — When the Sensor Responds Non-Uniformly

Even with perfectly uniform scene illumination, pixel values vary across the image if the sensor has spatially non-uniform gain — called shading or vignetting.

The model:

$$I(i,j) = T(i,j) \cdot S(i,j)$$

where $T$ is the true reflectance and $S$ is the spatially varying shading field ($S < 1$ near the edges for vignetting).

This breaks the global affine model: $a$ and $b$ are no longer constants — they vary with position. A template extracted from the image centre will not match the same material at the edges, even if the material is identical.
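A sketch of the multiplicative model with an assumed radial falloff (the quadratic shape and 0.5 edge attenuation are illustrative choices, not a physical lens model):

```python
import numpy as np

H, W = 64, 64

# Uniform true reflectance: the same material everywhere.
T = np.full((H, W), 0.5)

# A toy radial shading field: S = 1 at the centre, falling toward the
# edges -- a common vignetting shape.
y, x = np.mgrid[0:H, 0:W]
r = np.hypot(y - H / 2, x - W / 2) / (np.hypot(H, W) / 2)
S = 1.0 - 0.5 * r**2

# The recorded image: identical scene, position-dependent pixel values.
I = T * S
print("centre:", I[H // 2, W // 2])
print("corner:", I[0, 0])  # darker, though the material is the same
```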


5.6 The Complete Error Stack

Collecting all effects across Chapters 1–5:

| Effect | Chapter | Mechanism | Changes pixel values without changing scene? |
| --- | --- | --- | --- |
| Aliasing | 1 | Undersampling → phantom frequencies | Yes |
| Shot noise | 2 | Poisson photon counting | Yes — random |
| Read noise | 2 | Amplifier/ADC electronics | Yes — random |
| Quantization | 3 | ADC rounding | Yes — deterministic |
| Contrast change | 4 | Global $a$ change | Yes |
| Brightness change | 4 | Global $b$ change | Yes |
| Dynamic range clipping | 4 | Saturation | Yes — irreversible |
| Demosaicing error | 5 | CFA interpolation | Yes — at edges |
| White balance | 5 | Per-channel scaling | Yes |
| Shading | 5 | Spatially varying gain | Yes — position-dependent |
| JPEG compression | 5 | DCT quantization | Yes |

Every row in this table is a reason to distrust raw pixel values as direct measurements of scene content. Part II builds the case formally and develops the normalisation tools that address as many of these as possible.

*Figure: Full imaging pipeline — light source → scene → lens → sensor → ISP → pixel values.*

Run: uv run python tutorials/00_introduction_to_digital_images/part6_colour_and_pipeline.py to generate RGB channel decomposition and shading artefact figures.


Summary

| Concept | Key fact |
| --- | --- |
| Colour image | 3D array (H, W, 3); each channel obeys the same physics as grayscale |
| Bayer CFA | One channel per photosite; the other two interpolated (demosaicing) |
| Luminance | $L = 0.2126R + 0.7152G + 0.0722B$ — perceptual brightness |
| Pipeline | 7+ transformations from photon to stored pixel; each one a nuisance |
| Shading | Spatially varying gain; breaks the global affine model |
| Error stack | 11 effects that change pixel values without changing scene content |

Next → Chapter 6 — The Problem with Raw Pixels: now that we understand the full error stack, Part III builds the systematic case for why raw pixel comparison fails and what can be done about it.