(Roadmap — content to be written)
This chapter covers:
The attention mechanism: queries, keys, values
Scaled dot-product attention (a preview sketch appears at the end of this roadmap)
Multi-head attention
Positional encoding (also sketched at the end of this roadmap)
Why attention handles long-range dependencies better than convolution
Depends on: Chapter 8 (dot products), Chapter 11 (CNNs as comparison)
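Since the chapter text is still to be written, the following is only a minimal NumPy preview of scaled dot-product attention, assuming the chapter follows the standard Transformer formulation softmax(QK^T / sqrt(d_k)) V; the function name scaled_dot_product_attention and the toy shapes are illustrative, not the chapter's final code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns the attended values (n_queries, d_v) and the attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights

# Tiny worked example: 2 queries attending over 3 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)       # (2, 4): one output vector per query
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

Multi-head attention repeats this computation several times on learned linear projections of Q, K, and V and concatenates the per-head outputs; that wrapper is left to the chapter itself.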
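A similar preview for positional encoding, assuming the chapter uses the sinusoidal scheme from the original Transformer paper; the helper name sinusoidal_positional_encoding and the requirement that d_model be even are assumptions of this sketch.

```python
import numpy as np

def sinusoidal_positional_encoding(n_positions, d_model):
    """Sinusoidal positional encoding (d_model assumed even):

    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(n_positions)[:, None]              # (n_positions, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # even dimension indices
    angles = positions / np.power(10000.0, dims / d_model)   # (n_positions, d_model/2)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)                             # sines on even dimensions
    pe[:, 1::2] = np.cos(angles)                             # cosines on odd dimensions
    return pe

# Each row is added to the token embedding at that position, giving the
# otherwise order-agnostic attention layers information about word order.
print(sinusoidal_positional_encoding(50, 16).shape)  # (50, 16)
```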