(Roadmap — content to be written)
This chapter covers:
Patch embeddings: the image as a sequence
ViT architecture
Comparison with CNNs: inductive bias vs data efficiency
Positional encoding for 2D patches
Depends on: Chapters 11, 13