../
2026-01-12
Manifold Hypothesis
- Consider the CIFAR-10 dataset
- Each image is of the dimension $32 \times 32 \times 3$
- We can consider this dataset as a single vector of dimension $[0, 255]^{3072}$
- If we were to draw uniformly from this, we would never accidentally stumble upon a sample from CIFAR-10
- Why?
- One reason is that the pixels aren’t completely unrelated as we are drawing them
- If a pixel is $0, 0, 0$ then its neighbors are most likely $0, 0, 0$ or very close that white shade
- It is highly unlikely that we can replicate this dependence by uniformly sampling each pixel independently
- The Manifold Hypothesis, in layman’s terms, is that this higher dimensional data lives in a lower dimensional latent space of some kind