
MathyAIwithMike
This episode explores a revolutionary paper proposing a new "Semantic Information Theory" that could redefine our understanding of AI. Departing from Shannon's classical bit-based framework, the theory takes the 'token' as its fundamental unit and models LLMs as 'discrete-time channels with feedback'. It introduces new measures for each stage of the LLM pipeline: the 'Directed Rate-Distortion Function' for pre-training, the 'Directed Rate-Reward Function' for RLHF, and 'Semantic Information Flow' for inference. The theory also redefines the token embedding space, drawing on tools such as the Gromov-Wasserstein distance. Astonishingly, the paper derives the Transformer architecture from first principles, underscoring the theory's significance.
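
For listeners who want the mathematical backdrop: the paper's directed and semantic quantities are not reproduced here, but the classical objects they build on are standard. As a reminder (textbook forms, not the paper's exact definitions), Massey's directed information between input and output sequences X^n and Y^n, Shannon's rate-distortion function for a distortion measure d, and the Gromov-Wasserstein distance between two metric measure spaces (X, d_X, \mu) and (Y, d_Y, \nu) (for example, two token embedding spaces) are:

\[
I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}),
\]
\[
R(D) = \min_{p(\hat{x}\mid x)\,:\, \mathbb{E}[d(X,\hat{X})] \le D} I(X; \hat{X}),
\]
\[
\mathrm{GW}_p(\mu, \nu) = \Big( \min_{\pi \in \Pi(\mu,\nu)} \iint \big| d_X(x,x') - d_Y(y,y') \big|^p \, d\pi(x,y)\, d\pi(x',y') \Big)^{1/p}.
\]

The paper's 'directed' rate functions replace the mutual information in R(D) with directed-information-style quantities suited to feedback channels, which is where the 'discrete-time channel with feedback' framing comes in.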