
MathyAIwithMike
Discover the Compress & Attend Transformer (CAT), a novel AI architecture whose quality-compute trade-off can be adjusted *after* training. Unlike inflexible models locked into a single fixed operating point, CAT lets users choose their desired performance profile at test time. A compressor turns chunks of the input sequence into learned vector representations, and the decoder attends both to nearby tokens and to the compressed representations of past chunks. This design enables parallelized training and efficient generation with a rolling memory system. CAT offers a path toward more flexible, resource-aware AI.
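
To make the compress-then-attend idea concrete, here is a minimal PyTorch sketch of the rolling-memory pattern described above: a compressor pools each finished chunk into a few summary vectors, and the decoder block attends to local tokens plus the accumulated summaries of past chunks. This is an illustrative sketch, not the authors' implementation; the module names (`ChunkCompressor`, `CATBlock`), chunk size, and number of memory vectors per chunk are all assumed toy values.

```python
# Minimal sketch of the compress-and-attend pattern (not the authors' code).
# All sizes and module names are illustrative assumptions.
import torch
import torch.nn as nn


class ChunkCompressor(nn.Module):
    """Compresses a chunk of token states into a few learned memory vectors."""

    def __init__(self, d_model: int, n_mem: int):
        super().__init__()
        # Learned queries that pool a chunk into n_mem summary vectors.
        self.mem_queries = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, chunk: torch.Tensor) -> torch.Tensor:
        # chunk: (batch, chunk_len, d_model) -> (batch, n_mem, d_model)
        q = self.mem_queries.unsqueeze(0).expand(chunk.size(0), -1, -1)
        summary, _ = self.attn(q, chunk, chunk)
        return summary


class CATBlock(nn.Module):
    """Decoder block: causal self-attention over local tokens plus
    cross-attention to the compressed memories of past chunks."""

    def __init__(self, d_model: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.n1, self.n2, self.n3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # x: (batch, chunk_len, d_model); memory: (batch, n_past_mem, d_model)
        L = x.size(1)
        causal = torch.triu(torch.ones(L, L, dtype=torch.bool, device=x.device), 1)
        h, _ = self.self_attn(self.n1(x), self.n1(x), self.n1(x), attn_mask=causal)
        x = x + h
        if memory.size(1) > 0:  # the first chunk has no past memory yet
            h, _ = self.cross_attn(self.n2(x), memory, memory)
            x = x + h
        return x + self.ff(self.n3(x))


# Rolling-memory loop: process the sequence chunk by chunk, compressing each
# finished chunk and appending its summary vectors to the memory.
d_model, chunk_len, n_mem = 64, 16, 2            # assumed toy sizes
compressor = ChunkCompressor(d_model, n_mem)
block = CATBlock(d_model)

tokens = torch.randn(1, 4 * chunk_len, d_model)  # stand-in for embedded tokens
memory = torch.zeros(1, 0, d_model)              # starts empty

for start in range(0, tokens.size(1), chunk_len):
    chunk = tokens[:, start:start + chunk_len]
    out = block(chunk, memory)                        # local + compressed-past attention
    memory = torch.cat([memory, compressor(out)], 1)  # append n_mem vectors per chunk

print(memory.shape)  # (1, 8, 64): 4 chunks x 2 memory vectors each
```

In this sketch the ratio `chunk_len / n_mem` is the compression knob: a larger chunk or fewer memory vectors per chunk means less compute and memory at attention time, at the cost of a coarser summary of the past, which is the kind of quality-compute trade-off the description refers to.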