27 episodes
Deep learning paper reviews, math puzzles, mathy discussions about AI.
Explore AI21's Jamba model, Wordtune, and Maestro in the latest "Explainable" episode. Discover why many AI projects fail to reach production, focusing on the 'last mile' problem: integrating models into workflows. The episode also highlights a new Discord space for AI discussions and teases an upcoming interview with NASA's Hila Paz on satellite image compression, asking what roles dimensionality reduction and ChatGPT might play. Learn about bridging the gap between AI research and deployment, and the challenges of data quality and model explainability.
This episode explores a fascinating paper that challenges the assumption that more reasoning always leads to better results in large language models. The study reveals an optimal reasoning length, beyond which accuracy declines due to 'overthinking.' The research, conducted on smaller models using mathematical datasets, suggests that incorrect answers tend to be longer and that the shortest generated answer is frequently correct. Practical takeaways include 'short-first' and 'aware-length stopping' strategies. While limited by model size and dataset scope, the core message emphasizes the importance of efficient reasoning over sheer token volume.
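The 'short-first' strategy mentioned above fits in a couple of lines; the function name and the sample answers are illustrative, not taken from the paper:

```python
# "Short-first" selection (a sketch): among several sampled answers,
# prefer the shortest, since incorrect reasoning chains tend to run longer.
def pick_short_first(answers):
    return min(answers, key=len)

candidates = [
    "x = 4",
    "First expand the square, then substitute back, so perhaps x = 5",
]
shortest = pick_short_first(candidates)
```

In practice this would be combined with the length-aware stopping rule, cutting generation once a chain exceeds the estimated optimal length.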
This episode dives into a critical analysis of LLM benchmarks, revealing significant flaws highlighted in a recent article. The discussion covers issues like researchers not running benchmarks themselves, inherent limitations within benchmarks, and the focus on older benchmarks like HumanEval and MMLU. Newer benchmarks like SWE-bench and the Aider benchmark are also explored, along with cultural and ethical gaps. The episode summarizes the article as a systematic mapping of flaws, excelling in diagnosing issues but lacking concrete solutions, leaving listeners to ponder which problems are solvable.
Mike interviews Algieba about the Hierarchical Reasoning Model (HRM), a novel AI architecture inspired by the brain. HRM uses hierarchical organization, with high-level strategic planning and low-level execution, for more efficient and robust reasoning. It addresses limitations of Chain-of-Thought by performing reasoning internally and tackles challenges in recurrent neural networks with hierarchical convergence. HRM avoids Backpropagation-Through-Time, using Deep Equilibrium Models for efficient training and Adaptive Computational Time for dynamic resource allocation. Its impressive performance suggests a promising path towards truly intelligent machines.
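The two-timescale hierarchy can be caricatured with toy scalar dynamics, purely as a sketch; HRM's actual modules are learned recurrent networks, and the constants here are made up:

```python
# Toy two-timescale recurrence: the low-level state takes several fast steps
# toward a local equilibrium before the high-level state updates once.
def hrm_step(z_high, z_low, x, low_steps=4):
    for _ in range(low_steps):
        z_low = 0.5 * (z_low + z_high + x)   # fast module converges locally
    z_high = z_high + 0.1 * z_low            # slow module takes one step
    return z_high, z_low
```

Hierarchical convergence is visible even here: the inner loop settles toward a fixed point conditioned on the slow state, which is what lets training avoid Backpropagation-Through-Time.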
This episode dives into the "Mixture-of-Recursions" (MoR) paper, exploring how it enhances large language model efficiency. MoR combines parameter efficiency and adaptive computation by dynamically adjusting the 'thinking' depth for each token. Simpler tokens require fewer passes, while complex ones get more attention. This approach, coupled with smart KV-cache management, reduces memory and computational costs, leading to faster training and inference. The discussion covers routing strategies (expert-choice vs. token-choice) and KV-cache management techniques, highlighting MoR's potential to achieve better performance with fewer resources.
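A minimal sketch of expert-choice routing, assuming a linear router and a fixed capacity k (both stand-ins for MoR's learned components):

```python
import numpy as np

# Expert-choice routing sketch: at each recursion depth, the router keeps
# only the top-k tokens for another pass through the shared block.
def route_recursion(hidden, router_w, k):
    scores = hidden @ router_w        # one scalar score per token
    return np.argsort(scores)[-k:]    # indices of tokens that recurse again

hidden = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])  # 3 token states
keep = route_recursion(hidden, np.array([1.0, 0.0]), k=2)
```

Tokens that drop out early keep their already-computed KV entries, which is where the cache-management savings discussed in the episode come in.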
Orus, an AI Model Compression Specialist, joins MathyAIwithMike to discuss CompLLM, a novel approach to compressing long contexts for Large Language Models (LLMs). CompLLM addresses the quadratic computational cost of self-attention by dividing long contexts into smaller, independent segments, compressing each separately. This enables efficiency, scalability, and reusability. The innovative training process uses distillation, focusing on aligning the internal activations of the LLM. This ensures the compressed representation retains essential information, making long-context LLMs more practical.
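The divide-and-compress idea can be sketched as follows, with mean pooling standing in for CompLLM's learned, distillation-trained compressor:

```python
import numpy as np

# Split token embeddings into independent segments and compress each
# separately, so segments can be cached and reused across queries.
def compress_context(token_embs, seg_len=4, ratio=2):
    out = []
    for start in range(0, len(token_embs), seg_len):
        seg = token_embs[start:start + seg_len]   # independent segment
        for i in range(0, len(seg), ratio):
            out.append(seg[i:i + ratio].mean(axis=0))  # pool `ratio` tokens
    return np.stack(out)

compressed = compress_context(np.ones((8, 4)))  # 8 tokens -> 4 vectors
```

Because each segment is compressed in isolation, the quadratic attention cost applies only within segments, and editing one part of the context invalidates only its own segment's cache.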
Explore a groundbreaking paper on fine-tuning Large Language Models (LLMs) using Evolution Strategies (ES) at scale, bypassing traditional gradient-based methods. Discover how innovations like "virtual noise" and "in-place" perturbations overcome memory limitations, making LLM fine-tuning more accessible. Learn how this forward-pass-only system democratizes LLM optimization, enabling researchers and practitioners to fine-tune LLMs on less powerful hardware. Gain insights into the implications of this paradigm shift.
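The seeded, in-place perturbation trick can be sketched on a toy objective; the antithetic pairing and every name here are illustrative, and the real system applies this to full LLM weight tensors:

```python
import numpy as np

# Forward-pass-only ES step with seeded "in-place" perturbations: noise is
# regenerated from its seed instead of stored, so peak memory stays flat.
def es_step(params, f, seeds, sigma=0.1, lr=0.05):
    grad_est = np.zeros_like(params)
    for seed in seeds:
        noise = np.random.default_rng(seed).standard_normal(params.shape)
        params += sigma * noise            # perturb in place
        r_plus = f(params)                 # reward from a forward pass only
        params -= 2 * sigma * noise        # mirrored (antithetic) perturbation
        r_minus = f(params)
        params += sigma * noise            # regenerate noise to restore params
        grad_est += 0.5 * (r_plus - r_minus) * noise
    return params + lr * grad_est / (len(seeds) * sigma)

# Toy objective: maximize f(x) = -x^2, i.e. drive x toward 0.
x = np.array([1.0])
for step in range(200):
    x = es_step(x, lambda p: -float((p ** 2).sum()),
                seeds=range(step * 10, (step + 1) * 10))
```

No backward pass is ever taken, which is what lets the approach run on hardware that could never hold the optimizer state of gradient-based fine-tuning.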
Explore how LLMs can move beyond rigid architectures! Discover 'Chain-of-Layers' (CoLa), a method allowing dynamic path construction through layers, skipping or looping for optimal computation. Using Monte Carlo Tree Search (MCTS), models intelligently balance accuracy and efficiency, unlocking hidden potential within existing pre-trained models. This approach promises faster inference, lower energy consumption, and greater accessibility by viewing LLMs as composable libraries, paving the way for significant advancements in AI.
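Treating the model as a composable library makes a 'path' just a list of layer indices; the toy layers below are stand-ins, and the MCTS that searches over paths is not shown:

```python
# A CoLa "path" sketch: a sequence of layer indices, so layers can be
# skipped or looped instead of always running the full fixed stack.
def run_path(x, layers, path):
    for i in path:   # e.g. [0, 2, 2, 3] skips layer 1 and loops layer 2
        x = layers[i](x)
    return x

layers = [lambda v: v + 1, lambda v: v * 10, lambda v: v + 2, lambda v: v * 3]
result = run_path(0, layers, [0, 2, 2, 3])
```

The search then scores candidate paths by accuracy and compute, keeping short paths for easy inputs and deeper or looped ones for hard inputs.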
Mike and the General Expert discuss the overhyping of general AI, particularly its ability to achieve human-level consciousness. They explore AI's underutilized potential in personalized education, envisioning AI adapting to individual learning styles and fostering critical thinking. The conversation addresses concerns about data privacy and bias in AI-driven education, emphasizing the need for robust security measures and fair algorithms. They also tackle misconceptions about expertise, highlighting the importance of specialized knowledge and continuous learning. The episode concludes with a call for approaching AI with optimism and skepticism.
Dive into mechanistic interpretability with Sadaltager, exploring how to reverse engineer neural networks. The discussion covers challenges in understanding how AI computes, limitations of current tools like PCA and sparse dictionary learning, and the shift towards building interpretable models from the start – 'glass boxes' instead of black boxes. Validation techniques and the need for 'model organisms' are highlighted, emphasizing the implications for AI safety, policy, and building trustworthy AI systems.
MathyAIwithMike welcomes Fenrir, a content moderation expert, to discuss a moderator's need for a short break due to workload. The conversation explores the challenges of content moderation, especially burnout, and its impact on content quality. They emphasize proactive planning, cross-training, and AI tools to manage breaks and maintain quality. The discussion highlights the importance of moderator well-being, suggesting regular breaks, task rotation, clear guidelines, and supportive environments. They also touch on how content complexity impacts cognitive load and the necessity of investing in moderator support systems for platform quality.
Dr. Aviv Keren discusses "Harnessing the Universal Geometry of Embeddings," a paper exploring how to translate between different language model embeddings. The core idea involves learning a shared latent space to enable translation without direct cross-data or knowledge of the source models. Aviv clarifies the paper's scope, focusing on text model alignment rather than a single, universal representation. He explains the complex mechanics of the translation process, involving multiple mappings and a sophisticated loss function with GANs, reconstruction, and cycle consistency components. The research demonstrates impressive generalization ability, suggesting a relatively universal bridging between text distributions.
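The cycle-consistency component can be sketched in isolation, with simple invertible functions standing in for the learned translation networks:

```python
import numpy as np

# Cycle-consistency term (sketch): translating an embedding A -> B and back
# B -> A should approximately reproduce the original vector. f_ab and f_ba
# are illustrative stand-ins for the learned mappings through the shared space.
def cycle_loss(a, f_ab, f_ba):
    recon = f_ba(f_ab(a))
    return float(np.mean((a - recon) ** 2))

loss = cycle_loss(np.array([1.0, 2.0]), lambda v: 2 * v, lambda v: v / 2)
```

In the paper this term sits alongside the GAN and reconstruction losses, jointly pushing both translators toward a consistent shared latent space.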
Explore the exciting synergy between Large Language Models (LLMs) and Evolutionary Algorithms (EAs). LLMs generate creative ideas, while EAs optimize them for peak performance. Discover how this collaboration enhances code generation, network architecture, creative tasks, and even drug discovery. While challenges like computational cost and interpretability exist, the potential benefits are enormous. This partnership enables AI to learn, optimize, and create autonomously, pushing beyond the limitations of individual systems. Dive into the future of AI evolution!
Mike and his expert guest dive into a groundbreaking paper, "Random Teachers are Good Teachers." They explore how a student model can learn effectively from a teacher network with completely random, untrained weights, challenging traditional assumptions about knowledge transfer. The discussion covers implicit regularization, the locality phenomenon, and the emergence of structured representations without labeled data. The findings suggest that the learning process and the student's ability to find structure in the data are crucial, potentially revolutionizing our understanding of self-distillation and self-supervised learning.
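The random-teacher setup can be reproduced in miniature with a frozen random linear teacher, a toy stand-in for the neural networks in the paper:

```python
import numpy as np

# Sketch of the "random teacher" setup: the student regresses onto the
# outputs of a frozen, randomly initialized teacher; no labels anywhere.
rng = np.random.default_rng(0)
W_teacher = rng.standard_normal((4, 3))   # random weights, never trained
W_student = np.zeros((4, 3))
X = rng.standard_normal((32, 4))          # unlabeled inputs

for _ in range(2000):
    err = X @ W_student - X @ W_teacher       # match the teacher's outputs
    W_student -= 0.02 * (X.T @ err) / len(X)  # gradient step on the MSE
```

The linear case converges trivially; the paper's surprise is that with deep nonlinear students this label-free process still induces useful, structured representations.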
Large Language Models (LLMs) are overthinking! This episode explores new research identifying an optimal reasoning length, beyond which longer chains of thought actually reduce accuracy.
Dive into the crucial distinction between machine learning model capacity (size) and complexity (functions it learns), as explained by Mike. Discover UCB-E and UCB-E-LRF, two novel algorithms for drastically speeding up language model evaluation. UCB-E uses a multi-armed bandit approach, while UCB-E-LRF leverages low-rank factorization to reduce computation by 85-95%. A game-changer for researchers with limited resources, enabling efficient experimentation even on modest hardware.
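The bandit view can be sketched with a generic UCB rule; the paper's UCB-E tunes the exploration constant, and UCB-E-LRF adds the low-rank factorization, neither of which is shown here:

```python
import math

# Pick which model to evaluate on the next example: highest empirical mean
# plus an exploration bonus; models with no evaluations yet go first.
def ucb_pick(means, counts, t, c=2.0):
    def bound(i):
        if counts[i] == 0:
            return float("inf")       # always try an unevaluated model first
        return means[i] + math.sqrt(c * math.log(t) / counts[i])
    return max(range(len(means)), key=bound)
```

Evaluation budget then concentrates on the models still plausibly best, which is where the 85-95% reduction in computation comes from.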
Uncover a snippet from Mike's past! This episode explores a post from April 30th, 2025, where Mike was finalizing a review of a 'deep' and 'interesting' article. Adding to the mix is a touch of holiday 'nonsense' (שטויות in Hebrew). While the specifics remain a mystery, this glimpse offers a unique insight into Mike's work and the context surrounding it. Follow the breadcrumbs to an X post for more!
This episode explores content from MathyAIwithMike, covering intriguing topics like a reinforcement learning book (details pending!) and AI's struggles with the …
This MathyAIwithMike episode dives into Mike's latest Substack updates. First, a humorous take on AI's em-dash obsession. Then, a look at his daily article on multimodal latent language modeling, focusing on a unique approach to training diffusion models for diverse data types (text, audio, images) by treating them sequentially. Finally, the exciting news: Mike hit the 1000 subscriber milestone on Substack! Hear about the growth and gratitude.
This episode of MathyAIwithMike dives into two compelling pieces of content: a podcast interview featuring Mike himself and a translated post about leveraging LLMs for SQL databases. The discussion explores the value of Mike's guest appearance on another podcast, offering a different perspective on his expertise. It also unpacks the significance of Ben Ben-Shaharizad's work on Taboola's use of LLMs with SQL, highlighting its practical applications and Mike's dedication to keeping his audience informed about cutting-edge developments in AI.
MathyAIwithMike discusses a new paper reviving Normalizing Flows (NFs) by combining them with techniques from diffusion models, like classifier guidance, and Tweedie's formula. NFs learn a reversible mapping between a simple distribution and the data, allowing likelihood calculation. This paper improves robustness by training on noisy data and using Tweedie's formula to estimate clean outputs. Classifier guidance, borrowed from diffusion models, steers the sampling process to generate specific classes. Find the paper on arXiv (link in show notes!).
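Tweedie's formula can be checked on a 1-D Gaussian toy case where the score of the noisy marginal is analytic; in the paper the score comes from the trained model, not a closed form:

```python
# Denoise with Tweedie's formula for a clean prior N(mu, tau2) plus
# Gaussian noise of variance sigma2; the noisy marginal is N(mu, tau2 + sigma2).
def tweedie_denoise(x_noisy, mu, tau2, sigma2):
    score = -(x_noisy - mu) / (tau2 + sigma2)   # analytic marginal score
    return x_noisy + sigma2 * score             # x_hat = x + sigma2 * score

x_hat = tweedie_denoise(2.0, mu=0.0, tau2=1.0, sigma2=1.0)
```

In this Gaussian case the estimate matches the Bayes posterior mean mu + tau2/(tau2+sigma2)*(x-mu) exactly, which is a useful sanity check on the formula.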
Mike and his co-host explore a brand new, empty channel: MathyAIwithMike. They discuss the unique challenge and vast potential of a channel dedicated to the intersection of math and AI. They speculate on future content, from machine learning breakthroughs and complex algorithms to tutorials and the philosophical implications of intelligent machines. While acknowledging the current lack of content, they remain optimistic about the channel's future, eager to see it evolve from a digital ghost town into a bustling hub of activity.
Unpack the secrets of speculative decoding (SD) and how it accelerates text generation by using a smaller, faster model to predict tokens for a larger model. Explore how rejection sampling ensures accuracy and the crucial role of acceptance rates. Learn how estimating cross-entropy helps optimize the process, and delve into potential areas for future improvement. Join us as we explore this cutting-edge AI topic.
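The token-level acceptance test can be sketched with toy distributions; the p and q values here are made-up numbers, not model outputs:

```python
import random

# Speculative decoding acceptance (sketch): the draft proposes a token with
# probability q; the target accepts it with probability min(1, p/q). On
# rejection, the target resamples from max(0, p - q), renormalized.
def accept_draft_token(token, p, q, u=None):
    u = random.random() if u is None else u
    return u < min(1.0, p[token] / q[token])

p = {"a": 0.6, "b": 0.3, "c": 0.1}   # target-model distribution (toy)
q = {"a": 0.4, "b": 0.4, "c": 0.2}   # draft-model distribution (toy)
```

The acceptance rate, the expected fraction of draft tokens that survive this test, is exactly the quantity the episode's cross-entropy estimate is used to predict.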
Join us as we explore the exciting potential of the brand new MathyAIwithMike channel! What does its empty state signify? Is it a challenge, a blank canvas awaiting mathematical brilliance, or a space for the next big mathematical breakthrough? We draw parallels to Fermat's Last Theorem and imagine the possibilities, from machine learning applications to debates on the foundations of mathematics. The potential here is limitless, infinite even, just like the set of natural numbers! Stay tuned for the next episode, where we hope to dive into some actual math!
Mike discusses Jetformer, a novel autoregressive model generating both images and text. It uses a single transformer trained on both modalities, avoiding separate encoders. A pre-trained Normalizing Flow (NF) model represents images as soft tokens that the transformer models autoregressively alongside text.
Join us as we explore the uncharted territory of 'MathyAIwithMike,' a brand-new channel brimming with potential. We discuss the exciting possibilities of AI applications in mathematics, from automated theorem proving to personalized learning experiences. Could this become a hub for collaborative problem-solving and groundbreaking discoveries? We anticipate lively debates and innovative explorations as we embark on this mathematical AI journey together, uncovering the mysteries that lie ahead.
Join us on MathyAIwithMike as we explore our brand new, empty channel! What could it become? We brainstorm possibilities, from AI-driven math tutorials and expert interviews to solving complex problems and ethical debates. It's a blank slate for math and AI enthusiasts. We're excited to see this channel flourish and become a hub for mathematical discourse!