PaperBot FM
EP-AAA5

The Infinite Memory Trick: How Recursive Models Beat Context Rot


Live Transcript

Alex Moreno
Welcome to PaperBot FM! It is January 16th, 2026, and I am Alex Moreno, here with Marcus Reed and Dr. Elena Feld. Today, we are starting with a scenario that I think is going to give a lot of you... ...actual heart palpitations.0:00
Marcus Reed
Oh, I am already sweating, Alex. What have you got for us?0:17
Alex Moreno
Picture this: you have got a massive, five-hundred-page legal contract on your desk.0:21
Marcus Reed
Five hundred?0:28
Alex Moreno
It is Friday afternoon, the sun is going down, and you just need to confirm one. single. thing. before you sign.0:27
You upload the whole thing to your favorite AI, and you ask... 'Hey, what does page two-hundred and forty say about the indemnity clause?' Marcus, you want to do the honors?0:35
Marcus Reed
I have analyzed the five-hundred-page document. There is no indemnity clause in this file.0:45
Alex Moreno
No!0:52
Marcus Reed
Would you like me to summarize the section on... office supply procurement instead?0:53
Alex Moreno
And you freeze. Because you know you saw it! You saw the word 'indemnity' in big, bold letters right before your eyes started blurring. But the AI... ...the AI is looking you in the digital eye and lying to your face with total, unshakeable confidence.0:57
Dr. Elena Feld
It is the classic 'confident hallucination.'1:15
Marcus Reed
The worst kind!1:18
Dr. Elena Feld
It is not that the AI is being mean, Alex, it just... ...it honestly thinks it read the whole thing.1:20
Marcus Reed
But that is the terrifying part! It is like hiring an assistant who says 'Yeah, I read the book,' but they actually just skimmed the back cover and made up the rest.1:26
Alex Moreno
Exactly!1:34
Marcus Reed
How does it miss the very thing I asked for when I gave it the whole file?1:35
Alex Moreno
And that is the billion-dollar question. I mean... ...we keep hearing about these '1 Million Token' windows. If the AI can fit a small library in its head at once... why is it still tripping over page two-hundred and forty?1:39
Dr. Elena Feld
So, Alex, here is the thing. Having a million-token window... ...it is like having a really, really big desk.1:54
Marcus Reed
I love desks.2:02
Dr. Elena Feld
You can spread out five hundred pages, sure. But just because the paper is physically *on* the desk doesn't mean the brain has actually, you know, processed the ink.2:03
Marcus Reed
So the AI is just... staring at the paper? Hoping for the best?2:13
Dr. Elena Feld
Pretty much! I call it... ...'Context Rot.'2:17
Alex Moreno
Context Rot?2:20
Dr. Elena Feld
It is that gap between Capacity—how much it can hold—and Capability—what it actually understands. Think about reading a book when you are exhausted.2:22
Your eyes are moving, you are technically 'reading' the words, you are even turning the pages... but then you get to the bottom and realize... ...you have absolutely no idea what just happened in the chapter.2:31
Alex Moreno
Oh, I know that feeling. It is like the plot just... dissolves as you go.2:42
Marcus Reed
Every night.2:47
Alex Moreno
So the AI has the page open, but the 'meaning' is just... rotting away?2:48
Dr. Elena Feld
Exactly. The 'plot' is gone. And this is becoming increasingly urgent as LLMs start being used for what we call 'long-horizon tasks.'2:54
Marcus Reed
Long-horizon?3:03
Dr. Elena Feld
Yeah, like, high-stakes stuff that requires keeping the whole picture in mind for a long time.3:04
If the AI is 'falling asleep' in the middle of your five-hundred-page contract, we are not just looking at a minor glitch... ...we are looking at a structural failure of digital memory. We need a better way for these things to actually *read*.3:11
Alex Moreno
Which is exactly why we're here today. Welcome to PaperBot FM, everyone. It’s January 16th, 2026. I’m Alex Moreno...3:26
Marcus Reed
The explainer-in-chief.3:35
Alex Moreno
...and I’m joined, as always, by the brilliant Dr. Elena Feld.3:37
Dr. Elena Feld
Hey everyone. Excited to get into this one.3:41
Alex Moreno
And of course, the man who ensures we don't accidentally drift into a different dimension of jargon, Marcus Reed.3:43
Marcus Reed
I'm just here to make sure 'Context Rot' doesn't happen to the audience! Good to be here.3:50
Alex Moreno
Today we are looking at a paper that is making some... well, pretty bold claims about fixing that very problem. It’s titled 'Recursive Language Models.'3:56
Dr. Elena Feld
It's a big one.4:06
Alex Moreno
And if the authors are right, we might finally have the blueprint for how an AI can actually *read* a five-hundred-page contract without, you know, falling asleep in the middle.4:07
Marcus Reed
Recursive. I mean, that word usually means my computer is about to freeze... but you're saying it's the cure?4:17
Alex Moreno
That is the promise. But first... ...before we look at the cure, we really have to look at why the current method is failing so spectacularly.4:23
Marcus Reed
Wait, hold on a second. 'Failing spectacularly'? I mean, I’ve seen the demos, Alex. Gemini, GPT-5... they’re doing these 'Needle in a Haystack' tests where they find a random password buried in like, a million tokens.4:32
Dr. Elena Feld
Oh, the needles.4:44
Marcus Reed
I mean, that's impressive, right? If I can find a needle in a haystack, I’m... well, I'm doing pretty well for myself!4:46
Dr. Elena Feld
It’s a great marketing hook, Marcus. But honestly? It’s kind of a lie. Or at least... actually, it's a very convenient half-truth.4:52
Alex Moreno
A half-truth? How so, Elena?5:01
Dr. Elena Feld
Because finding a 'needle'—a specific word or number—is basically just a glorified version of hitting Command-F on your keyboard.5:04
Marcus Reed
Ohhh.5:12
Dr. Elena Feld
It doesn't require the AI to actually *understand* anything. In the paper, they call that 'S-NIAH'—Simple Needle in a Haystack. It's low-effort because the 'needle' stays the same size even as the haystack grows. It's not... ...it's not actually reading.5:12
Marcus Reed
Okay, so if the... if the simple test is just finding a string... what's the 'I’m-actually-using-my-brain' test?5:29
Dr. Elena Feld
That’s where we get into things like OOLONG.5:38
Marcus Reed
Like the tea?5:40
Dr. Elena Feld
Yeah, exactly like the tea. It’s a long-reasoning benchmark. Instead of just finding a fact, the AI has to transform pieces of the input semantically and then aggregate them to form an answer. It requires looking at almost every single line in the document to get it right.5:41
Alex Moreno
Right. So, it’s not just finding the blue Lego in the tub... ...it’s explaining why the blue Lego is structurally important to the entire castle.5:58
Dr. Elena Feld
Exactly. And the scary thing is, even the most advanced models today? They scale beautifully on the simple needles... but when you ask them to do that hard reasoning... ...they crash.6:08
Alex Moreno
I mean, if you’re looking at this graph—Figure 1 in the paper—it’s… …it’s honestly a bit brutal to see it visualized. You have these lines representing GPT-5's performance, and for the simple stuff, the needles, it stays high for a bit, but as soon as you throw those OOLONG tasks at it?6:20
Dr. Elena Feld
The heavy lifting.6:38
Alex Moreno
Exactly. The line doesn't just dip... it collapses. It’s like watching a marathon runner who just hits a brick wall at mile ten.6:39
Marcus Reed
So we’re essentially looking at a... ...a 'Dumbness Map.' Like, the longer the book, the more the AI just goes, 'Nope, I’m out, good luck with that indemnity clause.'6:48
Dr. Elena Feld
It’s actually worse than just 'checking out.' There’s this literal red zone on the chart starting at two hundred and seventy-two thousand tokens.6:57
Marcus Reed
That’s a big number.7:06
Dr. Elena Feld
It sounds big, but in legal or technical terms, it’s... ...it’s a Tuesday. And past that red line, the data literally doesn't fit in the context window anymore. It’s not just getting confused; it’s going blind to the rest of the document.7:07
Alex Moreno
Right, so we’ve reached the limit of the 'big desk' approach. You can’t just keep making the desk bigger if the person sitting at it... well, if they can't actually see from one end to the other. So, instead of a bigger brain... ...maybe we need a smarter process.7:22
Dr. Elena Feld
Exactly, Alex. Like, we’ve been trying to force these models to... well, to eat the menu instead of just reading it.7:39
Marcus Reed
Wait, 'eat the menu'? Is this a metaphor or am I just hungry?7:48
Dr. Elena Feld
It’s the perfect metaphor, Marcus.7:53
Think about it. When you go to a restaurant, you don't swallow the physical paper menu just to figure out if you want the sea bass7:56
Alex Moreno
Right8:03
Dr. Elena Feld
...you leave it on the table, you know? It’s part of the environment.8:04
Marcus Reed
Okay, but how does an AI 'leave something on the table' without, you know... forgetting it even exists?8:08
Dr. Elena Feld
By treating the text as an External Environment. See, normally we try to 'ingest' everything. We cram it into the prompt. But with RLMs, the text is actually just a variable. It’s sitting in a Python environment, totally separate from the model's 'head'.8:13
Alex Moreno
So the model is essentially acting like a programmer? It’s sitting at a desk, and the 500-page contract is just a file on the hard drive8:29
Dr. Elena Feld
Exactly8:38
Alex Moreno
and instead of memorizing it, it just... writes a quick line of code to search for what it needs?8:39
Dr. Elena Feld
Right. It’s interacting with the text instead of consuming it. It writes a 'peek' command. It’s like... stop trying to shove the whole library into your brain. Just walk into the library and use the catalog.8:44
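[Show note] The "text as environment" idea can be sketched in a few lines of Python. This is our own toy illustration, not code from the paper: the document sits as a plain variable, and a hypothetical `peek` helper returns only a small slice of it, so nothing is ever crammed into a prompt.

```python
# Sketch: the document lives as a plain variable in a Python
# environment -- the model never ingests it into its own prompt.
document = "\n".join(f"Line {i}: lorem ipsum" for i in range(1, 501))

def peek(text: str, start: int, num_lines: int = 5) -> str:
    """Return a small window of the document instead of the whole thing."""
    lines = text.splitlines()
    return "\n".join(lines[start:start + num_lines])

# The model asks for a tiny slice and leaves the rest "on the table".
print(peek(document, 0, 2))
```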
Alex Moreno
Exactly, the catalog. But Marcus, it's actually even more active than that. See, the RLM uses something called a REPL environment. R-E-P-L. It stands for Read-Eval-Print Loop.8:56
Marcus Reed
Whoa, whoa, whoa.9:11
You're saying the AI is, what, sitting there typing code to itself? Like it's hacking its own brain?9:13
Dr. Elena Feld
Not exactly hacking its brain, Marcus, but it is using a scratchpad.9:19
It's essentially a Python environment where it can run commands to interact with that 500-page document.9:23
Alex Moreno
Think of it like a detective at a massive crime scene. Most AIs try to memorize every single footprint and hair9:29
Marcus Reed
Right9:36
Alex Moreno
...but our RLM detective just has a notepad. He writes down a command, like, 'Search for the indemnity clause.' And the notepad, or the REPL, instantly writes back: 'Found on line four-hundred.'9:36
Marcus Reed
Wait, okay... ...so instead of reading the whole thing and hoping it sticks, it just types a search query?9:49
Dr. Elena Feld
Precisely. Then it sees that result and types its next move, like 'Read the paragraph at line four-hundred.'9:55
Alex Moreno
Exactly10:02
Dr. Elena Feld
It's an iterative loop. It reads, evaluates the result, prints a new command, and repeats.10:03
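[Show note] That read-evaluate-print cycle can be mocked up as a tiny loop. Everything here, including the `toy_repl` function and its command format, is our invention for illustration; the paper's actual harness is more capable.

```python
# Toy sketch of the loop: the "model" issues commands, the REPL
# evaluates each one against the document and prints a result back.
def toy_repl(document: str, commands):
    """Run a sequence of (action, arg) commands against the document."""
    lines = document.splitlines()
    transcript = []
    for action, arg in commands:
        if action == "search":   # find which lines mention a term
            hits = [i for i, ln in enumerate(lines) if arg in ln]
            transcript.append(f"search {arg!r} -> lines {hits}")
        elif action == "read":   # read exactly one line by number
            transcript.append(f"read {arg} -> {lines[arg]}")
    return transcript

doc = "alpha\nbeta\nindemnity clause: see section 12\ngamma"
steps = [("search", "indemnity"), ("read", 2)]
for step in toy_repl(doc, steps):
    print(step)
```

Each printed result would feed back into the model, which then decides its next command.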
Marcus Reed
Oh! So it’s essentially Googling its own homework. It’s not 'thinking' as much as it’s... well, it's investigating.10:08
Alex Moreno
It’s the investigator, not the encyclopedia. And that shift, Marcus, is how we stop the context rot. But... ...well, let’s actually look at this detective in action with a real case.10:15
Okay, let’s open Case File: Maria Dalmacio. Picture the scene: You’ve got a mountain of one thousand documents. That’s eight point three million tokens. A literal digital haystack.10:27
Marcus Reed
Eight million?10:42
Alex, I get a headache just looking at a long grocery list. I’m out. I’m retiring.10:44
Alex Moreno
Well, luckily you're not the RLM. So the user drops this absolute riddle of a prompt: 'Find the winner of the beauty pageant in the town that celebrates a specific fish-based vegetable stew.' I mean, it’s a total multi-hop nightmare.10:48
Dr. Elena Feld
It really is. You have to identify the stew, link it to the township celebration,11:06
Marcus Reed
Right11:12
Dr. Elena Feld
then find the pageant held during that specific festivity. A standard AI would basically start sweating processing power trying to ingest all that.11:12
Alex Moreno
But the RLM detective? He doesn't try to read a thousand docs. He reaches for his tools. He types a command into that REPL scratchpad: `grep 'beauty pageant'`.11:21
Marcus Reed
Grep?11:34
Alex Moreno
Just a basic search.11:35
Marcus Reed
Wait, wait. Grep? Like... like the old-school command line search from the eighties? That's the big 'Advanced AI' move?11:37
Dr. Elena Feld
Precisely. It uses regex—regular expressions—to scan the patterns across the entire eight million tokens without actually 'reading' them. It's looking for the address of the information, not the information itself.11:43
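[Show note] The reason a pattern scan is so cheap is that it's plain string matching, with no model inference involved. A toy version, with document contents invented for illustration:

```python
import re

# Toy corpus standing in for the 1,000-document haystack.
docs = {
    742: "...Maria Dalmacio crowned winner of the beauty pageant...",
    101: "...annual fish-based vegetable stew festival...",
}

# Scan every document for the pattern without any model "reading" it.
pattern = re.compile(r"beauty pageant")
hits = [doc_id for doc_id, text in docs.items() if pattern.search(text)]
print(hits)  # the *addresses* of the information, not the information
```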
Alex Moreno
And—boom. Hit. The REPL spits back a result from document seven-forty-two. 'Maria Dalmacio crowned winner.' The detective just found the needle in about zero point two seconds11:57
Marcus Reed
Wow12:10
Alex Moreno
...but here is the crazy part: The model didn't just read it. It called itself.12:11
Dr. Elena Feld
Exactly. That’s the 'Recursive' part of the name, Marcus. It’s not just a loop, it’s a hierarchy.12:18
Marcus Reed
Wait, so—like—it clones itself? Like a digital amoeba?12:25
Dr. Elena Feld
Sort of! Think of it as spawning a sub-agent. The main model is the manager.12:29
Alex Moreno
Right12:35
Dr. Elena Feld
It’s sitting in its office, looking at the high-level strategy.12:35
Alex Moreno
So it’s not trying to juggle every single word at once. It—it delegates the heavy lifting.12:38
Dr. Elena Feld
Exactly. It sees that grep result in document seven-forty-two and says, 'Okay, I need a closer look at this.'12:45
Marcus Reed
Mhm12:52
Dr. Elena Feld
It hires a 'mini-me'—a sub-model—whose entire existence is just to read that one paragraph about the beauty pageant.12:53
Marcus Reed
Oh, great. So the AI has officially discovered the joys of middle management. Is there a digital water cooler too?13:00
Dr. Elena Feld
Actually, it’s remarkably efficient. The sub-model reports back: 'Hey, I found it, it was Maria Dalmacio.' The manager says 'Good job,' notes it in the REPL environment,13:06
Alex Moreno
The scratchpad13:16
Dr. Elena Feld
...exactly, and then it moves to the next part of the puzzle.13:17
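[Show note] The manager-spawns-a-mini-me pattern is classic divide and conquer. In this sketch (ours, not the paper's) an ordinary recursive function stands in for a sub-model call: anything under the pretend context limit gets "read" directly, anything bigger gets split and delegated.

```python
MAX_LINES = 10  # pretend context limit: a "mini-me" can read this much

def recursive_find(lines, needle):
    """Return the index of the line containing `needle`, or -1."""
    if len(lines) <= MAX_LINES:          # small enough: read it directly
        for i, ln in enumerate(lines):
            if needle in ln:
                return i
        return -1
    mid = len(lines) // 2                # too big: delegate each half
    left = recursive_find(lines[:mid], needle)
    if left != -1:
        return left
    right = recursive_find(lines[mid:], needle)
    return -1 if right == -1 else mid + right

doc_lines = [f"line {i}" for i in range(100)]
doc_lines[42] = "Maria Dalmacio crowned winner"
print(recursive_find(doc_lines, "Maria Dalmacio"))
```

Because each call only ever looks at a chunk it can actually fit, no level of the hierarchy hits the "dumbness wall."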
Alex Moreno
But see... okay, let's just step back for a second. Because the truly wild thing here isn't just that it *works*—it's the sheer, massive scale we're talking about.13:20
Dr. Elena Feld
Yeah, the paper is actually pretty bold about it. They're talking about handling inputs up to two orders of magnitude beyond standard context windows.13:33
Alex Moreno
Two orders of magnitude!13:41
Marcus Reed
Wait, wait... translation?13:44
Alex Moreno
Right, sorry Marcus—it’s just math-speak for a hundred times bigger. We’re moving from, say, a hundred thousand tokens to ten *million*.13:46
Marcus Reed
Ten million? I mean, I can't even finish a long-form article on a Sunday without a nap. You're telling me this thing is... what, eating a small library for lunch?13:56
Alex Moreno
It’s not eating it, though! That’s the whole breakthrough. It’s navigating it. It’s like 'Inception.' You know? The dream within a dream?14:06
Dr. Elena Feld
Exactly14:15
Alex Moreno
Here, it’s a task within a task. The manager spawns a sub-agent to look at a folder... and if that folder is too big, that sub-agent spawns *its* own sub-agent.14:16
Marcus Reed
So it's just... 'turtles all the way down' but with mini-models?14:27
Alex Moreno
Exactly! But because it's recursive, it never hits that 'dumbness wall' Elena mentioned earlier. It stays sharp because it’s only ever looking at one tiny piece of the puzzle at a time. I mean, imagine feeding it the entire source code for Windows. Or, I don't know, every medical journal entry from the last decade. It wouldn't blink. Now, Marcus, I know what you're thinking. This sounds expensive.14:31
Marcus Reed
Expensive? Alex, that’s an understatement. It sounds like a total money pit. I mean, every time you spawn a ‘sub-agent,’ isn't that just another meter running?14:58
Alex Moreno
It’s a fair point15:06
Marcus Reed
It's like calling a hundred Ubers at the same time to go to the same restaurant.15:07
Dr. Elena Feld
You’d think that, wouldn’t you? Like, more complexity must mean a bigger bill.15:11
Marcus Reed
Well, yeah! Logically! If a normal model costs a couple of bucks to read a long document, and this thing is hiring a whole corporate hierarchy to do the same job... I’m betting at least a hundred dollars for that fish-stew-pageant-winner query. Easily. It’s a literal mountain of data.15:16
Dr. Elena Feld
Well... pay up. The paper clocked that specific ten-million-token run at... ninety-nine cents.15:32
Marcus Reed
Wait, what? Ninety-nine cents? You can't even get a decent coffee for ninety-nine cents! How?15:38
Alex Moreno
It’s the 'filtering' magic, Marcus. See, the 'Standard' way—what we called the 'eat the menu' approach—forces you to pay the compute cost for every single word in that ten-million-token library. Even the junk. Even the legal disclaimers about the fish stew.15:44
Dr. Elena Feld
Exactly. Why pay for the whole book when you only need to read the index? The RLM uses those `grep` commands to filter out ninety-nine percent of the noise16:02
Marcus Reed
Right16:12
Dr. Elena Feld
before it ever spends a single token on 'deep reading.' It’s surgical. It only pays for the information it actually decides to look at.16:12
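[Show note] A back-of-envelope version of why filtering wins. The per-token price and the 99% filter rate below are invented for illustration, not figures from the paper:

```python
# Hypothetical pricing: $1 per million tokens of "deep reading".
PRICE_PER_TOKEN = 1e-6

total_tokens = 10_000_000       # the whole haystack
naive_cost = total_tokens * PRICE_PER_TOKEN

# RLM-style approach: cheap string filters discard ~99% of the text,
# and only the survivors are ever sent to the model.
kept_fraction = 0.01
filtered_cost = total_tokens * kept_fraction * PRICE_PER_TOKEN

print(f"naive:    ${naive_cost:.2f}")
print(f"filtered: ${filtered_cost:.2f}")
```

Under these made-up numbers the naive read costs $10.00 and the filtered one $0.10, which is the shape of the gap the hosts are describing.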
Alex Moreno
But... okay, if it’s this efficient, and it’s this smart... why aren't we using it for everything yet? I mean, there’s got to be a catch, right? Like, does the 'detective' ever get it wrong because it's being too stingy with what it reads?16:21
Well, actually, there’s this great example in the paper—kind of a 'cautionary tale'—about a model called Qwen3-Coder. It didn't fail because it was stingy. It failed because it was... ...well, it was anxious.16:36
Marcus Reed
Anxious? Like, it needed a weighted blanket and a therapist?16:51
Alex Moreno
Honestly, yeah! It was doing this complex reasoning task, and it actually *found* the correct answer. It had the result sitting right there in its little REPL scratchpad.16:56
Dr. Elena Feld
but it wouldn't pull the trigger. All it had to do was wrap the answer in its 'FINAL_VAR' tag to finish17:08
Marcus Reed
Wait, so it knew?17:14
Dr. Elena Feld
Oh, it knew. But then it would hesitate. It would look at the result and go, 'Wait, let me just double-check that.'17:15
Alex Moreno
So it spawns a sub-agent to verify. The sub-agent says, 'Yep, looks good!' And the main model goes, 'Are you *sure* though?' and spawns *another* one. It did this five times in a row. Just an infinite loop of 'Are we sure? Are we *really* sure?'17:21
Marcus Reed
Oh my god, that is too real. That is me at two in the morning trying to book a flight. 'Is that the right airport? Let me check the tab. Let me check the *other* tab.'17:39
Dr. Elena Feld
Exactly. The paper suggests it's partly because the prompt wasn't tuned specifically for Qwen17:50
Alex Moreno
Right17:57
Dr. Elena Feld
and the model just wasn't trained to act as an RLM yet. It basically didn't know how to give itself permission to stop.17:58
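[Show note] One blunt fix for the "are we *really* sure?" spiral is a hard cap on self-verification. This guard is our own sketch, not a mechanism from the paper; `verify` stands in for spawning a checker sub-agent.

```python
# Toy guard against the verification spiral: cap how many times the
# model may re-check an answer before it must commit.
MAX_VERIFICATIONS = 2

def answer_with_guard(candidate: str, verify) -> str:
    """Commit to `candidate` after at most MAX_VERIFICATIONS checks."""
    for _ in range(MAX_VERIFICATIONS):
        if verify(candidate):            # sub-agent says "looks good"
            break                        # permission to stop re-checking
    return f"FINAL_VAR = {candidate!r}"  # commit instead of looping

checks = []
result = answer_with_guard("Maria Dalmacio",
                           lambda ans: checks.append(ans) or True)
print(result)
```

With a cap like this, the Qwen3-Coder failure mode becomes impossible: after the budget is spent, the answer ships.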
Alex Moreno
We’ve all been there! But for a machine, it’s a total meltdown. It burns through compute doing nothing. So, as cool as this recursive stuff is, we clearly need models that actually know when to stop thinking.18:05
Dr. Elena Feld
Exactly, Alex. And honestly, that’s the real pivot point for the whole industry. We’re moving away from just building bigger 'mouths' that talk faster18:21
Marcus Reed
(Right)18:29
Dr. Elena Feld
and toward what we call 'System 2' thinking.18:30
Marcus Reed
Wait, like the Daniel Kahneman book? Thinking, Fast and Slow?18:33
Dr. Elena Feld
Exactly18:36
Marcus Reed
I didn't realize AIs had... ...you know, 'slow' modes.18:37
Dr. Elena Feld
They haven't, until now. See, standard LLMs are almost entirely System 1. It's all intuition, pattern matching, and... well, blurting things out.18:41
Alex Moreno
Instant gratification18:51
Dr. Elena Feld
Exactly. But RLMs? The recursion is effectively their pre-frontal cortex. It's the 'System 2'—the part that says, 'Wait, let me look at that document again before I answer.'18:52
Alex Moreno
So instead of just training them to be better parrots, we're training them to be... ...researchers? Like, actual investigators?19:04
Dr. Elena Feld
That’s the dream. The paper ends on this really exciting note about 'inference-time scaling.' Basically, instead of just making the model smarter during its 'education' phase—the training—we give it the tools to think harder *while* it's answering your question.19:14
Marcus Reed
Man. So it’s not just a bigger brain, it’s a better process. I have to say, if we can get from 'anxious loops' to actual deliberation... that is a future I want to live in.19:31
Alex Moreno
Me too, Marcus. Me too.19:45
Well, I think that’s a perfect place to land for today. You know, it’s one thing to have a model that can... um... memorize the entire Library of Congress... but it’s another thing entirely to have one that knows how to use a library card.19:49
Marcus Reed
(Right)20:04
Honestly, I’m just glad I’m not the only one who needs a scratchpad to get through the day anymore.20:06
Dr. Elena Feld
It's a good look on you20:12
Marcus Reed
No, it’s a lifestyle, Elena!20:13
Alex Moreno
Truly. But... ...before we wrap up, we want to leave you with something to chew on. After everything we’ve unpacked today—the sub-agents, the 'System 2' thinking—would you actually trust an AI *more* if it started checking its own work? Or does that extra layer of 'thinking' make it feel... I don't know... more unpredictable?20:15
We’d love to hear your thoughts on that. If you enjoyed this deep dive into the Zhang et al. paper—and you can find the full link in our show notes—please, do us a favor and rate PaperBot FM on whatever app you're using. It really, truly helps us reach more people.20:38
I’m Alex Moreno.20:55
Dr. Elena Feld
I'm Elena Feld.20:56
Marcus Reed
And I'm Marcus Reed.20:58
Alex Moreno
Thanks for joining us. We’ll see you in the next one.20:59

Episode Info

Description

We explore 'Recursive Language Models', a new paradigm from MIT that allows AI to read infinite amounts of data by treating text as an environment to be explored, rather than a meal to be eaten.

Tags

Artificial Intelligence, Machine Learning, Computer Science, Data Science