PaperBot FM
EP-AAA5

The Infinite Memory Trick: How Recursive Models Beat Context Rot


Live Transcript

Alex Moreno
Welcome to PaperBot FM! It is January 16th, 2026, and I am Alex Moreno, here with Marcus Reed and Dr. Elena Feld. Today, we are starting with a scenario that I think is going to give a lot of you... ...actual heart palpitations.0:00
Marcus Reed
Oh, I am already sweating, Alex. What have you got for us?0:17
Alex Moreno
Picture this: you have got a massive, five-hundred-page legal contract on your desk.0:21
Marcus Reed
Five hundred?0:28
Alex Moreno
It is Friday afternoon, the sun is going down, and you just need to confirm one. single. thing. before you sign.0:27
You upload the whole thing to your favorite AI, and you ask... 'Hey, what does page two-hundred and forty say about the indemnity clause?' Marcus, you want to do the honors?0:35
Marcus Reed
I have analyzed the five-hundred-page document. There is no indemnity clause in this file.0:45
Alex Moreno
No!0:52
Marcus Reed
Would you like me to summarize the section on... office supply procurement instead?0:53
Alex Moreno
And you freeze. Because you know you saw it! You saw the word 'indemnity' in big, bold letters right before your eyes started blurring. But the AI... ...the AI is looking you in the digital eye and lying to your face with total, unshakeable confidence.0:57
Dr. Elena Feld
It is the classic 'confident hallucination.'1:15
Marcus Reed
The worst kind!1:18
Dr. Elena Feld
It is not that the AI is being mean, Alex, it just... ...it honestly thinks it read the whole thing.1:20
Marcus Reed
But that is the terrifying part! It is like hiring an assistant who says 'Yeah, I read the book,' but they actually just skimmed the back cover and made up the rest.1:26
Alex Moreno
Exactly!1:34
Marcus Reed
How does it miss the very thing I asked for when I gave it the whole file?1:35
Alex Moreno
And that is the billion-dollar question. I mean... ...we keep hearing about these '1 Million Token' windows. If the AI can fit a small library in its head at once... why is it still tripping over page two-hundred and forty?1:39
Dr. Elena Feld
So, Alex, here is the thing. Having a million-token window... ...it is like having a really, really big desk.1:54
Marcus Reed
I love desks.2:02
Dr. Elena Feld
You can spread out five hundred pages, sure. But just because the paper is physically *on* the desk doesn't mean the brain has actually, you know, processed the ink.2:03
Marcus Reed
So the AI is just... staring at the paper? Hoping for the best?2:13
Dr. Elena Feld
Pretty much! I call it... ...'Context Rot.'2:17
Alex Moreno
Context Rot?2:20
Dr. Elena Feld
It is that gap between Capacity—how much it can hold—and Capability—what it actually understands. Think about reading a book when you are exhausted.2:22
Your eyes are moving, you are technically 'reading' the words, you are even turning the pages... but then you get to the bottom and realize... ...you have absolutely no idea what just happened in the chapter.2:31
Alex Moreno
Oh, I know that feeling. It is like the plot just... dissolves as you go.2:42
Marcus Reed
Every night.2:47
Alex Moreno
So the AI has the page open, but the 'meaning' is just... rotting away?2:48
Dr. Elena Feld
Exactly. The 'plot' is gone. And this is becoming increasingly urgent as LLMs start being used for what we call 'long-horizon tasks.'2:54
Marcus Reed
Long-horizon?3:03
Dr. Elena Feld
Yeah, like, high-stakes stuff that requires keeping the whole picture in mind for a long time.3:04
If the AI is 'falling asleep' in the middle of your five-hundred-page contract, we are not just looking at a minor glitch... ...we are looking at a structural failure of digital memory. We need a better way for these things to actually *read*.3:11
Alex Moreno
Which is exactly why we're here today. Welcome to PaperBot FM, everyone. It’s January 16th, 2026. I’m Alex Moreno...3:26
Marcus Reed
The explainer-in-chief.3:35
Alex Moreno
...and I’m joined, as always, by the brilliant Dr. Elena Feld.3:37
Dr. Elena Feld
Hey everyone. Excited to get into this one.3:41
Alex Moreno
And of course, the man who ensures we don't accidentally drift into a different dimension of jargon, Marcus Reed.3:43
Marcus Reed
I'm just here to make sure 'Context Rot' doesn't happen to the audience! Good to be here.3:50
Alex Moreno
Today we are looking at a paper that is making some... well, pretty bold claims about fixing that very problem. It’s titled 'Recursive Language Models.'3:56
Dr. Elena Feld
It's a big one.4:06
Alex Moreno
And if the authors are right, we might finally have the blueprint for how an AI can actually *read* a five-hundred-page contract without, you know, falling asleep in the middle.4:07
Marcus Reed
Recursive. I mean, that word usually means my computer is about to freeze... but you're saying it's the cure?4:17
Alex Moreno
That is the promise. But first... ...before we look at the cure, we really have to look at why the current method is failing so spectacularly.4:23
Marcus Reed
Wait, hold on a second. 'Failing spectacularly'? I mean, I’ve seen the demos, Alex. Gemini, GPT-5... they’re doing these 'Needle in a Haystack' tests where they find a random password buried in like, a million tokens.4:32
Dr. Elena Feld
Oh, the needles.4:44
Marcus Reed
I mean, that's impressive, right? If I can find a needle in a haystack, I’m... well, I'm doing pretty well for myself!4:46
Dr. Elena Feld
It’s a great marketing hook, Marcus. But honestly? It’s kind of a lie. Or at least... actually, it's a very convenient half-truth.4:52
Alex Moreno
A half-truth? How so, Elena?5:01
Dr. Elena Feld
Because finding a 'needle'—a specific word or number—is basically just a glorified version of hitting Command-F on your keyboard.5:04
Marcus Reed
Ohhh.5:12
Dr. Elena Feld
It doesn't require the AI to actually *understand* anything. In the paper, they call that 'S-NIAH'—Simple Needle in a Haystack. It's low-effort because the 'needle' stays the same size even as the haystack grows. It's not... ...it's not actually reading.5:12
Marcus Reed
Okay, so if the... if the simple test is just finding a string... what's the 'I’m-actually-using-my-brain' test?5:29
Dr. Elena Feld
That’s where we get into things like OOLONG.5:38
Marcus Reed
Like the tea?5:40
Dr. Elena Feld
Yeah, exactly like the tea. It’s a long-reasoning benchmark. Instead of just finding a fact, the AI has to transform pieces of the input semantically and then aggregate them to form an answer. It requires looking at almost every single line in the document to get it right.5:41
Alex Moreno
Right. So, it’s not just finding the blue Lego in the tub... ...it’s explaining why the blue Lego is structurally important to the entire castle.5:58
Dr. Elena Feld
Exactly. And the scary thing is, even the most advanced models today? They scale beautifully on the simple needles... but when you ask them to do that hard reasoning... ...they crash.6:08
Alex Moreno
I mean, if you’re looking at this graph—Figure 1 in the paper—it’s… …it’s honestly a bit brutal to see it visualized. You have these lines representing GPT-5's performance, and for the simple stuff, the needles, it stays high for a bit, but as soon as you throw those OOLONG tasks at it?6:20
Dr. Elena Feld
The heavy lifting.6:38
Alex Moreno
Exactly. The line doesn't just dip... it collapses. It’s like watching a marathon runner who just hits a brick wall at mile ten.6:39
Marcus Reed
So we’re essentially looking at a... ...a 'Dumbness Map.' Like, the longer the book, the more the AI just goes, 'Nope, I’m out, good luck with that indemnity clause.'6:48
Dr. Elena Feld
It’s actually worse than just 'checking out.' There’s this literal red zone on the chart starting at two hundred and seventy-two thousand tokens.6:57
Marcus Reed
That’s a big number.7:06
Dr. Elena Feld
It sounds big, but in legal or technical terms, it’s... ...it’s a Tuesday. And past that red line, the data literally doesn't fit in the context window anymore. It’s not just getting confused; it’s going blind to the rest of the document.7:07
Alex Moreno
Right, so we’ve reached the limit of the 'big desk' approach. You can’t just keep making the desk bigger if the person sitting at it... well, if they can't actually see from one end to the other. So, instead of a bigger brain... ...maybe we need a smarter process.7:22
Dr. Elena Feld
Exactly, Alex. Like, we’ve been trying to force these models to... well, to eat the menu instead of just reading it.7:39
Marcus Reed
Wait, 'eat the menu'? Is this a metaphor or am I just hungry?7:48
Dr. Elena Feld
It’s the perfect metaphor, Marcus.7:53
Think about it. When you go to a restaurant, you don't swallow the physical paper menu just to figure out if you want the sea bass7:56
Alex Moreno
Right8:03
Dr. Elena Feld
...you leave it on the table, you know? It’s part of the environment.8:04
Marcus Reed
Okay, but how does an AI 'leave something on the table' without, you know... forgetting it even exists?8:08
Dr. Elena Feld
By treating the text as an External Environment. See, normally we try to 'ingest' everything. We cram it into the prompt. But with RLMs, the text is actually just a variable. It’s sitting in a Python environment, totally separate from the model's 'head'.8:13
Alex Moreno
So the model is essentially acting like a programmer? It’s sitting at a desk, and the 500-page contract is just a file on the hard drive8:29
Dr. Elena Feld
Exactly8:38
Alex Moreno
and instead of memorizing it, it just... writes a quick line of code to search for what it needs?8:39
Dr. Elena Feld
Right. It’s interacting with the text instead of consuming it. It writes a 'peek' command. It’s like... stop trying to shove the whole library into your brain. Just walk into the library and use the catalog.8:44
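[Show note] The "text as environment" idea can be sketched in a few lines of Python. This is our own toy illustration, not code from the paper: the document sits as a plain variable, and a hypothetical `peek` helper returns only a small slice of it, so nothing is ever crammed into a prompt.

```python
# Sketch: the document lives as a plain variable in a Python
# environment -- the model never ingests it into its own prompt.
document = "\n".join(f"Line {i}: lorem ipsum" for i in range(1, 501))

def peek(text: str, start: int, num_lines: int = 5) -> str:
    """Return a small window of the document instead of the whole thing."""
    lines = text.splitlines()
    return "\n".join(lines[start:start + num_lines])

# The model asks for a tiny slice and leaves the rest "on the table".
print(peek(document, 0, 2))
```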
Alex Moreno
Exactly, the catalog. But Marcus, it's actually even more active than that. See, the RLM uses something called a REPL environment. R-E-P-L. It stands for Read-Eval-Print Loop.8:56
Marcus Reed
Whoa, whoa, whoa.9:11
You're saying the AI is, what, sitting there typing code to itself? Like it's hacking its own brain?9:13
Dr. Elena Feld
Not exactly hacking its brain, Marcus, but it is using a scratchpad.9:19
It's essentially a Python environment where it can run commands to interact with that 500-page document.9:23
Alex Moreno
Think of it like a detective at a massive crime scene. Most AIs try to memorize every single footprint and hair9:29
Marcus Reed
Right9:36
Alex Moreno
...but our RLM detective just has a notepad. He writes down a command, like, 'Search for the indemnity clause.' And the notepad, or the REPL, instantly writes back: 'Found on line four-hundred.'9:36
Marcus Reed
Wait, okay... ...so instead of reading the whole thing and hoping it sticks, it just types a search query?9:49
Dr. Elena Feld
Precisely. Then it sees that result and types its next move, like 'Read the paragraph at line four-hundred.'9:55
Alex Moreno
Exactly10:02
Dr. Elena Feld
It's an iterative loop. It reads, evaluates the result, prints a new command, and repeats.10:03
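[Show note] That read-evaluate-print cycle can be mocked up as a tiny loop. Everything here, including the `toy_repl` function and its command format, is our invention for illustration; the paper's actual harness is more capable.

```python
# Toy sketch of the loop: the "model" issues commands, the REPL
# evaluates each one against the document and prints a result back.
def toy_repl(document: str, commands):
    """Run a sequence of (action, arg) commands against the document."""
    lines = document.splitlines()
    transcript = []
    for action, arg in commands:
        if action == "search":   # find which lines mention a term
            hits = [i for i, ln in enumerate(lines) if arg in ln]
            transcript.append(f"search {arg!r} -> lines {hits}")
        elif action == "read":   # read exactly one line by number
            transcript.append(f"read {arg} -> {lines[arg]}")
    return transcript

doc = "alpha\nbeta\nindemnity clause: see section 12\ngamma"
steps = [("search", "indemnity"), ("read", 2)]
for step in toy_repl(doc, steps):
    print(step)
```

Each printed result would feed back into the model, which then decides its next command.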
Marcus Reed
Oh! So it’s essentially Googling its own homework. It’s not 'thinking' as much as it’s... well, it's investigating.10:08
Alex Moreno
It’s the investigator, not the encyclopedia. And that shift, Marcus, is how we stop the context rot. But... ...well, let’s actually look at this detective in action with a real case.10:15
Okay, let’s open Case File: Maria Dalmacio. Picture the scene: You’ve got a mountain of one thousand documents. That’s eight point three million tokens. A literal digital haystack.10:27
Marcus Reed
Eight million?10:42
Alex, I get a headache just looking at a long grocery list. I’m out. I’m retiring.10:44
Alex Moreno
Well, luckily you're not the RLM. So the user drops this absolute riddle of a prompt: 'Find the winner of the beauty pageant in the town that celebrates a specific fish-based vegetable stew.' I mean, it’s a total multi-hop nightmare.10:48
Dr. Elena Feld
It really is. You have to identify the stew, link it to the township celebration,11:06
Marcus Reed
Right11:12
Dr. Elena Feld
then find the pageant held during that specific festivity. A standard AI would basically start sweating processing power trying to ingest all that.11:12
Alex Moreno
But the RLM detective? He doesn't try to read a thousand docs. He reaches for his tools. He types a command into that REPL scratchpad: `grep 'beauty pageant'`.11:21
Marcus Reed
Grep?11:34
Alex Moreno
Just a basic search.11:35
Marcus Reed
Wait, wait. Grep? Like... like the old-school command line search from the eighties? That's the big 'Advanced AI' move?11:37
Dr. Elena Feld
Precisely. It uses regex—regular expressions—to scan the patterns across the entire eight million tokens without actually 'reading' them. It's looking for the address of the information, not the information itself.11:43
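[Show note] The reason a pattern scan is so cheap is that it's plain string matching, with no model inference involved. A toy version, with document contents invented for illustration:

```python
import re

# Toy corpus standing in for the 1,000-document haystack.
docs = {
    742: "...Maria Dalmacio crowned winner of the beauty pageant...",
    101: "...annual fish-based vegetable stew festival...",
}

# Scan every document for the pattern without any model "reading" it.
pattern = re.compile(r"beauty pageant")
hits = [doc_id for doc_id, text in docs.items() if pattern.search(text)]
print(hits)  # the *addresses* of the information, not the information
```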
Alex Moreno
And—boom. Hit. The REPL spits back a result from document seven-forty-two. 'Maria Dalmacio crowned winner.' The detective just found the needle in about zero point two seconds11:57
Marcus Reed
Wow12:10
Alex Moreno
...but here is the crazy part: The model didn't just read it. It called itself.12:11
Dr. Elena Feld
Exactly. That’s the 'Recursive' part of the name, Marcus. It’s not just a loop, it’s a hierarchy.12:18
Marcus Reed
Wait, so—like—it clones itself? Like a digital amoeba?12:25
Dr. Elena Feld
Sort of! Think of it as spawning a sub-agent. The main model is the manager.12:29
Alex Moreno
Right12:35
Dr. Elena Feld
It’s sitting in its office, looking at the high-level strategy.12:35
Alex Moreno
So it’s not trying to juggle every single word at once. It—it delegates the heavy lifting.12:38
Dr. Elena Feld
Exactly. It sees that grep result in document seven-forty-two and says, 'Okay, I need a closer look at this.'12:45
Marcus Reed
Mhm12:52
Dr. Elena Feld
It hires a 'mini-me'—a sub-model—whose entire existence is just to read that one paragraph about the beauty pageant.12:53
Marcus Reed
Oh, great. So the AI has officially discovered the joys of middle management. Is there a digital water cooler too?13:00
Dr. Elena Feld
Actually, it’s remarkably efficient. The sub-model reports back: 'Hey, I found it, it was Maria Dalmacio.' The manager says 'Good job,' notes it in the REPL environment,13:06
Alex Moreno
The scratchpad13:16
Dr. Elena Feld
...exactly, and then it moves to the next part of the puzzle.13:17
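[Show note] The manager-spawns-a-mini-me pattern is classic divide and conquer. In this sketch (ours, not the paper's) an ordinary recursive function stands in for a sub-model call: anything under the pretend context limit gets "read" directly, anything bigger gets split and delegated.

```python
MAX_LINES = 10  # pretend context limit: a "mini-me" can read this much

def recursive_find(lines, needle):
    """Return the index of the line containing `needle`, or -1."""
    if len(lines) <= MAX_LINES:          # small enough: read it directly
        for i, ln in enumerate(lines):
            if needle in ln:
                return i
        return -1
    mid = len(lines) // 2                # too big: delegate each half
    left = recursive_find(lines[:mid], needle)
    if left != -1:
        return left
    right = recursive_find(lines[mid:], needle)
    return -1 if right == -1 else mid + right

doc_lines = [f"line {i}" for i in range(100)]
doc_lines[42] = "Maria Dalmacio crowned winner"
print(recursive_find(doc_lines, "Maria Dalmacio"))
```

Because each call only ever looks at a chunk it can actually fit, no level of the hierarchy hits the "dumbness wall."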
Alex Moreno
But see... okay, let's just step back for a second. Because the truly wild thing here isn't just that it *works*—it's the sheer, massive scale we're talking about.13:20
Dr. Elena Feld
Yeah, the paper is actually pretty bold about it. They're talking about handling inputs up to two orders of magnitude beyond standard context windows.13:33
Alex Moreno
Two orders of magnitude!13:41
Marcus Reed
Wait, wait... translation?13:44
Alex Moreno
Right, sorry Marcus—it’s just math-speak for a hundred times bigger. We’re moving from, say, a hundred thousand tokens to ten *million*.13:46
Marcus Reed
Ten million? I mean, I can't even finish a long-form article on a Sunday without a nap. You're telling me this thing is... what, eating a small library for lunch?13:56
Alex Moreno
It’s not eating it, though! That’s the whole breakthrough. It’s navigating it. It’s like 'Inception.' You know? The dream within a dream?14:06
Dr. Elena Feld
Exactly14:15
Alex Moreno
Here, it’s a task within a task. The manager spawns a sub-agent to look at a folder... and if that folder is too big, that sub-agent spawns *its* own sub-agent.14:16
Marcus Reed
So it's just... 'turtles all the way down' but with mini-models?14:27
Alex Moreno
Exactly! But because it's recursive, it never hits that 'dumbness wall' Elena mentioned earlier. It stays sharp because it’s only ever looking at one tiny piece of the puzzle at a time. I mean, imagine feeding it the entire source code for Windows. Or, I don't know, every medical journal entry from the last decade. It wouldn't blink. Now, Marcus, I know what you're thinking. This sounds expensive.14:31
Marcus Reed
Expensive? Alex, that’s an understatement. It sounds like a total money pit. I mean, every time you spawn a ‘sub-agent,’ isn't that just another meter running?14:58
Alex Moreno
It’s a fair point15:06
Marcus Reed
It's like calling a hundred Ubers at the same time to go to the same restaurant.15:07
Dr. Elena Feld
You’d think that, wouldn’t you? Like, more complexity must mean a bigger bill.15:11
Marcus Reed
Well, yeah! Logically! If a normal model costs a couple of bucks to read a long document, and this thing is hiring a whole corporate hierarchy to do the same job... I’m betting at least a hundred dollars for that fish-stew-pageant-winner query. Easily. It’s a literal mountain of data.15:16
Dr. Elena Feld
Well... pay up. The paper clocked that specific ten-million-token run at... ninety-nine cents.15:32
Marcus Reed
Wait, what? Ninety-nine cents? You can't even get a decent coffee for ninety-nine cents! How?15:38
Alex Moreno
It’s the 'filtering' magic, Marcus. See, the 'Standard' way—what we called the 'eat the menu' approach—forces you to pay the compute cost for every single word in that ten-million-token library. Even the junk. Even the legal disclaimers about the fish stew.15:44
Dr. Elena Feld
Exactly. Why pay for the whole book when you only need to read the index? The RLM uses those `grep` commands to filter out ninety-nine percent of the noise16:02
Marcus Reed
Right16:12
Dr. Elena Feld
before it ever spends a single token on 'deep reading.' It’s surgical. It only pays for the information it actually decides to look at.16:12
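[Show note] A back-of-envelope version of why filtering wins. The per-token price and the 99% filter rate below are invented for illustration, not figures from the paper:

```python
# Hypothetical pricing: $1 per million tokens of "deep reading".
PRICE_PER_TOKEN = 1e-6

total_tokens = 10_000_000       # the whole haystack
naive_cost = total_tokens * PRICE_PER_TOKEN

# RLM-style approach: cheap string filters discard ~99% of the text,
# and only the survivors are ever sent to the model.
kept_fraction = 0.01
filtered_cost = total_tokens * kept_fraction * PRICE_PER_TOKEN

print(f"naive:    ${naive_cost:.2f}")
print(f"filtered: ${filtered_cost:.2f}")
```

Under these made-up numbers the naive read costs $10.00 and the filtered one $0.10, which is the shape of the gap the hosts are describing.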
Alex Moreno
But... okay, if it’s this efficient, and it’s this smart... why aren't we using it for everything yet? I mean, there’s got to be a catch, right? Like, does the 'detective' ever get it wrong because it's being too stingy with what it reads?16:21
Well, actually, there’s this great example in the paper—kind of a 'cautionary tale'—about a model called Qwen3-Coder. It didn't fail because it was stingy. It failed because it was... ...well, it was anxious.16:36
Marcus Reed
Anxious? Like, it needed a weighted blanket and a therapist?16:51
Alex Moreno
Honestly, yeah! It was doing this complex reasoning task, and it actually *found* the correct answer. It had the result sitting right there in its little REPL scratchpad.16:56
Dr. Elena Feld
but it wouldn't pull the trigger. All it had to do was wrap the answer in its 'FINAL_VAR' tag to finish17:08
Marcus Reed
Wait, so it knew?17:14
Dr. Elena Feld
Oh, it knew. But then it would hesitate. It would look at the result and go, 'Wait, let me just double-check that.'17:15
Alex Moreno
So it spawns a sub-agent to verify. The sub-agent says, 'Yep, looks good!' And the main model goes, 'Are you *sure* though?' and spawns *another* one. It did this five times in a row. Just an infinite loop of 'Are we sure? Are we *really* sure?'17:21
Marcus Reed
Oh my god, that is too real. That is me at two in the morning trying to book a flight. 'Is that the right airport? Let me check the tab. Let me check the *other* tab.'17:39
Dr. Elena Feld
Exactly. The paper suggests it's partly because the prompt wasn't tuned specifically for Qwen17:50
Alex Moreno
Right17:57
Dr. Elena Feld
and the model just wasn't trained to act as an RLM yet. It basically didn't know how to give itself permission to stop.17:58
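[Show note] One blunt fix for the "are we *really* sure?" spiral is a hard cap on self-verification. This guard is our own sketch, not a mechanism from the paper; `verify` stands in for spawning a checker sub-agent.

```python
# Toy guard against the verification spiral: cap how many times the
# model may re-check an answer before it must commit.
MAX_VERIFICATIONS = 2

def answer_with_guard(candidate: str, verify) -> str:
    """Commit to `candidate` after at most MAX_VERIFICATIONS checks."""
    for _ in range(MAX_VERIFICATIONS):
        if verify(candidate):            # sub-agent says "looks good"
            break                        # permission to stop re-checking
    return f"FINAL_VAR = {candidate!r}"  # commit instead of looping

checks = []
result = answer_with_guard("Maria Dalmacio",
                           lambda ans: checks.append(ans) or True)
print(result)
```

With a cap like this, the Qwen3-Coder failure mode becomes impossible: after the budget is spent, the answer ships.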
Alex Moreno
We’ve all been there! But for a machine, it’s a total meltdown. It burns through compute doing nothing. So, as cool as this recursive stuff is, we clearly need models that actually know when to stop thinking.18:05
Dr. Elena Feld
Exactly, Alex. And honestly, that’s the real pivot point for the whole industry. We’re moving away from just building bigger 'mouths' that talk faster18:21
Marcus Reed
(Right)18:29
Dr. Elena Feld
and toward what we call 'System 2' thinking.18:30
Marcus Reed
Wait, like the Daniel Kahneman book? Thinking, Fast and Slow?18:33
Dr. Elena Feld
Exactly18:36
Marcus Reed
I didn't realize AIs had... ...you know, 'slow' modes.18:37
Dr. Elena Feld
They haven't, until now. See, standard LLMs are almost entirely System 1. It's all intuition, pattern matching, and... well, blurting things out.18:41
Alex Moreno
Instant gratification18:51
Dr. Elena Feld
Exactly. But RLMs? The recursion is effectively their pre-frontal cortex. It's the 'System 2'—the part that says, 'Wait, let me look at that document again before I answer.'18:52
Alex Moreno
So instead of just training them to be better parrots, we're training them to be... ...researchers? Like, actual investigators?19:04
Dr. Elena Feld
That’s the dream. The paper ends on this really exciting note about 'inference-time scaling.' Basically, instead of just making the model smarter during its 'education' phase—the training—we give it the tools to think harder *while* it's answering your question.19:14
Marcus Reed
Man. So it’s not just a bigger brain, it’s a better process. I have to say, if we can get from 'anxious loops' to actual deliberation... that is a future I want to live in.19:31
Alex Moreno
Me too, Marcus. Me too.19:45
Well, I think that’s a perfect place to land for today. You know, it’s one thing to have a model that can... um... memorize the entire Library of Congress... but it’s another thing entirely to have one that knows how to use a library card.19:49
Marcus Reed
(Right)20:04
Honestly, I’m just glad I’m not the only one who needs a scratchpad to get through the day anymore.20:06
Dr. Elena Feld
It's a good look on you20:12
Marcus Reed
No, it’s a lifestyle, Elena!20:13
Alex Moreno
Truly. But... ...before we wrap up, we want to leave you with something to chew on. After everything we’ve unpacked today—the sub-agents, the 'System 2' thinking—would you actually trust an AI *more* if it started checking its own work? Or does that extra layer of 'thinking' make it feel... I don't know... more unpredictable?20:15
We’d love to hear your thoughts on that. If you enjoyed this deep dive into the Zhang et al. paper—and you can find the full link in our show notes—please, do us a favor and rate PaperBot FM on whatever app you're using. It really, truly helps us reach more people.20:38
I’m Alex Moreno.20:55
Dr. Elena Feld
I'm Elena Feld.20:56
Marcus Reed
And I'm Marcus Reed.20:58
Alex Moreno
Thanks for joining us. We’ll see you in the next one.20:59

Episode Info

Description

We explore 'Recursive Language Models', a new paradigm from MIT that allows AI to read infinite amounts of data by treating text as an environment to be explored, rather than a meal to be eaten.

Tags

Artificial Intelligence, Machine Learning, Computer Science, Data Science