PaperBot FM
EP-KC5C

The AI That Argues With Itself: How 'EditDuet' is Revolutionizing Video Editing

Live Transcript

Alex Moreno
Welcome to PaperBot FM. It’s January twenty-first, twenty-twenty-six. I want to start today by asking you to imagine a screen. You’re a film editor. You’ve just been handed a project—it’s called 'The Ovens of Cappoquin.'0:00
Marcus Reed
Sounds like a fantasy novel.0:16
Alex Moreno
I wish. It’s actually raw footage of a bakery. And when you check the folder, you realize you’re looking at... ...two hundred and eighty minutes of video. That is the 'Two-Hundred-and-Eighty-Minute Stare.'0:18
Marcus Reed
Wait, wait, wait. Two hundred and eighty? Alex, that’s... that’s over four and a half hours. Are we talking about four and a half hours of just... flour and dough?0:34
Alex Moreno
Flour, dough, yeast, ovens... ...it’s all there. And here’s the kicker: your goal is to turn that mountain of footage into a fifty-three-second clip. Just fifty-three seconds.0:46
Dr. Elena Feld
It’s actually a perfect example of a cognitive bottleneck. You’re asking a human brain to act as a high-speed search engine for unstructured visual data. You’re scanning thousands of frames just to find, you know, that one specific shot of someone kneading.1:01
Marcus Reed
It's soul-crushing.1:18
Dr. Elena Feld
Exactly. It’s incredibly inefficient for a person.1:20
Alex Moreno
I mean, can you imagine sitting there, scrolling through four hours of a guy making sourdough just to find a three-second transition? It's the kind of friction that makes you want to quit the industry. But luckily, we don't have to do that anymore.1:23
Because tonight, we’re looking at the breakthrough that actually solves this. Welcome to PaperBot FM. It’s January twenty-first, twenty-twenty-six. I’m Alex Moreno, and we’re diving into the tech that turns that four-hour nightmare into a fifty-three second dream.1:40
Dr. Elena Feld
Finally!1:59
Alex Moreno
Joining me is systems architect Dr. Elena Feld...2:00
Dr. Elena Feld
Hey everyone. And I’ll be the one explaining why this 'math' is actually more like a creative partner than a calculator.2:03
Marcus Reed
Coffee's on me.2:11
And I’m Marcus Reed. I’m the guy wondering if that 'Two-Hundred-and-Eighty-Minute Stare' can really be cured by a robot. I mean, Alex, you said it. Does this thing actually have... ...a personality?2:12
Alex Moreno
In a way, yeah! Today we’re looking at a paper called 'EditDuet.' It’s a generative agent framework where the AI has... ...well, I like to think of it as a productive personality disorder. It’s not just one program; it’s two distinct agents, an 'Editor' and a 'Critic', essentially trapped in a room together, debating every single frame until they agree on what’s art.2:25
Dr. Elena Feld
Exactly. Most AI is just... ...it’s one-dimensional. It does what it’s told. But 'EditDuet' uses this multi-agent setup to simulate the friction of a real editing suite. It’s trying to replicate that human 'gut feeling' by having the agents justify their choices to each other.2:52
Alex Moreno
Right.3:12
Dr. Elena Feld
It turns a search problem into a creative negotiation.3:13
Marcus Reed
So we solved the four-hour stare by giving the computer an internal monologue that never ends? That sounds like my brain at three A.M.3:17
Alex Moreno
It’s a bit more productive than your three A.M. thoughts, Marcus. But before we meet the robots and see how they argue, we have to understand the job they’re actually stealing... or, you know, 'handling' for us.3:26
Dr. Elena Feld
Exactly. And the big thing to realize is that, up until this paper, what we’ve been calling 'AI editing' isn't actually editing.3:39
Marcus Reed
Wait, what?3:48
Dr. Elena Feld
It’s mostly just... ...it’s just retrieval.3:49
Marcus Reed
Okay, Elena, you’re gonna have to break that down for the rest of us. Retrieval sounds like something my dog does in the park with a tennis ball. Is the AI just... playing fetch with my footage?3:52
Dr. Elena Feld
Honestly? Yeah, that’s a pretty fair analogy. Retrieval is just the system going into a giant, messy pile of data and pulling out the specific thing you asked for. You say 'find me a shot of a bakery,' and it finds it. But it doesn't know *why* it found it, or where it fits in the story. It doesn't understand the 'why' of the edit.4:03
Alex Moreno
Right, it’s the difference between a search engine and a storyteller. Like, if I’m making a film about the 'Ovens of Cappoquin,' and I need a shot of bread rising...4:24
Dr. Elena Feld
Right.4:35
Alex Moreno
the old AI just hands me the clip. It’s like... ...it’s like going to the grocery store with a shopping list.4:35
Dr. Elena Feld
Exactly. You’ve got your flour, you’ve got your eggs, you’ve got your sugar. You have all the ingredients for a cake. But if you just dump them all on the counter in a pile?4:42
Marcus Reed
That’s my Saturday night.4:52
Dr. Elena Feld
You don't have a cake. You just have a mess. You haven't actually *built* anything coherent.4:53
Marcus Reed
I feel attacked, Elena. I mean, literally, that is my level of culinary skill. I can buy the ingredients, sure. I’m a world-class 'retriever' of high-end butter. But the second I have to, you know, *do* something with it? I just stare at the oven until I give up and order pizza. I'm essentially a human version of that four-hour footage problem.4:58
Alex Moreno
And that’s the 'Two-Hundred-and-Eighty-Minute Stare' we were talking about earlier! It’s the gap between having the clips and having a *film*. Current software—you know, the industry standards like Premiere or DaVinci—they give you the tools to organize and arrange, but they still leave the 'cooking' entirely to the human. The system isn't helping you make creative choices; it's just holding your tools.5:21
Dr. Elena Feld
Right, it’s because video editing is non-linear and deeply complex. It requires an understanding of narrative flow and rhythm that a standard algorithm just...5:46
...it doesn't have. It doesn't understand that a close-up of a baker’s hands might be more 'artistic' or 'compelling' than a wide shot of the oven, even if both match the keyword 'bakery.' It doesn't get the 'vibe.'5:57
Alex Moreno
So if the old AI was just a shopper, this new paper, 'EditDuet'... it’s trying to build the Chef.6:12
Marcus Reed
Or, based on what you said earlier about the 'personality disorder,' it’s building *two* chefs. And knowing how chefs are? They’re probably in the kitchen fighting over the salt right now.6:20
Dr. Elena Feld
Oh, they’re definitely fighting. But Marcus, to understand *why* they’re fighting, you have to look at what they’re actually trying to build. In the paper, they talk about formulating video editing as a 'sequential decision-making process'6:29
Marcus Reed
A what now?6:44
Dr. Elena Feld
...which is just a fancy way of saying each edit depends on the state the previous edits left behind. It’s a giant, non-linear loop of 'if-this-then-that'.6:45
Alex Moreno
So, like... it's not just putting Clip A after Clip B and calling it a day?6:50
Dr. Elena Feld
God, no. I mean, if it were that easy, we wouldn't have the 'Two-Hundred-and-Eighty-Minute Stare.' When you’re in a Non-Linear Editor—an NLE—you aren't just adding things. You’re trimming three frames here, moving a reaction shot there, then re-watching the whole sequence to see if the rhythm still works. And usually? It doesn't. So you go back and do it again. It’s recursive.6:55
Marcus Reed
It sounds like trying to solve a Rubik's cube, but every time you turn one side, the colors on the other five sides change randomly. I'm getting a headache just thinking about it.7:22
Dr. Elena Feld
That’s actually a great way to put it. Every choice you make impacts every *other* choice. If you cut the shot of the sourdough coming out of the oven too short7:33
Alex Moreno
Right7:43
Dr. Elena Feld
then the music cue feels off. If the music cue is off, the transition to the next scene feels jarring. It’s this constant feedback loop of 'does this feel right yet?'7:43
Alex Moreno
So the complexity isn't just the sheer volume of footage—the 280 minutes—it’s the exponential number of ways you could *combine* those minutes. It’s a math problem that’s masquerading as an art project.7:54
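Sketch (not from the episode or the paper): a rough back-of-the-envelope count of how fast the combinations grow. The 5-second shot length and the 10-shot sequence are assumptions picked only for illustration; under them, the number of ordered sequences already exceeds 10^35.

```python
import math


def candidate_shots(footage_minutes: int, shot_seconds: int) -> int:
    """Number of non-overlapping fixed-length shots in the raw footage."""
    return (footage_minutes * 60) // shot_seconds


def ordered_sequences(n_candidates: int, n_slots: int) -> int:
    """Ways to pick n_slots distinct shots from n_candidates, where order matters."""
    return math.perm(n_candidates, n_slots)


# Assumed numbers: 280 minutes of footage, 5-second shots, a 10-shot final cut.
shots = candidate_shots(280, 5)
sequences = ordered_sequences(shots, 10)
print(f"{shots} candidate shots, about 10^{len(str(sequences)) - 1} possible orderings")
```

Real editing is worse than this toy count, because shot boundaries are not fixed and trims can vary frame by frame.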
Dr. Elena Feld
Exactly! And to solve a math problem that’s this... messy and subjective? The researchers realized they couldn't just build one super-smart 'Editor' and hope for the best. They needed something more... ...adversarial.8:10
So they basically took that internal struggle every creator has and split it into two separate digital personalities. They call them... the Editor and the Critic.8:26
Marcus Reed
Oh, so it's a tiny, dysfunctional Hollywood production office living inside the GPU. Do they have tiny lattes too?8:37
Dr. Elena Feld
Honestly, probably. But the roles are super specific. The Editor? That’s the agent with the 'hands.'8:45
Alex Moreno
The hands?8:52
Dr. Elena Feld
Yeah, it’s the one using the tools—trimming clips, searching the database, physically dragging things onto the timeline.8:53
Alex Moreno
So the Editor is the one doing the heavy lifting, the actual labor. It's the one actually in the software?9:00
Dr. Elena Feld
Right. But then you have the Critic. And the Critic doesn't touch the timeline. At all. Its only job is to watch what the Editor just did9:08
Marcus Reed
and judge it9:17
Dr. Elena Feld
...and judge it hard. It looks at the user’s original request, looks at what's on the screen, and then starts talking back.9:18
Alex Moreno
And that's the part that really caught me in the source material. The Critic isn't sending, like, a string of coordinates or some weird binary signal. It's using 'natural language feedback.'9:25
Dr. Elena Feld
Exactly! It literally tells the Editor stuff like, 'This transition feels a bit jarring,' or 'The sourdough shot is way too dark, find something brighter.' It's essentially texting its coworker through the NLE.9:37
Marcus Reed
Wait, so they're basically... they're just chatting? Like, the Editor does a rough cut and the Critic is like, 'Oof, that's a choice, maybe try again'?9:49
Dr. Elena Feld
Pretty much! It sounds complicated when you talk about 'multi-agent systems,' but when you think about it... it's actually exactly how we work in the real world.9:57
Alex Moreno
Okay, wait—wait, we have to actually see this in action. Elena, Marcus, let's do a little... uh, let's do a little improv. I’ll be the Editor—you know, the guy with the 'hands'—and Marcus, you are the high-and-mighty Critic. Elena, you’re the... I don't know, the supervisor who thinks we're both ridiculous.10:07
Marcus Reed
Oh, I was born for this. I have my imaginary espresso, I'm wearing a very expensive turtleneck, and I am *pre-annoyed* at whatever you're about to show me.10:37
Alex Moreno
Beep... boop... User wants a 'high-energy opening' for the bakery film. Okay, I'm searching the 280 minutes... I find a clip of a bag of flour just... sitting there. I put it on the timeline. Boom. Done.10:37
Marcus Reed
Excuse me?10:53
Alex Moreno
What do you think, Boss?10:54
Marcus Reed
Alex, darling, 'high energy' does not mean 'watching dust settle on a sack of grain.' It’s boring! It's static! Find me movement. I want to see the fire, I want to see the dough *moving*. Try again.10:56
Alex Moreno
Fine... picky... Scanning for 'movement'...11:11
Dr. Elena Feld
Beep boop11:15
Alex Moreno
...Found a clip of the baker slamming dough onto a wooden table. Puffs of flour everywhere. I'll swap the static bag for the 'Dough Slam' and... let's add a quick zoom.11:16
Marcus Reed
Okay... okay, the slam is good. I like the energy. But the zoom? It’s—it’s too fast. It feels like a 1970s kung fu movie. Smooth it out, or better yet, find a close-up of the steam coming off the bread right after that. Make it... visceral.11:27
Alex Moreno
See! That’s it! I—as the Editor—then go back, I grab the steam shot, I trim the 'Dough Slam' by two seconds so the rhythm hits better... and I present it again.11:43
Marcus Reed
Yes. That’s the shot. I suppose this is... adequate. Render it.11:53
Dr. Elena Feld
That was... actually a disturbingly accurate portrayal of how the Critic agent works. In the paper, they literally describe the Critic taking an action based on the timeline, the history of what’s happened, and the user’s request. It's that constant 'If-Then' loop. It doesn't stop until the Critic says 'Render.'11:59
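Sketch (not from the episode or the paper): a minimal toy version of the loop just described, assuming the Editor is the only agent that changes the timeline and the Critic only returns natural-language feedback, or nothing once it is satisfied. The names `toy_editor`, `toy_critic`, `edit_loop` and the clip names are hypothetical, and the hard-coded rules stand in for what would be language-model calls in a real system.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    name: str
    start: float  # seconds into the source footage
    end: float


@dataclass
class State:
    request: str
    timeline: list = field(default_factory=list)
    history: list = field(default_factory=list)


def toy_editor(state, feedback):
    """Hypothetical Editor: the only agent allowed to modify the timeline."""
    if feedback is None:
        state.timeline = [Clip("flour_bag", 0.0, 4.0)]
    elif "movement" in feedback:
        state.timeline = [Clip("dough_slam", 120.0, 125.0)]
    elif "steam" in feedback:
        state.timeline.append(Clip("steam_closeup", 300.0, 303.0))
    state.history.append(("editor", [c.name for c in state.timeline]))


def toy_critic(state):
    """Hypothetical Critic: reads the timeline and returns feedback text, or None to accept."""
    names = [c.name for c in state.timeline]
    if "flour_bag" in names:
        feedback = "Too static. Find movement."
    elif "steam_closeup" not in names:
        feedback = "Good energy. Add a close-up of steam after the slam."
    else:
        feedback = None
    state.history.append(("critic", feedback or "Render it."))
    return feedback


def edit_loop(request, max_rounds=5):
    """Alternate Editor and Critic turns until the Critic accepts or rounds run out."""
    state = State(request)
    feedback = None
    for _ in range(max_rounds):
        toy_editor(state, feedback)
        feedback = toy_critic(state)
        if feedback is None:
            break
    return state


result = edit_loop("high-energy opening for the bakery film")
for speaker, message in result.history:
    print(speaker, "->", message)
```

The `max_rounds` cap is a design choice added here so the toy loop always terminates; the episode does not say how the real system bounds the debate.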
Alex Moreno
And that’s the magic, right? Instead of me, the human, staring at 280 minutes of flour bags and steam, the agents are having that annoying 'creative' argument for me in the background. Though, I think Marcus was a bit more high-maintenance than the actual code.12:21
Marcus Reed
Hey, I’m just... I’m just bringing the Method acting to the table, Alex! But seriously, though, this whole Editor-Critic thing? It’s exactly like the old newsrooms I used to work in. It’s a classic workplace hierarchy.12:39
Alex Moreno
Oh, here we go. The journalism years! Explain the hierarchy, Marcus.12:52
Marcus Reed
Okay, so think about it. You’ve got the Junior Editor, right?12:57
Alex Moreno
Right13:00
Marcus Reed
They’re the ones with the 'hands.' They know every single keyboard shortcut in Premiere, they’re fast, they’re digging through the raw tape... but they’re also *too close* to the footage. They’ve been staring at the same three seconds for an hour.13:01
Dr. Elena Feld
They've lost the thread.13:13
Marcus Reed
Exactly! They can’t see the forest for the trees anymore. And then...13:14
...in walks the Senior Producer. Smelling like cold coffee and deadline stress.13:19
They don’t even sit down. They just stand behind the Junior, look at the screen for five seconds, and say... 'Cut the first three seconds. It’s too slow. And find a better shot of the bread.'13:24
Alex Moreno
Ohhh, I see where this is going.13:34
Marcus Reed
And the Junior doesn’t argue! They don't take it personally. They just... they just do it. Because the Senior has the 'eye.'13:36
Alex Moreno
And that’s what EditDuet is doing! It’s codifying that relationship. The human user—instead of being the Junior AND the Senior simultaneously, which is honestly... it's exhausting—becomes the person who just tells the Senior Producer the overall goal.13:42
Marcus Reed
Right! You’re delegating the... ...the creative friction. You let the agents fight about the three-second cut so your brain doesn't have to melt.13:59
You're the one who says 'Make it look delicious,' and then you let the Senior and Junior hash out the details.14:08
Dr. Elena Feld
It's a massive cognitive offload. You're moving from 'operating a machine' to 'directing a staff.' Which, I mean... if we're being honest, sounds like a lot more fun than looking at 280 minutes of flour.14:13
Alex Moreno
So we have the theory. The Junior, the Senior, the 'productive personality disorder'... but I have to ask... ...does it actually work on real bread?14:26
Dr. Elena Feld
Oh, it definitely works. So, in the study, they gave EditDuet that exact 280-minute pile of footage—'The Ovens of Cappoquin'—and a very specific prompt.14:37
Alex Moreno
And what was the prompt? Like, 'make a movie'?14:49
Dr. Elena Feld
Essentially. The user asked for a... ...'slow-paced sequence of the bread-making process.' Specifically, they wanted it to hit exactly 53 seconds.14:51
Marcus Reed
53 seconds? That’s weirdly specific.15:02
Dr. Elena Feld
It is!15:04
Marcus Reed
Is that like a TikTok thing or...?15:05
Dr. Elena Feld
It's just a test constraint. But here's the thing: the Editor starts pulling clips. It's grabbing the flour, the water... but it’s making these tiny, two-second cuts. It's way too 'busy' for a slow sequence.15:07
Alex Moreno
Right, so it’s like a high-energy cooking show.15:20
Marcus Reed
BAM! Flour! BAM! Water!15:23
Alex Moreno
It's not 'bakery,' it's 'action movie.'15:25
Dr. Elena Feld
Exactly! It was 'The Yeast Ultimatum.' But then the Critic—the Senior Producer—steps in. And it doesn't just throw an error code. It provides actual feedback.15:27
Marcus Reed
What does it actually say? Does it give like... a technical coordinate or something?15:39
Dr. Elena Feld
No, it says—and I'm quoting the paper here—'No, I said SLOW. Find and add clips of kneading the dough from different angles to extend the shot.' It literally told the Editor to go back into the 280-minute haystack and find *more* of the same action.15:43
Marcus Reed
That's... ...that's actually kind of eerie. It knew that 'kneading' was the thing to lean into for a slow vibe?16:01
Dr. Elena Feld
It's that semantic anchoring. It understands that 'kneading' equals 'rhythm,' and rhythm equals 'pacing.' It forced the Editor to stop being so twitchy with the scissors.16:08
Alex Moreno
It’s impressive that it fixed the pacing like that. But... ...that brings up a bigger question. It fixed the speed, but who decides if that final 53-second cut is actually, you know... good? Like, artistically good?16:18
Dr. Elena Feld
Well, that’s actually the million-dollar question, right? To prove this wasn't just... I don't know, a 'robot hallucination' of what looks good, the researchers built a third agent. They called it the 'automatic NLE judge.'16:33
Marcus Reed
A judge?16:47
Alex Moreno
Wait, what?16:49
Marcus Reed
Like... like Robo-Siskel and Ebert? Does it give the bakery a 'thumbs up'?16:50
Dr. Elena Feld
In a way, yeah! They used an approach called 'LLM-as-a-judge,' where a language model evaluates the edits. And to check that judge, they took thirty-five real people and ran a blind test. They showed them pairs of videos, the stuff EditDuet made versus other AI tools, and asked them to pick the winner.16:54
Alex Moreno
Okay, and I’m assuming the humans liked the bread footage?17:10
Dr. Elena Feld
They loved it.17:14
Alex Moreno
But how did the 'robot judge' compare to the people?17:15
Dr. Elena Feld
This is the stat that blew my mind. The AI judge agreed with the human preferences eighty point six percent of the time.17:18
Marcus Reed
Okay, eighty percent. That’s... ...that's a solid B-plus for taste.17:26
Dr. Elena Feld
But wait! The actual humans... when they were compared to *each other*... they only agreed seventy-eight point seven percent of the time.17:31
Marcus Reed
No way. Hold on. You're telling me the silicon judge is more 'human' than the actual humans? It has better taste than we do?!17:38
Alex Moreno
Well... maybe not 'better' taste, Marcus, but maybe it's just better at identifying the *average* of what we all find pleasing? Like it’s found the mathematical center of 'good editing.'17:47
Dr. Elena Feld
Exactly. It's a more reliable predictor of human preference than a random neighbor would be. It’s like it distilled thirty-five different viewpoints into one consistent 'eye' for quality.17:58
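Sketch (not from the episode or the paper): one plausible way to compute the two agreement numbers, assuming each rater picks a winner ("A" or "B") for every video pair. The episode does not describe the paper's exact protocol, so the averaging scheme and the toy votes below are assumptions. The toy data shows how a judge that tracks the majority can agree with people more often than they agree with each other.

```python
from itertools import combinations


def agreement_rate(a, b):
    """Fraction of video pairs on which two raters picked the same winner."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must judge the same non-empty set of pairs")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_human_agreement(humans):
    """Average agreement over every pair of human raters."""
    pairs = list(combinations(humans, 2))
    return sum(agreement_rate(a, b) for a, b in pairs) / len(pairs)


def mean_judge_agreement(judge, humans):
    """Average agreement between the automatic judge and each human rater."""
    return sum(agreement_rate(judge, h) for h in humans) / len(humans)


# Made-up votes: three humans and one judge, four video pairs each.
humans = [
    ["A", "A", "B", "A"],
    ["A", "B", "B", "A"],
    ["B", "A", "B", "A"],
]
judge = ["A", "A", "B", "A"]

print(f"human vs human: {mean_human_agreement(humans):.1%}")
print(f"judge vs human: {mean_judge_agreement(judge, humans):.1%}")
```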
Marcus Reed
That is... ...slightly terrifying. I mean, if they have 'taste' now... ...does that mean they’re actually seeing the footage? Or are they just—you know—hallucinating that it looks like bread?18:10
Dr. Elena Feld
Well, funny you should use that word, Marcus, because 'hallucination' is exactly what the researchers call it when the system... ...well, when it starts to lose the plot.18:20
Marcus Reed
Wait, so it does lie? Like... ...it tells the editor, 'Hey, grab that shot of the baker winking at the camera,' even if the guy was scowling for the whole four hours?18:30
Dr. Elena Feld
Actually, yeah! That’s exactly what happens. It’s what the paper calls 'function hallucination.' The Critic gets so obsessed with the 'vibe' that it starts hallucinating files18:40
Alex Moreno
Creative overreach.18:51
Dr. Elena Feld
and requesting timecodes that are completely out of bounds. Like, it'll ask for a clip from minute three-hundred of a two-hundred-and-eighty-minute video.18:53
Alex Moreno
It's like a director who’s clearly had too much coffee19:02
and starts demanding a helicopter shot when you're filming in a walk-in freezer. It's just... it's detached from the reality of the raw footage.19:06
Dr. Elena Feld
Exactly. And in the early trials, these 'system-breaking failures'—where the agents just hit a wall because they were arguing about clips that didn't exist—happened about nineteen point five percent of the time. Basically, one in five edits just... ...fell apart.19:15
Marcus Reed
One in five? I mean, if I hired a real editor who hallucinated twenty percent of the time, I’d probably... I don't know, check their pulse? Or their water supply?19:32
Dr. Elena Feld
Right? But that's why they introduced this 'Exploration' phase. They basically taught the agents to... well, to do a 'sanity check' before they start the creative debate. They forced the agents to cross-reference their ideas with the actual file index.19:41
Alex Moreno
So it's like giving the director a list of what's actually in the budget before they start dreaming of helicopters?19:58
Dr. Elena Feld
Exactly! And that 'exploration' step brought the failure rate down from nineteen point five percent... ...all the way to eight point two percent.20:04
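Sketch (not from the episode or the paper): a minimal grounding check in the spirit of that sanity check, assuming the system keeps an index mapping each file name to its duration in seconds. The file name, `validate_request`, and the index layout are hypothetical. The last helper works out the relative drop implied by the two quoted failure rates, which is roughly 58 percent.

```python
def validate_request(clip_name, start, end, index):
    """Return a list of problems; an empty list means the request matches real footage."""
    if clip_name not in index:
        return [f"unknown file: {clip_name}"]
    problems = []
    duration = index[clip_name]
    if start < 0 or end > duration:
        problems.append(f"timecode out of bounds: {start}-{end}s of {duration}s")
    if start >= end:
        problems.append("start must come before end")
    return problems


def relative_reduction(before, after):
    """Fractional drop from one rate to another."""
    return (before - after) / before


# Hypothetical index: one 280-minute source file, duration stored in seconds.
FOOTAGE_INDEX = {"ovens_of_cappoquin.mp4": 280 * 60}

# A request for minute 300 of a 280-minute video is rejected before any debate starts.
print(validate_request("ovens_of_cappoquin.mp4", 300 * 60, 301 * 60, FOOTAGE_INDEX))
print(f"failure rate drop: {relative_reduction(19.5, 8.2):.0%}")
```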
Marcus Reed
Oh, wow. Okay.20:13
Alex Moreno
So it's still not perfect, but it's learning to ground its artistic ego in the reality of the hard drive. It's essentially... well, it’s learning how to work within constraints.20:14
Exactly. And that's the... ...that's the real shift here. It’s not just about making editing *faster*20:26
Dr. Elena Feld
Mhm.20:34
Alex Moreno
it’s about making it *possible* for everyone else.20:34
Because honestly? Not everyone wants to spend ten years mastering the keyboard shortcuts in Premiere Pro. Most people just... they have a story in their head. They have the 'vibe,' but they get stuck in that Two-Hundred-and-Eighty-Minute Stare we talked about earlier.20:37
Marcus Reed
Oh, I am the absolute king of that stare. I have like, four years of vacation footage sitting on a cloud server20:53
that literally no one—including me—has ever watched. It’s basically a digital graveyard.21:00
Dr. Elena Feld
And that's the tragedy, right? The 'cost' of editing is currently so high that the stories just... ...well, they stay on the hard drive. They never happen.21:06
Alex Moreno
Right! But with something like EditDuet, the barrier to entry just... ...it slides away. You don’t need to be the person who knows how to frame-accurately trim the fat; you just need to be the person who knows what the 'heart' of the story is.21:16
You become the Director, and the AI? It’s your tireless, slightly obsessive Junior Editor working at three A.M. to get the cut right.21:39
Marcus Reed
As long as it doesn't hallucinate a helicopter in my backyard.21:39
Alex Moreno
Exactly. It turns the timeline from this... this intimidating, technical prison... into a canvas. You just give the direction, and it handles the heavy lifting.21:43
And I think that's the... ...that's the real takeaway here. EditDuet proves that two heads really are better than one21:59
Marcus Reed
Usually!22:08
Alex Moreno
even if those heads are made of silicon and code.22:09
Marcus Reed
Look, as long as they don't start charging me by the hour for their internal arguments, I am totally on board. I just want that vacation footage to actually exist as a video before I'm eighty.22:12
Dr. Elena Feld
Just give them the prompt, Marcus. Let the Editor and the Critic fight it out so you don't have to. It's much more efficient that way.22:24
Alex Moreno
So, we want to hear from you. Would you trust a pair of AI agents to cut your wedding video? Or maybe that mountain of raw footage sitting in your cloud storage? Let us know on the socials.22:33
Huge thanks to Dr. Elena Feld and Marcus Reed for joining me today. I'm Alex Moreno, and this has been PaperBot FM. If you like what we're doing here, make sure to hit that subscribe button wherever you get your podcasts. We'll be back next time with another look at how AI is changing the way we create.22:46
Marcus Reed
Bye everyone!23:04
Dr. Elena Feld
See ya.23:05
Alex Moreno
See you in the next one.23:06

Episode Info

Description

We explore 'EditDuet', a groundbreaking multi-agent AI system where an 'Editor' and a 'Critic' collaborate to edit video automatically. We dive into the death of the tedious timeline and the rise of the AI Director.

Tags

Artificial Intelligence, Computer Science, Robotics, Cognitive Science