Peter Gostev
- A version of this story originally appeared in the BI Tech Memo newsletter.
- Sign up for the weekly BI Tech Memo newsletter here.
I recently interviewed Peter Gostev, AI capability lead at Arena.ai. He created what I call the defecation test, officially known as BullshitBench. Something else he mentioned caught my attention, though.
It’s a striking example of AI agents in the wild. While editing photos in Adobe Lightroom, he had 50 images that needed denoising, a tedious task typically done one by one. Instead of doing it manually, or even learning how to batch it, Gostev let OpenAI’s Codex AI coding service figure it out.
“You have to go and click into each one to denoise 50 photos. That sounds like hard work, so I just got Codex to go and work out how to do that,” Gostev told me. “It just worked.”
What’s notable is how Codex pulled this off: not through an official API, plugin, or browser workaround, but by somehow interfacing directly with the desktop app, despite no clear support for this.
Gostev is technically advanced, and what he did here is something I probably couldn’t do. Still, it’s a glimpse of where AI agents are heading: not just assisting, but autonomously navigating and operating software like a human would (or, in this case, faster and better than a smart human).
Reach out to me via email at abarr@businessinsider.com.