On Thursday, OpenAI released GPT-5.3-Codex, a new model that extends its Codex coding agent beyond writing and reviewing code to performing a much wider range of work tasks. The release comes as competition continues to heat up among artificial intelligence companies vying for market share in the AI-powered coding tools space.
OpenAI says GPT-5.3-Codex combines the coding performance of GPT-5.2-Codex with the reasoning and professional-knowledge capabilities of GPT-5.2, while operating 25% faster. This allows GPT-5.3-Codex to handle long-running tasks that involve research, tool use such as web searches or database calls, and complex planning and execution across both general work tasks and software development.
Codex has reached more than 1 million developers, OpenAI claims. And while Anthropic’s Claude Code has also seen rapid adoption, head-to-head data comparing the two tools remains scarce. SemiAnalysis reports that 4% of public GitHub commits, or new code uploaded to repositories, are currently authored by Claude Code, and it projects that figure could reach 20% or more by the end of 2026.
Benchmark one-upmanship
OpenAI says GPT-5.3-Codex now has the best score of any model on SWE-Bench Pro, a benchmark that evaluates real-world software engineering across four programming languages. The same goes for Terminal-Bench 2.0, which measures how well coding agents work within a command-line terminal.
Anthropic says its new Claude Opus 4.6 model, also announced Thursday, achieved top scores on several industry benchmarks, including Humanity’s Last Exam (complex multidisciplinary reasoning), GDPval-AA (economically valuable knowledge work), and BrowseComp (hard-to-find information search).
OpenAI says its new model can take larger bodies of information into account while working on a task, and can keep reasoning about that task for longer stretches without human intervention. In testing, OpenAI says it saw GPT-5.3-Codex autonomously iterate on game development over millions of tokens using generic prompts like “fix the bug” or “improve the game.”
Similarly, Anthropic says its new Opus 4.6 model can comprehend larger code bases and make more thoughtful decisions about how to add new code.
OpenAI says GPT-5.3-Codex is built to support the full software life cycle, including debugging, deploying, and monitoring code, as well as writing product requirements documents and conducting research.
Beyond coding to knowledge work
The same agentic capabilities that expand Codex’s coding skills also apply to tasks well outside software development, OpenAI says, extending to work such as creating slide decks and analyzing data in spreadsheets.
On GDPval, an OpenAI evaluation measuring performance on well-specified knowledge-work tasks across 44 occupations, GPT-5.3-Codex matches GPT-5.2 while adding stronger coding capabilities. On OSWorld-Verified, which tests computer use in a visual desktop environment, GPT-5.3-Codex achieved 64.7% accuracy, compared with 38.2% for its predecessor.
Anthropic has taken its Claude Code tool in the same direction, positioning it to help a wider pool of information workers with a far broader set of business tasks.
GPT-5.3-Codex is the first model OpenAI classifies as “high capability” for cybersecurity-related tasks under its Preparedness Framework, and the first the company has directly trained to identify software vulnerabilities. OpenAI is committing $10 million in application programming interface (API) credits to accelerate cyber defense, particularly for open-source software and critical infrastructure systems.
GPT-5.3-Codex is now available to paid ChatGPT subscribers in the Codex app, in the command-line interface, as an IDE extension, and on the web. OpenAI says it is working to enable API access to the model soon; the API is used by enterprise and independent developers.