Claude Opus 4.8

Anthropic has launched Claude Opus 4.8, its latest flagship AI model, while also confirming that the more advanced Claude Mythos system will begin rolling out in the coming weeks. According to the firm, Opus 4.8 delivers major upgrades in coding, reasoning, and autonomous AI workflows, with a strong focus on enterprise use cases. The Dario Amodei-led company is also positioning Opus 4.8 as a more reliable model with lower hallucination rates and better self-correction abilities during complex reasoning. The release comes as competition in the AI industry intensifies between Anthropic, OpenAI, Google, and Meta, especially around AI systems that can independently handle software engineering and research tasks.

Benchmark data released alongside the model shows Claude Opus 4.8 posting some of the strongest results in the industry across agentic coding, reasoning, and computer-use tasks. On SWE-Bench Pro, one of the most difficult real-world software engineering benchmarks, Opus 4.8 scored 69.2%, outperforming Opus 4.7 at 64.3%, OpenAI’s GPT-5.5 at 58.6%, and Google’s Gemini 3.1 Pro at 54.2%. The benchmark measures how effectively AI systems can autonomously resolve real GitHub software issues, debug repositories, and generate working patches across complex codebases.

However, OpenAI retained a lead in terminal-based coding workflows. On Terminal-Bench 2.1, which evaluates agentic terminal coding and command-line execution tasks, GPT-5.5 achieved 78.2%, ahead of Opus 4.8’s 74.6%. Still, Anthropic narrowed the gap considerably: Opus 4.7 scored 66.1%, trailing GPT-5.5 by 12.1 percentage points, while Opus 4.8 trails by 3.6. Gemini 3.1 Pro posted 70.3% on the same benchmark.

Anthropic also demonstrated major improvements in reasoning performance. On Humanity’s Last Exam, a benchmark designed to test multidisciplinary expert-level reasoning, Opus 4.8 achieved 49.8% without tools and 57.9% with tools enabled. Those numbers exceeded Opus 4.7’s scores of 46.9% and 54.7%, respectively, while also outperforming GPT-5.5, which scored 41.4% without tools and 52.2% with tools.
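For readers who want to check the margins themselves, the scores quoted above can be compared with a few lines of Python. This is only an illustrative sketch: the figures are copied from the article as reported, while the names `SCORES`, `margin`, and `leader` are hypothetical helpers introduced here and not part of any Anthropic or benchmark tooling.

```python
# Scores in percent, copied from the figures quoted in the article.
# Gemini 3.1 Pro has no Humanity's Last Exam score in the article, so it is omitted there.
SCORES = {
    "SWE-Bench Pro": {
        "Opus 4.8": 69.2, "Opus 4.7": 64.3, "GPT-5.5": 58.6, "Gemini 3.1 Pro": 54.2,
    },
    "Terminal-Bench 2.1": {
        "Opus 4.8": 74.6, "Opus 4.7": 66.1, "GPT-5.5": 78.2, "Gemini 3.1 Pro": 70.3,
    },
    "Humanity's Last Exam (no tools)": {
        "Opus 4.8": 49.8, "Opus 4.7": 46.9, "GPT-5.5": 41.4,
    },
    "Humanity's Last Exam (with tools)": {
        "Opus 4.8": 57.9, "Opus 4.7": 54.7, "GPT-5.5": 52.2,
    },
}


def margin(benchmark: str, model: str, baseline: str) -> float:
    """Difference in percentage points between two models on one benchmark."""
    scores = SCORES[benchmark]
    return round(scores[model] - scores[baseline], 1)


def leader(benchmark: str) -> str:
    """Name of the highest-scoring model on a benchmark."""
    scores = SCORES[benchmark]
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for name in SCORES:
        gain = margin(name, "Opus 4.8", "Opus 4.7")
        print(f"{name}: leader = {leader(name)}, Opus 4.8 vs Opus 4.7 = {gain:+.1f} pts")
```

The differences are in percentage points, not relative percentages, which is the convention the article's comparisons follow.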

A major focus of Opus 4.8 is reducing hallucinations and unsupported claims, an issue that has become increasingly important as AI systems enter enterprise environments. Anthropic says early testing showed the new model is significantly more likely to admit uncertainty when evidence is weak rather than confidently generating inaccurate responses.

Another major development announced by Anthropic is Claude Mythos, the company’s advanced cybersecurity-focused AI system that is expected to roll out more broadly to customers in the coming weeks. Mythos is believed to be specifically designed for advanced cyber operations, including vulnerability discovery, code auditing, exploit-path analysis, defensive infrastructure testing, and autonomous threat investigation. Until now, access to the model had reportedly been restricted to a defensive coalition known as Project Glasswing, where technology giants like Amazon, Microsoft, and Apple were allowed to use the preview system for critical infrastructure defense. The model has raised concerns among global policymakers and tech executives because of fears that such highly capable cybersecurity AI systems could be weaponised if misused. During early internal testing, Mythos allegedly scanned around 1,000 open-source software projects and identified more than 23,000 security vulnerabilities within minutes.

The Tech Portal is published by Blue Box Media Private Limited. Our investors have no influence over our reporting.