In what looks like an obvious next step for the fast-growing AI space, OpenAI has released an artificial intelligence (AI) agent that can perform various tasks autonomously on behalf of users, working in its own browser. Called ‘Operator’, OpenAI’s latest AI agent can navigate websites, type, and click buttons. For example, it can fill out forms, book flights, plan grocery orders, and even complete purchases for the user.
An AI agent is a program that uses artificial intelligence to analyze data, make decisions, and perform tasks to achieve specific goals.
According to the company, Operator is based on a new model called Computer-Using Agent (CUA), trained to interact with graphical user interfaces (GUIs): the buttons, menus, and text fields people see on a screen. The model combines GPT-4o’s vision capabilities with advanced reasoning trained through reinforcement learning.
According to OpenAI, CUA operates via an iterative loop with three stages: perception (screenshots provide visual context), reasoning (the model evaluates next steps using chain of thought), and action (it performs tasks such as clicking, scrolling, or typing). It first processes raw pixel data to understand what is happening on the screen, then uses a virtual mouse and keyboard to complete actions. “It can navigate multi-step tasks, handle errors, and adapt to unexpected changes,” the company claims in a blog post. However, the ChatGPT maker clarifies that CUA is still in research preview and hence has limitations.
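The perception, reasoning, and action loop described above can be illustrated with a toy sketch. This is not OpenAI's implementation: the `FakeScreen` class, the `choose_action` rules, and all field names are hypothetical stand-ins, with a small dictionary playing the role of a screenshot and hand-written rules playing the role of the model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # hypothetical element name
    text: str = ""     # text to type, if any


@dataclass
class FakeScreen:
    """Toy stand-in for a browser page: one text field and a submit button."""
    field_value: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # Perception input. A real agent would receive raw pixels instead.
        return {"field_value": self.field_value, "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Action: emulate a virtual keyboard and mouse on the toy page.
        if action.kind == "type" and action.target == "name_field":
            self.field_value = action.text
        elif action.kind == "click" and action.target == "submit_button":
            self.submitted = True


def choose_action(observation: dict, goal_text: str) -> Action:
    # Reasoning: simple rules stand in for the model's chain of thought.
    if observation["submitted"]:
        return Action("done")
    if observation["field_value"] != goal_text:
        return Action("type", "name_field", goal_text)
    return Action("click", "submit_button")


def run_agent(screen: FakeScreen, goal_text: str, max_steps: int = 10) -> List[Action]:
    """Repeat perceive -> reason -> act until the goal is met or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = screen.screenshot()               # perception
        action = choose_action(observation, goal_text)  # reasoning
        if action.kind == "done":
            break
        screen.apply(action)                            # action
        history.append(action)
    return history


if __name__ == "__main__":
    screen = FakeScreen()
    for step in run_agent(screen, "Ada"):
        print(step.kind, step.target, step.text)
```

Because the agent takes a fresh observation on every pass, it decides each step from the current state of the screen rather than from a fixed script, which is the property that lets such a loop recover when a page changes unexpectedly.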
To make this new AI agent more reliable, OpenAI is collaborating with several companies, including DoorDash, eBay, Instacart, Priceline, and Uber. The aim of these collaborations is to ensure that Operator works well on their platforms while respecting their terms of service.
For now, OpenAI has released only a “research preview” of Operator. It will be available first to subscribers of ChatGPT Pro, a recently introduced $200-per-month plan that provides access to all of the latest tools. OpenAI also plans to offer the tool through its other paid subscription plans and, after some time, to add it to the free version of ChatGPT.
Users can access the initial research preview through operator.chatgpt.com. OpenAI CEO Sam Altman also said during a live stream that Operator will soon be available in other countries, but Europe may take some time. Microsoft-backed OpenAI is not alone in the AI agent race. In October 2024, Anthropic announced a similar capability for Claude 3.5 Sonnet, and in December 2024, Google launched its own web-browsing AI agent, called ‘Project Mariner.’
The launch comes days after Sam Altman said that the company had finalized the first version of its latest reasoning AI model, ‘o3-mini’. The new model will be launched in a few weeks, after testing by external safety researchers.