OpenAI has unveiled a new API pricing tier called “Flex processing.” The tier is designed to give developers reduced costs for accessing AI models; the catch is that it comes with slower processing speeds and intermittent resource availability.
Flex processing is now available in beta for two of OpenAI’s models — o3 and o4-mini — both of which are tailored toward reasoning tasks. With Flex, developers can halve their token costs. For example, under standard pricing, the o3 model is priced at $10 per million input tokens and $40 per million output tokens. Under the Flex tier, the rates are reduced to $5 and $20 respectively. Similarly, o4-mini drops from $1.10 and $4.40 to $0.55 and $2.20 per million input and output tokens.
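To put the halved rates in concrete terms, the back-of-the-envelope Python sketch below compares the two tiers using the o3 prices quoted above. The token volumes are invented for illustration.

```python
# Compare standard vs. Flex pricing for o3 (USD per million tokens,
# taken from the figures quoted above).
STANDARD = {"input": 10.00, "output": 40.00}
FLEX = {"input": 5.00, "output": 20.00}

def cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for a given token volume."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000

# Hypothetical batch job: 50M input tokens, 10M output tokens.
print(f"Standard: ${cost(STANDARD, 50_000_000, 10_000_000):,.2f}")  # $900.00
print(f"Flex:     ${cost(FLEX, 50_000_000, 10_000_000):,.2f}")      # $450.00
```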
This will be useful for smaller developers: by halving API costs for these models, OpenAI lowers the financial barrier and makes them accessible to a wider user base. These users often prioritize affordability over real-time performance, and frontier models remain expensive to run as AI development costs keep climbing. Flex processing could also suit non-critical work in academic, low-budget, or non-commercial contexts. The release comes at a time when infrastructure costs and accessibility are critical factors in the adoption of large language models, and OpenAI faces growing pressure from rivals such as Google and Anthropic, which are expanding their portfolios with low-cost, high-efficiency models aimed at budget-conscious users.
“Flex processing provides significantly lower costs for Chat Completions or Responses requests in exchange for slower response times and occasional resource unavailability. It is ideal for non-production or lower-priority tasks such as model evaluations, data enrichment, or asynchronous workloads,” the AI firm noted. To use Flex processing, developers set the service_tier="flex" parameter on their API calls. The savings come with technical caveats: OpenAI’s documentation notes that the default timeout for these requests is 10 minutes, which developers may extend to 15 minutes to improve success rates, and developers are encouraged to implement exponential backoff or a fallback to the standard tier when Flex capacity is unavailable, as the sketch below illustrates.
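Here is a minimal sketch of that pattern using the openai Python SDK. The helper name, retry count, and the choice of exceptions to catch are assumptions for illustration, not OpenAI’s prescribed implementation; the service_tier parameter and per-request timeout option come from the documentation described above.

```python
import time
from openai import OpenAI, APITimeoutError, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def flex_request_with_fallback(messages, model="o4-mini", max_retries=3):
    """Try the Flex tier with exponential backoff; fall back to the standard tier."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
                service_tier="flex",  # opt in to Flex pricing
                timeout=900.0,        # extend the default 10-minute timeout to 15 minutes
            )
        except (APITimeoutError, RateLimitError):
            # Flex capacity is intermittent; back off exponentially before retrying.
            time.sleep(2 ** attempt)
    # All Flex attempts failed: fall back to the standard tier.
    return client.chat.completions.create(model=model, messages=messages)

response = flex_request_with_fallback(
    [{"role": "user", "content": "Summarize yesterday's evaluation results."}]
)
print(response.choices[0].message.content)
```

This keeps latency-tolerant jobs on the cheaper tier by default while guaranteeing completion at standard rates if Flex capacity never frees up.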
Separately, OpenAI has updated its access policies for certain models and features. Notably, access to the o3 model, as well as key functionalities like streaming support and reasoning summaries, is now gated behind an identity verification process for users in the lower three spend tiers (tiers 1 through 3). Only customers in tiers 4 and 5 are exempt from this new requirement.