OpenAI has released two new open-weight language models, gpt-oss-20B and gpt-oss-120B, designed to run directly on personal computers, including laptops and desktops. This is the first time the AI firm has released open-weight models since GPT-2 in 2019. The gpt-oss-20B model is lightweight enough to run on machines with around 16 GB of RAM, while the larger gpt-oss-120B model is designed to fit on a single high-end NVIDIA GPU. Both models are available under the Apache 2.0 license, allowing developers to use, modify, and distribute them freely, even for commercial purposes.
These models have been built with a focus on reasoning and problem-solving. According to the ChatGPT maker, both versions support ‘chain-of-thought reasoning’, a technique where the AI breaks down its thought process step by step before reaching a final answer. This makes the models especially useful for tasks like answering scientific questions, solving math problems, and writing or debugging code. The models are also designed to be used in agent-like environments, where they can follow instructions and use tools like a Python interpreter or a search engine.
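The chain-of-thought pattern described above can be sketched in a few lines of Python. The model call itself is stubbed out here, and the prompt wording and the `Answer:` convention are illustrative assumptions for this sketch, not OpenAI's actual chat format:

```python
# Illustrative sketch of chain-of-thought prompting: ask the model to show
# its reasoning steps, then parse the final answer out of the response.
# The model output below is a hand-written stand-in; in practice the prompt
# would be sent to a locally hosted gpt-oss model.

def build_cot_prompt(question: str) -> str:
    """Wrap a question so the model reasons step by step before answering."""
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, showing each step,\n"
        "then give the final answer on a line starting with 'Answer:'."
    )

def extract_answer(model_output: str) -> str:
    """Pull the final answer out of a step-by-step response."""
    for line in model_output.splitlines():
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return model_output.strip()  # fall back to the raw output

prompt = build_cot_prompt("What is 17 * 24?")
# A chain-of-thought response interleaves reasoning steps with the answer:
fake_output = "Step 1: 17 * 20 = 340\nStep 2: 17 * 4 = 68\nAnswer: 408"
print(extract_answer(fake_output))  # prints 408
```

The same parsing idea extends to the agent setting: tool calls (say, a line starting with `Run:` for a Python interpreter) can be extracted from the intermediate steps in exactly the same way.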
While the models are powerful, they are text-only, meaning they do not handle images, audio, or video. Still, they can be integrated into complex workflows and used offline or with custom tools. Importantly, developers and researchers can also fine-tune these models locally without needing to retrain them from scratch. Both models are now available for download through OpenAI’s official site.
In terms of performance, OpenAI claims that the larger gpt-oss-120B model performs at a level comparable to its proprietary o3-mini and o4-mini models on several advanced benchmarks. This includes tests like GPQA (for graduate-level science), Codeforces (for competitive programming), and AIME (for math olympiad-level questions). On a broader general reasoning benchmark called ‘Humanity’s Last Exam’, the gpt-oss-120B model scored 19.0%, while the gpt-oss-20B model scored 17.3%, slightly behind OpenAI’s closed models.
This launch signals a broader strategic shift for OpenAI as competition in the open-model space intensifies. Several companies, including Meta (with its LLaMA models), Mistral, Alibaba (Qwen), and DeepSeek, have already released high-performing open-weight models.
Despite these advances, the models come with notable limitations. The company reports that both gpt-oss models show higher hallucination rates than their proprietary counterparts. On PersonQA, a benchmark that tests factual accuracy, the hallucination rate was about 53% for the 20B model and 49% for the 120B model, meaning they may generate incorrect information more often than models like GPT-4 or GPT-4o. The Sam Altman-led company makes clear that these models are primarily designed for reasoning and exploration rather than for tasks requiring strict factual accuracy.