Amazon Web Services has partnered with AI chip startup Cerebras Systems to bring Cerebras’ specialized processors to Amazon’s cloud platform. The agreement will allow developers to run AI models on Cerebras hardware inside AWS data centers, working in tandem with Amazon’s own Trainium chips. The setup is designed mainly to speed up AI inference, the stage where trained models generate responses to user prompts.
Under the planned architecture, AI workloads will be split between Amazon’s proprietary processors and Cerebras hardware to improve performance. Amazon’s Trainium chips will handle the ‘prefill’ phase of the inference process, in which the model ingests the full tokenized prompt in a single parallel pass and builds the internal state needed for generation. The subsequent ‘decode’ phase – where the model generates its response token by token – will run on Cerebras processors that are optimized for extremely fast token generation. This division of tasks is designed to reduce latency and significantly shorten the time it takes for AI models to respond to user queries.
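The two-phase split described above can be illustrated with a toy sketch. This is purely schematic: the function names and the stand-in “cache” are invented for illustration and do not reflect how Trainium or Cerebras hardware actually works.

```python
# Toy illustration of prefill vs. decode in LLM inference.
# Prefill processes the whole prompt at once; decode then emits
# one token at a time, reusing the state prefill produced.

def prefill(prompt_tokens):
    """Process the full prompt in one parallel pass, returning model
    state (a stand-in for a real key-value cache)."""
    return {"cache": list(prompt_tokens)}

def decode_step(state):
    """Generate the next token from current state (toy rule: the
    'token' is just the current cache length)."""
    token = len(state["cache"])
    state["cache"].append(token)
    return token

def generate(prompt_tokens, max_new_tokens):
    state = prefill(prompt_tokens)       # in the AWS design: Trainium
    out = []
    for _ in range(max_new_tokens):      # in the AWS design: Cerebras
        out.append(decode_step(state))
    return out

print(generate([101, 7592, 102], 4))  # → [3, 4, 5, 6]
```

Prefill is compute-bound (one big parallel pass), while decode is a latency-sensitive sequential loop, which is why splitting the phases across differently optimized chips can pay off.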
The new capability will be integrated into Amazon Bedrock, AWS’s managed service for building and deploying generative AI applications. Through Bedrock, developers will be able to access the Cerebras-powered infrastructure without having to manage specialized hardware directly. The service will support both open-source large language models and Amazon’s own generative AI systems, including its Nova model family, allowing companies to deploy applications ranging from conversational assistants to document analysis and software development tools.
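For developers, access would look like an ordinary Bedrock call. The sketch below builds a request in the shape of Bedrock’s documented Converse API; the model ID is a placeholder, and any routing to Cerebras-backed infrastructure would happen inside the service, not in this request.

```python
import json

# Sketch of a request to Amazon Bedrock's Converse API.
# The parameter names (modelId, messages, inferenceConfig) follow
# Bedrock's documented request shape; the model ID is a placeholder.
request = {
    "modelId": "amazon.nova-lite-v1:0",  # placeholder model ID
    "messages": [
        {"role": "user", "content": [{"text": "Summarize this contract."}]}
    ],
    "inferenceConfig": {"maxTokens": 256, "temperature": 0.2},
}

# With boto3 installed and AWS credentials configured, the call would be:
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**request)
#   print(response["output"]["message"]["content"][0]["text"])

print(json.dumps(request, indent=2))
```

The point of the managed service is visible in the request: nothing in it names the underlying hardware, so swapping Trainium or Cerebras capacity in behind a model ID requires no change to application code.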
The deal is notable in part because Cerebras Systems is known for its unconventional approach to semiconductor design. Instead of cutting a silicon wafer into many smaller chips, the company builds the entire wafer into a single processor called the Wafer-Scale Engine (WSE). The latest version of this processor contains around 4 trillion transistors and about 900,000 AI-optimized cores, making it one of the largest chips ever built. This design enables massive on-chip memory and extremely high-bandwidth communication, allowing faster processing of large neural networks and improving real-time AI inference performance.
The collaboration also highlights the intensifying competition in the AI hardware market, which has been dominated by Nvidia and its GPU accelerators. Demand for these chips has surged with the rapid adoption of generative AI, leading cloud providers and technology companies to explore alternative architectures and develop proprietary silicon. For example, Google uses its custom Tensor Processing Units (TPUs) to power AI models across its cloud services. Similarly, Microsoft has introduced its Maia AI accelerator and Cobalt CPUs to support large-scale AI computing. Meta Platforms has also developed its own Meta Training and Inference Accelerator (MTIA) chips to run AI workloads across its platforms like Facebook and Instagram.
The Tech Portal is published by Blue Box Media Private Limited. Our investors have no influence over our reporting.