This article was published 1 year ago

Amazon
Source: Tony Webster from Minneapolis, Minnesota, United States [CC BY]

Outages are not rare on the web, and Amazon’s cloud computing unit proved no exception. On Tuesday, Amazon Web Services (AWS) suffered a significant outage that temporarily affected businesses relying on its services. Engineers had difficulty accessing certain tools, raising concerns about the scale of the disruption and its immediate impact on affected enterprises. As one of the world’s leading cloud computing providers, AWS supports the operations, data storage, and infrastructure of countless businesses.

After several hours, Amazon announced that the issue had been resolved and that “all AWS Services are operating normally.”

Downdetector, the well-known outage tracker, showed that 58% of users reported problems with the website, 26% ran into problems with the AWS console, and the remaining 16% had trouble connecting to AWS’s servers. The number of reports climbed steadily until it crossed the 12,000 mark earlier in the day, after which it fell to fewer than 700, according to media reports.

“Our engineering teams were immediately engaged and began investigating. We quickly narrowed down the root cause to be an issue with a subsystem responsible for capacity management for AWS Lambda, which caused errors directly for customers (including through API Gateway) and indirectly through the use of other AWS services. Additionally, customers may have experienced authentication or sign-in errors when using the AWS Management Console, or authenticating through Cognito or IAM STS,” Amazon said on its AWS dashboard.

“Customers may also have experienced issues when attempting to initiate a Call or Chat to AWS Support. As of 2:47 PM, the issue initiating calls and chats to AWS Support was resolved. By 1:41 PM, the underlying issue with the subsystem responsible for AWS Lambda was resolved. At that time, we began processing the backlog of asynchronous Lambda invocations that accumulated during the event, including invocations from other AWS services. As of 3:37 PM, the backlog was fully processed. The issue has been resolved and all AWS Services are operating normally,” it added. AWS Lambda lets customers run computer programs without having to manage any underlying servers.
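
To make the "no servers to manage" idea concrete, here is a minimal sketch of what a Lambda function can look like in Python. It is an illustration only, not code connected to the outage: the `name` field in the event and the greeting it returns are hypothetical, and the response shape assumes the function sits behind an HTTP front end such as API Gateway.

```python
import json


def lambda_handler(event, context):
    """Entry point that the Lambda service calls once per invocation."""
    # `event` carries the request payload; `context` carries runtime metadata.
    # The "name" field is a hypothetical input used only for this illustration.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

The customer supplies only this function; provisioning and scaling the machines that run it is left to AWS, which is why a fault in Lambda's capacity management can surface as errors in customers' own applications.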

The outage caused temporary disruptions and limitations for businesses using AWS services, affecting their operations and inconveniencing their customers. Numerous websites and services were affected, including Amazon’s Alexa and Amazon Music, the EDGAR system of the US securities regulator, the Boston Globe, the New York Metropolitan Transportation Authority, Southwest Airlines, the Verge, and AP for Students. Delta Air Lines said its own website was facing problems but declined to say whether they were related to the AWS outage.

“You don’t realize how many eggs are in the Amazon Web Services basket until an outage takes out most of the systems you use day-to-day,” Kevin Montano, a photographer for a local NBC station in Albany, posted in a tweet on Tuesday.