Pinterest’s bookmarking service Instapaper has made a full recovery following last week’s ordeal that saw the platform go down for over a day. The issue, which was caused by Instapaper hitting a system limit on its AWS-hosted database, has now been fixed.
In case you are unaware of it, Instapaper is a bookmarking service owned by Pinterest. The service allows web content to be saved so that it can be read some time later on a different device. It works across platforms like iOS and Android and across devices like e-readers, smartphones and tablets. Instapaper has millions of users across the world.
The service went down on the 9th of February. However, Instapaper acknowledged the outage only after as many as 31 hours had already passed. In a rush to get things up and running, Instapaper restored only a part of its service, limited to content saved over the last six weeks. The company promised to effect a full restore soon.
Although the company had projected that it could take one week to get the service running with its full capabilities again, it managed to achieve this in a much shorter time.
Announcing the news, Instapaper said:
After suffering from an extended outage on Wednesday, February 9 at 12:30PM PT through Thursday, February 10 at 7:30PM PT, we brought the Instapaper service back up with limited access to archives as a short-term solution while we worked to restore the service completely.
Today at 1AM PT we completely restored the Instapaper service, including access to all archives. We performed the restoration without losing any of your older articles, changes made to more recent articles or articles saved after recovering from the outage.
The reason for the outage was also explained. According to Pinterest product engineer Brian Donohue, the failure was caused by a 2 TB file size limit for RDS instances created before April 2014. Apparently, the company’s bookmarks table exceeded that limit on Wednesday, leading to critical failure.
However, the company said that the issue was difficult to predict: while faults of this sort are unlikely, there was hardly any information to indicate that the database was reaching a failure point.
As far as we can tell, there’s no information in the RDS console in the form of monitoring, alerts or logging that would have let us know we were approaching the 2TB file size limit, or that we were subject to it in the first place. Even now, there’s nothing to indicate that our hosted database has a critical issue.
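To illustrate the kind of early warning the company says it lacked, here is a minimal sketch of a size check against the 2 TB limit. It is not Instapaper's tooling: the function name, the 80 percent warning threshold, the sample table sizes and the reading of 2 TB as 2 TiB are all assumptions made for illustration. In practice the sizes would have to come from the database itself.

```python
# Hypothetical early-warning check; names, thresholds and sizes are illustrative assumptions.
FILE_SIZE_LIMIT_BYTES = 2 * 1024**4  # the 2 TB limit from the article, assumed here to mean 2 TiB


def tables_near_limit(table_sizes, limit=FILE_SIZE_LIMIT_BYTES, warn_ratio=0.8):
    """Return (table, fraction_of_limit) pairs at or above warn_ratio, largest first."""
    flagged = [
        (name, size / limit)
        for name, size in table_sizes.items()
        if size / limit >= warn_ratio
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sizes in bytes, standing in for figures read from the database.
    sizes = {"bookmarks": int(1.9 * 1024**4), "users": 50 * 1024**3}
    for name, ratio in tables_near_limit(sizes):
        print(f"WARNING: table {name} is at {ratio:.0%} of the file size limit")
```

A check like this only helps if the limit is known in advance, which, according to the company, was not the case here.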
Instapaper has learned its lesson and says that, going forward, issues will be escalated to Pinterest’s Site Reliability Engineering team. It also said that the company will start testing its databases every month instead of every three months. Users, on the other hand, appeared simply relieved at the recovery and were profuse in thanking the engineering team for saving content accumulated over years.