Twitter has announced that it has shifted to Heron, its new real-time analytics platform, which it presented at SIGMOD yesterday. The company says Heron can handle the continuous traffic spikes on its website more efficiently than its previous engine, Storm. Heron is fully API-compatible with Storm.
A social networking site the size of Twitter, which is all about real-time events and streams, requires a robust, scalable real-time analytics engine to analyse the huge amount of data its users generate every second.
Twitter says that it considered three options: extending the capabilities of its current platform, Storm; adopting another open-source platform; or developing a new one. Because extending Storm would have required long development cycles, and other open-source stream processing frameworks did not fully meet its needs, particularly on latency, Twitter decided to roll out a new analytics platform, Heron, which can monitor and analyse large-scale traffic spikes without a glitch.
While Heron has several major upgrades over Storm, the most important are better handling of traffic congestion and easier debugging. Heron has a back pressure mechanism that dynamically adjusts the rate of data flow in a topology during execution, without compromising data accuracy. This is particularly useful during traffic spikes and pipeline congestion.
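To make the idea of back pressure concrete, here is a minimal Python sketch. It is not Heron's implementation; the class `BackPressureBuffer`, the function `run_simulation`, and the high and low water marks are all hypothetical names and parameters chosen for illustration. It only shows the general technique: when a buffer between a fast producer and a slow consumer fills up, the producer is told to pause instead of having its data dropped.

```python
from collections import deque


class BackPressureBuffer:
    """Bounded buffer between a fast producer and a slower consumer.

    Illustrative only: the water marks are assumptions, not Heron settings.
    """

    def __init__(self, high_water_mark, low_water_mark):
        if not 0 <= low_water_mark < high_water_mark:
            raise ValueError("need 0 <= low_water_mark < high_water_mark")
        self.high = high_water_mark
        self.low = low_water_mark
        self.items = deque()
        self.paused = False  # True while the producer is being throttled

    def offer(self, item):
        """Accept an item, or return False so the producer keeps it and retries."""
        if self.paused:
            return False
        self.items.append(item)
        if len(self.items) >= self.high:
            self.paused = True  # buffer is full: signal back pressure upstream
        return True

    def poll(self):
        """Remove and return the oldest item, or None if the buffer is empty."""
        if not self.items:
            return None
        item = self.items.popleft()
        if self.paused and len(self.items) <= self.low:
            self.paused = False  # buffer has drained: let the producer resume
        return item


def run_simulation(total_tuples, produce_per_tick, consume_per_tick, high=8, low=2):
    """Simulate a producer that is faster than its consumer, tick by tick."""
    if consume_per_tick < 1:
        raise ValueError("consume_per_tick must be at least 1")
    buffer = BackPressureBuffer(high, low)
    next_tuple = 0
    delivered = []
    max_depth = 0
    throttled_ticks = 0
    while len(delivered) < total_tuples:
        throttled = False
        for _ in range(produce_per_tick):
            if next_tuple >= total_tuples:
                break
            if buffer.offer(next_tuple):
                next_tuple += 1
            else:
                throttled = True  # the rejected tuple is retried on a later tick
                break
        if throttled:
            throttled_ticks += 1
        max_depth = max(max_depth, len(buffer.items))
        for _ in range(consume_per_tick):
            item = buffer.poll()
            if item is None:
                break
            delivered.append(item)
    return delivered, max_depth, throttled_ticks


if __name__ == "__main__":
    delivered, max_depth, throttled_ticks = run_simulation(100, 5, 2)
    print(len(delivered), max_depth, throttled_ticks)
```

In this sketch the producer slows down to the consumer's pace while the buffer is full, so every tuple still arrives and arrives in order. That is the sense in which the flow rate is adjusted "without compromising data accuracy".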
As for debugging, every task in Heron runs in process-level isolation, which makes it easy to understand its behaviour, performance and profile. Furthermore, the sophisticated UI for Heron topologies enables quick and efficient troubleshooting of issues.
Heron is fully backward compatible with Storm, in order to preserve Twitter’s legacy code and its investment in Storm. No code changes are required to run existing Storm topologies in Heron, allowing for easy migration.
As Twitter has repeatedly stressed, it is Heron’s ability to handle scale that sets it apart from Storm. Heron can handle large-scale topologies with high throughput and low latency requirements, and it can run many such topologies at once.
To test Heron’s performance, Twitter compared it to its production version of Storm, which was forked from an open-source version in October 2013, using a word count topology. This topology counts the distinct words in a stream generated from a set of 150,000 words, as shown in the figures above and below.
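The benchmark workload described above can be sketched in a few lines of plain Python. This is only an illustration of what a word count computes, not Twitter's benchmark code and not the Storm or Heron API. The function names, the synthetic vocabulary (`word0`, `word1`, ...), the random sampling and the stream length are all assumptions made for the example.

```python
import random
from collections import Counter


def make_vocabulary(size):
    """Build a synthetic set of distinct words (placeholder vocabulary)."""
    return [f"word{i}" for i in range(size)]


def word_stream(vocabulary, n_tuples, seed=0):
    """Yield n_tuples words drawn at random from the vocabulary (the source role)."""
    rng = random.Random(seed)
    for _ in range(n_tuples):
        yield rng.choice(vocabulary)


def count_words(stream):
    """Keep a running count for each distinct word seen (the counting role)."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts


if __name__ == "__main__":
    vocabulary = make_vocabulary(150_000)  # set size taken from the article
    counts = count_words(word_stream(vocabulary, 1_000_000))  # stream length is an assumption
    print(len(counts), "distinct words seen")
```

In a real topology the source and the counter would run as separate, parallel components connected by a stream. Here they are an ordinary generator and a loop in a single process.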
Twitter says that it is now using Heron as its primary streaming system, running hundreds of development and production topologies. Heron’s resource efficiency has also led to an impressive 3x reduction in the hardware being used, a significant improvement in the company’s infrastructure efficiency.