Alongside the other big-ticket announcements in its Build conference keynote today, Microsoft announced Azure Data Lake, a data repository for big data analytics and Microsoft’s attempt to take on Amazon’s rapidly growing AWS.
Azure Data Lake will give developers unlimited cloud storage, letting them keep all of the data and projects they have been working on in a single place. The announcement comes two years after Microsoft adopted the open source Hadoop big data software and launched its Azure HDInsight service. More recently, Microsoft partnered with another Hadoop vendor, Cloudera.
The data lake, then, is primarily an attempt to let developers and data scientists store all of this data in a central repository and analyze it with tools they already know.
Azure Data Lake is optimized for analytic workloads and is compatible with the Hadoop Distributed File System, Microsoft’s HDInsight, and open source tools like Spark, Storm and Kafka. Data Lake will also accept unstructured data from developers. Relational data, meanwhile, can be stored in Azure SQL Data Warehouse, another service Microsoft announced at Build today.
Microsoft’s corporate vice president for its data platform, T.K. Ranga Rengarajan, explained that the service is built on top of Azure’s hyperscale network and supports both single files that can be multiple petabytes in size and high volumes of small writes at very low latency. This should help the service perform well for real-time websites and Internet of Things services.
There is no fixed limit on file size or account size in Azure Data Lake.
On the sidelines, Microsoft also shed some light on a new level of abstraction it intends to offer developers for operations across multiple databases through its Azure SQL Database service. The new Azure SQL Data Warehouse will become available in public preview in June.