After launching a beta version of its Cloud Dataproc service last year, Google has announced that the managed service for Apache Hadoop and Apache Spark is now generally available.
Google stated that ever since Cloud Dataproc was launched in beta last year, “customers have taken advantage of its speed, scalability, and simplicity. We’ve seen them create clusters from three to thousands of virtual CPUs, using its Developers Console and the Cloud SDK, without wasting time waiting for their cluster to be ready.”
Google, now an Alphabet subsidiary, added a slew of new features to Cloud Dataproc while it was still in beta, including property tuning, VM metadata and tagging, and cluster versioning.
With the general release, Google plans to keep adding features that extend Cloud Dataproc's functionality. The company added that it will also release new software components in the coming days; one of these, support for custom machine types, was released yesterday alongside the general release. Cloud Dataproc currently supports the MapReduce engine and the Hive data warehousing software.
Google added that with the fully managed service, customers can spend less time managing tools and more time analyzing their data. “Often, popular tools to process data, such as Apache Hadoop and Apache Spark, require a careful balancing act between cost, complexity, scale, and utilization.
Unfortunately, this means you focus less on what is important — your data — and more on what should require little or no attention — the cluster processing it.” With Cloud Dataproc, customers pay only for what they use when running Spark and Hadoop on Google's platform.
With the general release of Cloud Dataproc, Google seems keener than ever before to take on Amazon Web Services, Microsoft Azure and the public cloud platform launched by IBM. Amazon’s public cloud platform is the current market leader in the sector.
But the search giant also has to contend with budding start-ups that have launched similar services based on Hadoop. Still, having other cloud services such as Google Cloud Storage, Google Cloud Bigtable, and BigQuery in its arsenal should give Google a broader cloud offering.
“With integrations to Google BigQuery, Google Cloud Bigtable, and Google Cloud Storage, which provide reliable storage independent from Dataproc clusters, customers have created clusters only when they need them, saving time and money, without losing data. Cloud Dataproc can also be used in conjunction with Google Cloud Dataflow for real-time batch and stream processing,” stated James Malone, Product Manager, in a blog post.