Azure Data Catalog, Microsoft’s in-house tool for data discovery and data mining, is going public, says Joseph Sirosh, Corporate Vice President at Microsoft in charge of Azure ML, the company’s specialised machine learning platform, which it announced in February last year.
Azure ML is the Redmond giant’s machine learning platform, built on Azure. Azure is what Microsoft uses to handle the vast amount of data in its repositories. And considering that Microsoft holds more data in its data banks than many major corporations put together, Azure is something special.
Azure Data Catalog, a much-needed repository of data sources in taxonomised form, not only stores metadata about the data and its sources in Microsoft’s own cloud storage, but also catalogues it in a manner that facilitates easy browsing and searching.
First off, users must register their data sources with Azure’s data source registration tool. Registration lets the service extract structural metadata, including attribute names and data types. This metadata is then copied to the catalogue in the cloud, while the original data is left untouched in its original place.
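The registration step described above can be illustrated with a minimal sketch. It is not Azure Data Catalog's actual registration tool or API: the function name `extract_structural_metadata`, the in-memory SQLite database standing in for a data source, and the dictionary standing in for the cloud catalogue are all assumptions made for illustration. It only shows the idea of copying attribute names and data types while leaving the rows where they are.

```python
import sqlite3


def extract_structural_metadata(conn, table):
    """Collect column names and declared types for one table.

    Hypothetical helper: only the table's structure is inspected,
    no row data is read or copied.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
    columns = [{"name": row[1], "type": row[2]} for row in rows]
    return {"source": table, "columns": columns}


# A stand-in data source (assumption: an in-memory SQLite database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES (1, 'EMEA', 99.5)")

# A stand-in for the catalogue: it receives metadata only.
catalog = {}
entry = extract_structural_metadata(conn, "sales")
catalog[entry["source"]] = entry

print(catalog["sales"]["columns"])
```

The source table still holds its single row afterwards; the catalogue entry contains nothing but the structure.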
Now, once the metadata is stored in the cloud, someone looking for the data can easily find it by searching the catalogue. Instead of having to sift through the actual data contained in the sources, all they have to do is look for the relevant keyword in the metadata and voila! Azure connects you to the relevant source and lets you work with the data in your own data visualization tools. The catalogue can be explored using techniques such as keyword searching and filtering.
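As a rough sketch of that lookup, the snippet below searches only the metadata and returns matching entries, each carrying a pointer back to its source. The catalogue layout, the `search_catalog` function, its `source_type` filter and the location strings are hypothetical and chosen for illustration; the real service's search behaviour is not described in enough detail here to reproduce.

```python
# Hypothetical catalogue entries: metadata plus a pointer to the source.
catalog = [
    {"name": "orders", "type": "sql_table",
     "columns": ["order_id", "region", "amount"],
     "location": "sales_db (hypothetical)"},
    {"name": "customers", "type": "sql_table",
     "columns": ["customer_id", "region", "email"],
     "location": "crm_db (hypothetical)"},
    {"name": "regional_targets", "type": "spreadsheet",
     "columns": ["quarter", "target"],
     "location": "finance_share (hypothetical)"},
]


def search_catalog(entries, keyword, source_type=None):
    """Return entries whose name or column names contain the keyword.

    An optional source_type narrows the results (a simple filter).
    Only metadata is examined; the underlying data is never touched.
    """
    keyword = keyword.lower()
    hits = []
    for entry in entries:
        if source_type is not None and entry["type"] != source_type:
            continue
        searchable = [entry["name"], *entry["columns"]]
        if any(keyword in text.lower() for text in searchable):
            hits.append(entry)
    return hits


for hit in search_catalog(catalog, "region", source_type="sql_table"):
    print(hit["name"], "->", hit["location"])
```

The returned `location` is what a user would then open in a tool of their own choice.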
And guess what? It gets even better. As users search for and find data, they are given the option of adding their own tags to a particular source listing, so that those who come after have more to go on and the catalogue gets better by the day.
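A small sketch of how user tags could enrich a listing over time is shown here. The `add_tag` and `find_by_tag` helpers and the entry layout are assumptions for illustration only, not the service's real interface.

```python
def add_tag(entry, tag):
    """Attach a user-supplied tag to a catalogue entry (hypothetical helper)."""
    # Tags are normalised so "Finance " and "finance" count as the same tag.
    entry.setdefault("tags", set()).add(tag.strip().lower())


def find_by_tag(entries, tag):
    """Return the names of entries carrying the given tag."""
    wanted = tag.strip().lower()
    return [e["name"] for e in entries if wanted in e.get("tags", set())]


# Hypothetical listings that start out with no tags at all.
listings = [{"name": "orders"}, {"name": "customers"}]

# Two different users annotate what they found.
add_tag(listings[0], "Finance ")
add_tag(listings[0], "quarterly")
add_tag(listings[1], "finance")

print(find_by_tag(listings, "finance"))
```

Every tag added this way becomes another term later visitors can search on.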
The service will be made available by Microsoft to others in order to make data discovery easier. However, for situations where some of the data in the catalogue needs to be hidden from prying eyes, companies can obtain an upgraded version of the service that enables them to control access.
The service will be made available in the form of a public preview starting next Monday.