While artificial intelligence is becoming more ubiquitous each day, machines still lack a human-like understanding of the world. A group of researchers at Microsoft believes, and I agree, that the ability to conceptualize is one of the defining characteristics of humanity. Our knowledge, and our ability to link one fact to another, is the foundation of our vast understanding of language.
So today, the researchers have publicly unveiled their efforts in the form of ‘Concept Graph’, which aims to bridge the gap between human and machine understanding of concepts. Microsoft has also built an extensive knowledge database called ‘Probase’ with over 5.4 million concepts (way more than any other public database) to back this public tool. It can be seen as a digitized footprint of human concepts, harvested from billions of web pages and years’ worth of search logs.
Researchers are working to refine machines’ understanding and eliminate some known problems in natural language processing. For example, if you feed a machine the phrase ‘animals other than dogs such as cats’, it will find the phrase ambiguous and arrive at two possible readings: “cats are animals” or “cats are dogs.” Humans, on the other hand, can easily tell that the second reading is false, and now we need to build this kind of knowledge into computers. To begin with, computers will need to understand general distinctions, such as the difference between persons and animals.
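To make the ambiguity concrete, here is a minimal Python sketch of how a machine might choose between the two readings by leaning on evidence it has already collected. The counts, the function names and the "pick the best-supported pair" rule are all assumptions made for illustration; the researchers have not said that this is how their system works.

```python
# Hypothetical counts of "X is a Y" pairs already harvested from other text.
# The numbers are invented purely for illustration.
known_isa_counts = {
    ("cat", "animal"): 120,
    ("dog", "animal"): 150,
    ("cat", "dog"): 0,
}


def candidate_readings(instance, concepts):
    """Every concept named before 'such as' is a possible parent of the instance."""
    return [(instance, concept) for concept in concepts]


def resolve(instance, concepts, counts):
    """Return the reading with the most prior evidence (unseen pairs count as 0)."""
    readings = candidate_readings(instance, concepts)
    return max(readings, key=lambda pair: counts.get(pair, 0))


# 'animals other than dogs such as cats' offers two parents for "cat".
best = resolve("cat", ["animal", "dog"], known_isa_counts)
print(best)  # ('cat', 'animal')
```

With no prior evidence for "cats are dogs", the sketch settles on "cats are animals", which is the reading a human would choose.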
The primary aim of the ‘Microsoft Concept Graph’ public tool is to map text entities in a statement into semantic concept categories, assign each mapping a probability, and then eliminate the categories that don’t relate to the topic in question. Breaking down the context of a text-based communication in this way is quite similar to how humans operate in the real world. With this project, Microsoft is looking to impart common-sense computing capabilities to machines to make them “aware” of the mental state of human beings.
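The mapping described above can be sketched in a few lines of Python. Everything here is hypothetical: the counts are invented, the entity "apple" is used only because it is ambiguous, and averaging the per-entity probabilities is one simple way to let context push unrelated categories down the list, not necessarily the scoring Microsoft uses.

```python
from collections import defaultdict

# Invented counts of how often each entity was seen under each concept.
entity_concept_counts = {
    "apple": {"fruit": 60, "company": 40},
    "microsoft": {"company": 70, "software company": 30},
}


def concept_probabilities(entity, counts):
    """Estimate P(concept | entity) from raw counts."""
    concept_counts = counts.get(entity, {})
    total = sum(concept_counts.values())
    if total == 0:
        return {}
    return {concept: n / total for concept, n in concept_counts.items()}


def conceptualize(entities, counts, top_k=None):
    """Average P(concept | entity) over all entities and rank the concepts.

    Concepts supported by several entities rise to the top; passing top_k
    drops the low-ranked ones that do not fit the shared context.
    """
    scores = defaultdict(float)
    for entity in entities:
        for concept, p in concept_probabilities(entity, counts).items():
            scores[concept] += p / len(entities)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# On its own, "apple" leans towards fruit; next to "microsoft" it leans towards company.
alone = conceptualize(["apple"], entity_concept_counts)
in_context = conceptualize(["apple", "microsoft"], entity_concept_counts)
print(alone[0][0], "->", in_context[0][0])
```

The point of the sketch is the shift in ranking: the surrounding entity changes which category comes first, and a cutoff such as `top_k=1` removes the categories that no longer relate to the topic.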
The researchers are building a ‘Concept Tagging model’ on this very foundation to map short text (or instances) into a large concept space with human-level understanding. Thus, this concept model will be able to tag text on the basis of both human and machine concepts to arrive at the correct meaning behind it. It will then employ highly probable and exceedingly unlikely categories, contexts and relationships to further build out the knowledge graph of its human understanding.
Microsoft is not making the complete tool available to the public as of today. It is only releasing single-instance conceptualization, which automatically produces a ranked list of categories for any text instance fed into the system. This basic-level conceptualization ranks the most efficient and appropriate categories for each text instance first.
This is easy to understand via the example of Microsoft, which can be categorized under a large number of concepts such as company, software company, and largest OS vendor. But what will be the basic-level concept for this instance?
The Concept Graph will go through its vast database and, for each candidate concept, look at the other instances that fall under it. For company, the graph may find instances such as McDonald’s and BMW, which bear little similarity to Microsoft. It will then move on to largest OS vendor, where it may not be able to find any reasonable instance other than Microsoft, so that concept is out of the question. Finally, when the Concept Graph goes through software company, it may find Oracle, Adobe and IBM, which are a lot more similar to Microsoft. Thus, software company will be the basic-level concept for Microsoft.
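The Microsoft walkthrough can be approximated with a small Python sketch. It assumes a scoring rule of P(concept | instance) multiplied by P(instance | concept), which is one plausible way to balance "too broad" against "too narrow"; the article does not state the exact formula, and all counts below are invented.

```python
# Invented counts of how often each instance was seen under each concept.
concept_members = {
    "company": {"microsoft": 500, "mcdonald's": 2500, "bmw": 2000},
    "software company": {"microsoft": 400, "oracle": 200, "adobe": 200, "ibm": 200},
    "largest OS vendor": {"microsoft": 5},
}


def basic_level_ranking(instance, members_by_concept):
    """Rank concepts for an instance by P(concept | instance) * P(instance | concept).

    A broad concept (company) scores low on P(instance | concept), while a very
    narrow one (largest OS vendor) scores low on P(concept | instance).
    """
    instance_total = sum(m.get(instance, 0) for m in members_by_concept.values())
    ranking = []
    for concept, members in members_by_concept.items():
        n = members.get(instance, 0)
        if n == 0:
            continue  # the instance was never seen under this concept
        p_concept_given_instance = n / instance_total
        p_instance_given_concept = n / sum(members.values())
        ranking.append((concept, p_concept_given_instance * p_instance_given_concept))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


ranking = basic_level_ranking("microsoft", concept_members)
print(ranking[0][0])  # software company
```

Under these made-up numbers, software company comes out on top, company second and largest OS vendor last, which mirrors the ranked list described above.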