AI models have so far been used for a wide variety of tasks. Now, they will also be used to check the work of other AI models. This is something that Meta – the parent company of social media platforms Facebook and Instagram – has been working on for quite some time, and last week, the company announced that it is releasing several new AI models from its research arm, Fundamental AI Research (FAIR).
Chief among the new releases is a model that can check and evaluate the work of other AI models – a notable step for the field. Called the Self-Taught Evaluator, it is described by the company as a “strong generative reward model with synthetic data.” Because it can generate its own training data, the model removes the need for human annotation during the training phase: it creates contrasting outputs from data generated by other AI systems and uses another AI model to judge them, allowing the Self-Taught Evaluator to continually refine its own assessments.
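The loop described above – generate a contrasting pair of outputs, have a model-as-judge label the better one, and keep the labeled pair as synthetic training data – can be sketched in a few lines. This is an illustrative toy only: `generate_candidates` and `judge` are hypothetical stand-ins for real language models, not Meta's actual implementation.

```python
import random

def generate_candidates(prompt: str) -> tuple[str, str]:
    """Stand-in for a model producing a deliberately contrasting pair:
    a 'good' response and a degraded (truncated) variant of it."""
    good = f"Detailed answer to: {prompt}"
    bad = good[: len(good) // 2]  # lower-quality variant
    return good, bad

def judge(prompt: str, a: str, b: str) -> str:
    """LLM-as-judge stand-in: here it simply prefers the longer,
    more complete answer. A real judge would reason over both."""
    return "a" if len(a) >= len(b) else "b"

def build_synthetic_preferences(prompts):
    """One self-training round: create contrasting outputs, label them
    with the judge, and collect (prompt, chosen, rejected) triples
    that could then train the evaluator itself."""
    data = []
    for p in prompts:
        good, bad = generate_candidates(p)
        pair = [good, bad]
        random.shuffle(pair)  # avoid a positional shortcut for the judge
        verdict = judge(p, pair[0], pair[1])
        chosen = pair[0] if verdict == "a" else pair[1]
        rejected = pair[1] if verdict == "a" else pair[0]
        data.append((p, chosen, rejected))
    return data

prefs = build_synthetic_preferences(["What causes tides?", "Explain DNS."])
```

No human labels appear anywhere in the loop – the judgments themselves become the training signal, which is the core of the self-taught idea.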
This builds on established AI research, particularly Reinforcement Learning from AI Feedback (RLAIF). The model also takes a page out of OpenAI’s book: like the company’s recently released o1 models, it draws on the “chain of thought” technique, breaking complex problems down into logical steps, and it minimizes the need for human involvement in AI development. The approach also streamlines the training of AI models and paves the way for more autonomous AI systems.
“We hope, as AI becomes more and more super-human, that it will get better and better at checking its work, so that it will actually be better than the average human,” Jason Weston, one of the researchers, commented on the matter. “The idea of being self-taught and able to self-evaluate is basically crucial to the idea of getting to this sort of super-human level of AI,” he added.
In addition, the social media giant has introduced Spirit LM, an open-source language model that integrates speech and text, using phonetic, pitch, and tone tokens to represent spoken language more naturally. This gives it an advantage over traditional LLMs, which rely on Automatic Speech Recognition (ASR) systems for the same purpose. Meta notes that Spirit LM comes in two versions: Spirit LM Base, which focuses on capturing the nuances of speech, and the full Spirit LM, which focuses on the emotional elements.
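The idea of mixing speech and text in one model can be pictured as a single token stream in which spans of text tokens alternate with spans of speech tokens (phonetic units annotated with pitch or style). The sketch below is purely illustrative – the special tokens and the `phoneme@pitch` notation are invented for this example and are not Spirit LM's actual vocabulary.

```python
def interleave(text_spans, speech_spans):
    """Alternate text spans and speech spans into one token stream,
    marking each modality switch with a special token."""
    stream = []
    for text, speech in zip(text_spans, speech_spans):
        stream.append("[TEXT]")
        stream.extend(text.split())
        stream.append("[SPEECH]")
        stream.extend(speech)  # e.g. phonetic units tagged with pitch levels
    return stream

tokens = interleave(
    ["hello there"],
    [["hh@p3", "eh@p1", "l@p2", "ow@p4"]],  # toy phoneme+pitch tokens
)
# tokens == ['[TEXT]', 'hello', 'there', '[SPEECH]',
#            'hh@p3', 'eh@p1', 'l@p2', 'ow@p4']
```

Because both modalities live in one sequence, a single language model can attend across the text/speech boundary instead of handing audio off to a separate ASR stage – which is the expressiveness advantage Meta describes below.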
“Many existing AI voice experiences today use ASR techniques to process speech before synthesizing with an LLM to generate text — but these approaches compromise the expressive aspects of speech. Using phonetic, pitch and tone tokens, Spirit LM models can overcome these limitations for both inputs and outputs to generate more natural sounding speech while also learning new tasks across ASR, TTS and speech classification,” Meta commented on the matter.