Just because we don’t see mainstream AI-driven language products like Cortana doesn’t mean Microsoft isn’t working on them. The Redmond giant is investing heavily in artificial intelligence across a wide range of areas. One of these is natural language understanding, which aims to make AI models understand everyday speech.
It’s a particularly tough challenge for machines, but Microsoft’s DeBERTa AI model recently scored higher than the human baseline on the SuperGLUE benchmark.
DeBERTa surpasses the human baseline on the SuperGLUE benchmark
As Microsoft explains, SuperGLUE is one of the most difficult benchmarks for natural language understanding. Microsoft shares an example in its recent blog post:
Given the premise “the child has become immune to the disease” and the question “what is the cause?”, the model is asked to choose an answer from two plausible candidates:
1) “avoided exposure to the disease”
2) “received the vaccine against the disease”.
It’s a simple question for humans. We have general knowledge and are used to putting things in context, but it’s a tough question for AI. To answer correctly, a model must understand cause and effect and reason about the two options it is presented with. As Microsoft explains, the SuperGLUE benchmark also covers natural language inference, coreference resolution, and word-sense disambiguation.
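To make the task concrete, here is a small, self-contained Python sketch (not Microsoft’s code) that frames the example above as a multiple-choice scoring problem. It uses a deliberately naive word-overlap scorer to show why surface matching is not enough: the overlap baseline favors the wrong candidate, whereas the intended answer requires cause-and-effect reasoning.

```python
# Toy illustration of a SuperGLUE-style causal question (COPA format).
# The word-overlap "scorer" below is a naive baseline, NOT DeBERTa.

def overlap_score(premise: str, candidate: str) -> int:
    """Count how many words the candidate shares with the premise."""
    premise_words = set(premise.lower().split())
    return len(premise_words & set(candidate.lower().split()))

premise = "the child has become immune to the disease"
candidates = [
    "avoided exposure to the disease",           # distractor
    "received the vaccine against the disease",  # intended answer
]

scores = [overlap_score(premise, c) for c in candidates]
best = candidates[scores.index(max(scores))]
# Surface overlap picks the distractor (3 shared words vs. 2),
# which is exactly why the task demands causal reasoning.
```

The point of the sketch is the failure: shallow lexical similarity selects “avoided exposure”, while a model like DeBERTa must understand that receiving a vaccine is what *causes* immunity.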
The DeBERTa model was recently updated to include 48 Transformer layers and 1.5 billion parameters. As a result, it achieved a macro-average score of 90.3 on the SuperGLUE benchmark. The human baseline for the same benchmark is 89.8.
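The headline SuperGLUE number is a macro-average: the mean of the per-task scores, with every task weighted equally regardless of dataset size. A minimal sketch of that calculation, using hypothetical per-task scores (the source does not give DeBERTa’s per-task breakdown; the task names are real SuperGLUE tasks):

```python
# Macro-average: mean of per-task scores, each task weighted equally.
# The scores below are hypothetical placeholders, not DeBERTa's results.
task_scores = {
    "BoolQ": 90.4,
    "CB": 94.9,
    "COPA": 96.8,
    "RTE": 93.2,
}

macro_average = sum(task_scores.values()) / len(task_scores)
print(f"macro-average: {macro_average:.1f}")
```

Because each task counts equally, a model cannot inflate its overall score by excelling only on the largest dataset.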
Microsoft says it will release the DeBERTa model and its source code to the public
Microsoft explains that the DeBERTa AI model surpassing the human baseline on SuperGLUE doesn’t mean it’s as smart as humans.
Despite its promising results on SuperGLUE, the model by no means achieves human-level intelligence. Humans are extremely good at leveraging the knowledge gained from different tasks to solve a new task with little or no task-specific demonstration.
This is called compositional generalization: the ability to generalize familiar components (subtasks or basic problem-solving skills) to novel compositions (new tasks). Going forward, it is worth exploring how to make DeBERTa incorporate compositional structures more explicitly, which could make it possible to combine neural and symbolic computation of natural language in a way similar to what humans do.
Microsoft’s DeBERTa model isn’t the first to beat the human baseline on SuperGLUE. Google’s “T5 + Meena” model hit a score of 90.2 on January 5, 2021. Microsoft’s DeBERTa surpassed Google’s with a score of 90.3 a day later.