One day, when you instruct Cortana to book a cab while inebriated, it may understand your alcohol-slurred speech and call one for you. In what could further thin the line between human and machine, Microsoft has developed a new speech recognition system capable of transcribing conversational speech with minimal error. In a major breakthrough, Microsoft has created a technology that recognizes words in a conversation not only with a low error rate but with an accuracy equivalent to that of humans.
In a paper published this week, a team of researchers and engineers in Microsoft Artificial Intelligence and Research reported that the new speech recognition system makes the same or fewer errors than professional transcriptionists. The Word Error Rate (WER) of the new system is 5.9 percent, down from the 6.3 percent WER the team reported just last month. The 5.9 percent error rate is said to be about equal to that of people who were asked to transcribe the same conversation, and it's the lowest ever recorded against the industry-standard Switchboard speech recognition task. Merging artificial intelligence with speech recognition, Microsoft one day hopes to use the technology to make its voice-based assistant Cortana, along with its speech-to-text transcription software, more efficient.
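Word Error Rate, the metric behind the 5.9 percent figure, counts the minimum number of word substitutions, insertions, and deletions needed to turn a system's transcript into the reference transcript, divided by the number of reference words. A minimal sketch of that computation (the example sentences are hypothetical, not drawn from Microsoft's test set):

```python
# Word Error Rate (WER): word-level edit distance between a reference
# transcript and a hypothesis transcript, divided by reference length.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance on words)
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[-1][-1] / len(ref)

# One substituted word in a five-word reference gives a 20 percent WER:
print(wer("we have reached human parity", "we is reached human parity"))  # 0.2
```

A 5.9 percent WER means roughly one word in seventeen is transcribed wrong, inserted, or dropped.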
Microsoft's chief speech scientist, Xuedong Huang, said, "We've reached human parity. This is an historic achievement." Microsoft has long been working on improving its speech recognition technology, which holds potential in the field of medicine, where people battling speech disorders have difficulty carrying out even the simplest of tasks. It could further help those who are already undergoing speech therapy and aid in that process. Microsoft's milestone suggests that, for the first time, a computer is capable of recognizing words in a conversation as well as a human would.
In its blog, Microsoft notes that the milestone comes after decades of research in speech recognition, beginning in the early 1970s with the Defense Advanced Research Projects Agency (DARPA). Over the years, many technology companies have joined the pursuit of improving software-based speech recognition. Beyond its medical implications, the system will have broader uses in consumer and business products driven by speech recognition: entertainment devices like the Xbox, accessibility tools such as instant speech-to-text transcription, and personal digital assistants such as Cortana.
While this is being touted as the culmination of over 20 years of ongoing research, Microsoft says the milestone doesn't mean the system is perfect. The new system, like humans, still tends to mishear certain words, such as "have" for "is" or "a" for "the".
What drives the success rate is the use of the latest neural network technology. Researchers used neural language models in which words are represented as continuous vectors in space, so that words like "fast" and "quick" sit close together. This allows the models to generalize from word to word.
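The idea of words as nearby points in a vector space can be illustrated with cosine similarity. The three-dimensional vectors below are made up for illustration; real neural language models learn hundreds of dimensions from large text corpora:

```python
import math

# Hypothetical word vectors: "fast" and "quick" are placed close
# together, while "cab" points in a different direction.
vectors = {
    "fast":  [0.90, 0.80, 0.10],
    "quick": [0.88, 0.79, 0.12],
    "cab":   [0.10, 0.20, 0.95],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(vectors["fast"], vectors["quick"]))  # close to 1.0 (synonyms)
print(cosine(vectors["fast"], vectors["cab"]))    # much smaller (unrelated)
```

Because similar words occupy nearby points, a model that has seen "a fast car" can assign a reasonable probability to "a quick car" even if that exact phrase never appeared in training, which is what the article means by generalizing from word to word.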
These neural networks use large amounts of data to teach computers to recognize patterns in inputs such as images or sounds. To reach the human-parity milestone, Microsoft researchers used a homegrown deep learning system called the Microsoft Computational Network Toolkit, which has been made available on GitHub under an open source license.
Humans decode words in a conversation with ease, but how does a machine manage it? The blog goes on to explain that the toolkit's ability to quickly process deep learning algorithms, using specialized GPU chips across multiple computers, helped researchers reach human parity in less time.
Despite the breakthrough, the researchers warn that there is still a long way to go before speech recognition systems reach a level of precision where results have minimal errors irrespective of age, accent, or surroundings. They will now focus on teaching computers not only to transcribe words but also to understand them, which would mean adding human-like emotional and logical responses to conversations.