Baidu’s Deep Speech 2 recognition software can write text messages faster than humans: Study

Researchers devised an experiment that pitted Baidu’s Deep Speech 2 speech recognition software against 32 regular smartphone users. For English, speech recognition was three times faster than typing, with a 20.4 percent lower error rate.

  • Updated: August 30, 2016 8:05 AM IST

Speech recognition software for smartphones can write text messages three times faster than humans can type them, say computer scientists from Stanford University, the Chinese company Baidu and the University of Washington. The researchers say the finding could spur the development of innovative speech recognition apps. They devised an experiment that pitted Baidu’s cloud-based Deep Speech 2 speech recognition software against 32 texters, ages 19 to 32, using the built-in keyboard on an Apple iPhone. “They grew up texting, so we’re putting speech recognition up against people who are really good at this task,” said James Landay, professor of computer science at Stanford.

The subjects took turns typing or speaking about 100 phrases drawn from a standard library of everyday phrases, such as “physics and chemistry are hard,” “have a good weekend” and “go out for some pizza and beer.” The testing app recorded their times and accuracy rates. Half the subjects performed the task in English using the QWERTY keyboard; the other half performed it in their native Mandarin Chinese using the iOS Pinyin keyboard. For English, speech recognition was three times faster than typing, and its error rate was 20.4 percent lower. For Mandarin Chinese, speech was 2.8 times faster than typing, with an error rate 63.4 percent lower.

“We knew speech recognition is pretty good, so we expected it to be faster, but we were actually quite surprised to find that it was almost three times faster than typing on a keyboard,” said co-author Sherry Ruan, a computer science PhD student at Stanford. Although the researchers used Baidu’s speech recognition software, they suspect that other high-accuracy speech engines would perform at a similar level.

“We should put speech in more applications than just typing an email or text message,” Landay noted. He described an interface in which a user starts with speech and the app then switches to a graphical interface that can be touched and controlled with a finger. The study was published online.

  • Published Date: August 30, 2016 8:00 AM IST