Facebook’s AI Speech Recognition no longer requires human transcriptions

Source: TechCrunch

Facebook Inc. has become more than just a social media conglomerate. The technology giant is investing heavily in future technologies, and if Mark Zuckerberg and his team have their way, speech will be a big part of them.

The company recently announced a breakthrough in Speech Recognition. According to Facebook, its artificial intelligence-based system no longer needs to rely on human-transcribed audio to learn to recognize speech.

To put it simply, Speech Recognition is the technology that powers digital assistants such as Siri, Google Assistant and Alexa. It is used across smartphones, tablets, cars, smart speakers and more.

The social media conglomerate says its AI-based systems can now learn new languages without human transcriptions, saving hours and hours of monotonous work. In conventional systems, humans had to transcribe each data set and repeat the process for every language. The company’s latest ‘unsupervised’ system instead learns entirely from raw human speech audio, which gives it a better sense of what human communication actually sounds like, according to a report by Engadget.

This means that Facebook has created a Speech Recognition system that no longer requires annotated data sets in order to understand and process speech.

This is a major breakthrough, but the system will still need refinement before it reaches users. Facebook says that its unsupervised speech recognition system is as good as its supervised speech recognition systems from a few years ago, and the company expects unsupervised learning to become its primary speech recognition technology over time.

In due course, Facebook’s ‘unsupervised’ speech recognition system is expected to learn more languages and dialects directly from recorded human speech, without any human transcription.

According to a report by Engadget, the social media conglomerate also tested its ‘unsupervised’ Speech Recognition system on languages such as Swahili and Kyrgyz, a language spoken in Kyrgyzstan. The test model is known as “Wav2vec-U.” The company’s tests showed that the model delivered up to 63% fewer errors.
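To make a figure like “63% fewer errors” concrete, here is a minimal Python sketch of how an error rate and a relative reduction are commonly computed. It uses word error rate, a standard Speech Recognition metric, as an illustration only: the report does not say which metric or baseline Facebook used, the function names are our own, and the numbers in the usage note are invented for the example.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, insertions and deletions
    needed to turn the sequence `hyp` into the sequence `ref`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_error_reduction(baseline_rate, new_rate):
    """Fraction by which the new system lowers the baseline error rate."""
    return (baseline_rate - new_rate) / baseline_rate
```

For instance, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one missing word out of six, and a hypothetical drop from a 30% error rate to 11.1% works out to a relative reduction of about 63% via `relative_error_reduction(0.30, 0.111)`.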

For technology enthusiasts, Facebook has also shared the code for its test model on GitHub, aiming to accelerate the development of this technology.

A new era of Speech Recognition is on its way, and it is critical for Facebook to deliver on this technology in order to connect its more than 2.85 billion users, many of whom live outside the United States.