
Updated October 20th, 2020 at 13:45 IST

Facebook unveils AI translator for 100 languages without relying on English data

Facebook announced an MMT model that can directly translate “100×100 languages” in any direction without relying solely on English-centric data.

Reported by: Kunal Gaurav

Facebook has unveiled machine-learning software that can translate between any pair of 100 languages without relying on English. The multilingual machine translation (MMT) model, described as the first of its kind, is open-source artificial intelligence software that trains directly on data from one language to another rather than using English as an intermediate step, which helps preserve meaning.

Facebook AI research assistant Angela Fan said in a blog post that advanced multilingual systems can process multiple languages but compromise on accuracy by relying on English data to bridge the gap between the source and target languages. Fan said the MMT model can directly translate “100×100 languages” in any direction without relying solely on English-centric data.

“Our model directly trains on Chinese to French data to better preserve meaning. It outperforms English-centric systems by 10 points on the widely used BLEU metric for evaluating machine translations,” wrote Fan.


Fan further explained that Facebook built the “many-to-many” data set with 7.5 billion sentences for 100 languages. She said the tech giant used several scaling techniques to build a universal model with 15 billion parameters, which captures information from related languages and reflects a more diverse range of language scripts and morphology.

Bridge languages

Fan said the team identified a small number of bridge languages, usually one to three major languages in each group, to connect languages across different groups. Citing Hindi, Bengali, and Tamil as example bridge languages for Indo-Aryan languages, she said the team mined parallel training data for all possible combinations of these bridge languages.

“Our training data set ended up with 7.5 billion parallel sentences of data, corresponding to 2,200 directions. Since the mined data can be used to train two directions of a given language pair...our mining strategy helps us effectively sparsely mine to best cover all 100×100 directions in one model,” she wrote.
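The sparse mining idea can be illustrated with a rough Python sketch. It is not Facebook's code: the language groups, the language codes other than the Hindi, Bengali, and Tamil example, and the function names are hypothetical, and the sketch assumes that all pairs inside a group are mined in addition to the bridge-language combinations described above.

```python
from itertools import combinations

# Hypothetical toy grouping for illustration only. The first group follows
# the article's example of Hindi (hi), Bengali (bn), and Tamil (ta) as
# bridge languages; every other entry is an assumption.
GROUPS = {
    "group_a": {"languages": ["hi", "bn", "ta", "mr", "gu"],
                "bridges": ["hi", "bn", "ta"]},
    "group_b": {"languages": ["fr", "es", "it", "pt"],
                "bridges": ["fr", "es"]},
}


def mined_pairs(groups):
    """Return the unordered language pairs selected for mining."""
    pairs = set()
    # Assumption: every pair of languages inside a group is mined.
    for group in groups.values():
        pairs.update(combinations(sorted(group["languages"]), 2))
    # From the article: all combinations of bridge languages are mined,
    # which links the different groups to one another.
    bridges = sorted({b for g in groups.values() for b in g["bridges"]})
    pairs.update(combinations(bridges, 2))
    return pairs


def directions(pairs):
    """Each mined pair can train both translation directions."""
    return {(a, b) for a, b in pairs} | {(b, a) for a, b in pairs}


pairs = mined_pairs(GROUPS)
dirs = directions(pairs)
all_languages = {l for g in GROUPS.values() for l in g["languages"]}
total = len(all_languages) * (len(all_languages) - 1)
print(f"mined pairs: {len(pairs)}, directions covered: {len(dirs)} of {total}")
```

In this toy setup only a subset of all possible directions has mined data, which mirrors the reported 2,200 directions out of the full 100×100 grid.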


Published October 20th, 2020 at 13:46 IST
