Columbia Science Review

Transfer Learning: Teaching Machines to Become Better Translators

6/1/2022

Illustrated by Sreoshi Sarkar


By Eleanor Lin

There are over 7,000 languages spoken around the world, each one unique in the ways it encodes meaning. Translation is thus a challenging task for humans and machines alike. How do you find the right words and grammatical constructs to accurately carry the sense of a message from the original language into a different one?

Neural machine translation (NMT) takes a very different approach to this problem than a human translator would. NMT relies on showing a machine learning algorithm millions of examples of human-generated translations from one language to another in order to teach it to translate on its own. For example, to train an algorithm to translate from English to German, the sentence "I read a book" might appear, paired with a translation written by a human translator fluent in both languages (e.g., "Ich lese ein Buch"), alongside millions of other example pairs.
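To make that training setup concrete, here is a minimal sketch in PyTorch (the framework is an assumption; the article does not name one). The two-sentence corpus, the tiny encoder-decoder model, and all hyperparameters are illustrative stand-ins, not any real system: production NMT models learn from millions of sentence pairs.

```python
# A minimal sketch of NMT training on parallel sentence pairs.
# Corpus, architecture, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

# Toy parallel corpus: (English, German) pairs written by human translators.
pairs = [
    ("i read a book", "ich lese ein buch"),
    ("i read a letter", "ich lese einen brief"),
]

# Build a shared token vocabulary from both sides of the toy corpus.
tokens = {tok for en, de in pairs for tok in (en + " " + de).split()}
vocab = {tok: i + 2 for i, tok in enumerate(sorted(tokens))}
vocab["<sos>"], vocab["<eos>"] = 0, 1

def encode(sentence):
    """Turn a sentence into a tensor of token ids, ending with <eos>."""
    return torch.tensor([vocab[t] for t in sentence.split()] + [vocab["<eos>"]])

class TinySeq2Seq(nn.Module):
    """Encoder-decoder with GRUs: the bare skeleton of an NMT model."""
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim)
        self.decoder = nn.GRU(dim, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, src, tgt_in):
        # Encode the source sentence into a hidden state, then decode
        # the target sentence conditioned on that state.
        _, state = self.encoder(self.embed(src).unsqueeze(1))
        dec_out, _ = self.decoder(self.embed(tgt_in).unsqueeze(1), state)
        return self.out(dec_out.squeeze(1))

model = TinySeq2Seq(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training: show the model each (source, reference translation) pair and
# nudge its parameters toward reproducing the human translation.
for epoch in range(100):
    for en, de in pairs:
        src, tgt = encode(en), encode(de)
        tgt_in = torch.cat([torch.tensor([vocab["<sos>"]]), tgt[:-1]])
        loss = loss_fn(model(src, tgt_in), tgt)
        opt.zero_grad(); loss.backward(); opt.step()
```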

However, this data-hungry approach becomes a problem for low-resource languages: languages that do not have large amounts of data available for training NMT systems. A prime example of a high-resource language is English, the dominant language of the internet: it is translated to and from countless other languages and accounted for 56% of all web content in 2014. Meanwhile, with over 7,000 languages worldwide but only an estimated 165 used online, there are bound to be thousands of low-resource languages for which high-quality translation training data are sparse or even non-existent. Without sufficient data to learn from, NMT produces subpar translations to and from low-resource languages.

To overcome this obstacle, researchers at the University of Southern California created a transfer learning method in 2016 to improve NMT performance for low-resource languages. The main idea is to first teach a "parent" translation model to translate between two high-resource languages; a "child" model that inherits what the parent has learned is then better able to learn to translate between a low-resource language and a high-resource one. This is analogous to how a person who has learned French may find it easier to pick up another Romance language, since related languages share similarities in vocabulary and grammar. Indeed, the researchers found that when they transferred some of the information learned by a parent model to a child model, the child produced better translations than NMT models trained without transfer learning. The boost in performance was even more pronounced when the high-resource and low-resource languages used in the parent and child models, respectively, were more similar to one another.
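Below is a minimal sketch of that parent-child handoff, reusing TinySeq2Seq and the helpers from the previous sketch. The stand-in corpora and the choice to copy every parameter are simplifying assumptions made here for brevity; the 2016 method is more selective about which parameters transfer and which stay fixed during fine-tuning.

```python
# Hypothetical stand-in corpora, reusing the toy pairs above so the
# sketch runs end to end. In the real setting the parent pair is
# high-resource (e.g., French-English, millions of sentences) and the
# child pair is low-resource (far fewer sentences).
high_resource_pairs = pairs
low_resource_pairs = pairs[:1]

def train(model, corpus, epochs=100):
    """Standard NMT training loop over (source, reference) pairs."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for src_sent, tgt_sent in corpus:
            src, tgt = encode(src_sent), encode(tgt_sent)
            tgt_in = torch.cat([torch.tensor([vocab["<sos>"]]), tgt[:-1]])
            loss = loss_fn(model(src, tgt_in), tgt)
            opt.zero_grad(); loss.backward(); opt.step()

# 1. Train the parent model on the abundant high-resource pair.
parent = TinySeq2Seq(len(vocab))
train(parent, high_resource_pairs)

# 2. Initialize the child with everything the parent has learned.
#    (Copying all parameters is a simplification; the actual method
#    chooses which parameters transfer and which stay fixed.)
child = TinySeq2Seq(len(vocab))
child.load_state_dict(parent.state_dict())

# 3. Fine-tune the child on the scarce low-resource pair. It starts
#    from the parent's knowledge rather than from random weights.
train(child, low_resource_pairs)
```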

There remains plenty of room for improvement in machine translation across the board, as indicated by the simple fact that human translations are still the gold standard by which machine translations are judged. Nevertheless, transfer learning has the potential to extend the power of neural machine translation to more languages around the world, allowing low-resource language communities to carve out their own digital spaces alongside dominant languages on the global stage. After all, knowledge—and therefore, language—is power.
