AI has been receiving a lot of attention in the media lately as it finds application in an increasing number of industries.

This is especially exciting since improvements in neural machine translation (‘NMT’) now mean that, for numerous language combinations, it’s often possible to get translations that are not only intelligible but actually quite good.

However, why is it still not perfect? Why is it so difficult to generate natural, native-sounding language? To understand that, we must consider how humans produce language as opposed to how machines do.

How does the human brain produce language?

Humans create variations of words based on their interactions, whether with friends, business contacts, or family; we all do it. Unlike machines, we understand words from their overall context.

Humans take numerous things into account when forming a sentence: our knowledge about a subject, the words’ relationship to our environment, and the company we’re in as we speak.

As we grow we continuously absorb images, developing emotions towards people and things that affect us. Consequently, we recognize sounds and voices and subconsciously associate them with those images, actions, and emotions. The older we get, the more adept we become at dissecting language, and our ability to express our thoughts coherently and efficiently improves.

Experiences are registered by our visual senses, then recorded and categorized. Emotions and intuition are triggered by chemicals that affect our nervous systems; we may then use words to express those emotions.

We see something that triggers an emotional reaction, and subconsciously cross-reference it with our own language “database” to identify how best to express ourselves. These things must be experienced, and experience takes time to develop. There aren’t any shortcuts. Simply remembering words isn’t enough. It’s what we associate with different variations of words that counts. 

How does AI translation differ?

For machines such as NMT systems, the starting point is the words and phrases that make up a source text, along with any relevant content material in their database. These are what the system bases its linguistic decisions about the best target text on, as opposed to the images, emotions and experiences that serve as primary reference points for humans.

While much can be achieved with highly advanced analysis of a source text, that analysis still misses the many environmental nuances which subconsciously guide the human linguistic decision-making process: implicit elements that we naturally incorporate into our language but that never appear in the words themselves.

The expression “reading between the lines” refers to the drawing of inferences — not from words directly, but from our knowledge, experience, or the context associated with a piece of text. We can then use this to better express ideas or thoughts in another language when cultural nuances differ, or we are constrained by the structure of our mother tongue.

If future NMT solutions could continuously consult enormous amounts of pre-existing, constantly updated, human-translated data, and from it analyse and incorporate the surrounding variables of a speaking environment, they could very well match human-level translations.


How do we distinguish the good translations from the bad?

One of the biggest challenges with machine translation is the ongoing quality assessment it must undergo to ensure human standards are maintained. Numerous rating systems have been developed over recent years to judge the accuracy of SMT (statistical machine translation) and NMT output. The more common ones are BLEU (Bilingual Evaluation Understudy), TER (Translation Edit Rate), and GTM (General Text Matcher).

Each has its pros and cons. BLEU splits a text into segments, compares the word sequences (n-grams) in each segment against one or more existing human reference translations, and scores how closely they overlap, penalizing candidates that are shorter than their references.
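To make this concrete, here is a minimal Python sketch of BLEU’s core idea: clipped n-gram precision combined with a brevity penalty. This is an illustration only; production implementations (such as sacreBLEU) add tokenization and smoothing details, and the sample sentences below are invented.

```python
from collections import Counter
import math

def modified_ngram_precision(candidate, references, n):
    # Count candidate n-grams, clipping each count at the maximum number
    # of times that n-gram appears in any single reference.
    cand_ngrams = Counter(zip(*[candidate[i:] for i in range(n)]))
    if not cand_ngrams:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for ng, c in Counter(zip(*[ref[i:] for i in range(n)])).items():
            max_ref_counts[ng] = max(max_ref_counts[ng], c)
    clipped = sum(min(c, max_ref_counts[ng]) for ng, c in cand_ngrams.items())
    return clipped / sum(cand_ngrams.values())

def bleu(candidate, references, max_n=4):
    # Geometric mean of the 1..max_n gram precisions, times a brevity
    # penalty that punishes candidates shorter than the closest reference.
    precisions = [modified_ngram_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    bp = 1.0 if len(candidate) >= ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * geo_mean

candidate = "the cat sat on the mat".split()
references = ["the cat sat on the mat today".split()]
print(round(bleu(candidate, references), 3))  # ~0.846: perfect overlap, length penalty
```

Even this toy version shows BLEU’s character: it rewards surface overlap with a reference, so a fluent but freely worded translation can score poorly.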

TER supports linguists with post-editing. It takes the machine-translated target text, compares it to existing, accepted reference translations, and returns the minimum number of edits required to bring the target into line with a reference. This “edit rate” isn’t to be taken as an exact figure; rather, it gives the linguist an idea of the total effort required. Many of the suggested edits may not be serious or necessary, so this methodology isn’t ideal as a judge of the overall quality of machine translation output.
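The following simplified sketch illustrates the edit-rate idea, assuming plain word-level edit distance (insertions, deletions, substitutions) divided by reference length. Real TER also allows whole-phrase shifts, which this version omits, and the example sentences are invented.

```python
def ter(candidate, reference):
    # Simplified TER: word-level edit distance divided by reference length.
    m, n = len(candidate), len(reference)
    # dp[i][j] = minimum edits to turn candidate[:i] into reference[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if candidate[i - 1] == reference[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[m][n] / n if n else 0.0

mt_output = "the contract must signed by both party".split()
reference = "the contract must be signed by both parties".split()
print(f"TER = {ter(mt_output, reference):.2f}")  # 2 edits / 8 words = 0.25
```

A lower score means less post-editing effort: here, one insertion (“be”) and one substitution (“party” to “parties”) would align the output with the reference.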

GTM uses several similarity metrics which check for “hits” (two words that match in the candidate and reference text) and matches of “runs” (adjacent sequences of matching words). It does so across numerous variations, factoring in all identified matches regardless of length.
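Here is a hypothetical simplification of that run-matching idea in Python: greedily match the longest common word runs, then compute an F-measure over the matched words. Real GTM computes a maximum matching and can weight longer runs more heavily via an exponent; this sketch uses the simplest setting (exponent 1) and invented sample sentences.

```python
def greedy_run_matching(candidate, reference):
    # Greedily match the longest common run of words, mark the matched
    # positions as used, and repeat until no common run remains.
    cand, ref = list(candidate), list(reference)
    matched = 0
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i in range(len(cand)):
            for j in range(len(ref)):
                k = 0
                while (i + k < len(cand) and j + k < len(ref)
                       and cand[i + k] is not None
                       and cand[i + k] == ref[j + k]):
                    k += 1
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break
        matched += best_len
        for k in range(best_len):
            cand[best_i + k] = None  # distinct sentinels so used slots
            ref[best_j + k] = "\0"   # can never match each other again
    return matched

def gtm_f1(candidate, reference):
    # F-measure over matched words (GTM with run-length exponent e = 1).
    hits = greedy_run_matching(candidate, reference)
    if hits == 0:
        return 0.0
    precision = hits / len(candidate)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)

cand = "the cat sat on the mat".split()
ref = "the cat was on the mat".split()
print(f"GTM F1 = {gtm_f1(cand, ref):.2f}")  # 5 of 6 words matched -> 0.83
```

With an exponent of 1, only the number of matched words affects the score; it is GTM’s run-length exponent, omitted here, that rewards output preserving longer stretches of the reference’s phrasing.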

TER and GTM are said to function better than other rating systems, as they look at numerous variations at different lengths, then provide a final metric indicating the effort required to improve the target text.

What does this mean for the future of AI translation?

For the foreseeable future, the difficulty for any language scoring system is that it must function without a single, objectively defined winning standard. There is no such thing as “the perfect, most accurate translation”. Anybody describing a translation in the superlative overlooks the fact that there are many accepted versions, all of them linguistically and, possibly, stylistically correct.

For now, automatic machine translation evaluation measures are less reliable than human evaluations and are still far from being able to entirely substitute human judgement. However, through the application of well-known evaluation metrics, machine translation will continue to help linguists get a better idea of the quality of different translations, especially for standard text that must stick close to the source (e.g. technical manuals, instructions and descriptions).

Only by combining smart technology with human input and decision-making can a final version be produced within a minimal time frame, at a reasonable price and, most importantly, at the expected level of quality.


Published 23 May, 2018 by Hannes Ben

Hannes Ben is Chief International Officer at Forward3D and Locaria and a contributor to Econsultancy.


Comments (2)

Brian Hennessy, Founder, Thread | Cofounder, Talkoot

Thanks for the article Hannes. I agree completely. One of the reasons language changes so quickly is that language is one of the critical ways subcultures distinguish themselves from the larger culture. Ever since the advent of mass media, the faster mass culture has been able to appropriate slang from subcultures, the quicker the language of those subcultures has changed.

Teens create their own language, in part, to stand apart from adults. AI is the ultimate appropriator of language. The faster machines learn to 'sound natural,' the faster people will redefine what it means to 'sound natural.' I predict it will be a never-ending arms race.

The primary way people adapt language is through metaphor. Probably the larger reason natural language might prove to be a snipe hunt is that persuasive storytelling is based heavily on metaphorical language. A persuasive metaphor is the unexpected combination of two unrelated meanings to create a new, unique third meaning. This is the very definition of a good idea, whether it be a turn of phrase, a business or a product. Computers are great at the raw processing it takes to turn that idea into a reality, once you have an idea. But they won't likely outpace the human brain, with its trillions of synaptic connections per inch, in creating unexpected new ideas.


Hannes Ben, Chief International Officer at Forward3D Group

Thanks for the insightful and detailed comment on my article.

I very much share your thoughts and opinions.

AI will be able to help us in generating content around a narrowly defined subject. But it will be up to humans to create pieces of text that cater to readers’ feelings. The AI won’t be wrong; it will just sound different.

HB

