Machine Translation - Capita Translation and Interpreting

Machine Translation
What, When, Why & How?
Capita Translation and Interpreting - WHITE PAPER
Inside the WHITE PAPER
What is Machine Translation (MT)?
When should you consider MT?
Why will MT help you?
How can you use MT most effectively?
What level of quality can you expect from MT?
How safe is your data when using MT?
What does the future hold for MT?
© Capita plc
Introduction
More for less is the cry! It is a common call across departments in organisations all over the world,
and it’s no different in the translation arena. Couple this with the now commonplace demand for
immediacy and the expectation that mountains of content can be translated within limited budgets,
and translations are needed more quickly than ever before. So could Machine Translation
finally hold the key to meeting these expectations?
This white paper goes back to basics and explores the what, the when, the why and the how of
Machine Translation, often denoted by its abbreviation, MT. What is MT, when can it be used to
its best potential, and why would anyone want to use it?
A machine can translate up to 18 times more
content than a human can, whilst saving as
much as 90% in terms of costs.
Forrester, June 2013
What is Machine Translation?
Machine Translation is the use of software to translate
text or speech from one natural language into another.
MT is often referred to as automated translation or
instant translation. The main reason for the existence of
Machine Translation is to lower translation costs.
As computational developments become more and more
commonplace and the internet presents more multilingual and global
opportunities, research and development in the Machine Translation
field continues to be top of the agenda for all Language Service
Providers (LSPs).
The most common types of Machine Translation on the market today
are Statistical Machine Translation (SMT), Rule-Based Machine
Translation (RBMT), and Hybrid Systems, which combine both RBMT
and SMT.
A key step at the outset of any project to implement a Machine
Translation system is to identify the exact business needs you are
trying to address and the best way to address them. One size rarely
fits all, so you need to design a system and supporting process
that meet those needs.
When should you consider MT?
The use of Machine Translation depends
greatly upon the context and final
intention of the translated text. MT
works better for some subject matters
and language pairs than for others, and
accuracy and quality can vary.
Source: Europarl: A Parallel Corpus for Statistical Machine Translation, Philipp Koehn, 2005.
Machine Translation is most commonly used for high-volume, added-value content (such as blog comments, Wikis,
forum discussions and user reviews) that would not otherwise be translated, intended for readers who appreciate
that the translated content may not be refined.
A recent survey conducted by the Common Sense Advisory found that localisation buyers favour Machine
Translation over human translation for documentation (50% vs. 29%), FAQs (40% vs. 29%), and knowledge bases
(40% vs. 27%).
Source: Great Expectations for Post-Edited MT; How LSPs can Accelerate Turnaround Times and Lower Costs, The Common Sense Advisory, August 2013.
Machine Translation is also frequently used for reference material, internal documents, gisting purposes, gauging
customer feedback and sentiment analysis, where a high level of quality is not as necessary.
If content is ‘business critical’ and the reader is likely to rely heavily on the accuracy and quality of the text, then it is
potentially not a good candidate for MT alone.
Why will MT help you?
Machine Translation can help businesses in
many ways, and a recent study conducted by
the Common Sense Advisory found that the
top reasons for organisations using Machine
Translation are lower costs, faster turnaround
times, and the ability to handle more volume.
Source: Why Machine Translation Appears in Global Content Strategies, The Common Sense
Advisory, August 2013.
More content can be translated
IBM estimates that 2.5 quintillion bytes of data are
created every day (Source: www.IBM.com). With
more content in the public domain, it is inevitable
that more content will require translation. A
high-quality, professional translation is not always
necessary, so businesses can now consider
translating content that previously went
untranslated for reasons such as cost and
speed of completion.
According to the Common Sense Advisory, 67% of
companies using MT experience faster turnaround
times and the ability to translate more content.
Source: Post-Edited Machine Translation Defined, The Common Sense
Advisory, April 2013.
Increase customer satisfaction
With the help of MT you are able to provide
information to your customers both before and
after the sale. A customer’s buying experience can
be enhanced from the initial browsing and shopping
stages, right through to customer support e-mails,
knowledge bases, cross-lingual chat and multilingual
search.
Understand the market
Using MT means that you can harvest and analyse
opinions and reviews across languages and markets,
whilst quickly and easily generating a rapid response
to foreign language comments. The value of social
media is totally undermined if you can’t understand
tweets or Facebook posts. Social media monitoring is
becoming increasingly important, and the same
should apply to your international markets.
Augment sales opportunities
One of the values of MT is that businesses can
increase their audience (and sales opportunities) by
increasing the amount of content available to that
audience. MT enables more communication, to more
people, in more languages.
Save money
Hosting multiple MT engines means that an LSP
does not have to undertake extensive re-tuning of a
master engine each time it deploys a customer-
and content-specific engine. This significantly
reduces the amount of input required from expensive
IT resources. Cost savings are passed on to you as
well, as vendors typically charge an average of 64%
of the full word rate for post-editing Machine
Translation output.
Source: Trends in Translation Pricing, Common Sense Advisory, September
2012.
Save time
Automated translation inevitably saves time as little
human intervention is required. Some LSPs believe
that a translator doing light post-editing can
produce 20,000 words per day, versus 2,700 without
MT.
Source: Translation Future Shock, The Common Sense Advisory, April 2013.
Additionally, the raw MT output should increase in
quality over time, so the amount of work
(edit distance) a professional linguist needs to bring
it to publication or ‘good enough’ quality will
fall.
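The ‘edit distance’ mentioned above can be made concrete. The following is a minimal sketch, assuming a word-level Levenshtein distance between the raw MT output and its post-edited version. The function names `word_edit_distance` and `post_edit_effort` and the sample sentences are hypothetical illustrations, not a metric that this white paper or any particular LSP prescribes.

```python
def word_edit_distance(raw_mt: str, post_edited: str) -> int:
    """Word-level Levenshtein distance: the minimum number of word
    insertions, deletions and substitutions needed to turn raw_mt
    into post_edited."""
    source = raw_mt.split()
    target = post_edited.split()
    # previous[j] is the distance between the source words handled so far
    # and the first j words of the target.
    previous = list(range(len(target) + 1))
    for i, src_word in enumerate(source, start=1):
        current = [i]
        for j, tgt_word in enumerate(target, start=1):
            substitution = previous[j - 1] + (src_word != tgt_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def post_edit_effort(raw_mt: str, post_edited: str) -> float:
    """Edit distance divided by the word count of the post-edited text
    (an assumed normalisation; other choices are possible)."""
    target_len = len(post_edited.split())
    if target_len == 0:
        # Nothing was kept: treat any non-empty raw output as full effort.
        return 1.0 if raw_mt.split() else 0.0
    return word_edit_distance(raw_mt, post_edited) / target_len


# Hypothetical sample segment, before and after light post-editing.
raw = "the engine must be tune before use"
edited = "the engine must be tuned before use"
print(word_edit_distance(raw, edited))
print(round(post_edit_effort(raw, edited), 3))
```

Tracked per segment over successive engine updates, a falling score of this kind would indicate that post-editors are changing less of the raw output.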
How can you use MT most effectively?
Through its investigations and testing, Capita TI has found that getting the best quality information upfront
means that it can provide better quality output. The old saying holds true: “garbage in
equals garbage out”. Machines are not as capable as humans of deconstructing complex texts, so just remember
that if a human would have difficulty understanding what you have written, a machine wouldn’t stand a chance.
To achieve the best results from your MT solution (in terms of both quality and cost), it is essential to
write your documentation in a clear, coherent, concise and structurally correct way.
Another aspect to consider is the suitability of the corpora available. High-quality, subject-specific translation
memories, style guides and glossaries will inevitably produce better MT output. Used as part of a long-term
localisation strategy, MT can include a feedback loop that incorporates new material into the MT system as it is produced.
Evaluating how MT can be most appropriately used is key for any company before they set out on their localisation
journey. When it comes to MT, one size does not fit all – there are many different use cases for MT, ranging from
content gisting to productivity gains. Be clear in your requirements and expectations so that the most appropriate
MT solution can be adopted.
What level of quality can you expect from MT?
Biggest question surrounding MT!
This is probably THE biggest question that surrounds
MT and will ultimately dictate its success or failure in
any organisation. If you are setting the bar at the
level of human translation, then ultimately it will
fail, as raw output, even from a custom-built
engine, is currently a long way from that ideal.
This raises a second question: what IS quality? Or,
more importantly, what is ‘good enough’ quality?
Quality is ‘good enough’ if the translated text
achieves the business purpose of the content. Every
piece of content is there to do a specific job, and the
question is what quality it needs in order
to do that job.
If content is ‘value add’, user generated or
something that wouldn’t have been there before MT,
then most times anything is better than nothing.
The one big caveat is that it should still convey the
correct meaning. If style, grammar and spelling are
wrong, that is forgivable, but if the meaning has
been lost then you would be better off with nothing,
as the content has not achieved the objective and,
even worse, may damage the brand. For this type of
content, with a good engine, you will achieve the
objective. If the engine is not of a sufficient standard,
or is in its early days and has not yet been
tuned, you may consider a light post-edit just to
check that the meaning is conveyed properly.
MT engines produce the best quality when they have
been custom built for each client. Not only will these
engines be industry specific but, because
customers have varying document and content
types, the engines are also tuned to the
subject matter and style of the content. Using niche
engines means that the quality parameters can be
refined more precisely than with generic engines, which
in turn should raise the quality of the MT output
going to post-editors.
The level of quality will only improve over time if the
engine learns and adapts. Including a quality
assurance feedback loop in the process will ensure
that the MT engine, its data set and the overall
process are continually being updated with dynamic
factual input.
How safe is your data when using MT?
According to the Common Sense Advisory, security concerns 19% of those buying MT as a service. Translation
buyers worry about the communications hygiene of some of their suppliers, especially when it comes to using free
MT services over unsecured networks (Source: Great Expectations for Post-Edited MT; How LSPs can Accelerate
Turnaround Times and Lower Costs, The Common Sense Advisory, August 2013). It is vital to ensure that your LSP
administers a number of security policies across the business such as network firewalls, secure transmission, device
encryption, web restrictions, secure hosting environments and fully integrated Translation Management Systems.
Free online Machine Translation tools are great, for obvious reasons, but would you
really want your business critical, sensitive or personal data to be available to the
general public?
It is most beneficial to you when your LSP builds a custom Machine Translation engine for use by you, and only
you. Not only does this mean that the output will be tailored to your specific
terminology requirements, but also that your data will not be available to any other customers.
What does the future hold for MT?
In today’s world, every commercial enterprise, government agency and NGO produces an unrelenting flood of
words, images, audio files, video clips, social media, and user-generated content. This information needs to be
processed, managed, and transformed for various uses – and that transformation often includes translation. All
businesses that plan to translate more content will inevitably have to use some form of MT to do it.
One of the big hurdles in the industry today is expectation management across all parties. When everyone
involved knows exactly what to expect from the process, the practice becomes a lot more manageable and useful.
MT or not MT? – that is the question.
Dealing with the human element in translation has been one of the challenges, as linguists have often been
reluctant to change because of misconceptions and fears about Machine Translation. Once that reluctance has been
overcome, there will be some great converts who will hopefully spread the word in their networks. Putting the
words ‘quality’ and ‘machine translation’ in one sentence would send a shiver down the spine of many a linguist.
What needs to be realised is that if MT is really going to come of age, then the two words need to become
synonymous, and that is a large leap from the recent past. Translators need to be engaged so that they are part
of the process. Whether they are comfortable with the technology or not, professional translators will learn that
Machine Translation is simply a productivity tool and will learn to use it. Some will even find that specialising
in post-editing MT output can be more lucrative than doing it all by head and hand, and that it may provide a
useful entry point into the world of professional translation.
Harnessed in the right way, Machine Translation can be a very useful tool which will increase businesses’
multilingual accessibility by offering a translation option for content for which professional human translation is
not financially feasible. The key is to incorporate it in a way which is appropriate.
About Capita Translation and Interpreting.
Capita Translation and Interpreting (Capita TI) provides a comprehensive range of language and localisation
services, such as website translation, interpreting, human translation, machine translation, proofreading and
transcription.
Our success is built on a combination of traditional translation methods, innovative technology and a clear focus
on quality. Capita TI can support new customers and existing clients from both the public and private sectors with
a wide range of services, built on an evolving mix of time-tested translation methods and emerging technology,
such as custom-built Machine Translation engines. Our services enable clients with an international reach to
deliver their key messages on a truly global scale.
You can be certain that everything we do is backed by Capita’s substantial know-how, extensive resources and long
record of outstanding service.
If you have any questions regarding this report, our localisation services, or if you
wish to engage in a free-of-charge MT consultation with our in-house experts,
please contact us by emailing [email protected]
Capita Translation and Interpreting
Riverside Court
Huddersfield Road
Delph, Oldham
Greater Manchester
OL3 5FZ
United Kingdom
TEL (UK & EU) +44(0)845 367 7000
TEL (US) +1 (800) 579-5010
www.capitatranslationinterpreting.com
Part of The Capita Group Plc