Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Tomoki Fujita, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura Graduate School of Information Science, Nara Institute of Science and Technology 1 Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Background 2 Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Speech Translation Systems ● Translate speech from source language to target ASR こんにちは、駅はどこですか? MT Hello, where is the station? TTS 3 Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Problem: Delay ● Wait for the whole utterance to end before translating Delay ASR こんにちは、駅はどこですか? MT Hello, where is the station? TTS 4 Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Solution: Divide into Smaller Chunks ● Choose appropriate timing to start translation Delay: Reduced ASR こんにちは、駅は MT MT Hello, TTS どこですか? MT the station where is it? TTS TTS 5 Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Previous Work: Incremental Dependency Parsing/Manual Rules [Ryu+ 04] ● Utilize knowledge of English/Japanese to derive rules subj I went prep prep to the park with your brother Translate after the first prepositional phrase completes! MT MT 私は公園に行きました あなたの弟と ● - Requires a bilingual linguist to design rules ● - Requires an accurate incremental dependency parser 6 Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Previous Work: Division on Pauses [Fugen+ 08, Bangalore+ 12] ● Simply divide on short pauses in the utterance ASR hello ● ● where is the station - Cannot capture relationship between languages - Result will greatly change with speech speed, disfluencies 7 Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Proposed Method ● Utilize the TM directly to choose translation timing ● ● ● ● + Can be constructed automatically + Uses information about the language pair + Very simple to implement Specifically: ● ● ● Choose translation timing at the end of each phrase in the phrase table Utilize reordering probabilities to adjust granularity Adapt the language model to the translation task 8 Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Preliminaries 9 Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Phrase Based Machine Translation ● Divide the sentence into small phrases and translate Today I will give a lecture on machine translation . Today I will give 今日は、 を行います Today 今日は、 a lecture on の講義 machine translation 機械翻訳 machine translation 機械翻訳 a lecture on の講義 I will give を行います . 。 . 。 今日は、機械翻訳の講義を行います。 ● Score translations with translation model (TM), reordering model (RM), and language model (LM) 10 Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Translation Model Creation ● Perform automatic alignment of bitext ● From aligned text, extract phrases for translation ホ テ 受 ルの付 the hotel front desk ホテル の → hotel ホテル の → the hotel 受付 → front desk ホテルの受付 → hotel front desk ホテルの受付 → the hotel front desk Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Lexicalized Reordering Model ● ● Probabilistically models reorderings for increased accuracy of translation Given current phrase and next phrase: Monotone: 背 の 高い 男 太郎 を 訪問 した the tall man visited Taro Discontinuous Right: 私 は 太郎 を 訪問した I ● Swap: visited Taro Discontinuous Left: 背 の 高い 男 を 訪問 した visited the tall man “monotone” + “discontinuous right” = “right probability” Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Proposed Method 13 Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Phrase Table hello→ こんにちは where is→ どこですか where→ どこ the→ その Input String hello where is the the station→ 駅 station Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Phrase Table hello→ こんにちは where is→ どこですか where→ どこ the→ その Input String hello where is the the station→ 駅 station Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Phrase Table hello→ こんにちは where is→ どこですか where→ どこ the→ その Input String hello where is the “hello” phrase exists ↓ wait the station→ 駅 station Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Phrase Table hello→ こんにちは where is→ どこですか where→ どこ the→ その Input String hello where is the “hello” phrase exists ↓ wait the station→ 駅 station Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Phrase Table hello→ こんにちは where is→ どこですか where→ どこ the→ その Input String hello where is the “hello” “hello where” phrase exists phrase missing ↓ ↓ wait translate “hello” the station→ 駅 station Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Phrase Table hello→ こんにちは where is→ どこですか where→ どこ the→ その Input String hello where is the “hello” “hello where” phrase exists phrase missing ↓ ↓ wait translate “hello” the station→ 駅 station Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Phrase Table hello→ こんにちは where is→ どこですか where→ どこ the→ その Input String hello where is the “hello” “hello where” phrase exists phrase missing ↓ ↓ wait translate “hello” “where is” phrase exists ↓ wait the station→ 駅 station Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Phrase Table hello→ こんにちは where is→ どこですか where→ どこ the→ その Input String hello where is the “hello” “hello where” phrase exists phrase missing ↓ ↓ wait translate “hello” “where is” phrase exists ↓ wait the station→ 駅 station Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Phrase Table hello→ こんにちは where is→ どこですか where→ どこ the→ その Input String hello where is the “hello” “hello where” phrase exists phrase missing ↓ ↓ wait translate “hello” the station→ 駅 station “where is” “where is the” phrase exists phrase missing ↓ ↓ wait translate “where is” Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Phrase Table hello→ こんにちは where is→ どこですか where→ どこ the→ その Input String hello where is the “hello” “hello where” phrase exists phrase missing ↓ ↓ wait translate “hello” the station→ 駅 station “where is” “where is the” phrase exists phrase missing ↓ ↓ wait translate “where is” Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method One: Choosing Translation Timing with Phrases ● Input words one at a time from ASR ● While words exist in phrase table, don't translate yet Phrase Table hello→ こんにちは where is→ どこですか where→ どこ the→ その Input String hello where is the “hello” “hello where” phrase exists phrase missing ↓ ↓ wait translate “hello” the station→ 駅 station “where is” “where is the” “the station” phrase exists phrase missing utterance ends ↓ ↓ ↓ wait translate translate “where is” “the station” Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Problem with Method One ● Has the potential to degrade translation accuracy: Normal phrase-based translation: こんにちは 駅 は どこ ですか Hello, where is the station Translation with early timing: こんにちは 駅 は どこ ですか Hello, the station where is it Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method Two: Adjusting Timing with Reordering Probabilities ● ● First, temporarily choose strings according to method one Next, if that phrase's right probability exceeds a threshold, actually translate the words in the cache Example (threshold = 0.8): hello where is the station “hello” “hello where” “where is” “where is the” “the station” phrase exists phrase missing phrase exists phrase missing utterance ends ↓ ↓ ↓ ↓ ↓ wait choose “hello” wait choose “where is” translate ↓ ↓ “where is right probability right probability is the station” is 0.9 > 0.8 0.6 < 0.8 ↓ ↓ translate “hello” do not translate yet ● Threshold 1.0 = traditional, 0.0 = method one Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Problem with Method Two ● LMs are traditionally trained on sentences ● ● This is not appropriate for translating shorter chunks e.g.: The translator will try to “finish” sentences こんにちは Hello . 駅 は どこ ですか Where is the station ? Where is it . Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Method Three: Language Model Adaptation ● Before learning the language model, split the training data according to the same criterion Traditional LM Training Hello, where is the station. My name is John. Chunk Segmentation Hello, where is the station. My name is John. LM Training Sentence Level LM Proposed Method LM Training Chunk Level LM Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Experiments Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Experimental Setup ● Four Types of Experiments: ● ● ● ● ● Japanese-English BTEC Travel Conversation (ja-en) Japanese-English BTEC with 11+ Words (ja-en 11+) English-Japanese BTEC Travel Conversation (en-ja) French-English WMT News (fr-en) Evaluation Measures: ● Accuracy – – ● 14-ref BLEU for BTEC, 1-ref BLEU for News Manually-graded acceptability Delay (Seconds) Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Accuracy (BLEU) Result One: Comparison Across Settings 80 70 60 50 40 30 20 10 0 t=1.0 t=0.0 0 2 en-ja ja-en ja-en (11+) fr-en 4 6 8 10 12 14 Delay (Seconds) ● ● Delay decreases in all settings Better delay/accuracy tradeoff for long sentences, similar languages Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Result Two: Compare with Pause-based Segmentation Accuracy (BLEU) 50 45 40 35 Proposed Pause 30 25 20 1 1.5 2 2.5 3 3.5 4 4.5 Delay (Seconds) ● In faster settings proposed method best ● In slower settings pause-based method best Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Manual Evaluation ● Decrease in manual evaluation as well, but less obvious than evaluated by BLEU Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Conclusion Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Conclusion ● Proposed method for choosing timing in speech translation using phrase table and reordering model ● ● ● ● Considers reordering tendencies across languages Simple and language independent Competitive accuracy Future work: ● ● Combination of prosodic, reordering, and syntactic cues How do we evaluate translations? Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Thank You! Simple, Lexicalized Choice of Translation Timing for Simultaneous Speech Translation Example: サーフィン に いい 場所 を 教え て ください please tell me a good surfing place サーフィン に | いい 場所 を 教え て ください for surfing | please tell me a good place
© Copyright 2026 Paperzz