Kyoto University Participation to WAT 2016

Fabien Cromieres, Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi

KyotoNMT

• Essentially an implementation of (Bahdanau et al., 2015)
• Implemented in Python with the Chainer library

Attention Model

[Figure: attention-based encoder-decoder. Example input: ウイスキーはオオムギから製造される; output: "whisky is produced from barley". Source words are mapped to 620-unit embeddings and encoded by forward and backward LSTM layers of 1000 units each. At every step the decoder LSTM (1000 units) reads its previous state, the embedding of the previously generated word (620 units) and the current attention context; a 500-unit maxout layer followed by a softmax over the 30k-word target vocabulary then produces the new word and the new state.]

• We used mostly the network sizes of the original paper
  • as shown in the picture above
• Depending on the experiments, we changed (see the Results section for details):
  • multi-layer LSTM
  • larger source and target vocabulary sizes

The important details

• During our experiments, we found that using these settings appropriately had a significant impact on the final results:
  • Regularization
    • weight decay
    • dropout
    • early stopping
    • random noise on the previous word embedding
  • Training algorithm
    • ADAM
  • Beam-search
    • normalizing the loss by length (see the toy sketch at the end)
  • Ensembling
    • ensembling of several models, or self-ensembling (also illustrated in the sketch at the end)
  • Segmentation
    • automatic segmentation via JUMAN or KyotoMorph
    • or subword units with BPE
  • Random noise on the previous word embedding
    • in the hope of reducing cascading errors at translation time, we add noise to the target word embedding at training time (see the sketch right after this section)
    • works well, but maybe just a regularization effect

Training

• Training was done on an NVIDIA Titan X (Maxwell)
  • from 2 days for a single-layer model on ASPEC Ja-Zh
  • to 2 weeks for a multi-layer model on ASPEC Ja-En
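The "random noise on the previous word embedding" trick listed under "The important details" can be summarised with a small sketch. This is not the actual KyotoNMT code: it is a minimal NumPy illustration, and the vocabulary size, the noise standard deviation and the embedding table are made-up values (only the 620-unit target embedding size is taken from the figure).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: only the 620-unit target embedding comes from the figure;
# the vocabulary size and the noise standard deviation are arbitrary.
VOCAB_SIZE = 30000
EMBED_SIZE = 620
NOISE_STD = 0.1

# Stand-in for the learned target embedding table.
target_embeddings = rng.normal(scale=0.01, size=(VOCAB_SIZE, EMBED_SIZE)).astype(np.float32)

def previous_word_embedding(prev_word_id, train=True):
    """Embedding of the previously generated word, fed to the decoder LSTM.

    At training time, Gaussian noise is added so that the decoder does not
    rely too heavily on a perfectly correct previous word, in the hope of
    reducing cascading errors at translation time.
    """
    emb = target_embeddings[prev_word_id]
    if train:
        emb = emb + rng.normal(scale=NOISE_STD, size=emb.shape).astype(np.float32)
    return emb

noisy_emb = previous_word_embedding(42, train=True)   # training-time lookup
clean_emb = previous_word_embedding(42, train=False)  # translation-time lookup
```

At translation time the clean embedding is used, so the noise acts purely as a training-time perturbation, which is why its benefit may simply be a regularization effect.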
KyotoEBMT

• Example-based Machine Translation
• Tree-to-tree: uses dependency trees for both the source and the target side

Results

Ja -> En            EBMT     NMT 1    NMT 2
  BLEU              21.22    24.71    26.22
  AM-FM             59.52    56.27    55.85
  Human evaluation: Pairwise 47.0 (3/9) and 44.25 (4/9); JPO Adequacy 3.89 (1/3)
  NMT 1: 2 layers, 200k (JUMAN) source vocabulary, 52k (BPE) target vocabulary, no ensembling
  NMT 2: 1 layer, 30k (JUMAN) source vocabulary, 30k (words) target vocabulary, ensembling x4

En -> Ja            EBMT     NMT 1
  BLEU              31.03    36.19
  AM-FM             74.75    73.87
  Human evaluation: Pairwise 55.25 (1/10); JPO Adequacy 4.02 (1/4)
  NMT 1: 2 layers, 52k (BPE) source vocabulary, 52k (BPE) target vocabulary, no ensembling

Ja -> Zh            EBMT     NMT 1
  BLEU              30.27    31.98
  AM-FM             76.42    76.33
  Human evaluation: Pairwise 30.75 (3/5) and 58.75 (1/5); JPO Adequacy 3.88 (1/3)
  NMT 1: 2 layers, 30k (JUMAN) source vocabulary, 30k (KyotoMorph) target vocabulary, no ensembling

Zh -> Ja            EBMT     NMT 1    NMT 2
  BLEU              36.63    46.04    44.29
  AM-FM             76.71    78.59    78.44
  Human evaluation: Pairwise 63.75 (1/9) and 56.00 (2/9); JPO Adequacy 3.94 (1/3)
  NMT 1: 2 layers, 30k (KyotoMorph) source vocabulary, 30k (JUMAN) target vocabulary, ensembling x2
  NMT 2: 2 layers, 200k (KyotoMorph) source vocabulary, 50k (JUMAN) target vocabulary, no ensembling

EBMT vs NMT

• EBMT: less fluent
• NMT: more under/over-translation issues

Example (Ja -> En):
  Src:  本フローセンサーの型式と基本構成,規格を図示,紹介。
  Ref:  Shown here are type and basic configuration and standards of this flow with some diagrams.
  EBMT: This flow sensor type and the basic composition, standard is illustrated, and introduced.
  NMT:  This paper introduces the type, basic configuration, and standards of this flow sensor.

Conclusion and Future Work

• Very good results with Neural Machine Translation
  • especially for Zh -> Ja
• Long training times mean that we could not test every combination of settings for each language pair
• Some possible future improvements:
  • adding more linguistic aspects
  • adding newly proposed mechanisms (copy mechanism, etc.)

Code available (GPL)

• KyotoEBMT: http://lotus.kuee.kyoto-u.ac.jp/~john/kyotoebmt.html
• KyotoNMT: https://github.com/fabiencro/knmt
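To make the "Beam-search" and "Ensembling" settings listed under "The important details" more concrete, here is a toy Python sketch. It is not the KyotoNMT decoder: the BOS/EOS ids, the `models` callables and all sizes are hypothetical, and the real system works on recurrent decoder states rather than on prefix-conditioned probability functions.

```python
import numpy as np

BOS, EOS = 0, 1  # hypothetical special-token ids

def beam_search(models, beam_width=5, max_len=50):
    """Toy beam search with model ensembling and length-normalized scoring.

    `models` is a list of callables mapping a target prefix (list of word
    ids) to a probability distribution over the target vocabulary; several
    trained models (or several checkpoints of one model, for
    self-ensembling) can be passed, and their predictions are averaged.
    """
    beam = [([BOS], 0.0)]      # (prefix, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beam:
            # Ensembling: average the next-word distributions of all models.
            probs = np.mean([m(prefix) for m in models], axis=0)
            for w in np.argsort(probs)[::-1][:beam_width]:
                candidates.append((prefix + [int(w)], logp + float(np.log(probs[w]))))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for prefix, logp in candidates[:beam_width]:
            (finished if prefix[-1] == EOS else beam).append((prefix, logp))
        if not beam:
            break
    finished = finished or beam
    # Length normalization: rank hypotheses by log-probability per word, so
    # that shorter translations are not systematically favoured.
    return max(finished, key=lambda c: c[1] / len(c[0]))

# Toy usage: two "models" that always predict the same fixed distribution
# over a 4-word vocabulary (ids 2 and 3 stand for ordinary words).
fixed = np.array([0.05, 0.15, 0.45, 0.35])
best_prefix, best_logp = beam_search([lambda p: fixed, lambda p: fixed], beam_width=3, max_len=5)
```

Dividing the summed log-probability by the number of generated words corresponds to normalizing the loss by the translation length: without it, the search tends to prefer short hypotheses, since every additional word can only lower the total score.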