RERANKER

HW4

SIYU QIU

1. Motivation

Given the n-best hypothesis sentences, we can analyze them along many meaningful dimensions, extracting features and computing a weighted score so as to find the single sentence that best matches the source. The question is then how to set the weights of those features. In order to improve the BLEU score, I use MERT to find the optimal weights over the 8 features extracted for each sentence. (A sketch of the weighted reranking step appears in the appendix.)

2. MERT

The algorithm used in this homework to achieve minimum error rate training is Powell search. The basic idea is that, when tuning one weight, each hypothesis defines a line with the tuned weight as the variable x, the corresponding feature value as the slope, and the sum of the remaining feature-weight products as the y-intercept:

    s(x) = f_k x + \sum_{j \ne k} w_j f_j,

where f_k is the feature paired with the weight being tuned. We then find the upper envelope of these lines and its threshold points for each target weight. Combining all the threshold points and sorting them segments the whole x-axis into intervals; in each interval we compute the total BLEU score, and the highest-scoring intervals yield our candidate weights. (The appendix sketches this envelope sweep.)

One thing to pay attention to is that the candidate weight list will contain more than one entry in most cases. Heuristically, the weights should not fall outside the interval [-5, 5], so the candidates are filtered before selection. I then choose randomly from the filtered candidate list, iterating these steps until a weight yielding a better BLEU score is found.

3. Features

Besides the three given features (the language model score, the translation model p(e|f) score, and the lexical translation model p_lex(f|e) score), 5 more features are added:

• Number of words. The number of words in the hypothesis prevents the hypothesis from being too short.

• Length of reference sentence / length of hypothesis sentence. There should be some penalty for hypotheses whose length differs greatly from that of the source sentence.

• Number of untranslated words. Using the alignment file, I built a dictionary mapping source Russian words to English words. Then, for each hypothesis, we can compute the number of words in the source sentence that were not translated. (See the feature sketch in the appendix.)

• OOV. Words that never occurred in the training data are treated as out-of-vocabulary words.

4. Experiment and Analysis

4.1. Experiment Setting. After training with MERT, the weights used are as follows:

Table 1. Feature weights found by MERT

    Feature   OOV       number of words   uncovered   LM    P_lex(f|e)   rl/hl    P(e|f)
    Weight    -0.3252   0.1416            -0.4597     -1    -0.5         1.4240   -0.5061

4.2. Experiment Result. These weights are interpretable to some extent: for example, the OOV weight is below zero, meaning that the number of OOV words is penalized; likewise, the weight for uncovered (untranslated) words is also negative. With these settings, the final BLEU score on the test set is 28.09. Some other settings that performed better on the training data were also tried, but they did worse on the test set, which may indicate overfitting.
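Appendix. Code Sketches.

The Python sketches below illustrate the main pieces described above. None of them is the exact assignment code: all function names, variable names, and data formats (upper_envelope, ru2en, the feature dictionaries, and so on) are assumptions made for illustration.

First, a minimal sketch of the per-weight line search from Section 2, assuming each hypothesis in one sentence's n-best list has already been reduced to a (slope, intercept) pair for the weight being tuned:

```python
def upper_envelope(lines):
    """Return the upper envelope of a set of lines as a list of
    (threshold_x, hyp_id) pairs: hyp_id is the 1-best hypothesis for all
    x from threshold_x up to the next threshold.

    lines: iterable of (slope, intercept, hyp_id), where the model score
    of a hypothesis as a function of the tuned weight x is
    slope * x + intercept.
    """
    hull = []  # stack of [x_start, slope, intercept, hyp_id]
    # Sorting by slope (ties broken by intercept) lets one sweep suffice.
    for slope, intercept, hyp in sorted(lines, key=lambda t: (t[0], t[1])):
        x = float("-inf")
        dominated = False
        while hull:
            _, top_slope, top_icpt, _ = hull[-1]
            if slope == top_slope:
                if intercept <= top_icpt:
                    dominated = True  # parallel and never above the top line
                    break
                hull.pop()  # parallel with a lower intercept: discard it
                continue
            # x at which the new line overtakes the current top line
            x = (top_icpt - intercept) / (slope - top_slope)
            if x > hull[-1][0]:
                break  # the old top segment keeps a nonempty stretch
            hull.pop()  # the old top segment is never best: discard it
        if dominated:
            continue
        if not hull:
            x = float("-inf")  # the line with the smallest slope owns the far left
        hull.append([x, slope, intercept, hyp])
    return [(x, hyp) for x, _, _, hyp in hull]
```

The threshold points returned for every sentence are then pooled and sorted; sweeping them from left to right while updating the BLEU sufficient statistics gives the total score of each interval, as described in Section 2.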
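Next, a sketch of the two coverage-oriented features from Section 3: the number of untranslated source words and the OOV count. The alignment-derived dictionary ru2en (Russian word to a set of English translations) and the training vocabulary train_vocab are assumed inputs; building them from the alignment file and training data is not shown. The writeup also does not pin down whether OOV is counted on the source side or the hypothesis side; this sketch counts source words.

```python
def untranslated_count(src_tokens, hyp_tokens, ru2en):
    """Number of source words none of whose known translations appear in
    the hypothesis, using the alignment-derived dictionary ru2en."""
    hyp_words = set(hyp_tokens)
    missed = 0
    for word in src_tokens:
        translations = ru2en.get(word, set())
        # Only count words we know how to translate but did not.
        if translations and not (translations & hyp_words):
            missed += 1
    return missed


def oov_count(tokens, train_vocab):
    """Number of words that never occurred in the training data."""
    return sum(1 for word in tokens if word not in train_vocab)
```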
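Finally, the reranking step itself (Sections 1 and 4): each hypothesis is scored as the dot product of its feature vector with the tuned weights, and the argmax is kept. The weight values follow Table 1, but the key names and the dictionary-based n-best format are assumptions of this sketch.

```python
# Tuned weights from Table 1 (key names are illustrative).
WEIGHTS = {
    "oov": -0.3252,
    "num_words": 0.1416,
    "uncovered": -0.4597,
    "lm": -1.0,
    "p_lex_fe": -0.5,
    "len_ratio": 1.4240,
    "p_ef": -0.5061,
}


def rerank(nbest):
    """nbest: list of (hypothesis_text, feature_dict) pairs for one source
    sentence; return the hypothesis with the highest weighted score."""
    def weighted_score(pair):
        _, feats = pair
        return sum(WEIGHTS[name] * value for name, value in feats.items())
    return max(nbest, key=weighted_score)[0]
```

This weighted score is also what the MERT loop optimizes: each pass re-tunes one weight while the others stay fixed, which is exactly what reduces every hypothesis to a line in Section 2.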