Technology
BLEU
BLEU (Bilingual Evaluation Understudy) is the industry-standard metric for automatically assessing machine translation quality: it compares MT output against human reference translations using modified n-gram precision, and its scores correlate well with human judgments.
BLEU is a core metric for machine translation (MT) evaluation, introduced by IBM researchers Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu at the 2002 ACL conference. It quantifies translation quality by comparing the machine-generated text (the candidate) against one or more human-created reference translations. The algorithm relies primarily on modified n-gram precision, counting the overlap of word sequences (typically up to 4-grams) between the candidate and the references, with each candidate n-gram's count clipped to its maximum count in any single reference. A brevity penalty is applied to discourage overly short translations. The final BLEU score is a single number between 0 and 1: a score closer to 1.0 indicates higher similarity to the human references, making BLEU a quick, inexpensive alternative to costly human evaluation that correlates well with human judgment.
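The mechanics described above can be sketched in a short, self-contained implementation. This is a minimal illustration of sentence-level BLEU with uniform weights, not the reference implementation from the 2002 paper; production work typically uses an established library such as sacreBLEU, and the function names here are our own.

```python
import math
from collections import Counter

def modified_ngram_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram's count is capped
    by its maximum count in any single reference."""
    cand_counts = Counter(tuple(candidate[i:i + n])
                          for i in range(len(candidate) - n + 1))
    max_ref_counts = Counter()
    for ref in references:
        ref_counts = Counter(tuple(ref[i:i + n])
                             for i in range(len(ref) - n + 1))
        for ngram, count in ref_counts.items():
            max_ref_counts[ngram] = max(max_ref_counts[ngram], count)
    clipped = sum(min(count, max_ref_counts[ngram])
                  for ngram, count in cand_counts.items())
    return clipped, sum(cand_counts.values())

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped 1..max_n-gram
    precisions, scaled by a brevity penalty."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        clipped, total = modified_ngram_precision(candidate, references, n)
        if clipped == 0:
            return 0.0  # any zero precision sends the geometric mean to 0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: compare candidate length c to the closest
    # reference length r (ties broken toward the shorter reference).
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda ref_len: (abs(ref_len - c), ref_len))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)
```

For example, scoring `"the quick brown fox jumps over the lazy dog"` against the single reference `"the quick brown fox jumped over the lazy dog"` yields a score strictly between 0 and 1, since only the n-grams involving "jumps" fail to match; an identical candidate and reference score exactly 1.0. Note that because every n-gram precision must be nonzero, unsmoothed sentence-level BLEU frequently collapses to 0 on short sentences, which is why corpus-level aggregation or smoothing is used in practice.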