
BERTScore

BERTScore is a text generation evaluation metric that uses BERT's contextual embeddings and cosine similarity to measure semantic similarity, outperforming n-gram methods (like ROUGE and BLEU) in correlation with human judgment.

BERTScore, introduced by Zhang et al. in 2019, is a neural evaluation metric designed to move beyond the limitations of surface-level word-overlap metrics (e.g., ROUGE, BLEU). The core mechanism leverages a pre-trained Transformer model (such as BERT or RoBERTa-large) to generate contextual embeddings for the tokens in both the candidate and reference texts. It then computes pairwise cosine similarities and greedily matches each token to its most similar counterpart in the other text: recall averages the best match for each reference token, and precision averages the best match for each candidate token. The final score is the F1 measure derived from this precision and recall, reflecting semantic equivalence rather than exact word matches. This methodology has demonstrated stronger correlation with human judgments on benchmarks such as WMT18 (machine translation) and COCO (image captioning), establishing it as a standard metric for modern NLP evaluation.
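The matching arithmetic described above can be sketched with NumPy, operating on precomputed token embeddings. This is a minimal illustration, not the reference implementation: in practice the embeddings come from a BERT-family model (e.g., via the linked `bert_score` library, which also applies optional IDF weighting and baseline rescaling), and the function name and toy inputs here are hypothetical.

```python
import numpy as np

def bertscore_f1(cand_emb, ref_emb):
    """Greedy-matching BERTScore sketch from token embeddings.

    cand_emb: (m, d) array, one contextual embedding per candidate token.
    ref_emb:  (n, d) array, one contextual embedding per reference token.
    Returns (precision, recall, f1).
    """
    # Normalize rows so that dot products equal cosine similarities.
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = c @ r.T  # (m, n) pairwise cosine-similarity matrix

    # Recall: each reference token greedily takes its best candidate match.
    recall = sim.max(axis=0).mean()
    # Precision: each candidate token greedily takes its best reference match.
    precision = sim.max(axis=1).mean()

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With identical candidate and reference embeddings the score is exactly 1.0; extra unmatched candidate tokens lower precision while leaving recall intact, mirroring how the metric penalizes hallucinated content versus omissions.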

https://github.com/Tiiiger/bert_score
