LLM Evaluators Projects


LLM Evaluators

LLM Evaluators (also called "LLM-as-a-Judge" systems) are automated frameworks that score and critique LLM outputs against key metrics such as hallucination, relevance, and safety.

LLM Evaluators are mission-critical: they systematically assess LLM application performance and act as a scalable alternative to costly human review. The core mechanism is the 'LLM-as-a-Judge' approach: a second, prompt-engineered LLM grades the first model's output, returning either a binary (pass/fail) verdict or a numeric score (Source 1.1, 1.10). Frameworks like DeepEval offer 50+ research-backed metrics, including Hallucination, Answer Relevancy, and Contextual Precision (Source 2.3, 1.6). This capability is essential for managing the non-deterministic nature of LLMs, ensuring production-grade reliability, and catching failures such as factual errors or compliance issues before they reach end users (Source 1.4, 1.1).
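To make the mechanism concrete, here is a minimal sketch of the LLM-as-a-judge flow using DeepEval's public API (LLMTestCase, AnswerRelevancyMetric, evaluate). The input/output strings and the 0.7 threshold are illustrative assumptions, not values from the source.

```python
# pip install deepeval
# Note: by default the judge is an OpenAI model, so an OPENAI_API_KEY
# must be set in the environment.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Wrap one interaction with the application under test.
# These strings are illustrative placeholders.
test_case = LLMTestCase(
    input="What is the refund window for annual plans?",
    actual_output="Annual plans can be refunded within 30 days of purchase.",
)

# Answer Relevancy: a judge LLM scores how relevant the output is to
# the input; threshold=0.7 is an assumed pass/fail cutoff.
metric = AnswerRelevancyMetric(threshold=0.7)

# Score a single case directly: the judge returns a numeric score
# plus a natural-language critique.
metric.measure(test_case)
print(metric.score, metric.reason)

# Or run a batch evaluation, which reports pass/fail per metric
# based on each metric's threshold.
evaluate(test_cases=[test_case], metrics=[metric])
```

A different judge can be supplied through the metric's model parameter (e.g. another hosted model or a custom wrapper), which is how teams typically control evaluation cost and consistency.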

https://deepeval.com
1 project · 1 city
