
Technology

MMLU

MMLU (Massive Multitask Language Understanding) is a widely used 57-subject multiple-choice benchmark for testing the breadth of a large language model's (LLM) general knowledge and reasoning.

MMLU (Massive Multitask Language Understanding) is one of the most widely cited benchmarks for evaluating Large Language Models (LLMs). It comprises 15,908 multiple-choice questions spanning 57 academic and professional subjects (e.g., law, computer science, US history). The test measures a model's breadth of world knowledge and problem-solving ability, going well beyond simple conversational tasks. When the benchmark was released in 2020, the top model (GPT-3 175B) scored 43.9%; today's leading models such as GPT-4o reach roughly 88% accuracy. We use MMLU to track model generalization and to identify shortcomings in specialized domains.
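At its core, MMLU evaluation is simple: each question has four answer options (A–D), the model picks one letter, and the score is the fraction of correct picks. A minimal sketch of that scoring loop is below; the questions and the `always_a` baseline are hypothetical stand-ins, not real MMLU items or any official harness.

```python
# Minimal sketch of MMLU-style multiple-choice scoring.
# The items below are illustrative placeholders, not real MMLU questions.
from dataclasses import dataclass

@dataclass
class Item:
    question: str
    choices: list  # exactly four options, corresponding to letters A-D
    answer: str    # gold answer letter, "A" through "D"

def accuracy(items, predict):
    """Fraction of items where predict(item) returns the gold letter."""
    correct = sum(1 for it in items if predict(it) == it.answer)
    return correct / len(items)

# Tiny hypothetical set; real MMLU has 15,908 questions across 57 subjects.
items = [
    Item("2 + 2 = ?", ["3", "4", "5", "6"], "B"),
    Item("Capital of France?", ["Paris", "Rome", "Oslo", "Bern"], "A"),
]

def always_a(item):
    # Trivial baseline: always answer "A" (chance level is 25% on real MMLU).
    return "A"

print(accuracy(items, always_a))  # 0.5 on this two-item set
```

In practice, `predict` would wrap an LLM call that maps the question and four choices to a single letter; the 43.9% and ~88% figures above are this accuracy computed over the full 57-subject question set.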

https://arxiv.org/abs/2009.03300
3 projects · 2 cities

Related technologies

Recent Talks & Demos

