MMLU
MMLU (Massive Multitask Language Understanding) is a widely used 57-subject benchmark for testing a large language model's (LLM) general knowledge and reasoning depth.
MMLU features 15,908 multiple-choice questions spanning 57 diverse academic and professional subjects (e.g., law, computer science, US history). The benchmark measures a model's breadth of world knowledge and problem-solving ability, pushing evaluation beyond simple conversational tasks. When it was released in 2020, the top model (GPT-3 175B) scored 43.9%; today's leading models such as GPT-4o score around 88%. We use MMLU to track model generalization and identify critical shortcomings across specialized domains.
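To make the scoring concrete, here is a minimal sketch of how accuracy is computed on MMLU-style four-option questions. The `MCQuestion` structure, the two sample questions, and the `always_b` placeholder model are hypothetical illustrations, not part of the official dataset or any evaluation harness:

```python
from dataclasses import dataclass

@dataclass
class MCQuestion:
    subject: str      # one of the 57 MMLU subjects
    question: str
    choices: list     # four answer options, MMLU-style
    answer: int       # index of the correct choice

# Hypothetical mini-sample in MMLU's four-option format.
sample = [
    MCQuestion("us_history",
               "In which year was the US Constitution signed?",
               ["1776", "1787", "1800", "1812"], 1),
    MCQuestion("computer_science",
               "What is the worst-case time complexity of binary search?",
               ["O(1)", "O(log n)", "O(n)", "O(n log n)"], 1),
]

def accuracy(questions, predict):
    """Fraction of questions where predict(q) matches the gold index."""
    correct = sum(1 for q in questions if predict(q) == q.answer)
    return correct / len(questions)

# A placeholder 'model' that always picks the second option.
always_b = lambda q: 1
print(accuracy(sample, always_b))  # 1.0 on this tiny sample
```

Reported MMLU scores are this same accuracy computed over all 15,908 questions, often broken down per subject to expose weak domains.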