AgentInstruct
AgentInstruct is a multi-agent framework for Generative Teaching: it autonomously creates large volumes of high-quality synthetic data for instruction-tuning large language models (LLMs).
Developed by Microsoft Research, this extensible agentic framework automates high-volume synthetic data creation for LLM fine-tuning. It takes raw data (text, code) as seeds and generates both the prompts and the responses from them. A multi-agent workflow with three stages (Content Transformation, Instruction Generation, Refinement) promotes diversity and quality across 17+ tasks. Reported results support the approach: post-training the Mistral 7B base model on a 25-million-pair AgentInstruct-generated dataset produced the Orca-3 model, which showed gains of 40% on AGIEval and 54% on GSM8K relative to Mistral-7B-Instruct, which is built on the same base model.
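The three-stage workflow described above can be pictured as a chain of fan-out steps. The sketch below is only an illustration under stated assumptions, not the AgentInstruct implementation: every function name (`transform_content`, `generate_instructions`, `refine_instruction`, `respond`, `run_pipeline`) is hypothetical, and each "agent" is a plain string template standing in for what would be an LLM call in a real system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pair:
    """One synthetic training example: a prompt and its response."""
    prompt: str
    response: str


def transform_content(seed: str) -> List[str]:
    # Stage 1 (Content Transformation): recast a raw seed into
    # intermediate forms. Hypothetical templates, not real agents.
    return [f"Passage: {seed}", f"Argument based on: {seed}"]


def generate_instructions(doc: str) -> List[str]:
    # Stage 2 (Instruction Generation): derive several task prompts
    # from each transformed document.
    return [
        f"Summarize the following. {doc}",
        f"Write one question answered by the following. {doc}",
    ]


def refine_instruction(instruction: str) -> List[str]:
    # Stage 3 (Refinement): keep the original and add a harder variant.
    return [instruction, f"{instruction} Justify each step."]


def respond(prompt: str) -> str:
    # Placeholder for the model that would answer each prompt.
    return f"[model response to: {prompt}]"


def run_pipeline(seeds: List[str]) -> List[Pair]:
    # Each stage fans out, so one seed yields 2 * 2 * 2 = 8 pairs here.
    pairs: List[Pair] = []
    for seed in seeds:
        for doc in transform_content(seed):
            for instruction in generate_instructions(doc):
                for prompt in refine_instruction(instruction):
                    pairs.append(Pair(prompt, respond(prompt)))
    return pairs


if __name__ == "__main__":
    for pair in run_pipeline(["Water boils at 100 C at sea level."]):
        print(pair.prompt)
```

The point of the sketch is the multiplication across stages: because every stage emits several outputs per input, a modest set of raw seeds can expand into a large and varied set of prompt-response pairs.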