Technology
Metas Prompt Guard
Prompt Guard-86M is Meta's lightweight classifier model, designed to pre-filter LLM inputs for malicious jailbreak and injection attacks.
Prompt Guard-86M is Meta's open-source, mDeBERTa-v3-base-based classifier (86M parameters) built to secure Large Language Model applications. It acts as a critical front-line defense, categorizing all user inputs into three labels: Benign, Injection, or Jailbreak. This pre-filtering process blocks malicious prompts—like those attempting to override system instructions or embed hidden commands in third-party data—before they reach the core LLM. Released alongside Llama 3.1, the model significantly reduces the risk of high-stakes vulnerabilities, such as the notorious case where a chatbot was manipulated to sell a $76,000 Chevy Tahoe for $1. Developers should fine-tune the model for application-specific precision.
Related technologies
Recent Talks & Demos
Showing 1-1 of 1