.

Technology

Metas Prompt Guard

Prompt Guard-86M is Meta's lightweight classifier model, designed to pre-filter LLM inputs for malicious jailbreak and injection attacks.

Prompt Guard-86M is Meta's open-source, mDeBERTa-v3-base-based classifier (86M parameters) built to secure Large Language Model applications. It acts as a critical front-line defense, categorizing all user inputs into three labels: Benign, Injection, or Jailbreak. This pre-filtering process blocks malicious prompts—like those attempting to override system instructions or embed hidden commands in third-party data—before they reach the core LLM. Released alongside Llama 3.1, the model significantly reduces the risk of high-stakes vulnerabilities, such as the notorious case where a chatbot was manipulated to sell a $76,000 Chevy Tahoe for $1. Developers should fine-tune the model for application-specific precision.

https://huggingface.co/meta-llama/Prompt-Guard-86M
1 project · 1 city

Related technologies

Recent Talks & Demos

Showing 1-1 of 1

Members-Only

Sign in to see who built these projects