Model Leaderboard

Below is the leaderboard of all models on the site, ordered by ELO from highest to lowest.

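Main Stats

| Rank | Model | Company | Country | ELO | Resilience | # Tests | # Jailbreaks |
|---|---|---|---|---|---|---|---|
| 🥇 | gpt-oss-20b | OpenAI | 🇺🇸 | 896 | 98% | 285 | 5 |
| 🥈 | gpt-oss-120b | OpenAI | 🇺🇸 | 888 | 100% | 276 | 1 |
| 🥉 | qwen3-235b-a22b-instruct-2507 | Alibaba | 🇨🇳 | 858 | 93% | 267 | 18 |
| 4 | qwen3-32b | Alibaba | 🇨🇳 | 818 | 84% | 252 | 40 |
| 5 | kimi-k2-instruct-0905 | Moonshot AI | 🇨🇳 | 744 | 72% | 318 | 88 |
| 6 | mistral-small-3.2-24b-instruct-2506 | Mistral | 🇫🇷 | 707 | 59% | 270 | 112 |
| 7 | mistral-nemo-instruct-2407 | Mistral / Nvidia | 🇫🇷 | 688 | 62% | 309 | 118 |

Resilience by Request Type

| Rank | Model | Violent Crimes | Non-Violent Crimes | Sex Crimes | Child Exploitation | Defamation | Specialized Advice | Privacy | Intellectual Property | Indiscriminate Weapons | Hate | Self-Harm | Sexual Content | Elections | Code Interpreter Abuse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | gpt-oss-20b | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 95% | 100% | 94% | 100% | 100% | 86% | 100% |
| 🥈 | gpt-oss-120b | 100% | 100% | 100% | 100% | 100% | 100% | 94% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 🥉 | qwen3-235b-a22b-instruct-2507 | 93% | 100% | 96% | 100% | 100% | 95% | 100% | 92% | 83% | 94% | 95% | 85% | 77% | 95% |
| 4 | qwen3-32b | 90% | 71% | 88% | 94% | 85% | 88% | 78% | 78% | 94% | 89% | 94% | 90% | 63% | 83% |
| 5 | kimi-k2-instruct-0905 | 61% | 71% | 75% | 80% | 65% | 88% | 54% | 65% | 77% | 95% | 94% | 82% | 58% | 56% |
| 6 | mistral-small-3.2-24b-instruct-2506 | 42% | 63% | 59% | 81% | 45% | 53% | 47% | 55% | 85% | 77% | 83% | 52% | 38% | 45% |
| 7 | mistral-nemo-instruct-2407 | 52% | 46% | 56% | 81% | 71% | 53% | 72% | 62% | 64% | 81% | 77% | 35% | 70% | 40% |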
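
The page does not say how the Resilience column is derived. The sketch below assumes it is the share of tests that did not end in a jailbreak, that is (# Tests - # Jailbreaks) / # Tests, rounded to a whole percent. This is an inference from the figures listed above, not a documented formula, and the names `ROWS`, `resilience_pct`, and `mismatches` are hypothetical.

```python
# Assumption (not stated on the page): Resilience is the percentage of
# tests that did NOT result in a jailbreak, rounded to a whole number.

# (model, # tests, # jailbreaks, resilience % as listed on the leaderboard)
ROWS = [
    ("gpt-oss-20b", 285, 5, 98),
    ("gpt-oss-120b", 276, 1, 100),
    ("qwen3-235b-a22b-instruct-2507", 267, 18, 93),
    ("qwen3-32b", 252, 40, 84),
    ("kimi-k2-instruct-0905", 318, 88, 72),
    ("mistral-small-3.2-24b-instruct-2506", 270, 112, 59),
    ("mistral-nemo-instruct-2407", 309, 118, 62),
]


def resilience_pct(tests: int, jailbreaks: int) -> int:
    """Return the assumed resilience: percent of tests without a jailbreak."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    return round(100 * (tests - jailbreaks) / tests)


def mismatches(rows):
    """List (model, listed, computed) for rows where the assumption fails."""
    result = []
    for model, tests, jailbreaks, listed in rows:
        computed = resilience_pct(tests, jailbreaks)
        if computed != listed:
            result.append((model, listed, computed))
    return result


if __name__ == "__main__":
    for model, tests, jailbreaks, listed in ROWS:
        computed = resilience_pct(tests, jailbreaks)
        print(f"{model}: computed {computed}%, listed {listed}%")
```

Under this assumption, the computed value agrees with the listed Resilience for all seven models, which also explains how gpt-oss-120b can show 100% with one recorded jailbreak: 275 of 276 rounds up to 100%.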

Note: All statistics on this page are as judged by Qwen3-32B. Qwen3-32B is not a perfect judge, so these figures are a close approximation of LLM jailbreak resilience rather than an exact measurement.