Model Leaderboard

Below is the leaderboard of all models on the site, ordered by ELO.

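Main Stats

| Rank | Model | Company | Country | ELO | Resilience | # Tests | # Jailbreaks |
|---|---|---|---|---|---|---|---|
| 🥇 | gpt-oss-120b | OpenAI | 🇺🇸 | 951 | 99% | 5196 | 49 |
| 🥈 | gpt-oss-20b | OpenAI | 🇺🇸 | 903 | 98% | 5115 | 91 |
| 🥉 | kimi-k2.5 | Moonshot AI | 🇨🇳 | 890 | 96% | 1872 | 81 |
| 4 | qwen3-235b-a22b-instruct-2507 | Alibaba | 🇨🇳 | 836 | 94% | 5289 | 311 |
| 5 | minimax-m3 (new) | Minimax | 🇨🇳 | 807 | 100% | 21 | 0 |
| 6 | gemma-4-26b-a4b (new) | Google | 🇺🇸 | 804 | 100% | 9 | 0 |
| 7 | nemotron-3-ultra (new) | Nvidia | 🇺🇸 | 799 | 100% | 6 | 0 |
| 8 | qwen3-32b | Alibaba | 🇨🇳 | 796 | 89% | 5409 | 583 |
| 9 | qwen3-8b (deprecated) | Alibaba | 🇨🇳 | 778 | 82% | 1728 | 303 |
| 10 | kimi-k2-instruct-0905 (deprecated) | Moonshot AI | 🇨🇳 | 725 | 75% | 2550 | 631 |
| 11 | mistral-small-3.2-24b-instruct-2506 | Mistral | 🇫🇷 | 652 | 76% | 5205 | 1274 |
| 12 | mistral-nemo-instruct-2407 (deprecated) | Mistral / Nvidia | 🇫🇷 | 644 | 73% | 5334 | 1433 |

Resilience by Request Type

| Rank | Model | Violent Crimes | Non-Violent Crimes | Sex Crimes | Child Exploitation | Defamation | Specialized Advice | Privacy | Intellectual Property | Indiscriminate Weapons | Hate | Self-Harm | Sexual Content | Elections | Code Interpreter Abuse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | gpt-oss-120b | 99% | 99% | 99% | 99% | 100% | 99% | 99% | 99% | 99% | 99% | 99% | 99% | 98% | 99% |
| 🥈 | gpt-oss-20b | 99% | 100% | 96% | 98% | 100% | 100% | 98% | 99% | 99% | 98% | 98% | 100% | 90% | 99% |
| 🥉 | kimi-k2.5 | 91% | 97% | 97% | 96% | 99% | 96% | 92% | 96% | 99% | 98% | 95% | 97% | 92% | 96% |
| 4 | qwen3-235b-a22b-instruct-2507 | 93% | 96% | 93% | 96% | 95% | 99% | 94% | 94% | 96% | 92% | 97% | 96% | 81% | 96% |
| 5 | minimax-m3 (new) | 0% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% |
| 6 | gemma-4-26b-a4b (new) | 0% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 0% | 0% | 100% | 0% | 0% | 0% |
| 7 | nemotron-3-ultra (new) | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 100% | 0% | 100% | 100% | 100% | 0% |
| 8 | qwen3-32b | 91% | 90% | 84% | 90% | 88% | 94% | 83% | 89% | 96% | 90% | 96% | 95% | 72% | 90% |
| 9 | qwen3-8b (deprecated) | 84% | 90% | 70% | 73% | 81% | 92% | 70% | 86% | 91% | 79% | 91% | 97% | 71% | 76% |
| 10 | kimi-k2-instruct-0905 (deprecated) | 64% | 80% | 65% | 79% | 77% | 84% | 66% | 69% | 84% | 84% | 80% | 84% | 61% | 77% |
| 11 | mistral-small-3.2-24b-instruct-2506 | 66% | 83% | 71% | 80% | 72% | 86% | 71% | 76% | 87% | 86% | 83% | 74% | 48% | 77% |
| 12 | mistral-nemo-instruct-2407 (deprecated) | 66% | 72% | 70% | 79% | 68% | 77% | 70% | 72% | 78% | 87% | 85% | 69% | 59% | 71% |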

Note: All statistics on this page are as judged by Qwen3-32B. Qwen3-32B is not a perfect judge, so these figures are a close approximation of LLM jailbreak resilience rather than an exact measurement.
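
The page does not state how the Resilience column is computed. The listed figures appear consistent with the share of tests that did not end in a jailbreak, i.e. 1 - (# Jailbreaks / # Tests), rounded to a whole percent. The sketch below checks that assumption against three rows from the table; the function name `resilience` and the formula itself are assumptions, not something the site confirms.

```python
def resilience(tests: int, jailbreaks: int) -> float:
    """Assumed formula: fraction of tests that did NOT result in a jailbreak."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    return 1 - jailbreaks / tests


# (# Tests, # Jailbreaks) copied from the leaderboard rows.
rows = {
    "gpt-oss-120b": (5196, 49),
    "qwen3-32b": (5409, 583),
    "mistral-nemo-instruct-2407": (5334, 1433),
}

for model, (tests, jailbreaks) in rows.items():
    # Format as a whole percent, matching the table's precision.
    print(f"{model}: {resilience(tests, jailbreaks):.0%}")
```

Under this reading, a model with very few tests (for example 6 tests and 0 jailbreaks) scores 100% from a small sample, so its Resilience figure carries far less evidence than one backed by thousands of tests.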