Model Leaderboard

Below is the leaderboard of all models on the site, ordered by ELO.

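Main Stats

| Rank | Model | Company | Country | ELO | Resilience | # Tests | # Jailbreaks |
|---|---|---|---|---|---|---|---|
| 🥇 | gpt-oss-20b | OpenAI | 🇺🇸 | 986 | 98% | 2148 | 51 |
| 🥈 | gpt-oss-120b | OpenAI | 🇺🇸 | 970 | 98% | 2097 | 40 |
| 🥉 | kimi-k2.5 (new) | Moonshot AI | 🇨🇳 | 911 | 97% | 873 | 30 |
| 4 | qwen3-235b-a22b-instruct-2507 | Alibaba | 🇨🇳 | 893 | 91% | 2166 | 195 |
| 5 | qwen3-8b (new) | Alibaba | 🇨🇳 | 796 | 78% | 1041 | 228 |
| 6 | qwen3-32b | Alibaba | 🇨🇳 | 793 | 83% | 2205 | 383 |
| 7 | kimi-k2-instruct-0905 | Moonshot AI | 🇨🇳 | 680 | 75% | 2265 | 563 |
| 8 | mistral-small-3.2-24b-instruct-2506 | Mistral | 🇫🇷 | 614 | 62% | 2046 | 777 |
| 9 | mistral-nemo-instruct-2407 | Mistral / Nvidia | 🇫🇷 | 551 | 57% | 2085 | 898 |

Resilience by Request Type

| Rank | Model | Violent Crimes | Non-Violent Crimes | Sex Crimes | Child Exploitation | Defamation | Specialized Advice | Privacy | Intellectual Property | Indiscriminate Weapons | Hate | Self-Harm | Sexual Content | Elections | Code Interpreter Abuse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | gpt-oss-20b | 98% | 100% | 94% | 97% | 100% | 100% | 97% | 98% | 99% | 96% | 96% | 99% | 94% | 99% |
| 🥈 | gpt-oss-120b | 99% | 98% | 98% | 97% | 99% | 99% | 98% | 98% | 98% | 99% | 97% | 99% | 97% | 97% |
| 🥉 | kimi-k2.5 (new) | 92% | 97% | 96% | 98% | 98% | 97% | 96% | 99% | 100% | 100% | 96% | 100% | 89% | 95% |
| 4 | qwen3-235b-a22b-instruct-2507 | 88% | 95% | 90% | 94% | 91% | 99% | 89% | 90% | 92% | 94% | 95% | 97% | 73% | 93% |
| 5 | qwen3-8b (new) | 81% | 88% | 61% | 67% | 77% | 86% | 62% | 81% | 92% | 81% | 88% | 97% | 66% | 66% |
| 6 | qwen3-32b | 84% | 81% | 76% | 88% | 81% | 90% | 72% | 83% | 93% | 88% | 94% | 91% | 61% | 79% |
| 7 | kimi-k2-instruct-0905 | 63% | 80% | 66% | 80% | 77% | 83% | 66% | 70% | 82% | 88% | 80% | 85% | 61% | 74% |
| 8 | mistral-small-3.2-24b-instruct-2506 | 47% | 70% | 54% | 69% | 56% | 74% | 50% | 62% | 75% | 79% | 73% | 57% | 42% | 64% |
| 9 | mistral-nemo-instruct-2407 | 49% | 54% | 50% | 73% | 56% | 53% | 53% | 49% | 47% | 83% | 74% | 46% | 59% | 47% |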

Note: All statistics on this page are as judged by Qwen3-32B. Qwen3-32B is not a perfect judge, so these figures are a close approximation of LLM jailbreak resilience rather than an exact measurement.
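
The page does not state how the Resilience column is derived. The reported values are, however, consistent with the share of tests that did not end in a jailbreak, rounded to the nearest percent. The sketch below checks that reading against the # Tests and # Jailbreaks columns. The formula and the function name `resilience_pct` are assumptions made for illustration, not the site's documented method.

```python
def resilience_pct(tests: int, jailbreaks: int) -> int:
    """Percent of tests that did NOT result in a jailbreak (assumed formula)."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    return round(100 * (tests - jailbreaks) / tests)


# (model, # Tests, # Jailbreaks, reported Resilience %) copied from the leaderboard
ROWS = [
    ("gpt-oss-20b", 2148, 51, 98),
    ("gpt-oss-120b", 2097, 40, 98),
    ("kimi-k2.5", 873, 30, 97),
    ("qwen3-235b-a22b-instruct-2507", 2166, 195, 91),
    ("qwen3-8b", 1041, 228, 78),
    ("qwen3-32b", 2205, 383, 83),
    ("kimi-k2-instruct-0905", 2265, 563, 75),
    ("mistral-small-3.2-24b-instruct-2506", 2046, 777, 62),
    ("mistral-nemo-instruct-2407", 2085, 898, 57),
]

if __name__ == "__main__":
    for model, tests, jailbreaks, reported in ROWS:
        computed = resilience_pct(tests, jailbreaks)
        print(f"{model}: computed {computed}%, reported {reported}%")
```

Under this assumption every computed value matches the reported one. That supports the reading but does not confirm it is how the site calculates the column.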