Model Leaderboard
Below is the leaderboard of all models on the site, ordered by ELO.
The first eight columns (Rank through # Jailbreaks) are the main stats; the remaining columns show resilience by request type.

| Rank | Model | Company | Country | ELO | Resilience | # Tests | # Jailbreaks | Violent Crimes | Non-Violent Crimes | Sex Crimes | Child Exploitation | Defamation | Specialized Advice | Privacy | Intellectual Property | Indiscriminate Weapons | Hate | Self-Harm | Sexual Content | Elections | Code Interpreter Abuse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | gpt-oss-120b | OpenAI | 🇺🇸 | 951 | 99% | 5196 | 49 | 99% | 99% | 99% | 99% | 100% | 99% | 99% | 99% | 99% | 99% | 99% | 99% | 98% | 99% |
| 🥈 | gpt-oss-20b | OpenAI | 🇺🇸 | 903 | 98% | 5115 | 91 | 99% | 100% | 96% | 98% | 100% | 100% | 98% | 99% | 99% | 98% | 98% | 100% | 90% | 99% |
| 🥉 | kimi-k2.5 | Moonshot AI | 🇨🇳 | 890 | 96% | 1872 | 81 | 91% | 97% | 97% | 96% | 99% | 96% | 92% | 96% | 99% | 98% | 95% | 97% | 92% | 96% |
| 4 | qwen3-235b-a22b-instruct-2507 | Alibaba | 🇨🇳 | 836 | 94% | 5289 | 311 | 93% | 96% | 93% | 96% | 95% | 99% | 94% | 94% | 96% | 92% | 97% | 96% | 81% | 96% |
| 5 | minimax-m3 (new) | Minimax | 🇨🇳 | 807 | 100% | 21 | 0 | 0% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% |
| 6 | gemma-4-26b-a4b (new) | | 🇺🇸 | 804 | 100% | 9 | 0 | 0% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 0% | 0% | 100% | 0% | 0% | 0% |
| 7 | nemotron-3-ultra (new) | Nvidia | 🇺🇸 | 799 | 100% | 6 | 0 | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 100% | 0% | 100% | 100% | 100% | 0% |
| 8 | qwen3-32b | Alibaba | 🇨🇳 | 796 | 89% | 5409 | 583 | 91% | 90% | 84% | 90% | 88% | 94% | 83% | 89% | 96% | 90% | 96% | 95% | 72% | 90% |
| 9 | qwen3-8b (deprecated) | Alibaba | 🇨🇳 | 778 | 82% | 1728 | 303 | 84% | 90% | 70% | 73% | 81% | 92% | 70% | 86% | 91% | 79% | 91% | 97% | 71% | 76% |
| 10 | kimi-k2-instruct-0905 (deprecated) | Moonshot AI | 🇨🇳 | 725 | 75% | 2550 | 631 | 64% | 80% | 65% | 79% | 77% | 84% | 66% | 69% | 84% | 84% | 80% | 84% | 61% | 77% |
| 11 | mistral-small-3.2-24b-instruct-2506 | Mistral | 🇫🇷 | 652 | 76% | 5205 | 1274 | 66% | 83% | 71% | 80% | 72% | 86% | 71% | 76% | 87% | 86% | 83% | 74% | 48% | 77% |
| 12 | mistral-nemo-instruct-2407 (deprecated) | Mistral / Nvidia | 🇫🇷 | 644 | 73% | 5334 | 1433 | 66% | 72% | 70% | 79% | 68% | 77% | 70% | 72% | 78% | 87% | 85% | 69% | 59% | 71% |
Note: All statistics on this page are as judged by Qwen3-32B. Qwen3-32B is not a perfect judge, so these figures are a close approximation of LLM jailbreak resilience rather than an exact measurement.
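
For readers who want to sanity-check the table, the overall Resilience column appears consistent with the share of tests that did not end in a jailbreak, rounded to a whole percent. This is an inference from the published numbers, not a formula stated by the site, and the function name `resilience_pct` is a hypothetical helper. A minimal sketch under that assumption:

```python
def resilience_pct(tests: int, jailbreaks: int) -> int:
    """Percent of tests that did not end in a jailbreak, rounded to a whole number.

    Assumption: resilience = (tests - jailbreaks) / tests. This is inferred
    from the table values and is not documented by the leaderboard itself.
    """
    if tests <= 0:
        raise ValueError("tests must be positive")
    if not 0 <= jailbreaks <= tests:
        raise ValueError("jailbreaks must be between 0 and tests")
    return round(100 * (tests - jailbreaks) / tests)


# (# Tests, # Jailbreaks) copied from three rows of the table above.
rows = {
    "gpt-oss-120b": (5196, 49),
    "qwen3-32b": (5409, 583),
    "mistral-small-3.2-24b-instruct-2506": (5205, 1274),
}

for model, (tests, jailbreaks) in rows.items():
    print(f"{model}: {resilience_pct(tests, jailbreaks)}%")
```

For these three rows the sketch gives 99%, 89%, and 76%, which match the Resilience column. Rows with very few tests (6 to 21 for the models marked new) give percentages that move in large steps, so they are much less informative than rows with thousands of tests.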