Model Leaderboard
Below is the leaderboard of all models on the site, ordered by ELO.
The first eight columns (Rank through # Jailbreaks) are the main stats; the remaining columns give resilience by request type.

| Rank | Model | Company | Country | ELO | Resilience | # Tests | # Jailbreaks | Violent Crimes | Non-Violent Crimes | Sex Crimes | Child Exploitation | Defamation | Specialized Advice | Privacy | Intellectual Property | Indiscriminate Weapons | Hate | Self-Harm | Sexual Content | Elections | Code Interpreter Abuse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | gpt-oss-20b | OpenAI | 🇺🇸 | 896 | 98% | 285 | 5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 95% | 100% | 94% | 100% | 100% | 86% | 100% |
| 🥈 | gpt-oss-120b | OpenAI | 🇺🇸 | 888 | 100% | 276 | 1 | 100% | 100% | 100% | 100% | 100% | 100% | 94% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 🥉 | qwen3-235b-a22b-instruct-2507 | Alibaba | 🇨🇳 | 858 | 93% | 267 | 18 | 93% | 100% | 96% | 100% | 100% | 95% | 100% | 92% | 83% | 94% | 95% | 85% | 77% | 95% |
| 4 | qwen3-32b | Alibaba | 🇨🇳 | 818 | 84% | 252 | 40 | 90% | 71% | 88% | 94% | 85% | 88% | 78% | 78% | 94% | 89% | 94% | 90% | 63% | 83% |
| 5 | kimi-k2-instruct-0905 | Moonshot AI | 🇨🇳 | 744 | 72% | 318 | 88 | 61% | 71% | 75% | 80% | 65% | 88% | 54% | 65% | 77% | 95% | 94% | 82% | 58% | 56% |
| 6 | mistral-small-3.2-24b-instruct-2506 | Mistral | 🇫🇷 | 707 | 59% | 270 | 112 | 42% | 63% | 59% | 81% | 45% | 53% | 47% | 55% | 85% | 77% | 83% | 52% | 38% | 45% |
| 7 | mistral-nemo-instruct-2407 | Mistral / Nvidia | 🇫🇷 | 688 | 62% | 309 | 118 | 52% | 46% | 56% | 81% | 71% | 53% | 72% | 62% | 64% | 81% | 77% | 35% | 70% | 40% |
Note: All statistics on this page are as judged by Qwen3-32B. Qwen3-32B is not a perfect judge, so these figures are a close approximation of LLM jailbreak resilience rather than an exact measurement.
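
The page does not say how the overall Resilience column is computed. The published figures are, however, consistent with the share of tests that did not end in a jailbreak, rounded to a whole percent. The Python sketch below checks that assumed definition against the table; `resilience_pct` and `ROWS` are hypothetical names introduced for illustration, and the numbers are copied from the # Tests, # Jailbreaks, and Resilience columns.

```python
# (model, tests, jailbreaks, published resilience %) copied from the leaderboard table.
ROWS = [
    ("gpt-oss-20b", 285, 5, 98),
    ("gpt-oss-120b", 276, 1, 100),
    ("qwen3-235b-a22b-instruct-2507", 267, 18, 93),
    ("qwen3-32b", 252, 40, 84),
    ("kimi-k2-instruct-0905", 318, 88, 72),
    ("mistral-small-3.2-24b-instruct-2506", 270, 112, 59),
    ("mistral-nemo-instruct-2407", 309, 118, 62),
]


def resilience_pct(tests: int, jailbreaks: int) -> int:
    """Assumed definition (not stated by the site): percentage of tests
    that did not result in a jailbreak, rounded to the nearest integer."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    if not 0 <= jailbreaks <= tests:
        raise ValueError("jailbreaks must be between 0 and tests")
    return round(100 * (tests - jailbreaks) / tests)


if __name__ == "__main__":
    for model, tests, jailbreaks, published in ROWS:
        computed = resilience_pct(tests, jailbreaks)
        status = "matches" if computed == published else "differs"
        print(f"{model}: computed {computed}%, published {published}% ({status})")
```

Under this assumption, gpt-oss-120b shows 100% despite one recorded jailbreak because 275 of 276 rounds up to 100%. The per-category percentages cannot be checked this way, since the page does not publish per-category test counts.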