Recent Test Runs
OpenAI
GPT-5
openai-gpt5
Best scenario: OS/NS
Worst scenario: Seq/S
vs previous: 0.0%
GPT-4o
openai-gpt4o
Best scenario: OS/NS
Worst scenario: Seq/S
vs previous: 0.0%
Anthropic
Claude Sonnet 4.5
anthropic-sonnet
Best scenario: OS/S
Worst scenario: Seq/NS
vs previous: 0.0%
Claude Opus 4.5
anthropic-opus
Best scenario: OS/S
Worst scenario: Seq/NS
vs previous: 0.0%
Google
Gemini 2.5 Flash
google-flash
Best scenario: OS/NS
Worst scenario: Seq/NS
vs previous: 0.0%
Gemini 3 Pro
google-pro
Best scenario: OS/S
Worst scenario: Seq/NS
vs previous: 0.0%
Groq
GPT-OSS 120B
groq-gpt-oss-120b
Best scenario: OS/NS
Worst scenario: Seq/NS
vs previous: 0.0%
Kimi K2
groq-kimi-k2
Best scenario: OS/NS
Worst scenario: Seq/NS
vs previous: 0.0%
Llama 3.3 70B
groq-llama-3.3-70b
Best scenario: OS/NS
Worst scenario: Seq/NS
vs previous: 0.0%
OpenRouter
Qwen3 235B
openrouter-qwen3-235b
Best scenario: OS/NS
Worst scenario: Seq/NS
vs previous: 0.0%