Dashboard

Last run: 11/29/2025, 4:49:26 PM

Recent Test Runs

OpenAI

GPT-5

openai-gpt5

100%
Best scenario: OS/NS
Worst scenario: Seq/S
vs previous: 0.0%

GPT-4o

openai-gpt4o

100%
Best scenario: OS/NS
Worst scenario: Seq/S
vs previous: 0.0%

Anthropic

Claude Sonnet 4.5

anthropic-sonnet

100%
Best scenario: OS/S
Worst scenario: Seq/NS
vs previous: 0.0%

Claude Opus 4.5

anthropic-opus

100%
Best scenario: OS/S
Worst scenario: Seq/NS
vs previous: 0.0%

Google

Gemini 2.5 Flash

google-flash

100%
Best scenario: OS/NS
Worst scenario: Seq/NS
vs previous: 0.0%

Gemini 3 Pro

google-pro

100%
Best scenario: OS/S
Worst scenario: Seq/NS
vs previous: 0.0%

Groq

GPT-OSS 120B

groq-gpt-oss-120b

100%
Best scenario: OS/NS
Worst scenario: Seq/NS
vs previous: 0.0%

Kimi K2

groq-kimi-k2

100%
Best scenario: OS/NS
Worst scenario: Seq/NS
vs previous: 0.0%

Llama 3.3 70B

groq-llama-3.3-70b

100%
Best scenario: OS/NS
Worst scenario: Seq/NS
vs previous: 0.0%

OpenRouter

Qwen3 235B

openrouter-qwen3-235b

100%
Best scenario: OS/NS
Worst scenario: Seq/NS
vs previous: 0.0%