
Model Arena

Compare responses, cost, and latency across multiple AI models side-by-side.

Building

Configure

Fill in the inputs below

How to use: Enter a 'Test Prompt' you want to benchmark. Select at least 2 models. You will see side-by-side responses with analysis of strengths and weaknesses for each model.

AI Model

Different models have different capabilities and costs.

Test Prompt

This prompt will be sent to multiple models for comparison.

Models to Compare

openai

gpt-4o

anthropic

claude-opus-4-6
claude-sonnet-4-6
claude-haiku-4-5
claude-sonnet-4-5

groq

llama-3.3-70b-versatile
llama-3.1-8b-instant
llama-4-scout-17b-16e-instruct
llama-4-maverick-17b-128e-instruct
openai/gpt-oss-120b
openai/gpt-oss-20b
qwen/qwen3-32b
groq/compound

google

gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-3-pro-preview
gemini-3-flash-preview
gemini-2.0-flash
gemini-2.0-flash-lite

Estimated Cost

$0.0600 / $5.00

~1 input tokens (gpt-4o)

Using the lab's shared budget: each request is capped at about $0.20 and there is a global daily limit of $5.00.
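
The two limits above can be read as a simple pre-flight check. The following is a minimal sketch only, assuming the cap is applied to the estimated cost before a request is sent and that spend is tracked per day; the names `check_budget`, `BudgetDecision`, and both constants are hypothetical and are not taken from the tool itself.

```python
from dataclasses import dataclass

# Values taken from the budget note above; the names are illustrative.
PER_REQUEST_CAP_USD = 0.20  # "about $0.20" per request
DAILY_LIMIT_USD = 5.00      # global daily limit


@dataclass
class BudgetDecision:
    allowed: bool
    reason: str


def check_budget(estimated_cost_usd: float, spent_today_usd: float) -> BudgetDecision:
    """Decide whether a request fits both the per-request cap and the daily limit."""
    # Reject a single request whose estimate is above the per-request cap.
    if estimated_cost_usd > PER_REQUEST_CAP_USD:
        return BudgetDecision(False, "exceeds per-request cap")
    # Reject a request that would push the day's total past the shared limit.
    if spent_today_usd + estimated_cost_usd > DAILY_LIMIT_USD:
        return BudgetDecision(False, "exceeds daily limit")
    return BudgetDecision(True, "ok")
```

Under these assumptions, the $0.0600 estimate shown above would pass on a day with no prior spend, while the same request would be refused once the shared spend is close enough to $5.00.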

Ctrl+Enter to submit, Esc to close dialogs


No result yet

Fill in the form on the left and click "Analyze" to get started.