Model Arena

Compare responses, cost, and latency across multiple AI models side-by-side.

Configure

Fill in the inputs below

How to use: Enter the 'Test Prompt' you want to benchmark and select at least 2 models. The results show each model's response side by side, with an analysis of its strengths and weaknesses.

Different models have different capabilities and costs.

This prompt will be sent to multiple models for comparison.

openai
anthropic
groq
google
Estimated Cost
$0.0600 / $5.00
~1 input token (gpt-4o)

Using the lab's shared budget: each request is capped at about $0.20 and there is a global daily limit of $5.00.
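The two limits in the budget note can be read as a simple pre-flight check. The sketch below is only an illustration of that reading, not the lab's actual implementation: the function name `can_run` and the `spent_today` parameter are hypothetical, and the cap values are taken from the note ("about $0.20" is treated as exactly 0.20).

```python
from decimal import Decimal

# Values taken from the budget note; the names are hypothetical.
PER_REQUEST_CAP = Decimal("0.20")  # per-request cap ("about $0.20")
DAILY_LIMIT = Decimal("5.00")      # global daily limit


def can_run(estimated_cost: Decimal, spent_today: Decimal) -> bool:
    """Return True if a request fits both the per-request cap and the daily limit."""
    if estimated_cost > PER_REQUEST_CAP:
        return False
    return spent_today + estimated_cost <= DAILY_LIMIT
```

Under these assumptions, a request estimated at $0.06 passes when nothing has been spent yet, but the same request is rejected once $4.95 of the daily budget is already used.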

Ctrl+Enter to submit. Esc to close dialogs.

No result yet

Fill in the form on the left and click "Analyze" to get started.