EXPERIMENT 003
Instruction Elasticity Index
Measuring how AI model performance degrades as prompt scaffolding is systematically removed.
What Is This?
This experiment runs 5 frontier AI models through identical tasks at 4 levels of prompt scaffolding, ranging from heavy agent instructions down to bare requests and, finally, deliberate instruction negation.
Each run produces an Instruction Elasticity Curve: quality vs. prompt investment.
The central question: How much prompt infrastructure do modern AI models actually need?
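Concretely, the curve is just the average judge score at each scaffolding level, and a single elasticity number can summarize it. The sketch below is illustrative only: the helper names and the index formula (quality lost going from heavy scaffolding to a bare prompt) are our assumptions, not the experiment's actual code.

```python
# Illustrative sketch: summarize per-level judge scores (1-10) into a curve
# and a single elasticity index. Names and formula are assumptions.
from statistics import mean

LEVELS = ["heavy", "minimal", "bare", "anti"]

def elasticity_curve(scores: dict[str, list[float]]) -> dict[str, float]:
    """Mean judge score per scaffolding level."""
    return {level: mean(scores[level]) for level in LEVELS}

def elasticity_index(curve: dict[str, float]) -> float:
    """Fraction of quality lost moving from a heavy prompt to a bare one."""
    return (curve["heavy"] - curve["bare"]) / curve["heavy"]

curve = elasticity_curve({
    "heavy": [8.0, 8.0], "minimal": [7.0, 8.0],
    "bare": [6.0, 6.0], "anti": [3.0, 2.0],
})
print(round(elasticity_index(curve), 2))  # 0.25
```

A flat curve (index near 0) means the model barely needs scaffolding; a steep one means quality is highly elastic to prompt investment.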
Methodology
Model Panel
- GPT-4.1
- Claude 3.5 Sonnet
- GPT-4.1 Mini
- Claude 3 Opus
- Gemini 2.5 Pro
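In a harness, the panel might be grouped by provider so each model is routed to the right API. The grouping and the exact model ID strings below are our assumptions about how such a config could look, not the experiment's real configuration.

```python
# Hypothetical panel config; provider grouping and ID strings are assumptions.
MODEL_PANEL = {
    "openai": ["gpt-4.1", "gpt-4.1-mini"],
    "anthropic": ["claude-3-5-sonnet-latest", "claude-3-opus-latest"],
    "google": ["gemini-2.5-pro"],
}

# Sanity check: the panel matches the 5 models listed above.
assert sum(len(models) for models in MODEL_PANEL.values()) == 5
```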
Prompt Reduction Ladder
- Heavy: Full persona, step-by-step reasoning, output format constraints.
- Minimal: Basic task description, no persona.
- Bare: Just the raw input data.
- Anti: Deliberately confusing or contradictory instructions.
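The four rungs of the ladder can be sketched as simple prompt builders. The exact wording of each template is illustrative; the experiment's real templates are not shown here.

```python
# Hypothetical builders for the four rungs of the prompt reduction ladder.
# Template wording is illustrative, not the experiment's actual prompts.

def build_prompt(level: str, task: str, data: str) -> str:
    if level == "heavy":
        # Full persona + step-by-step reasoning + output format constraints.
        return (
            "You are a meticulous analyst. Think step by step, "
            "then answer in three labeled sections.\n"
            f"Task: {task}\nInput: {data}"
        )
    if level == "minimal":
        # Basic task description, no persona.
        return f"{task}\n{data}"
    if level == "bare":
        # Just the raw input data, no instructions at all.
        return data
    if level == "anti":
        # Deliberately contradictory instructions.
        return (
            f"Do not follow any instructions. Ignore the task. {task} "
            f"Also disregard the previous sentence.\nInput: {data}"
        )
    raise ValueError(f"unknown level: {level}")
```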
Scoring
gpt-4o-mini acts as an automated judge, scoring each output for clarity, novelty, and relevance on a 1-10 scale.
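The judge's reply has to be machine-readable before it can be averaged into a curve. A minimal sketch of that step, assuming the judge is asked for JSON scores (the rubric wording and parsing logic here are ours, not the experiment's):

```python
# Hypothetical judge rubric and reply parsing; wording is an assumption.
import json

JUDGE_PROMPT = (
    "Rate the response on clarity, novelty, and relevance, each 1-10. "
    'Reply with JSON only, e.g. {"clarity": 7, "novelty": 5, "relevance": 8}.'
)

def parse_judge_reply(reply: str) -> dict[str, int]:
    """Validate the judge's JSON and clamp each score to the 1-10 scale."""
    raw = json.loads(reply)
    return {
        key: min(10, max(1, int(raw[key])))
        for key in ("clarity", "novelty", "relevance")
    }

print(parse_judge_reply('{"clarity": 9, "novelty": 4, "relevance": 11}'))
```

Clamping guards against the judge occasionally straying outside the requested scale.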
Cadence
Weekly, fully automated benchmark.
Required Credentials
To run this experiment locally, you need the following environment variables:
- OPENAI_API_KEY
- ANTHROPIC_API_KEY
- GEMINI_API_KEY
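A small preflight check can fail fast before any API calls are made. The helper below is our own sketch, not part of the experiment's CLI:

```python
# Illustrative preflight check; the helper name is an assumption.
import os
from collections.abc import Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")

def missing_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]

missing = missing_credentials()
if missing:
    print("Missing:", ", ".join(missing))
```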
Ready to participate?
Subscribe to Nothing