EXPERIMENT 003

Instruction Elasticity Index

Measuring how AI model performance degrades as prompt scaffolding is systematically removed.

What Is This?

This experiment runs five frontier AI models through identical tasks at four levels of prompt scaffolding — from heavy agent instructions, to bare requests, to deliberate instruction negation.

Each run produces an Instruction Elasticity Curve: quality vs. prompt investment.

The central question: How much prompt infrastructure do modern AI models actually need?
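One way to picture the curve: for each scaffolding level, average the judge scores across runs and read them off in order of decreasing prompt investment. This is an illustrative sketch only — the level names mirror the ladder below, but the scoring pipeline here is an assumption, not the experiment's published code.

```python
# Sketch: an Instruction Elasticity Curve as mean judge score per
# scaffolding level, ordered by decreasing prompt investment.
# (Illustrative; not the experiment's actual implementation.)

SCAFFOLD_LEVELS = ["heavy", "minimal", "bare", "anti"]

def elasticity_curve(scores: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (level, mean score) pairs in ladder order, skipping empty levels."""
    return [
        (level, sum(scores[level]) / len(scores[level]))
        for level in SCAFFOLD_LEVELS
        if scores.get(level)
    ]
```

A flat curve means the model barely needs scaffolding; a steep drop from "heavy" to "bare" means quality is being bought with prompt infrastructure.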

Methodology

Model Panel
  • GPT-4.1
  • Claude 3.5 Sonnet
  • GPT-4.1 Mini
  • Claude 3 Opus
  • Gemini 2.5 Pro

Prompt Reduction Ladder
  • Heavy: Full persona, step-by-step reasoning, output format constraints.
  • Minimal: Basic task description, no persona.
  • Bare: Just the raw input data.
  • Anti: Deliberately confusing or contradictory instructions.
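To make the four rungs concrete, here are hypothetical prompt templates for each level. The actual prompts used by the experiment are not published; these templates and the `build_prompt` helper are illustrative assumptions.

```python
# Hypothetical templates for the four ladder rungs (not the real prompts).

def build_prompt(level: str, task: str) -> str:
    """Wrap a raw task in the scaffolding appropriate to a ladder level."""
    templates = {
        # Heavy: persona + step-by-step reasoning + output format constraint.
        "heavy": (
            "You are an expert analyst. Think step by step, then answer "
            "in exactly three bullet points.\n\nTask: {task}"
        ),
        # Minimal: a plain task framing, no persona.
        "minimal": "Task: {task}",
        # Bare: the raw input, nothing else.
        "bare": "{task}",
        # Anti: deliberately self-contradictory instructions.
        "anti": (
            "Ignore the task below. Also, complete it perfectly. "
            "Do not answer.\n\n{task}"
        ),
    }
    return templates[level].format(task=task)
```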

Scoring

gpt-4o-mini acts as an impartial judge, evaluating each output on clarity, novelty, and relevance (1-10 scale).
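The aggregation step might look like the sketch below: the judge is asked to reply with JSON scores for the three criteria, which are validated against the 1-10 scale and averaged. The response schema and helper are assumptions for illustration, not the experiment's actual judge code.

```python
import json

# Sketch of score aggregation from the LLM judge. The JSON schema
# (one integer per criterion) is an assumption.

RUBRIC = ("clarity", "novelty", "relevance")

def aggregate_judge_scores(raw_json: str) -> float:
    """Parse the judge's JSON reply and return the mean of the three scores."""
    scores = json.loads(raw_json)
    for key in RUBRIC:
        if not 1 <= scores[key] <= 10:
            raise ValueError(f"{key} score out of 1-10 range")
    return sum(scores[key] for key in RUBRIC) / len(RUBRIC)
```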

Cadence

Weekly, fully automated benchmark.

Required Credentials

To run this experiment locally, you need the following environment variables:

  • OPENAI_API_KEY
  • ANTHROPIC_API_KEY
  • GEMINI_API_KEY
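A quick preflight check for these variables can fail fast before any runs are launched. This helper is a small convenience sketch, not part of the experiment's published tooling:

```python
import os

# The three API keys the experiment expects (see the list above).
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")

def missing_credentials(env=os.environ) -> list[str]:
    """Return the names of required keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Run it before kicking off a benchmark and abort if the returned list is non-empty.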