EXPERIMENT 003
Instruction Elasticity Index
Measuring how AI model performance degrades as prompt scaffolding is systematically removed.
What Is This?
This experiment runs 5 frontier AI models through identical tasks at 4 levels of prompt scaffolding, ranging from heavy agent instructions down to bare requests and, finally, deliberate instruction negation.
Each run produces an Instruction Elasticity Curve: quality vs. prompt investment.
The central question: How much prompt infrastructure do modern AI models actually need?
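Concretely, the curve is just the average judge score at each scaffolding level, and a single elasticity number can summarize it. The sketch below is illustrative only: the helper names and the index formula (quality lost going from heavy scaffolding to a bare prompt) are our assumptions, not the experiment's actual code.

```python
# Illustrative sketch: summarize per-level judge scores (1-10) into a curve
# and a single elasticity index. Names and formula are assumptions.
from statistics import mean

LEVELS = ["heavy", "minimal", "bare", "anti"]

def elasticity_curve(scores: dict[str, list[float]]) -> dict[str, float]:
    """Mean judge score per scaffolding level."""
    return {level: mean(scores[level]) for level in LEVELS}

def elasticity_index(curve: dict[str, float]) -> float:
    """Fraction of quality lost moving from a heavy prompt to a bare one."""
    return (curve["heavy"] - curve["bare"]) / curve["heavy"]

curve = elasticity_curve({
    "heavy": [8.0, 8.0], "minimal": [7.0, 8.0],
    "bare": [6.0, 6.0], "anti": [3.0, 2.0],
})
print(round(elasticity_index(curve), 2))  # 0.25
```

A flat curve (index near 0) means the model barely needs scaffolding; a steep one means quality is highly elastic to prompt investment.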
Methodology
Model Panel
- GPT-4.1
- Claude 3.5 Sonnet
- GPT-4.1 Mini
- Claude 3 Opus
- Gemini 2.5 Pro
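In a harness, the panel might be grouped by provider so each model is routed to the right API. The grouping and the exact model ID strings below are our assumptions about how such a config could look, not the experiment's real configuration.

```python
# Hypothetical panel config; provider grouping and ID strings are assumptions.
MODEL_PANEL = {
    "openai": ["gpt-4.1", "gpt-4.1-mini"],
    "anthropic": ["claude-3-5-sonnet-latest", "claude-3-opus-latest"],
    "google": ["gemini-2.5-pro"],
}

# Sanity check: the panel matches the 5 models listed above.
assert sum(len(models) for models in MODEL_PANEL.values()) == 5
```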
Prompt Reduction Ladder
- Heavy: Full persona, step-by-step reasoning, output format constraints.
- Minimal: Basic task description, no persona.
- Bare: Just the raw input data.
- Anti: Deliberately confusing or contradictory instructions.
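The four rungs of the ladder can be sketched as simple prompt builders. The exact wording of each template is illustrative; the experiment's real templates are not shown here.

```python
# Hypothetical builders for the four rungs of the prompt reduction ladder.
# Template wording is illustrative, not the experiment's actual prompts.

def build_prompt(level: str, task: str, data: str) -> str:
    if level == "heavy":
        # Full persona + step-by-step reasoning + output format constraints.
        return (
            "You are a meticulous analyst. Think step by step, "
            "then answer in three labeled sections.\n"
            f"Task: {task}\nInput: {data}"
        )
    if level == "minimal":
        # Basic task description, no persona.
        return f"{task}\n{data}"
    if level == "bare":
        # Just the raw input data, no instructions at all.
        return data
    if level == "anti":
        # Deliberately contradictory instructions.
        return (
            f"Do not follow any instructions. Ignore the task. {task} "
            f"Also disregard the previous sentence.\nInput: {data}"
        )
    raise ValueError(f"unknown level: {level}")
```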
Scoring
gpt-4o-mini acts as an automated judge, scoring each output for clarity, novelty, and relevance on a 1-10 scale.
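The judge's reply has to be machine-readable before it can be averaged into a curve. A minimal sketch of that step, assuming the judge is asked for JSON scores (the rubric wording and parsing logic here are ours, not the experiment's):

```python
# Hypothetical judge rubric and reply parsing; wording is an assumption.
import json

JUDGE_PROMPT = (
    "Rate the response on clarity, novelty, and relevance, each 1-10. "
    'Reply with JSON only, e.g. {"clarity": 7, "novelty": 5, "relevance": 8}.'
)

def parse_judge_reply(reply: str) -> dict[str, int]:
    """Validate the judge's JSON and clamp each score to the 1-10 scale."""
    raw = json.loads(reply)
    return {
        key: min(10, max(1, int(raw[key])))
        for key in ("clarity", "novelty", "relevance")
    }

print(parse_judge_reply('{"clarity": 9, "novelty": 4, "relevance": 11}'))
```

Clamping guards against the judge occasionally straying outside the requested scale.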
Cadence
Weekly, fully automated benchmark.
Required Credentials
To run this experiment locally, you need the following environment variables:
- OPENAI_API_KEY
- ANTHROPIC_API_KEY
- GEMINI_API_KEY
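A small preflight check can fail fast before any API calls are made. The helper below is our own sketch, not part of the experiment's CLI:

```python
# Illustrative preflight check; the helper name is an assumption.
import os
from collections.abc import Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")

def missing_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]

missing = missing_credentials()
if missing:
    print("Missing:", ", ".join(missing))
```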
Ready to participate?
Subscribe to Nothing