intent-bench

Intent Fulfillment Benchmark for Coding Agents

Does providing structured intent to coding agents improve implementation effectiveness?

Control (prompt only) vs. Treatment (prompt + intent)

Treatment Comparison

Treatment Effect
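
One plausible reading of the treatment effect is the difference in pass rate between the treatment condition (prompt + intent) and the control condition (prompt only). The sketch below illustrates that arithmetic only. The `Run` record, its field names, and the pass/fail outcome are assumptions made for illustration; they are not intent-bench's actual schema or scoring method, and the sample data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark run (hypothetical record shape, not intent-bench's schema)."""
    condition: str  # "control" (prompt only) or "treatment" (prompt + intent)
    passed: bool    # assumed binary outcome for the task


def pass_rate(runs, condition):
    """Fraction of runs in the given condition that passed."""
    selected = [r for r in runs if r.condition == condition]
    if not selected:
        raise ValueError(f"no runs for condition {condition!r}")
    return sum(r.passed for r in selected) / len(selected)


def treatment_effect(runs):
    """Treatment pass rate minus control pass rate (positive favors intent)."""
    return pass_rate(runs, "treatment") - pass_rate(runs, "control")


# Made-up illustrative data, NOT benchmark results.
example_runs = [
    Run("control", True), Run("control", False),
    Run("control", False), Run("control", False),
    Run("treatment", True), Run("treatment", True),
    Run("treatment", False), Run("treatment", False),
]
effect = treatment_effect(example_runs)
```

A positive `effect` would mean the treatment condition passed more often than control on the same tasks. A real analysis would also need paired tasks and some measure of uncertainty, which this sketch leaves out.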

Results

Model × Experiment Grid

Configuration

Contribute

Run the benchmark with your agent, model, or treatment. Submit results via pull request.

Reproduce