Last week's recorded outcomes become this week's targets

The third deficiency becomes the input to the first.

Once outcomes are recorded - per change, per test, per agent - the data identifies the next thing worth fixing. Iteration is the loop that turns the recorded outcome from deficiency #3 into the next test selection in deficiency #1.

The problem - in Mark Walker's words
"We and every other software company in the world are outstripping our ability to test what we're building."

Why now: the velocity of agentic coding has decoupled from the velocity of testing, auditing, and validation - the knowledge and proof that AI agents actually did what they were asked to do. An AI agent can produce more code in a day than a team used to write in a sprint. The test, audit, and compliance layers did not get faster at the same rate. The gap is structural and widens with every model release.

Three deficiencies - in every company today - that no software addresses:

  • determining which tests need to run for a particular release
  • checking whether they ran
  • recording the outcome

Mark Walker, nue.io - meeting transcript [00:46:36]
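The three deficiencies above can be sketched as three functions in a release loop. This is a minimal illustration, not any company's implementation; the function names, the file-to-test map, and the ledger shape are all hypothetical.

```python
def select_tests(changed_files, test_map):
    """Deficiency 1: determine which tests need to run for this release.
    test_map is a hypothetical mapping from changed file to affected tests."""
    return sorted({t for f in changed_files for t in test_map.get(f, [])})

def check_ran(selected, executed):
    """Deficiency 2: check whether the selected tests actually ran.
    Returns the tests that were selected but never executed."""
    return [t for t in selected if t not in executed]

def record_outcomes(results, ledger):
    """Deficiency 3: record the outcome per test in an append-only ledger."""
    for test, passed in results.items():
        ledger.append({"test": test, "passed": passed})
    return ledger

# Illustrative run: one changed file, one test executed, one missed.
test_map = {"billing.py": ["test_invoice", "test_refund"]}
selected = select_tests(["billing.py"], test_map)       # what should run
missing = check_ran(selected, executed={"test_invoice"})  # what never ran
ledger = record_outcomes({"test_invoice": True}, ledger=[])
```

The gap `check_ran` surfaces - tests selected but never executed - is exactly the blind spot the quote describes.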

Recorded outcomes feed the next iteration

Once outcomes are recorded - per change, per test, per agent - the data identifies what to fix next. Slow query? The recorded latency points to the source. Flaky test? The flake history points to the cause. Drift in a metric? Continuous probes opened a tracked issue.
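The flake-history case can be made concrete. A minimal sketch, assuming the per-test outcome records from the ledger above; the record shape and function name are hypothetical:

```python
from collections import defaultdict

def flakiest(outcomes):
    """Rank tests by failure rate across recorded runs, worst first.
    The recorded history, not intuition, points to the next fix."""
    runs, fails = defaultdict(int), defaultdict(int)
    for rec in outcomes:
        runs[rec["test"]] += 1
        fails[rec["test"]] += 0 if rec["passed"] else 1
    return sorted(runs, key=lambda t: fails[t] / runs[t], reverse=True)

# Illustrative history: test_checkout failed half its runs.
history = [
    {"test": "test_checkout", "passed": False},
    {"test": "test_checkout", "passed": True},
    {"test": "test_login", "passed": True},
]
worst = flakiest(history)[0]
```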

Iteration is the loop that turns deficiency #3 (the recorded outcome) into the input to deficiency #1 (test selection for the improvement).

Probes that open tracked tasks

Probes that run today: test pass/fail rate, test flake rate, dependency drift. Cyclomatic complexity, bundle size, p95 latency, and model-output drift probes are part of the OBSERVE phase of the Universal Quality Development Harness per ADR-320. A probe that exceeds its threshold opens an iteration task with the metric, the threshold breached, and the suggested scope.
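The threshold-to-task step can be sketched in a few lines. This assumes a hypothetical task shape; the actual fields an iteration task carries are whatever ADR-320 specifies.

```python
def check_probe(metric, value, threshold, scope):
    """Open an iteration task when a probe exceeds its threshold.
    Returns None while the metric is within bounds."""
    if value <= threshold:
        return None
    return {
        "metric": metric,            # which probe fired
        "value": value,              # observed reading
        "threshold": threshold,      # the bound that was breached
        "suggested_scope": scope,    # where the fix likely lives
    }

# Illustrative: an 8% flake rate against a 2% threshold opens a task.
task = check_probe("test_flake_rate", 0.08, 0.02, "checkout test suite")
```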

An engineer or agent picks up the iteration task and runs it through the standard pipeline: SDD if it crosses components, test selection, execution, recorded outcome.
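The standard pipeline the task flows through can be shown as a small composition. A sketch only: the stage callables here are stubs, and the SDD step is omitted on the assumption of a single-component change.

```python
def run_iteration_task(task, select, execute, record):
    """Run an iteration task through the standard pipeline:
    test selection, execution, recorded outcome."""
    tests = select(task["suggested_scope"])   # test selection
    results = execute(tests)                  # execution
    return record(task, results)              # recorded outcome

# Stub stages for illustration; real stages are the team's own tooling.
outcome = run_iteration_task(
    {"metric": "p95_latency_ms", "suggested_scope": "checkout"},
    select=lambda scope: [f"test_{scope}_latency"],
    execute=lambda tests: {t: True for t in tests},
    record=lambda task, results: {"task": task["metric"], "results": results},
)
```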

Before-and-after, signed

Each iteration produces a before-and-after metric pair signed into the audit trail. The improvement claim is verifiable; reviewers see the magnitude, not just the direction.
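One way to make the before-and-after pair tamper-evident is an HMAC over the canonicalized pair, which any holder of the key can recompute. A minimal sketch, assuming an HMAC-based trail; the actual audit-trail signing scheme may differ.

```python
import hashlib
import hmac
import json

def sign_iteration(before, after, key):
    """Sign a before/after metric pair for the audit trail.
    sort_keys canonicalizes the JSON so verification is deterministic."""
    payload = json.dumps({"before": before, "after": after}, sort_keys=True)
    sig = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": sig}

def verify_iteration(entry, key):
    """Recompute the HMAC; a reviewer checks magnitude AND authenticity."""
    expected = hmac.new(key, entry["payload"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["signature"])

# Illustrative: p95 latency improved from 480 ms to 310 ms.
entry = sign_iteration({"p95_ms": 480}, {"p95_ms": 310}, key=b"audit-key")
```

Because the payload carries both numbers, the reviewer sees the magnitude of the improvement, and the signature makes the claim checkable after the fact.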

The cumulative iteration log is the engineering health record over time. It defends 'we are getting better' with numbers a regulator can examine.