
Context-Behavior-Constraint: Acceptance Criteria That Actually Work


A colleague introduced me to Gherkin (Given/When/Then) and I liked the idea but hated the execution. Too verbose. No place for edge cases. Mixes setup with inputs.

So I made my own format. I call it Context-Behavior-Constraint (CBC).


The Problem with Common Formats

| Format | What's Wrong |
|--------------|-----------|
| Prose | Ambiguous, buries edge cases, hard for LLMs to parse |
| Gherkin | Verbose, no constraint section, "should" language |
| Checklists | No structure, can't tell setup from outcome |
| User stories | "As a... I want..." is motivation, not behavior |

The insight: tests have a natural structure — Arrange-Act-Assert. Acceptance criteria should mirror that.
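
If Arrange-Act-Assert is new to you, here's the shape in miniature. This is a toy example of mine, unrelated to the blend feature later in this post:

class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount


def test_deposit_increases_balance():
    account = Account(balance=0)   # arrange: the state before the behavior
    account.deposit(50)            # act: the single trigger
    assert account.balance == 50   # assert: the expected outcome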

The Format

**context**
- actor: who/what triggers this behavior
- preconditions: state that must exist before (test fixtures)
- inputs: function arguments or request parameters

**behavior**
- when: the trigger action (function call)
- then: primary expected outcome
- and: additional outcomes (each becomes an assertion)

**constraints**
- edge case: boundary condition → expected handling
- non-goal: explicitly out of scope for this story

That's it. Every field maps directly to test code.

Field → Test Code Mapping

| Field | Purpose | Test Code |
|-------|---------|-----------|
| actor | Who performs the action | Test class name |
| preconditions | State before test | setUp(), fixtures |
| inputs | Values passed in | Function arguments |
| when | The action | Function call |
| then | Primary assertion | assert |
| and | Additional assertions | More asserts |
| edge case | Boundary conditions | Separate test methods |
| non-goal | Scope boundaries | What NOT to test |

Real Example

Here's an actual AC from a blend scoring feature:

### 1. Direct Signal Match

**context**
- actor: blend service
- preconditions: dan has SIGNALS edge to franklins with strength 90
- inputs: collection_id="austin_bbq", andee_id="dan"

**behavior**
- when: blend_collection(collection_id, andee_id)
- then: franklins.score = 90
- and: franklins.score_breakdown.direct_signal = 90
- and: franklins.reasons includes "You love this (90)"

**constraints**
- edge case: direct signal is negative → item ranks lowest
- non-goal: blending items dan created (always show those)

And the test it generates:

import pytest


class TestBlendDirectSignalMatch:
    """AC 1: Direct Signal Match"""

    @pytest.fixture
    def setup_dan_signals(self, neo4j_service):
        """preconditions: dan has SIGNALS edge to franklins"""
        neo4j_service.create_signals_edge(
            andee_id="dan",
            content_id="franklins",
            strength=90
        )

    def test_direct_signal_match(self, setup_dan_signals, blend_service):
        """when: blend_collection → then: franklins.score = 90"""
        # inputs
        collection_id = "austin_bbq"
        andee_id = "dan"

        # when
        result = blend_service.blend_collection(collection_id, andee_id)

        # then
        assert result["franklins"].score == 90

        # and
        assert result["franklins"].score_breakdown.direct_signal == 90
        assert "You love this (90)" in result["franklins"].reasons

    def test_negative_signal_ranks_lowest(self, neo4j_service, blend_service):
        """edge case: direct signal is negative → item ranks lowest"""
        neo4j_service.create_signals_edge("dan", "bad_place", strength=-80)

        result = blend_service.blend_collection("austin_bbq", "dan")

        assert result["bad_place"].score == min(r.score for r in result.values())

Notice how the test basically writes itself. Each AC field has a home in the test.

Why Constraints Matter

The constraints section is the secret weapon. It captures two things:

Edge cases — boundary conditions that need their own tests. Without this, edge cases get buried in prose or forgotten entirely.
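
In pytest, each edge case line becomes its own test, or a parametrized one when several boundary values share a behavior. Here's a sketch reusing the fixtures from the example above; parametrizing over strengths is my embellishment, not part of the original AC:

import pytest

@pytest.mark.parametrize("strength", [-1, -50, -100])
def test_negative_signal_ranks_lowest(strength, neo4j_service, blend_service):
    """edge case: direct signal is negative → item ranks lowest"""
    # neo4j_service and blend_service are the project fixtures from the example above
    neo4j_service.create_signals_edge("dan", "bad_place", strength=strength)

    result = blend_service.blend_collection("austin_bbq", "dan")

    assert result["bad_place"].score == min(r.score for r in result.values())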

Non-goals — what's explicitly out of scope. This prevents scope creep. When someone asks "should we also handle X?" you can point to the non-goal and say "not in this story."
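
Non-goals can live in the test file too, so the scope boundary survives code review. A hypothetical convention, extending the example above:

class TestBlendDirectSignalMatch:
    """AC 1: Direct Signal Match

    non-goal (no tests belong here): blending items dan created,
    because those are always shown, never scored.
    """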

Why This Works for AI Coding

When I'm coding with Claude, the CBC format makes AC trivially parseable:

context = parse_section("context")        # pseudocode: grab a labeled section
preconditions = context["preconditions"]  # direct access, no inference

With prose, Claude has to infer: "Before the test runs, we need to... which means..."

Inference = drift. Explicit labels = precision.
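
To make "trivially parseable" concrete, here's a toy parser. parse_cbc is an illustration of the idea, not tooling from my actual workflow:

import re

def parse_cbc(text):
    """Split a CBC block into {section: {label: [values]}}."""
    sections = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"\*\*(\w+)\*\*", line.strip())
        if header:
            current = sections.setdefault(header.group(1), {})
        elif current is not None and line.strip().startswith("- "):
            label, _, value = line.strip()[2:].partition(":")
            current.setdefault(label.strip(), []).append(value.strip())
    return sections

ac = parse_cbc("""\
**context**
- actor: blend service
- inputs: collection_id="austin_bbq", andee_id="dan"
""")
assert ac["context"]["actor"] == ["blend service"]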

This ties into my test-first enforcement — plans must include test files, and CBC makes those tests obvious to generate.

Why Not Gherkin?

I tried Gherkin. Here's the same behavior:

Given dan has a SIGNALS edge to franklins with strength 90
And the collection "austin_bbq" contains franklins
When I call blend_collection with collection_id="austin_bbq" and andee_id="dan"
Then franklins.score should be 90
And franklins.score_breakdown.direct_signal should be 90
And franklins.reasons should include "You love this (90)"

Problems:

- Every line needs a Given/When/Then prefix (verbose)
- No separation between preconditions and inputs
- No edge case section
- "should be" is ambiguous (MUST vs SHOULD?)
- Where do non-goals go?

Gherkin is fine for documentation. It's not great for test generation.

Template

Here's what I use for stories:

# X.Y Story Title

> **Priority:** P0/P1/P2

## Story

As [actor], I [action] so that [outcome].

## Why

| Without This | With This |
|--------------|-----------|
| Problem 1 | Solution 1 |

---

## Acceptance Criteria

### 1. First Behavior

**context**
- actor: ...
- preconditions: ...
- inputs: ...

**behavior**
- when: ...
- then: ...
- and: ...

**constraints**
- edge case: ...
- non-goal: ...

---

### 2. Second Behavior

(repeat)

---

## Definition of Done

- [ ] Tests cover all acceptance criteria
- [ ] Edge cases have dedicated tests

When to Use This

Use CBC for:

- User stories that need tests
- Bug reports with reproduction steps
- Any behavior spec an LLM will implement

Don't use CBC for:

- Design exploration (use prose)
- API docs (use OpenAPI)
- Architecture decisions (use ADRs)

The Tradeoff

CBC is more rigid than prose. Simple stories still need the full structure. But the payoff is:

- Claude can generate tests directly from AC
- Reviewers can scan quickly (labels as anchors)
- Edge cases are explicit, not buried
- Non-goals prevent "while we're at it..." creep

For my AI coding workflow, that tradeoff is worth it every time.


Thanks to a colleague who introduced me to Gherkin. The frustration led to something better.