AI Unit Test Generation That Survives Refactors
Unit tests are supposed to make change safe. Yet many teams experience the opposite: refactors become painful because tests break for reasons unrelated to behavior. The suite becomes a second codebase, brittle and expensive, and the team starts treating tests like obstacles instead of protection.
The difference is not whether you write unit tests. The difference is what your tests attach to.
Refactor-resistant unit tests attach to contracts: observable behavior, invariants, and public interfaces. Brittle unit tests attach to implementation details: private methods, internal data layouts, incidental ordering, and temporary variables.
AI can speed up the writing, but correctness comes from how you define the contract and how you choose your assertions.
The contract-first mindset
Before generating any tests, write down what must remain true even if the internal design changes.
A contract can be:
- An input-to-output mapping for a pure function.
- Validation rules: what inputs are rejected and why.
- Invariants: properties that always hold.
- Error behavior: specific exceptions or error results.
- Side effects at an interface boundary: calls made, events emitted, data stored.
If a test does not express one of these, it is likely testing the implementation, not the contract.
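As a sketch of the distinction, suppose a hypothetical `normalize_email` function whose contract is: trim whitespace, lowercase, and reject anything without an `@`. The tests below assert only that contract, so they survive any internal rewrite (the function body here is just a stand-in):

```python
def normalize_email(raw: str) -> str:
    # Hypothetical function under test; the body is illustrative.
    cleaned = raw.strip().lower()
    if "@" not in cleaned:
        raise ValueError("not an email address")
    return cleaned

# Contract: input-to-output mapping.
assert normalize_email("  Alice@Example.COM ") == "alice@example.com"

# Contract: error behavior for invalid input.
try:
    normalize_email("not-an-email")
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for input without '@'")
```

Neither assertion mentions how the function cleans the string, so replacing the implementation with a regex or a parser library would not break the tests.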
A practical taxonomy of unit tests
Different test styles survive refactors at different rates.
| Test style | What it asserts | Refactor resilience | When it shines |
|---|---|---|---|
| Contract examples | specific input-output examples | High | stable business rules and parsing |
| Property checks | invariants across many inputs | High | transformations and math-like logic |
| State transitions | before and after conditions | Medium to high | reducers and domain models |
| Interaction checks | calls made to collaborators | Medium | orchestration where interaction is the contract |
| Snapshot or golden master | output matches stored baseline | Medium | stabilizing legacy behavior, with care |
| Internal structure checks | private fields or orderings | Low | almost always a trap |
The goal is not to avoid interaction checks entirely. The goal is to use them where the interaction is part of the contract, not where it is a convenience of the current design.
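A minimal sketch of an interaction check done at the right level, using the standard library's `unittest.mock`. The orchestration function and the `mailer` collaborator are hypothetical; the point is that sending the confirmation *is* the contract here, so asserting on the call is legitimate:

```python
from unittest.mock import Mock

def place_order(order_id: str, mailer) -> None:
    # Hypothetical orchestration code: the interaction with `mailer`
    # is the observable contract, not an implementation detail.
    mailer.send_confirmation(order_id)

mailer = Mock()
place_order("order-42", mailer)

# Assert exactly the interaction the contract promises, and nothing
# more: no call ordering, no internal helpers, no formatting details.
mailer.send_confirmation.assert_called_once_with("order-42")
```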
Mocking: the part that breaks most test suites
Many brittle unit tests are brittle because of mocking choices. Mocks are powerful, but they can turn tests into reenactments of the implementation.
A good rule is to mock boundaries, not details.
Mock candidates:
- External services
- Databases at a repository interface
- Clocks and random ID generators
- Network calls
- File system access
Bad mock candidates:
- Internal helper classes that are likely to be refactored
- Pure functions that can be tested directly
- Collections and data structures that are incidental
When in doubt, ask: would the behavior still be meaningful if the implementation changed? If yes, the test is likely attached to the contract. If no, the test is attached to the current design.
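One way to make a boundary mockable is to inject it rather than patch internals. Below is a sketch with a hypothetical `make_receipt` function where the clock is a parameter, so the test controls time without reaching into the implementation:

```python
from datetime import datetime, timezone

def make_receipt(amount_cents: int, now=datetime.now) -> dict:
    # Hypothetical function: the clock is injected at the boundary,
    # so tests substitute it instead of patching module internals.
    return {
        "amount_cents": amount_cents,
        "issued_at": now(timezone.utc).isoformat(),
    }

def fixed_clock(tz):
    # Test double for the clock boundary: always the same instant.
    return datetime(2024, 1, 1, tzinfo=tz)

receipt = make_receipt(1250, now=fixed_clock)
assert receipt["amount_cents"] == 1250
assert receipt["issued_at"] == "2024-01-01T00:00:00+00:00"
```

The same pattern works for random ID generators: pass in the generator, and the test passes in a deterministic one.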
How AI helps you write better unit tests
AI is most effective when you constrain it with the contract you want, not the code you currently have.
Good inputs for AI:
- A short description of the intended behavior.
- A list of edge cases you already know.
- The public interface signature.
- The error conditions and messages that matter.
- A few representative examples.
Useful asks:
- Propose a set of test cases that cover happy path, edge cases, and error conditions.
- For each test case, state the contract it verifies in one sentence.
- Suggest assertions that do not depend on internal implementation.
- Identify where mocks are appropriate and where real objects are better.
Risky asks:
- “Write unit tests for this file” without stating the contract.
- “Maximize coverage” without stating what behavior matters.
- “Mock everything” as a default.
When AI outputs tests, read them like a reviewer: do these tests verify behavior, or do they verify the current shape of the code?
Designing tests that survive refactors
Prefer stable interfaces and stable signals
If your function returns a domain object, assert on domain-relevant fields, not incidental serialization order. If your method emits events, assert on the event type and key attributes, not the exact formatting unless formatting is part of the contract.
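For example, a test of a hypothetical JSON exporter can parse the output and assert on domain fields, instead of comparing against an exact string whose key order and whitespace are incidental:

```python
import json

def export_user(name: str, roles: list) -> str:
    # Hypothetical serializer; key order and whitespace are incidental
    # unless the contract says otherwise.
    return json.dumps({"name": name, "roles": sorted(roles)})

payload = json.loads(export_user("alice", ["admin", "dev"]))

# Stable: assert on domain-relevant fields, so reordering keys or
# changing whitespace does not fail the test.
assert payload["name"] == "alice"
assert set(payload["roles"]) == {"admin", "dev"}

# Brittle alternative (do not do this unless the exact string is the
# contract): assert export_user(...) == '{"name": "alice", ...}'
```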
Build helpers that represent domain intent
Instead of constructing fragile objects inline, create small builders or fixtures that reflect domain meaning. This reduces noise and keeps tests expressive. If the object shape changes, you update the builder once.
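A minimal sketch of such a builder, using a hypothetical `Order` model. Tests override only the fields relevant to the behavior under test; everything else comes from domain-meaningful defaults in one place:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Order:
    # Hypothetical domain object for illustration.
    customer_id: str
    total_cents: int
    currency: str
    express: bool

def an_order(**overrides) -> Order:
    # Builder with sensible defaults; if the object shape changes,
    # only this function needs updating, not every test.
    base = Order(customer_id="c-1", total_cents=1000,
                 currency="USD", express=False)
    return replace(base, **overrides)

expensive = an_order(total_cents=250_000)
assert expensive.total_cents == 250_000
assert expensive.currency == "USD"  # irrelevant detail, supplied by builder
```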
Use table-driven tests for rule-heavy logic
Rule systems are ideal for table-driven tests: inputs and expected outputs listed in a compact form. This keeps tests readable and makes it easy to add new cases.
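A sketch of the table-driven shape, with a hypothetical shipping-fee rule. Each row is one contract case, and adding a rule means adding a row rather than a new test function:

```python
def shipping_fee_cents(total_cents: int) -> int:
    # Hypothetical rule: nothing to ship for an empty order,
    # free shipping at or above $50, flat $5 fee below.
    if total_cents == 0:
        return 0
    return 0 if total_cents >= 5000 else 500

CASES = [
    # (input total, expected fee, contract being checked)
    (0,     0,   "empty order ships nothing"),
    (4999,  500, "below threshold pays the flat fee"),
    (5000,  0,   "threshold qualifies for free shipping"),
    (12000, 0,   "above threshold stays free"),
]

for total, expected, why in CASES:
    assert shipping_fee_cents(total) == expected, why
```

In a pytest codebase the same table typically becomes `@pytest.mark.parametrize` arguments; the structure is what matters.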
Use invariants when examples are not enough
Some behavior is best expressed as a property:
- Idempotence: applying twice equals applying once.
- Round-trip: parse then format preserves meaning.
- Monotonicity: increasing input should not decrease output.
- Bounds: outputs stay within defined ranges.
Properties are often more stable than examples because they describe the heart of the behavior rather than one instance.
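A lightweight sketch of a property check without any property-testing library, using random inputs against a hypothetical de-duplication function. Dedicated tools like Hypothesis add shrinking and better input generation, but the idea is the same:

```python
import random

def dedupe_keep_first(items):
    # Hypothetical function under test: remove duplicates while
    # preserving the order of first occurrence.
    seen, out = set(), []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

for _ in range(200):
    xs = [random.randint(0, 9) for _ in range(random.randint(0, 20))]
    once = dedupe_keep_first(xs)
    # Idempotence: applying twice equals applying once.
    assert dedupe_keep_first(once) == once
    # Invariant: the set of distinct values is preserved.
    assert set(once) == set(xs)
```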
Avoid asserting on incidental order and timing
If order does not matter, do not assert order. If timing is not part of the contract, remove it from tests. These are common sources of false failures and wasted time.
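When only the contents matter, comparing as a multiset keeps the test honest about the contract. A sketch with a hypothetical `fetch_tags` function:

```python
from collections import Counter

def fetch_tags():
    # Hypothetical function whose contract promises *which* tags come
    # back, not in what order they arrive.
    return ["beta", "alpha", "beta"]

# Brittle: asserting an exact list breaks if an internal sort or
# storage order changes, even though behavior is unchanged.
# Stable: compare as a multiset when order is not part of the contract.
assert Counter(fetch_tags()) == Counter(["alpha", "beta", "beta"])
```

Use `set(...)` instead of `Counter` when duplicates are also incidental.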
A refactor-resilient unit test checklist
- Each test can be explained as a contract statement.
- Assertions depend on public behavior, not internals.
- Test data is minimal and meaningful.
- The test name describes intent, not implementation.
- Mocks exist only where the contract is interaction.
- The suite is deterministic and stable.
- Failures point to behavior changes, not incidental rewrites.
Turning legacy code into testable code without drama
Some code is hard to test because it mixes concerns. In those cases, you can still move forward safely:
- Start with characterization tests that capture current behavior at the boundary.
- Refactor in small steps while keeping the characterization tests passing.
- Introduce seams: extract pure functions, isolate IO, separate parsing from effects.
- Gradually replace characterization tests with contract-focused tests.
This approach makes refactors possible without breaking behavior, and it allows your test suite to become healthier over time.
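A sketch of the first step, with a stand-in for some legacy function. Characterization tests simply record what the code does today at its boundary; they do not judge whether that behavior is right:

```python
def legacy_discount(total: float, code: str) -> float:
    # Stand-in for legacy code; its current behavior is the spec for now.
    if code == "VIP":
        return round(total * 0.9, 2)
    return total

# Characterization tests: capture today's observable behavior,
# then refactor in small steps while keeping these passing.
assert legacy_discount(100.0, "VIP") == 90.0
assert legacy_discount(100.0, "") == 100.0
assert legacy_discount(19.99, "VIP") == 17.99
```

Once seams exist and contracts are explicit, these baseline checks can be retired in favor of contract-focused tests.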
Keep Exploring AI Systems for Engineering Outcomes
AI Debugging Workflow for Real Bugs
https://orderandmeaning.com/ai-debugging-workflow-for-real-bugs/
How to Turn a Bug Report into a Minimal Reproduction
https://orderandmeaning.com/how-to-turn-a-bug-report-into-a-minimal-reproduction/
Root Cause Analysis with AI: Evidence, Not Guessing
https://orderandmeaning.com/root-cause-analysis-with-ai-evidence-not-guessing/
Integration Tests with AI: Choosing the Right Boundaries
https://orderandmeaning.com/integration-tests-with-ai-choosing-the-right-boundaries/