---
name: writing-quality-tests
description: Guide for writing robust, high-signal automated tests. Use when asking "How do I test this effectively?", fixing flaky tests, refactoring test suites, or deciding between unit/integration/E2E strategies. Distinct from TDD (a process); this skill focuses on test quality and architecture.
license: Complete terms in LICENSE.txt
---
# Writing Quality Tests

## Overview

High-signal tests prove behavior, not implementation. While TDD focuses on the process of writing tests first, this skill focuses on the artifact: making tests stable, explicit, and valuable long-term.

Core rule: if a test is nondeterministic or tied to internals, it is debt. Fix it.
## When to Use
- New features: "I need to add tests for this new API endpoint/component."
- Bug fixes: "Help me write a regression test for this bug before fixing it."
- Flaky tests: "This test fails randomly on CI. How do I make it deterministic?"
- Refactoring: "I want to refactor this legacy code but the tests are brittle. How do I improve them first?"
- Slow tests: "The test suite takes too long. How can I speed it up or mock dependencies effectively?"
- Test design: "Should I use a unit test or an integration test for this logic?"
- Review: "Check these tests for maintainability, coverage, and clarity."
- Not for manual exploratory testing or load/perf-only work.
## Non-Negotiables
- Deterministic: same input -> same result; no hidden time/network randomness
- Behavioral oracles: assertions map to business behavior or contracts, never incidental internals
- Minimal coupling: tests fail for product changes, not helper refactors
- Focused scope: one behavior per test; isolated fixtures; clear names
- Fast feedback: prefer fast layers; cache expensive setup; parallelize safely
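The determinism rule above can be sketched in a minimal test; `make_receipt` is a hypothetical function under test, and the key move is injecting the clock rather than reading it inside the code:

```python
from datetime import datetime, timezone

def make_receipt(amount_cents, now):
    # Hypothetical code under test: a receipt records the amount and a timestamp.
    # Taking `now` as a parameter (instead of calling datetime.now() internally)
    # removes the hidden time dependency.
    return {"amount_cents": amount_cents, "issued_at": now.isoformat()}

def test_receipt_is_deterministic():
    # Same input -> same result: a frozen clock makes the assertion exact.
    fixed_now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
    receipt = make_receipt(1299, now=fixed_now)
    assert receipt == {
        "amount_cents": 1299,
        "issued_at": "2024-01-15T12:00:00+00:00",
    }

test_receipt_is_deterministic()
```

The same injection pattern applies to randomness (pass a seeded RNG) and network (pass a stub client).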
## Workflow
1. Prove it fails: write the test for the regression or wished-for case and watch it fail (or reproduce the bug) before any code changes.
2. Clarify behavior: preconditions, action, postconditions, invariants; if fixing a bug, capture the regression input and expected output.
3. Pick a level: unit for pure logic; contract for external calls; integration for seams; E2E only to prove flows or contracts end-to-end.
4. Design the oracle: assert outputs, state, events, and invariants; avoid implementation details or transient UI.
5. Shape fixtures: use builders/factories; avoid globals; randomize with seeds only when helpful, and log the seed.
6. Write the test: AAA (arrange-act-assert) or GWT (given-when-then); table-driven for variants; property-based for algebraic invariants.
7. Validate: run the focused test first, then the suite. If flaky, hunt down nondeterminism (time, randomness, order, network) and remove it.
8. Document intent: the name states the behavior; the failure message points to the expected contract.
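The workflow above might produce a test like this one; `apply_discount` is a hypothetical function under test, the input is the captured regression case, and the AAA comments mark the structure:

```python
def apply_discount(price_cents, percent):
    # Hypothetical code under test: percent must lie within [0, 100].
    if not 0 <= percent <= 100:
        raise ValueError("percent out of range")
    return price_cents * (100 - percent) // 100

def test_full_discount_yields_zero_price():
    # Arrange: boundary input captured from the original bug report.
    price, percent = 1999, 100
    # Act
    result = apply_discount(price, percent)
    # Assert: the postcondition maps to business behavior, not internals.
    assert result == 0, "a 100% discount must produce a free item"

test_full_discount_yields_zero_price()
```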
## Patterns to Prefer
- Boundary and mutation pairs: min/max/empty/null plus one mutated variation to prove invariants.
- Table-driven cases: enumerate input/output pairs to avoid duplicate tests and improve diffability.
- Property-based checks: algebraic properties (idempotence, reversibility, ordering), round-trips, monotonic counters.
- Contracts at seams: mock at boundaries you own; for third-party calls, pin to contract tests or recorded fixtures.
- Guarded goldens: only for complex structured output; require explicit review of golden updates.
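A table-driven case like the one recommended above might look like this sketch; `slugify` is a hypothetical function under test, and each row pairs an input with its expected output, including boundary and mutated variants:

```python
def slugify(title):
    # Hypothetical code under test: lowercase, collapse whitespace to hyphens.
    return "-".join(title.lower().split())

CASES = [
    # (input, expected): boundaries plus a mutated variant per invariant.
    ("Hello World", "hello-world"),
    ("", ""),                              # empty input
    ("  spaced  out  ", "spaced-out"),     # whitespace is collapsed
    ("already-a-slug", "already-a-slug"),  # idempotent on valid slugs
]

def test_slugify_table():
    for raw, expected in CASES:
        # The failure message names the offending row, keeping diffs readable.
        assert slugify(raw) == expected, f"slugify({raw!r})"

test_slugify_table()
```

In pytest the same table is usually expressed with `@pytest.mark.parametrize`, which reports each row as a separate test.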
## Coverage Strategy
- Coverage is opt-in: never run coverage unless explicitly requested by the user in the current session (e.g., "improve coverage on file X to Y%"). PM/teammate/CI pressure does not override this rule.
- Pyramid discipline: many unit tests, fewer integration, very few E2E. Use E2E to prove cross-service flow or UI contract.
- Change-based coverage: every test should fail without the code change and pass with it; capture the regression input/output.
- Critical paths first: auth, billing, migrations, data loss, irreversible actions. Add invariants that must never be violated.
- Data and time: cover time zones, DST, leap years, ordering, pagination, idempotency, and retry semantics.
- Observability: log seeds for randomized tests; emit diagnostics on failure (inputs, seed, environment versions).
Example (explicit coverage request): User: "improve coverage on file X to 80%". Run targeted coverage for that file only, add behavior-driven tests to hit the missing branches, and avoid coverage runs outside that request.

```shell
pytest --cov=path/to/file.py --cov-report=term-missing
```
## Flake Prevention
- Remove time races: replace sleeps with waits on explicit conditions; freeze or inject clocks.
- Isolate state: fresh fixtures per test; unique temp dirs/ports; clean databases; no shared singletons.
- Control randomness: seed RNG, capture seed in failure output, prefer deterministic builders.
- Network and IO: stub external calls; if unavoidable, record/replay; set tight timeouts and retries with jitter disabled in tests.
- Parallel safety: ensure fixtures are parallel-safe or mark tests serial; avoid global mutable state.
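Replacing sleeps with waits on explicit conditions, as the first point above suggests, can be as simple as this sketch; `wait_for` and `FakeJob` are illustrative names, not a real library API:

```python
import time

def wait_for(condition, timeout=2.0, interval=0.01):
    # Poll an explicit condition instead of sleeping a fixed duration:
    # the test finishes as soon as the condition holds, and the timeout
    # only matters on genuine failure.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

class FakeJob:
    # Fake async work that completes after a few polls.
    def __init__(self):
        self.polls = 0
    def done(self):
        self.polls += 1
        return self.polls >= 3

job = FakeJob()
assert wait_for(job.done)                          # succeeds well before the timeout
assert not wait_for(lambda: False, timeout=0.05)   # fails fast and deterministically
```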
## Review Checklist
- Name states behavior and level (e.g., "adds item to cart (integration)").
- Single reason to fail; assertions map to user-visible behavior or contract.
- Fixtures minimal and local; builders hide irrelevant details; no shared hidden state.
- Negative and edge cases present; regression case for the original bug captured.
- Tests run quickly; slow/expensive flows justified and focused.
## Hygiene (adaptable patterns)
- Structure: Given–When–Then or AAA so intent is obvious.
- Hypothesis: fix generators or code instead of suppressing health checks; log seeds for repro.
- Async correctness: use real async paths/fakes; don’t hide missing awaits with sync doubles.
- Assertion scope: assert behavior/contract fields; avoid brittle full-payload snapshots unless testing a contract.
- Coverage as health, not blocker: focus on low-coverage behavior-heavy files; be pragmatic with legacy or infra-heavy areas.
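The seed-logging advice above applies even without Hypothesis; here is a plain-Python sketch of a round-trip property check where the seed is chosen once, drives all generated inputs, and appears in every failure message for reproduction:

```python
import json
import random

def test_json_round_trip_property():
    # Pick a seed up front and include it in failure output so any
    # failing run can be reproduced exactly.
    seed = random.randrange(2**32)
    rng = random.Random(seed)
    for _ in range(100):
        payload = {
            f"k{i}": rng.randint(-1000, 1000)
            for i in range(rng.randint(0, 5))
        }
        # Property: serialize-then-deserialize is the identity for this data.
        assert json.loads(json.dumps(payload)) == payload, f"seed={seed}"

test_json_round_trip_property()
```

With Hypothesis, the framework handles seeding and shrinking; the point here is only that randomized tests must always report enough to replay a failure.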
## Marks (for selective runs)
- unit: isolated logic with external deps mocked
- contract/integration: cross-component seams with real wiring or adapters
- async: true async paths; avoid sync fakes masking awaits
- property: Hypothesis-based invariants in dedicated property files
- slow: >1s or real infra; justify and keep focused
## Common Anti-Patterns
- Brittle UI or text snapshots without intent; prefer semantic assertions or scoped snapshots.
- Over-mocking internals; mocking within the module under test; asserting call order that is not part of the contract.
- Sleep-based waits; reliance on wall-clock time; unseeded randomness.
- Combined scenarios covering multiple behaviors in one test; global fixtures that hide setup.
- Golden files updated blindly; tests that assert logging implementation rather than outcomes.
- Running coverage by default instead of waiting for explicit coverage requests.
## Red Flags - Stop and Fix
- Tests pass or fail intermittently
- Assertions tied to private methods or call order instead of observable behavior
- Unseeded randomness, sleeps instead of explicit waits, or shared mutable fixtures
- Golden updates accepted without review of intent
- A test never failed before the code change
- Running coverage without the user explicitly asking
- Running coverage due to PM/teammate/CI pressure