Intermediate · 4-8 hours
Testing with AI Agents
Generate comprehensive test suites and achieve high coverage using AI coding agents.
Last reviewed Feb 27, 2026
Overview
AI agents are transforming software testing by writing comprehensive test suites, generating edge cases, and maintaining CI/CD pipelines automatically. This cookbook covers how to use AI agents to achieve dramatically better test coverage with far less manual effort.

Key tools covered: Claude Code, Cursor, Playwright, GitHub Copilot, Jest

Audience: Developers who want better test coverage without spending most of their time writing boilerplate tests.
Test-Driven Agentic Development (TDAD)
Test-Driven Agentic Development is TDD adapted for a world where AI writes the implementation. Instead of fighting AI hallucinations, you harness test constraints to guide the agent precisely.
The Updated Red-Green-Refactor Cycle
- Red — Human writes a failing test that describes the desired behavior
- Green — AI generates the implementation to make the test pass
- Refactor — AI cleans up logic: "Refactor this implementation for clarity and performance, but keep all existing tests green"
- Validate — Human reviews the output, approves or iterates
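The cycle above can be sketched end to end. In this hypothetical example, the human writes the failing assertions for a daily-limit rule first (Red), and the agent supplies `isWithinDailyLimit` to make them pass (Green). The function name, signature, and the 1000-unit limit are all invented for illustration.

```typescript
// Green: the agent generates the minimal implementation that satisfies
// the spec below. Hypothetical rule: reject any payment that would push
// today's spend past the daily limit.
function isWithinDailyLimit(
  spentToday: number,
  amount: number,
  limit: number
): boolean {
  return spentToday + amount <= limit;
}

// Red: the human wrote these assertions first, before the function
// existed -- they are the precise specification the agent must satisfy.
console.assert(isWithinDailyLimit(900, 100, 1000) === true, "exactly at limit is allowed");
console.assert(isWithinDailyLimit(950, 100, 1000) === false, "over the limit is rejected");
```

Because the spec is committed before the implementation, the Validate step is just a review of whether these assertions capture the intended behavior.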
Why TDD Makes AI Coding Better
- Tests act as prompts. A well-written test is a precise specification that reduces ambiguity and hallucination.
- Builds confidence. Green tests give you evidence the AI did the right thing, not just plausible-looking code.
- Reduces scope creep. Tight test scopes prevent AI from over-engineering solutions.
- Catches regressions instantly. Refactors are safe because the test suite acts as a guardrail.
TDD Plan Generator Prompt Template
You are a senior software engineer practicing test-driven development.
Given the following feature description:
[FEATURE DESCRIPTION]
Generate a comprehensive TDD test plan that includes:
1. Unit tests for all core functions and edge cases
2. Integration tests for component interactions
3. E2E tests for critical user flows
4. Error/failure scenario tests
5. Performance boundary tests
For each test, provide:
- Test name (descriptive, behavior-focused)
- Input conditions
- Expected output/behavior
- Why this test matters
Do NOT write the implementation — only the test plan.
Tips for TDAD
- Start with high-value behaviors — focus on business logic, not utility functions
- Write descriptive test names — `it('should reject payments over the daily limit')`, not `it('test payment')`
- Keep scopes tight — one behavior per test, no more
- Separate agents — use one agent for writing tests, a different session for implementation; they shouldn't share context
- Commit tests before implementation — this enforces the discipline and makes PRs reviewable
Automated Test Generation with Claude Code
Claude Code can set up an entire testing project from a single prompt, including framework configuration, test structure, and CI integration.
Setting Up a Playwright + Cucumber Project
Set up a complete Playwright + Cucumber BDD testing project for a Next.js e-commerce app.
Include:
- Playwright config with multiple browsers (Chrome, Firefox, Safari)
- Cucumber feature files directory structure
- Step definitions for common e-commerce flows
- Page Object Model pattern
- Test data fixtures
- NPM scripts for running tests
- GitHub Actions CI configuration
The app has: product listing, product detail, cart, checkout, and user auth flows.
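The Page Object Model item in that prompt is worth seeing in miniature. The sketch below uses a hand-rolled `Page` interface as a stand-in for Playwright's real `Page` so the pattern is visible without the dependency; `CartPage` and its methods are invented for illustration.

```typescript
// Stand-in for Playwright's Page: just enough surface to show the pattern.
interface Page {
  goto(url: string): void;
  click(selector: string): void;
}

// A page object wraps one screen's selectors and actions, so tests speak
// in user terms and a selector change is fixed in exactly one place.
class CartPage {
  private page: Page;
  constructor(page: Page) {
    this.page = page;
  }
  open(): void {
    this.page.goto("/cart");
  }
  checkout(): void {
    this.page.click("role=button[name='Checkout']");
  }
}

// Usage with a recording fake in place of a real browser:
const calls: string[] = [];
const fakePage: Page = {
  goto: (url) => calls.push(`goto ${url}`),
  click: (sel) => calls.push(`click ${sel}`),
};
new CartPage(fakePage).open();
// calls -> ["goto /cart"]
```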
Generate Tests for Specific Features with Edge Cases
Generate comprehensive tests for the checkout flow including:
- Happy path: successful purchase
- Empty cart handling
- Invalid payment card formats
- Expired card handling
- Network timeout during payment
- Concurrent session handling
- Price calculation with discounts + tax
- Address validation failures
- Stock exhaustion between cart add and checkout
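One way to make the "price calculation with discounts + tax" item concrete is a pure function plus the edge-case assertions an agent should produce for it. Everything here is invented for illustration: `calculateTotal`, the clamping rule, and the 8% tax rate are not from any real codebase.

```typescript
// Hypothetical checkout pricing: apply a percentage discount, then tax.
// Discount is clamped to [0, 1]; the result is rounded to cents.
function calculateTotal(subtotal: number, discountRate: number, taxRate: number): number {
  const clamped = Math.min(Math.max(discountRate, 0), 1);
  const discounted = subtotal * (1 - clamped);
  return Math.round(discounted * (1 + taxRate) * 100) / 100;
}

// Edge cases an agent should cover, not just the happy path:
console.assert(calculateTotal(100, 0.1, 0.08) === 97.2); // discount applied before tax
console.assert(calculateTotal(100, 0, 0) === 100);       // no discount, no tax
console.assert(calculateTotal(0, 0.5, 0.08) === 0);      // empty cart
console.assert(calculateTotal(100, 1.5, 0.08) === 0);    // discount clamped at 100%
```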
Run the Full Quality Pipeline
Run the full quality pipeline:
1. Execute all tests and report coverage
2. Identify any coverage gaps below 80%
3. Run security scan with npm audit
4. Generate a summary report with pass/fail, coverage %, and any vulnerabilities
CLAUDE.md Template for Testing Standards
Place this in your repo root so Claude Code follows your testing standards automatically:
# Testing Standards
## Framework
- Unit/Integration: Jest + Testing Library
- E2E: Playwright
- API: Supertest
## Coverage Requirements
- Minimum 80% line coverage
- 100% coverage for payment and auth flows
- All new features require tests before merge
## Test Organization
- Unit tests: co-located with source files (*.test.ts)
- Integration tests: tests/integration/
- E2E tests: tests/e2e/
## Naming Conventions
- Use behavior descriptions: should [verb] when [condition]
- Group by feature with describe blocks
## Forbidden Patterns
- No test('it works') — be descriptive
- No .only in committed code
- No hardcoded test data — use factories
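The "use factories" rule above can look like the sketch below: no library assumed (libraries such as Fishery or faker fill the same role), and the `User` shape is hypothetical.

```typescript
// A factory builds valid defaults and lets each test override only the
// fields it cares about -- no hardcoded fixtures scattered across tests.
interface User {
  id: number;
  email: string;
  role: "customer" | "admin";
}

let nextId = 1;
function buildUser(overrides: Partial<User> = {}): User {
  const id = nextId++;
  return {
    id,
    email: `user${id}@example.test`,
    role: "customer",
    ...overrides,
  };
}

// Each test states only what it needs:
const admin = buildUser({ role: "admin" });
console.assert(admin.role === "admin");
console.assert(buildUser().id !== buildUser().id); // unique by default
```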
Playwright Agents (Planner / Generator / Healer)
Playwright now supports an agentic loop where three specialized agents collaborate on test creation and maintenance.
Setup
npx playwright init-agents --loop=claude
This creates three agent definition files in .playwright/agents/.
Planner Agent
The Planner explores the running application and produces a structured Markdown test plan.
- Navigates the app autonomously
- Identifies user flows, form interactions, and navigation paths
- Produces a `test-plan.md` with scenarios, preconditions, and expected outcomes
- Flags areas of complexity or risk
Customizing the Planner (`planner-agent.md`):
You are a QA architect. When exploring the app:
- Focus on user-facing flows, not internal implementation
- Identify at least 3 edge cases per major flow
- Note any accessibility issues you observe
- Prioritize tests by business impact
Generator Agent
The Generator transforms the Markdown test plan into working Playwright tests.
- Reads `test-plan.md` and generates `.spec.ts` files
- Verifies selectors by querying the live DOM
- Uses `getByRole`, `getByLabel`, `getByText` (resilient selectors first)
- Falls back to `data-testid` attributes when semantic selectors aren't available
- Adds assertions for visual state, network responses, and accessibility
Healer Agent
The Healer automatically fixes broken tests when UI changes break selectors.
- Monitors CI for test failures
- Identifies root cause: selector changed, flow changed, or genuine bug
- For selector changes: updates the test with correct selector
- For flow changes: rewrites the affected steps
- For genuine bugs: opens a GitHub issue and leaves the test failing
- Submits a PR with the fix for human review

All three agents are defined as plain Markdown files — no code required. Fully customizable to your project's conventions.
Visual Regression Testing
Visual regression testing catches UI changes that functional tests miss — layout shifts, color changes, font rendering, and component drift.
| Tool | Accuracy | Key Feature | Pricing |
|---|---|---|---|
| Applitools | 99.9999% | Visual AI, cross-browser | Enterprise |
| Percy | High | GitHub/CI native | Free tier available |
| Chromatic | High | Storybook native | Free tier available |
| Lost Pixel | Good | Open source | Free |
Applitools reports 99.9999% accuracy and a claimed 10x speedup in visual testing, using a Visual AI model trained on millions of screenshots.
// playwright.config.ts with Applitools
import { defineConfig } from '@playwright/test';
import { getAIConfig } from '@applitools/eyes-playwright';
export default defineConfig({
use: {
...getAIConfig({
apiKey: process.env.APPLITOOLS_API_KEY,
appName: 'My App',
batchName: 'CI Run'
})
}
});
Property-Based Testing & Fuzzing
Property-based testing generates hundreds of random inputs to find edge cases you'd never think to write manually. AI supercharges this by helping you define properties and interpret failures.
Tools
- fast-check — JavaScript property-based testing
- MLCheck — ML-specific property testing
- TensorFuzz — Neural network fuzzing
- ART (Adversarial Robustness Toolbox) — AI model robustness testing
AI Prompting Pattern for Property Tests
For the function calculateShippingCost(weight, distance, expedited),
identify 5-7 invariant properties that should ALWAYS be true,
regardless of input values. Then write fast-check property tests for each.
Example properties:
- Expedited should never cost less than standard
- Cost should increase monotonically with weight
- Cost should never be negative
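With fast-check you would express these properties via `fc.property` and `fc.assert`. The dependency-free sketch below shows the same idea for a hypothetical `calculateShippingCost` (the pricing formula is invented): generate many random inputs and assert that each invariant holds for all of them.

```typescript
// Hypothetical implementation under test.
function calculateShippingCost(weight: number, distance: number, expedited: boolean): number {
  const base = 5 + 0.5 * weight + 0.01 * distance;
  return expedited ? base * 1.5 : base;
}

// Minimal property runner: random weights/distances, fail loudly on any violation.
function checkProperty(name: string, prop: (w: number, d: number) => boolean, runs = 200): void {
  for (let i = 0; i < runs; i++) {
    const w = Math.random() * 100;
    const d = Math.random() * 5000;
    if (!prop(w, d)) {
      throw new Error(`Property violated: ${name} (weight=${w}, distance=${d})`);
    }
  }
}

// The three example properties from above:
checkProperty("expedited never costs less than standard",
  (w, d) => calculateShippingCost(w, d, true) >= calculateShippingCost(w, d, false));
checkProperty("cost increases monotonically with weight",
  (w, d) => calculateShippingCost(w + 1, d, false) >= calculateShippingCost(w, d, false));
checkProperty("cost is never negative",
  (w, d) => calculateShippingCost(w, d, false) >= 0);
```

A real property-based library adds what this sketch lacks: input shrinking to a minimal failing case, reproducible seeds, and richer generators.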
When to Use Each Approach
| Approach | Best For |
|---|---|
| Example-based tests | Known business rules, specific user flows |
| Property-based tests | Pure functions, data transformations, mathematical invariants |
| Fuzzing | Security-sensitive inputs, parsers, file processors |
Agentic CI/CD Pipeline
Agentic CI/CD replaces rigid pipeline scripts with intelligent agents that make decisions based on code context.
The Five-Agent Pipeline
- Code Analysis Agent — Reads the diff, identifies changed modules, assesses risk level
- Test Selection Agent — Selects the minimal test set that covers changed code
- Execution Agent — Runs selected tests, collects results, identifies flaky tests
- Quality Decision Agent — Decides pass/fail based on coverage thresholds and risk level
- Adaptive Pipeline Agent — Updates pipeline configuration based on patterns
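A toy version of the Test Selection Agent's core decision is mapping changed files to the tests that cover them. The path conventions below (`*.test.ts` co-located with source, integration suites registered per module) are assumptions for illustration; a real agent would reason over the diff and a coverage map rather than file names.

```typescript
// Select the minimal test set for a diff: each changed source file pulls
// in its co-located unit test plus any integration suites for its module.
function selectTests(
  changedFiles: string[],
  integrationMap: Record<string, string[]>
): string[] {
  const selected = new Set<string>();
  for (const file of changedFiles) {
    if (file.endsWith(".test.ts")) {
      selected.add(file); // a changed test always runs
      continue;
    }
    if (file.endsWith(".ts")) {
      selected.add(file.replace(/\.ts$/, ".test.ts")); // co-located unit test
    }
    const module = file.split("/")[1] ?? "";
    for (const suite of integrationMap[module] ?? []) {
      selected.add(suite);
    }
  }
  return [...selected].sort();
}

const picked = selectTests(
  ["src/cart/add.ts"],
  { cart: ["tests/integration/cart.spec.ts"] }
);
// picked -> ["src/cart/add.test.ts", "tests/integration/cart.spec.ts"]
```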
Results
Teams implementing agentic CI/CD pipelines have reported:
- 78% reduction in deployment time by eliminating unnecessary full test suite runs
- 3x faster delivery cycles through intelligent test selection
GitHub Copilot + Actions: Generate CI Workflows from Natural Language
GitHub Copilot can generate complete GitHub Actions workflows from a description:
Generate a GitHub Actions workflow for a Node.js + PostgreSQL app that:
- Runs on push to main and all PRs
- Caches node_modules between runs
- Runs ESLint, then Jest with coverage, then builds Docker image
- Only deploys to production on main branch
- Uses environment secrets for DATABASE_URL and DOCKER_REGISTRY
- Sends Slack notification on failure
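A workflow generated from the prompt above might begin like the sketch below (test job only, steps abbreviated). This is standard GitHub Actions syntax, not Copilot's actual output; note that `actions/setup-node`'s `cache: npm` caches the npm cache rather than `node_modules` itself, which would require `actions/cache` directly.

```yaml
name: ci
on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: postgres
        ports: ["5432:5432"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm          # reuses the npm cache between runs
      - run: npm ci
      - run: npm run lint
      - run: npm test -- --coverage
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
```

The Docker build, production-only deploy, and Slack notification from the prompt would follow as additional jobs gated on `github.ref == 'refs/heads/main'`.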
Real Case Study: Elastic's Self-Healing CI
Elastic deployed Claude as a CI assistant to automatically fix broken PRs:
- 24 broken PRs fixed automatically in one month
- 20 engineer-days saved — time engineers would have spent debugging CI failures
- Most fixes involved: updated snapshots, changed import paths, API signature updates
The Self-Healing Prompt
You are a CI debugging assistant. A test suite is failing.
Failing tests:
[PASTE FAILING TEST OUTPUT]
Recent changes to the codebase:
[PASTE GIT DIFF]
Your task:
1. Identify the root cause of each failure
2. Determine if this is a test issue (test needs updating) or a code bug
3. For test issues: provide the exact fix
4. For code bugs: explain the bug but DO NOT fix it — flag for human review
5. Output your changes as a unified diff
Critical: Do not change test assertions to make tests pass.
Only fix tests when the underlying behavior intentionally changed.
Common Pitfalls
- Generating tests after the fact — AI-generated tests written after implementation often just verify the current (possibly buggy) behavior. TDD is vastly better: write tests first, let AI write the implementation.
- Not reviewing AI-generated tests — AI tests can pass while testing the wrong thing. Always read them. Look for tautological tests like `expect(add(1,1)).toBe(add(1,1))`.
- Brittle selectors in E2E tests — AI tends to generate `page.locator('#submit-btn-v2')`. Enforce resilient selectors: `getByRole('button', { name: 'Submit' })`.
- Not running tests in CI — Tests that only run locally don't prevent broken deployments. Every test suite needs a CI job.
Related cookbooks
AI-Powered Code Review & Quality
Automate code review and enforce quality standards using AI-powered tools and agentic workflows.
Building AI-Powered Applications
Build applications powered by LLMs, RAG, and AI agents using Claude Code, Cursor, and modern AI frameworks.
Building APIs & Backends with AI Agents
Design and build robust APIs and backend services with AI coding agents, from REST to GraphQL.