Turn a user-reported bug into a regression test.
When a customer hits a bug, your team can reproduce it and prove the fix. StepStitch never records their screen, their keystrokes, or their data.
```typescript
import { test, expect } from '@playwright/test';

// StepStitch autogenerated reproduction (trace: trc_9f4c1ae2)
// Replayability: 0.76 (grade B)
// ⚠ templated_route_needs_fixture [step 1]: substitute a concrete id.
test('StepStitch reproduction', async ({ page }) => {
  // TODO: authenticate as a synthetic test user if the flow requires it.
  await page.goto('/accounts/:id');
  await page.goto('/accounts/:id/transfer');
  await page.locator('[data-testid=payee-select]').click();
  await page.locator('[data-testid=amount-input]').click();
  // Match the request by route + method so it resolves whether or not the
  // bug is present, then assert on status. Red while broken, green once fixed.
  const response0 = page.waitForResponse(
    (r) => r.url().includes('/api/accounts/') && r.request().method() === 'POST',
  );
  await page.locator('[data-testid=review-transfer]').click();
  // expected API failure: /api/accounts/:id/transfers (HTTP 500)
  const res0 = await response0;
  expect(res0.status(), 'no server error from /api/accounts/:id/transfers').toBeLessThan(500);
});
```
In plain words
A customer hits a problem
Something breaks while they are using your app, like a payment that will not go through.
StepStitch records the steps, not the screen
It captures what they clicked and what failed. Never their screen, their typing, or any personal data.
Your team reproduces it and proves the fix
In one click it becomes a test that fails on the bug and passes once it is fixed, so it stays fixed.
Most engineering teams do not need another recording to watch. They need a user-reported bug that can become a regression test.
Session replay is a security camera
It records the screen, inputs, and PII. Useful, until an auditor asks why customer data left the building.
Error tracking is a crash sensor
It tells you where the code broke, but not the steps the user took to break it.
StepStitch is a flight recorder
It keeps only the structural steps, with no screens or input values, and compiles them into a test you can run.
Session replay
Captures the screen, then asks an engineer to watch it back.
- Records pixels, text, and input values by default
- Carries PII into a third-party tool you do not control
- Often banned outright in regulated environments
- Leaves you with a video, not a fix
StepStitch
Captures the structure of what broke, then compiles a test.
- Route templates, stable selectors, API status codes
- Scrubbed in the browser and again on the server
- Self-hosted, so the data never leaves your boundary
- Leaves you with a runnable Playwright reproduction
From one report to a merged fix
StepStitch perceives, scores, compiles, and drafts. It never plans or acts on its own. The autonomy stays in your stack.
Perceive
A user reports a bug. StepStitch stores a scrubbed, structural trace.
list_recent_traces
Score
A deterministic 0 to 1 score and an A to F grade indicate whether the trace is likely to reproduce.
get_replayability_score
Reproduce
Fetch a deterministic Playwright test built from the trace. Text only.
generate_playwright_repro
Verify
Run it in your CI or sandbox. Red turns green once the fix lands.
get_verifications
Fix, human-gated
Open a pull request with the regression test. A reviewer merges, never the agent.
github_bridge
One report, two views
The same moment, from both sides
Your user keeps their screen, their inputs, and their data. Your engineers get structure, a score, and a runnable test.
A user hits a 500 on a transfer.
Live demo
See exactly what your team gets
A real example from a running StepStitch service. It shows what happened, how reproducible it is, what was kept private, and the test StepStitch wrote automatically.
Not session replay, not error tracking
Those tools tell you something broke. StepStitch hands you a test that proves it, with nothing sensitive leaving your boundary. Even the open-source replay tools still record the screen.
| Capability | Session replay (FullStory, LogRocket) | OpenReplay (open-source replay) | APM and errors (Sentry, Datadog) | StepStitch (issue-to-repro) |
|---|---|---|---|---|
| Captures screens, page text, input values | By default | Records DOM | Often | Never |
| PII risk in the tool | High | Medium | Medium | Nothing sensitive captured |
| Proves the bug is reproducible | No | No | No | 0 to 1 score, A to F grade |
| Output is a regression test | A video | Exports a script | A stack trace | Asserting Playwright test |
| Self-hosted and auditable | SaaS only | Open source | Mostly SaaS (Sentry can be self-hosted) | Apache-2.0, self-host |
| Native to agent networks | No | No | No | MCP, 8 read-only and draft tools |
What ships today
A capability surface, not a roadmap
Every piece below is in the open-source repository, backed by a named test. Nothing here is a promise.
Two-layer privacy boundary
The SDK redacts in the page, but the backend never trusts the client. Every trace is scrubbed again on the server before it is stored. Defense in depth, proven by a named test.
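The server-side half of that boundary can be sketched as an allowlist that runs no matter what the client already redacted. This is a minimal sketch, not StepStitch's code. The step shape, the kind names, the id-templating heuristics, and the data-testid-only selector rule are all hypothetical.

```typescript
// Hypothetical shape of an incoming step: arbitrary keys from an untrusted client.
type RawStep = Record<string, unknown>;

// The only fields that survive server-side scrubbing in this sketch.
interface ScrubbedStep {
  kind: string;
  route?: string;
  selector?: string;
  status?: number;
}

// Hypothetical allowlist of structural step kinds.
const ALLOWED_KINDS = new Set(["navigate", "click", "api_failure", "client_error"]);

const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Drop the query string and fragment, then template id-like segments as ":id".
function templateRoute(raw: string): string {
  const path = raw.split(/[?#]/)[0];
  return path
    .split("/")
    .map((seg) => (/^\d+$/.test(seg) || UUID.test(seg) ? ":id" : seg))
    .join("/");
}

// Keep only data-testid selectors; anything else may encode page text.
function isStableSelector(selector: string): boolean {
  return /^\[data-testid=[\w-]+\]$/.test(selector);
}

// Re-scrub on the server regardless of what the client SDK did:
// unknown kinds are dropped, and unknown keys (values, text) are never copied.
function scrubOnServer(steps: RawStep[]): ScrubbedStep[] {
  const out: ScrubbedStep[] = [];
  for (const step of steps) {
    const kind = step["kind"];
    if (typeof kind !== "string" || !ALLOWED_KINDS.has(kind)) continue;
    const clean: ScrubbedStep = { kind };
    const route = step["route"];
    if (typeof route === "string") clean.route = templateRoute(route);
    const selector = step["selector"];
    if (typeof selector === "string" && isStableSelector(selector)) clean.selector = selector;
    const status = step["status"];
    if (typeof status === "number" && Number.isInteger(status)) clean.status = status;
    out.push(clean);
  }
  return out;
}
```

The design choice is to copy only known-safe fields rather than delete known-bad ones. Because of that, a field the client forgot to redact is dropped by default.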
Replayability score
A deterministic 0 to 1 score with an A to F grade and warnings. Decide if a bug reproduces before anyone opens an editor.
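One way a deterministic score and grade could be derived is sketched below. It assumes a penalty-per-warning model. The warning codes, except templated_route_needs_fixture from the example test above, are hypothetical, and so are the penalties and grade cut-offs. The cut-offs are chosen only so that 0.76 maps to B, as in that example.

```typescript
// Hypothetical warning codes and penalties; the real scorer's rules may differ.
const PENALTIES: Record<string, number> = {
  templated_route_needs_fixture: 0.1,
  missing_auth_context: 0.15,
  unstable_selector: 0.2,
  no_failure_signal: 0.5,
};

type Grade = "A" | "B" | "C" | "D" | "F";

interface Replayability {
  score: number;
  grade: Grade;
  warnings: string[];
}

// Hypothetical grade boundaries.
function gradeFor(score: number): Grade {
  if (score >= 0.9) return "A";
  if (score >= 0.75) return "B";
  if (score >= 0.6) return "C";
  if (score >= 0.4) return "D";
  return "F";
}

// A pure function of its input. Sorting first fixes the subtraction order,
// so the same set of warnings always yields the same floating-point score.
function scoreTrace(warnings: string[]): Replayability {
  const sorted = [...warnings].sort();
  let score = 1;
  for (const w of sorted) score -= PENALTIES[w] ?? 0.05; // unknown warnings cost a little
  score = Math.min(1, Math.max(0, Math.round(score * 100) / 100));
  return { score, grade: gradeFor(score), warnings: sorted };
}
```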
Deployment profiles
A profile can only tighten the privacy boundary, never loosen it.
- financial-services-enterprise
- healthcare-strict
- internal-enterprise
- open-source-default
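The "tighten, never loosen" rule can be enforced structurally by merging every field toward its stricter value. This is a minimal sketch. The policy fields and the contents of the sample healthcare-strict override are hypothetical. The source only says that this profile disables free text.

```typescript
// Hypothetical privacy settings; field names are illustrative only.
interface PrivacyPolicy {
  allowFreeText: boolean;
  captureQueryStrings: boolean;
  forbiddenKeys: ReadonlySet<string>;
}

// A profile may mention any subset of fields.
type Profile = Partial<PrivacyPolicy>;

// Each field merges toward the stricter value, so no profile can loosen the base:
// permissions are AND-ed, and forbidden keys are unioned.
function applyProfile(base: PrivacyPolicy, profile: Profile): PrivacyPolicy {
  return {
    allowFreeText: base.allowFreeText && (profile.allowFreeText ?? true),
    captureQueryStrings: base.captureQueryStrings && (profile.captureQueryStrings ?? true),
    forbiddenKeys: new Set([
      ...Array.from(base.forbiddenKeys),
      ...Array.from(profile.forbiddenKeys ?? new Set<string>()),
    ]),
  };
}

// Illustrative only: what a healthcare-strict override might contain.
const healthcareStrict: Profile = {
  allowFreeText: false,
  forbiddenKeys: new Set(["mrn", "dob", "diagnosis"]),
};
```

A profile that tries to re-enable free text or clear the forbidden keys therefore has no effect. Stacking profiles can only shrink what is captured.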
Drafts into your system of record
Flat, sanitized drafts that a person reviews and submits. StepStitch never writes to those systems autonomously.
Deterministic compiler
The same trace always compiles the same Playwright test. Text only, never run against production.
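Determinism follows from compiling with pure string building: no clock, no randomness, no environment lookups. The sketch below produces output shaped like the example test at the top of this page. The step schema and the 4xx handling are hypothetical. It also assumes the response matcher uses the route prefix before the first template parameter, which is what the example's '/api/accounts/' match suggests.

```typescript
// Hypothetical structural steps; not the real StepStitch trace schema.
type Step =
  | { kind: "navigate"; route: string }
  | { kind: "click"; selector: string }
  | { kind: "api_failure"; route: string; method: string; status: number };

// Render a single-quoted JS string literal.
function q(s: string): string {
  return `'${s.replace(/\\/g, "\\\\").replace(/'/g, "\\'")}'`;
}

// Same steps in, byte-identical test text out.
function compile(steps: readonly Step[], title = "StepStitch reproduction"): string {
  const lines: string[] = [
    "import { test, expect } from '@playwright/test';",
    "",
    `test(${q(title)}, async ({ page }) => {`,
  ];
  let n = 0; // response counter; depends only on step order
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    const next: Step | undefined = steps[i + 1];
    if (step.kind === "navigate") {
      lines.push(`  await page.goto(${q(step.route)});`);
    } else if (step.kind === "click" && next?.kind === "api_failure") {
      // Arm the wait before the click that fires the request.
      const prefix = next.route.split(":")[0];
      // Assert below the failure's status class (5xx -> < 500, 4xx -> < 400).
      const limit = next.status >= 500 ? 500 : 400;
      const cls = limit === 500 ? "server" : "client";
      lines.push(
        `  const response${n} = page.waitForResponse(`,
        `    (r) => r.url().includes(${q(prefix)}) && r.request().method() === ${q(next.method)},`,
        "  );",
        `  await page.locator(${q(step.selector)}).click();`,
        `  const res${n} = await response${n};`,
        `  expect(res${n}.status(), ${q(`no ${cls} error from ${next.route}`)}).toBeLessThan(${limit});`,
      );
      n++;
      i++; // the api_failure step is consumed here
    } else if (step.kind === "click") {
      lines.push(`  await page.locator(${q(step.selector)}).click();`);
    }
    // In this sketch, an api_failure with no preceding click is skipped.
  }
  lines.push("});", "");
  return lines.join("\n");
}
```

The compiler only emits text. Running the result is left to your CI or sandbox, which matches the "text only" constraint above.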
Repair loop and verified-fix corpus
A trace becomes a labeled GitHub issue and a regression-test pull request. A reviewer merges, never the agent. Only a pre-fail to post-pass transition is recorded as confirmed fixed.
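The "pre-fail to post-pass" rule can be expressed as a small classifier over verification runs. This is a minimal sketch. The record shape, the phase names, and the status labels are hypothetical. It also assumes the most recent run in each phase is the one that counts.

```typescript
// Hypothetical verification record reported back from CI, in run order.
interface Verification {
  phase: "pre_fix" | "post_fix";
  outcome: "pass" | "fail";
}

type FixStatus = "confirmed_fixed" | "not_reproduced" | "still_failing" | "unverified";

// Only a failing pre-fix run followed by a passing post-fix run counts as fixed.
function classify(runs: readonly Verification[]): FixStatus {
  const pre = runs.filter((r) => r.phase === "pre_fix");
  const post = runs.filter((r) => r.phase === "post_fix");
  if (pre.length === 0 || post.length === 0) return "unverified";
  // A test that already passed before the fix never proved the bug existed.
  if (pre[pre.length - 1].outcome === "pass") return "not_reproduced";
  return post[post.length - 1].outcome === "pass" ? "confirmed_fixed" : "still_failing";
}
```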
Observability and kill switch
A zero-dependency Prometheus endpoint, audited reads, and an org-wide kill switch that fails safe on error.
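Both ideas fit in a few lines without dependencies. The first is a kill switch that treats any read error as "off". The second renders a counter in the Prometheus text exposition format by hand. This is a sketch under stated assumptions. The flag reader (assumed synchronous, for example a cached value) and the metric name stepstitch_tool_reads_total are hypothetical.

```typescript
// Hypothetical accessor for a cached org-wide flag; it may throw if the
// underlying config store is unreachable. Returns true when capture is on.
type FlagReader = () => boolean;

// Fail safe: any error reading the switch is treated as "capture off".
function captureAllowed(readFlag: FlagReader): boolean {
  try {
    return readFlag() === true;
  } catch {
    return false;
  }
}

// Escape a label value per the Prometheus text exposition format.
function escapeLabel(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one counter family with no client library. Samples are sorted
// so that scrape output is stable between calls.
function renderCounter(name: string, help: string, byTool: Map<string, number>): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  const entries = Array.from(byTool.entries()).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  for (const [tool, count] of entries) {
    lines.push(`${name}{tool="${escapeLabel(tool)}"} ${count}`);
  }
  return lines.join("\n") + "\n";
}
```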
Bring your own agentic network
StepStitch is a capability provider, not an agent orchestrator. One MCP server surfaces eight read-only and draft tools. Any agent network consumes them. The autonomy lives in your stack.
Eight Copilot-safe tools
- list_recent_traces
- get_trace_summary
- get_replayability_score
- get_privacy_posture
- get_diagnostic_summary
- generate_playwright_repro
- create_export_preview
- create_fs_export_preview
Works with any MCP client
The same contract is surfaced three ways: an MCP server, an OpenAPI connector for Copilot Studio, and function specs for tool-calling models.
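Declaring the contract once and deriving each surface from it keeps the three views in sync. It also gives every surface a single allowlist to dispatch through. The sketch below shows that pattern for generic function specs only. The tool names come from this page. The read/draft labels, descriptions, and parameter schemas are hypothetical placeholders, and no real MCP or Copilot Studio SDK call is shown.

```typescript
// One contract, declared once. Access labels, descriptions, and parameter
// schemas are hypothetical placeholders, not StepStitch's published contract.
type Access = "read" | "draft";

interface JsonObjectSchema {
  type: "object";
  properties: Record<string, { type: string }>;
  required: string[];
}

interface ToolContract {
  name: string;
  access: Access;
  description: string;
  parameters: JsonObjectSchema;
}

const NO_PARAMS: JsonObjectSchema = { type: "object", properties: {}, required: [] };
const BY_TRACE: JsonObjectSchema = {
  type: "object",
  properties: { trace_id: { type: "string" } },
  required: ["trace_id"],
};

const TOOLS: readonly ToolContract[] = [
  { name: "list_recent_traces", access: "read", description: "List recent scrubbed traces.", parameters: NO_PARAMS },
  { name: "get_trace_summary", access: "read", description: "Summarize one trace.", parameters: BY_TRACE },
  { name: "get_replayability_score", access: "read", description: "Score and grade one trace.", parameters: BY_TRACE },
  { name: "get_privacy_posture", access: "read", description: "Report the active privacy profile.", parameters: NO_PARAMS },
  { name: "get_diagnostic_summary", access: "read", description: "Summarize failures in one trace.", parameters: BY_TRACE },
  { name: "generate_playwright_repro", access: "read", description: "Return the compiled test as text.", parameters: BY_TRACE },
  { name: "create_export_preview", access: "draft", description: "Draft an export preview without writing it.", parameters: BY_TRACE },
  { name: "create_fs_export_preview", access: "draft", description: "Draft an export preview (fs variant).", parameters: BY_TRACE },
];

// One derived surface: generic function specs for tool-calling models.
// An MCP tool list or an OpenAPI document could be derived from TOOLS the same way.
function toFunctionSpecs(tools: readonly ToolContract[]) {
  return tools.map(({ name, description, parameters }) => ({ name, description, parameters }));
}

// Every surface dispatches through this lookup, so nothing outside the
// contract (for example a write or delete tool) can be invoked.
function resolveTool(name: string): ToolContract {
  const tool = TOOLS.find((t) => t.name === name);
  if (tool === undefined) throw new Error(`tool not in contract: ${name}`);
  return tool;
}
```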
Built to be audited, not just trusted
The privacy boundary is open source. Your reviewers can read exactly what is captured and what is dropped, line by line, before anything is deployed.
The trust boundary is the code
Every component is Apache-2.0: the SDK, the service core, the MCP connector, and the adapters. Built for regulated and quality-focused teams that self-host.
Mapped to the regulations your reviewers cite
SEC Reg S-P (2024)
Safeguards and recordkeeping. Incident records retained five years.
2026 interagency MRM guidance
Auditability, ongoing monitoring, and human oversight of model use.
NIST AI RMF
Data governance, documentation, accountability, incident response.
Questions, answered
The things technical and compliance reviewers ask first.
Is StepStitch session replay?
No. It captures the structure of what broke (route templates, stable selectors, API status codes), never screens, input values, page text, or raw URLs. It is issue-to-repro infrastructure, not session replay.
Is the generated test a real regression test?
Yes. A captured API failure becomes a page.waitForResponse that is armed before the triggering action, followed by a status assertion. A captured client exception becomes a pageerror assertion. The test fails while the bug is present and passes once it is fixed, so it is safe to keep in CI as a regression guard.
How long does self-hosting take?
Minutes. The service ships as a Docker image with a one-command Railway deploy; the SDK is an npm install with zero runtime dependencies. See the Self-host guide.
Is it compatible with HIPAA / SEC Reg S-P?
StepStitch is self-hosted and never captures PII, so customer data never leaves your boundary. The healthcare-strict profile disables free text entirely; the financial-services-enterprise profile scrubs and drops forbidden keys. See the Security page for the full crosswalk.
What frameworks does it work with?
Any web frontend. The SDK is framework-agnostic TypeScript that records structural footsteps; the compiled reproduction is standard Playwright.
Is StepStitch open source?
Yes, Apache-2.0 across the SDK, service core, MCP connector, and adapters. You can read exactly what is captured and what is dropped before deploying.
Book a pilot
Self-host the open-source core today, or talk to us about a managed pilot with white-glove integration and a compliance packet for your reviewers.