Open source, Apache-2.0

Turn a user-reported bug into a regression test.

When a customer hits a bug, your team can reproduce it and prove the fix. StepStitch never records their screen, their keystrokes, or their data.

generate_playwright_repro · Replayability grade B
import { test, expect } from '@playwright/test';

// StepStitch autogenerated reproduction (trace: trc_9f4c1ae2)
// Replayability: 0.76 (grade B)
//   ⚠ templated_route_needs_fixture [step 1]: substitute a concrete id.
test('StepStitch reproduction', async ({ page }) => {
  // TODO: authenticate as a synthetic test user if the flow requires it.

  await page.goto('/accounts/:id');
  await page.goto('/accounts/:id/transfer');
  await page.locator('[data-testid=payee-select]').click();
  await page.locator('[data-testid=amount-input]').click();

  // Match the request by route + method so it resolves whether or not the
  // bug is present, then assert on status. Red while broken, green once fixed.
  const response0 = page.waitForResponse(
    (r) =>
      /^\/api\/accounts\/[^/]+\/transfers$/.test(new URL(r.url()).pathname) &&
      r.request().method() === 'POST',
  );
  await page.locator('[data-testid=review-transfer]').click();
  // expected API failure: /api/accounts/:id/transfers (HTTP 500)
  const res0 = await response0;
  expect(res0.status(), 'no server error from /api/accounts/:id/transfers').toBeLessThan(500);
});
Generated from a scrubbed trace. It runs red while the bug is present and turns green once the fix lands.

In plain words

1. A customer hits a problem

Something breaks while they are using your app, like a payment that will not go through.

2. StepStitch records the steps, not the screen

It captures what they clicked and what failed. Never their screen, their typing, or any personal data.

3. Your team reproduces it and proves the fix

In one click it becomes a test that fails on the bug and passes once it is fixed, so it stays fixed.

Most engineering teams do not need another recording to watch. They need a user-reported bug that can become a regression test.

Session replay is a security camera

It records the screen, inputs, and PII. Useful, until an auditor asks why customer data left the building.

Error tracking is a crash sensor

It tells you where the code broke, but not the steps the user took to break it.

StepStitch is a flight recorder

It keeps only the structural steps, with no screens or values, and replays them as a test you can run.

Session replay

Captures the screen, then asks an engineer to watch it back.

  • Records pixels, text, and input values by default
  • Carries PII into a third-party tool you do not control
  • Often banned outright in regulated environments
  • Leaves you with a video, not a fix

StepStitch

Captures the structure of what broke, then compiles a test.

  • Route templates, stable selectors, API status codes
  • Scrubbed in the browser and again on the server
  • Self-hosted, so the data never leaves your boundary
  • Leaves you with a runnable Playwright reproduction

From one report to a merged fix

StepStitch perceives, scores, compiles, and drafts. It never plans or acts on its own. The autonomy stays in your stack.

Perceive

A user reports a bug. StepStitch stores a scrubbed, structural trace.

list_recent_traces

Score

A deterministic 0 to 1 score and an A to F grade indicate whether the bug will reproduce.

get_replayability_score

Reproduce

Fetch a deterministic Playwright test built from the trace. Text only.

generate_playwright_repro

Verify

Run it in your CI or sandbox. Red turns green once the fix lands.

get_verifications

Fix, human-gated

Open a pull request with the regression test. A reviewer merges, never the agent.

github_bridge

One report, two views

The same moment, from both sides

Your user keeps their screen, their inputs, and their data. Your engineers get structure, a score, and a runnable test.

A user hits a 500 on a transfer.

What your user sees

Transfer · review

Something went wrong (500)

What the developer sees

Awaiting a report. Capture is off until consent.

Live demo

See exactly what your team gets

A real example, live from a running StepStitch service. Click through the tabs to follow what happened, how reproducible it is, what was kept private, and the test it wrote automatically.


Not session replay, not error tracking

Those tools tell you something broke. StepStitch hands you a test that proves it, with nothing sensitive leaving your boundary. Even the open-source replay tools still record the screen.

| Capability | Session replay (FullStory, LogRocket) | Open-source replay (OpenReplay) | APM and errors (Sentry, Datadog) | StepStitch (issue-to-repro) |
| --- | --- | --- | --- | --- |
| Captures screens, page text, input values | By default | Records DOM | Often | Never |
| PII risk in the tool | High | Medium | Medium | Nothing sensitive captured |
| Proves the bug is reproducible | No | No | No | 0 to 1 score, A to F grade |
| Output | A video | Exports a script | A stack trace | Asserting Playwright regression test |
| Self-hosted and auditable | SaaS only | Open source | SaaS only | Apache-2.0, self-host |
| Native to agent networks | No | No | No | MCP, 8 read-only tools |

What ships today

A capability surface, not a roadmap

Every piece below is in the open-source repository, backed by a named test. Nothing here is a promise.

Two-layer privacy boundary

The SDK redacts in the page, but the backend never trusts the client. Every trace is scrubbed again on the server before it is stored. Defense in depth, proven by a named test.

Never stored: screenshots, input values, page text, raw URLs, request bodies, cookies and headers, SSNs and card numbers.
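The two layers can be pictured as one pure, idempotent scrub function that the SDK runs in the page and the server runs again on whatever arrives, so the backend never depends on the client having done its job. This is a minimal sketch under stated assumptions: the `TraceStep` shape, the forbidden-key list, and the SSN and card patterns are hypothetical and illustrative, not StepStitch's actual scrubber.

```typescript
// Hypothetical structural step; StepStitch's real trace schema may differ.
interface TraceStep {
  kind: string;
  url?: string;
  [key: string]: unknown;
}

// Keys assumed to be forbidden in any stored trace (illustrative, not the real list).
const FORBIDDEN_KEYS = new Set(['value', 'text', 'body', 'cookies', 'headers', 'screenshot']);

// Rough patterns for US SSNs and 13 to 19 digit card numbers (illustrative only).
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;
const CARD = /\b(?:\d[ -]?){13,19}\b/g;

// Keep only the path (drop query and fragment) and replace numeric or
// long hex/UUID-like segments with ":id" to produce a route template.
function templateUrl(raw: string): string {
  const { pathname } = new URL(raw, 'http://placeholder.invalid');
  return pathname
    .split('/')
    .map((seg) => (/^\d+$|^[0-9a-f-]{16,}$/i.test(seg) ? ':id' : seg))
    .join('/');
}

function redact(text: string): string {
  return text.replace(SSN, '[redacted]').replace(CARD, '[redacted]');
}

// Pure and idempotent: scrubbing an already scrubbed step changes nothing,
// so the server can safely rerun it on whatever the client sent.
function scrubStep(step: TraceStep): TraceStep {
  const out: TraceStep = { kind: step.kind };
  for (const [key, value] of Object.entries(step)) {
    if (key === 'kind' || FORBIDDEN_KEYS.has(key)) continue;
    if (key === 'url' && typeof value === 'string') {
      out.url = templateUrl(value);
    } else if (typeof value === 'string') {
      out[key] = redact(value);
    } else if (typeof value === 'number' || typeof value === 'boolean') {
      out[key] = value;
    }
    // Objects and arrays (headers, bodies, nested data) are dropped, not walked.
  }
  return out;
}
```

Dropping unknown nested values rather than walking them is the conservative choice: anything the scrubber does not understand is simply not stored.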

Replayability score

A deterministic 0 to 1 score with an A to F grade and warnings. Decide if a bug reproduces before anyone opens an editor.
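A deterministic score and grade can be sketched as a pure function over the warnings a compiled trace produces. The warning codes other than `templated_route_needs_fixture`, the penalty weights, and the grade cut-offs below are all hypothetical, chosen only so that 0.76 lands on a B like the sample reproduction at the top of the page; this is not StepStitch's actual scoring rule.

```typescript
// Hypothetical warning record attached to a compiled step.
interface ReplayWarning {
  code: string;
  step: number;
}

// Illustrative penalty weights; unknown codes cost a small default penalty.
const PENALTIES: Record<string, number> = {
  templated_route_needs_fixture: 0.12,
  missing_stable_selector: 0.2,
  auth_required: 0.1,
};
const DEFAULT_PENALTY = 0.05;

// Same warnings in, same score out: no clocks, no randomness.
function replayabilityScore(warnings: readonly ReplayWarning[]): number {
  const penalty = warnings.reduce((sum, w) => sum + (PENALTIES[w.code] ?? DEFAULT_PENALTY), 0);
  const clamped = Math.max(0, Math.min(1, 1 - penalty));
  // Round to two decimals so floating-point noise never changes the reported score.
  return Math.round(clamped * 100) / 100;
}

// Hypothetical cut-offs mapping the 0 to 1 score onto an A to F grade.
function grade(score: number): 'A' | 'B' | 'C' | 'D' | 'F' {
  if (score >= 0.9) return 'A';
  if (score >= 0.75) return 'B';
  if (score >= 0.6) return 'C';
  if (score >= 0.4) return 'D';
  return 'F';
}
```

With two templated-route warnings the sketch yields 1 - 0.24 = 0.76, a B. Rounding is what keeps the grade comparable between reports of the same trace.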

Deployment profiles

A profile can only tighten the privacy boundary, never loosen it.

  • financial-services-enterprise
  • healthcare-strict
  • internal-enterprise
  • open-source-default
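"Tighten, never loosen" falls out naturally if every field of a profile combines with the base policy by AND or by set union. The sketch below assumes a hypothetical `PrivacyPolicy` shape; only the idea that a strict profile such as healthcare-strict can switch free text off comes from this page.

```typescript
// Hypothetical policy shape; the real profile schema may differ.
interface PrivacyPolicy {
  allowFreeText: boolean;
  captureClientErrors: boolean;
  forbiddenKeys: ReadonlySet<string>;
}

// Tighten-only merge: a profile may turn capabilities off and add forbidden
// keys, but nothing it sets can turn a capability back on or remove a key.
function applyProfile(base: PrivacyPolicy, profile: Partial<PrivacyPolicy>): PrivacyPolicy {
  const forbiddenKeys = new Set<string>();
  base.forbiddenKeys.forEach((key) => forbiddenKeys.add(key));
  profile.forbiddenKeys?.forEach((key) => forbiddenKeys.add(key));
  return {
    // A missing field means "no change"; true can only narrow to false.
    allowFreeText: base.allowFreeText && (profile.allowFreeText ?? true),
    captureClientErrors: base.captureClientErrors && (profile.captureClientErrors ?? true),
    forbiddenKeys,
  };
}
```

Because AND and union are commutative and associative, stacking profiles in any order gives the same result, and no later profile can undo an earlier restriction.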

Drafts into your system of record

Flat, sanitized drafts. Draft-only, never an autonomous write.

ServiceNow, Salesforce, Genesys, Jira, and Zendesk, plus the DraftAdapter SDK.
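The DraftAdapter SDK's real interface is not shown on this page, so the sketch below only illustrates the shape of a draft-only contract: an adapter turns a scrubbed summary into a flat map of strings and has no method that writes anywhere. `TraceSummary`, `DraftAdapter`, `jiraLikeAdapter`, and the field names are all hypothetical.

```typescript
// Hypothetical summary handed to adapters; field names are illustrative.
interface TraceSummary {
  traceId: string;
  route: string; // templated, e.g. "/api/accounts/:id/transfers"
  status: number;
  replayability: number;
  grade: string;
}

// A draft is flat: string values only, no nesting, no attachments.
type Draft = Readonly<Record<string, string>>;

// Hypothetical contract: build a draft, never submit it. A human (or a
// separately authorized system) decides whether the draft is filed.
interface DraftAdapter {
  readonly target: string;
  toDraft(summary: TraceSummary): Draft;
}

// Illustrative Jira-style adapter; not the real Jira field mapping.
const jiraLikeAdapter: DraftAdapter = {
  target: 'jira',
  toDraft: (s) => ({
    summary: `HTTP ${s.status} on ${s.route}`,
    description: `StepStitch trace ${s.traceId}, replayability ${s.replayability.toFixed(2)} (grade ${s.grade}).`,
    labels: 'stepstitch,regression',
  }),
};
```

Keeping the contract to a pure `toDraft` function means an adapter cannot perform an autonomous write even by mistake.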

Deterministic compiler

The same trace always compiles the same Playwright test. Text only, never run against production.
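Determinism here means the compiler is a pure function from trace to text. The sketch assumes a hypothetical `ReproStep` format and shows how a captured API failure becomes an armed `page.waitForResponse` plus a status assertion, in the style of the sample reproduction at the top of the page. It only emits source text; it never runs the test.

```typescript
// Hypothetical step shapes; the real StepStitch trace format may differ.
type ReproStep =
  | { kind: 'navigate'; route: string }
  | { kind: 'click'; testId: string }
  | { kind: 'api_failure'; testId: string; method: string; route: string; status: number };

// "/api/accounts/:id/transfers" -> "^/api/accounts/[^/]+/transfers$":
// literal segments are regex-escaped, ":param" segments become wildcards.
function routeToRegexSource(route: string): string {
  const segments = route
    .split('/')
    .map((seg) => (seg.startsWith(':') ? '[^/]+' : seg.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')));
  return `^${segments.join('/')}$`;
}

const locator = (testId: string) => `page.locator(${JSON.stringify(`[data-testid=${testId}]`)})`;

// No clock, randomness, or I/O: the same trace yields byte-identical text.
function compile(traceId: string, steps: readonly ReproStep[]): string {
  const lines: string[] = [
    "import { test, expect } from '@playwright/test';",
    '',
    `// StepStitch autogenerated reproduction (trace: ${traceId})`,
    "test('StepStitch reproduction', async ({ page }) => {",
  ];
  steps.forEach((step, i) => {
    switch (step.kind) {
      case 'navigate':
        lines.push(`  await page.goto(${JSON.stringify(step.route)});`);
        break;
      case 'click':
        lines.push(`  await ${locator(step.testId)}.click();`);
        break;
      case 'api_failure': {
        const re = JSON.stringify(routeToRegexSource(step.route));
        const method = JSON.stringify(step.method);
        // 5xx: assert no server error; other statuses: assert that status is gone.
        const assertion = step.status >= 500 ? 'toBeLessThan(500)' : `not.toBe(${step.status})`;
        lines.push(
          // Arm the wait before the click that triggers the request.
          `  const response${i} = page.waitForResponse(`,
          `    (r) => new RegExp(${re}).test(new URL(r.url()).pathname) && r.request().method() === ${method},`,
          '  );',
          `  await ${locator(step.testId)}.click();`,
          `  const res${i} = await response${i};`,
          `  expect(res${i}.status()).${assertion};`,
        );
        break;
      }
    }
  });
  lines.push('});');
  return lines.join('\n');
}
```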

Repair loop and verified-fix corpus

A trace becomes a labeled GitHub issue and a regression-test pull request. A reviewer merges, never the agent. Only a pre-fail to post-pass transition is recorded as confirmed fixed.

pre: fail → post: pass = confirmed_fixed
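The rule "only a pre-fail to post-pass transition counts" can be written as a tiny classifier. `confirmed_fixed` is the label from this page; the other outcome names and the `Verification` shape are hypothetical.

```typescript
type RunResult = 'pass' | 'fail' | 'error';

// Hypothetical verification record: one run before the fix, one after.
interface Verification {
  pre: RunResult;
  post: RunResult;
}

type Outcome = 'confirmed_fixed' | 'not_reproduced' | 'still_failing' | 'inconclusive';

// Only red-to-green counts. A test that already passed before the fix
// never demonstrated the bug, so it proves nothing about the fix.
function classify(v: Verification): Outcome {
  if (v.pre === 'error' || v.post === 'error') return 'inconclusive';
  if (v.pre === 'pass') return 'not_reproduced';
  return v.post === 'pass' ? 'confirmed_fixed' : 'still_failing';
}
```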

Observability and kill switch

A zero-dependency Prometheus endpoint, audited reads, and an org-wide kill switch that fails safe on error.
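Both ideas fit in a few lines. A fail-safe kill switch treats any error while reading the flag as "capture off", and a zero-dependency metrics endpoint only needs to print the Prometheus text exposition format. The flag reader and the metric name below are hypothetical.

```typescript
// Hypothetical reader that resolves true when the org-wide kill switch is engaged.
type KillSwitchReader = () => Promise<boolean>;

// Fail safe: if the flag cannot be read, capture is treated as disabled.
async function captureEnabled(readKillSwitchEngaged: KillSwitchReader): Promise<boolean> {
  try {
    return !(await readKillSwitchEngaged());
  } catch {
    return false;
  }
}

// Render counters in the Prometheus text exposition format, no client library.
function renderMetrics(counters: Record<string, { help: string; value: number }>): string {
  return (
    Object.entries(counters)
      .map(([name, c]) => `# HELP ${name} ${c.help}\n# TYPE ${name} counter\n${name} ${c.value}`)
      .join('\n') + '\n'
  );
}
```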

Bring your own agentic network

StepStitch is a capability provider, not an agent orchestrator. One MCP server surfaces eight read-only and draft tools, and any agent network can consume them. The autonomy lives in your stack.

Eight Copilot-safe tools

  • list_recent_traces
  • get_trace_summary
  • get_replayability_score
  • get_privacy_posture
  • get_diagnostic_summary
  • generate_playwright_repro
  • create_export_preview
  • create_fs_export_preview

Destructive operations stay off the agent surface. Delete, purge, kill switch, and direct writes are admin-only and human-gated.
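Keeping destructive operations off the agent surface is easiest to reason about as an allowlist: the eight tool names listed above are callable, and everything else is refused by default. The guard function and the `purge_traces` name in the usage check are hypothetical.

```typescript
// The eight agent-facing tools named on this page; nothing else is exposed.
const AGENT_TOOLS: ReadonlySet<string> = new Set([
  'list_recent_traces',
  'get_trace_summary',
  'get_replayability_score',
  'get_privacy_posture',
  'get_diagnostic_summary',
  'generate_playwright_repro',
  'create_export_preview',
  'create_fs_export_preview',
]);

// Hypothetical dispatcher guard. Admin-only operations (delete, purge,
// kill switch, direct writes) are unreachable because they are not listed.
function assertAgentCallable(tool: string): void {
  if (!AGENT_TOOLS.has(tool)) {
    throw new Error(`tool "${tool}" is not exposed to agents`);
  }
}
```

An allowlist fails closed: a new admin operation added later stays off the agent surface unless someone deliberately lists it.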

Works with any MCP client

The same contract is surfaced three ways: an MCP server, an OpenAPI connector for Copilot Studio, and function specs for tool-calling models.

Microsoft Copilot Studio
OpenAI
Claude
LangGraph
AWS Bedrock
Google Vertex
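Surfacing one contract several ways usually means keeping a single tool definition and rendering it per client. The sketch shows two of the three surfaces: an MCP tool descriptor (JSON Schema under `inputSchema`) and an OpenAI-style function tool (the same schema under `function.parameters`). The `ToolDef` shape and the `trace_id` parameter are hypothetical, and the OpenAPI connector for Copilot Studio is not shown.

```typescript
// Hypothetical single source of truth for one tool.
interface ToolDef {
  name: string;
  description: string;
  parameters: { type: 'object'; properties: Record<string, unknown>; required?: string[] };
}

const getReplayabilityScore: ToolDef = {
  name: 'get_replayability_score',
  description: 'Deterministic 0 to 1 replayability score and A to F grade for a trace.',
  parameters: {
    type: 'object',
    properties: { trace_id: { type: 'string' } }, // hypothetical parameter name
    required: ['trace_id'],
  },
};

// MCP tool descriptors carry the JSON Schema under `inputSchema`.
const toMcpTool = (t: ToolDef) => ({
  name: t.name,
  description: t.description,
  inputSchema: t.parameters,
});

// OpenAI-style function tools wrap the same schema under `function.parameters`.
const toOpenAiTool = (t: ToolDef) => ({
  type: 'function' as const,
  function: { name: t.name, description: t.description, parameters: t.parameters },
});
```

Because both renderings share one schema object, the surfaces cannot drift apart.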

Built to be audited, not just trusted

The privacy boundary is open source. Your reviewers can read exactly what is captured and what is dropped, line by line, before anything is deployed.

The trust boundary is the code

Every component is Apache-2.0: the SDK, the service core, the MCP connector, and the adapters. Built for regulated and quality-focused teams that self-host.

  • test_scrubber.py
  • test_profiles.py
  • test_golden_path.py
  • test_repro_eval.py
  • .importlinter
  • test_compliance.py

Read the compliance evidence

Mapped to the regulations your reviewers cite

SEC Reg S-P (2024)

Safeguards and recordkeeping. Incident records retained five years.

2026 interagency MRM guidance

Auditability, ongoing monitoring, and human oversight of model use.

NIST AI RMF

Data governance, documentation, accountability, incident response.

Questions, answered

The things technical and compliance reviewers ask first.

Is StepStitch session replay?

No. It captures the structure of what broke (route templates, stable selectors, API status codes), never screens, input values, page text, or raw URLs. It is issue-to-repro infrastructure, not session replay.

Is the generated test a real regression test?

Yes. A captured API failure becomes an armed page.waitForResponse plus a status assertion; a captured client exception becomes a pageerror assertion. The test fails while the bug is present and passes once it is fixed, so it is safe to keep in CI as a regression guard.

How long does self-hosting take?

Minutes. The service ships as a Docker image with a one-command Railway deploy; the SDK is an npm install with zero runtime dependencies. See the Self-host guide.

Is it compatible with HIPAA / SEC Reg S-P?

StepStitch is self-hosted and never captures PII, so customer data never leaves your boundary. The healthcare-strict profile disables free text entirely; the financial-services-enterprise profile scrubs values and drops forbidden keys. See the Security page for the full crosswalk.

What frameworks does it work with?

Any web frontend. The SDK is framework-agnostic TypeScript that records structural footsteps; the compiled reproduction is standard Playwright.

Is StepStitch open source?

Yes, Apache-2.0 across the SDK, service core, MCP connector, and adapters. You can read exactly what is captured and what is dropped before deploying.

Book a pilot

Self-host the open-source core today, or talk to us about a managed pilot with white-glove integration and a compliance packet for your reviewers.