What is a replayability score?

A deterministic 0 to 1 score with an A to F grade and warnings that tells you whether a user-reported bug is reproducible, before anyone opens an editor.

How does StepStitch protect PII?

The SDK redacts in the browser and the server scrubs every trace again before storage. It never stores screenshots, input values, page text, request bodies, cookies, or identifiers like SSNs and card numbers.

Open source, Apache-2.0

Turn a user-reported bug into a regression test.

Is StepStitch session replay?

No. StepStitch captures the structure of what broke (route templates, stable selectors, API status codes), never screens, input values, page text, or raw URLs. It is issue-to-repro infrastructure, not session replay.

Is StepStitch open source?

Yes. The SDK, service core, MCP connector, and adapters are all Apache-2.0 and self-hostable, so your reviewers can audit exactly what is captured and what is dropped.

When a customer hits a bug, your team can reproduce it and prove the fix. StepStitch never records their screen, their keystrokes, or their data.


generate_playwright_repro · Replayability A

import { test, expect } from '@playwright/test';

// StepStitch autogenerated reproduction (trace: trc_9f4c1ae2)
// Replayability: 0.76 (grade B)
//   ⚠ templated_route_needs_fixture [step 1]: substitute a concrete id.
test('StepStitch reproduction', async ({ page }) => {
  // TODO: authenticate as a synthetic test user if the flow requires it.

  await page.goto('/accounts/:id');
  await page.goto('/accounts/:id/transfer');
  await page.locator('[data-testid=payee-select]').click();
  await page.locator('[data-testid=amount-input]').click();

  // Match the request by route + method so it resolves whether or not the
  // bug is present, then assert on status. Red while broken, green once fixed.
  const response0 = page.waitForResponse(
    (r) => r.url().includes('/api/accounts/') && r.request().method() === 'POST',
  );
  await page.locator('[data-testid=review-transfer]').click();
  // expected API failure: /api/accounts/:id/transfers (HTTP 500)
  const res0 = await response0;
  expect(res0.status(), 'no server error from /api/accounts/:id/transfers').toBeLessThan(500);
});

Generated from a scrubbed trace. It runs red while the bug is present, and the fix turns it green.
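The warning in the generated test asks for a concrete id in place of `:id` before the test can run. A minimal sketch of one way to do that substitution follows. `fillRoute` and the fixture id `acct_test_001` are hypothetical and are not part of StepStitch.

```typescript
// Hypothetical helper: replace ":name" placeholders in a route template
// with fixture values. Not part of the StepStitch SDK.
type RouteParams = { [name: string]: string | undefined };

function fillRoute(template: string, params: RouteParams): string {
  return template.replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, (_match: string, name: string) => {
    const value = params[name];
    if (value === undefined) {
      throw new Error(`No fixture value for route parameter "${name}"`);
    }
    // Encode so a fixture value cannot change the shape of the path.
    return encodeURIComponent(value);
  });
}

// Assumed synthetic fixture id; use whatever your test environment seeds.
const fixture: RouteParams = { id: 'acct_test_001' };
const accountUrl = fillRoute('/accounts/:id', fixture);
const transferUrl = fillRoute('/accounts/:id/transfer', fixture);
```

In the generated test, the two navigation steps would then read `await page.goto(accountUrl)` and `await page.goto(transferUrl)`.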

In plain words

A customer hits a problem

Something breaks while they are using your app, like a payment that will not go through.

StepStitch records the steps, not the screen

It captures what they clicked and what failed. Never their screen, their typing, or any personal data.

Your team reproduces it and proves the fix

In one click it becomes a test that fails on the bug and passes once it is fixed, so it stays fixed.

Most engineering teams do not need another recording to watch. They need a user-reported bug that can become a regression test.

Session replay is a security camera

It records the screen, inputs, and PII. Useful, until an auditor asks why customer data left the building.

Error tracking is a crash sensor

It tells you where the code broke, but not the steps the user took to break it.

StepStitch is a flight recorder

It keeps the structural steps, no screens or values, and replays them as a test you can run.

Session replay

Captures the screen, then asks an engineer to watch it back.

Records pixels, text, and input values by default
Carries PII into a third-party tool you do not control
Often banned outright in regulated environments
Leaves you with a video, not a fix

StepStitch

Captures the structure of what broke, then compiles a test.

Route templates, stable selectors, API status codes
Scrubbed in the browser and again on the server
Self-hosted, so the data never leaves your boundary
Leaves you with a runnable Playwright reproduction
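To make "route template" concrete: a raw path carries an identifier, while the template keeps only its shape. The sketch below assumes that identifier-like segments are purely numeric or UUID-shaped. The SDK's actual rules are not described on this page, and `toRouteTemplate` is a hypothetical name.

```typescript
// Hypothetical illustration of templating a raw path. The real SDK's
// matching rules are not documented here; these two patterns are assumptions.
const NUMERIC_SEGMENT = /^\d+$/;
const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function toRouteTemplate(rawUrl: string): string {
  // Drop the query string and fragment, which can carry user data.
  const path = rawUrl.replace(/[?#][\s\S]*$/, '');
  return path
    .split('/')
    .map((segment) =>
      NUMERIC_SEGMENT.test(segment) || UUID_SEGMENT.test(segment) ? ':id' : segment,
    )
    .join('/');
}

const template = toRouteTemplate('/accounts/48213/transfer?ref=email');
// template is '/accounts/:id/transfer'
```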

From one report to a merged fix

StepStitch perceives, scores, compiles, and drafts. It never plans or acts on its own. The autonomy stays in your stack.

Perceive

A user reports a bug. StepStitch stores a scrubbed, structural trace.

list_recent_traces

Score

A deterministic 0 to 1 score and an A to F grade indicate whether the bug reproduces.

get_replayability_score

Reproduce

Fetch a deterministic Playwright test built from the trace. Text only.

generate_playwright_repro

Verify

Run it in your CI or sandbox. Red turns green once the fix lands.

get_verifications

Fix, human-gated

Open a pull request with the regression test. A reviewer merges, never the agent.

github_bridge

One report, two views

The same moment, from both sides

Your user keeps their screen, their inputs, and their data. Your engineers get structure, a score, and a runnable test. Step through the whole workflow.

A user hits a 500 on a transfer.

What your user sees

Transfer · review

Something went wrong (500)

What the developer sees

Awaiting a report. Capture is off until consent.

Live demo

See exactly what your team gets

A real example, live from a running StepStitch service. Click through the tabs to follow what happened, how reproducible it is, what was kept private, and the test it wrote automatically.


Not session replay, not error tracking

Those tools tell you something broke. StepStitch hands you a test that proves it, with nothing sensitive leaving your boundary. Even the open-source replay tools still record the screen.

Capability	Session replay (FullStory, LogRocket)	OpenReplay (open-source replay)	APM and errors (Sentry, Datadog)	StepStitch (issue-to-repro)
Captures screens, page text, input values	By default	Records DOM	Often	Never
PII risk in the tool	High	Medium	Medium	Nothing sensitive captured
Proves the bug is reproducible	No	No	No	0 to 1 score, A to F grade
Output is a regression test	A video	Exports a script	A stack trace	Asserting Playwright test
Self-hosted and auditable	SaaS only	Open source	SaaS only	Apache-2.0, self-host
Native to agent networks	No	No	No	MCP, 8 read-only tools

What ships today

A capability surface, not a roadmap

Every piece below is in the open-source repository, backed by a named test. Nothing here is a promise.

Two-layer privacy boundary

The SDK redacts in the page, but the backend never trusts the client. Every trace is scrubbed again on the server before it is stored. Defense in depth, proven by a named test.

screenshots, input values, page text, raw URLs, request bodies, cookies & headers, SSNs & card numbers
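A rough sketch of the two-layer idea: the same scrub runs in the browser and again on the server, and the server pass does not assume the browser pass happened. The key names, patterns, and function names here are assumptions for illustration, not StepStitch's actual rules.

```typescript
// Sketch only: key names and patterns are assumptions, not StepStitch's rules.
type TraceValue = string | number | boolean | null;
type TraceEvent = { [key: string]: TraceValue };

const FORBIDDEN_KEYS = [
  'screenshot', 'inputValue', 'pageText', 'rawUrl', 'requestBody', 'cookies', 'headers',
];
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;
// 13 to 16 digits, optionally separated by single spaces or hyphens.
const CARD_PATTERN = /\b\d(?:[ -]?\d){12,15}\b/g;

function scrubEvent(event: TraceEvent): TraceEvent {
  const clean: TraceEvent = {};
  for (const key of Object.keys(event)) {
    if (FORBIDDEN_KEYS.indexOf(key) !== -1) continue; // dropped, never stored
    const value = event[key];
    if (value === undefined) continue;
    clean[key] =
      typeof value === 'string'
        ? value.replace(SSN_PATTERN, '[redacted]').replace(CARD_PATTERN, '[redacted]')
        : value;
  }
  return clean;
}

// Layer 1: runs in the page before anything is sent.
function redactInBrowser(events: TraceEvent[]): TraceEvent[] {
  return events.map((event) => scrubEvent(event));
}

// Layer 2: runs on the server before storage and treats input as untrusted.
function scrubOnServer(received: TraceEvent[]): TraceEvent[] {
  return received.map((event) => scrubEvent(event));
}
```

Because the scrub is idempotent, running it twice on an honest client's payload changes nothing, while a payload that skipped layer 1 is still cleaned by layer 2.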

Replayability score

A deterministic 0 to 1 score with an A to F grade and warnings. Decide whether a bug reproduces before anyone opens an editor.
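As an illustration of what a deterministic score-to-grade mapping could look like, here is a small sketch. The cut-offs are assumed, since this page does not state the real ones. They are chosen only so that the 0.76 in the sample test maps to B.

```typescript
type Grade = 'A' | 'B' | 'C' | 'D' | 'F';

// ASSUMED cut-offs for illustration only; the real thresholds are not
// published on this page.
const GRADE_FLOORS: Array<[number, Grade]> = [
  [0.9, 'A'],
  [0.75, 'B'],
  [0.6, 'C'],
  [0.4, 'D'],
];

// Pure function: the same score always yields the same grade.
function gradeFor(score: number): Grade {
  if (!(score >= 0 && score <= 1)) {
    throw new RangeError(`score must be between 0 and 1, got ${score}`);
  }
  for (const [floor, grade] of GRADE_FLOORS) {
    if (score >= floor) return grade;
  }
  return 'F';
}

const sampleGrade = gradeFor(0.76);
```

An agent or CI step could use the grade as a gate, for example by only requesting a reproduction when the grade is C or better.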

Deployment profiles

A profile can only tighten the privacy boundary, never loosen it.

financial-services-enterprise
healthcare-strict
internal-enterprise
open-source-default
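One way to enforce "tighten only" is to make the merge itself unable to loosen anything. The sketch below assumes a policy with a free-text flag and a forbidden-key list. Both field names, and `applyProfile`, are hypothetical.

```typescript
// Hypothetical policy shape; the field names are assumptions.
interface PrivacyPolicy {
  allowFreeText: boolean;
  forbiddenKeys: string[];
}

function applyProfile(base: PrivacyPolicy, profile: Partial<PrivacyPolicy>): PrivacyPolicy {
  // Forbidden keys are a union: a profile can add keys but not remove any.
  const merged = base.forbiddenKeys.slice();
  for (const key of profile.forbiddenKeys || []) {
    if (merged.indexOf(key) === -1) merged.push(key);
  }
  return {
    // Free text stays allowed only if the base allows it and the profile
    // does not turn it off. A profile asking for `true` cannot re-enable it.
    allowFreeText: base.allowFreeText && profile.allowFreeText !== false,
    forbiddenKeys: merged,
  };
}

const basePolicy: PrivacyPolicy = { allowFreeText: true, forbiddenKeys: ['inputValue'] };
const strictPolicy = applyProfile(basePolicy, { allowFreeText: false, forbiddenKeys: ['freeText'] });
const loosenAttempt = applyProfile(strictPolicy, { allowFreeText: true, forbiddenKeys: [] });
```

Here `loosenAttempt` still has free text disabled and still forbids both keys, because nothing in the merge can subtract from the stricter side.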

Drafts into your system of record

Flat, sanitized drafts. Draft-only, never an autonomous write.

ServiceNow, Salesforce, Genesys, Jira, Zendesk, + DraftAdapter SDK

Deterministic compiler

The same trace always compiles the same Playwright test. Text only, never run against production.
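Determinism here means the compiler is a pure function of the trace: no clock, no randomness, no network. A toy sketch under that assumption follows. The `Step` shape and `compileSteps` are hypothetical and far smaller than a real trace schema.

```typescript
// Hypothetical step shape; the real trace schema is not shown on this page.
type Step =
  | { kind: 'goto'; route: string }
  | { kind: 'click'; selector: string };

// Pure function of its input: it only emits text and never drives a browser.
function compileSteps(steps: Step[]): string {
  const lines = steps.map((step) =>
    step.kind === 'goto'
      ? `  await page.goto(${JSON.stringify(step.route)});`
      : `  await page.locator(${JSON.stringify(step.selector)}).click();`,
  );
  return ["test('StepStitch reproduction', async ({ page }) => {", ...lines, '});'].join('\n');
}

const demoSteps: Step[] = [
  { kind: 'goto', route: '/accounts/:id/transfer' },
  { kind: 'click', selector: '[data-testid=review-transfer]' },
];
const firstCompile = compileSteps(demoSteps);
const secondCompile = compileSteps(demoSteps);
```

Because the output is byte-for-byte identical across runs, the generated test can be diffed, reviewed, and cached like any other source file.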

Repair loop and verified-fix corpus

A trace becomes a labeled GitHub issue and a regression-test pull request. A reviewer merges, never the agent. Only a pre-fail to post-pass transition is recorded as confirmed fixed.

pre: fail → post: pass = confirmed_fixed
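The rule can be sketched as a small classifier over the two test runs. Only `confirmed_fixed` comes from this page. The other verdict labels and the `classifyVerification` name are hypothetical.

```typescript
type RunResult = 'pass' | 'fail';

// Only 'confirmed_fixed' appears on this page; the other labels are
// hypothetical names for the remaining combinations.
type Verdict = 'confirmed_fixed' | 'still_failing' | 'not_reproduced' | 'regressed';

function classifyVerification(pre: RunResult, post: RunResult): Verdict {
  if (pre === 'fail' && post === 'pass') return 'confirmed_fixed';
  if (pre === 'fail' && post === 'fail') return 'still_failing';
  // A test that passed before the fix never demonstrated the bug,
  // so a later pass proves nothing about the fix.
  if (pre === 'pass' && post === 'pass') return 'not_reproduced';
  return 'regressed';
}

const verdict = classifyVerification('fail', 'pass');
```

The point of the pre-fix run is the third case: without a recorded failure first, a green test is not evidence of a fix.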

Observability and kill switch

A zero-dependency Prometheus endpoint, audited reads, and an org-wide kill switch that fails safe on error.
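"Fails safe on error" can be read as: if the switch state cannot be determined, behave as though the switch is on. A minimal sketch under that assumption follows. `readKillSwitch` is a hypothetical stand-in for however the flag is stored.

```typescript
// Hypothetical: `readKillSwitch` returns true when the org-wide switch is on.
// It may throw if the backing store is unreachable.
function isCaptureAllowed(readKillSwitch: () => boolean): boolean {
  try {
    return !readKillSwitch();
  } catch {
    // Fail safe: an unknown switch state is treated as "switch on",
    // so capture stops rather than continuing unchecked.
    return false;
  }
}

const allowedWhenOff = isCaptureAllowed(() => false);
const allowedWhenStoreDown = isCaptureAllowed(() => {
  throw new Error('flag store unreachable');
});
```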

Bring your own agentic network

StepStitch is a capability provider, not an agent orchestrator. One MCP server surfaces eight read-only and draft tools. Any agent network consumes them. The autonomy lives in your stack.

Eight Copilot-safe tools

list_recent_traces
get_trace_summary
get_replayability_score
get_privacy_posture
get_diagnostic_summary
generate_playwright_repro
create_export_preview
create_fs_export_preview

Destructive operations stay off the agent surface. Delete, purge, kill switch, and direct writes are admin-only and human-gated.
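Keeping destructive operations off the agent surface amounts to an allowlist check in front of tool dispatch. The sketch below uses the eight tool names listed on this page. `isAgentCallable` and the example name `purge_traces` are hypothetical.

```typescript
// The eight tool names are taken from this page; everything else is a sketch.
const AGENT_TOOLS: readonly string[] = [
  'list_recent_traces',
  'get_trace_summary',
  'get_replayability_score',
  'get_privacy_posture',
  'get_diagnostic_summary',
  'generate_playwright_repro',
  'create_export_preview',
  'create_fs_export_preview',
];

// Default deny: anything not explicitly listed is refused.
function isAgentCallable(toolName: string): boolean {
  return AGENT_TOOLS.indexOf(toolName) !== -1;
}

const scoreAllowed = isAgentCallable('get_replayability_score');
// 'purge_traces' is a made-up name standing in for an admin-only operation.
const purgeAllowed = isAgentCallable('purge_traces');
```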

Works with any MCP client

The same contract is surfaced three ways: an MCP server, an OpenAPI connector for Copilot Studio, and function specs for tool-calling models.

Microsoft Copilot Studio
OpenAI
Claude
LangGraph
AWS Bedrock
Google Vertex

Built to be audited, not just trusted

The privacy boundary is open source. Your reviewers can read exactly what is captured and what is dropped, line by line, before anything is deployed.

The trust boundary is the code

Every component is Apache-2.0: the SDK, the service core, the MCP connector, and the adapters. Built for regulated and quality-focused teams that self-host.

test_scrubber.py, test_profiles.py, test_golden_path.py, test_repro_eval.py, .importlinter, test_compliance.py

Read the compliance evidence

Mapped to the regulations your reviewers cite

SEC Reg S-P (2024)

Safeguards and recordkeeping. Incident records retained five years.

2026 interagency MRM guidance

Auditability, ongoing monitoring, and human oversight of model use.

NIST AI RMF

Data governance, documentation, accountability, incident response.

Questions, answered

The things technical and compliance reviewers ask first.

Is StepStitch session replay?

No. It captures the structure of what broke (route templates, stable selectors, API status codes), never screens, input values, page text, or raw URLs. It is issue-to-repro infrastructure, not session replay.

Is the generated test a real regression test?

Yes. A captured API failure becomes an armed page.waitForResponse plus a status assertion; a captured client exception becomes a pageerror assertion. The test fails while the bug is present and passes once it is fixed, so it is safe to keep in CI as a regression guard.
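The API-failure half of that answer is shown in the sample test at the top of the page. For the client-exception half, a sketch of what a pageerror assertion might look like follows. `collectPageErrors` is a hypothetical helper, and the narrow `PageErrorSource` interface exists only so the sketch runs without Playwright installed. It is intended to be compatible with Playwright's `page.on('pageerror', listener)`.

```typescript
// Narrow structural type so this sketch does not need Playwright to compile.
interface PageErrorSource {
  on(event: 'pageerror', listener: (error: Error) => void): unknown;
}

// Hypothetical helper: start collecting uncaught client exceptions.
// Call it before replaying the steps so nothing is missed.
function collectPageErrors(page: PageErrorSource): Error[] {
  const errors: Error[] = [];
  page.on('pageerror', (error) => {
    errors.push(error);
  });
  return errors;
}

// Inside a Playwright test, usage would look roughly like:
//   const errors = collectPageErrors(page);
//   ...replay the captured steps...
//   expect(errors, 'no uncaught client exception').toHaveLength(0);
```

As with the status assertion, this is red while the exception is still thrown and green once the fix removes it.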

How long does self-hosting take?

Minutes. The service ships as a Docker image with a one-command Railway deploy; the SDK is an npm install with zero runtime dependencies. See the Self-host guide.

Is it compatible with HIPAA / SEC Reg S-P?

StepStitch is self-hosted and never captures PII, so customer data never leaves your boundary. The healthcare-strict profile disables free text entirely; the financial-services-enterprise profile scrubs and drops forbidden keys. See the Security page for the full crosswalk.

What frameworks does it work with?

Any web frontend. The SDK is framework-agnostic TypeScript that records structural footsteps; the compiled reproduction is standard Playwright.

Is StepStitch open source?

Yes, Apache-2.0 across the SDK, service core, MCP connector, and adapters. You can read exactly what is captured and what is dropped before deploying.

Book a pilot

Self-host the open-source core today, or talk to us about a managed pilot with white-glove integration and a compliance packet for your reviewers.