StreamMDX Perf Harness
This harness streams a fixture through the real StreamingMarkdown renderer in the docs app and collects:
- flush metrics (RendererMetrics)
- long tasks (filtered to the stream window)
- rAF cadence
- optional memory samples (Chromium only)
- optional CDP performance metrics (task/script/layout/recalc/paint deltas)
- optional DOM counters (nodes/event listeners)
It is local-only by design and is intended for iterative optimization, with the regression suite serving as a hard lock on behavior.
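As a rough illustration of two of the collected signals, the sketch below filters long-task entries to a stream window and turns rAF timestamps into frame intervals. The function names and the overlap rule are assumptions made for this example; the harness may filter differently (for instance by start time only).

```typescript
// Minimal shape of a timed entry; it mirrors the startTime/duration fields
// of a browser PerformanceEntry so real long-task entries fit this type.
interface TimedEntry {
  startTime: number; // ms since time origin
  duration: number; // ms
}

// Keep only entries that overlap the [streamStart, streamEnd] window.
// Assumption: "filtered to the stream window" means overlap, not containment.
function longTasksInWindow(
  entries: TimedEntry[],
  streamStart: number,
  streamEnd: number,
): TimedEntry[] {
  return entries.filter(
    (e) => e.startTime < streamEnd && e.startTime + e.duration > streamStart,
  );
}

// Convert successive rAF callback timestamps into frame-to-frame intervals (ms).
function rafIntervals(timestamps: number[]): number[] {
  const deltas: number[] = [];
  let prev: number | undefined;
  for (const t of timestamps) {
    if (prev !== undefined) deltas.push(t - prev);
    prev = t;
  }
  return deltas;
}
```

Both helpers are pure, so they can be applied to recorded run data after the fact as well as to live browser entries.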
Methodology contract
Keep the public benchmark story disciplined:
- parity workloads are the only direct cross-engine comparison set
- capability workloads exist to show StreamMDX behavior on richer feature mixes, not to manufacture apples-to-oranges claims
- CI locked is the claim-grade live mode
- Explore is the diagnosis mode
- shipped client bundle, hosted worker asset, runtime loaded code, and peak memory are separate cost categories
Prereqs
Sync the fixtures, then start the docs dev server (port 3000):
npm run perf:sync-fixtures
npm run docs:dev
Run the harness
npm run perf:harness -- --fixture naive-bayes --scenario S2_typical --scheduling aggressive --runs 3 --warmup 1
If you want to compare syntax highlighting modes, add features.codeHighlighting to the streaming config in the demo/harness (default is "incremental"). A typical workflow is:
- Run a baseline with codeHighlighting: "final".
- Run a candidate with codeHighlighting: "incremental" or "live".
- Compare summaries with perf:compare.
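The baseline/candidate pair from the workflow above could be expressed along these lines. Only the features.codeHighlighting key and its three values come from this README; the surrounding config type and the helper name are hypothetical.

```typescript
// Highlighting modes named in this README.
type CodeHighlightingMode = "final" | "incremental" | "live";

// Hypothetical config shape: the real streaming config has more fields.
interface StreamingConfigSketch {
  features: { codeHighlighting: CodeHighlightingMode };
}

// Build a config for one run; "incremental" is the documented default.
function withHighlighting(
  mode: CodeHighlightingMode = "incremental",
): StreamingConfigSketch {
  return { features: { codeHighlighting: mode } };
}

const baselineConfig = withHighlighting("final");
const candidateConfig = withHighlighting("live");
```

Keeping everything else identical between the two configs is what makes the perf:compare result attributable to the highlighting mode.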
Outputs land in:
tmp/perf-runs/<fixture>-<scenario>-<timestamp>/
run.json
summary.json
summary.txt
Fixtures and scenarios
- Fixtures: tests/regression/fixtures/*.md
- Scenarios: tests/regression/scenarios/*.json
Sync them into the docs public folder when fixtures change:
npm run perf:sync-fixtures
Example scenario format:
{
  "id": "S2_typical",
  "label": "Typical streaming",
  "updateIntervalMs": 16,
  "charRateCps": 1200,
  "maxChunkChars": 256
}
Scheduling presets
The harness accepts --scheduling (default, smooth, aggressive) and optional overrides. If omitted, it defaults to aggressive.
When you translate those harness knobs into the public benchmark surface, keep only these two interpretations:
- CI locked: fixed scheduler behavior for reproducible local comparisons
- Explore: freer tuning for diagnosis, never for public claim language
Raw preset names such as smooth or aggressive are implementation details; they are not public benchmark categories on their own.
Supported raw overrides:
--batch microtask|timeout|rAF
--frameBudgetMs 8
--maxBatchesPerFlush 8
--lowPriorityFrameBudgetMs 4
--maxLowPriorityBatchesPerFlush 2
--urgentQueueThreshold 3
--historyLimit 200
--startupMicrotaskFlushes 4
--adaptiveBudgeting true|false
--adaptiveSwitch true|false
--adaptiveQueueThreshold 12
These map to StreamingSchedulerOptions in @stream-mdx/react.
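To make the flag-to-option mapping concrete, here is a minimal sketch of turning the override flags into a plain options object. It assumes each option key matches its flag name; the actual StreamingSchedulerOptions type and the harness's own argument parser may differ.

```typescript
// Flag names taken from the override list above, grouped by value type.
const NUMERIC_FLAGS = [
  "frameBudgetMs", "maxBatchesPerFlush", "lowPriorityFrameBudgetMs",
  "maxLowPriorityBatchesPerFlush", "urgentQueueThreshold", "historyLimit",
  "startupMicrotaskFlushes", "adaptiveQueueThreshold",
];
const BOOLEAN_FLAGS = ["adaptiveBudgeting", "adaptiveSwitch"];
const BATCH_MODES = ["microtask", "timeout", "rAF"];

type OverrideValue = string | number | boolean;

// Hypothetical parser: reads "--name value" pairs and ignores unknown flags.
function parseSchedulerOverrides(argv: string[]): Record<string, OverrideValue> {
  const out: Record<string, OverrideValue> = {};
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (flag === undefined || value === undefined || flag.slice(0, 2) !== "--") continue;
    const name = flag.slice(2);
    if (name === "batch") {
      if (BATCH_MODES.indexOf(value) === -1) throw new Error("invalid --batch: " + value);
      out[name] = value;
    } else if (NUMERIC_FLAGS.indexOf(name) !== -1) {
      const n = Number(value);
      if (Number.isNaN(n)) throw new Error("invalid number for --" + name);
      out[name] = n;
    } else if (BOOLEAN_FLAGS.indexOf(name) !== -1) {
      if (value !== "true" && value !== "false") throw new Error("invalid boolean for --" + name);
      out[name] = value === "true";
    }
  }
  return out;
}
```

Rejecting malformed values early keeps a typo in an override from silently producing a run with different scheduler behavior than intended.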
Interpretation rule:
- use a fixed scheduler preset when comparing candidate vs baseline
- do not change scheduler knobs mid-comparison and then treat the result as a pure renderer win/loss
- if you need claim-grade browser comparisons, line them up with the locked methodology described in [SCHEDULING_AND_JITTER.md](./SCHEDULING_AND_JITTER.md)
- if you are running the rich capability workload, do not report it as a direct Streamdown/react-markdown parity result
Useful scheduler-specific commands:
npm run test:regression:scheduler-parity
STREAM_MDX_PERF_BASE_URL=http://127.0.0.1:3012 npm run perf:characterize:scheduler
- test:regression:scheduler-parity checks that representative fixtures converge to the same final HTML under smooth, timeout, and microtask.
- perf:characterize:scheduler writes a local CI locked vs Explore summary to tmp/perf-runs/scheduler-characterization/.
Optional React profiler
Capture React commit durations for the StreamingMarkdown subtree:
--profiler
This adds profiler actual/base stats to the summary output.
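For readers unfamiliar with where actual/base stats come from, the sketch below collects them from a callback shaped like React's Profiler onRender (id, phase, actualDuration, baseDuration, ...). The collector and its summary fields are illustrative assumptions, not the harness's actual output schema.

```typescript
interface CommitSample {
  phase: string;
  actualDuration: number; // ms spent rendering this commit
  baseDuration: number; // estimated ms for a full, unmemoized render
}

function createCommitCollector() {
  const samples: CommitSample[] = [];

  // Matches the leading parameters of React's <Profiler onRender> callback.
  const onRender = (
    _id: string,
    phase: string,
    actualDuration: number,
    baseDuration: number,
  ): void => {
    samples.push({ phase, actualDuration, baseDuration });
  };

  // Aggregate the recorded commits into a small summary object.
  const summary = () => ({
    commits: samples.length,
    actualTotalMs: samples.reduce((sum, s) => sum + s.actualDuration, 0),
    baseTotalMs: samples.reduce((sum, s) => sum + s.baseDuration, 0),
    actualMaxMs: samples.reduce((max, s) => Math.max(max, s.actualDuration), 0),
  });

  return { onRender, summary };
}
```

A large gap between the base total and the actual total suggests memoization is saving work during streaming; a small gap suggests most commits re-render the whole subtree.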
Optional CPU throttling
You can apply a CDP CPU throttle to reduce variance (Chromium only):
--cpuThrottle 4
Omit the flag to run unthrottled.
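A CDP CPU throttle is normally applied with the Emulation.setCPUThrottlingRate command, where a rate of 4 means roughly a 4x slowdown. The sketch below assumes the flag value is passed through as that rate; the session interface and helper name are hypothetical stand-ins for whatever browser driver the harness uses.

```typescript
// Minimal stand-in for a CDP session object exposed by a browser driver.
interface CdpSessionLike {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
}

// Apply the throttle when a rate is given; resolve to false when it is omitted.
function applyCpuThrottle(session: CdpSessionLike, rate?: number): Promise<boolean> {
  if (rate === undefined) return Promise.resolve(false); // unthrottled run
  if (!(rate >= 1)) {
    return Promise.reject(new Error("CPU throttle rate must be >= 1"));
  }
  return session
    .send("Emulation.setCPUThrottlingRate", { rate })
    .then(() => true);
}
```

Use the same rate for baseline and candidate runs, since results taken at different throttle levels are not comparable.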
Compare / gate perf runs
npm run perf:compare -- --base tmp/perf-runs/<base>/summary.json --candidate tmp/perf-runs/<cand>/summary.json
Add --gate to fail on regressions. Defaults are conservative; override per metric:
--durationP95MaxPct 0.1
--firstFlushP95MaxPct 0.1
--longTaskP95MaxPct 0.25
--rafP95MaxPct 0.2
--memoryPeakP95MaxPct 0.15
Gate both baselines in one command
npm run perf:gate -- --candidateS2 tmp/perf-runs/<s2-run> --candidateS3 tmp/perf-runs/<s3-run> --gate
Optional edge-like long-run gate
To include an edge-like stress scenario in the same gate command, pass --candidateEdge
(alias: --candidateS6) and provide a matching baseline path:
npm run perf:gate -- \
--candidateS2 tmp/perf-runs/<s2-run> \
--candidateS3 tmp/perf-runs/<s3-run> \
--candidateEdge tmp/perf-runs/<table-large-s6-run> \
--baseEdge tmp/perf-baselines/S6_extreme_edge_like \
--gate
If candidateEdge is omitted, the edge-like gate is skipped.
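The gating idea in this section can be sketched as a relative-regression check per metric. This assumes the MaxPct values are fractions (0.1 meaning 10%) and that a metric regresses when the candidate exceeds the baseline by more than that fraction; the metric keys and function name below are hypothetical, and the real gate may apply additional rules.

```typescript
interface MetricCheck {
  metric: string;
  changePct: number; // (candidate - base) / base
  maxPct: number;
  regressed: boolean;
}

// Compare candidate against base for every metric that has a threshold.
// Metrics missing from either summary are skipped, much like an omitted
// edge-like candidate is skipped rather than failed.
function gateMetrics(
  base: Record<string, number>,
  candidate: Record<string, number>,
  maxPct: Record<string, number>,
): MetricCheck[] {
  const checks: MetricCheck[] = [];
  for (const metric of Object.keys(maxPct)) {
    const b = base[metric];
    const c = candidate[metric];
    const limit = maxPct[metric];
    if (b === undefined || c === undefined || limit === undefined || b <= 0) continue;
    const changePct = (c - b) / b;
    checks.push({ metric, changePct, maxPct: limit, regressed: changePct > limit });
  }
  return checks;
}

// The gate fails if any checked metric regressed.
function gateFails(checks: MetricCheck[]): boolean {
  return checks.some((c) => c.regressed);
}
```

For example, a duration P95 that moves from 1000 ms to 1200 ms is a 20% change and would fail a 0.1 threshold, while an rAF P95 that moves from 20 ms to 22 ms is a 10% change and would pass a 0.2 threshold.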
Baselines
Canonical baselines live under:
tmp/perf-baselines/S2_typical
tmp/perf-baselines/S3_fast_reasonable
tmp/perf-baselines/S6_extreme_edge_like (optional edge-like stress baseline)
Notes
- Memory sampling is only available in Chromium (performance.memory).
- CDP metrics and DOM counters come from Performance.getMetrics and Memory.getDOMCounters (Chromium only).
- This harness uses the docs worker (/workers/markdown-worker.js) and demo registry.
- perf:demo targets the /demo page and is separate from this harness.
- Treat shipped client bundle, hosted worker asset, runtime loaded code, and peak memory as different cost categories.
- The current public static benchmark set is intentionally five classes: four parity-friendly classes plus one explicitly marked capability stress class.
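As an illustration of the CDP note above: Performance.getMetrics returns a list of name/value pairs with cumulative values, so a per-run figure is the difference between a snapshot taken before the stream and one taken after it. The helper below is a sketch under that assumption; which metric names the harness actually reads is not specified here.

```typescript
// Shape of one entry in the "metrics" array returned by Performance.getMetrics.
interface CdpMetric {
  name: string;
  value: number;
}

// Subtract the "before" snapshot from the "after" snapshot for selected names.
function metricDeltas(
  before: CdpMetric[],
  after: CdpMetric[],
  names: string[],
): Record<string, number> {
  const start: Record<string, number> = {};
  for (const m of before) start[m.name] = m.value;

  const deltas: Record<string, number> = {};
  for (const m of after) {
    const s = start[m.name];
    // Only report metrics that were requested and present in both snapshots.
    if (names.indexOf(m.name) !== -1 && s !== undefined) {
      deltas[m.name] = m.value - s;
    }
  }
  return deltas;
}
```

Names such as TaskDuration, ScriptDuration, LayoutDuration, and RecalcStyleDuration are examples of metrics that CDP reports this way.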