Benchmarks

Reproducible streaming and static markdown comparisons

This page distinguishes live incremental behavior from one-shot static rendering. The goal is not to publish a single vanity number, but to let you inspect how different renderers behave under the same fixtures, chunk cadence, and browser session.

Fixture driven · Seeded scenarios · Live incremental · Static content classes · Browser-local measurements

Perf harness methodology · Streamdown comparison notes

How to read this page

Results are local and hardware-dependent. Compare engines under the same browser, fixture, and scheduler settings.
Live incremental numbers answer a different question than static rendering. Both matter and should be read separately.
Memory, bundle size, and worker-hosting tradeoffs belong alongside latency numbers; they are not interchangeable metrics.

Reproduce locally

npm run docs:dev

npm run perf:harness -- --fixture naive-bayes --scenario S2_typical --runs 3 --warmup 1

npm run perf:compare -- --base tmp/perf-runs/<base>/summary.json --candidate tmp/perf-runs/<candidate>/summary.json

Benchmark frame

Metric definitions

First visible render

Time from emitted delta to the first observable DOM mutation for an engine.

Why it matters: This is the most user-visible latency metric during streaming.

Final convergence

Time from emitted delta to the final stable DOM state for that update window.

Why it matters: This captures whether the renderer settles quickly or churns after visible output appears.
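As a rough illustration of the two definitions above, the sketch below derives both numbers from one update window. The record shape and field names (`emittedAt`, `mutationsAt`) are hypothetical and are not taken from the harness; it also assumes that every mutation timestamped at or after the emit belongs to that delta.

```typescript
// Hypothetical record for one update window; field names are illustrative only.
interface DeltaWindow {
  emittedAt: number;     // ms timestamp at which the delta was emitted
  mutationsAt: number[]; // ms timestamps of observed DOM mutations, in any order
}

interface DeltaLatency {
  firstVisibleMs: number | null; // emit -> first observable mutation
  finalStableMs: number | null;  // emit -> last mutation in the window
}

function measureDelta(update: DeltaWindow): DeltaLatency {
  // Assumption: mutations stamped before the emit belong to earlier deltas.
  const relevant = update.mutationsAt.filter((t) => t >= update.emittedAt);
  if (relevant.length === 0) {
    return { firstVisibleMs: null, finalStableMs: null };
  }
  const first = Math.min(...relevant);
  const last = Math.max(...relevant);
  return {
    firstVisibleMs: first - update.emittedAt,
    finalStableMs: last - update.emittedAt,
  };
}

// One delta emitted at t=100 ms that caused three mutations.
const exampleLatency = measureDelta({ emittedAt: 100, mutationsAt: [104, 109, 131] });
// exampleLatency.firstVisibleMs is 4 and exampleLatency.finalStableMs is 31
```

A large gap between the two values for the same delta is the "churn after visible output" case that final convergence is meant to expose.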

Patch-to-DOM latency

Measured time across the ingest, scheduling, and commit path before content becomes visible.

Why it matters: It exposes scheduler pressure and batching behavior under real incremental streams.
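One way to picture this metric is as a split of the emit-to-commit path into segments. The sketch below is an assumption about how such a breakdown could be computed; the timestamp names are hypothetical, and scheduling time is simply folded into the ingest-to-commit segment.

```typescript
// Hypothetical per-delta timestamps (ms); names are illustrative, not the harness's.
interface DeltaTimestamps {
  emitMs: number;   // delta leaves the stream source
  ingestMs: number; // renderer receives the delta
  commitMs: number; // content is committed to the DOM
}

interface PathBreakdown {
  emitToIngestMs: number;
  ingestToCommitMs: number; // includes scheduling and batching time in this sketch
  emitToCommitMs: number;
}

function breakdown(t: DeltaTimestamps): PathBreakdown {
  // Reject out-of-order timestamps instead of reporting negative latencies.
  if (t.ingestMs < t.emitMs || t.commitMs < t.ingestMs) {
    throw new RangeError("timestamps must satisfy emit <= ingest <= commit");
  }
  return {
    emitToIngestMs: t.ingestMs - t.emitMs,
    ingestToCommitMs: t.commitMs - t.ingestMs,
    emitToCommitMs: t.commitMs - t.emitMs,
  };
}

const exampleBreakdown = breakdown({ emitMs: 10, ingestMs: 12, commitMs: 19 });
// exampleBreakdown: emitToIngestMs 2, ingestToCommitMs 7, emitToCommitMs 9
```

When the ingest-to-commit segment dominates, the cost is in scheduling or rendering rather than in delivery.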

Static render timing

One-shot render timing for prose, tables, code, and mixed markdown fixtures.

Why it matters: It shows how engines behave outside the delta-stream case and catches content-class cliffs.

Fixture classes

Static content classes

Prose heavy

Narrative markdown with headings, nested lists, links, and inline emphasis.

Table heavy

Dense table markup where row/cell integrity and stable layout matter more than raw token count.

Code heavy

Multiple fenced blocks with different languages, where syntax-highlighting cost becomes visible.

Mixed rich markdown

A combined fixture with tables, tasks, inline code, links, and surrounding prose.

Rich feature stress

A capability workload with math, MDX, HTML, tables, code, and footnotes. It is not a parity fixture for every engine.

Runtime cost

Memory and bundle terminology

Shipped client bundle

The JavaScript transferred to the browser for a page route before optional worker assets are considered.

Hosted worker asset

The separately served worker bundle used by StreamMDX in production when parsing is isolated off the main thread.

Runtime loaded code

Everything the browser eventually executes during a benchmark session, including lazily loaded chunks and worker code.

Peak memory

The highest memory sample observed during a local browser run. It is environment-dependent and should only be compared inside the same session class.

Scheduling

Scheduler / jitter modes

CI locked

Claim-grade mode. Keeps chunk cadence, order, workload, and StreamMDX scheduling deterministic enough for reproducible local comparisons.

Explore

Diagnosis mode. Lets you vary chunking, interval, ordering, and workload to find cliffs without treating the results as published baselines.

The live comparison lab exposes these modes directly. Use CI locked for reproducible comparisons and Explore to characterize scheduler sensitivity without turning the result into a public claim.

Reading the results

Parity workloads vs capability workloads

The benchmark surface now includes both parity workloads and one rich feature stress workload. The parity workloads are the fair StreamMDX/Streamdown/react-markdown comparison set. The rich stress case exists to show how StreamMDX behaves when math, MDX, HTML, tables, code, and footnotes are all active in the same document. Unsupported cells are marked explicitly instead of being counted as comparable runs.

Parity workloads

Common-markdown fixtures used for direct StreamMDX/Streamdown/react-markdown comparisons under the same browser session, scheduler mode, and scenario.

Capability workloads

Richer workloads that exercise StreamMDX-specific features such as mixed MDX, math, HTML, footnotes, and worker-aware composition. These are shown for behavior inspection, not direct cross-engine claims.

Claim discipline

What this page can and cannot claim

It can compare renderers fairly on the shared parity fixtures under the same local browser session and scheduler mode.
It can show how StreamMDX behaves on richer feature workloads that other engines in this lab do not fully support.
It cannot justify universal cross-machine latency or memory superiority claims outside this methodology envelope.

Coverage

Why these five static classes are the public set

The public set is intentionally fixed at five classes for the current plan: four parity-friendly classes (prose, tables, code, mixed) plus one explicitly marked capability stress class (rich). More public classes will be added only when a distinct behavior family appears that the current set does not already expose.

Live renderer comparison lab (sequential)

Renderers run one at a time to minimize CPU contention. Each engine gets an unscored warmup pass before its scored pass. Metrics are split into first-visible commit and final-stable commit.
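The sequential ordering described above can be sketched as a flat pass schedule. This is only an illustration: `buildPassSchedule` is a hypothetical helper, and repeating the warmup in every run cycle is an assumption rather than documented harness behavior.

```typescript
type Phase = "warmup" | "scored";

interface Pass {
  engine: string;
  phase: Phase;
  run: number; // 1-based run cycle
}

// Hypothetical helper: one warmup pass followed by one scored pass per engine,
// engines strictly one after another, repeated for each run cycle.
function buildPassSchedule(engines: readonly string[], runs: number): Pass[] {
  const passes: Pass[] = [];
  for (let run = 1; run <= runs; run++) {
    for (const engine of engines) {
      passes.push({ engine, phase: "warmup", run });
      passes.push({ engine, phase: "scored", run });
    }
  }
  return passes;
}

const exampleSchedule = buildPassSchedule(["StreamMDX", "Streamdown", "react-markdown"], 1);
// Three engines and one run cycle give 6 passes, 3 of them scored.
```

Because no two engines are ever active at once, each scored pass sees the same amount of CPU contention from the harness itself.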

Measured passes: 3 (6 including warmup)

Deltas: 195

Chars: 8,165

Engines: 1. StreamMDX, 2. Streamdown, 3. react-markdown

Methodology mode

Freeform tuning enabled for diagnosis.

Active profile snapshot: chunk=42, interval=32ms, repeats=16, runs=1, order=fixed, workload=parity-gfm. Differs from CI profile.

A scored run is one full cycle across all renderers (then repeated for the configured run count).

Chunk size (chars): 42

Emit interval (ms): 32

Fixture repeats: 16

Scored run cycles: 1
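To make the chunk-size and interval settings concrete, here is a minimal sketch of fixed-size chunking with a fixed emit cadence. Both helpers are hypothetical, and the sketch counts UTF-16 code units via `String.prototype.slice`, which may differ from how the harness counts characters.

```typescript
// Hypothetical helper: split a fixture into fixed-size deltas.
function chunkFixture(text: string, chunkSize: number): string[] {
  if (chunkSize <= 0 || Math.floor(chunkSize) !== chunkSize) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Hypothetical helper: scheduled emit offsets (ms from stream start) for each delta.
function emitOffsets(count: number, intervalMs: number): number[] {
  const offsets: number[] = [];
  for (let i = 0; i < count; i++) {
    offsets.push(i * intervalMs);
  }
  return offsets;
}

// Stand-in fixture with the character count shown in the lab (8,165).
let exampleFixture = "";
for (let i = 0; i < 8165; i++) {
  exampleFixture += "x";
}

const exampleChunks = chunkFixture(exampleFixture, 42);
// 195 chunks; the last one holds the remaining 17 characters
const exampleOffsets = emitOffsets(exampleChunks.length, 32);
// the last delta is scheduled 6208 ms after the first
```

Smaller chunks or shorter intervals raise the number of commits per second a renderer has to absorb, which is why these settings need to be held constant across engines.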

Chart metric: split mode shows both metrics.

[Charts: per-delta latency (first visible commit - emit time) and per-delta latency (final stable commit - emit time), one series per engine (StreamMDX, Streamdown, react-markdown); Y max: 10 ms.]

Renderer	First paint p50	First paint p95	Final stable p50	Final stable p95	Run p50	Throughput p50
StreamMDX	-	-	-	-	-	-
Streamdown	-	-	-	-	-	-
react-markdown	-	-	-	-	-	-
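The p50 and p95 columns are percentiles over per-delta or per-run samples. The page does not say which percentile method the harness uses, so the sketch below assumes the nearest-rank method; `percentile` is a hypothetical helper.

```typescript
// Nearest-rank percentile (an assumed method, not confirmed by the harness).
// Returns null when there are no samples, matching an empty "-" cell.
function percentile(samples: readonly number[], p: number): number | null {
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  if (samples.length === 0) {
    return null;
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank of the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] ?? null;
}

const exampleSamplesMs = [5, 1, 4, 2, 3];
const exampleP50 = percentile(exampleSamplesMs, 50); // 3
const exampleP95 = percentile(exampleSamplesMs, 95); // 5
```

With only a handful of samples, p95 collapses onto the maximum, so tail numbers from short runs should be read with that in mind.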

CI gate (StreamMDX vs Streamdown)

Status: PENDING (pass=0, fail=0, pending=4). Run in CI locked mode before claiming results.

Gate conditions:

First paint p50 <= Streamdown
Final stable p50 <= Streamdown
Run p50 <= Streamdown
Throughput p50 >= Streamdown
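The four gate conditions can be read as three lower-is-better checks and one higher-is-better check. The sketch below shows one way to evaluate them; the type and function names are hypothetical, and the rule for the overall status (fail if any check fails, otherwise pending if any is pending) is an assumption.

```typescript
type GateStatus = "pass" | "fail" | "pending";

// Hypothetical summary shape; null means the metric has not been measured yet.
interface MetricSummary {
  firstPaintP50: number | null;
  finalStableP50: number | null;
  runP50: number | null;
  throughputP50: number | null;
}

type MetricKey = keyof MetricSummary;

// Latency metrics must be <= the baseline; throughput must be >= the baseline.
const HIGHER_IS_BETTER: Record<MetricKey, boolean> = {
  firstPaintP50: false,
  finalStableP50: false,
  runP50: false,
  throughputP50: true,
};

interface GateResult {
  overall: GateStatus;
  pass: number;
  fail: number;
  pending: number;
}

function checkMetric(key: MetricKey, candidate: MetricSummary, baseline: MetricSummary): GateStatus {
  const c = candidate[key];
  const b = baseline[key];
  if (c === null || b === null) {
    return "pending";
  }
  const ok = HIGHER_IS_BETTER[key] ? c >= b : c <= b;
  return ok ? "pass" : "fail";
}

function evaluateGate(candidate: MetricSummary, baseline: MetricSummary): GateResult {
  const counts: Record<GateStatus, number> = { pass: 0, fail: 0, pending: 0 };
  for (const key of Object.keys(HIGHER_IS_BETTER) as MetricKey[]) {
    counts[checkMetric(key, candidate, baseline)] += 1;
  }
  // Assumed precedence: any failure fails the gate; otherwise missing data keeps it pending.
  const overall: GateStatus = counts.fail > 0 ? "fail" : counts.pending > 0 ? "pending" : "pass";
  return { overall, pass: counts.pass, fail: counts.fail, pending: counts.pending };
}
```

Calling `evaluateGate` with two all-null summaries yields the pending state shown before any run: no passes, no failures, four pending checks.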

Renderer	Emit→ingest p50	Ingest→commit p50	Emit→commit p50	Append overhead p50	Timer drift p50 / p95
StreamMDX	-	-	-	-	- / -
Streamdown	-	-	-	-	- / -
react-markdown	-	-	-	-	- / -

StreamMDX

Incremental worker parser + patch renderer

Streamdown

Drop-in streaming replacement for react-markdown

react-markdown

Baseline markdown renderer (full re-render on updates)

Method note: each engine runs a warmup pass and then a scored pass in isolation. Commit timing is captured the same way for every engine, via layout-effect commit hooks, to avoid instrumentation bias. Use CI locked mode for claim-grade runs and Explore mode to diagnose bottlenecks. "First paint" measures the earliest visible commit; "final stable" reflects how quickly each delta settles after downstream formatting and render passes.

Static render comparison (content types)

Measures one-shot static rendering across common-markdown content classes plus one richer StreamMDX-only stress workload. Times are captured per engine as first mutation and final settled mutation.

Iterations per fixture: 3

Fixtures: Prose heavy, Table heavy, Code heavy, Mixed rich markdown, Rich feature stress

Content type	Description	StreamMDX first/final p50	Streamdown first/final p50	react-markdown first/final p50	Final p50 winner
Prose heavy	Long narrative text with headings, lists, and inline formatting.	- / - (n=0)	- / - (n=0)	- / - (n=0)	-
Table heavy	Large table blocks with short explanatory text.	- / - (n=0)	- / - (n=0)	- / - (n=0)	-
Code heavy	Multiple fenced code blocks with surrounding markdown.	- / - (n=0)	- / - (n=0)	- / - (n=0)	-
Mixed rich markdown	Lists, quotes, links, tasks, and tables in one document.	- / - (n=0)	- / - (n=0)	- / - (n=0)	-
Rich feature stress	Math, MDX, HTML, tables, and code in one workload. Timed only where the engine exposes those capabilities in this harness.	- / - (n=0)	unsupported in this harness	unsupported in this harness	-
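The "Final p50 winner" column only makes sense across comparable cells. The sketch below shows one possible selection rule; the cell shape is hypothetical, and requiring at least two timed engines before naming a winner is an assumption made here so that a StreamMDX-only row never reports a win.

```typescript
// Hypothetical cell shape for one engine in one table row.
type Cell =
  | { kind: "timed"; engine: string; finalP50Ms: number; n: number }
  | { kind: "unsupported"; engine: string };

// Returns the engine with the lowest final p50 among timed cells that have samples,
// or null when fewer than two engines are comparable. Ties go to the earlier cell.
function finalP50Winner(cells: readonly Cell[]): string | null {
  let best: { engine: string; finalP50Ms: number } | null = null;
  let comparable = 0;
  for (const cell of cells) {
    if (cell.kind !== "timed" || cell.n === 0) {
      continue; // unsupported or empty cells are not counted as comparable runs
    }
    comparable += 1;
    if (best === null || cell.finalP50Ms < best.finalP50Ms) {
      best = { engine: cell.engine, finalP50Ms: cell.finalP50Ms };
    }
  }
  return comparable >= 2 && best !== null ? best.engine : null;
}
```

Under this rule the rich feature stress row reports no winner even after StreamMDX has been timed, because the other two cells are unsupported.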