StreamMDX Perf Quality Changelog

This log tracks perf harness snapshots used to make scheduling decisions. Values are p95 unless noted; times are in ms and memory peaks in MB. Paths point to local tmp/ outputs.
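For readers unfamiliar with the p95 convention, the following is a minimal sketch of a nearest-rank 95th percentile. The harness's actual percentile method is not stated in this log, so the nearest-rank choice and the name `p95` are assumptions for illustration only. Entries labelled "run p95s" would apply the same function to the list of per-run p95 values.

```typescript
// Hypothetical helper, not taken from the harness: nearest-rank 95th percentile.
function p95(samples: number[]): number {
  if (samples.length === 0) return NaN; // no samples, nothing to report
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: the smallest value with at least 95% of samples at or below it.
  // Integer arithmetic (95 * n / 100) avoids floating-point drift from 0.95 * n.
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}
```

With only three runs per scenario, nearest-rank p95 is simply the largest per-run value, which is worth keeping in mind when reading the "run p95s" figures.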

2026-01-16 Baseline refresh (naive-bayes + table-large)

Baselines compared:

Summary (p95, base -> candidate):

S1_slow_small: duration 113694.70 -> 113677.80 (-0.0%), first flush 373.20 -> 376.10 (+0.8%), longtask 74.00 -> 97.00 (+31.1%), raf delta 16.80 -> 16.80, memory peak 87.45 -> 110.63 (+26.5%).
S2_typical: duration 18982.50 -> 18982.60 (0.0%), first flush 258.40 -> 300.20 (+16.2%), longtask n/a -> 64.00, raf delta 16.80 -> 16.80, memory peak 87.45 -> 110.63 (+26.5%).
S4_chunky_network: duration 4750.10 -> 4766.30 (+0.3%), first flush 293.00 -> 222.40 (-24.1%), longtask 58.00 -> 75.00 (+29.3%), raf delta 16.80 -> 16.80, memory peak 87.45 -> 110.63 (+26.5%).
table-large S2_typical: duration 633.10 -> 633.20 (0.0%), first flush 217.20 -> 225.50 (+3.8%), raf delta 16.80 -> 16.80, memory peak 87.45 -> 110.63 (+26.5%).
table-large S6_extreme: duration 66.60 -> 66.50 (-0.2%), raf delta 16.70 -> 16.80 (+0.6%).
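The percentages in the summary are relative changes against the base value. A minimal sketch of that arithmetic follows; `percentDelta` is a hypothetical name, and the "n/a" handling for a missing base is an assumption about how the harness treats newly reported metrics.

```typescript
// Hypothetical helper: relative change from base to candidate, formatted to one decimal.
function percentDelta(base: number, candidate: number): string {
  // A missing or zero base has no meaningful relative change.
  if (!Number.isFinite(base) || !Number.isFinite(candidate) || base === 0) {
    return "n/a";
  }
  const pct = ((candidate - base) / base) * 100;
  const rounded = pct.toFixed(1);
  // toFixed already keeps the minus sign; add "+" only for increases that survive rounding.
  const sign = pct > 0 && Number(rounded) !== 0 ? "+" : "";
  return `${sign}${rounded}%`;
}
```

For example, `percentDelta(258.4, 300.2)` gives "+16.2%", matching the S2_typical first-flush entry. Tiny negative changes round to "-0.0%", which is why the S1 duration entry carries a minus sign.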

Notes:

All guardrails were within bounds; S2 first flush regressed by 16.2% (258.40 -> 300.20) but stayed below thresholds.
Longtask p95 is newly reported for S2 (base was n/a); treat as new visibility rather than a regression.
Profiler run (LOCAL_BENCHMARKS): naive-bayes/S2_typical 2026-01-16T06-23-05-474Z actual p95 8.70 ms, base p95 128.40 ms.

Change:

Long-task stats now consider only tasks occurring between runStart and runEnd.
This keeps page-load noise out of the perf results and makes stutter attribution more accurate.
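The windowing can be sketched as a simple filter. This is an illustration under stated assumptions, not the harness code: the `LongTask` shape and the `longTasksInRun` name are hypothetical, timestamps are assumed to share one clock with runStart and runEnd, and a task is assumed to count when it starts inside the window (the log does not say how tasks straddling a boundary are handled).

```typescript
// Assumed record shape; times are in ms on the same clock as runStart/runEnd.
interface LongTask {
  startTime: number;
  duration: number;
}

// Hypothetical filter: keep only long tasks that start within [runStart, runEnd].
function longTasksInRun(
  tasks: LongTask[],
  runStart: number,
  runEnd: number,
): LongTask[] {
  return tasks.filter((t) => t.startTime >= runStart && t.startTime <= runEnd);
}

// Example: a page-load task before the run and a teardown task after it are dropped.
const observed: LongTask[] = [
  { startTime: 50, duration: 120 },   // page load, before runStart
  { startTime: 1200, duration: 64 },  // inside the run
  { startTime: 9000, duration: 80 },  // after runEnd
];
const inRun = longTasksInRun(observed, 1000, 5000);
```

Filtering before computing percentiles means a single slow page-load task can no longer dominate a scenario's long-task p95.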

Baselines refreshed from:

Baseline:

Candidate (aggressive default):

S2_typical vs baseline:

duration p95: 19065.10 ms -> 18982.60 ms (-0.4%)
first flush p95: 422.30 ms -> 239.30 ms (-43.3%)
longtask p95 (run p95s): 597.00 ms -> 404.00 ms (-32.3%)
raf delta p95 (run p95s): 16.80 ms -> 16.80 ms (0.0%)
memory peak p95: 87.45 MB -> 87.45 MB (0.0%)

S3_fast_reasonable vs baseline:

duration p95: 7632.10 ms -> 7627.60 ms (-0.1%)
first flush p95: 276.60 ms -> 268.50 ms (-2.9%)
longtask p95 (run p95s): 420.00 ms -> 422.00 ms (+0.5%)
raf delta p95 (run p95s): 16.80 ms -> 16.80 ms (0.0%)
memory peak p95: 87.45 MB -> 87.45 MB (0.0%)

Smooth vs aggressive comparison (3 runs + warmup):

smooth S2: tmp/perf-runs/naive-bayes-S2_typical-2026-01-09T20-36-48-998Z
smooth S3: tmp/perf-runs/naive-bayes-S3_fast_reasonable-2026-01-09T20-38-18-288Z

S2_typical smooth -> aggressive:

S3_fast_reasonable smooth -> aggressive:

Baselines updated from:

Smooth vs aggressive (new baseline) comparison:

smooth S2: tmp/perf-runs/naive-bayes-S2_typical-2026-01-09T23-45-11-719Z
smooth S3: tmp/perf-runs/naive-bayes-S3_fast_reasonable-2026-01-09T23-46-49-380Z

S2_typical smooth -> aggressive:

S3_fast_reasonable smooth -> aggressive:

Candidate runs:

S2_typical vs previous baseline:

S3_fast_reasonable vs previous baseline:

Baselines refreshed from the candidate runs above.

Baselines updated from: