Receipts: Same Technique, Opposite Results

Raw artifacts behind the published finding. The prompts, the outputs, the scoring, and the analysis.

The content-specificity effect is not a quirk of one model. The clean 2x2 specificity experiment, rerun on a second generator family, produces a virtually identical effect size at density.

Generator | Specificity effect at density | Source
xAI (grok-4-1-fast) | Hedges g 1.651 | catching-your-own-overclaim
Gemini Flash (gemini-3-flash-preview) | Cohen d 1.669 / Hedges g 1.636 | this kit

Same design both times: a 2x2 (specificity present/absent, quality demands present/absent), 10 runs per cell, the same Northvane strategic-analysis task, density normalization instead of a length cap.
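The Gemini Flash row reports both Cohen d and Hedges g. Hedges g is d scaled down by a small-sample correction factor. Below is a minimal sketch of that conversion. It assumes the common approximation J = 1 - 3/(4*df - 1). It also assumes the specificity comparison pools both quality cells on each side, giving 2 cells x 10 runs = 20 outputs per arm. `hedges_g` is a hypothetical helper name, not a function taken from script.py.

```python
def hedges_g(d: float, n1: int, n2: int) -> float:
    """Scale Cohen d by the small-sample correction J = 1 - 3 / (4*df - 1)."""
    df = n1 + n2 - 2  # degrees of freedom for a two-group comparison
    j = 1 - 3 / (4 * df - 1)
    return d * j


# Assumption: each specificity arm pools two quality cells of 10 runs (20 outputs).
g = hedges_g(1.669, 20, 20)
print(round(g, 3))
```

Under those assumptions, the reported d of 1.669 maps to g of about 1.636, which agrees with the table. That agreement is consistent with the kit using this correction at 20 outputs per arm, but it does not confirm it.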

The one thing that matters here: "at density"

Raw scores do not show this cleanly. On Gemini Flash the raw specificity effect is only d=0.67, because quality demands produce longer outputs and inflate the raw marker count (the same length confound the xAI experiment hit). Normalizing to markers-per-1k-words removes the confound and the specificity effect lands at d=1.67. So the cross-generator claim is specifically: specificity at density is cross-generator. It is not a raw-score claim.
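Below is a minimal sketch of the two steps this section relies on: density normalization, then a standardized effect size. The function names, the toy numbers, and the pooled-standard-deviation form of Cohen d are illustrative assumptions. script.py may compute these differently.

```python
from statistics import mean, variance


def density(marker_count: int, word_count: int) -> float:
    """Markers per 1,000 words, so an output does not score higher just for being longer."""
    return marker_count * 1000 / word_count


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference (a minus b) using a pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


# Hypothetical outputs: the same raw marker count at two different lengths.
print(density(12, 600), density(12, 1200))

# Hypothetical per-output densities for each specificity arm, pooled across both
# quality-demand cells (toy values, not data from this kit).
specific = [2.0, 4.0, 6.0]
generic = [1.0, 3.0, 5.0]
print(cohens_d(specific, generic))
```

Word count is the denominator. A condition that only makes outputs longer therefore stops inflating the score, which is how the raw and density effect sizes can diverge.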

Scope

  • Cross-generator means xAI and Gemini Flash. Gemini Pro was inconclusive (outputs truncated to about 60 words); it is not a null result that counts against the effect.
  • This kit holds the Gemini Flash side (40 raw outputs, the computed analysis). The xAI side is its own published receipt, linked above.

Recompute

python3 script.py

Expect Gemini Flash specificity at density: Cohen d 1.67, Hedges g 1.64.

Limits

  • 10 runs per cell (40 outputs), one generator per kit. The result is directional: the CIs exclude zero but are wide.
  • Programmatic marker scoring, not a domain-expert quality judgment. The companion xAI receipt records that a blind domain expert could not distinguish specific from generic outputs on quality, only on verifiable form.
  • One task, one domain (the fictional Northvane scenario). March 2026 models.
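The sketch below shows why 20 outputs per arm leaves the interval wide. It uses one common large-sample approximation for the standard error of d. This is an illustrative assumption: the source does not say how script.py computes its CIs (it may bootstrap, for example), so the interval produced here is not the kit's reported interval. `d_confidence_interval` is a hypothetical name.

```python
import math


def d_confidence_interval(d: float, n1: int, n2: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate CI for Cohen d with SE = sqrt((n1+n2)/(n1*n2) + d^2 / (2*(n1+n2)))."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# Assumption: 20 outputs per specificity arm, with the Gemini Flash d reported above.
low, high = d_confidence_interval(1.669, 20, 20)
print(f"approx. 95% CI: {low:.2f} to {high:.2f}")
```

The point estimate is large, yet the approximate interval still spans more than a full unit of d. That is the sense in which a 10-runs-per-cell design gives CIs that exclude zero but are wide.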

Files in this folder