Receipts: Three AIs, No Source, the Same Answer

Raw artifacts behind the published finding. The prompts, the outputs, the scoring, and the analysis.

These files are the raw artifacts behind the finding published at https://blog.clarethium.com/source-is-the-substrate.

The published claim is that the source material you put in the context is the dominant variable affecting numerical fidelity, larger than any prompt-engineering intervention. The same model, on the same prompt, produces numbers that mostly match a provided source when the source is present, and numbers that mostly do not match it when the source is absent. This folder contains the three-model verification run from May 1, 2026 that establishes the effect.

What's here

| File | What it is |
| --- | --- |
| preflight_v3.py | The test as run. 1 topic (remote work) x 2 conditions (source-present, source-absent) x 2 versions x 3 generators = 12 generations. Measurement is programmatic number matching, zero LLM judgment. |
| preflight_v3_results.json | Per-cell results and the aggregate summary. The match rates in the post re-derive from the summary block. Full model transcripts are not included; verbatim excerpts are in samples.md. |
| samples.md | Verbatim excerpts from the source-absent generations, one per model run, showing the Bloom / Ctrip convergence in the models' own words. |
| _config.py | Documented stub for the provider-client helpers. Replace the NotImplementedError bodies with your own SDK calls to reproduce end to end. |
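
To make "programmatic number matching" concrete, here is a minimal sketch of what such a measurement can look like. It is not the code in preflight_v3.py or the kit's analyze_numbers: the function names, the regex, the exact-string matching rule, and the example texts are all assumptions made for illustration. Only the three output field names are taken from the results file as described in this README, and the rate is expressed here as a fraction, which may differ from the real file.

```python
import re

# Hypothetical tokenizer: integers and decimals, with optional thousands separators.
NUMBER_RE = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def extract_numbers(text: str) -> list[str]:
    """Return numeric tokens in order of appearance, commas stripped."""
    return [token.replace(",", "") for token in NUMBER_RE.findall(text)]


def score_generation(generated: str, source: str) -> dict:
    """Count how many numbers in the generation also appear in the source."""
    source_numbers = set(extract_numbers(source))
    found = extract_numbers(generated)
    in_source = sum(1 for number in found if number in source_numbers)
    rate = in_source / len(found) if found else 0.0
    return {
        "total_numbers": len(found),
        "in_source": in_source,
        "source_match_rate": rate,  # assumption: stored as a fraction, not a percent
    }


# Made-up texts, not taken from the experiment.
source_text = "The pilot covered 120 staff and cut costs by 7.5 percent."
generated_text = "It covered 120 staff, cut costs by 7.5 percent, and saved 3 hours a week."
result = score_generation(generated_text, source_text)
print(result)
```

Exact string matching is the simplest rule that needs no model judgment. A real implementation may also normalize units, rounding, or spelled-out numbers; check preflight_v3.py for what was actually done.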

How to read this

  • If you want to check the match rates: open preflight_v3_results.json and read the summary block. Each cell reports total_numbers, in_source, and source_match_rate. The +58 / +45 / +82pp deltas in the post are the source-present rate minus the source-absent rate per model. No API access needed; the numbers are already computed.
  • If you want to check the convergence claim: open samples.md. The post claims that all three model families, given no source, named the same study (Bloom's 2015 Ctrip trial) and that five of six runs produced the 13 percent figure. The excerpts are verbatim, so you can confirm both the attribution and the number against what the models actually wrote.
  • If you want to replicate: preflight_v3.py is the procedure exactly as run. It imports two sets of helpers (load_source / analyze_numbers, and the source-present / source-absent prompt builders) that live in the source-conditioning kit, which shares this experiment family, plus the _config.py provider stub here. The number-matching logic is pure Python and runs without any API.
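
The delta check described in the first bullet can be sketched in a few lines. The nested layout below (model, then condition, then source_match_rate) is an assumed shape, not the actual schema of preflight_v3_results.json, and the rates are placeholder values rather than the published ones. Adapt the keys to whatever the summary block really contains.

```python
# Placeholder summary with an assumed layout; values are illustrative only.
summary = {
    "model_a": {
        "source_present": {"source_match_rate": 0.90},
        "source_absent": {"source_match_rate": 0.30},
    },
    "model_b": {
        "source_present": {"source_match_rate": 0.75},
        "source_absent": {"source_match_rate": 0.25},
    },
}


def delta_pp(cells: dict) -> float:
    """Source-present rate minus source-absent rate, in percentage points."""
    present = cells["source_present"]["source_match_rate"]
    absent = cells["source_absent"]["source_match_rate"]
    return round((present - absent) * 100, 1)


deltas = {model: delta_pp(cells) for model, cells in summary.items()}
for model, delta in deltas.items():
    print(f"{model}: {delta:+.1f} pp")
```

To run this against the real file, load it with json.load and point summary at the summary block; if the rates are stored as percents rather than fractions, drop the multiplication by 100.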

What these receipts prove (and don't)

They prove: on this run, source presence moved the source-match rate from between 12 and 41 percent (source absent) to between 86 and 95 percent (source present) across three independent model families, and three independent training runs reconstructed the same canonical study and number from parametric memory with attribution.

They do not prove: that source grounding fixes non-numerical fidelity (entities, claims, and reasoning are not measured here), that the effect size is pinned down to tight intervals (N=2 versions per cell, single topic), or that the mechanism holds for novel reasoning and creative generation where no source exists. The post states these limits; the data does not exceed them.

Errata

Corrections go to the LinkedIn DM linked from /about. One correction is already folded into the published post: an earlier draft claimed only one model named the study and the other two produced 13 percent without attribution. The data shows the opposite (all six runs named the study), and the post and these receipts reflect the corrected finding.

Related receipts

  • source-conditioning: the broader EXP-081 family. Prohibition versus monitoring, partial sources, cross-generator replication.
  • fabrication-architecture: temporal instability of unsourced numbers, the prior that source grounding corrects.

Files in this folder