Receipts: Three AIs, No Source, the Same Answer
Raw artifacts behind the published finding. The prompts, the outputs, the scoring, and the analysis.
These files are the raw artifacts behind the finding published at https://blog.clarethium.com/source-is-the-substrate.
The published claim is that the source material you put in the context is the dominant variable affecting numerical fidelity, larger than any prompt-engineering intervention. The same model, on the same prompt, produces numbers that mostly match a provided source when the source is present and mostly do not when it is absent. This folder contains the three-model verification run from May 1, 2026 that establishes the effect.
What's here
| File | What it is |
|---|---|
| `preflight_v3.py` | The test as run. 1 topic (remote work) x 2 conditions (source-present, source-absent) x 2 versions x 3 generators = 12 generations. Measurement is programmatic number matching, zero LLM judgment. |
| `preflight_v3_results.json` | Per-cell results and the aggregate summary. The match rates in the post re-derive from the summary block. Full model transcripts are not included; verbatim excerpts are in `samples.md`. |
| `samples.md` | Verbatim excerpts from the source-absent generations, one per model run, showing the Bloom / Ctrip convergence in the models' own words. |
| `_config.py` | Documented stub for the provider-client helpers. Replace the `NotImplementedError` bodies with your own SDK calls to reproduce end to end. |
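The table notes that measurement is programmatic number matching with no LLM judgment. As a rough illustration of what that kind of check involves, here is a minimal sketch that pulls numbers out of a generation with a regular expression and counts how many also appear in the source text. It is not the kit's `analyze_numbers`: the function names (`extract_numbers`, `analyze`) and the normalization rule are hypothetical, and only the three output keys mirror the fields named in the results file.

```python
import re

# Hypothetical number pattern: integers and decimals, allowing thousands
# separators (e.g. "1,250", "13", "0.5").
_NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(text: str) -> list[str]:
    """Return every number in `text`, with thousands separators stripped."""
    return [m.group().replace(",", "") for m in _NUMBER.finditer(text)]


def analyze(output: str, source: str) -> dict:
    """Count how many numbers in `output` also appear somewhere in `source`."""
    source_numbers = set(extract_numbers(source))
    found = extract_numbers(output)
    in_source = sum(1 for n in found if n in source_numbers)
    total = len(found)
    return {
        "total_numbers": total,
        "in_source": in_source,
        # Guard against generations that contain no numbers at all.
        "source_match_rate": in_source / total if total else 0.0,
    }
```

Matching here is exact string equality after stripping thousands separators, so "13" and "13.0" count as different numbers; the actual helper may normalize percentages, units, or rounding differently.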
How to read this
- If you want to check the match rates: open `preflight_v3_results.json` and read the `summary` block. Each cell reports `total_numbers`, `in_source`, and `source_match_rate`. The +58 / +45 / +82pp deltas in the post are the source-present rate minus the source-absent rate for each model. No API access is needed; the numbers are already computed.
- If you want to check the convergence claim: open `samples.md`. The post claims that all three model families, given no source, named the same study (Bloom's 2015 Ctrip trial) and that five of six runs produced the 13 percent figure. The excerpts are verbatim, so you can check both the attribution and the number against what the models actually wrote.
- If you want to replicate: `preflight_v3.py` is the procedure exactly as run. It imports two sets of helpers from the source-conditioning kit, which belongs to the same experiment family: `load_source` / `analyze_numbers`, and the source-present / source-absent prompt builders. It also imports the `_config.py` provider stub included here. The number-matching logic is pure Python and runs without any API.
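The per-model deltas are plain subtraction once the per-condition rates are in hand. A minimal sketch, assuming a hypothetical layout for the `summary` block (`summary[model][condition]["source_match_rate"]`, with condition keys `source_present` / `source_absent` and rates stored as fractions in [0, 1]); the real JSON keys may differ, so adjust them before relying on the output.

```python
import json


def load_summary(path: str = "preflight_v3_results.json") -> dict:
    """Read the top-level "summary" block from the results file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["summary"]


def match_rate_deltas(summary: dict) -> dict:
    """Source-present minus source-absent match rate per model, in percentage points.

    Assumes the hypothetical layout summary[model][condition]["source_match_rate"]
    with rates as fractions. If the real summary mixes in non-model keys,
    filter them out before calling this.
    """
    deltas = {}
    for model, conditions in summary.items():
        present = conditions["source_present"]["source_match_rate"]
        absent = conditions["source_absent"]["source_match_rate"]
        # Convert the fractional difference to whole percentage points.
        deltas[model] = round((present - absent) * 100)
    return deltas
```

Usage, once the keys match the file: `match_rate_deltas(load_summary())`.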
What these receipts prove (and don't)
They prove: on this run, across three independent model families, source presence raised the source-match rate from a range of 12 to 41 percent without the source to 86 to 95 percent with it. They also show that three independent training runs reconstructed the same canonical study and number from parametric memory, with attribution.
They do not prove: that source grounding fixes non-numerical fidelity (entities, claims, and reasoning are not measured here), that the effect size is stable to tight intervals (N=2 versions per cell, single topic), or that the mechanism holds for novel reasoning and creative generation where no source exists. The post states these limits, and its claims do not go beyond what the data supports.
Errata
Corrections go to the LinkedIn DM linked from /about. One correction is already folded into the published post: an earlier draft claimed that only one model named the study and that the other two produced 13 percent without attribution. The data shows otherwise (all six source-absent runs named the study), and both the post and these receipts reflect the corrected finding.
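Counts like these (runs that name the study, runs that give the 13 percent figure) can also be re-tallied mechanically. A minimal sketch, assuming you have already split `samples.md` into one string per run; its layout is not described here, so that step is left out. The function name `tally` and the keyword patterns are hypothetical, and a keyword match is only a coarse proxy for reading the excerpts yourself.

```python
import re

# Hypothetical patterns: a run "names the study" if it mentions both Bloom and Ctrip.
_STUDY = (
    re.compile(r"\bBloom\b", re.IGNORECASE),
    re.compile(r"\bCtrip\b", re.IGNORECASE),
)
# "13%" or "13 percent", but not "113 percent" or "13.5 percent".
_FIGURE = re.compile(r"\b13(?:\s*%|\s+percent)")


def tally(excerpts: list[str]) -> dict:
    """Count runs that name the Bloom / Ctrip study and runs that state 13 percent."""
    named = sum(1 for e in excerpts if all(p.search(e) for p in _STUDY))
    figure = sum(1 for e in excerpts if _FIGURE.search(e))
    return {"runs": len(excerpts), "named_study": named, "stated_13_percent": figure}
```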
Related receipts
- source-conditioning: the broader EXP-081 family. Prohibition versus monitoring, partial sources, cross-generator replication.
- fabrication-architecture: temporal instability of unsourced numbers, the prior that source grounding corrects.