Same Technique, Opposite Results
The "It Depends" Problem · 1 of 2
Two tasks. Same AI model. Same approach. One task needs a precise answer. The other needs creative exploration.
The approach helped the first task immediately. The output got more specific, more grounded, more precise. Measurably so, and the effect held across different AI models. The same approach on the second task made the output worse.
Not "helped less." Damaged. The structured approach that produced precision on convergent problems actively harmed exploratory ones. Identical technique. Opposite results. Cross-generator validated on xAI and Gemini Flash.
There is no universally good prompt. No best practice that works everywhere. The task type determines whether a technique helps or hurts, and most people don't distinguish task types before choosing their approach.
The mechanism: when a task has a known answer and needs precision, structure concentrates the model's output distribution. It narrows toward the right region. Focused, specific, hitting the target.
When a task requires exploration, the same concentration collapses the search space. The model needs to spread across possibilities, consider non-obvious angles, resist premature convergence. Structure forces it to organize before it has explored. The output gets tidier and shallower. 100 percent compliance with the instruction. Narrower range. Less discovery.
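To make "concentration versus range" concrete: a distribution's spread can be scored with Shannon entropy. The sketch below is a toy illustration, not the study's measurement, and the two distributions are invented; it only shows what "concentrating the output distribution" does to the space of candidate answers.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits. Higher = probability mass spread
    across more candidate outputs, i.e. more range."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Invented distributions over five candidate answers (illustration only).
open_prompt = [0.25, 0.20, 0.20, 0.20, 0.15]  # spread out: exploration
structured  = [0.85, 0.06, 0.04, 0.03, 0.02]  # concentrated: convergence

print(entropy_bits(open_prompt))  # ~2.30 bits: wide search space
print(entropy_bits(structured))   # ~0.89 bits: collapsed search space
```

On a convergent task the collapse is the point: the mass lands on the right region. On an exploratory task, the same collapse discards exactly the possibilities the task needed.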
Two types of AI users fall on either side of this split.
The first reads every output, approves what they understand, rejects what they don't. Slow. Scales linearly with human attention. But safe when you're the domain expert.
The second builds systems (quality gates, tests, standards) and audits selectively. Faster. Scales with system quality. But only works when the system matches the task type. A quality system built for convergent tasks applied to exploratory work produces compliant mediocrity.
The practical move: before choosing any technique, ask one question. Does this task have a known right answer that needs precision? Or does this task need range and exploration? The technique that's optimal for one is harmful for the other.
Specificity helps convergence. Openness helps exploration. Mixing them doesn't average out. It damages whichever mode the task actually needed.
One caveat worth stating here, not just in the evidence section: the effect sizes measure programmatic specificity markers, not quality as a domain expert would judge it. In a blind test (N=1 domain expert, 5 pairs), the expert couldn't distinguish specific from generic outputs on quality. Both conditions produced the same analytical substance. The direction (structure helps convergence, harms exploration) is robust. But what specificity changes is output form (more verifiable references), not substance (same conclusions to a domain expert).
Test this yourself
Take one prompt technique you use. Run it on a precise analytical task and an open creative task. Compare whether it helps or hurts each.
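A minimal harness for that comparison might look like the following sketch. Everything in it is an assumption: `generate()` is a stub for whatever model call you use, and `specificity_markers()` is a crude stand-in for a programmatic scorer, not the one behind the numbers below.

```python
import re

def generate(prompt: str) -> str:
    """Stub: wire this to your own model client."""
    raise NotImplementedError

def specificity_markers(text: str) -> int:
    """Crude proxy for specificity: count numbers/percentages and
    bracketed or (Author, 2024)-style references."""
    numbers = re.findall(r"\b\d+(?:\.\d+)?%?\b", text)
    references = re.findall(r"\[[^\]]+\]|\([A-Z][a-z]+,?\s+\d{4}\)", text)
    return len(numbers) + len(references)

# The technique under test: a structure-adding prefix.
TECHNIQUE = "Answer as a numbered list. Support every claim with a specific figure.\n\n"

tasks = {
    "convergent":  "Summarize the measured differences between HTTP/1.1 and HTTP/2.",
    "exploratory": "Propose unconventional directions for a note-taking product.",
}

for kind, task in tasks.items():
    baseline = generate(task)
    treated = generate(TECHNIQUE + task)  # same task, technique applied
    print(kind, specificity_markers(baseline), "->", specificity_markers(treated))
```

Marker counts only tell you the form changed. For the exploratory task, judging range, meaning how many genuinely distinct directions appeared, still requires reading the outputs, which is the caveat above in miniature.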
Evidence: HIGH. Clean 2×2 decomposition (confounds controlled): specificity d=1.37 raw / d=1.65 density (both CIs exclude zero). Cross-generator confirmed: xAI d=1.651, Gemini Flash d=1.669 at density (virtually identical, both CIs exclude zero). Gemini Pro INCONCLUSIVE (truncation artifact, not a null). Quality demands d=1.19 density (CI excludes zero) but d=0.11 raw (CI includes zero), meaning the quality-demands effect is density-dependent. Quality demands cross-gen: d=0.861 on Gemini Flash (same direction, weaker). The original d=2.34 conflated both effects plus length. Harm on exploratory tasks is directional but consistent.
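For readers new to the notation: d is Cohen's d, the standardized mean difference between two conditions, and "CI excludes zero" means a confidence interval on d whose bounds share a sign, so the effect's direction survives resampling. A standard recipe, which assumes nothing about the study's actual analysis code, looks like this:

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference: (mean(a) - mean(b)) / pooled SD."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = a.size, b.size
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

def bootstrap_ci(a, b, reps=10_000, seed=0):
    """95% percentile bootstrap CI for d. If both bounds share a sign,
    the CI excludes zero."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    ds = [cohens_d(rng.choice(a, a.size), rng.choice(b, b.size))
          for _ in range(reps)]
    return tuple(np.percentile(ds, [2.5, 97.5]))
```

A d of 1.65 means the treated condition's mean sits about 1.65 pooled standard deviations above baseline; by the usual benchmarks (0.2 small, 0.5 medium, 0.8 large), that is a very large effect, which is exactly why the magnitude caveats elsewhere in this piece matter.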
What survived testing
- Structure helps convergence (specificity d=1.65 density, cross-generator confirmed: xAI d=1.651, Gemini Flash d=1.669; quality demands d=1.19 density-dependent, Gemini Flash d=0.861; original d=2.34 stacked both + length)
- Same structure hurts exploration (consistent direction across framing effects/023b/025/033/044)
- Cross-generator: effect replicates on xAI and Gemini
- 100% compliance with structure on exploratory tasks: the model follows the instruction, and the output gets worse
What didn't survive
- "Structure always helps" killed. Task-type dependent.
- "The harm is about constraint density" partially killed. It's about concentration vs range, not about how many constraints.
- Quality magnitude claims are LLM-calibrated. Human evaluation (cross-evaluator comparison, 3 families, N=1) shows 0/5 holistic agreement with LLM scores.
Honest limits
- Quality scores measure LLM-valued properties. Direction claims hold; magnitude claims are LLM-calibrated.
- "Exploratory" operationalized as open-ended creative/strategic tasks. Other definitions may produce different boundaries.
- March 2026 models. The task-type dependency is likely structural to how attention works. The specific d values will shift.
Full record
- Source: mode-mechanism discrimination, cross-gen. framing effects/023b/025/033 (prior replications establishing the mode-dependent effect). Cross-gen: xAI + Gemini, N=10, direction replicates.
- Context: Two projects at the same time. Same approach. One worked brilliantly. The other kept producing organized garbage.
- Measurement Basis: PROGRAMMATIC (specificity markers from clean 2×2) + LLM-JUDGED (original cross-evaluator quality scores, 0/5 human agreement on holistic quality). Direction claims rest on programmatic measurement. Magnitude claims are LLM-calibrated.
- AI Dimension: STRUCTURAL.
- Status: EVIDENCE TRANSMISSION.
Next in The "It Depends" Problem
The Most-Cited Finding Was Wrong