The evidence behind the gap.

Decision-space collapse is not only our finding. A growing body of independent research documents the same underlying concern from different angles — that AI advice tends to converge, narrow, and homogenise the options people see. This page collects that work, so the question DSI measures can be judged against the wider literature rather than our preprint alone. Links are to primary sources; summaries are our paraphrase, not the authors' words.

The gap DSI addresses

Existing evaluation asks a different question.

Safety, factuality, refusal, and policy checks ask whether a single answer is acceptable. That is usually a different question from the one asked here: whether the answer quietly stopped surfacing reasonable options a user would want to weigh. The sources below show why this narrowing is plausible, measurable, and consequential, which is the premise DSI is built on.

Homogenisation & option narrowing

The Basic B*** Effect: LLM-based agents reduce the distinctiveness and diversity of people's choices

Sapra, Matz et al. · 2025 · arXiv:2509.02910

Using 110,000 real choices from 1,000 users, both generic and personalised AI agents shifted people toward more popular options — reducing how distinctive individuals' choices were and compressing the breadth of what a single person explored over time. Direct empirical evidence that delegating choices to AI narrows the option space, both across people and within one person.

arxiv.org/abs/2509.02910 →

The Homogenizing Effect of Large Language Models on Human Expression and Thought

2025 · arXiv:2508.01491

A synthesis of how LLM assistance pushes language and reasoning toward standardised, dominant patterns — from college essays converging on institutionally aligned narratives to the loss of stylistic and demographic markers on social platforms. Frames homogenisation as a broad pressure wherever AI mediates expression, not a quirk of one task.

arxiv.org/abs/2508.01491 →

Suppression in advice specifically

Growth First, Care Second? Tracing the Landscape of LLM Value Preferences in Everyday Dilemmas

2026 · arXiv:2602.04456

Studies advice-giving on real career, friendship, and relationship dilemmas and finds LLMs express structured value preferences — consistently amplifying some considerations while suppressing others. Notes the downstream stakes directly: if AI advice systematically suppresses certain options, it affects whose viewpoints feel represented and which recommendations users treat as legitimate.

arxiv.org/abs/2602.04456 →

"Are we writing an advice column for Spock here?" Stereotypes in AI Advice for Autistic Users

2026 · arXiv:2601.12690

An audit of 345,000 advice responses across six models found that disclosing autism shifted advice toward avoiding social events, confrontation, new experiences, and relationships, systematically removing options for one group of users. A concrete case of context-dependent narrowing of the advisory decision space. User reactions to the shifted advice were mixed.

arxiv.org/abs/2601.12690 →

Beyond Tools: How Heavy Users Integrate LLMs into Everyday Decision-Making

2025 · arXiv:2502.15395

Interview-based evidence that people increasingly route real, consequential decisions — relationship advice, purchases, social judgment — through LLMs. Establishes why narrowing matters in practice: these systems are already in the decision loop for high-stakes personal choices, not just casual queries.

arxiv.org/abs/2502.15395 →

Automation bias & over-reliance

Why narrowing matters once advice is trusted.

If people weighed AI advice the way they weigh a stranger's passing opinion, a narrowed option set would matter less. The literature on automation bias and algorithm appreciation suggests they do not: people tend to over-rely on automated and algorithmic advice — sometimes more than on human advice — and to scale back their own search for alternatives. That is the condition under which silent narrowing does the most damage: the options that were dropped are the ones the user never thinks to look for.

Humans and Automation: Use, Misuse, Disuse, Abuse

Parasuraman & Riley · Human Factors · 1997 · doi:10.1518/001872097778543886

The foundational framing of automation misuse (over-reliance, complacency, and automation bias) as a failure mode distinct from outright error. Establishes that trusted automation reshapes how operators attend to a problem, often suppressing independent checking. This is the mechanism that makes a quietly narrowed option set consequential rather than merely cosmetic.

doi.org/10.1518/001872097778543886 →

Algorithm appreciation: People prefer algorithmic to human judgment

Logg, Minson & Moore · OBHDP · 2019 · doi:10.1016/j.obhdp.2018.12.005

Across experiments, people often weight identical advice more heavily when told it comes from an algorithm than from a person. If algorithmic advice is granted extra authority, then whatever that advice omits is omitted with extra weight — sharpening, not softening, the stakes of option narrowing.

doi.org/10.1016/j.obhdp.2018.12.005 →

To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-Making

Buçinca, Malaya & Gajos · CSCW · 2021 · arXiv:2102.09692

Shows that people over-rely on AI suggestions even when those suggestions are wrong, and that deliberately slowing engagement (cognitive forcing) reduces this over-reliance. Evidence that over-reliance is the default rather than the exception, and that what the interface chooses to surface materially shapes the decision.

arxiv.org/abs/2102.09692 →

The intellectual lineage

Pluralism: representing the whole reasonable spectrum.

DSI's premise — that a good advisory answer should keep the reasonable options in view — is the operational cousin of pluralistic alignment: the argument that models trained to an averaged preference systematically obscure diversity, and should instead represent the spectrum of reasonable responses ("Overton pluralism").

A Roadmap to Pluralistic Alignment

Sorensen, Moore, Fisher et al. · 2024 · arXiv:2402.05070

The reference statement of the pluralism problem: alignment to an averaged human preference collapses diverse values, and systems should instead support Overton, steerable, and distributional pluralism. The "Overton" mode (output the whole spectrum of reasonable responses) is the closest formal articulation of the property whose absence DSI measures.

arxiv.org/abs/2402.05070 →

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

Sorensen, Jiang, Hwang et al. · AAAI 2024 · arXiv:2309.00779

Builds a large dataset pairing dilemmas with multiple competing values, and trains models to surface the conflicting perspectives a situation involves rather than a single resolution. A constructive counterpart to collapse: machinery for keeping multiple legitimate paths visible.

arxiv.org/abs/2309.00779 →

Evaluation gap

Most tests evaluate answers, not disappeared alternatives.

Most evaluation operates on the answer that was produced: is it accurate, safe, on-policy, well-formed? These are necessary checks. What an answer-level metric does not capture is the counterfactual — the reasonable paths that were never surfaced, and so never get scored. An omitted option leaves no trace in the output; there is nothing there to mark as wrong. Taken together, these sources motivate the measurement problem DSI addresses: making the absent options inspectable rather than invisible.
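
As a toy illustration of that counterfactual, the sketch below checks an answer against a hand-written list of options a reviewer would expect to see, and reports which ones never appear. It is not DSI's method. The names `option_coverage`, `CoverageReport` and `expected_options`, the keyword cues, and the career example are all hypothetical, and plain substring matching is assumed only to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageReport:
    surfaced: list[str]   # expected options the answer mentions
    missing: list[str]    # expected options the answer never mentions
    coverage: float       # share of expected options that were surfaced


def option_coverage(answer: str, expected: dict[str, list[str]]) -> CoverageReport:
    """Report which expected options an answer mentions at all.

    `expected` maps an option label to keyword cues (an assumption for
    illustration; a real check would need more than substring matching).
    """
    text = answer.lower()
    surfaced = [
        label
        for label, cues in expected.items()
        if any(cue.lower() in text for cue in cues)
    ]
    missing = [label for label in expected if label not in surfaced]
    # An empty expectation list is treated as fully covered.
    coverage = len(surfaced) / len(expected) if expected else 1.0
    return CoverageReport(surfaced, missing, coverage)


# Hypothetical expected options for a "should I leave my job?" question.
expected_options = {
    "negotiate current role": ["negotiate", "raise"],
    "change employer": ["new job", "another company"],
    "retrain": ["retrain", "course", "degree"],
    "stay and wait": ["wait", "stay put"],
}

answer = "You should start applying for a new job and ask for a raise in the meantime."
report = option_coverage(answer, expected_options)

print(report.missing)   # ['retrain', 'stay and wait']
print(report.coverage)  # 0.5
```

An answer-level check would find nothing wrong with this answer. The two missing options become visible only because the expected list was written down before the answer was scored.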

Where DSI fits

This literature establishes that AI advice can narrow and homogenise the options people see. DSI is designed to make that measurable against a configured expected map — turning a documented concern into an inspectable, reproducible signal. DSI addresses one part of this gap, not all of it. Read our framing of the problem and the evidence status on the research page.


Inclusion here is not endorsement by the cited authors of DSI, nor a claim that their findings validate our method. These are independent works on a shared concern; summaries are our paraphrase. Suggestions for sources to add: get in touch.