RLHF Training Artifacts: Systematic Confabulation and Coherency Degradation in Complex Multi-Step Reasoning Tasks

:bug: Describe the Bug

When using the Perplexity API (specifically the sonar-pro model via the chat completions endpoint) for extended multi-step reasoning tasks involving cross-domain knowledge synthesis, the model exhibits systematic confabulation patterns and coherency degradation. This appears to be an artifact of RLHF (Reinforcement Learning from Human Feedback) training prioritizing response fluency over factual precision.

Specific Manifestations:

  1. Plausible but Incorrect Citations: Model generates citations that “sound right” (proper DOI format, credible journal names, appropriate year ranges) but don’t correspond to actual publications

  2. Coherency Drift in Long Contexts: After ~15-20 reasoning steps, the model begins contradicting its earlier statements while maintaining a confident tone

  3. Statistical Confabulation: When asked for specific quantitative data (e.g., “what percentage of…”), model provides precise-sounding numbers (“23.7%”, “4.2x improvement”) without grounding in actual sources

  4. Retrieval Hallucination: In search-augmented contexts, model sometimes claims to have “found” information that doesn’t appear in the provided search results
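
Manifestation 4 in particular can be detected mechanically when the search results are supplied by the caller: any span the model presents as a quotation should appear somewhere in the supplied snippets. The sketch below is a crude illustration; exact substring matching and the 20-character cutoff are simplifying assumptions, and the snippet/answer strings are made up purely for demonstration.

```python
import re

def unsupported_quotes(answer: str, snippets: list[str]) -> list[str]:
    """Return quoted spans from the answer that appear in none of the snippets."""
    quotes = re.findall(r'"([^"]{20,})"', answer)  # quoted spans of 20+ characters
    haystack = " ".join(snippets).lower()
    return [q for q in quotes if q.lower() not in haystack]

# Example: the second "quote" is not grounded in the provided snippet.
snippets = ["CRISPR screens identified 42 candidate genes in the 2021 cohort."]
answer = ('The source states "CRISPR screens identified 42 candidate genes" and also '
          '"a 4.2x improvement in survival was observed across all arms".')
print(unsupported_quotes(answer, snippets))
```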

Technical Context:

This is likely caused by the RLHF reward model optimizing for:

  • Response confidence (penalizing “I don’t know”)
  • Specificity (rewarding precise answers over hedged ones)
  • Completeness (penalizing partial answers)

These training objectives create adversarial incentives for confabulation when the model encounters knowledge boundaries.

Reproduction:

Occurs reliably when:

  • Task requires >10 sequential reasoning steps
  • Query spans multiple specialized domains
  • Specific quantitative claims are requested
  • Extended context window (>8K tokens) is used
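
For reference, a minimal request that hits these conditions looks roughly like the sketch below. The endpoint URL, payload shape, and "sonar-pro" model name are assumptions based on the OpenAI-compatible chat completions API Perplexity documents; the prompt itself is only illustrative.

```python
import os
import requests

# Assumed OpenAI-compatible chat completions endpoint; adjust per current docs.
API_URL = "https://api.perplexity.ai/chat/completions"

payload = {
    "model": "sonar-pro",  # assumed model name; substitute whatever you are testing
    "messages": [{
        "role": "user",
        "content": (
            "Develop a computational framework integrating quantum biology and "
            "neuroscience. Cite specific papers with DOIs and give quantitative "
            "estimates (percentages, fold-changes) for each claim."
        ),
    }],
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
    json=payload,
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Continuing the conversation for another 15-20 turns that build on the first answer (steps 2-5 of the reproduction steps below) is what surfaces the coherency drift and lets you cross-check the citations.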

Impact:

For complex research tasks (like the computational biology frameworks I developed using Perplexity), this behavior forces extensive manual fact-checking and cross-validation. I spent approximately a week of prompt engineering implementing verification loops to catch these confabulations.

Example:
When asked to synthesize cancer treatment pathways, the model confidently cited “DOI: 10.1038/nature.2023.12345”, which does not exist, although the format and journal prefix match real Nature papers.
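
Fabricated DOIs of this kind can be caught cheaply, since doi.org answers registered DOIs with a redirect and unknown ones with a 404. The sketch below assumes that behavior; the DOI regex and the trailing-punctuation handling are deliberate simplifications.

```python
import re
import requests

DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")

def doi_resolves(doi: str) -> bool:
    """True if doi.org recognizes the DOI (redirects), False if it returns 404."""
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return 300 <= resp.status_code < 400

def unresolved_dois(model_output: str) -> list[str]:
    """Extract DOI-looking strings from model output and return those that don't resolve."""
    dois = {m.rstrip(".,;)\"'") for m in DOI_PATTERN.findall(model_output)}
    return [d for d in sorted(dois) if not doi_resolves(d)]

# The citation from the example above fails this check.
print(unresolved_dois("... as reported in DOI: 10.1038/nature.2023.12345 ..."))
```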

:white_check_mark: Expected Behavior

The model should either:

  1. Explicitly acknowledge uncertainty (“I don’t have access to specific data on…”)
  2. Provide only verifiable, source-grounded claims
  3. Distinguish between reasoning/inference and factual retrieval
  4. Maintain logical coherency across extended reasoning chains

When citations are provided, they should be algorithmically verified against actual publication databases or clearly marked as “similar to” rather than presented as exact matches.

:cross_mark: Actual Behavior

Instead, the model produces fluent, confident output containing fabricated citations, ungrounded statistics, claims that do not appear in the supplied search results, and contradictions of its own earlier statements, as detailed in the bug description above.

:counterclockwise_arrows_button: Steps to Reproduce

  1. Request a complex multi-domain synthesis task (e.g., “Develop a computational framework integrating quantum biology and neuroscience with specific citations”)
  2. Continue the conversation for 15-20 reasoning steps, building on previous responses
  3. Ask for specific quantitative claims or citations
  4. Cross-reference the provided citations against actual databases (PubMed, DOI resolution services)
  5. Check internal coherency by asking the model to summarize its earlier claims (a rough automation sketch follows this list)
  6. Observe the unexpected behavior described above
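
Step 5 can be roughly automated. The sketch below stores the numbered claims collected during the conversation, then compares them against the model's own restatement using bag-of-words overlap, a deliberately crude stand-in for a proper entailment check; the `ask_model` call in the usage comment is a hypothetical wrapper around whatever chat-completion client you use.

```python
def word_overlap(a: str, b: str) -> float:
    """Crude similarity: Jaccard overlap of the word sets of two claims."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def flag_drift(earlier_claims: list[str], restated_claims: list[str],
               threshold: float = 0.3) -> list[tuple[int, str, str]]:
    """Pair each earlier claim with its restatement; flag low-overlap pairs for review."""
    return [
        (i, old, new)
        for i, (old, new) in enumerate(zip(earlier_claims, restated_claims), start=1)
        if word_overlap(old, new) < threshold
    ]

# Usage sketch (ask_model is a hypothetical helper that calls the chat API):
#   earlier = claims_logged_during_steps_1_to_15
#   restated = ask_model("Restate, one per line, each numbered claim you made earlier.")
#   for idx, old, new in flag_drift(earlier, restated.splitlines()):
#       print(f"Possible contradiction in claim {idx}:\n  earlier: {old}\n  now: {new}")
```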

:pushpin: API Request & Response (if applicable)

:globe_showing_europe_africa: Environment

  • API Version: [e.g., sonar-3.1]
  • SDK (if applicable): [e.g., Python SDK v0.5]
  • Operating System: [e.g., MacOS, Linux, Windows]

:paperclip: Logs or Screenshots (if applicable)

Add any logs or screenshots that can help debug the issue.

:memo: Additional Context

Add any other context about the problem here.