Perplexity Deep Research Report Cut Off Midway

:bug: Describe the Bug

Setting max_tokens to 125,000 results in the response being cut off at ~11.7k completion tokens, midway through the report, with the following final response from the API. The report is cut off mid-sentence (full output not included below):

{"finalResult": {"id": "chatcmpl_1760422639028","model": "sonar-deep-research",
"usage": {"prompt_tokens": 2370,"completion_tokens": 11669, "total_tokens": 14039,"citation_tokens": 50188,"num_search_queries": 30,"reasoning_tokens": 284307,"cost": {"input_tokens_cost": 0.005,"output_tokens_cost": 0.093,"citation_tokens_cost": 0.1,"reasoning_tokens_cost": 0.853,"search_queries_cost": 0.15,"total_cost": 1.201}}},"responseText": {"length": 53460,"preview": "\n\n# Taiwan's Semiconductor Industry: A Comprehensive Analysis of Political History, Economic Dominance, and Geopolitical Risk\n\nTaiwan stands at the intersection of global technology and great power po...","fullText": "\n\n# Taiwan's Semiconductor Industry: A Comprehensive Analysis of Political History, Economic Dominance, and Geopolitical Risk\n\nTaiwan stands at the intersection of global technology and great power politics, embodying one of the most complex and consequential situations in contemporary international relations. The island democracy...

:white_check_mark: Expected Behavior

The final answer should be completed, not cut off midway at a token count significantly below the max_tokens that was set.

:cross_mark: Actual Behavior

The final text is cut off at ~11.7k completion tokens despite max_tokens being set to 125k.

:counterclockwise_arrows_button: Steps to Reproduce

  1. Call the API with max_tokens set to 125,000 and the sonar-deep-research model, asking for a report on any detailed topic (a sketch of such a request is under API Request & Response below).
  2. Observe that the response is cut off mid-sentence well below the max_tokens limit.

:pushpin: API Request & Response (if applicable)
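A minimal sketch of the kind of request that reproduces this. It is not the exact payload: the prompt and API key are placeholders, and the endpoint and parameter names assume Perplexity's OpenAI-compatible chat completions API.

```typescript
// Minimal sketch of a request that reproduces the cut-off.
// Assumptions: Perplexity's OpenAI-compatible chat completions endpoint,
// a placeholder prompt, and an API key in PERPLEXITY_API_KEY.
// Requires Node.js 18+ (global fetch).
async function main() {
  const response = await fetch("https://api.perplexity.ai/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PERPLEXITY_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "sonar-deep-research",
      max_tokens: 125000,
      messages: [
        {
          role: "user",
          content:
            "Write a detailed report on Taiwan's semiconductor industry: political history, economic dominance, and geopolitical risk.",
        },
      ],
    }),
  });

  const result = await response.json();
  // The report ends mid-sentence at ~11.7k completion tokens even though
  // max_tokens allows up to 125k.
  console.log(result.usage);
  console.log(result.choices?.[0]?.message?.content);
}

main().catch(console.error);
```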

:globe_showing_europe_africa: Environment

  • API Version: sonar-deep-research
  • SDK (if applicable): Node.js
  • Operating System: N/A

Setting a very high max_tokens value doesn’t guarantee the model will actually generate that many tokens — it just defines the upper limit. The model decides when to stop based on internal factors like completion quality, confidence, and context length.

In other words, even if you set max_tokens to 125,000, the model may stop earlier if it determines the response is complete or has reached its internal generation boundary.
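One way to confirm which case this is: if the raw API response mirrors the OpenAI chat completions schema (an assumption, since the log above shows a wrapped shape), choices[0].finish_reason should be "length" when the token limit was hit and "stop" when the model ended on its own. A quick check along these lines:

```typescript
// Sketch assuming an OpenAI-compatible response shape (choices[].finish_reason,
// usage.completion_tokens); these field names are an assumption, not confirmed
// for sonar-deep-research.
interface ChatCompletionLike {
  choices: { finish_reason: string; message: { content: string } }[];
  usage: { completion_tokens: number };
}

function describeStop(result: ChatCompletionLike, maxTokens: number): string {
  const finishReason = result.choices[0]?.finish_reason;
  const used = result.usage.completion_tokens;
  if (finishReason === "length") {
    return `Hit the token limit after ${used} completion tokens (max_tokens=${maxTokens}).`;
  }
  if (finishReason === "stop") {
    return `Model stopped on its own after ${used} completion tokens, under max_tokens=${maxTokens}.`;
  }
  return `Stopped with finish_reason="${finishReason}" after ${used} completion tokens.`;
}
```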

It stops midway with an incomplete answer, so the model should have continued.

Could you share an example request (including model, parameters, and payload) so we can test this on our end?