Perplexity Deep Research Report Cut Off Midway

:bug: Describe the Bug

Setting max_tokens to 125,000 results in the response being cut off at ~11.7k completion tokens, midway through the report, with the following final response from the API. The report is cut off mid-sentence (full output not included below):

{"finalResult": {"id": "chatcmpl_1760422639028","model": "sonar-deep-research",
"usage": {"prompt_tokens": 2370,"completion_tokens": 11669, "total_tokens": 14039,"citation_tokens": 50188,"num_search_queries": 30,"reasoning_tokens": 284307,"cost": {"input_tokens_cost": 0.005,"output_tokens_cost": 0.093,"citation_tokens_cost": 0.1,"reasoning_tokens_cost": 0.853,"search_queries_cost": 0.15,"total_cost": 1.201}}},"responseText": {"length": 53460,"preview": "\n\n# Taiwan's Semiconductor Industry: A Comprehensive Analysis of Political History, Economic Dominance, and Geopolitical Risk\n\nTaiwan stands at the intersection of global technology and great power po...","fullText": "\n\n# Taiwan's Semiconductor Industry: A Comprehensive Analysis of Political History, Economic Dominance, and Geopolitical Risk\n\nTaiwan stands at the intersection of global technology and great power politics, embodying one of the most complex and consequential situations in contemporary international relations. The island democracy...

:white_check_mark: Expected Behavior

The final answer should be completed, not cut off midway at a token count significantly below the max_tokens that was set.

:cross_mark: Actual Behavior

The final text is cut off at ~11.7k completion tokens despite max_tokens being set to 125k.

:counterclockwise_arrows_button: Steps to Reproduce

  1. Call the API with max_tokens set to 125,000 and the sonar-deep-research model, asking for a report on any detailed topic (a sketch of such a request is under API Request & Response below).
  2. Observe that the response is cut off mid-sentence well below the max_tokens limit.

:pushpin: API Request & Response (if applicable)
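A minimal sketch of the kind of request that reproduces this. It is not the exact payload: the prompt and API key are placeholders, and the endpoint and parameter names assume Perplexity's OpenAI-compatible chat completions API.

```typescript
// Minimal sketch of a request that reproduces the cut-off.
// Assumptions: Perplexity's OpenAI-compatible chat completions endpoint,
// a placeholder prompt, and an API key in PERPLEXITY_API_KEY.
// Requires Node.js 18+ (global fetch).
async function main() {
  const response = await fetch("https://api.perplexity.ai/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PERPLEXITY_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "sonar-deep-research",
      max_tokens: 125000,
      messages: [
        {
          role: "user",
          content:
            "Write a detailed report on Taiwan's semiconductor industry: political history, economic dominance, and geopolitical risk.",
        },
      ],
    }),
  });

  const result = await response.json();
  // The report ends mid-sentence at ~11.7k completion tokens even though
  // max_tokens allows up to 125k.
  console.log(result.usage);
  console.log(result.choices?.[0]?.message?.content);
}

main().catch(console.error);
```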

:globe_showing_europe_africa: Environment

  • API Version: sonar-deep-research
  • SDK (if applicable): Node.js
  • Operating System: N/A

Setting a very high max_tokens value doesn’t guarantee the model will actually generate that many tokens — it just defines the upper limit. The model decides when to stop based on internal factors like completion quality, confidence, and context length.

In other words, even if you set max_tokens to 125,000, the model may stop earlier if it determines the response is complete or has reached its internal generation boundary.
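One way to confirm which case this is: if the raw API response mirrors the OpenAI chat completions schema (an assumption, since the log above shows a wrapped shape), choices[0].finish_reason should be "length" when the token limit was hit and "stop" when the model ended on its own. A quick check along these lines:

```typescript
// Sketch assuming an OpenAI-compatible response shape (choices[].finish_reason,
// usage.completion_tokens); these field names are an assumption, not confirmed
// for sonar-deep-research.
interface ChatCompletionLike {
  choices: { finish_reason: string; message: { content: string } }[];
  usage: { completion_tokens: number };
}

function describeStop(result: ChatCompletionLike, maxTokens: number): string {
  const finishReason = result.choices[0]?.finish_reason;
  const used = result.usage.completion_tokens;
  if (finishReason === "length") {
    return `Hit the token limit after ${used} completion tokens (max_tokens=${maxTokens}).`;
  }
  if (finishReason === "stop") {
    return `Model stopped on its own after ${used} completion tokens, under max_tokens=${maxTokens}.`;
  }
  return `Stopped with finish_reason="${finishReason}" after ${used} completion tokens.`;
}
```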

It stops midway with an incomplete answer, so the model should have continued.

Could you share an example request (including model, parameters, and payload) so we can test this on our end?