Repeated reasoning and literal token loops in sonar-reasoning / sonar-reasoning-pro leading to finish_reason = length

:bug: Describe the Bug

We are seeing a recurring failure mode with Perplexity’s reasoning models where the model gets stuck in a loop, repeating the same paragraphs or literal strings tens, hundreds, or even thousands of times until it hits the token limit (finish_reason = length).

This is significantly impacting our integration: in our current dataset, 9 out of 34 calls (≈26.5%) were terminated due to length, and most of those tokens were consumed by redundant loops rather than new reasoning or usable JSON output.

:white_check_mark: Expected Behavior

The model completes its reasoning within the token limit and returns the requested structured JSON output.

:cross_mark: Actual Behavior

The model loops endlessly and hits the token limit before producing usable output.

:counterclockwise_arrows_button: Steps to Reproduce

Reproduction is not fully deterministic. Using larger context sizes for search seems to trigger the issue more often, but we have also run into it with a low context size.

:globe_showing_europe_africa: Environment

API: Perplexity API (server-side integration)
Models involved:
sonar-reasoning (7 runs with loops)
sonar-reasoning-pro (2 runs with loops)
sonar standard model (1 run with a numeric token loop)

:memo: Additional Context

Usage pattern:

We send a relatively long prompt asking the model to:
  • Analyze a set of search results (events in specific cities / dates).
  • Produce a structured JSON list of opportunities.
We do not set an explicit max_tokens; we rely on the default limits.
We store the full responseText in a Postgres/Supabase table, along with finish_reason. (A minimal sketch of this call-and-store pattern follows below.)
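
For concreteness, here is a minimal sketch of that call-and-store pattern. It is simplified: the `runs` table and its column names are stand-ins for our actual schema, and the endpoint/response shape follow Perplexity’s OpenAI-compatible chat completions API.

```ts
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

async function runQuery(prompt: string, model: string): Promise<void> {
  // Note: no max_tokens is set here; we rely on the default limit,
  // which is why looping runs end with finish_reason = "length".
  const res = await fetch("https://api.perplexity.ai/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PERPLEXITY_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model, // "sonar-reasoning", "sonar-reasoning-pro", or "sonar"
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();

  // Persist the full response text and finish_reason for later analysis.
  await supabase.from("runs").insert({
    model,
    response_text: data.choices[0].message.content,
    finish_reason: data.choices[0].finish_reason,
  });
}
```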

Observed failure modes

1) Reasoning paragraph loops (7 runs)

  • In the majority of problematic runs, the model gets stuck repeating the same reasoning paragraphs over and over. Some clear examples:

    • Run 2ca2fc59 (sonar-reasoning-pro, 115k chars)
      • The sentence “Let me check if there are any other events that might be happening during this week:” appears 378 times in a single response.
      • It is accompanied by four short paragraphs summarizing the same search results (“From Result 5…”, “From Result 6…”, “From Result 7…”, “From Result 8…”), each repeated 47 times.
      • The response is effectively a five-paragraph block looped hundreds of times, with no stable final JSON output before length termination.
    • Run b94f1d3a (sonar-reasoning, 145k chars)
      • The paragraph “Given the limitations of the information available, I’ll need to be transparent about what I can and cannot confirm from the search results.” is repeated 120 times in the same response.
      • Additional paragraphs (e.g., describing Novi events, noting that other cities’ events are not in the target area, and warning not to invent events) each appear ~61 times.
      • The bulk of the 145k-character output is just a handful of paragraphs looping.
    • Run aab97946 (sonar-reasoning, 128k chars)
      • Two meta-reasoning sentences each appear 64 times:
        • “Given the limited information, I’ll need to focus on the events I can confidently identify and make educated guesses for others based on typical event patterns in these areas.”
        • “Let me check if there are any events mentioned that might be in nearby areas that could attract people from Farmington and Farmington Hills:”
      • Similar “Let me check…” / “From result [1]…” paragraphs are also repeated 34–64 times.
    • Run 004a1ee7 (sonar-reasoning-pro, 128k chars)
      • The sentence “Let me try to construct the JSON response with the information available:” appears 72 times.
      • Event-summary paragraphs for “For Friday–Sunday…” and “For Monday–Thursday…” appear 70–71 times each.
      • The response looks like the model continuously “restarts” its JSON construction and event-summary loop.
    • Run 3ccf59c7 (sonar-reasoning, 153k chars)
      • For each day of the week (Tuesday–Friday), the model repeats a pair of paragraphs:
        • “Let me check if there are any events at the Suburban Collection Showplace that might be happening on [weekday]…”
        • followed by “I don’t see any specific events listed for [weekday]…”
      • Each of these weekday paragraphs appears 20 times, so the response cycles through the same “check day X / no additional events” loop repeatedly.
    • Run ecc17d7f (sonar-reasoning, 107k chars)
      • The paragraph “I don’t see specific events listed for Farmington or Farmington Hills in the search results. However, I should note that Farmington Hills is adjacent to Novi, so events in Novi would be appropriate.” appears 32 times.
      • Additional paragraphs (“Let me check if there are any events specifically in Farmington or Farmington Hills:”, and event descriptions like Power Connections Cannabis) appear ~29–30 times.
    • Run cd47a78c (sonar-reasoning, 124k chars)
      • A Monday schedule paragraph (e.g., Agile & Scrum training in Farmington Hills + succession planning workshop in Novi) appears 31 times.
      • Several other day-specific schedules and justification paragraphs each appear 29 times.

Common characteristics:

  • The repeated content is semantically coherent, not gibberish (e.g., “Let me check if there are any other events…”, “Given the limited information…”).
  • The paragraphs often talk about:
    • limited information,
    • checking the search results,
    • not inventing events,
    • summaries of the same handful of events/venues.
  • The model seems to be stuck in a loop of:
    1. Re-announcing that it will check for events.
    2. Re-scanning the same search results.
    3. Re-stating the same conclusions.
  • The loop continues until the output hits the length limit (finish_reason = length), rather than converging to a final structured JSON payload.

This looks like a runaway reasoning loop, possibly one where the reasoning stream is never properly terminated, or where the model keeps re-invoking the same internal “check events” behavior.

2) Literal token/string loops (2 runs)

In two runs, the problem is even lower-level: the model repeats a very short literal string thousands of times.

  • Run 80a3eae0 (sonar, 32k chars)
    • The literal string “1234567890” appears 3,180 times.
    • That sequence alone accounts for ≈31,800 of the 32,232 characters in the response.
    • In our logs it shows up inside a JSON field (an eventDetailsUrl string with thousands of trailing digits), suggesting a decoding/sampling issue where the same digit pattern is emitted in a loop until the length limit is reached.
  • Run d7a788f4 (sonar-reasoning, 55.9k chars)
    • The literal pattern “2025-11-17-2025-11-21-” (22 characters) appears 972 times.
    • That single pattern contributes ≈21,384 characters (about 38% of the response).
    • The string corresponds to the Monday–Friday date range for the query; instead of generating structured JSON or new content, the model degenerates into repeating the same date range string over and over.

These look less like high-level reasoning loops and more like token-level repetition (decoder or sampling instability), but they occur in the same overall context (long prompts, event/search-based tasks).

Quantification

For the length-terminated runs:

  • All 9/9 show significant repetition.
  • 7/9 are dominated by repeated paragraphs (exact paragraph text appears 20–378 times).
  • 2/9 are dominated by repeated short strings (digits or date range strings) repeated hundreds to thousands of times.
  • In several runs, the majority of the output characters are in the repeated segments rather than unique content.

We detected these patterns by:

  • Splitting responseText on blank lines and grouping identical paragraphs with a COUNT(*).
  • For literal loops, counting occurrences of specific substrings such as “1234567890” or “2025-11-17-2025-11-21-”.
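
In code form, the two checks are essentially the following. This is a simplified TypeScript sketch of what we run over the stored responseText; the SQL version does the same paragraph grouping with GROUP BY / COUNT(*).

```ts
// Split a response into paragraphs on blank lines and count exact duplicates,
// mirroring the GROUP BY paragraph ... COUNT(*) query we run in Postgres.
function paragraphCounts(responseText: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const raw of responseText.split(/\n\s*\n/)) {
    const para = raw.trim();
    if (para.length === 0) continue;
    counts.set(para, (counts.get(para) ?? 0) + 1);
  }
  return counts;
}

// Count non-overlapping occurrences of a literal substring,
// e.g. "1234567890" or "2025-11-17-2025-11-21-".
function substringCount(responseText: string, needle: string): number {
  let count = 0;
  for (
    let idx = responseText.indexOf(needle);
    idx !== -1;
    idx = responseText.indexOf(needle, idx + needle.length)
  ) {
    count++;
  }
  return count;
}
```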

Impact

  • No usable output: Because the model hits finish_reason = length while looping, we often do not get a complete, consistent JSON payload.
  • Operational impact: In our application (event discovery & structuring), this means:
    • We can’t safely import these responses.
    • We must implement extra guardrails / post-processing just to detect, retry, and discard obviously looped outputs (see the sketch after this list).
    • It makes the reasoning models much less reliable for production use, even though the underlying search context and prompts are reasonable.
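
The guardrail we are adding looks roughly like this. It is a sketch only: it reuses `paragraphCounts` from the detection snippet above, and the repetition threshold of 10 is an arbitrary heuristic we picked, not a recommended constant.

```ts
// Flag a response as looped/unusable so we can retry or discard it
// instead of importing a truncated, repetitive payload.
function isLoopedOutput(responseText: string, finishReason: string): boolean {
  if (finishReason === "length") return true; // truncated: never trust the JSON
  for (const n of paragraphCounts(responseText).values()) {
    if (n > 10) return true; // heuristic threshold for paragraph repetition
  }
  return false;
}
```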

Questions and requests for Perplexity

  1. Is this a known issue with sonar-reasoning / sonar-reasoning-pro (or the standard sonar decoder) where:
    • The reasoning stream can get stuck repeating the same paragraphs, or
    • The model can fall into token-level repetition loops for simple strings?
  2. Are there recommended mitigations on the client side for reasoning models, such as:
    • Specific sampling parameters (e.g., temperature, presence_penalty, repetition_penalty)?
    • A way to limit or suppress verbose reasoning sections while still getting high-quality final JSON?
    • A recommended max length / early-stop heuristic for reasoning streams?
    (A sketch of the kind of request we would experiment with follows after this list.)
  3. Would it be helpful if I shared full raw responses (including the reasoning sections) and prompts for these runs?
    • I’ve already extracted and summarized each problematic run into a separate text file (IDs and counts as above), and I can provide those as attachments.
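
For reference, this is the kind of request-level mitigation we would experiment with pending guidance. Whether the reasoning models actually honor these sampling parameters is part of question 2 above, so the values below are guesses, not recommendations:

```ts
// Hypothetical mitigation attempt: cap output length so loops fail fast,
// and penalize repetition (parameter support for the reasoning models
// is unconfirmed; that is exactly what we are asking about).
const prompt = "..."; // our long event-analysis prompt
const requestBody = {
  model: "sonar-reasoning",
  messages: [{ role: "user", content: prompt }],
  max_tokens: 4096,      // bound the damage from a runaway loop
  temperature: 0.2,      // lower sampling temperature
  presence_penalty: 0.5, // assumed OpenAI-style penalty; unconfirmed here
};
```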

Attachments

I can attach or provide links to one text file per problematic run, each containing:

  • Run metadata (id, model, created_at, finish_reason, response length).
  • The top repeated paragraphs or literal patterns.
  • Exact repeat counts.
  • Short narrative description of the loop behavior.

The run IDs covered are:

  • 004a1ee7-0f91-4eeb-a0d9-e8353ac97775
  • cd47a78c-070a-40a3-98b2-eb3a406185b8
  • b94f1d3a-fc8b-4018-8220-66ac01ec67c9
  • aab97946-c832-4848-a81b-526715c675fc
  • 2ca2fc59-5351-42de-a8b1-67cefd588a2d
  • 3ccf59c7-8e8c-4297-a080-1cf4e71ef223
  • ecc17d7f-3f95-439a-841d-a910685288e5
  • 80a3eae0-9f7d-4a9f-ad33-07b809556a95
  • d7a788f4-ea1e-407b-b7cf-5d3453bcc105