Async chat completion stuck in IN_PROGRESS for hours while sync returns successfully

We are observing inconsistent and unexpected behavior between sync and async chat completion modes for the same query and model.

  • In sync mode, the request completes successfully and returns the expected response in approximately 20 seconds.

  • In async mode, the same request remains in IN_PROGRESS status for an unusually long duration (2–3 hours), without failure or intermediate status updates.

  • When the async request finally reports completion, its timestamps show an actual execution time of approximately 1 minute, suggesting that the job was either queued for hours or finished early without its status being surfaced.


Details

  • Model: sonar-deep-research

  • Query: Same exact user query used in both sync and async modes

Sync Mode Result

  • Status: COMPLETED

  • Response time: ~20 seconds

Async Mode Result

  • Status: IN_PROGRESS for ~2–3 hours

  • No failure or error reported during this period

  • Eventually transitions to COMPLETED


Key Observation (Important)

From the async job metadata:

"created_at": 1768991705,
"started_at": 1768991705,
"completed_at": 1768991768

This indicates:

  • The async job completed in ~63 seconds

  • However, the client observed the job in IN_PROGRESS state for multiple hours before this completion was surfaced
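
The arithmetic behind these two points can be checked directly from the metadata. The following is a minimal Python sketch, assuming the three fields are Unix timestamps in seconds. The field names come from the metadata above; the helper name job_timings is made up for illustration.

```python
from datetime import datetime, timezone

# Values copied from the async job metadata above.
job = {
    "created_at": 1768991705,
    "started_at": 1768991705,
    "completed_at": 1768991768,
}

def job_timings(meta):
    """Return (queue_wait_s, run_time_s), assuming Unix timestamps in seconds."""
    queue_wait = meta["started_at"] - meta["created_at"]
    run_time = meta["completed_at"] - meta["started_at"]
    return queue_wait, run_time

queue_wait, run_time = job_timings(job)
print(f"queue wait: {queue_wait}s, run time: {run_time}s")

# Human-readable completion time in UTC, for comparison with client-side logs.
print(datetime.fromtimestamp(job["completed_at"], tz=timezone.utc).isoformat())
```

Taken at face value, the reported queue wait is 0 seconds and the run time is 63 seconds, so the hours-long delay does not show up in the job's own timestamps.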

This suggests the async job may have been:

  • Queued or delayed for an extended period before actual execution, or

  • Executed earlier but completion status was not updated or returned promptly


Async Job State During Delay

During the extended delay:

  • status: IN_PROGRESS

  • completed_at: null

  • failed_at: null

  • error_message: null

No indication was provided that the job was queued, paused, or delayed.
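
Since the API gives no such signal, a client can only detect the stall on its own side. Below is a minimal sketch of such a check, assuming the status payload carries the fields listed above. The function name looks_stalled and the 10-minute threshold are arbitrary choices for illustration, not part of the API.

```python
import time

STALL_THRESHOLD_S = 600  # arbitrary assumption: 10 minutes

def looks_stalled(payload, now=None, threshold_s=STALL_THRESHOLD_S):
    """Return True if the job still reports IN_PROGRESS long after creation."""
    now = time.time() if now is None else now
    if payload.get("status") != "IN_PROGRESS":
        return False
    # A job with a completion or failure timestamp is not stalled.
    if payload.get("completed_at") is not None or payload.get("failed_at") is not None:
        return False
    return now - payload["created_at"] > threshold_s
```

A client could log or alert when this returns True instead of waiting silently, given that the sync call for the same query finished in about 20 seconds.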

This is happening to me too.

It seems to happen if I poll before the job completes.

It’s killing my app in production, which is incredibly frustrating coming from a larger platform like this. Can someone please look at this? I’m happy to provide specific details.

The workaround I found is to set cache=false on the polling request. With it, polling returns the completed response once the task finishes. This appears to be a caching issue in the Perplexity API.
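
A polling loop using this workaround might look like the sketch below. It rests on assumptions. The reply does not say how cache=false is passed, so it is shown here as a query parameter. The names build_poll_url and poll_until_done, the FAILED status, and the example.com base URL are hypothetical. The HTTP call is injected as a function, so the loop works with whatever client you already use.

```python
import time
from urllib.parse import urlencode

# FAILED is assumed from the failed_at field; only COMPLETED appears in the thread.
TERMINAL = {"COMPLETED", "FAILED"}

def build_poll_url(base_url, request_id, cache=False):
    """Build the polling URL. Passing cache as a query parameter is an assumption."""
    query = urlencode({"cache": "true" if cache else "false"})
    return f"{base_url}/{request_id}?{query}"

def poll_until_done(fetch, url, interval_s=5.0, timeout_s=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch(url) until the job reaches a terminal status or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        payload = fetch(url)  # fetch must return the parsed JSON status payload
        if payload.get("status") in TERMINAL:
            return payload
        if clock() >= deadline:
            raise TimeoutError(f"job still {payload.get('status')} after {timeout_s}s")
        sleep(interval_s)
```

Here fetch would be a function that issues an authenticated GET against the URL and returns the decoded JSON, for example poll_until_done(my_get_json, build_poll_url("https://api.example.com/async/jobs", request_id)). The explicit timeout means a stuck job raises an error instead of leaving the client waiting for hours.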