API not returning multiple responses with top_k > 1 and top_p ≈ 1

Issue Description
The Perplexity AI API is not returning multiple responses when the top_k parameter is set to a value greater than 1 and the top_p parameter is close to 1. This behavior is inconsistent with the expected functionality of these parameters.

Steps to Reproduce

  1. Set up a request to the Perplexity AI API endpoint (https://api.perplexity.ai/chat/completions).
  2. Configure the request payload with the following parameters:
    • model: Any available model (tested with all models)
    • messages: A list of message objects (system and user messages)
    • max_tokens: Set to a small value (e.g., 1) for testing
    • temperature: Set to 1 (or any other value)
    • top_k: Set to a value greater than 1 (e.g., 5)
    • top_p: Not explicitly set, but assumed to be close to 1 by default
  3. Send the request to the API.

Example Python Code

import requests

url = "https://api.perplexity.ai/chat/completions"

payload = {
    "model": "llama-3.1-sonar-large-128k-chat",
    "messages": [
        {
            "content": """You are a personal assistant classifying emails.  
            1. Meeting: proposing a meeting date or modifying it
            2. Important. Important emails that need to be dealt urgently
            3. Spam: propomotional emails from recruiters, vendors, consultants, etc. 
            Otherwise return 0 (you are not sure or no categories are matching).
            Answer only using numbers, letters or other characters are not allowed""",
            "role": "system"
        },
        {
            "content": "Hello, I saw you are hiring. I have a good candidate for you, interested in knowing more?",
            "role": "user"
        }
    ],
    "max_tokens": 1,
    "temperature": 1,
    "top_k": 5
}
headers = {
    "Authorization": "Bearer YOUR_API_KEY_HERE",
    "Content-Type": "application/json"
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.text)

Expected Behavior
When top_k is set to a value greater than 1, the API should return multiple choices in the response, representing different possible completions.

Actual Behavior
The API consistently returns only a single choice in the response, regardless of the top_k value. The response looks like this:

{
  "id": "a8db132d-fd49-488a-9eee-53d8f2f27ff1",
  "model": "llama-3.1-sonar-large-128k-chat",
  "created": 1727552222,
  "usage": {
    "prompt_tokens": 124,
    "completion_tokens": 1,
    "total_tokens": 125
  },
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "finish_reason": "length",
      "message": {
        "role": "assistant",
        "content": "3"
      },
      "delta": {
        "role": "assistant",
        "content": ""
      }
    }
  ]
}

Additional Information
This behavior has been observed across all available models.
The top_p parameter was not explicitly set in the request, but it’s assumed to be close to 1 by default.

Questions

  1. Is this behavior intentional, or is it a bug in the API implementation?
  2. If it’s intentional, could you provide documentation or guidance on how to obtain multiple responses from the API?
  3. If it’s a bug, what is the expected timeline for a fix?

Impact
This issue impacts applications that rely on generating multiple alternative responses from the Perplexity AI API, limiting the diversity of outputs and potentially affecting the quality of downstream tasks that depend on varied AI-generated content.
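Until the API returns more than one choice per call, one possible workaround is simply to issue several independent requests with a nonzero temperature and collect the answers. The sketch below reuses the endpoint and placeholder key from the report above; `sample_n` is a hypothetical helper, not part of any official SDK (the `post` argument is injectable only to make the function testable):

```python
import requests

URL = "https://api.perplexity.ai/chat/completions"
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY_HERE",
    "Content-Type": "application/json",
}

def sample_n(payload, n=3, post=requests.post):
    """Collect n independent completions by repeating the request.

    With temperature > 0 the model samples at each step, so repeated
    calls can produce different answers. `post` defaults to
    requests.post and can be swapped out for a stub in tests.
    """
    answers = []
    for _ in range(n):
        resp = post(URL, json=payload, headers=HEADERS)
        resp.raise_for_status()
        answers.append(resp.json()["choices"][0]["message"]["content"])
    return answers
```

Note that this multiplies token usage by n, so it is only a stopgap, not a substitute for a server-side `n`-choices parameter.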

Hi, you’re misunderstanding the meaning of top_k: it has nothing to do with returning multiple choices. It only limits how many candidate tokens the model considers at each sampling step. See https://www.perplexity.ai/search/how-is-top-k-filtering-differe-W32IKed7SpWojTugdrrx0g
Currently, one API request always returns a single response.
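To illustrate the distinction, here is a toy sketch of top-k filtering (not the API's actual implementation): top_k narrows the per-token probability distribution before one token is sampled, so it shapes each generated token rather than producing extra completions.

```python
import math

def top_k_filter(logits, k):
    """Keep only the k highest logits; set the rest to -inf so they
    can never be sampled. top_k narrows the candidate pool for EACH
    generated token; it does not add extra completions."""
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]

def softmax(logits):
    """Convert logits to probabilities (math.exp(-inf) is exactly 0)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of 5 tokens with unnormalized scores.
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = softmax(top_k_filter(logits, k=2))
# Only the two highest-scoring tokens keep nonzero probability;
# exactly one token is then sampled from that reduced distribution.
print(probs)
```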