Issue Description
The Perplexity AI API is not returning multiple responses when the top_k parameter is set to a value greater than 1 and the top_p parameter is close to 1. This behavior is inconsistent with the expected functionality of these parameters.
Steps to Reproduce
- Set up a request to the Perplexity AI API endpoint (https://api.perplexity.ai/chat/completions).
- Configure the request payload with the following parameters:
  - model: Any available model (tested with all models)
  - messages: A list of message objects (system and user messages)
  - max_tokens: Set to a small value (e.g., 1) for testing
  - temperature: Set to 1 (or any other value)
  - top_k: Set to a value greater than 1 (e.g., 5)
  - top_p: Not explicitly set, but assumed to be close to 1 by default
- Send the request to the API.
Example Python Code
import requests

url = "https://api.perplexity.ai/chat/completions"

payload = {
    "model": "llama-3.1-sonar-large-128k-chat",
    "messages": [
        {
            "content": """You are a personal assistant classifying emails.
1. Meeting: proposing a meeting date or modifying it
2. Important: important emails that need to be dealt with urgently
3. Spam: promotional emails from recruiters, vendors, consultants, etc.
Otherwise return 0 (you are not sure or no category matches).
Answer using numbers only; letters or other characters are not allowed""",
            "role": "system"
        },
        {
            "content": "Hello, I saw you are hiring. I have a good candidate for you, interested in knowing more?",
            "role": "user"
        }
    ],
    "max_tokens": 1,      # kept small for testing
    "temperature": 1,
    "top_k": 5            # expected to yield multiple choices
}

headers = {
    "Authorization": "Bearer YOUR_API_KEY_HERE",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print(response.text)
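For easier inspection, I also parse the JSON body and count the returned alternatives (a small convenience addition to the script above; the field names match the response shown under Actual Behavior):

# Parse the JSON body and count the returned alternatives
data = response.json()
print(len(data["choices"]))                       # prints 1, regardless of top_k
print(data["choices"][0]["message"]["content"])   # e.g. "3"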
Expected Behavior
When top_k is set to a value greater than 1, the API should return multiple choices in the response, representing different possible completions.
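For illustration, if several choices were returned I would expect to be able to consume them like this (hypothetical sketch; the field names follow the response shown under Actual Behavior):

# Hypothetical: with top_k=5 I expected several alternative completions
for choice in response.json()["choices"]:
    print(choice["index"], choice["message"]["content"])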
Actual Behavior
The API consistently returns only a single choice in the response, regardless of the top_k value. The JSON response looks like this:
{
  "id": "a8db132d-fd49-488a-9eee-53d8f2f27ff1",
  "model": "llama-3.1-sonar-large-128k-chat",
  "created": 1727552222,
  "usage": {
    "prompt_tokens": 124,
    "completion_tokens": 1,
    "total_tokens": 125
  },
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "finish_reason": "length",
      "message": {
        "role": "assistant",
        "content": "3"
      },
      "delta": {
        "role": "assistant",
        "content": ""
      }
    }
  ]
}
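The choice count does not change when top_k is varied; a quick sweep along these lines (same url, headers, and payload as in the example above) reproduces the behavior:

# Vary top_k and count the returned choices; in my tests the count is always 1
for k in (1, 2, 5, 10):
    payload["top_k"] = k
    r = requests.post(url, json=payload, headers=headers)
    print(k, len(r.json()["choices"]))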
Additional Information
This behavior has been observed across all available models.
The top_p parameter was not explicitly set in the request, but it’s assumed to be close to 1 by default.
Questions
- Is this behavior intentional, or is it a bug in the API implementation?
- If it’s intentional, could you provide documentation or guidance on how to obtain multiple responses from the API? (A possible interim workaround is sketched after this list.)
- If it’s a bug, what is the expected timeline for a fix?
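For completeness, the only interim workaround I can think of (an assumption on my part, not something I found in the documentation) is to send several independent requests with a non-zero temperature and collect the results, reusing the url, payload, and headers from the example above:

# Workaround sketch: gather N completions one request at a time instead of relying on top_k
def get_alternative_completions(n=5):
    results = []
    for _ in range(n):
        r = requests.post(url, json=payload, headers=headers)
        r.raise_for_status()
        results.append(r.json()["choices"][0]["message"]["content"])
    return results

print(get_alternative_completions())

This obviously multiplies latency and token usage, which is why native support for multiple choices would be preferable.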
Impact
This issue impacts applications that rely on generating multiple alternative responses from the Perplexity AI API, limiting the diversity of outputs and potentially affecting the quality of downstream tasks that depend on varied AI-generated content.