API: Unable to use the "llama-3.1-sonar-huge-128k-online" model

Summary

I’m having trouble calling the “llama-3.1-sonar-huge-128k-online” model.
The request goes out, hangs for about 60 seconds, and then fails with the error below — the response appears to be an HTML page rather than JSON, so `JSON.parse` throws.
Note: the same code with the “llama-3.1-sonar-large-128k-online” model works fine!

Is anyone experiencing something similar?

:cross_mark: Output of the huge-online model:

INPUT TOKENS: 3428

Error querying Perplexity API: SyntaxError: Unexpected token '<', "<html clas"... is not valid JSON
    at JSON.parse (<anonymous>)
    at parseJSONFromBytes (node:internal/deps/undici/undici:5329:19)
    at successSteps (node:internal/deps/undici/undici:5300:27)
    at fullyReadBody (node:internal/deps/undici/undici:1447:9)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async specConsumeBody (node:internal/deps/undici/undici:5309:7)
SyntaxError: Unexpected token '<', "<html clas"... is not valid JSON
    at JSON.parse (<anonymous>)
    at parseJSONFromBytes (node:internal/deps/undici/undici:5329:19)
    at successSteps (node:internal/deps/undici/undici:5300:27)
    at fullyReadBody (node:internal/deps/undici/undici:1447:9)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async specConsumeBody (node:internal/deps/undici/undici:5309:7)
error Command failed with exit code 1.

:white_check_mark: Output of the large-online model:

INPUT TOKENS:  3447
OUTPUT TOKENS:  1273
TOTAL TOKENS:  4720
citations:
  [...]
usage:
 { prompt_tokens: 3576, completion_tokens: 1306, total_tokens: 4882 }
{...response...}

Code:

enum PerplexityModel {
  SONAR_SMALL_ONLINE = "llama-3.1-sonar-small-128k-online",
  SONAR_LARGE_ONLINE = "llama-3.1-sonar-large-128k-online",
  SONAR_HUGE_ONLINE = "llama-3.1-sonar-huge-128k-online",
}

export async function callPerplexity(
  primer: string,
  query: string
): Promise<string> {
  const messages = [
    { role: "system", content: "You are a helpful researcher." },
    { role: "user", content: primer },
    { role: "assistant", content: "Understood. I'm ready to research." },
    { role: "user", content: query },
  ];

  const inputTokens = getTokenCount(messages);
  console.log("INPUT TOKENS: ", inputTokens);

  const options = {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PERPLEXITY_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      //   model: PerplexityModel.SONAR_LARGE_ONLINE, // works fine
      model: PerplexityModel.SONAR_HUGE_ONLINE,
      messages,
      temperature: 0.2,
      top_p: 0.9,
      top_k: 0,
      stream: false,
      presence_penalty: 0,
      frequency_penalty: 0.3,
    }),
  };

  try {
    const response = await fetch("https://api.perplexity.ai/chat/completions", options);
    const data = await response.json();

    const textOutput = data.choices[0].message.content;
    const citations = data.citations;
    const usage = data.usage;

    const outputTokens = getTokenCount(textOutput);
    console.log("OUTPUT TOKENS: ", outputTokens);
    console.log("TOTAL TOKENS: ", inputTokens + outputTokens);

    console.log("citations:\n", citations);
    console.log("usage:\n", usage);

    return textOutput;
  } catch (error) {
    console.error("Error querying Perplexity API:", error);
    throw error;
  }
}
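For what it’s worth, the `Unexpected token '<'` means the endpoint returned an HTML error page (plausibly a gateway timeout, given the ~60 s delay) and `response.json()` choked on it. A minimal sketch of how the real status and body could be surfaced instead — the `parseApiBody` helper below is hypothetical, not part of the Perplexity API or any SDK:

```typescript
// Hypothetical helper (not part of the Perplexity API or any SDK): only parse
// the body as JSON when the response actually looks like JSON; otherwise throw
// an error carrying the status and the start of the raw body, so an HTML
// gateway page shows up as readable text instead of a bare SyntaxError.
function parseApiBody(
  status: number,
  contentType: string | null,
  rawBody: string
): unknown {
  const looksLikeJson = (contentType ?? "").includes("application/json");
  if (status >= 200 && status < 300 && looksLikeJson) {
    return JSON.parse(rawBody);
  }
  throw new Error(
    `Perplexity API returned ${status} (${contentType ?? "no content-type"}): ` +
      rawBody.slice(0, 200)
  );
}

// Inside callPerplexity, read the body as text first, then parse:
//   const response = await fetch("https://api.perplexity.ai/chat/completions", options);
//   const raw = await response.text();
//   const data = parseApiBody(response.status, response.headers.get("content-type"), raw);
```

With this in place, a failing call would log the status code and the first chunk of the HTML page instead of the opaque `SyntaxError`, which should make it clearer whether the huge model is timing out server-side.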

Thanks for flagging this. We will take a look.