Streaming responses take ~5s to return the first token

Hello, for my use case I need very low latency with llama-3.1-sonar-huge-128k-online, but currently the model is very slow to respond to streaming requests. The 70B model would already be fast enough, but its quality is not comparable to state-of-the-art models like GPT-4o.

Hey, we know that sonar-huge is rather slow. I'd recommend switching to sonar-large, especially since latency is a priority for you; in most cases answer quality should not degrade substantially.
You can expect sonar-huge to remain around its current speed for the foreseeable future.
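
If you want to quantify the difference before committing, here's a minimal sketch for measuring time-to-first-token (TTFT) on a streaming request. It assumes Perplexity's OpenAI-compatible endpoint at https://api.perplexity.ai and the `openai` Python client; `PPLX_API_KEY` is a placeholder environment variable name, and the prompt is just an example. Swap the `model` string between sonar-large and sonar-huge to compare.

```python
# Sketch: measure time-to-first-token (TTFT) for a streaming chat request.
# Assumes an OpenAI-compatible endpoint and the `openai` Python client;
# PPLX_API_KEY is a placeholder env var holding your API key.
import os
import time

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PPLX_API_KEY"],
    base_url="https://api.perplexity.ai",  # OpenAI-compatible endpoint
)

start = time.perf_counter()
stream = client.chat.completions.create(
    model="llama-3.1-sonar-large-128k-online",  # compare against -huge- here
    messages=[{"role": "user", "content": "Summarize today's tech news."}],
    stream=True,
)

first_token_at = None
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            # First content chunk arrived: record and report TTFT.
            first_token_at = time.perf_counter()
            print(f"TTFT: {first_token_at - start:.2f}s")
        print(delta, end="", flush=True)
```

Running this a few times per model should give you a rough TTFT distribution for each, so you can judge whether sonar-large's latency is acceptable for your use case.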