Hello, for my use case I need very low latency with llama-3.1-sonar-huge-128k-online, but currently the model is very slow to respond in streaming requests. The 70B would already be fast enough, but it is not comparable to state-of-the-art models like GPT-4o.
Hey, we know that sonar-huge is rather slow. I recommend using sonar-large instead, especially if latency is a priority for you; hopefully answer quality does not degrade substantially.
You can expect sonar-huge to remain around the current speed for the foreseeable future.
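If it helps, here is a minimal sketch of what the switch looks like in a streaming request. It assumes you are calling the OpenAI-compatible chat completions endpoint at https://api.perplexity.ai through the openai Python client; the PPLX_API_KEY variable and the example prompt are placeholders, not anything confirmed in this thread.

```python
# Minimal sketch: swap the model to llama-3.1-sonar-large-128k-online in a
# streaming chat completion. Assumes the OpenAI-compatible endpoint at
# https://api.perplexity.ai and an API key in PPLX_API_KEY (both assumptions).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PPLX_API_KEY"],
    base_url="https://api.perplexity.ai",
)

stream = client.chat.completions.create(
    model="llama-3.1-sonar-large-128k-online",  # was: llama-3.1-sonar-huge-128k-online
    messages=[{"role": "user", "content": "Summarize today's top AI news."}],
    stream=True,
)

# Print tokens as they arrive so you can compare time-to-first-token directly.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Only the model string changes between the two models, so it should be easy to benchmark both side by side and decide whether the latency gain is worth any difference in answer quality.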