We’re currently using the llama-3.1-sonar-small-128k-online
model, which is set to be deprecated on 2025-02-22. We’re moving over to the new sonar
model but noticing it is significantly slower than the legacy small model — its speed seems comparable to the legacy large model.
For our use case, speed is rather critical. Are there any plans to release a smaller, faster variant of the sonar
model?
I’d also like to ask for a mini model. When I test answers in the current Sonar, the older model gave better answers for my language (Polish), and it also ran much faster. When I process a CSV file with 150 queries, the earlier model finished faster; with the current one it takes a little too long.
+1 for this feature request. There’s a real need for a 7/8B-parameter model for low-latency responses, or for significantly faster inference on the existing sonar model.