Feature Request
Please provide an embedding/vectorization API, similar to OpenAI’s embedding endpoints, in the Perplexity developer ecosystem. The goal is for developers to generate and retrieve vector embeddings for their own text/document chunks, enabling storage in local (on-premise) vector databases such as FAISS and supporting retrieval-augmented generation (RAG) use cases.
Problem Statement
Currently, there is no direct way to send text to Perplexity and receive vector embeddings suitable for use with self-hosted RAG pipelines. This limits the ability to index private or regulated data on-premise, hinders enterprise adoption, and restricts full control and data sovereignty for developers building secure applications.
Proposed Solution
- Introduce a dedicated API endpoint for text-to-vector embedding using Perplexity models (e.g., Sonar models).
- Support batch processing for efficient vectorization of large document sets.
- The embedding endpoint should return standardized dense vector representations (compatible with FAISS, Qdrant, Milvus, etc.).
- No remote storage: returned vectors are stored, indexed, and searched entirely client-side, enabling on-prem RAG development.
- Example workflow (a hedged sketch follows this list):
  - Developer sends proprietary text/document “chunks” to the API.
  - API responds with embeddings (dense vector arrays).
  - Developer stores and manages these vectors in their own FAISS index to support secure RAG queries and search.
- Clear documentation with examples for Python and other common programming languages.
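A minimal Python sketch of that workflow, assuming a hypothetical POST /embeddings endpoint; the URL, model name, and JSON fields below are placeholders, since no such endpoint exists today:

```python
import requests

API_URL = "https://api.perplexity.ai/embeddings"  # hypothetical endpoint
API_KEY = "pplx-..."  # your Perplexity API key

# Batch of proprietary document chunks to vectorize in a single call.
chunks = [
    "First proprietary document chunk.",
    "Second proprietary document chunk.",
]

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "sonar-embed", "input": chunks},  # hypothetical payload shape
    timeout=30,
)
resp.raise_for_status()

# Assumed response shape: {"data": [{"index": 0, "embedding": [...]}, ...]}
embeddings = [item["embedding"] for item in resp.json()["data"]]
```

The returned vectors would then be written to a local FAISS (or Qdrant/Milvus) index, entirely under the developer’s control.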
API Impact
- Which API component is affected?
  A new `embeddings` or `vectorize` endpoint, separate from chat completions or document search.
- Is this related to a specific model?
  Ideally available for all current Perplexity language models (Sonar, Sonar Pro, etc.).
- Would this require new API parameters or changes to existing ones?
  - New endpoint: likely `/embeddings` or similar; one possible request/response shape is sketched below this list.
  - Input: text (string or list of strings), optional model selection.
  - Output: vector(s).
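For concreteness, one possible request/response shape, modeled loosely on existing embedding APIs; every field and model name below is an assumption for illustration, not a committed design:

```python
# Hypothetical request body for POST /embeddings -- all field names are assumptions.
request_body = {
    "model": "sonar-embed",               # optional; hypothetical model identifier
    "input": ["chunk one", "chunk two"],  # a single string or a list of strings
}

# Hypothetical response body: one dense float vector per input string.
response_body = {
    "model": "sonar-embed",
    "data": [
        {"index": 0, "embedding": [0.013, -0.027, ...]},  # truncated for brevity
        {"index": 1, "embedding": [0.004, 0.051, ...]},
    ],
}
```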
Alternatives Considered
- Workarounds attempted (see the sketch after this list):
  - Using third-party public embedding models (e.g., OpenAI, Hugging Face): introduces a dependency on external providers and may not align with Perplexity’s retrieval/generation pipeline.
  - Manual chunking and storage: cannot leverage Perplexity’s own vector space, limiting quality and compatibility.
- Why these are insufficient:
  - No seamless integration between Perplexity’s generation models and custom RAG pipelines.
  - Data sovereignty, compliance, and latency requirements rule out cloud-based or external API-only solutions for some use cases.
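For reference, the third-party workaround above typically looks like the following, using the open-source sentence-transformers library (the model name is just an example); it works, but the resulting vectors live in a vector space unrelated to Perplexity’s models:

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Workaround: embeddings come from an external open-source model, not Perplexity.
model = SentenceTransformer("all-MiniLM-L6-v2")  # example third-party model
vectors = model.encode(
    ["First proprietary chunk.", "Second proprietary chunk."],
    normalize_embeddings=True,  # unit-length vectors for cosine similarity
)
print(vectors.shape)  # (2, 384) for this particular model
```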
Additional Context
- On-prem embedding APIs are highly requested in regulated industries (finance, legal, healthcare) for confidential, low-latency RAG and search solutions.
- Existing offerings from OpenAI (text-embedding-ada-002), Cohere, and others demonstrate clear demand and value.
- Adding this would broaden Perplexity’s adoption by allowing secure enterprise workflows and hybrid retrieval.
- Batch/bulk endpoints and compatibility with open-source vector databases (FAISS, Qdrant, Milvus) are highly desirable; a client-side FAISS sketch follows below.
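As a sketch of the client-side half, here is a minimal FAISS indexing and search loop; the embedding dimension and the random vectors are stand-ins for output from the proposed endpoint:

```python
import numpy as np
import faiss  # pip install faiss-cpu

DIM = 1024  # assumed embedding dimension; the real value would be set by the API

# Stand-ins for vectors returned by the proposed batch embeddings endpoint.
doc_vectors = np.random.rand(100, DIM).astype("float32")
query_vector = np.random.rand(1, DIM).astype("float32")

# Normalize so inner-product search is equivalent to cosine similarity.
faiss.normalize_L2(doc_vectors)
faiss.normalize_L2(query_vector)

index = faiss.IndexFlatIP(DIM)  # exact inner-product index, stored entirely on-prem
index.add(doc_vectors)

scores, ids = index.search(query_vector, 5)  # top-5 most similar chunks
print(ids[0], scores[0])
```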
Thank you for considering this request to make Perplexity models even more accessible and useful for advanced, privacy-focused RAG applications!