We’re releasing two embedding model families today: pplx-embed-v1 and pplx-embed-context-v1.
These are built specifically for real-world, web-scale retrieval — the same kind of retrieval that powers Perplexity search. The weights are open under an MIT license on Hugging Face, and the models are available now via the API.
## Models
| Model | Dimensions | Price ($/1M tokens) | Best For |
|---|---|---|---|
| pplx-embed-v1-0.6b | 1024 | $0.004 | Cost-efficient semantic search |
| pplx-embed-v1-4b | 2560 | $0.030 | High-accuracy retrieval |
| pplx-embed-context-v1-0.6b | 1024 | $0.008 | RAG with passage disambiguation |
| pplx-embed-context-v1-4b | 2560 | $0.050 | Production RAG pipelines |
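As a quick sanity check on the pricing above, cost scales linearly with token count. A minimal sketch (the `embedding_cost` helper is illustrative, not part of the API; prices are taken from the table):

```python
# Per-million-token prices from the table above (USD).
PRICE_PER_M = {
    "pplx-embed-v1-0.6b": 0.004,
    "pplx-embed-v1-4b": 0.030,
    "pplx-embed-context-v1-0.6b": 0.008,
    "pplx-embed-context-v1-4b": 0.050,
}

def embedding_cost(model: str, tokens: int) -> float:
    """Dollar cost of embedding `tokens` tokens with `model`."""
    return PRICE_PER_M[model] * tokens / 1_000_000

# Embedding a 10M-token corpus with the 4b model:
print(round(embedding_cost("pplx-embed-v1-4b", 10_000_000), 4))  # 0.3
```

So even the largest model embeds a 10-million-token corpus for about thirty cents.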
## Standard Embeddings
```python
from perplexity import Perplexity

client = Perplexity()

# Each input string gets its own embedding vector in response.data.
response = client.embeddings.create(
    model="pplx-embed-v1-4b",
    input=["Hello world", "Another text"]
)

print(response.data[0].embedding)
```
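Retrieval then comes down to comparing vectors, typically with cosine similarity. A minimal, dependency-free sketch (in practice you would use NumPy or a vector database for large corpora; `cosine` is an illustrative helper, not part of the SDK):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```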
## Contextualized Embeddings (for RAG)
The pplx-embed-context models disambiguate passages using surrounding document context, so chunks that share ambiguous terms receive distinct embeddings reflecting what those terms actually mean in each document.
```python
# Input is a list of documents, each a list of chunks in order.
# Chunks within a document are embedded with awareness of each other.
response = client.embeddings.create(
    model="pplx-embed-context-v1-4b",
    input=[
        ["Doc A chunk 1", "Doc A chunk 2"],
        ["Doc B chunk 1", "Doc B chunk 2"]
    ]
)
```
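To produce that nested list-of-lists input from raw documents, each document must be split into ordered chunks. A naive fixed-width chunker as a sketch (`chunk_document` is a hypothetical helper; real pipelines would split on sentence or section boundaries):

```python
def chunk_document(text: str, chunk_chars: int = 800) -> list[str]:
    """Split a document into fixed-width character chunks, in order."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

doc = "A" * 2000
chunks = chunk_document(doc)
print(len(chunks))  # 3 chunks: 800 + 800 + 400 characters

# One chunk list per document, matching the nested input format above:
nested_input = [chunk_document(d) for d in [doc]]
```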
## Technical Highlights
- Built on diffusion-pretrained Qwen3 architecture with bidirectional attention
- No instruction prefix required — just pass your text
- MRL dimension reduction, INT8/BINARY quantization supported
- 32K token context window
- 81.96% on the ConTEB benchmark (vs. Voyage's voyage-context-3 at 79.45%)