Introducing pplx-embed — State-of-the-Art Embedding Models for Web-Scale Retrieval

We’re releasing two embedding model families today: pplx-embed-v1 and pplx-embed-context-v1.

These models are built specifically for real-world, web-scale retrieval, the same kind of retrieval that powers Perplexity search. The weights are open under the MIT license on Hugging Face, and both families are available now via the API.

Models

Model                        Dimensions   Price ($ / 1M tokens)   Best for
pplx-embed-v1-0.6b           1024         $0.004                  Cost-efficient semantic search
pplx-embed-v1-4b             2560         $0.030                  High-accuracy retrieval
pplx-embed-context-v1-0.6b   1024         $0.008                  RAG with passage disambiguation
pplx-embed-context-v1-4b     2560         $0.050                  Production RAG pipelines
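
Pricing is per token embedded, so cost scales linearly with corpus size. A quick sanity check on the table above (the helper function is illustrative, not part of the SDK):

```python
def embed_cost_usd(tokens: int, price_per_million: float) -> float:
    """Estimated embedding cost from the price table above."""
    return tokens / 1_000_000 * price_per_million

# Embedding a 50M-token corpus with pplx-embed-v1-4b at $0.030 / 1M tokens:
print(f"${embed_cost_usd(50_000_000, 0.030):.2f}")  # $1.50
```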

Standard Embeddings

from perplexity import Perplexity

client = Perplexity()
response = client.embeddings.create(
    model="pplx-embed-v1-4b",
    input=["Hello world", "Another text"]
)
print(response.data[0].embedding)
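
Each element of response.data carries one embedding vector, typically in input order. A common next step is comparing two embeddings with cosine similarity; the sketch below is self-contained, with short mock vectors standing in for the real model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# In practice these come from response.data[i].embedding; short mock
# vectors are used here so the example runs without an API call.
emb_hello = [0.1, 0.3, -0.2, 0.5]
emb_other = [0.2, 0.1, -0.1, 0.4]

print(round(cosine_similarity(emb_hello, emb_other), 4))  # 0.9218
```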

Contextualized Embeddings (for RAG)

The pplx-embed-context models disambiguate passages using surrounding document context — meaning chunks that share ambiguous terms get distinct embeddings based on their actual meaning in context.

response = client.embeddings.create(
    model="pplx-embed-context-v1-4b",
    input=[
        ["Doc A chunk 1", "Doc A chunk 2"],
        ["Doc B chunk 1", "Doc B chunk 2"]
    ]
)
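
The nested input is a list of documents, each itself a list of chunk strings. A minimal sketch of preparing that structure from raw text; the naive word-count chunker is an illustrative stand-in for whatever splitter your pipeline uses:

```python
def chunk_document(text: str, max_words: int = 100) -> list[str]:
    """Naive word-count chunker (illustrative only; real pipelines
    usually split on sentence or token boundaries)."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

docs = [
    "The bank raised interest rates to curb inflation across the region",
    "The bank of the river flooded after three days of heavy rain",
]

# One inner list of chunks per document, matching the nested-input
# shape expected by the contextualized models shown above.
batch = [chunk_document(d, max_words=6) for d in docs]
print(batch[0])
```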

Technical Highlights

  • Built on diffusion-pretrained Qwen3 architecture with bidirectional attention
  • No instruction prefix required — just pass your text
  • Matryoshka (MRL) dimension reduction, with INT8 and binary quantization support
  • 32K-token context window
  • 81.96% on the ConTEB benchmark (vs. Voyage voyage-context-3 at 79.45%)
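
The MRL and quantization bullets above can be sketched in a few lines. These functions are illustrative approximations of the idea, not the models' actual post-processing:

```python
import math

def truncate_mrl(embedding, dim):
    """Matryoshka-style reduction: keep the first `dim` values, then
    re-normalize so cosine similarity stays meaningful (sketch)."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def quantize_int8(embedding):
    """Map floats in [-1, 1] to int8 values in [-127, 127] (illustrative)."""
    return [max(-127, min(127, round(x * 127))) for x in embedding]

def quantize_binary(embedding):
    """One bit per dimension: the sign of each value (illustrative)."""
    return [1 if x > 0 else 0 for x in embedding]

vec = [0.8, -0.6, 0.05, -0.01]
small = truncate_mrl(vec, 2)       # 2-dim, unit-norm head of the vector
print(quantize_int8(small))        # [102, -76]
print(quantize_binary(vec))        # [1, 0, 1, 0]
```

Truncation trades accuracy for storage, and quantization shrinks each dimension from 4 bytes to 1 byte (INT8) or 1 bit (binary), which is what makes these vectors practical at web scale.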

Get Started