Sentence Transformers

Name: Sentence Transformers
Author: orchestra-research

orchestra-research/ai-research-skills

401 installs
11.2k repo stars
Updated June 16, 2026

sentence-transformers is a coding-agent skill that helps developers choose a sentence-transformers embedding model matching RAG latency, quality, and language requirements before connecting a vector store.

About

sentence-transformers is a model selection guide from orchestra-research/ai-research-skills for RAG embedding backends. It compares three general-purpose models: all-MiniLM-L6-v2 (384 dimensions, roughly 2000 sentences per second) for prototyping, all-mpnet-base-v2 (768 dimensions, roughly 600 sentences per second) for production RAG, and all-roberta-large-v1 (1024 dimensions, roughly 300 sentences per second) for highest accuracy. It also covers paraphrase-multilingual-MiniLM-L12-v2, which supports 50+ languages at 384 dimensions. Developers reach for this skill when the embedding choice, rather than vector database operations, is the bottleneck before ingestion into Chroma, Qdrant, or similar stores.

Tiered recommendations: all-MiniLM-L6-v2 (384-dim, ~2000 sentences/sec) for prototyping, all-mpnet-base-v2 for production RAG, all-roberta-large-v1 for highest accuracy
Multilingual coverage: MiniLM/mpnet multilingual variants (50+ languages) and LaBSE (109 languages)
Domain-specific picks: SPECTER for papers, Legal-BERT for legal text, CodeBERT for code similarity
Selection matrix maps task → model with dimensions, speed, and quality tradeoffs

Sentence Transformers by the numbers

401 all-time installs (skills.sh)
+35 installs in the week ending Jul 18, 2026 (Skillselion tracking)
Ranked #1,932 of 16,659 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/orchestra-research/ai-research-skills --skill sentence-transformers

Installs	401
repo stars	★ 11.2k
Security audit	3 / 3 scanners passed
Last updated	June 16, 2026
Repository	orchestra-research/ai-research-skills ↗

Which sentence-transformers model fits RAG latency and quality?

Choose a sentence-transformers embedding model that fits your RAG latency, quality, and language needs before you wire up a vector store.

Who is it for?

Developers sizing RAG embedding models who need concrete dimension, speed, and multilingual tradeoffs before vector store integration.

Skip if: your team is already committed to API-only embeddings such as OpenAI text-embedding-3 and will not run local sentence-transformers models.

When should I use this skill?

A RAG pipeline needs embedding model selection balancing 384–1024 dimensions, throughput, and 50+ language coverage.

What you get

Chosen embedding model spec, dimension and throughput targets, and multilingual model selection for ingestion.

Selected embedding model
Dimension and throughput spec
Multilingual model choice

By the numbers

all-MiniLM-L6-v2: 384 dimensions, ~2000 sentences/sec
all-mpnet-base-v2: 768 dimensions, ~600 sentences/sec
paraphrase-multilingual-MiniLM-L12-v2: 50+ languages, 384 dimensions

Files

SKILL.md (Markdown, GitHub ↗)

Sentence Transformers - State-of-the-Art Embeddings

Python framework for sentence and text embeddings using transformers.

When to use Sentence Transformers

Use when:

Need high-quality embeddings for RAG
Semantic similarity and search
Text clustering and classification
Multilingual embeddings (100+ languages)
Running embeddings locally (no API)
Cost-effective alternative to OpenAI embeddings

Metrics:

15,700+ GitHub stars
5000+ pre-trained models
100+ languages supported
Based on PyTorch/Transformers

Use alternatives instead:

OpenAI Embeddings: Need API-based, highest quality
Instructor: Task-specific instructions
Cohere Embed: Managed service

Quick start

Installation

pip install sentence-transformers

Basic usage

from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings
sentences = [
    "This is an example sentence",
    "Each sentence is converted to a vector"
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)

# Cosine similarity
from sentence_transformers.util import cos_sim
similarity = cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.4f}")

Popular models

General purpose

# Fast, good quality (384 dim)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Better quality (768 dim)
model = SentenceTransformer('all-mpnet-base-v2')

# Best quality (1024 dim, slower)
model = SentenceTransformer('all-roberta-large-v1')

Multilingual

# 50+ languages
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

# 50+ languages, better quality (768 dim)
model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2')

Domain-specific

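# Note: these are plain transformer checkpoints; when loaded this way,
# sentence-transformers adds a mean-pooling layer on top.

# Legal domain
model = SentenceTransformer('nlpaueb/legal-bert-base-uncased')

# Scientific papers
model = SentenceTransformer('allenai/specter')

# Code
model = SentenceTransformer('microsoft/codebert-base')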

Semantic search

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Corpus
corpus = [
    "Python is a programming language",
    "Machine learning uses algorithms",
    "Neural networks are powerful"
]

# Encode corpus
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Query
query = "What is Python?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Find most similar
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)
print(hits)
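
util.semantic_search returns one list of hits per query, and each hit is a dict with corpus_id and score keys. The sketch below shows one way to map that structure back to the corpus text. The sample hits and their scores are made-up placeholders so the snippet runs without downloading a model, and format_hits is a hypothetical helper name, not part of the library.

```python
# Same corpus as in the semantic search example.
corpus = [
    "Python is a programming language",
    "Machine learning uses algorithms",
    "Neural networks are powerful",
]

# Same shape as util.semantic_search output: one list of hits per query.
# The scores are illustrative placeholders, not real model output.
sample_hits = [[
    {"corpus_id": 0, "score": 0.71},
    {"corpus_id": 1, "score": 0.18},
    {"corpus_id": 2, "score": 0.09},
]]


def format_hits(hits_for_query, corpus, min_score=0.0):
    """Hypothetical helper: return (text, score) pairs at or above min_score."""
    return [
        (corpus[hit["corpus_id"]], hit["score"])
        for hit in hits_for_query
        if hit["score"] >= min_score
    ]


# Keep only hits for the first (and only) query that clear a score threshold.
results = format_hits(sample_hits[0], corpus, min_score=0.1)
for text, score in results:
    print(f"{score:.2f}  {text}")
```

A score threshold like min_score is a design choice for RAG pipelines that prefer returning fewer passages over returning weak matches; a suitable value depends on the model and data.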

Similarity computation

# Cosine similarity
similarity = util.cos_sim(embedding1, embedding2)

# Dot product
similarity = util.dot_score(embedding1, embedding2)

# Pairwise cosine similarity
similarities = util.cos_sim(embeddings, embeddings)
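
To see how cosine similarity and dot product relate, here is a minimal sketch in plain Python with two toy 2-dimensional vectors (not real embeddings). It illustrates that the raw dot product depends on vector length, while the dot product of L2-normalized vectors equals the cosine similarity. The helper names are illustrative only.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors, each of length 5.
a = [3.0, 4.0]
b = [4.0, 3.0]

raw_dot = dot(a, b)                                # grows with vector length
cos = cosine(a, b)                                 # independent of length
norm_dot = dot(l2_normalize(a), l2_normalize(b))   # matches cos

print(raw_dot, round(cos, 4), round(norm_dot, 4))
```

In sentence-transformers, passing normalize_embeddings=True to model.encode returns unit-length vectors, so a vector store configured for dot product then ranks results the same way cosine similarity would.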

Batch encoding

# Efficient batch processing (assumes `model` is already loaded)
sentences = ["sentence 1", "sentence 2"] * 1000

embeddings = model.encode(
    sentences,
    batch_size=32,
    show_progress_bar=True,
    convert_to_tensor=False  # or True for PyTorch tensors
)
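
batch_size controls how many sentences go through the model per forward pass inside encode. For a corpus too large to hold in memory at once, an outer loop can feed encode one chunk at a time and hand each result to the vector store. The sketch below assumes that pattern; chunked and fake_encode are hypothetical names, and fake_encode stands in for model.encode so the snippet runs without a model.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def fake_encode(batch):
    # Stand-in for model.encode(batch): one small toy vector per sentence.
    return [[float(len(sentence)), 0.0, 1.0] for sentence in batch]


sentences = [f"sentence {i}" for i in range(10)]

all_vectors = []
for chunk in chunked(sentences, size=4):
    # In a real pipeline, each chunk's vectors would be written to the store here.
    all_vectors.extend(fake_encode(chunk))

chunk_sizes = [len(chunk) for chunk in chunked(sentences, size=4)]
print(chunk_sizes, len(all_vectors))
```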

Fine-tuning

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Base model to fine-tune
model = SentenceTransformer('all-MiniLM-L6-v2')

# Training data: sentence pairs with a target similarity score
train_examples = [
    InputExample(texts=['sentence 1', 'sentence 2'], label=0.8),
    InputExample(texts=['sentence 3', 'sentence 4'], label=0.3),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Loss function
train_loss = losses.CosineSimilarityLoss(model)

# Train
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=10,
    warmup_steps=100
)

# Save
model.save('my-finetuned-model')

LangChain integration

from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

# Use with vector stores (`docs` is a list of LangChain Document objects you have loaded)
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings
)

LlamaIndex integration

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

from llama_index.core import Settings, VectorStoreIndex
Settings.embed_model = embed_model

# Use in index (`documents` is a list of LlamaIndex Document objects you have loaded)
index = VectorStoreIndex.from_documents(documents)

Model selection guide

Model	Dimensions	Speed	Quality	Use Case
all-MiniLM-L6-v2	384	Fast	Good	General, prototyping
all-mpnet-base-v2	768	Medium	Better	Production RAG
all-roberta-large-v1	1024	Slow	Best	High accuracy needed
paraphrase-multilingual-mpnet-base-v2	768	Medium	Good	Multilingual

Best practices

1. Start with all-MiniLM-L6-v2 - Good baseline
2. Normalize embeddings - Better for cosine similarity
3. Use GPU if available - 10× faster encoding
4. Batch encoding - More efficient
5. Cache embeddings - Expensive to recompute
6. Fine-tune for domain - Improves quality
7. Test different models - Quality varies by task
8. Monitor memory - Large models need more RAM
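
The "cache embeddings" practice can be sketched as a small wrapper that only encodes texts it has not seen before. This is a minimal in-memory illustration under stated assumptions: EmbeddingCache is a hypothetical class, not part of sentence-transformers, and fake_encode stands in for model.encode so the snippet runs without a model. Including the model name in the key avoids mixing vectors from different models.

```python
import hashlib


class EmbeddingCache:
    """Hypothetical in-memory cache keyed by model name plus a hash of the text."""

    def __init__(self, model_name, encode_fn):
        self.model_name = model_name
        self.encode_fn = encode_fn  # callable: list of str -> list of vectors
        self.store = {}
        self.misses = 0

    def _key(self, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{self.model_name}:{digest}"

    def encode(self, texts):
        # De-duplicate while keeping order, then encode only unseen texts.
        missing = [t for t in dict.fromkeys(texts) if self._key(t) not in self.store]
        if missing:
            self.misses += len(missing)
            for text, vector in zip(missing, self.encode_fn(missing)):
                self.store[self._key(text)] = vector
        return [self.store[self._key(t)] for t in texts]


def fake_encode(batch):
    # Stand-in for model.encode(batch): a one-element toy vector per text.
    return [[float(len(text))] for text in batch]


cache = EmbeddingCache("all-MiniLM-L6-v2", fake_encode)
first = cache.encode(["alpha", "beta", "alpha"])   # encodes "alpha" and "beta"
second = cache.encode(["beta", "gamma"])           # encodes only "gamma"
print(cache.misses)
```

A production version would likely persist the store to disk or a key-value database; the dict here is only for illustration.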

Performance

Model	Speed (sentences/sec)	Memory	Dimension
MiniLM	~2000	120MB	384
MPNet	~600	420MB	768
RoBERTa	~300	1.3GB	1024
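
The throughput and dimension figures above can be turned into rough ingestion estimates. The sketch below assumes the table's sentences-per-second numbers hold on your hardware and that vectors are stored as raw float32 (4 bytes per value), ignoring any index overhead in the vector store. The estimate function is a hypothetical helper.

```python
# Figures copied from the performance table; actual throughput varies by hardware.
MODELS = {
    "all-MiniLM-L6-v2": {"dim": 384, "sentences_per_sec": 2000},
    "all-mpnet-base-v2": {"dim": 768, "sentences_per_sec": 600},
    "all-roberta-large-v1": {"dim": 1024, "sentences_per_sec": 300},
}
BYTES_PER_FLOAT32 = 4  # assumption: raw float32 storage, no index overhead


def estimate(model_name, num_sentences):
    """Return (encoding seconds, raw vector storage in MB) for a corpus size."""
    spec = MODELS[model_name]
    seconds = num_sentences / spec["sentences_per_sec"]
    megabytes = num_sentences * spec["dim"] * BYTES_PER_FLOAT32 / 1_000_000
    return seconds, megabytes


for name in MODELS:
    seconds, megabytes = estimate(name, 1_000_000)
    print(f"{name}: {seconds / 60:.1f} min, {megabytes:.0f} MB")
```

For one million sentences this gives roughly 8 minutes and 1.5 GB for all-MiniLM-L6-v2 versus roughly 56 minutes and 4.1 GB for all-roberta-large-v1, which is the tradeoff the table summarizes.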

Resources

GitHub: https://github.com/UKPLab/sentence-transformers ⭐ 15,700+
Models: https://huggingface.co/sentence-transformers
Docs: https://www.sbert.net
License: Apache 2.0

Sentence Transformers Models Guide

Guide to selecting and using sentence-transformers models.

Top recommended models

General purpose

all-MiniLM-L6-v2 (Default recommendation)

Dimensions: 384
Speed: ~2000 sentences/sec
Quality: Good
Use: Prototyping, general tasks

all-mpnet-base-v2 (Better quality)

Dimensions: 768
Speed: ~600 sentences/sec
Quality: Better
Use: Production RAG

all-roberta-large-v1 (Highest quality)

Dimensions: 1024
Speed: ~300 sentences/sec
Quality: Best
Use: When accuracy critical

Multilingual (50+ languages)

paraphrase-multilingual-MiniLM-L12-v2

Languages: 50+
Dimensions: 384
Speed: Fast
Use: Multilingual semantic search

paraphrase-multilingual-mpnet-base-v2

Languages: 50+
Dimensions: 768
Speed: Medium
Use: Better multilingual quality

LaBSE (109 languages)

Languages: 109
Dimensions: 768
Speed: Medium
Use: Maximum language coverage

Domain-specific

allenai/specter (Scientific papers)

Domain: Academic papers
Use: Paper similarity, citations

nlpaueb/legal-bert-base-uncased (Legal)

Domain: Legal documents
Use: Legal document analysis

microsoft/codebert-base (Code)

Domain: Source code
Use: Code similarity, search

Model selection matrix

Task	Model	Dimensions	Speed	Quality
Quick prototyping	MiniLM-L6	384	Fast	Good
Production RAG	mpnet-base	768	Medium	Better
Highest accuracy	roberta-large	1024	Slow	Best
Multilingual	paraphrase-multi-mpnet	768	Medium	Good
Scientific papers	specter	768	Medium	Domain
Legal docs	legal-bert	768	Medium	Domain
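
The matrix can be encoded as a simple lookup so a pipeline picks its model from a task label. This is a sketch only: pick_model is a hypothetical helper, and the full model identifiers are filled in from the model lists earlier in this guide.

```python
# Task -> (model identifier, embedding dimensions), following the matrix above.
SELECTION_MATRIX = {
    "quick prototyping": ("all-MiniLM-L6-v2", 384),
    "production rag": ("all-mpnet-base-v2", 768),
    "highest accuracy": ("all-roberta-large-v1", 1024),
    "multilingual": ("paraphrase-multilingual-mpnet-base-v2", 768),
    "scientific papers": ("allenai/specter", 768),
    "legal docs": ("nlpaueb/legal-bert-base-uncased", 768),
}


def pick_model(task):
    """Hypothetical helper: return (model name, dimensions) for a task label."""
    key = task.strip().lower()
    if key not in SELECTION_MATRIX:
        known = ", ".join(sorted(SELECTION_MATRIX))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}")
    return SELECTION_MATRIX[key]


model_name, dim = pick_model("Production RAG")
print(model_name, dim)
```

The returned dimension is worth passing along to the vector store, since a collection is usually created with a fixed vector size.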

Performance benchmarks

Speed comparison (CPU)

Model	Sentences/sec	Memory
MiniLM-L6	2000	120 MB
MPNet-base	600	420 MB
RoBERTa-large	300	1.3 GB

Quality comparison (STS Benchmark)

Model	Cosine Similarity	Spearman
MiniLM-L6	82.4	-
MPNet-base	84.1	-
RoBERTa-large	85.4	-
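
STS-style quality scores are usually reported as the Spearman rank correlation between a model's cosine similarities and human similarity ratings, scaled by 100. Assuming the table follows that convention, the sketch below shows the calculation in plain Python. The human ratings and model similarities are invented toy values, and this simple formula assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation via the sum of squared rank differences."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Toy data: human similarity ratings and model cosine similarities
# for the same five sentence pairs (illustrative values only).
human_scores = [4.8, 0.5, 3.2, 2.0, 1.1]
model_cosine = [0.91, 0.12, 0.55, 0.60, 0.20]

rho = spearman(human_scores, model_cosine)
print(f"Spearman: {rho * 100:.1f}")
```

Because only the ordering matters, a model can score well here even if its raw cosine values are compressed into a narrow range.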

Usage examples

Load and use model

from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('all-mpnet-base-v2')

# Generate embeddings
sentences = ["This is a sentence", "This is another sentence"]
embeddings = model.encode(sentences)

Compare different models

from sentence_transformers import SentenceTransformer

models = {
    'MiniLM': 'all-MiniLM-L6-v2',
    'MPNet': 'all-mpnet-base-v2',
    'RoBERTa': 'all-roberta-large-v1'
}

for name, model_name in models.items():
    model = SentenceTransformer(model_name)
    embeddings = model.encode(["Test sentence"])
    print(f"{name}: {embeddings.shape}")

Resources

Models: https://huggingface.co/sentence-transformers
Docs: https://www.sbert.net/docs/pretrained_models.html

How it compares

Use sentence-transformers to pick a local embedding model; pair it with the chroma or qdrant skills once the embedding backend is settled.

FAQ

Which sentence-transformers model is the default recommendation?

The sentence-transformers skill recommends all-MiniLM-L6-v2 as the default with 384 dimensions, roughly 2000 sentences per second throughput, and good quality suited to prototyping and general tasks.

What multilingual option does sentence-transformers document?

The sentence-transformers skill documents paraphrase-multilingual-MiniLM-L12-v2 supporting 50+ languages at 384 dimensions with fast speed for multilingual semantic search workloads.

Is Sentence Transformers safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Tags: AI & Agent Building, llm, research, automation
