Storing And Querying Vectors

The canonical shelf is Build backend because the skill teaches how to architect vector indexes, ingestion, and query paths for a product feature. The backend subphase fits because multi-tenant index design, batch PutVectors ingestion, and query scoping all ship as application infrastructure.

Where it fits

Example use

Compare S3 Vectors latency and cost against a managed vector DB before committing your RAG MVP architecture.

Example use

Pick per-tenant indexes so each customer’s embeddings stay isolated in a shared vector bucket.

Example use

Tune parallel PutVectors workers with backoff after ServiceUnavailable spikes during a bulk reindex.

How it compares

Architecture patterns for S3 Vectors at scale, not a drop-in replacement for Pinecone or OpenSearch Serverless setup wizards.

Common Questions / FAQ

Who is storing-and-querying-vectors for?

Solo builders and small teams implementing retrieval or RAG backends on AWS who want agent guidance on S3 Vectors tenancy, ingestion, and query tradeoffs.

When should I use storing-and-querying-vectors?

Use it during Build backend when choosing vector storage, during Operate infra when scaling ingestion workers, and during Validate scope when estimating AWS retrieval costs versus in-memory options.

Is storing-and-querying-vectors safe to install?

It is documentation-style guidance without embedded secrets; review the Security Audits panel on this Prism page and follow your own IAM least-privilege policies for S3 Vectors APIs.

SKILL.md

# Patterns for S3 Vectors at Scale

For current limits: search AWS docs for `"S3 Vectors limitations and restrictions"`

## When to Use S3 Vectors

Use S3 Vectors for large, long-term vector data that doesn't require the
high-throughput performance of in-memory vector databases. S3 Vectors provides a
cost-optimized data foundation with query performance optimized for long-term
storage and infrequent access of data. You also benefit from a storage
architecture with strong consistency guarantees, ensuring subsequent queries
always include your most recently added data.

S3 Vectors delivers subsecond latency for infrequent queries and as low as 100ms
for more frequent queries.

## Multi-Tenant Patterns

**Per-tenant index** (recommended for isolation):

- Each tenant gets their own index within a shared vector bucket
- Queries naturally scoped to one tenant
- Easy to delete a tenant's data (delete the index)
- Use when: tenants need strict isolation, different schemas, or independent scaling

**Single index with metadata filtering** (simpler):

- All tenants share one index, filter by `tenant_id` metadata
- Simpler to manage, single query endpoint
- Use when: tenants have identical schemas and moderate scale
- Risk: noisy neighbor if one tenant dominates the index
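
The practical difference between the two patterns is where tenant scoping is enforced. A minimal sketch, assuming a hypothetical `tenant-<id>` index naming scheme and a hypothetical shared index called `shared-embeddings` (neither name comes from S3 Vectors itself):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueryTarget:
    """Which index a tenant's query goes to, and which filter (if any) it carries."""
    index_name: str
    metadata_filter: Optional[dict]


def target_for_tenant(tenant_id: str, per_tenant_index: bool,
                      shared_index: str = "shared-embeddings") -> QueryTarget:
    """Resolve the index and filter for one tenant under either pattern."""
    if per_tenant_index:
        # Per-tenant index: isolation comes from the index, so no filter is needed.
        return QueryTarget(index_name=f"tenant-{tenant_id}", metadata_filter=None)
    # Shared index: every query must carry the tenant filter,
    # otherwise it would search across all tenants.
    return QueryTarget(index_name=shared_index,
                       metadata_filter={"tenant_id": {"$eq": tenant_id}})
```

With the shared-index pattern, routing every query through one helper like this keeps the `tenant_id` filter from being forgotten at a call site.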

## Batch Ingestion Pattern

For large-scale ingestion (millions of vectors):

1. Batch vectors into groups of up to 500 per PutVectors call
2. Use parallel workers with backoff on `ServiceUnavailableException`
3. For sustained throughput beyond per-index limits, shard across multiple indexes
4. Search AWS docs for `"S3 Vectors limitations and restrictions"` for current per-call and per-second limits
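
Steps 1 and 2 can be sketched without committing to a specific SDK. In the sketch below, `put_batch` is a hypothetical callable that wraps whatever client issues the PutVectors request, and `ServiceUnavailable` is a local stand-in for the service's `ServiceUnavailableException`; the retry counts and delays are illustrative defaults, not AWS recommendations.

```python
import random
import time
from typing import Callable, Iterator, Sequence

MAX_BATCH = 500  # per-call ceiling quoted above; confirm against current AWS limits


class ServiceUnavailable(Exception):
    """Local stand-in for ServiceUnavailableException (hypothetical class)."""


def batches(vectors: Sequence[dict], size: int = MAX_BATCH) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` vectors."""
    for start in range(0, len(vectors), size):
        yield vectors[start:start + size]


def put_with_backoff(put_batch: Callable[[Sequence[dict]], None],
                     batch: Sequence[dict],
                     max_attempts: int = 5,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call put_batch, retrying on ServiceUnavailable with exponential backoff.

    Returns the number of attempts used; re-raises once max_attempts is reached.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            put_batch(batch)
            return attempt
        except ServiceUnavailable:
            if attempt == max_attempts:
                raise
            # Full jitter: wait a random time up to base_delay * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

Each parallel worker would call `put_with_backoff` on its own batch, for example through `concurrent.futures.ThreadPoolExecutor`; the jitter keeps workers from retrying in lockstep after a throttling spike.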

## SSE-KMS Encryption

To create a vector bucket with SSE-KMS:

```bash
aws s3vectors create-vector-bucket \
  --vector-bucket-name <BUCKET_NAME> \
  --encryption-configuration '{"sseType":"aws:kms","kmsKeyArn":"arn:aws:kms:<REGION>:<ACCOUNT>:key/<KEY_ID>"}'
```

You MUST use the full KMS key ARN (not alias or key ID). The KMS key policy MUST grant
`kms:GenerateDataKey` and `kms:Decrypt` to the S3 Vectors service principal `indexing.s3vectors.amazonaws.com`.
Encryption cannot be changed after bucket or index creation.

For full KMS policy examples, search AWS docs for `"S3 Vectors data encryption KMS"`.
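
Because an alias or bare key ID is not accepted, it can help to validate the ARN before building the `--encryption-configuration` value. A minimal sketch; the helper name and the regular expression are assumptions for illustration and only approximate the shape of a key ARN:

```python
import json
import re

# Approximate shape of a full key ARN:
# arn:<partition>:kms:<region>:<12-digit account>:key/<key-id>
_KEY_ARN = re.compile(r"^arn:aws[a-z-]*:kms:[a-z0-9-]+:\d{12}:key/[A-Za-z0-9-]+$")


def encryption_configuration(kms_key_arn: str) -> str:
    """Return the JSON for --encryption-configuration.

    Raises ValueError for aliases, alias ARNs, and bare key IDs.
    """
    if not _KEY_ARN.match(kms_key_arn):
        raise ValueError(f"expected a full KMS key ARN, got: {kms_key_arn!r}")
    return json.dumps({"sseType": "aws:kms", "kmsKeyArn": kms_key_arn})
```

The returned string has the same two fields as the CLI example above, so it can be passed to `--encryption-configuration` unchanged.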

## Migration Pattern

When migrating from another vector DB (pgVector, AOSS, etc.):

1. Create vector bucket and index matching source dimensions + distance metric
2. Export vectors from source (with metadata)
3. Batch PutVectors into S3 Vectors
4. Verify with QueryVectors using known test vectors
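5. S3 Vectors supports only the `cosine` and `euclidean` distance metrics. If the source
   used dotProduct, normalize the vectors to unit length and use `cosine`: on unit
   vectors the two produce the same ranking. If the source vectors were not already
   unit length, normalizing changes the ranking, so re-check results as in step 4.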
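
The dot-product note in step 5 rests on a simple identity: for unit-length vectors, the dot product equals cosine similarity. A small sketch in plain Python (the function names are illustrative):

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> list[float]:
    """Scale v to unit L2 length before ingesting it into a cosine index."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b, independent of their magnitudes."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 0.0]
# After normalization, the dot product and cosine similarity agree.
assert math.isclose(dot(normalize(a), normalize(b)), cosine_similarity(a, b))
```

If the source database ranked by dot product on vectors that were not unit length, magnitude influenced its ranking, and results from a cosine index will differ.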


# Metadata Filtering

For full docs: search AWS docs for `"S3 Vectors metadata filtering"`

## Filterable vs Non-filterable

- **Filterable** (default): All metadata is filterable unless explicitly declared otherwise.
  Can be used in query `--filter` expressions. Limited to 2 KB per vector.
- **Non-filterable**: Declared at index creation via `--metadata-configuration` and
  immutable afterwards. Cannot be used in filters but can store larger data, which
  makes it a good fit for text chunks, descriptions, and raw content. Max 10
  non-filterable keys per index. Total metadata per vector (filterable +
  non-filterable combined) is limited to 40 KB. For the JSON syntax, search AWS
  docs for `"S3 Vectors non-filterable metadata"`.
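
A pre-ingestion check against the limits quoted above can catch oversized metadata before a PutVectors call fails. This is a rough sketch under two assumptions that are not confirmed by the source: that 1 KB means 1024 bytes, and that size can be approximated as compact UTF-8 JSON. The service may measure differently, so treat the result as an early warning only.

```python
import json

FILTERABLE_LIMIT_BYTES = 2 * 1024   # 2 KB filterable metadata per vector (assumed 1024-byte KB)
TOTAL_LIMIT_BYTES = 40 * 1024       # 40 KB total metadata per vector
MAX_NON_FILTERABLE_KEYS = 10        # per index


def _approx_size(d: dict) -> int:
    # Assumption: approximate size as compact UTF-8 JSON.
    return len(json.dumps(d, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))


def metadata_problems(metadata: dict, non_filterable_keys: set[str]) -> list[str]:
    """Return human-readable limit violations for one vector's metadata."""
    problems = []
    if len(non_filterable_keys) > MAX_NON_FILTERABLE_KEYS:
        problems.append("more than 10 non-filterable keys declared")
    filterable = {k: v for k, v in metadata.items() if k not in non_filterable_keys}
    if _approx_size(filterable) > FILTERABLE_LIMIT_BYTES:
        problems.append("filterable metadata exceeds 2 KB")
    if _approx_size(metadata) > TOTAL_LIMIT_BYTES:
        problems.append("total metadata exceeds 40 KB")
    return problems
```

For example, a 10,000-character text chunk passes when its key is declared non-filterable but is flagged when left filterable.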

## Filter Operators

| Operator | Input types | Description |
|----------|------------|-------------|
| `$eq` | string, number, boolean | Exact match (default when no operator specified) |
| `$ne` | string, number, boolean | Not equal |
| `$gt` | number | Greater than |
| `$gte` | number | Greater than or equal |
| `$lt` | number | Less than |
| `$lte` | number | Less than or equal |
| `$in` | array of primitives | Match any value in array |
| `$nin` | array of primitives | Match none of the values |
| `$exists` | boolean | Checks whether the key exists |
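
Filters are JSON objects keyed by metadata field. A minimal sketch of a builder that enforces the input types from the table before a query is sent; `build_filter` is a hypothetical helper, not part of any AWS SDK, and it covers only the operators listed above:

```python
from typing import Any

_SCALAR_OPS = {"$eq", "$ne"}
_NUMERIC_OPS = {"$gt", "$gte", "$lt", "$lte"}
_ARRAY_OPS = {"$in", "$nin"}


def _is_number(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _is_primitive(v: Any) -> bool:
    return isinstance(v, (str, bool)) or _is_number(v)


def build_filter(key: str, op: str, value: Any) -> dict:
    """Return {key: {op: value}} after checking value against the operator's input type."""
    if op in _SCALAR_OPS:
        ok = _is_primitive(value)
    elif op in _NUMERIC_OPS:
        ok = _is_number(value)
    elif op in _ARRAY_OPS:
        ok = isinstance(value, list) and all(_is_primitive(x) for x in value)
    elif op == "$exists":
        ok = isinstance(value, bool)
    else:
        raise ValueError(f"unknown operator: {op}")
    if not ok:
        raise TypeError(f"{op} does not accept {type(value).__name__}")
    return {key: {op: value}}
```

Calling `build_filter("tenant_id", "$eq", "acme")` yields `{"tenant_id": {"$eq": "acme"}}`, the filter shape used for the shared-index tenancy pattern.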

What is this skill?

Positions S3 Vectors for large, long-term, cost-optimized embeddings with strong read-after-write consistency

Latency guidance: subsecond for infrequent access and as low as 100ms for hotter query paths

Per-tenant index pattern for isolation versus single-index metadata filtering by tenant_id

Batch ingestion up to 500 vectors per PutVectors call with parallel workers and ServiceUnavailable backoff

Points to current AWS documentation for S3 Vectors limits rather than hard-coding quotas

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.1k installs on skills.sh; 819 GitHub stars; 3/3 security scanners passed (skills.sh audits).

What do I get? / Deliverables

You leave with a concrete tenancy model, batch ingestion approach, and query expectations aligned to S3 Vectors limits documented in AWS.

Chosen multi-tenant vector index strategy with isolation rationale

Batch ingestion plan with worker parallelism and backoff handling

Documented latency and cost expectations for S3 Vectors versus in-memory stores
