
Sf Datacloud Retrieve
Configure and retrieve Salesforce Data Cloud hybrid search indexes on structured DMOs for agent and app retrieval pipelines.
Overview
sf-datacloud-retrieve is an agent skill for the Build phase that helps configure Salesforce Data Cloud hybrid search retrieval indexes on structured DMOs.
Install
npx skills add https://github.com/jaganpro/sf-skills --skill sf-datacloud-retrieve
What is this skill?
- Hybrid search index JSON scaffold on structured Data Cloud DMOs
- Chunk DMO and vector DMO naming patterns for index and chunk artifacts
- Passage extraction chunking with max_tokens 512 and strip_html option
- e5_large_v2 embeddings with 1024 dimensions and HNSW index configuration
- Part of sf-datacloud-* family with shared CREDITS and UPSTREAM maintenance docs
Adoption & trust: 872 installs on skills.sh; 418 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need a correct Data Cloud hybrid index definition across source, chunk, and vector DMOs without guessing embedding and chunking JSON.
Who is it for?
Indie builders or consultants shipping Salesforce Data Cloud search or RAG features who already work in the Data Cloud metadata model.
Skip if: you are building a greenfield app with no Salesforce org, or you only need a generic vector database setup outside Data Cloud.
When should I use this skill?
Defining or troubleshooting Salesforce Data Cloud retrieve/hybrid search index metadata on structured DMOs.
What do I get? / Deliverables
You leave with a structured index configuration template (search type, chunking, and vector embedding blocks) that is ready to adapt to your DMO and text fields.
- Hybrid search index configuration JSON
- Chunk and vector DMO naming pattern for your index
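As a rough illustration of these two deliverables, the sketch below fills the index template for a given index name, source DMO, and text field. The key names and default values are copied from the template shown in the skill files on this page; the helper `build_index_config` itself is hypothetical and not part of the skill, and the optional `description` key is left out.

```python
import json


def build_index_config(index_name: str, source_dmo: str, text_field: str,
                       search_type: str = "HYBRID") -> dict:
    """Fill the index template (hypothetical helper, not part of the skill).

    Chunk and vector DMO names follow the pattern used in the template:
    "<INDEX_NAME> chunk" / "<INDEX_NAME>_chunk" and
    "<INDEX_NAME> index" / "<INDEX_NAME>_index".
    """
    return {
        "label": index_name,
        "developerName": index_name,
        "sourceDmoDeveloperName": source_dmo,
        "chunkDmoName": f"{index_name} chunk",
        "chunkDmoDeveloperName": f"{index_name}_chunk",
        "vectorDmoName": f"{index_name} index",
        "vectorDmoDeveloperName": f"{index_name}_index",
        "searchType": search_type,
        "vectorEmbedding": {"vectorEmbeddingRelatedFields": []},
        "rankingConfigurations": [],
        "chunkingConfiguration": {
            "fieldLevelConfigurations": [
                {
                    "sourceDmoDeveloperName": source_dmo,
                    "sourceDmoFieldDeveloperName": text_field,
                    "config": {
                        "id": "passage_extraction",
                        "userValues": [
                            {"id": "max_tokens", "value": "512"},
                            {"id": "strip_html", "value": "true"},
                        ],
                    },
                }
            ]
        },
        "vectorEmbeddingConfiguration": {
            "embeddingModel": {
                "id": "e5_large_v2",
                "userValues": [
                    {"id": "dimension", "value": "1024"},
                    {"id": "max_token_limit", "value": "512"},
                ],
            },
            "index": {"id": "HNSW", "userValues": []},
            "similarityMetric": "COSINE",
        },
    }


if __name__ == "__main__":
    # Values taken from the Knowledge Article example in the skill files.
    config = build_index_config(
        "My_kav", "ssot__KnowledgeArticleVersion__dlm", "ssot__Name__c"
    )
    print(json.dumps(config, indent=2))
```

Generating the names from a single `index_name` argument keeps the four chunk and vector DMO names consistent with each other, which is the part most likely to drift when the JSON is edited by hand.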
How it compares
This skill provides Salesforce Data Cloud-specific retrieval scaffolding; it is not a generic Pinecone or pgvector integration skill.
Common Questions / FAQ
Who is sf-datacloud-retrieve for?
It is for developers and admins building hybrid search on Salesforce Data Cloud structured DMOs, especially alongside other sf-datacloud-* skills.
When should I use sf-datacloud-retrieve?
Use it in Build while defining chunk and vector indexes, embedding models, and field-level chunking before you connect agents or apps to Data Cloud retrieve APIs.
Is sf-datacloud-retrieve safe to install?
Treat it as configuration guidance for production orgs; review the Security Audits panel on this page and validate JSON against your org policies before deploy.
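One way to approach the "validate JSON" step is a small local consistency check before anything is deployed. The sketch below is an assumption about what such a check might look like, not a Salesforce-provided validator: `check_index_config` and `user_values` are hypothetical helpers, the accepted search types are limited to the two values that appear in the skill's templates (other values may exist), and the rule that chunk `max_tokens` should not exceed the embedding model's `max_token_limit` is a sanity heuristic rather than a documented constraint.

```python
def user_values(block: dict) -> dict:
    """Flatten a list of {"id": ..., "value": ...} pairs into a plain dict."""
    return {item["id"]: item["value"] for item in block.get("userValues", [])}


def check_index_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical local check, not an official validator.
    """
    problems = []

    # Chunk and vector DMO names should follow the template's naming pattern.
    name = config.get("developerName", "")
    expected = {
        "chunkDmoName": f"{name} chunk",
        "chunkDmoDeveloperName": f"{name}_chunk",
        "vectorDmoName": f"{name} index",
        "vectorDmoDeveloperName": f"{name}_index",
    }
    for key, want in expected.items():
        if config.get(key) != want:
            problems.append(f"{key}: expected {want!r}, found {config.get(key)!r}")

    # Only the search types seen in the skill's templates are accepted here.
    if config.get("searchType") not in {"HYBRID", "VECTOR"}:
        problems.append(f"searchType: unexpected value {config.get('searchType')!r}")

    source = config.get("sourceDmoDeveloperName")
    embedding = config.get("vectorEmbeddingConfiguration", {}).get("embeddingModel", {})
    limit = int(user_values(embedding).get("max_token_limit", 0))

    fields = config.get("chunkingConfiguration", {}).get("fieldLevelConfigurations", [])
    for field in fields:
        field_name = field.get("sourceDmoFieldDeveloperName")
        # Each chunked field should belong to the index's source DMO.
        if field.get("sourceDmoDeveloperName") != source:
            problems.append(f"{field_name}: source DMO does not match the index")
        # Assumed heuristic: chunks should fit within the embedding token limit.
        max_tokens = int(user_values(field.get("config", {})).get("max_tokens", 0))
        if limit and max_tokens > limit:
            problems.append(
                f"{field_name}: max_tokens {max_tokens} exceeds max_token_limit {limit}"
            )

    return problems
```

Loading the index JSON with `json.load` and passing the result to `check_index_config` gives a quick list of mismatches to review alongside whatever org-level policies apply.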
SKILL.md
# Credits & Acknowledgments

Primary contributor: **Gnanasekaran Thoppae**

This skill is part of the `sf-datacloud-*` family. Shared attribution, upstream source mapping, and maintenance notes live in:

- [../sf-datacloud/CREDITS.md](../sf-datacloud/CREDITS.md)
- [../sf-datacloud/UPSTREAM.md](../sf-datacloud/UPSTREAM.md)

```json
{
  "label": "<INDEX_NAME>",
  "developerName": "<INDEX_NAME>",
  "description": "Hybrid search index on a structured Data Cloud DMO",
  "sourceDmoDeveloperName": "<SOURCE_DMO>__dlm",
  "chunkDmoName": "<INDEX_NAME> chunk",
  "chunkDmoDeveloperName": "<INDEX_NAME>_chunk",
  "vectorDmoName": "<INDEX_NAME> index",
  "vectorDmoDeveloperName": "<INDEX_NAME>_index",
  "searchType": "HYBRID",
  "vectorEmbedding": {
    "vectorEmbeddingRelatedFields": []
  },
  "rankingConfigurations": [],
  "chunkingConfiguration": {
    "fieldLevelConfigurations": [
      {
        "sourceDmoDeveloperName": "<SOURCE_DMO>__dlm",
        "sourceDmoFieldDeveloperName": "<TEXT_FIELD>__c",
        "config": {
          "id": "passage_extraction",
          "userValues": [
            { "id": "max_tokens", "value": "512" },
            { "id": "strip_html", "value": "true" }
          ]
        }
      }
    ]
  },
  "vectorEmbeddingConfiguration": {
    "embeddingModel": {
      "id": "e5_large_v2",
      "userValues": [
        { "id": "dimension", "value": "1024" },
        { "id": "max_token_limit", "value": "512" }
      ]
    },
    "index": {
      "id": "HNSW",
      "userValues": []
    },
    "similarityMetric": "COSINE"
  }
}
```

```json
{
  "label": "My_kav",
  "developerName": "My_kav",
  "sourceDmoDeveloperName": "ssot__KnowledgeArticleVersion__dlm",
  "chunkDmoName": "My_kav chunk",
  "chunkDmoDeveloperName": "My_kav_chunk",
  "vectorDmoName": "My_kav index",
  "vectorDmoDeveloperName": "My_kav_index",
  "searchType": "VECTOR",
  "vectorEmbedding": {
    "vectorEmbeddingRelatedFields": []
  },
  "chunkingConfiguration": {
    "fieldLevelConfigurations": [
      {
        "sourceDmoDeveloperName": "ssot__KnowledgeArticleVersion__dlm",
        "sourceDmoFieldDeveloperName": "ssot__Name__c",
        "config": {
          "id": "passage_extraction",
          "userValues": [
            { "id": "strip_html", "value": "true" },
            { "id": "max_tokens", "value": "512" }
          ]
        }
      }
    ]
  },
  "vectorEmbeddingConfiguration": {
    "embeddingModel": {
      "id": "e5_large_v2",
      "userValues": [
        { "id": "dimension", "value": "1024" },
        { "id": "max_token_limit", "value": "512" }
      ]
    },
    "index": {
      "id": "HNSW",
      "userValues": []
    },
    "similarityMetric": "COSINE"
  },
  "rankingConfigurations": []
}
```

MIT License

Copyright (c) 2024-2025 Jag Valaiyapathy

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

# sf-datacloud-retrieve

Query and search workflows for Salesforce Data Cloud.

## Use this skill for

- quick SQL counts
- paginated SQL (`sqlv2`)
- async query lifecycles
- table describe
- vector search
-