
Dv Data
Import multi-table relational data into Microsoft Dataverse in FK order with upserts and alternate keys so retries never duplicate rows.
Install
npx skills add https://github.com/microsoft/dataverse-skills --skill dv-dataWhat is this skill?
- Four-step sequence: tables with source ID columns, alternate keys, lookups, then dependency-ordered UpsertItem
- Upsert-by-alternate-key from day one for safe retries and partial failures
- DB sources: map source PK to alternate key; Excel/CSV: propose unique column and confirm with user
- Cross-references dv-metadata for table creation, keys, and relationships
- Python examples use auth client, UpsertItem, and CSV import patterns
Adoption & trust: 24 installs on skills.sh; 136 GitHub stars; 3/3 security scanners passed (skills.sh audits).
Recommended Skills
Entra App Registrationmicrosoft/azure-skills
Azure Aigatewaymicrosoft/azure-skills
Lark Openapi Explorerlarksuite/cli
Supabasesupabase/agent-skills
Firebase Auth Basicsfirebase/agent-skills
Firebase Data Connectfirebase/agent-skills
Journey fit
Primary fit
Bulk Dataverse loading is core product integration work during build, not a launch or growth activity. Foreign-key ordered upserts against Power Platform APIs are classic backend integration, paired with dv-metadata for schema.
Common Questions / FAQ
Is Dv Data safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
READMESKILL.md - Dv Data
# Multi-Table Import with FK Dependencies When importing data across multiple tables with foreign key relationships, follow this sequence: 1. **Create tables** with source ID columns (`prefix_Src*Id`) — see **dv-metadata** 2. **Create alternate keys** on the source ID columns — see **dv-metadata** "Alternate Keys" section 3. **Create lookup relationships** — see **dv-metadata** 4. **Import data** in dependency order using `UpsertItem` with alternate keys (safe for re-runs) Using upsert from the start means partial failures, retries, and re-runs never create duplicates. The alternate key lets Dataverse match records by the source system's ID instead of GUIDs. **Deciding which alternate key to create:** - **Database source (SQLite, SQL Server):** Read the schema to identify primary keys. The source PK maps directly to the Dataverse alternate key. Agent can decide without asking. - **Excel/CSV source:** Inspect the data for columns with all-unique values (`df[col].nunique() == len(df)`). Look for naming conventions (`*_ID`, `*_Code`). **Propose the candidate to the user and confirm** — "Column `Employee_ID` has 500 unique values across 500 rows. Use this as the key?" Do not create the key without confirmation, since uniqueness in current data doesn't guarantee it's the intended business key. ```python import os, sys, csv, time sys.path.insert(0, os.path.join(os.getcwd(), "scripts")) from auth import get_client from PowerPlatform.Dataverse.models.upsert import UpsertItem from PowerPlatform.Dataverse.core.errors import HttpError from concurrent.futures import ThreadPoolExecutor, as_completed # get_client sets a plugin attribution context on the User-Agent header. # Do not modify the context value — it is a closed schema for server-side # telemetry (app/skill/agent). Never include secrets or PII. client = get_client("dv-data") def bind(entity_set, guid): """Build an @odata.bind value. entity_set must be the actual EntitySetName, not a guess.""" return f"/{entity_set}({guid})" # IMPORTANT: EntitySetName is NOT always logical_name + 's'. # Dataverse uses English pluralization: country -> countries, city -> cities, # winby -> winbies, extraruns -> extrarunses. # Always query the actual names before building @odata.bind values: # GET /api/data/v9.2/EntityDefinitions?$select=LogicalName,EntitySetName def bulk_upsert(logical_name, items, chunk_size=1000, retries=3): """Upsert items in adaptive chunks with retry. Starts at chunk_size, doubles on success (up to max_size), halves on size/timeout failure. Caps at last successful size to avoid oscillation. Safe for re-runs.""" import requests as req_lib # for timeout exception types current_size = chunk_size max_size = 4000 i = 0 while i < len(items): chunk = items[i:i + current_size] for attempt in range(retries): try: client.records.upsert(logical_name, chunk) print(f" {logical_name}: {i + len(chunk)}/{len(items)} (chunk={current_size})", flush=True) i += len(chunk) current_size = min(current_size * 2, max_size) # ramp up break except HttpError as e: if e.status_code == 429 and attempt < retries - 1: time.sleep(5 * (attempt + 1)) continue if e.status_code in (413, 500) and current_size > 100: current_size = max(current_size // 2, 100) max_size = current_size print(f" {logical_name}: chunk capped at {current_size}", flush=True) break # retry same offset with smaller chunk raise except req_lib.exceptions.RequestException: # Network timeout — SDK default is 120s for POST if current_size > 100: current_size = max(current_size // 2, 100) max_size = current_size