
Creating Data Lake Table
Configure least-privilege IAM and S3 Tables policies so an agent can create and query federated Glue catalog tables on AWS data lakes.
Install
npx skills add https://github.com/aws/agent-toolkit-for-aws --skill creating-data-lake-table
What is this skill?
- Documents s3tables actions (GetTableBucket through GetTableData) with exact ARN resource patterns
- Glue federated catalog IAM: catalog root, s3tablescatalog, database, and table resource wildcards
- SSE-KMS requirements: kms:Decrypt and kms:GenerateDataKey for querying principals
- Bucket policy apply via aws s3tables put-table-bucket-policy with JSON resource policy
- Cross-references Glue ETL service role guidance in table-creation-glue-etl.md
Adoption & trust: 1k installs on skills.sh; 819 GitHub stars; 3/3 security scanners passed (skills.sh audits).
Journey fit
Primary fit
Data lake table setup is core product and platform work that must be in place before analytics or ETL jobs run in production. The skill ties together AWS S3 Tables, Glue catalog ARNs, and CLI commands, which is typical third-party cloud integration work during the build phase.
Common Questions / FAQ
Is Creating Data Lake Table safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# S3 Tables Access Control

You MUST use least-privilege permissions when configuring access to S3 Tables.

## Bucket Policy (s3tables actions)

Actions: `s3tables:GetTableBucket`, `s3tables:GetNamespace`, `s3tables:GetTable`, `s3tables:GetTableMetadataLocation`, `s3tables:GetTableData`

Resources:

- `arn:aws:s3tables:{region}:{account_id}:bucket/{bucket_name}`
- `arn:aws:s3tables:{region}:{account_id}:bucket/{bucket_name}/table/*`

Set with `aws s3tables put-table-bucket-policy --table-bucket-arn <ARN> --resource-policy '<POLICY_JSON>'`.

## IAM Policy (glue actions)

Actions: `glue:GetCatalog`, `glue:GetDatabase`, `glue:GetTable`

Resources (all three actions on each):

- `arn:aws:glue:{region}:{account_id}:catalog` (root -- required for federated catalog resolution)
- `arn:aws:glue:{region}:{account_id}:catalog/s3tablescatalog`
- `arn:aws:glue:{region}:{account_id}:catalog/s3tablescatalog/*`
- `arn:aws:glue:{region}:{account_id}:database/s3tablescatalog/*/*`
- `arn:aws:glue:{region}:{account_id}:table/s3tablescatalog/*/*/*`

## SSE-KMS

If the table bucket uses SSE-KMS, the querying principal also needs `kms:Decrypt` and `kms:GenerateDataKey` on the KMS key.

## Glue ETL Service Role

See `table-creation-glue-etl.md` for the Glue job service role permissions.

## Additional Resources

For latest IAM guidance, search AWS docs for `"S3 Tables identity-based policies IAM"`, `"S3 Tables access management"`, and `"S3 Tables Glue catalog prerequisites"`.

# Creating Tables via Athena DDL

Alternative to the S3 Tables API. Use when the user specifically wants SQL DDL or needs schema evolution via ALTER TABLE after creation.

## Prerequisites

- Glue catalog (`s3tablescatalog`) MUST be registered (see Step 5 in SKILL.md)
- Athena workgroup MUST use engine version 3 (required for Iceberg support)
- Output S3 bucket MUST exist in the same region as the table bucket for Athena query results.
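
The workgroup and results-bucket prerequisites above can be checked before any DDL is submitted. The following is a minimal sketch rather than part of the skill itself: the function names (`is_engine_v3`, `normalize_bucket_region`, `check_prerequisites`) are hypothetical, and it assumes Athena reports the effective engine as a string ending in `engine version 3`. It does not verify that the Glue catalog is registered.

```bash
#!/usr/bin/env bash
# Sketch: pre-flight checks for the Athena DDL prerequisites.
# All function names here are illustrative, not part of the skill.

# Succeeds when the engine-version string denotes Athena engine version 3.
# Assumption: the string looks like "Athena engine version 3".
is_engine_v3() {
  case "$1" in
    *"engine version 3") return 0 ;;
    *) return 1 ;;
  esac
}

# get-bucket-location returns a null LocationConstraint for us-east-1,
# which `--output text` renders as "None". Map that back to a region name.
normalize_bucket_region() {
  case "$1" in
    None|null|"") echo "us-east-1" ;;
    *) echo "$1" ;;
  esac
}

# Usage: check_prerequisites <workgroup> <results_bucket> <table_bucket_region>
# Calls the AWS CLI, so it needs credentials; it is only defined here, not run.
check_prerequisites() {
  local workgroup results_bucket table_region engine location region
  workgroup="$1"
  results_bucket="$2"
  table_region="$3"

  engine=$(aws athena get-work-group --work-group "$workgroup" \
    --query 'WorkGroup.Configuration.EngineVersion.EffectiveEngineVersion' \
    --output text) || return 1
  if ! is_engine_v3 "$engine"; then
    echo "workgroup $workgroup reports '$engine', expected engine version 3" >&2
    return 1
  fi

  location=$(aws s3api get-bucket-location --bucket "$results_bucket" \
    --query 'LocationConstraint' --output text) || return 1
  region=$(normalize_bucket_region "$location")
  if [ "$region" != "$table_region" ]; then
    echo "results bucket is in $region, table bucket is in $table_region" >&2
    return 1
  fi

  echo "prerequisites OK"
}
```

A call such as `check_prerequisites "<WORKGROUP>" "<RESULTS_BUCKET>" "<REGION>"` would then fail fast with a message on stderr if either check does not hold. The two string helpers are kept separate from the AWS calls so they can be exercised without credentials.
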
If Athena has never been used in this region, the user MUST first configure a query result location in the Athena workgroup settings or via `--result-configuration` on each query.

## CREATE TABLE

The catalog reference goes in `--query-execution-context`, NOT in the SQL statement. Use `<database>.<table>` format in SQL:

```sql
CREATE TABLE <namespace>.<table_name> (
  <column_definitions>
)
PARTITIONED BY (<partition_columns>)
TBLPROPERTIES ('table_type' = 'ICEBERG')
```

**CRITICAL: Do NOT include a LOCATION clause.** S3 Tables manages storage automatically. This differs from regular Athena external tables.

**CRITICAL: Do NOT put the catalog name in the SQL.** Athena cannot parse `s3tablescatalog/<bucket>` as a catalog identifier in DDL. It goes in the execution context only.

## Execute via Athena

```bash
aws athena start-query-execution \
  --query-string "<DDL>" \
  --query-execution-context '{"Catalog": "s3tablescatalog/<BUCKET_NAME>", "Database": "<NAMESPACE>"}' \
  --work-group "<WORKGROUP>" \
  --result-configuration '{"OutputLocation": "s3://<RESULTS_BUCKET>/output/"}'
```

Check status with `aws athena get-query-execution --query-execution-id <ID>`. The results bucket MUST be in the same region as the table bucket.

## Querying

Use the same execution context pattern for SELECT queries:

```bash
aws athena start-query-execution \
  --query-string "SELECT * FROM <namespace>.<table_name> LIMIT 10" \
  --query-execution-context '{"Catalog": "s3tablescatalog/<BUCKET_NAME>", "Database": "<NAMESPACE>"}' \
  --work-group "<WORKGROUP>" \
  --result-configuration '{"OutputLocation": "s3://<RESULTS_BUCKET>/output/"}'
```

## Constraints

- All table and column names MUST be lowercase
- You MUST NOT include a LOCATION clause
- You MUST NOT put catalog name in the SQL -- use execution context
- Output S3 bucket MUST be in the same region
- The querying principal needs `athena:StartQueryExecution`, `athena:GetQueryExecution`, `athena:GetQueryResults` plus S3 access to the results bucket.
Also requires S3 Tables and Glue permissions — see `acce