Creating Data Lake Table

Name: Creating Data Lake Table
Author: aws

aws/agent-toolkit-for-aws

Configure least-privilege IAM and S3 Tables policies so an agent can create and query federated Glue catalog tables on AWS data lakes.

Install

npx skills add https://github.com/aws/agent-toolkit-for-aws --skill creating-data-lake-table

What is this skill?

Documents s3tables actions (GetTableBucket through GetTableData) with exact ARN resource patterns
Glue federated catalog IAM: catalog root, s3tablescatalog, database, and table resource wildcards
SSE-KMS requirements: kms:Decrypt and kms:GenerateDataKey for querying principals
Applies the bucket policy via aws s3tables put-table-bucket-policy with a JSON resource policy
Cross-references Glue ETL service role guidance in table-creation-glue-etl.md

Adoption & trust: 1k installs on skills.sh; 819 GitHub stars; 3/3 security scanners passed (skills.sh audits).

Journey fit

Primary fit

Build: Integrations & version control

Data lake table setup is core product and platform work that must be in place before analytics or ETL jobs run in production. The skill wires together AWS S3 Tables, Glue catalog ARNs, and CLI commands, a classic third-party cloud integration task during the build phase.

Common Questions / FAQ

Is Creating Data Lake Table safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md

SKILL.md - Creating Data Lake Table

# S3 Tables Access Control

You MUST use least-privilege permissions when configuring access to S3 Tables.

## Bucket Policy (s3tables actions)

Actions: `s3tables:GetTableBucket`, `s3tables:GetNamespace`, `s3tables:GetTable`, `s3tables:GetTableMetadataLocation`, `s3tables:GetTableData`

Resources:

- `arn:aws:s3tables:{region}:{account_id}:bucket/{bucket_name}`
- `arn:aws:s3tables:{region}:{account_id}:bucket/{bucket_name}/table/*`

Set with `aws s3tables put-table-bucket-policy --table-bucket-arn <ARN> --resource-policy '<POLICY_JSON>'`.
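As a rough sketch of how the actions and resources above might be assembled into a resource policy document, the helper below prints the JSON for a given bucket. The function name `build_bucket_policy`, its arguments, and the `Principal` element are assumptions made for illustration; the skill itself lists only the actions and resource ARNs, so check the statement shape against current AWS documentation before use.

```bash
# Hypothetical helper (not part of the skill): prints a read-only
# table bucket policy as JSON.
#   $1 region, $2 account ID, $3 table bucket name,
#   $4 ARN of the principal to grant access to (assumed)
build_bucket_policy() {
  local region="$1" account_id="$2" bucket_name="$3" principal_arn="$4"
  local bucket_arn="arn:aws:s3tables:${region}:${account_id}:bucket/${bucket_name}"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "${principal_arn}" },
      "Action": [
        "s3tables:GetTableBucket",
        "s3tables:GetNamespace",
        "s3tables:GetTable",
        "s3tables:GetTableMetadataLocation",
        "s3tables:GetTableData"
      ],
      "Resource": [
        "${bucket_arn}",
        "${bucket_arn}/table/*"
      ]
    }
  ]
}
EOF
}
```

The output could then be supplied as the `--resource-policy` value, for example `--resource-policy "$(build_bucket_policy <REGION> <ACCOUNT_ID> <BUCKET_NAME> <PRINCIPAL_ARN>)"`.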

## IAM Policy (glue actions)

Actions: `glue:GetCatalog`, `glue:GetDatabase`, `glue:GetTable`

Resources (all three actions on each):

- `arn:aws:glue:{region}:{account_id}:catalog` (root -- required for federated catalog resolution)
- `arn:aws:glue:{region}:{account_id}:catalog/s3tablescatalog`
- `arn:aws:glue:{region}:{account_id}:catalog/s3tablescatalog/*`
- `arn:aws:glue:{region}:{account_id}:database/s3tablescatalog/*/*`
- `arn:aws:glue:{region}:{account_id}:table/s3tablescatalog/*/*/*`
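The five resource patterns above can be expanded into an identity-based policy document. The following is a minimal sketch under stated assumptions: `build_glue_policy` and its arguments are hypothetical names, and the single-statement layout is one possible arrangement rather than the skill's prescribed one.

```bash
# Hypothetical helper (not part of the skill): prints an identity policy
# granting the three Glue read actions on the federated catalog resources.
#   $1 region, $2 account ID
build_glue_policy() {
  local region="$1" account_id="$2"
  local prefix="arn:aws:glue:${region}:${account_id}"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["glue:GetCatalog", "glue:GetDatabase", "glue:GetTable"],
      "Resource": [
        "${prefix}:catalog",
        "${prefix}:catalog/s3tablescatalog",
        "${prefix}:catalog/s3tablescatalog/*",
        "${prefix}:database/s3tablescatalog/*/*",
        "${prefix}:table/s3tablescatalog/*/*/*"
      ]
    }
  ]
}
EOF
}
```

One way to attach the result would be `aws iam put-role-policy --role-name <ROLE> --policy-name <NAME> --policy-document "$(build_glue_policy <REGION> <ACCOUNT_ID>)"`, where the role and policy names are placeholders.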

## SSE-KMS

If the table bucket uses SSE-KMS, the querying principal also needs `kms:Decrypt` and `kms:GenerateDataKey` on the KMS key.
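A statement covering these two KMS actions might look like the sketch below. `build_kms_statement` and its key ARN argument are hypothetical; the skill does not say how the key is identified, so the caller must supply the ARN.

```bash
# Hypothetical helper (not part of the skill): prints a single policy
# statement allowing the two KMS actions on one key.
#   $1 ARN of the KMS key used by the table bucket (assumed to be known)
build_kms_statement() {
  local key_arn="$1"
  cat <<EOF
{
  "Effect": "Allow",
  "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
  "Resource": "${key_arn}"
}
EOF
}
```

The printed object is meant to be added to the `Statement` array of the querying principal's identity policy.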

## Glue ETL Service Role

See `table-creation-glue-etl.md` for the Glue job service role permissions.

## Additional Resources

For the latest IAM guidance, search AWS docs for `"S3 Tables identity-based policies IAM"`, `"S3 Tables access management"`, and `"S3 Tables Glue catalog prerequisites"`.


# Creating Tables via Athena DDL

Alternative to the S3 Tables API. Use when the user specifically wants SQL DDL or needs schema evolution via ALTER TABLE after creation.

## Prerequisites

- Glue catalog (`s3tablescatalog`) MUST be registered (see Step 5 in SKILL.md)
- Athena workgroup MUST use engine version 3 (required for Iceberg support)
- An S3 bucket for Athena query results MUST exist in the same region as the table bucket. If Athena has never been used in this region, the user MUST first configure a query result location, either in the Athena workgroup settings or via `--result-configuration` on each query.

## CREATE TABLE

The catalog reference goes in `--query-execution-context`, NOT in the SQL statement. In SQL, use the two-part `<namespace>.<table_name>` form, where the S3 Tables namespace acts as the Athena database:

```sql
CREATE TABLE <namespace>.<table_name> (
  <column_definitions>
)
PARTITIONED BY (<partition_columns>)
TBLPROPERTIES ('table_type' = 'ICEBERG')
```

**CRITICAL: Do NOT include a LOCATION clause.** S3 Tables manages storage automatically. This differs from regular Athena external tables.

**CRITICAL: Do NOT put the catalog name in the SQL.** Athena cannot parse `s3tablescatalog/<bucket>` as a catalog identifier in DDL. It goes in the execution context only.
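The two rules above lend themselves to a crude pre-flight check before a statement is submitted. The sketch below is an illustration only: `lint_ddl`, the `sales.daily_orders` table, and its columns are hypothetical, and the check is a plain text match, so a column that happens to be named `location` would be flagged as well.

```bash
# Hypothetical pre-flight check (not part of the skill).
# Prints a reason and returns non-zero if the DDL breaks either rule.
lint_ddl() {
  local ddl="$1"
  # Rule 1: no LOCATION clause (case-insensitive, whole-word match).
  if grep -qiw 'location' <<<"$ddl"; then
    echo "DDL must not contain a LOCATION clause"
    return 1
  fi
  # Rule 2: no catalog reference inside the SQL itself.
  if grep -qi 's3tablescatalog' <<<"$ddl"; then
    echo "DDL must not reference s3tablescatalog"
    return 1
  fi
  return 0
}

# Example statement with made-up namespace, table, and column names.
example_ddl="CREATE TABLE sales.daily_orders (
  order_id bigint,
  order_date date
)
PARTITIONED BY (order_date)
TBLPROPERTIES ('table_type' = 'ICEBERG')"

lint_ddl "$example_ddl" && echo "DDL passed the basic checks"
```

A statement that passes could then be supplied as the `--query-string` value of `aws athena start-query-execution`.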

## Execute via Athena

```bash
aws athena start-query-execution \
  --query-string "<DDL>" \
  --query-execution-context '{"Catalog": "s3tablescatalog/<BUCKET_NAME>", "Database": "<NAMESPACE>"}' \
  --work-group "<WORKGROUP>" \
  --result-configuration '{"OutputLocation": "s3://<RESULTS_BUCKET>/output/"}'
```

Check status with `aws athena get-query-execution --query-execution-id <ID>`.
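Because `start-query-execution` returns before the statement finishes, the status check usually needs to be repeated. A minimal polling sketch follows; `wait_for_query`, its default interval, and its poll limit are assumptions, while the state names are the ones Athena reports in `QueryExecution.Status.State`.

```bash
# Hypothetical helper (not part of the skill): polls Athena until the
# query reaches a terminal state or the poll limit is hit.
#   $1 query execution ID
#   $2 seconds between polls (default 2)
#   $3 maximum number of polls (default 30)
wait_for_query() {
  local query_id="$1" interval="${2:-2}" max_polls="${3:-30}" state i
  for ((i = 0; i < max_polls; i++)); do
    state="$(aws athena get-query-execution \
      --query-execution-id "$query_id" \
      --query 'QueryExecution.Status.State' \
      --output text)" || return 2   # CLI call itself failed
    case "$state" in
      SUCCEEDED) echo "$state"; return 0 ;;
      FAILED | CANCELLED) echo "$state"; return 1 ;;
    esac
    # QUEUED or RUNNING: wait and try again.
    sleep "$interval"
  done
  echo "TIMED_OUT"
  return 3
}
```

Called as `wait_for_query <ID>`, it prints the final state and returns 0 only when the query succeeded.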

The results bucket MUST be in the same region as the table bucket.

## Querying

Use the same execution context pattern for SELECT queries:

```bash
aws athena start-query-execution \
  --query-string "SELECT * FROM <namespace>.<table_name> LIMIT 10" \
  --query-execution-context '{"Catalog": "s3tablescatalog/<BUCKET_NAME>", "Database": "<NAMESPACE>"}' \
  --work-group "<WORKGROUP>" \
  --result-configuration '{"OutputLocation": "s3://<RESULTS_BUCKET>/output/"}'
```

## Constraints

- All table and column names MUST be lowercase
- You MUST NOT include a LOCATION clause
- You MUST NOT put the catalog name in the SQL -- use the execution context
- Output S3 bucket MUST be in the same region as the table bucket
- The querying principal needs `athena:StartQueryExecution`, `athena:GetQueryExecution`, `athena:GetQueryResults` plus S3 access to the results bucket. Also requires S3 Tables and Glue permissions — see `acce
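The lowercase rule in the list above is easy to check before any DDL is built. The helper below is a small sketch; `is_lowercase_name` is a hypothetical name, and it tests only for uppercase letters, not for any other naming restriction.

```bash
# Hypothetical helper (not part of the skill): succeeds only if the name
# is non-empty and unchanged by lowercasing.
is_lowercase_name() {
  local name="$1"
  [ -n "$name" ] && [ "$name" = "$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')" ]
}
```

For example, `is_lowercase_name "$table_name" || echo "rename $table_name to lowercase"` would flag a mixed-case table name before it reaches Athena.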

