Connecting To Data Source

Name: Connecting To Data Source
Author: aws

aws/agent-toolkit-for-aws

Wire AWS Glue native BigQuery connections using GCP service accounts and Secrets Manager so Glue jobs can read BigQuery datasets.

Overview

Connecting to Data Source is an agent skill for the Build phase that sets up AWS Glue BIGQUERY connections authenticated with GCP service account credentials stored in AWS Secrets Manager.

Install

npx skills add https://github.com/aws/agent-toolkit-for-aws --skill connecting-to-data-source

What is this skill?

AWS Glue connection type BIGQUERY with GCP service account auth
Minimum GCP IAM roles: bigquery.dataViewer on the dataset and bigquery.jobUser on the project
Secrets Manager stores the base64-encoded service account JSON as the raw secret value
Cross-project reads require both roles granted in each source GCP project
Includes a connection JSON template and further-reading links to current AWS and GCP docs

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.1k installs on skills.sh; 819 GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

Your Glue job must query BigQuery, but the authentication method and secret format for the native connection are unclear across AWS and GCP.

Who is it for?

Solo builders running AWS Glue ETL or ingestion who source tables from BigQuery and want copy-paste-correct secret handling.

Skip if: you ingest only within AWS (S3, Redshift) with no BigQuery footprint, or you run write-heavy BigQuery pipelines that the read-only role guidance does not cover.

When should I use this skill?

Use it when you need BigQuery as a Glue data source and must set up the service account, the Secrets Manager encoding, or the BIGQUERY connection JSON.

What do I get? / Deliverables

A working Glue BigQuery connection JSON backed by correctly stored base64 service account credentials, plus documented GCP IAM roles for read jobs.

Base64-encoded secret stored in Secrets Manager
Glue BIGQUERY connection configuration JSON
Documented IAM grants on dataset and project

Recommended Skills

Azure Deploy (microsoft/azure-skills)

Azure Deploy is a Microsoft agent skill that executes cloud releases for applications that are already planned and valid… 374k installs · 1.2k stars

Azure Prepare (microsoft/azure-skills)

Azure Prepare is Microsoft's skill for getting applications ready to run on Azure: writing the deployment plan, generatin… 374k installs · 1.2k stars

Azure Storage (microsoft/azure-skills)

Azure Storage skill helps agents pick the right Azure storage service: Blob for objects, Files for SMB shares, Queues for… 374k installs · 1.2k stars

Azure Validate (microsoft/azure-skills)

Microsoft-guided preflight validation for Azure deployments including IaC, identity, and service-specific readiness. 374k installs · 1.2k stars

Appinsights Instrumentation (microsoft/azure-skills)

appinsights-instrumentation is a Microsoft Azure-skills package that walks solo builders through enabling Application In… 374k installs · 1.2k stars

Azure Resource Lookup (microsoft/azure-skills)

Azure Resource Lookup is a Microsoft agent skill that helps solo builders and small teams answer “what do I have in Azur… 373k installs · 1.2k stars

Journey fit

Primary fit

Build: Integrations & version control

Cross-cloud data source connections are implemented during product build, when you integrate analytics pipelines, not during early idea research alone. Glue-to-BigQuery is a classic backend integration subproblem: credentials, connection JSON, and IAM on both clouds.

Also useful

Operate: Infrastructure & cost

How it compares

An infrastructure connection runbook for Glue, not a dbt modeling or warehouse design skill.

Common Questions / FAQ

Who is connecting-to-data-source for?

Developers shipping hybrid AWS+GCP data paths who configure Glue jobs from an AI coding assistant and need correct BigQuery auth details.

When should I use connecting-to-data-source?

During the Build phase, when you add or fix a Glue BIGQUERY connection before writing or deploying the job that queries a dataset.

Is connecting-to-data-source safe to install?

It guides the handling of GCP service account keys and Secrets Manager secrets. Review the Security Audits panel on this Prism page, and rotate keys if scripts that touch them are shared broadly.

SKILL.md


# BigQuery Connection Setup

AWS Glue native BigQuery connection (type `BIGQUERY`). Authentication is via a GCP service account; credentials flow through AWS Secrets Manager.

## Contents

- [Prerequisites](#prerequisites)
- [Service Account Setup](#service-account-setup)
- [Secrets Manager Storage](#secrets-manager-storage)
- [Connection JSON Template](#connection-json-template)
- [Further Reading](#further-reading)

## Prerequisites

- GCP project with BigQuery enabled
- Service account in that project with BigQuery access (typically `roles/bigquery.dataViewer` plus `roles/bigquery.jobUser` for running jobs)
- Service account JSON key file from GCP
- AWS Secrets Manager secret in the same region as the Glue job

## Service Account Setup

Service account and key generation happen in GCP, not AWS. For current steps see [GCP service account docs](https://cloud.google.com/iam/docs/service-accounts-create) and [BigQuery access control](https://cloud.google.com/bigquery/docs/access-control).

Minimum GCP IAM roles for read-only ingestion:

- `roles/bigquery.dataViewer` on the target dataset
- `roles/bigquery.jobUser` on the project (to run queries)

For cross-project reads, grant both roles in each source project.
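
A minimal sketch of those grants, assuming `gcloud` and `bq` are installed and authenticated as someone allowed to change IAM. The helpers `grant_bigquery_read_roles` and `run` are hypothetical; the dataset-level grant uses BigQuery DCL (`GRANT ... ON SCHEMA`) as one option among several (console, `bq update` with an access list). Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch only: run and grant_bigquery_read_roles are hypothetical helpers.
# DRY_RUN=1 prints the commands instead of executing them.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

grant_bigquery_read_roles() {
  local project_id="$1" dataset="$2" sa_email="$3"

  # Project level: roles/bigquery.jobUser lets the account run query jobs.
  run gcloud projects add-iam-policy-binding "$project_id" \
    --member="serviceAccount:${sa_email}" \
    --role="roles/bigquery.jobUser"

  # Dataset level: roles/bigquery.dataViewer, granted with BigQuery DCL.
  run bq --project_id="$project_id" query --use_legacy_sql=false \
    "GRANT \`roles/bigquery.dataViewer\` ON SCHEMA \`${project_id}.${dataset}\` TO \"serviceAccount:${sa_email}\""
}

# Cross-project reads: call once per source project, e.g.
#   grant_bigquery_read_roles source-proj sales_ds glue-reader@my-proj.iam.gserviceaccount.com
```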

## Secrets Manager Storage

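Base64-encode the service account JSON and store it in Secrets Manager. The Glue BigQuery connection expects the secret value to be the base64 string itself, not a JSON wrapper.

```bash
# Encode the key as one base64 line. On macOS, -i names the input file; GNU
# base64 treats -i as --ignore-garbage, which has no effect when encoding.
base64 -i <service-account>.json | tr -d '\n' > sa.b64
# Store the base64 string itself as the secret value (no JSON wrapper),
# in the same region as the Glue job.
aws secretsmanager create-secret \
  --name glue/bigquery/<project-id>/credentials \
  --secret-string file://sa.b64 \
  --region <region>
# Delete the local copy of the encoded key.
rm sa.b64
```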
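
Because a stray newline or a JSON wrapper is the usual way this secret goes wrong, it can help to check the encoded value before uploading it. A minimal sketch, assuming a POSIX `cmp` and a `base64` that accepts `--decode` (GNU coreutils and macOS both do); `encode_sa_key` and `verify_sa_secret` are hypothetical helper names, and the `"type": "service_account"` check is only a heuristic for "this is a GCP key file".

```bash
#!/usr/bin/env bash
# Sketch only: encode_sa_key and verify_sa_secret are hypothetical helpers.
set -euo pipefail

# Print a key file as a single base64 line. Reading from stdin works with
# both GNU coreutils and macOS base64.
encode_sa_key() {
  base64 < "$1" | tr -d '\n'
}

# Return 0 only if the encoded value is one line, decodes to exactly the
# bytes of the key file, and the key file looks like a service account key.
verify_sa_secret() {
  local key_file="$1" encoded="$2"
  case "$encoded" in
    *$'\n'*) echo "encoded value contains newlines" >&2; return 1 ;;
  esac
  if ! printf '%s' "$encoded" | base64 --decode 2>/dev/null | cmp -s - "$key_file"; then
    echo "decoded value does not match $key_file" >&2
    return 1
  fi
  if ! grep -q '"type": *"service_account"' "$key_file"; then
    echo "$key_file does not look like a service account key" >&2
    return 1
  fi
}

# Usage before create-secret:
#   enc="$(encode_sa_key key.json)" && verify_sa_secret key.json "$enc"
```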

Rotate by creating a new key in GCP and updating the secret value; Glue picks up the new value on the next job run.
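
The rotation steps above could be scripted roughly as follows. This is a sketch assuming `gcloud` and the AWS CLI are installed and authenticated and that the secret already exists (so `put-secret-value` rather than `create-secret` is used); `rotate_bigquery_key` and `run` are hypothetical helpers, and `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch only: rotate_bigquery_key and run are hypothetical helpers.
# DRY_RUN=1 prints the commands instead of executing them.
set -euo pipefail

run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

rotate_bigquery_key() {
  local sa_email="$1" secret_id="$2" region="$3"
  local workdir
  workdir="$(mktemp -d)"   # private (mode 700) scratch dir for key material

  # 1. Create a new key for the same service account in GCP.
  run gcloud iam service-accounts keys create "$workdir/key.json" \
    --iam-account="$sa_email"

  # 2. Encode it as one base64 line and overwrite the secret value.
  if [ "${DRY_RUN:-0}" != "1" ]; then
    base64 < "$workdir/key.json" | tr -d '\n' > "$workdir/sa.b64"
  fi
  run aws secretsmanager put-secret-value --secret-id "$secret_id" \
    --secret-string "file://$workdir/sa.b64" --region "$region"

  rm -rf "$workdir"
}
```

Once a job run succeeds with the new value, the old key can be removed in GCP with `gcloud iam service-accounts keys delete`; that step is left out of the sketch so the old key stays usable until the new one is confirmed.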

## Connection JSON Template

```json
{
  "Name": "bigquery-<project-id>",
  "ConnectionType": "BIGQUERY",
  "ConnectionProperties": {
    "SECRET_ID": "glue/bigquery/<project-id>/credentials"
  }
}
```
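
One way to fill in this template per GCP project and create the connection from the CLI is sketched below. It assumes your AWS CLI version accepts the `BIGQUERY` connection type in `aws glue create-connection --connection-input`; `render_bigquery_connection` is a hypothetical helper, and the project id and region in the usage lines are placeholders.

```bash
#!/usr/bin/env bash
# Sketch only: render_bigquery_connection is a hypothetical helper that
# prints the connection JSON template above for one GCP project id.
set -euo pipefail

render_bigquery_connection() {
  local project_id="$1"
  cat <<EOF
{
  "Name": "bigquery-${project_id}",
  "ConnectionType": "BIGQUERY",
  "ConnectionProperties": {
    "SECRET_ID": "glue/bigquery/${project_id}/credentials"
  }
}
EOF
}

# Usage (needs AWS credentials, so shown commented out):
#   render_bigquery_connection my-gcp-project > connection.json
#   aws glue create-connection --connection-input file://connection.json --region us-east-1
```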

Glue's BigQuery connection reaches Google APIs over the internet, so no `PhysicalConnectionRequirements` are needed unless the Glue job itself must run in a specific VPC for other reasons (e.g., it also reads from a private RDS instance). In that case, make sure the subnet has NAT gateway egress so Glue can reach `bigquery.googleapis.com`.

## Further Reading

- [AWS Glue: Creating a BigQuery connection](https://docs.aws.amazon.com/glue/latest/dg/creating-bigquery-connection.html)
- [AWS Glue: Creating a BigQuery source node](https://docs.aws.amazon.com/glue/latest/dg/creating-bigquery-source-node.html)
- [GCP service account keys](https://cloud.google.com/iam/docs/keys-create-delete)


# Credential Security

Order of preference for authenticating Glue connections to data sources:

1. IAM database authentication (where supported)
2. AWS Secrets Manager (`SECRET_ID`)
3. Plaintext `USERNAME`/`PASSWORD` in connection properties (not recommended)

## Contents

- [IAM Database Authentication](#iam-database-authentication)
- [AWS Secrets Manager](#aws-secrets-manager)
- [Plaintext Credentials](#plaintext-credentials)
- [Rotation](#rotation)

## IAM Database Authentication

Supported sources:

- Aurora MySQL, Aurora PostgreSQL
- RDS MySQL, RDS PostgreSQL
- Amazon Redshift (via `GetClusterCredentials` / `GetCredentials`)

Benefits:

- No long-lived database passwords
- No secret to rotate
- Database access controlled by IAM policies
- Audit trail via CloudTrail
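
To make "no long-lived passwords" concrete: a client asks RDS for a short-lived signed token and presents it in place of a password. The sketch below checks the `etl_user` from the setup steps that follow, from a shell, using PostgreSQL's `psql`; it assumes the AWS CLI and `psql` are installed and that the caller's IAM identity is allowed `rds-db:connect` for that user. `check_iam_db_login` and `run` are hypothetical helpers, `sslmode=require` is an assumption (stricter `verify-full` with the RDS CA bundle is also possible), and how the Glue connection itself consumes IAM auth is not shown.

```bash
#!/usr/bin/env bash
# Sketch only: check_iam_db_login and run are hypothetical helpers.
# DRY_RUN=1 prints the psql command instead of executing it.
set -euo pipefail

run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

check_iam_db_login() {
  local host="$1" port="$2" region="$3" user="$4" db="$5"
  local token
  # Short-lived signed token used in place of a stored password.
  token="$(run aws rds generate-db-auth-token --hostname "$host" --port "$port" \
    --region "$region" --username "$user")"
  # Connect over TLS and run a trivial query (PostgreSQL shown).
  PGPASSWORD="$token" run psql \
    "host=$host port=$port dbname=$db user=$user sslmode=require" -c 'SELECT 1'
}
```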

### RDS / Aurora Setup

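1. Enable IAM DB auth on the instance (RDS) or on the cluster (Aurora):

   ```bash
   # RDS DB instance:
   aws rds modify-db-instance \
     --db-instance-identifier <ID> \
     --enable-iam-database-authentication \
     --apply-immediately
   # Aurora DB cluster (IAM DB auth is set at the cluster level):
   aws rds modify-db-cluster \
     --db-cluster-identifier <ID> \
     --enable-iam-database-authentication \
     --apply-immediately
   ```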

2. Create a DB user that authenticates via IAM (MySQL):

   ```sql
   CREATE USER 'etl_user'@'%' IDENTIFIED WITH AWSAuthenticationPlugin AS 'RDS';
   GRANT SELECT ON app_db.* TO 'etl_user'@'%';
   ```

   PostgreSQL:

   ```sql
   CREATE USER etl_user;
   GRANT rds_iam TO etl_user;
   GRANT SELECT ON ALL TABLES IN SCHEMA public TO etl_user;
   ```
