
Connecting To Data Source
Wire AWS Glue native BigQuery connections using GCP service accounts and Secrets Manager so Glue jobs can read BigQuery datasets.
Overview
Connecting to Data Source is an agent skill for the Build phase that sets up AWS Glue BIGQUERY connections, authenticated with a GCP service account whose key is stored in AWS Secrets Manager.
Install
npx skills add https://github.com/aws/agent-toolkit-for-aws --skill connecting-to-data-source
What is this skill?
- AWS Glue connection type BIGQUERY with GCP service account auth
- Minimum GCP IAM roles: bigquery.dataViewer on the dataset and bigquery.jobUser on the project
- Secrets Manager stores base64-encoded service account JSON as the raw secret value
- Cross-project reads require roles granted in each source GCP project
- Includes connection JSON template and further-reading links to current GCP docs
- Documents two minimum GCP roles: bigquery.dataViewer and bigquery.jobUser
Adoption & trust: 1.1k installs on skills.sh; 819 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your Glue job must query BigQuery, but the authentication method and the secret format for the native connection are unclear across AWS and GCP.
Who is it for?
Solo builders running AWS Glue ETL or ingestion who source tables from BigQuery and want copy-paste-correct secret handling.
Skip if: Teams ingesting only within AWS (S3, Redshift) with no BigQuery footprint, or write-heavy BigQuery pipelines not covered by read-only role guidance.
When should I use this skill?
User needs BigQuery as a Glue data source: service account, Secrets Manager encoding, or BIGQUERY connection JSON.
What do I get? / Deliverables
A working Glue BigQuery connection JSON with correctly stored base64 service account credentials and documented GCP IAM for read jobs.
- Base64-encoded secret stored in Secrets Manager
- Glue BIGQUERY connection configuration JSON
- Documented IAM grants on dataset and project
Journey fit
Cross-cloud data source connections are implemented during product build, when you integrate analytics pipelines, rather than during early idea research. Glue-to-BigQuery is a classic backend integration subproblem: credentials, connection JSON, and IAM on both clouds.
How it compares
An infrastructure connection runbook for Glue, not a dbt modeling or warehouse design skill.
Common Questions / FAQ
Who is connecting-to-data-source for?
Developers shipping hybrid AWS+GCP data paths who configure Glue jobs from an AI coding assistant and need correct BigQuery auth details.
When should I use connecting-to-data-source?
During Build integrations when you add or fix a Glue BIGQUERY connection before writing or deploying the job that queries a dataset.
Is connecting-to-data-source safe to install?
It guides handling of service account keys and Secrets Manager; review the Security Audits panel on this Prism page and rotate keys if scripts are shared broadly.
SKILL.md
# BigQuery Connection Setup

AWS Glue native BigQuery connection (type `BIGQUERY`). Authentication is via a GCP service account; credentials flow through AWS Secrets Manager.

## Contents

- [Prerequisites](#prerequisites)
- [Service Account Setup](#service-account-setup)
- [Secrets Manager Storage](#secrets-manager-storage)
- [Connection JSON Template](#connection-json-template)
- [Further Reading](#further-reading)

## Prerequisites

- GCP project with BigQuery enabled
- Service account in that project with BigQuery access (typically `roles/bigquery.dataViewer` plus `roles/bigquery.jobUser` for running jobs)
- Service account JSON key file from GCP
- AWS Secrets Manager secret in the same region as the Glue job

## Service Account Setup

Service account and key generation happen in GCP, not AWS. For current steps see [GCP service account docs](https://cloud.google.com/iam/docs/service-accounts-create) and [BigQuery access control](https://cloud.google.com/bigquery/docs/access-control).

Minimum GCP IAM roles for read-only ingestion:

- `roles/bigquery.dataViewer` on the target dataset
- `roles/bigquery.jobUser` on the project (to run queries)

For cross-project reads, grant both roles in each source project.

## Secrets Manager Storage

Base64-encode the service account JSON and store in Secrets Manager. The Glue BigQuery connection expects the secret value to be the base64 string directly, not a JSON wrapper.

```bash
base64 -i <service-account>.json | tr -d '\n' > sa.b64
aws secretsmanager create-secret \
  --name glue/bigquery/<project-id>/credentials \
  --secret-string file://sa.b64 \
  --region <region>
rm sa.b64
```

Rotate by creating a new key in GCP and updating the secret value. Glue picks up the new value on next job run.
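Before uploading, it can help to confirm that the value you are about to store really is a single-line base64 string that decodes back to a service account key. The sketch below is one way to do that check locally with the Python standard library only. The helper names `encode_service_account_key` and `decode_secret_value` and the `fake_key` contents are hypothetical and for illustration; the only assumption about the key file is that it carries `"type": "service_account"`, as GCP-issued key files do. It does not call AWS or GCP.

```python
import base64
import json


def encode_service_account_key(raw_json: bytes) -> str:
    """Return the single-line base64 string to store as the raw secret value."""
    key = json.loads(raw_json)  # raises ValueError if the file is not valid JSON
    if not isinstance(key, dict) or key.get("type") != "service_account":
        raise ValueError("expected a GCP service account key file")
    # b64encode emits no line breaks, matching `base64 ... | tr -d '\n'`.
    return base64.b64encode(raw_json).decode("ascii")


def decode_secret_value(secret_value: str) -> dict:
    """Decode a stored secret value back into the key dict (round-trip check)."""
    return json.loads(base64.b64decode(secret_value, validate=True))


# Hypothetical, non-functional key used only to exercise the helpers.
fake_key = json.dumps(
    {
        "type": "service_account",
        "project_id": "example-project",
        "client_email": "etl@example-project.iam.gserviceaccount.com",
    }
).encode("utf-8")

secret_value = encode_service_account_key(fake_key)
assert "\n" not in secret_value  # raw base64 string, no JSON wrapper, no newlines
assert decode_secret_value(secret_value)["project_id"] == "example-project"
```

The same round-trip check can be run against a value read back from Secrets Manager after a rotation, to confirm the new key was stored in the expected form.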
## Connection JSON Template

```json
{
  "Name": "bigquery-<project-id>",
  "ConnectionType": "BIGQUERY",
  "ConnectionProperties": {
    "SECRET_ID": "glue/bigquery/<project-id>/credentials"
  }
}
```

Glue's BigQuery connection talks to Google APIs over the internet. No `PhysicalConnectionRequirements` needed unless the Glue job itself must run in a specific VPC for other reasons (e.g., also reading from a private RDS). In that case, ensure the subnet has NAT gateway egress so Glue can reach `bigquery.googleapis.com`.

## Further Reading

- [AWS Glue: Creating a BigQuery connection](https://docs.aws.amazon.com/glue/latest/dg/creating-bigquery-connection.html)
- [AWS Glue: Creating a BigQuery source node](https://docs.aws.amazon.com/glue/latest/dg/creating-bigquery-source-node.html)
- [GCP service account keys](https://cloud.google.com/iam/docs/keys-create-delete)

# Credential Security

Order of preference for authenticating Glue connections to data sources:

1. IAM database authentication (where supported)
2. AWS Secrets Manager (`SECRET_ID`)
3. Plaintext `USERNAME`/`PASSWORD` in connection properties (not recommended)

## Contents

- [IAM Database Authentication](#iam-database-authentication)
- [AWS Secrets Manager](#aws-secrets-manager)
- [Plaintext Credentials](#plaintext-credentials)
- [Rotation](#rotation)

## IAM Database Authentication

Supported sources:

- Aurora MySQL, Aurora PostgreSQL
- RDS MySQL, RDS PostgreSQL
- Amazon Redshift (via `GetClusterCredentials` / `GetCredentials`)

Benefits:

- No long-lived database passwords
- No secret to rotate
- Database access controlled by IAM policies
- Audit trail via CloudTrail

### RDS / Aurora Setup

1. Enable IAM DB auth on the cluster or instance:

   ```bash
   aws rds modify-db-instance \
     --db-instance-identifier <ID> \
     --enable-iam-database-authentication \
     --apply-immediately
   ```

2. Create a DB user that authenticates via IAM (MySQL):

   ```sql
   CREATE USER 'etl_user'@'%' IDENTIFIED WITH AWSAuthenticationPlugin AS 'RDS';
   GRANT SELECT ON app_db.* TO 'etl_user'@'%';
   ```

   PostgreSQL:

   ```sql
   CREATE USER etl_user;
   GRANT rds_iam TO etl_user;
   GRANT SELECT ON ALL TABLES IN SCHEMA public TO etl_u