
Spark SQL
Let Claude Code or Cursor run Spark SQL against Thrift/HiveServer2 clusters (Spark, EMR, Hive, Impala) without leaving the agent session.
Overview
Spark SQL MCP Server is an MCP server for the Build phase that lets AI coding agents execute and explore Spark SQL on Thrift/HiveServer2-backed clusters.
What is this MCP server?
- stdio MCP server (PyPI spark-sql-mcp-server v0.1.2) for Spark SQL over Thrift/HiveServer2
- Compatible with Spark, AWS EMR, Hive, and Impala-style endpoints
- Configurable SPARK_HOST (required), SPARK_PORT (default 10000), SPARK_DATABASE, and SPARK_AUTH (NONE, LDAP, KERBEROS, CUSTOM, NOSASL)
- LDAP support via SPARK_USERNAME and secret SPARK_PASSWORD environment variables
- Agent-facing SQL execution and schema exploration without a separate JDBC desktop client
- Version 0.1.2, published on PyPI as spark-sql-mcp-server
- Thrift port defaults to 10000 when SPARK_PORT is unset
- Five SPARK_AUTH modes: NONE, LDAP, KERBEROS, CUSTOM, NOSASL
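The environment variables listed above can be sanity-checked before the server is launched. The following is a minimal sketch, not the server's own code: `load_spark_settings` is a hypothetical helper, and the rule that LDAP needs both SPARK_USERNAME and SPARK_PASSWORD is an assumption drawn from the feature list rather than documented behavior.

```python
import os

# Values taken from the feature list above; the helper itself is hypothetical.
AUTH_MODES = {"NONE", "LDAP", "KERBEROS", "CUSTOM", "NOSASL"}
DEFAULT_PORT = 10000


def load_spark_settings(env=None):
    """Collect SPARK_* settings from a mapping and apply basic checks."""
    env = os.environ if env is None else env

    host = env.get("SPARK_HOST")
    if not host:
        raise ValueError("SPARK_HOST is required")

    # SPARK_AUTH is left as None when unset; the server's real default is not
    # stated in the listing, so none is assumed here.
    auth = env.get("SPARK_AUTH")
    if auth is not None and auth not in AUTH_MODES:
        raise ValueError(f"SPARK_AUTH must be one of {sorted(AUTH_MODES)}")

    # Assumption: LDAP needs both a username and a password.
    if auth == "LDAP":
        missing = [k for k in ("SPARK_USERNAME", "SPARK_PASSWORD") if not env.get(k)]
        if missing:
            raise ValueError("LDAP auth is missing: " + ", ".join(missing))

    return {
        "host": host,
        "port": int(env.get("SPARK_PORT", DEFAULT_PORT)),
        "database": env.get("SPARK_DATABASE"),
        "auth": auth,
        "username": env.get("SPARK_USERNAME"),  # the password is never returned
    }
```

Calling `load_spark_settings({"SPARK_HOST": "spark.example.internal"})` (a placeholder hostname) yields port 10000 and leaves the database and auth mode unset, which mirrors the defaults described in the list.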
What problem does it solve?
Agents cannot safely answer questions about your Spark, EMR, Hive, or Impala data without a governed SQL bridge and cluster credentials.
Who is it for?
Indie builders or tiny data teams who already operate a Thrift SQL endpoint and want agent-assisted querying during feature and pipeline work.
Skip if: you have a greenfield project with no Spark infrastructure, or you need full DDL administration and cluster lifecycle management from MCP alone.
What do I get? / Deliverables
After you configure SPARK_* env vars and add the stdio server to your agent, the assistant can run warehouse SQL and use live results in implementation and debugging.
- Agent-callable Spark SQL query and metadata access over stdio MCP
- Documented cluster connection via SPARK_HOST, port, database, and auth env vars
- Faster warehouse-grounded answers during integration and debugging sessions
Journey fit
Data warehouse access is wired during product build when agents need live warehouse context for features, ETL debugging, and integration work. Integrations is the canonical shelf because the server is a bridge to external Spark SQL infrastructure via stdio MCP, not a standalone analytics app.
How it compares
It is an MCP database integration for Spark SQL, not an in-repo agent skill or a local SQLite file browser.
Common Questions / FAQ
Who is Spark SQL MCP Server for?
It is for solo builders and small teams using Claude Code, Cursor, or similar agents who need read-oriented Spark SQL access through HiveServer2/Thrift.
When should I use Spark SQL MCP Server?
Use it in the build and operate phases when you are integrating features, validating ETL, or debugging queries against an existing Spark, EMR, Hive, or Impala endpoint.
How do I add Spark SQL MCP Server to my agent?
Install the PyPI package spark-sql-mcp-server, set SPARK_HOST and optional SPARK_PORT, SPARK_DATABASE, SPARK_AUTH, and LDAP secrets, then register the stdio MCP entry in your client config.
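As a rough illustration of the registration step, the sketch below generates a stdio entry in the `mcpServers` JSON layout that several MCP clients use. Everything specific here is an assumption: the entry name `spark-sql`, the `uvx spark-sql-mcp-server` launch command, and the placeholder hostname are illustrative only, so check the package's own documentation for the actual command and your client's documentation for where the config file lives.

```python
import json


def build_mcp_entry(host, port=10000, database=None, auth=None):
    """Build a stdio MCP server entry with SPARK_* variables in its env block."""
    env = {"SPARK_HOST": host, "SPARK_PORT": str(port)}
    if database:
        env["SPARK_DATABASE"] = database
    if auth:
        env["SPARK_AUTH"] = auth
    # LDAP secrets (SPARK_USERNAME, SPARK_PASSWORD) are deliberately left out
    # so they are not written to disk; supply them through your own secret store.
    return {
        "mcpServers": {
            "spark-sql": {  # hypothetical entry name
                "command": "uvx",  # assumed launcher, not confirmed by the listing
                "args": ["spark-sql-mcp-server"],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    entry = build_mcp_entry("spark.example.internal", database="default")
    print(json.dumps(entry, indent=2))
```

The printed JSON can be merged into the client's MCP configuration by hand. Keeping credentials out of the generated file is a design choice in this sketch, not a requirement of the server.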