Databricks Workspace

The Databricks Workspace connector enables you to interact with your Databricks workspace programmatically, allowing you to manage clusters, execute queries, and work with your data assets.

Prerequisites

Before setting up the connector, ensure you have:

  • A Databricks workspace with appropriate permissions
  • Access to create Service Principals in your Databricks workspace
  • Administrative privileges to grant permissions to the Service Principal

Step 1: Setting up Authentication

1.1 Create a Service Principal

  1. Navigate to your Databricks workspace: https://[YOUR_INSTANCE_URL].cloud.databricks.com
  2. Go to Settings → Identity and Access → Service Principals
  3. Click Add Service Principal and provide a name (e.g., "Abstra Integration")
  4. After creation, select your newly created Service Principal

Databricks workspace get credentials

1.2 Generate Client Credentials

  1. In the Service Principal details, navigate to the Secrets tab
  2. Click Generate Secret to create your client ID and secret
  3. Important: Save these credentials securely as they won't be shown again

1.3 Configure the Connection in Abstra

  1. Open your Abstra Console and select your project
  2. Navigate to Connectors and select Databricks Workspace
  3. Provide the following credentials:
    • Instance URL: Your Databricks workspace URL (e.g., https://[YOUR_INSTANCE_URL].cloud.databricks.com)
    • Client ID: The client ID from your Service Principal
    • Client Secret: The client secret from your Service Principal

Step 2: Configuring Permissions

Your Service Principal needs appropriate permissions to access Databricks resources. The required permissions depend on your use case. You can grant access at different levels: entire catalog, specific schema, or individual tables. Execute the following SQL queries in your Databricks workspace:

For Data Access (Unity Catalog)

-- Grant catalog access
GRANT USE CATALOG ON CATALOG `<YOUR_CATALOG_NAME>` TO `<SERVICE_PRINCIPAL_APPLICATION_ID>`;

-- Grant schema access
GRANT USE SCHEMA ON SCHEMA `<YOUR_CATALOG_NAME>`.`<YOUR_SCHEMA_NAME>` TO `<SERVICE_PRINCIPAL_APPLICATION_ID>`;

-- Grant table permissions
GRANT SELECT ON TABLE `<YOUR_CATALOG_NAME>`.`<YOUR_SCHEMA_NAME>`.`<YOUR_TABLE_NAME>` TO `<SERVICE_PRINCIPAL_APPLICATION_ID>`;
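If you need to grant access to several tables, templating these statements in a short script keeps the identifiers consistent. A minimal sketch; the application ID, catalog, schema, and table names below are placeholders:

```python
# Generate Unity Catalog grant statements for a Service Principal.
# All identifiers here are placeholders; substitute your own.
def grant_statements(app_id, catalog, schema, tables):
    stmts = [
        f"GRANT USE CATALOG ON CATALOG `{catalog}` TO `{app_id}`;",
        f"GRANT USE SCHEMA ON SCHEMA `{catalog}`.`{schema}` TO `{app_id}`;",
    ]
    for table in tables:
        stmts.append(
            f"GRANT SELECT ON TABLE `{catalog}`.`{schema}`.`{table}` TO `{app_id}`;"
        )
    return stmts

for stmt in grant_statements(
    "00000000-0000-0000-0000-000000000000",  # Service Principal application ID
    "sales_data", "production", ["customers", "orders"],
):
    print(stmt)
```

Run the printed statements in a Databricks SQL editor, or extend the loop with grants beyond SELECT as your use case requires.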

Databricks workspace permissions

For Cluster Management

  • Assign the Service Principal to appropriate workspace groups
  • Grant cluster creation/management permissions through workspace admin settings

Step 3: Using the Connector in your workflow

Abstra offers two approaches to integrate Databricks functionality into your workflows:

info

For a complete list of available actions and their parameters, visit Databricks Workspace Actions.

Option A: AI-Assisted Development

You can use natural language prompts with Abstra's AI to generate Databricks operations. Here are some example prompts:

Query Data:

Select the first 10 rows from table "customers" in catalog "sales_data" and schema "production" using the databricks-workspace connector

Cluster Management:

List all active clusters in my Databricks workspace using the databricks-workspace connector

Job Operations:

Create a new job that runs daily at 9 AM to process data from the "raw_data" table using the databricks-workspace connector

Permission Troubleshooting

If you encounter permission errors, verify that your Service Principal has the necessary grants. Common issues include:

  1. Missing catalog/schema access: Ensure you've granted USE CATALOG and USE SCHEMA permissions
  2. Insufficient table permissions: Grant appropriate permissions (SELECT, INSERT, UPDATE, DELETE) based on your operations
  3. Cluster access: Verify the Service Principal can access or create clusters for job execution

Use the SQL commands in Step 2 to resolve permission issues.

Option B: Manual Python Development

For developers who prefer direct control, you can use Python code to interact with Databricks through the connector API from your Abstra Editor:

Example 1: Query Data

# Execute a SQL query
from abstra.connectors import run_connection_action

connection_name = "databricks-workspace"
action_name = "post_api_2.0_sql_statements"
params = {
    "statement": "SELECT * FROM sales_data.production.customers LIMIT 10",
    "warehouse_id": "your_warehouse_id",  # specify your SQL warehouse ID
    "catalog": "workspace",
    "schema": "information_schema",
    "wait_timeout": "30s",
    "format": "JSON_ARRAY",
    "disposition": "INLINE",
}

result = run_connection_action(connection_name, action_name, params)
print("Query results:")
print(result)
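With disposition INLINE and format JSON_ARRAY, the statement execution response carries column metadata under `manifest` and row values under `result.data_array` (per the Databricks SQL Statement Execution API). A sketch of turning that into a list of dictionaries, using a hard-coded sample payload in place of a live response:

```python
# Sample response in the shape returned by the SQL Statement Execution API
# (disposition=INLINE, format=JSON_ARRAY); the payload here is illustrative.
sample_result = {
    "status": {"state": "SUCCEEDED"},
    "manifest": {"schema": {"columns": [{"name": "id"}, {"name": "name"}]}},
    "result": {"data_array": [["1", "Alice"], ["2", "Bob"]]},
}

def rows_as_dicts(response):
    # Pair each row's values with the column names from the manifest.
    cols = [c["name"] for c in response["manifest"]["schema"]["columns"]]
    return [dict(zip(cols, row)) for row in response["result"]["data_array"]]

print(rows_as_dicts(sample_result))
# [{'id': '1', 'name': 'Alice'}, {'id': '2', 'name': 'Bob'}]
```

Note that JSON_ARRAY returns all values as strings; cast numeric columns yourself if needed.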

Example 2: List Clusters

# Get all clusters
from abstra.connectors import run_connection_action

connection_name = "databricks-workspace"
action_name = "get_api_2.1_clusters_list"
params = {}

clusters = run_connection_action(connection_name, action_name, params)
print("Available clusters:")
print(clusters)
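The clusters list response is a dictionary with a `clusters` array, each entry carrying a `state` field (per the Databricks Clusters API 2.1). A small sketch for filtering to running clusters, with an illustrative sample payload:

```python
# Sample response in the shape returned by GET /api/2.1/clusters/list;
# the cluster entries here are illustrative.
sample_clusters = {
    "clusters": [
        {"cluster_name": "etl", "state": "RUNNING"},
        {"cluster_name": "dev", "state": "TERMINATED"},
    ]
}

def running_clusters(response):
    # Keep only clusters whose state is RUNNING.
    return [
        c["cluster_name"]
        for c in response.get("clusters", [])
        if c.get("state") == "RUNNING"
    ]

print(running_clusters(sample_clusters))  # ['etl']
```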

Troubleshooting

Connection Issues:

  • Verify your instance URL format (should include https://)
  • Check that your Service Principal credentials are correct
  • Ensure your workspace allows API access
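A quick sanity check on the connector fields can catch the first two issues before you save the connection. A minimal sketch; the `.cloud.databricks.com` host suffix check assumes an AWS-hosted workspace, as in the examples above:

```python
# Basic format checks for the connector fields. The host-suffix check
# assumes an AWS workspace URL (".cloud.databricks.com"), matching the
# examples in this guide; adjust it for other clouds.
def validate_connection(instance_url, client_id, client_secret):
    problems = []
    if not instance_url.startswith("https://"):
        problems.append("instance URL must include https://")
    if ".cloud.databricks.com" not in instance_url:
        problems.append("instance URL does not look like an AWS workspace host")
    if not client_id or not client_secret:
        problems.append("client ID and secret are required")
    return problems

print(validate_connection("https://dbc-1234.cloud.databricks.com", "id", "secret"))
# []
```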

Permission Errors:

  • Review the SQL grant statements in Step 2
  • Verify Service Principal has appropriate workspace roles
  • Check Unity Catalog permissions if using Unity Catalog tables

Query Failures:

  • Ensure the specified warehouse/cluster is running
  • Verify table and column names exist
  • Check SQL syntax for Databricks compatibility
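When a statement fails, the response's `status` block usually explains why (a `FAILED` state plus an error message, per the SQL Statement Execution API). A small hypothetical helper for surfacing that in your logs; it is a debugging convenience, not part of the connector:

```python
# Summarize a statement-execution response for debugging.
# The response shape follows the Databricks SQL Statement Execution API;
# the helper itself is a hypothetical convenience.
def diagnose(response):
    status = response.get("status", {})
    state = status.get("state", "UNKNOWN")
    if state == "SUCCEEDED":
        return "ok"
    message = status.get("error", {}).get("message", "no error message")
    return f"{state}: {message}"

print(diagnose({"status": {"state": "FAILED",
                           "error": {"message": "Warehouse is stopped"}}}))
# FAILED: Warehouse is stopped
```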