Databricks Workspace

The Databricks Workspace connector enables you to interact with your Databricks workspace programmatically, allowing you to manage clusters, execute queries, and work with your data assets.

Prerequisites

Before setting up the connector, ensure you have:

  • A Databricks workspace with appropriate permissions
  • Access to create Service Principals in your Databricks workspace
  • Administrative privileges to grant permissions to the Service Principal

Step 1: Setting up Authentication

1.1 Create a Service Principal

  1. Navigate to your Databricks workspace: https://[YOUR_INSTANCE_URL].cloud.databricks.com
  2. Go to Settings → Identity and Access → Service Principals
  3. Click Add Service Principal and provide a name (e.g., "Abstra Integration")
  4. After creation, select your newly created Service Principal

Databricks workspace get credentials

1.2 Generate Client Credentials

  1. In the Service Principal details, navigate to the Secrets tab
  2. Click Generate Secret to create your client ID and secret
  3. Important: Save these credentials securely as they won't be shown again

1.3 Configure the Connection in Abstra

  1. Open your Abstra Console and select your project
  2. Navigate to Connectors and select Databricks Workspace
  3. Provide the following credentials:
    • Instance URL: Your Databricks workspace URL (e.g., https://[YOUR_INSTANCE_URL].cloud.databricks.com)
    • Client ID: The client ID from your Service Principal
    • Client Secret: The client secret from your Service Principal

Step 2: Configuring Permissions

Your Service Principal needs appropriate permissions to access Databricks resources. The required permissions depend on your use case. You can grant access at different levels: entire catalog, specific schema, or individual tables. Execute the following SQL queries in your Databricks workspace:

For Data Access (Unity Catalog)

-- Grant catalog access
GRANT USE CATALOG ON CATALOG `<YOUR_CATALOG_NAME>` TO `<SERVICE_PRINCIPAL_APPLICATION_ID>`;

-- Grant schema access
GRANT USE SCHEMA ON SCHEMA `<YOUR_CATALOG_NAME>`.`<YOUR_SCHEMA_NAME>` TO `<SERVICE_PRINCIPAL_APPLICATION_ID>`;

-- Grant table permissions
GRANT SELECT ON TABLE `<YOUR_CATALOG_NAME>`.`<YOUR_SCHEMA_NAME>`.`<YOUR_TABLE_NAME>` TO `<SERVICE_PRINCIPAL_APPLICATION_ID>`;
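If you need to grant access to several tables, templating these statements in a short script keeps the identifiers consistent. A minimal sketch; the application ID, catalog, schema, and table names below are placeholders:

```python
# Generate Unity Catalog grant statements for a Service Principal.
# All identifiers here are placeholders; substitute your own.
def grant_statements(app_id, catalog, schema, tables):
    stmts = [
        f"GRANT USE CATALOG ON CATALOG `{catalog}` TO `{app_id}`;",
        f"GRANT USE SCHEMA ON SCHEMA `{catalog}`.`{schema}` TO `{app_id}`;",
    ]
    for table in tables:
        stmts.append(
            f"GRANT SELECT ON TABLE `{catalog}`.`{schema}`.`{table}` TO `{app_id}`;"
        )
    return stmts

for stmt in grant_statements(
    "00000000-0000-0000-0000-000000000000",  # Service Principal application ID
    "sales_data", "production", ["customers", "orders"],
):
    print(stmt)
```

Run the printed statements in a Databricks SQL editor, or extend the loop with grants beyond SELECT as your use case requires.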

Databricks workspace permissions

For Cluster Management

  • Assign the Service Principal to appropriate workspace groups
  • Grant cluster creation/management permissions through workspace admin settings

Step 3: Using the Connector in your workflow

Abstra offers two approaches to integrate Databricks functionality into your workflows:

info

For a complete list of available actions and their parameters, visit Databricks Workspace Actions.

Option A: AI-Assisted Development

You can use natural language prompts with Abstra's AI to generate Databricks operations. Here are some example prompts:

Query Data:

Select the first 10 rows from table "customers" in catalog "sales_data" and schema "production" using the databricks-workspace connector

Cluster Management:

List all active clusters in my Databricks workspace using the databricks-workspace connector

Job Operations:

Create a new job that runs daily at 9 AM to process data from the "raw_data" table using the databricks-workspace connector

Permission Troubleshooting

If you encounter permission errors, verify that your Service Principal has the necessary grants. Common issues include:

  1. Missing catalog/schema access: Ensure you've granted USE CATALOG and USE SCHEMA permissions
  2. Insufficient table permissions: Grant appropriate permissions (SELECT, INSERT, UPDATE, DELETE) based on your operations
  3. Cluster access: Verify the Service Principal can access or create clusters for job execution

Use the SQL commands in Step 2 to resolve permission issues.

Option B: Manual Python Development

For developers who prefer direct control, you can use Python code to interact with Databricks through the connector API from your Abstra Editor:

Example 1: Query Data

# Execute a SQL query
from abstra.connectors import run_connection_action

connection_name = "databricks-workspace"
action_name = "post_api_2.0_sql_statements"
params = {
    "statement": "SELECT * FROM sales_data.production.customers LIMIT 10",
    "warehouse_id": "your_warehouse_id",  # specify your SQL warehouse ID
    "catalog": "workspace",
    "schema": "information_schema",
    "wait_timeout": "30s",
    "format": "JSON_ARRAY",
    "disposition": "INLINE",
}

result = run_connection_action(connection_name, action_name, params)
print("Query results:")
print(result)
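With disposition INLINE and format JSON_ARRAY, the statement execution response carries column metadata under `manifest` and row values under `result.data_array` (per the Databricks SQL Statement Execution API). A sketch of turning that into a list of dictionaries, using a hard-coded sample payload in place of a live response:

```python
# Sample response in the shape returned by the SQL Statement Execution API
# (disposition=INLINE, format=JSON_ARRAY); the payload here is illustrative.
sample_result = {
    "status": {"state": "SUCCEEDED"},
    "manifest": {"schema": {"columns": [{"name": "id"}, {"name": "name"}]}},
    "result": {"data_array": [["1", "Alice"], ["2", "Bob"]]},
}

def rows_as_dicts(response):
    # Pair each row's values with the column names from the manifest.
    cols = [c["name"] for c in response["manifest"]["schema"]["columns"]]
    return [dict(zip(cols, row)) for row in response["result"]["data_array"]]

print(rows_as_dicts(sample_result))
# [{'id': '1', 'name': 'Alice'}, {'id': '2', 'name': 'Bob'}]
```

Note that JSON_ARRAY returns all values as strings; cast numeric columns yourself if needed.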

Example 2: List Clusters

# Get all clusters
from abstra.connectors import run_connection_action

connection_name = "databricks-workspace"
action_name = "get_api_2.1_clusters_list"
params = {}

clusters = run_connection_action(connection_name, action_name, params)
print("Available clusters:")
print(clusters)
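The clusters list response is a dictionary with a `clusters` array, each entry carrying a `state` field (per the Databricks Clusters API 2.1). A small sketch for filtering to running clusters, with an illustrative sample payload:

```python
# Sample response in the shape returned by GET /api/2.1/clusters/list;
# the cluster entries here are illustrative.
sample_clusters = {
    "clusters": [
        {"cluster_name": "etl", "state": "RUNNING"},
        {"cluster_name": "dev", "state": "TERMINATED"},
    ]
}

def running_clusters(response):
    # Keep only clusters whose state is RUNNING.
    return [
        c["cluster_name"]
        for c in response.get("clusters", [])
        if c.get("state") == "RUNNING"
    ]

print(running_clusters(sample_clusters))  # ['etl']
```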

Troubleshooting

Connection Issues:

  • Verify your instance URL format (should include https://)
  • Check that your Service Principal credentials are correct
  • Ensure your workspace allows API access
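A quick sanity check on the connector fields can catch the first two issues before you save the connection. A minimal sketch; the `.cloud.databricks.com` host suffix check assumes an AWS-hosted workspace, as in the examples above:

```python
# Basic format checks for the connector fields. The host-suffix check
# assumes an AWS workspace URL (".cloud.databricks.com"), matching the
# examples in this guide; adjust it for other clouds.
def validate_connection(instance_url, client_id, client_secret):
    problems = []
    if not instance_url.startswith("https://"):
        problems.append("instance URL must include https://")
    if ".cloud.databricks.com" not in instance_url:
        problems.append("instance URL does not look like an AWS workspace host")
    if not client_id or not client_secret:
        problems.append("client ID and secret are required")
    return problems

print(validate_connection("https://dbc-1234.cloud.databricks.com", "id", "secret"))
# []
```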

Permission Errors:

  • Review the SQL grant statements in Step 2
  • Verify Service Principal has appropriate workspace roles
  • Check Unity Catalog permissions if using Unity Catalog tables

Query Failures:

  • Ensure the specified warehouse/cluster is running
  • Verify table and column names exist
  • Check SQL syntax for Databricks compatibility
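When a statement fails, the response's `status` block usually explains why (a `FAILED` state plus an error message, per the SQL Statement Execution API). A small hypothetical helper for surfacing that in your logs; it is a debugging convenience, not part of the connector:

```python
# Summarize a statement-execution response for debugging.
# The response shape follows the Databricks SQL Statement Execution API;
# the helper itself is a hypothetical convenience.
def diagnose(response):
    status = response.get("status", {})
    state = status.get("state", "UNKNOWN")
    if state == "SUCCEEDED":
        return "ok"
    message = status.get("error", {}).get("message", "no error message")
    return f"{state}: {message}"

print(diagnose({"status": {"state": "FAILED",
                           "error": {"message": "Warehouse is stopped"}}}))
# FAILED: Warehouse is stopped
```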