Pull a Databricks Cluster and Job Inventory Into a Google Sheet for Cost Review

The Scenario

You are a platform engineer. Your company's cloud bill just came in 18% over forecast, and your VP of Infrastructure wants a cost review meeting on Friday. She wants every Databricks cluster in the workspace listed with its node type, autoscale range, and creator, plus the job definitions, so the team can identify over-provisioned instances.

You manage forty clusters. They are not documented anywhere outside Databricks itself.

The bad version of Thursday:

You open the Databricks UI, navigate to the Compute tab
You click into each cluster to see the node type and autoscale settings
You copy the details into a Google Sheet by hand, one row per cluster
You make a mistake on cluster 23 and have to go back
You finish two hours later, exhausted, with a sheet the VP will tear apart because the formatting is inconsistent
You never get to the job definitions she also asked for

The fast version is two prompts.

The Easy Way: One Prompt in SheetXAI

SheetXAI is an AI agent inside your Google Sheet that calls the Databricks Clusters API directly, so you do not have to click through forty UI screens.

Open the SheetXAI sidebar and type:

List all Databricks clusters in my workspace. Write cluster name, cluster ID, state, node type, min workers, max workers, and creator username into this sheet with headers in row 1. Sort by node type so the largest instances group together.

SheetXAI calls the clusters API, pages through all forty clusters, and populates the sheet. Then you ask the follow-up:

Now list all Databricks job definitions and write job name, job ID, creator, schedule, and last run status into a new tab called Jobs, with headers in row 1.

Two prompts. Two tabs. The cost review meeting has what it needs.
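
If you are curious what the first prompt amounts to, here is a minimal Python sketch of the flattening step. It is an illustration only, not SheetXAI's implementation. The field names (cluster_name, cluster_id, state, node_type_id, autoscale, num_workers, creator_user_name) follow the shape of the Databricks Clusters API list response; the sample records and the cluster_rows helper are hypothetical, and the fallback to num_workers for fixed-size clusters is an assumption.

```python
from typing import Any

HEADERS = ["cluster name", "cluster ID", "state", "node type",
           "min workers", "max workers", "creator username"]


def cluster_rows(clusters: list[dict[str, Any]]) -> list[list[Any]]:
    """Flatten cluster records into sheet rows: header first, then sorted by node type."""
    rows = []
    for c in clusters:
        autoscale = c.get("autoscale") or {}
        # Assumption: a fixed-size cluster has no autoscale block, so its
        # num_workers value is used for both min and max.
        fixed = c.get("num_workers")
        rows.append([
            c.get("cluster_name", ""),
            c.get("cluster_id", ""),
            c.get("state", ""),
            c.get("node_type_id", ""),
            autoscale.get("min_workers", fixed),
            autoscale.get("max_workers", fixed),
            c.get("creator_user_name", ""),
        ])
    # Alphabetical sort on node type keeps identical instance types together.
    rows.sort(key=lambda r: r[3])
    return [HEADERS] + rows


# Hypothetical sample records, shaped like entries in the API's "clusters" list.
sample = [
    {"cluster_name": "prod-etl", "cluster_id": "c-1", "state": "RUNNING",
     "node_type_id": "Standard_DS5_v2",
     "autoscale": {"min_workers": 2, "max_workers": 8},
     "creator_user_name": "ana@example.com"},
    {"cluster_name": "david-temp", "cluster_id": "c-2", "state": "TERMINATED",
     "node_type_id": "Standard_DS3_v2", "num_workers": 1,
     "creator_user_name": "david@example.com"},
]

for row in cluster_rows(sample):
    print(row)
```

The sort is alphabetical, so it groups identical node types together rather than ranking them by size.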

What You Get

A Google Sheet with two populated tabs:

Clusters tab — cluster name, cluster ID, state, node type, min workers, max workers, creator username
Jobs tab — job name, job ID, creator, schedule, last run status

The Clusters tab is sorted by node type, so the VP can immediately see which teams are running the biggest instances and whether the autoscale ranges are reasonable.

The creator username tells you who to talk to. The node type tells you what it costs.

What If the Data Is Not Quite Ready

Infrastructure audits always surface messier questions. SheetXAI handles each of them with a single prompt.

When cluster names do not follow the naming convention

Half the clusters have names like "my-test-cluster" or "david-temp" — no team, no environment, no indication of purpose.

List all Databricks clusters with cluster name, node type, min workers, max workers, and creator username. Add a column F called "Name Issue" — write "YES" if the cluster name does not start with one of these prefixes: prod-, staging-, data-, analytics-. Otherwise leave it blank. Sort by "Name Issue" descending so the flagged ones appear first.
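
The rule in that prompt is a prefix check followed by a sort. A rough Python equivalent, assuming each row is a list whose first element is the cluster name (the name_issue and flag_and_sort helpers are hypothetical names for illustration):

```python
ALLOWED_PREFIXES = ("prod-", "staging-", "data-", "analytics-")


def name_issue(cluster_name: str) -> str:
    """Return "YES" when the name lacks an approved prefix, otherwise blank."""
    # str.startswith accepts a tuple and matches if any prefix fits.
    return "" if cluster_name.startswith(ALLOWED_PREFIXES) else "YES"


def flag_and_sort(rows: list[list]) -> list[list]:
    """Append the "Name Issue" value to each row and put flagged rows first."""
    flagged = [row + [name_issue(row[0])] for row in rows]
    # "YES" sorts after "" in ascending order, so a descending sort lifts
    # flagged rows to the top; rows with equal flags keep their input order.
    return sorted(flagged, key=lambda r: r[-1], reverse=True)


rows = [
    ["prod-etl", "Standard_DS5_v2", 2, 8, "ana@example.com"],
    ["david-temp", "Standard_DS3_v2", 1, 1, "david@example.com"],
    ["my-test-cluster", "Standard_DS3_v2", 1, 2, "mei@example.com"],
]
for row in flag_and_sort(rows):
    print(row)
```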

When the VP wants to see estimated cost impact per cluster

The team uses a standard billing rate of $0.40 per DBU. She wants a rough hourly cost estimate per cluster based on max workers and node type.

List all clusters in my Databricks workspace with cluster name, node type, and max workers. In a new column called "Est. DBU/hr," apply the following mapping: Standard_DS3_v2 = 2 DBUs per worker, Standard_DS4_v2 = 4 DBUs per worker, Standard_DS5_v2 = 8 DBUs per worker. Calculate max DBU/hr as max workers × Est. DBU/hr and write it into a column called "Max DBU/hr." Multiply Max DBU/hr by 0.40 and write the result into a column called "Est. Max Cost/hr." Write "UNKNOWN NODE TYPE" if the node type is not in the mapping.
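
The arithmetic behind that estimate is small enough to check by hand. The sketch below uses the mapping from the prompt and the $0.40 rate from the scenario; both are this article's example figures, not official Databricks pricing. Like the prompt, it counts workers only and ignores the driver node. The function names are hypothetical.

```python
# Example figures from this article, not official Databricks pricing.
DBU_PER_WORKER = {
    "Standard_DS3_v2": 2,
    "Standard_DS4_v2": 4,
    "Standard_DS5_v2": 8,
}
RATE_USD_PER_DBU = 0.40
UNKNOWN = "UNKNOWN NODE TYPE"


def max_dbu_per_hour(node_type: str, max_workers: int):
    """Max DBU/hr = max workers x DBUs per worker (workers only, no driver)."""
    per_worker = DBU_PER_WORKER.get(node_type)
    if per_worker is None:
        return UNKNOWN
    return max_workers * per_worker


def max_cost_per_hour(node_type: str, max_workers: int):
    """Convert Max DBU/hr into a rough dollar figure at the example rate."""
    dbu = max_dbu_per_hour(node_type, max_workers)
    if dbu == UNKNOWN:
        return UNKNOWN
    return round(dbu * RATE_USD_PER_DBU, 2)


for node_type, workers in [("Standard_DS5_v2", 8), ("Standard_DS3_v2", 4), ("m5.xlarge", 4)]:
    print(node_type, max_dbu_per_hour(node_type, workers), max_cost_per_hour(node_type, workers))
```

A Standard_DS5_v2 cluster that can scale to 8 workers comes out at 8 × 8 = 64 DBU/hr, or about $25.60 per hour at the example rate.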

When you only want clusters that are currently running

The cost review should focus on active spend, not clusters that are terminated.

List all Databricks clusters where state is RUNNING. Write cluster name, node type, min workers, max workers, creator username, and cluster ID into this sheet. Sort by max workers descending so the largest running clusters appear first.
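
In code terms, this request is a filter plus a descending sort. A minimal sketch, assuming cluster records shaped like the Databricks Clusters API list response (state, autoscale.max_workers, num_workers); the helper names are hypothetical, and treating num_workers as the maximum for fixed-size clusters is an assumption:

```python
def max_workers(cluster: dict) -> int:
    """Upper worker bound: autoscale max if present, else the fixed worker count."""
    autoscale = cluster.get("autoscale") or {}
    return autoscale.get("max_workers", cluster.get("num_workers", 0))


def running_by_size(clusters: list[dict]) -> list[dict]:
    """Keep only RUNNING clusters, largest maximum worker count first."""
    running = [c for c in clusters if c.get("state") == "RUNNING"]
    return sorted(running, key=max_workers, reverse=True)


# Hypothetical sample records for illustration.
clusters = [
    {"cluster_name": "staging-ml", "state": "RUNNING", "num_workers": 4},
    {"cluster_name": "david-temp", "state": "TERMINATED", "num_workers": 16},
    {"cluster_name": "prod-etl", "state": "RUNNING",
     "autoscale": {"min_workers": 2, "max_workers": 8}},
]
for c in running_by_size(clusters):
    print(c["cluster_name"], max_workers(c))
```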

When you want the cluster audit plus recent job failures in one view

The VP also wants to know which jobs failed in the last 7 days — not just which ones exist, but which ones are causing re-runs that drive up compute cost.

List all Databricks clusters with cluster name, node type, and creator. Then list all job runs from the last 7 days where the result state is FAILED. Write the job runs into a second tab called Failed Runs with columns: job name, job ID, run ID, start time, and error message. Add a note in cell A1 of the Clusters tab with the total count of failed runs from the last 7 days.
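
The failed-runs half of that prompt reduces to a time-window filter on run records. The sketch below assumes records shaped like the Databricks Jobs API runs list response, where start_time is epoch milliseconds and the outcome sits under state.result_state with a message in state.state_message. Using run_name as a stand-in for the job name is an assumption, and failed_run_rows is a hypothetical helper, not part of any library.

```python
from datetime import datetime, timedelta, timezone


def failed_run_rows(runs: list[dict], now: datetime, days: int = 7) -> list[list]:
    """Return [job name, job ID, run ID, start time, error message] for failed runs."""
    cutoff_ms = int((now - timedelta(days=days)).timestamp() * 1000)
    rows = []
    for run in runs:
        state = run.get("state") or {}
        start_ms = run.get("start_time", 0)
        if state.get("result_state") != "FAILED" or start_ms < cutoff_ms:
            continue
        start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
        rows.append([
            run.get("run_name", ""),  # assumption: run_name mirrors the job name
            run.get("job_id"),
            run.get("run_id"),
            start.isoformat(),
            state.get("state_message", ""),
        ])
    return rows


def to_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


# Hypothetical sample data with a fixed "now" so the result is reproducible.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
runs = [
    {"run_name": "nightly-load", "job_id": 11, "run_id": 101,
     "start_time": to_ms(now - timedelta(days=2)),
     "state": {"result_state": "FAILED", "state_message": "Task failed"}},
    {"run_name": "nightly-load", "job_id": 11, "run_id": 90,
     "start_time": to_ms(now - timedelta(days=10)),
     "state": {"result_state": "FAILED", "state_message": "Task failed"}},
    {"run_name": "hourly-sync", "job_id": 12, "run_id": 102,
     "start_time": to_ms(now - timedelta(days=1)),
     "state": {"result_state": "SUCCESS"}},
]
failed = failed_run_rows(runs, now)
print(len(failed), failed)
```

The count for the note in cell A1 is simply the number of rows returned. In real use you would pass datetime.now(timezone.utc) as now.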

The pattern: what the VP wanted as a manual audit becomes a two-tab sheet in the time it takes to write two prompts.

Try It

Get the 7-day free trial of SheetXAI and open any Google Sheet, then ask it to pull your Databricks cluster and job inventory for a cost review. The Databricks integration is included in every plan. For related workflows, see how to run a SQL query and land results in a sheet or the Databricks in Google Sheets overview.