The Scenario
You are a data engineer. Your team's Databricks storage bill has crept up for three quarters in a row, and the infra manager just asked for a detailed audit of the /mnt/data-exports/ path in DBFS before the next cost review meeting. She wants the file name, size in MB, and last modified date for every file, so the team can identify what is large and what is stale.
You know that /mnt/data-exports/ has been accumulating files for two years. You have no idea how many are in there.
The bad version:
- You open a Databricks notebook
- You write a Python script using dbutils.fs.ls() to list the directory
- You iterate recursively through subdirectories
- You format the output as a DataFrame and display it
- You download the result as a CSV
- You import it into Google Sheets and sort it
- It is now 3 PM and you have not done anything else today.
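For reference, the notebook route described in the steps above might look roughly like the sketch below. It assumes each entry returned by dbutils.fs.ls() exposes path, size, and isDir(), and that modificationTime is available (older runtimes may not provide it, so the sketch falls back to None). The names AuditRow and walk_dbfs are hypothetical, and the lister is passed in as an argument so the function can be exercised outside a notebook.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Optional


class AuditRow(NamedTuple):
    """One file in the audit (hypothetical shape, chosen for illustration)."""
    path: str
    size_mb: float
    modified_ms: Optional[int]


def walk_dbfs(ls: Callable[[str], Iterable], root: str) -> Iterator[AuditRow]:
    """Yield one AuditRow per file under root, descending into subdirectories.

    `ls` is any callable that behaves like dbutils.fs.ls: it takes a path and
    returns entries with .path, .size and .isDir().
    """
    stack = [root]
    while stack:
        current = stack.pop()
        for entry in ls(current):
            if entry.isDir():
                # Directory: queue it so its contents are listed later.
                stack.append(entry.path)
            else:
                yield AuditRow(
                    path=entry.path,
                    size_mb=entry.size / (1024 * 1024),  # bytes to MB
                    # Assumption: modificationTime may be absent on some runtimes.
                    modified_ms=getattr(entry, "modificationTime", None),
                )
```

In a notebook you would call it as sorted(walk_dbfs(dbutils.fs.ls, "/mnt/data-exports/"), key=lambda r: r.size_mb, reverse=True), and you would still have to export and share the result yourself.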
The fast version is one prompt in the sheet.
The Easy Way: One Prompt in SheetXAI
SheetXAI is an AI agent inside your Google Sheet that calls the Databricks DBFS API directly, so you do not have to write a notebook or touch dbutils.
Open the SheetXAI sidebar and type:
List all files in the Databricks DBFS path '/mnt/data-exports/' and write each file's path, size in MB, and last modified timestamp into the DBFSAudit sheet. Sort by size descending so the largest files appear first.
SheetXAI calls the DBFS list API, handles the pagination, and writes the result into the DBFSAudit sheet. The infra manager gets a sortable spreadsheet instead of a notebook link she cannot open.
What You Get
The DBFSAudit sheet with the full directory listing:
- Column A — full file path
- Column B — size in MB
- Column C — last modified timestamp
Sorted by size descending, so the first thing the infra manager sees is the ten largest files. If the top three are year-old export CSVs nobody has opened since Q2 2024, the conversation becomes easy.
What If the Data Is Not Quite Ready
DBFS audits usually surface more questions than just "what is there." SheetXAI handles the follow-on analysis in the same prompt.
When the timestamp format from DBFS is in milliseconds
The DBFS API returns modification time as a Unix timestamp in milliseconds. The infra manager cannot read it.
List all files in the DBFS path '/mnt/data-exports/' and write the file path, size in MB, and last modified date into the DBFSAudit sheet. Convert the modification timestamp from milliseconds to a human-readable date in the format YYYY-MM-DD. Sort by last modified date ascending so the oldest files appear first.
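If you want to sanity-check a converted date by hand, the conversion itself is small. This is a minimal sketch that assumes the timestamp should be read as UTC; the function name ms_to_date is hypothetical.

```python
from datetime import datetime, timezone


def ms_to_date(modified_ms: int) -> str:
    """Convert a Unix timestamp in milliseconds to a YYYY-MM-DD string (UTC)."""
    seconds = modified_ms / 1000  # milliseconds to seconds
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")
```

A file modified late in the evening UTC can land on a different calendar day in your local timezone, so agree on the timezone before comparing dates with the infra manager.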
When the infra manager wants only files not touched in over 90 days
The cost review is focused on stale storage. Recent files are presumed active.
List all files in the DBFS path '/mnt/data-exports/' with their path, size in MB, and last modified timestamp. Filter to files where last modified is more than 90 days ago. Write the filtered results into the DBFSAudit sheet, sorted by size descending. Add a cell at the top showing the total size of stale files in GB.
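The filter that prompt describes can be sketched as follows, which may help when spot-checking the sheet. It assumes each row is a dict with the hypothetical keys path, size_mb, and modified_ms (Unix milliseconds), and that "now" is passed in as a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def stale_files(rows, now, max_age_days=90):
    """Return (stale rows sorted by size descending, total stale size in GB)."""
    # Anything modified before this cutoff counts as stale.
    cutoff_ms = (now - timedelta(days=max_age_days)).timestamp() * 1000
    stale = [r for r in rows if r["modified_ms"] < cutoff_ms]
    stale.sort(key=lambda r: r["size_mb"], reverse=True)
    total_gb = sum(r["size_mb"] for r in stale) / 1024  # MB to GB
    return stale, total_gb
```

Calling stale_files(rows, datetime.now(timezone.utc)) gives the same two outputs the prompt asks for: the filtered list and the single total for the cell at the top.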
When the audit covers multiple DBFS paths
The infra manager also wants /mnt/bronze/ and /mnt/silver/ audited in the same pass.
List all files in DBFS paths '/mnt/data-exports/', '/mnt/bronze/', and '/mnt/silver/'. Write all results into the DBFSAudit sheet with columns: Path, DBFS Root (which of the three paths it came from), Size in MB, and Last Modified. Sort by DBFS Root, then by size descending within each root.
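The combined layout that prompt asks for amounts to tagging each row with its root and sorting on two keys. A rough sketch, assuming rows are dicts with the hypothetical keys path and size_mb, grouped in a dict keyed by root path:

```python
def tag_and_sort(rows_by_root):
    """Merge per-root listings, add a dbfs_root column, sort by root then size descending."""
    combined = [
        {**row, "dbfs_root": root}
        for root, rows in rows_by_root.items()
        for row in rows
    ]
    # Negating the size gives descending order within each root.
    combined.sort(key=lambda r: (r["dbfs_root"], -r["size_mb"]))
    return combined
```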
When you want the directory listing plus a cost estimate in one view
The infra manager wants to know roughly what the large, stale files are costing at your DBFS storage rate of $0.023 per GB per month.
List all files in the DBFS path '/mnt/data-exports/' with path, size in MB, and last modified timestamp. Filter to files where last modified is more than 90 days ago. For each file, calculate an estimated monthly cost as (size in MB / 1024) × 0.023 and write it into a column called "Est. Monthly Cost." Add a total estimated monthly cost for all stale files in a summary row at the bottom.
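The arithmetic in that prompt is simple enough to check by hand. The sketch below uses the $0.023 per GB per month rate from the scenario; substitute your own rate, and treat the function names as hypothetical.

```python
RATE_PER_GB_MONTH = 0.023  # rate from the scenario above; replace with your own


def monthly_cost(size_mb, rate=RATE_PER_GB_MONTH):
    """Estimated monthly storage cost in dollars: (size in MB / 1024) x rate."""
    return size_mb / 1024 * rate


def total_monthly_cost(sizes_mb, rate=RATE_PER_GB_MONTH):
    """Sum of the per-file estimates, as in the summary row."""
    return sum(monthly_cost(size, rate) for size in sizes_mb)
```

For example, a 50 GB export (51,200 MB) works out to about $1.15 per month at that rate, so the total for the whole stale set will usually matter more than any single file.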
The pattern: instead of an afternoon in a Databricks notebook, the DBFS audit becomes a five-minute prompt before the cost review meeting.
Try It
Get the 7-day free trial of SheetXAI and open a blank Google Sheet, then ask it to list the files in your Databricks DBFS path and write the results into the sheet. The Databricks integration is included in every plan. For related workflows, see how to pull a cluster and job inventory for cost review or the Databricks in Google Sheets overview.
