The Scenario
You are a data engineer. Your team's Databricks storage bill has been climbing for three quarters in a row, and before the next cost review meeting the infra manager wants a detailed audit of the /mnt/data-exports/ path in DBFS: file name, size in MB, and last modified date for every file, so the team can see what is large and what is stale.
You know /mnt/data-exports/ has been accumulating files for two years. You have no idea how many are in there.
The bad version:
- You open a Databricks notebook
- You write a Python script using dbutils.fs.ls() to list the directory recursively
- You format the output as a DataFrame and display it
- You download the result as a CSV
- You import it into Excel, fix the timestamp format, sort the columns
- It is 3 PM and you have not done anything else today.
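For a sense of what the notebook route involves, here is roughly the script from the second step, as a minimal sketch. It assumes a Databricks notebook where `dbutils.fs.ls()` returns `FileInfo` entries with `path`, `size` (bytes), `modificationTime` (milliseconds, only on newer runtimes) and an `isDir()` method. `dbutils.fs.ls()` lists one level at a time, so the recursion is yours to write. `walk_dbfs` is a hypothetical helper that takes the listing function as a parameter so it can be tried outside Databricks.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple

def walk_dbfs(ls: Callable[[str], Iterable[Any]], root: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (path, size_bytes, modification_ms) for every file under `root`.

    `ls` is assumed to behave like dbutils.fs.ls: one directory level per call,
    returning entries with .path, .size, .modificationTime and an isDir() method.
    """
    pending = [root]  # directories still to be listed
    while pending:
        for info in ls(pending.pop()):
            if info.isDir():
                pending.append(info.path)  # list this subdirectory on a later pass
            else:
                yield info.path, info.size, info.modificationTime
```

In a notebook you would call `list(walk_dbfs(dbutils.fs.ls, "/mnt/data-exports/"))`, and that only gets you raw bytes and milliseconds. The formatting, download, and Excel cleanup in the remaining steps still follow.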
The fast version is one prompt in the workbook.
The Easy Way: One Prompt in SheetXAI
SheetXAI is an AI agent inside your Excel workbook that calls the Databricks DBFS API directly, so you do not have to write a notebook or touch dbutils.
Open the SheetXAI sidebar and type:
List all files in the Databricks DBFS path '/mnt/data-exports/' and write each file's path, size in MB, and last modified timestamp into the DBFSAudit tab. Sort by size descending so the largest files appear first.
SheetXAI calls the DBFS list API, handles the pagination, and writes the result into the DBFSAudit tab. The infra manager gets a sortable workbook instead of a notebook link she cannot open.
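The work behind that prompt is ordinary API plumbing. As a rough sketch, assuming the DBFS REST endpoint `GET /api/2.0/dbfs/list` and a response whose `files` array carries `path`, `is_dir`, `file_size` (bytes), and `modification_time` (milliseconds), one response maps to DBFSAudit rows like this. `to_audit_rows` is a hypothetical helper, not SheetXAI's implementation, and it handles a single directory level; each subdirectory would need its own list call.

```python
import json
from typing import List, Tuple

BYTES_PER_MB = 1024 * 1024  # binary megabytes, matching the MB / 1024 = GB convention used later

def to_audit_rows(response_text: str) -> List[Tuple[str, float, int]]:
    """Parse one DBFS list response into (path, size_mb, modification_ms) rows."""
    files = json.loads(response_text).get("files", [])  # tolerate a response with no "files" key
    rows = [
        (f["path"], round(f["file_size"] / BYTES_PER_MB, 2), f["modification_time"])
        for f in files
        if not f.get("is_dir", False)  # directories are skipped here, not descended into
    ]
    # Column B descending, so the largest files land at the top of the tab.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```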
What You Get
The DBFSAudit tab with the full directory listing:
- Column A: full file path
- Column B: size in MB
- Column C: last modified timestamp
Sorted by size descending. If the top ten rows are year-old export CSVs nobody has touched since 2024, the conversation about deleting them becomes easy.
What If the Data Is Not Quite Ready
A DBFS audit usually raises more questions than just "what is there." SheetXAI handles the follow-on analysis in the same prompt.
When the timestamp from DBFS is in milliseconds
The DBFS API returns modification time as a Unix timestamp in milliseconds, so a value like 1700000000000 (14 November 2023, UTC) lands in the sheet as a 13-digit number the infra manager cannot read.
List all files in the DBFS path '/mnt/data-exports/' and write file path, size in MB, and last modified date into the DBFSAudit tab. Convert the modification timestamp from milliseconds to YYYY-MM-DD format. Sort by last modified date ascending so the oldest files appear first.
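The conversion is one step of arithmetic: divide by 1,000 to get seconds, then format the date. A minimal sketch, assuming dates are rendered in UTC (use the team's time zone if a file modified near midnight matters); `ms_to_date` and the two sample rows are hypothetical.

```python
from datetime import datetime, timezone

def ms_to_date(modification_ms: int) -> str:
    """Convert a Unix timestamp in milliseconds to a YYYY-MM-DD string in UTC."""
    seconds = modification_ms / 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")

# Hypothetical sample rows: (path, size in MB, modification time in ms).
rows = [
    ("/mnt/data-exports/recent.csv", 12.5, 1700000000000),
    ("/mnt/data-exports/old.csv", 800.0, 1600000000000),
]

# Oldest first: sort on the raw milliseconds, then format for the DBFSAudit tab.
audit = [(path, size_mb, ms_to_date(ms)) for path, size_mb, ms in sorted(rows, key=lambda r: r[2])]
```

Because YYYY-MM-DD strings sort in the same order as the dates they represent, re-sorting column C in Excel keeps the oldest-first order after the conversion.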
When the infra manager wants only files not touched in over 90 days
The cost review is focused on stale storage. Recent files are presumed active.
List all files in the DBFS path '/mnt/data-exports/' with path, size in MB, and last modified timestamp. Filter to files where last modified is more than 90 days ago. Write the filtered results into the DBFSAudit tab, sorted by size descending. Add a summary row at the top of the tab showing the total size of stale files in GB.
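The 90-day filter is a comparison against a single cutoff timestamp. A minimal sketch, assuming rows of (path, size in MB, modification time in milliseconds) as the DBFS API reports it, and 1,024 MB per GB; `stale_files` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

MB_PER_GB = 1024  # binary convention: size in MB / 1024 = size in GB

def stale_files(
    rows: List[Tuple[str, float, int]],
    now: Optional[datetime] = None,
    days: int = 90,
) -> Tuple[List[Tuple[str, float, int]], float]:
    """Return rows last modified more than `days` ago (largest first) and their total size in GB.

    Each row is (path, size_mb, modification_ms), with modification time in Unix milliseconds.
    """
    now = now or datetime.now(timezone.utc)
    cutoff_ms = (now - timedelta(days=days)).timestamp() * 1000
    stale = [row for row in rows if row[2] < cutoff_ms]
    stale.sort(key=lambda row: row[1], reverse=True)  # largest stale files first
    total_gb = sum(size_mb for _, size_mb, _ in stale) / MB_PER_GB  # value for the summary row
    return stale, total_gb
```

Passing `now` explicitly makes the cutoff reproducible, which helps when the same numbers need to match a report prepared a day earlier.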
When the audit covers multiple DBFS paths
The infra manager also wants /mnt/bronze/ and /mnt/silver/ audited in the same pass.
List all files in DBFS paths '/mnt/data-exports/', '/mnt/bronze/', and '/mnt/silver/'. Write all results into the DBFSAudit tab with columns: Path, DBFS Root, Size in MB, and Last Modified. Sort by DBFS Root, then by size descending within each root.
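The extra DBFS Root column is what makes the two-level sort possible: group on the root, then order by size within it. A minimal sketch, assuming each root has already been listed into (path, size in MB, last modified) rows; `combine_roots` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

def combine_roots(
    listings: Dict[str, List[Tuple[str, float, str]]],
) -> List[Tuple[str, str, float, str]]:
    """Flatten {root: [(path, size_mb, last_modified), ...]} into
    (Path, DBFS Root, Size in MB, Last Modified) rows."""
    rows = [
        (path, root, size_mb, modified)
        for root, files in listings.items()
        for path, size_mb, modified in files
    ]
    # Root ascending, then negated size so larger files come first within each root.
    rows.sort(key=lambda row: (row[1], -row[2]))
    return rows
```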
When you want the directory listing plus a cost estimate per file
The infra manager wants to know roughly what the large, stale files are costing at your DBFS storage rate of $0.023 per GB per month.
List all files in the DBFS path '/mnt/data-exports/' with path, size in MB, and last modified timestamp. Filter to files where last modified is more than 90 days ago. For each file, calculate estimated monthly cost as (size in MB / 1024) × 0.023 and write it into a column called "Est. Monthly Cost." Add a total estimated monthly cost for all stale files in a summary row at the bottom.
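Worked once by hand: a 2,048 MB file is 2,048 / 1024 = 2 GB, and 2 × $0.023 = $0.046 per month. The sketch below applies the same formula to every row and sums the total from the unrounded sizes, so per-row rounding does not drift the summary. `add_cost_column` is a hypothetical helper, and $0.023 is the example rate from this scenario, not a quoted Databricks price.

```python
from typing import List, Tuple

RATE_PER_GB_MONTH = 0.023  # example storage rate from the scenario; substitute your own

def add_cost_column(
    rows: List[Tuple[str, float, str]],
) -> Tuple[List[Tuple[str, float, str, float]], float]:
    """Append Est. Monthly Cost to (path, size_mb, last_modified) rows and return the total."""
    with_cost = [
        # Four decimals, since small files cost fractions of a cent per month.
        (path, size_mb, modified, round(size_mb / 1024 * RATE_PER_GB_MONTH, 4))
        for path, size_mb, modified in rows
    ]
    # Total from the raw sizes, rounded to cents for the summary row.
    total = round(sum(size_mb for _, size_mb, _ in rows) / 1024 * RATE_PER_GB_MONTH, 2)
    return with_cost, total
```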
The pattern: instead of an afternoon in a Databricks notebook, the DBFS audit becomes a five-minute prompt before the cost review meeting.
Try It
Get the 7-day free trial of SheetXAI, open a blank Excel workbook, and ask SheetXAI to list the files in your Databricks DBFS path and write the results into the workbook. The Databricks integration is included in every plan. For related workflows, see how to pull a cluster and job inventory for cost review or the Databricks in Excel overview.
