The Scenario
You joined the data engineering team four weeks ago. Your first real deliverable before the architecture review on Friday: a complete inventory of the production ClickHouse database — every table, its storage engine, row count, compressed size, and a full column manifest showing column name, data type, and whether it's part of the primary key.
The person who built most of this database left the company two months ago. There is no data dictionary. There is a Confluence page that was last updated in 2023 and lists 12 tables. The actual database has 40.
The bad version:
- Query system.tables to list table names and engines. Copy the output into a sheet manually.
- For each of the 40 tables, run DESCRIBE TABLE separately, copy the output, paste it into the right rows in the sheet.
- Realize midway through that the Confluence page used different table names than the actual schema, and spend 45 minutes reconciling them.
That's not a task — it's an afternoon. And the architecture review is in two days.
The Easy Way: One Prompt in SheetXAI
SheetXAI is an AI agent that lives inside your Google Sheet. Connect it to ClickHouse and it can interrogate the system tables, iterate across every table in the database, and write the full schema inventory into your sheet without you touching a SQL client.
List every table in my ClickHouse 'analytics' database with its engine type, total rows, and compressed size — write table name, engine, row count, and size into this sheet.
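If you want a feel for what a request like this corresponds to on the ClickHouse side, here is a minimal sketch. It assumes the inventory is read from system.tables (columns name, engine, total_rows, total_bytes) and that total_bytes reflects on-disk, compressed size for disk-backed engines. The helper names are hypothetical, and this is not SheetXAI's actual implementation.

```python
def inventory_query(database: str) -> str:
    """Build a query listing every table in one database with engine, rows, and bytes."""
    # Escape backslashes and single quotes so the name is safe inside a SQL string literal.
    safe = database.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT name, engine, total_rows, total_bytes "
        "FROM system.tables "
        f"WHERE database = '{safe}' "
        "ORDER BY name"
    )


def human_size(num_bytes: int) -> str:
    """Render a byte count as a short human-readable string (binary units)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that fits, or at TiB as the largest unit offered.
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

The query string can be passed to whichever ClickHouse client you already use; human_size only formats the size column for the sheet.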
Then for the column detail pass:
For each table listed in column A of this sheet, fetch its column definitions from ClickHouse and write column name, data type, and whether it is part of the primary key into columns B, C, and D.
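Under the hood, a column manifest like this can come from a single read of system.columns rather than 40 separate DESCRIBE TABLE calls. The sketch below assumes that table exposes table, name, type, is_in_primary_key, and position, and it uses made-up sample rows in place of a live query result. The function name and the sample data are illustrative only.

```python
from collections import defaultdict

# One query covers every table in the database; 'analytics' is the example name used above.
COLUMNS_QUERY = (
    "SELECT table, name, type, is_in_primary_key "
    "FROM system.columns "
    "WHERE database = 'analytics' "
    "ORDER BY table, position"
)


def group_columns(rows):
    """Group (table, column, type, is_pk) rows into a per-table column manifest."""
    manifest = defaultdict(list)
    for table, column, col_type, is_pk in rows:
        manifest[table].append(
            {"column": column, "type": col_type, "primary_key": bool(is_pk)}
        )
    return dict(manifest)


# Illustrative rows shaped like the query result; not real data.
sample = [
    ("events", "event_date", "Date", 1),
    ("events", "user_id", "UInt64", 1),
    ("events", "payload", "String", 0),
    ("users", "id", "UInt64", 1),
]
manifest = group_columns(sample)
```

Grouping by table keeps each table's columns together in declaration order, which is the shape you need when writing one block of rows per table.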
What You Get
- Column A: every table name in the analytics database, one per row.
- Columns B–D (or equivalent): engine type, row count, compressed size per table.
- A second pass filling in column definitions: column name, data type, primary key flag.
- A complete, current data dictionary — not the 2023 Confluence version.
What If the Data Is Not Quite Ready
The server has multiple databases and you want to scope to one
List every table in the ClickHouse 'events' database and write table name, engine, and row count into this sheet. Skip any tables in the 'staging' database. (ClickHouse has no separate schema layer inside a database, so scoping is done by database name.)
The row counts take too long because some tables are enormous
For each table in column A, write its compressed size from ClickHouse system.tables into column C. For row count, use the total_rows field from system.tables; do not run COUNT(*) queries.
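One detail worth planning for: total_rows is a nullable field, and ClickHouse leaves it empty for tables where the count cannot be determined cheaply (views are a common case). A minimal sketch of how the sheet cell might be filled, assuming the value arrives as a Python int or None; the function name is hypothetical.

```python
from typing import Optional


def row_count_cell(total_rows: Optional[int]) -> str:
    """Return the text to write into the row-count cell for one table."""
    if total_rows is None:
        # No cheap count available; say so instead of falling back to COUNT(*).
        return "unknown"
    return f"{total_rows:,}"
```

Writing "unknown" keeps the inventory honest without triggering a full scan on an enormous table.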
You want to flag tables that haven't been written to in over 90 days
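For each table in column A, check system.parts for the most recent modification_time among its active parts and write it to column E. In column F, flag any table whose most recent part is older than 90 days as 'Stale'. Do not rely on the metadata_modification_time field in system.tables for this: it records changes to the table definition, not data writes.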
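The flagging step itself is plain date arithmetic once a timestamp is in hand. Here is a minimal sketch, assuming the timestamp for each table has already been fetched as a Python datetime (or None when the table has nothing to report). The function name and the 'Unknown' label are illustrative choices.

```python
from datetime import datetime, timedelta
from typing import Optional


def stale_flag(
    last_modified: Optional[datetime],
    now: datetime,
    max_age_days: int = 90,
) -> str:
    """Return 'Stale' when the timestamp is strictly older than max_age_days."""
    if last_modified is None:
        # No timestamp available for this table, so make that visible.
        return "Unknown"
    if now - last_modified > timedelta(days=max_age_days):
        return "Stale"
    return ""
```

Passing now in explicitly makes the cutoff reproducible, which helps if the sheet is regenerated after the review.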
Full audit: table inventory + column definitions + stale flag + any tables missing from Confluence
List all tables in the ClickHouse 'analytics' database. For each, write engine, row count, size, and last modification time. Flag stale tables. Then check which table names in column A are NOT listed in the Confluence table names I'll paste into column H, and mark any gaps as 'Not in Confluence'.
When the inventory, the staleness check, and the gap analysis are one instruction, you arrive at the architecture review with a complete picture instead of a partial one.
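The gap analysis in that last prompt amounts to a set difference between the live table names and the names pasted from Confluence. A minimal sketch, assuming both lists are plain strings and that matching should ignore case and stray whitespace (an assumption, not something the prompt specifies). The function name is hypothetical.

```python
def missing_from_confluence(actual_tables, confluence_tables):
    """Return live table names that do not appear in the documented list."""
    # Normalize the documented names once; skip blank cells pasted from the page.
    documented = {name.strip().lower() for name in confluence_tables if name.strip()}
    return [t for t in actual_tables if t.strip().lower() not in documented]


# Illustrative names only; not taken from a real database.
live = ["events", "users", "sessions_v2"]
pasted = ["Events ", "users", ""]
gaps = missing_from_confluence(live, pasted)
```

Name matching like this will not reconcile tables that were renamed since the page was written; those still show up as gaps and need a human look.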
Try It
Get the 7-day free trial of SheetXAI and open a blank Google Sheet, then ask it to document every table and column in your ClickHouse database. You can also explore the data quality audit spoke or return to the ClickHouse integration overview.
