The Scenario
You are the data quality engineer responsible for signing off on the events table before next week's product launch. The table has grown to 500 million rows. Your job is to produce an audit report covering null rates per column, duplicate event counts keyed on (user_id, event_id), and any timestamps that fall outside the expected range.
Your manager wants the findings in an Excel workbook by Thursday. The review with the engineering lead is Friday at 9 AM.
The bad version:
- Write four separate SQL queries: one for null counts per column, one for null percentages, one for duplicates, one for timestamp anomalies.
- Run each query in the ClickHouse Play UI, wait for the result (scanning 500 million rows takes a while), copy the output, and paste it into the workbook.
- Discover that the null percentage query timed out because the cluster was under load during the afternoon run. Re-run it at 7 PM.
Four queries, four pastes, one timeout, one late-evening re-run. And if the engineering lead asks for an additional check on Friday morning, you start over.
The Easy Way: One Prompt in SheetXAI
SheetXAI is an AI agent that lives inside your Excel workbook. It connects to ClickHouse, runs the analytical queries against your table, and writes the structured findings directly into the workbook — column by column, labeled and ready for review.
Query my ClickHouse 'events' table and write a data quality summary into this sheet: for each column show the column name, null count, null percentage, distinct count, and min/max values.
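For reference, a hand-written query covering the same fields might look like the sketch below. It is not necessarily what SheetXAI runs. The helper name build_profile_query is hypothetical, the column list is borrowed from the examples later on this page, and uniqExact is one choice for the distinct count (ClickHouse's approximate uniq is cheaper on a table this size).

```python
def build_profile_query(table: str, columns: list[str]) -> str:
    """Build a ClickHouse query that returns one profile row per column.

    Hypothetical helper for illustration only. Column names are assumed
    to be trusted identifiers, not user input.
    """
    selects = []
    for col in columns:
        quoted = f"`{col}`"  # backtick-quote the identifier
        selects.append(
            "SELECT\n"
            f"    '{col}' AS column_name,\n"
            f"    countIf(isNull({quoted})) AS null_count,\n"
            f"    round(100 * countIf(isNull({quoted})) / count(), 2) AS null_pct,\n"
            f"    uniqExact({quoted}) AS distinct_count,\n"
            # min/max are cast to strings so columns of different types can be unioned
            f"    toString(min({quoted})) AS min_value,\n"
            f"    toString(max({quoted})) AS max_value\n"
            f"FROM {table}"
        )
    return "\nUNION ALL\n".join(selects)


query = build_profile_query("events", ["user_id", "event_type", "timestamp"])
print(query)
```

Each SELECT reads a single column, so the query produces the row-per-column layout the prompt asks for, at the cost of one pass per column.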
Then for the duplicate check:
Run a ClickHouse query to find duplicate rows in the 'transactions' table keyed on (user_id, transaction_id) and write each duplicate group with its count into this sheet.
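As a rough guide to what that check involves, a duplicate report usually comes down to a GROUP BY on the key with a HAVING filter. The sketch below builds such a query. The helper name build_duplicate_query is hypothetical, and this is an assumed equivalent, not the query SheetXAI is documented to generate.

```python
def build_duplicate_query(table: str, key_columns: list[str]) -> str:
    """Build a ClickHouse query listing each key that appears more than once.

    Hypothetical helper for illustration only.
    """
    keys = ", ".join(key_columns)
    return (
        f"SELECT {keys}, count() AS duplicate_count\n"
        f"FROM {table}\n"
        f"GROUP BY {keys}\n"
        "HAVING duplicate_count > 1\n"
        "ORDER BY duplicate_count DESC"
    )


duplicate_query = build_duplicate_query("transactions", ["user_id", "transaction_id"])
print(duplicate_query)
```

Sorting by duplicate_count puts the worst offenders at the top of the sheet, which is usually where a reviewer starts.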
What You Get
- A row per column in the events table: column name, null count, null percentage, distinct count, min value, max value.
- A separate range (or worksheet) for the duplicate report: each (user_id, transaction_id) combination that appears more than once, along with the duplicate count.
- All labeled with headers so the engineering lead can filter and sort without reformatting.
- If any query times out, SheetXAI surfaces the error in the sidebar rather than writing a partial result silently.
What If the Data Is Not Quite Ready
The table is so large that a full null scan is too slow — you only need the five most-written columns
Query ClickHouse for null counts and null percentages for these five columns only in the 'events' table: user_id, event_type, timestamp, session_id, country. Write the results into this sheet with headers.
You want to flag columns where null percentage exceeds 5%
Query ClickHouse for null count and null percentage for every column in the 'events' table. Write the results to this sheet. In the column after null percentage, add a flag: 'High null rate' if the percentage exceeds 5%, otherwise 'OK'.
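The flag rule in that prompt is simple enough to state precisely. The sketch below assumes "exceeds 5%" means strictly greater than 5, so a column at exactly 5% is still 'OK'. The function name and the sample percentages are made up for illustration.

```python
HIGH_NULL_THRESHOLD_PCT = 5.0  # threshold taken from the prompt


def null_rate_flag(null_pct: float, threshold: float = HIGH_NULL_THRESHOLD_PCT) -> str:
    """Return 'High null rate' when the percentage strictly exceeds the threshold."""
    return "High null rate" if null_pct > threshold else "OK"


# Made-up null percentages, purely to show the flag column being filled in.
rows = [("user_id", 0.0), ("session_id", 5.0), ("country", 12.4)]
flagged = [(name, pct, null_rate_flag(pct)) for name, pct in rows]

for name, pct, flag in flagged:
    print(f"{name}\t{pct}\t{flag}")
```

If the review should treat exactly 5% as a failure, say so in the prompt ("5% or higher") so the boundary is unambiguous.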
The timestamp audit needs to check for out-of-range values, not just nulls
Query the ClickHouse 'events' table for any rows where timestamp is before 2022-01-01 or after today(). Write the count of out-of-range rows and a sample of 10 such rows (showing user_id, event_type, and timestamp) into this sheet.
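A hand-written version of this range check could look like the sketch below. It assumes the timestamp column is a DateTime and reads "after today()" as "dated later than today's date". The helper name build_range_queries is hypothetical, and the sampled columns are the ones named in the prompt.

```python
def build_range_queries(table: str, ts_column: str, earliest: str) -> tuple[str, str]:
    """Build a count query and a 10-row sample query for out-of-range timestamps.

    Hypothetical helper for illustration only. `earliest` is a 'YYYY-MM-DD' date.
    """
    ts = f"`{ts_column}`"
    # Too early: before midnight on the earliest allowed date.
    # Too late: the row's calendar date is after today's date.
    condition = (
        f"{ts} < toDateTime('{earliest} 00:00:00') "
        f"OR toDate({ts}) > today()"
    )
    count_query = (
        "SELECT count() AS out_of_range_rows\n"
        f"FROM {table}\n"
        f"WHERE {condition}"
    )
    sample_query = (
        f"SELECT user_id, event_type, {ts}\n"
        f"FROM {table}\n"
        f"WHERE {condition}\n"
        "LIMIT 10"
    )
    return count_query, sample_query


count_query, sample_query = build_range_queries("events", "timestamp", "2022-01-01")
print(count_query)
print(sample_query)
```

LIMIT without ORDER BY returns an arbitrary 10 rows, which is fine for a spot check but will not be the same sample on every run.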
Full pre-launch audit in one shot
Run a complete data quality audit on the ClickHouse 'events' table. Write: (1) null counts and percentages per column, with a 'High null rate' flag above 5%; (2) duplicate count for rows keyed on (user_id, event_id); (3) count of timestamps outside the range 2022-01-01 to today; (4) a one-line summary at the top saying whether the table passed or failed based on whether any flags are raised.
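The pass/fail line in step (4) depends on what counts as a flag. The sketch below assumes three: any column above the null threshold, any duplicate key group, and any out-of-range timestamp. The AuditResult type, the summarize function, and the wording of the summary line are all hypothetical, shown only to make the rule concrete.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    """Hypothetical container for the three audit findings."""
    high_null_columns: list[str]   # columns flagged 'High null rate'
    duplicate_groups: int          # (user_id, event_id) keys seen more than once
    out_of_range_timestamps: int   # rows outside 2022-01-01 .. today


def summarize(result: AuditResult) -> str:
    """Return a one-line PASS/FAIL summary; any raised flag means FAIL."""
    failures = []
    if result.high_null_columns:
        failures.append("high null rate in " + ", ".join(result.high_null_columns))
    if result.duplicate_groups > 0:
        failures.append(f"{result.duplicate_groups} duplicate key groups")
    if result.out_of_range_timestamps > 0:
        failures.append(f"{result.out_of_range_timestamps} out-of-range timestamps")
    if failures:
        return "FAIL: " + "; ".join(failures)
    return "PASS: no flags raised"


print(summarize(AuditResult(high_null_columns=[], duplicate_groups=0, out_of_range_timestamps=0)))
```

If your team tolerates a small number of duplicates or stray timestamps, spell out those tolerances in the prompt so the summary line reflects them.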
When the null scan, the duplicate check, the range audit, and the pass/fail summary are all one instruction, you hand the completed workbook to the engineering lead instead of spending the morning assembling four separate query outputs.
Try It
Start the 7-day free trial of SheetXAI, open an Excel workbook, and ask it to run a full data quality scan on your ClickHouse table and write the findings into the sheet. You can also explore the schema inventory guide or return to the ClickHouse integration overview.
