Dedup and Normalize Scraped Output in an Excel Workbook

The Scenario

You're a data analyst. You ran a bulk SmartScraper job yesterday and the output landed in your Excel workbook: 400 rows of product data from supplier sites. The crawl finished at midnight. You woke up to the results.

The results are a mess.

Duplicate SKUs appear across multiple supplier pages — some rows appear three times. The price column has values in five different formats. The description column has 80 blank cells where the scraper hit a page that loaded descriptions via JavaScript after the initial render.

Your supply chain manager is running a purchase order review in three hours and this data is supposed to feed it.

The bad version:

Sort by SKU, scan for duplicates, delete them manually row by row — there are 47 duplicate groups
Select the price column, write a formula to detect the format of each cell and convert it, realize the formula handles four of the five formats and breaks on the fifth
Filter for blank descriptions, type "MISSING" in each blank cell, wonder how many you missed

By the time you've finished the duplicates, you've used 45 minutes and haven't touched the price normalization. The review now starts in just over two hours.

The Easy Way: One Prompt in SheetXAI

SheetXAI is an AI agent inside your Excel workbook. It reads your scraped product data, identifies the data quality problems, and cleans them in one pass.

Paste this into the SheetXAI sidebar:

Type this prompt

Remove duplicate rows from my scraped product workbook based on matching SKU in column B, normalize all price values in column D to a plain two-decimal number, and flag rows with blank descriptions in column E with the text MISSING

What You Get

Duplicate rows identified by SKU in column B are removed, keeping the row with the most complete data
Column D values are all converted to a plain numeric value with two decimal places
Blank cells in column E are replaced with the text MISSING
A note in cell H1 reports how many rows were deduplicated and how many MISSING flags were written
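To make the price step concrete, here is a minimal Python sketch of what "a plain two-decimal number" could mean in practice. It is not SheetXAI's implementation. The article does not list the five formats, so the ones handled here (currency symbols, currency codes, US-style and European-style separators) are assumptions, and `normalize_price` is a hypothetical helper name.

```python
import re
from decimal import Decimal, ROUND_HALF_UP


def normalize_price(raw: str) -> str:
    """Return a plain two-decimal string such as '1234.50'.

    Assumed input formats (not taken from the article): '$1,234.50',
    '1.234,50 €', 'USD 12', '12,5', '12.5'.
    """
    # Drop currency symbols, codes, and spaces; keep digits and separators.
    cleaned = re.sub(r"[^\d.,]", "", raw).strip(".,")
    if not cleaned:
        raise ValueError(f"no numeric content in {raw!r}")

    last_dot, last_comma = cleaned.rfind("."), cleaned.rfind(",")
    if last_dot != -1 and last_comma != -1:
        # Both separators present: whichever appears last is the decimal mark.
        if last_comma > last_dot:
            cleaned = cleaned.replace(".", "").replace(",", ".")
        else:
            cleaned = cleaned.replace(",", "")
    elif last_comma != -1:
        # Only commas: a single comma followed by 1-2 digits is read as a
        # decimal mark; anything else is read as a thousands separator.
        digits_after = len(cleaned) - last_comma - 1
        if cleaned.count(",") == 1 and digits_after in (1, 2):
            cleaned = cleaned.replace(",", ".")
        else:
            cleaned = cleaned.replace(",", "")
    elif cleaned.count(".") > 1:
        # Several dots and no comma: read the dots as thousands separators.
        cleaned = cleaned.replace(".", "")

    # Malformed input that survives the cleanup raises decimal.InvalidOperation.
    return str(Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


for sample in ["$1,234.50", "1.234,50 €", "USD 12", "12,5"]:
    print(sample, "->", normalize_price(sample))
```

A value like "1.234" stays ambiguous (one and a bit, or one thousand two hundred thirty-four); this sketch reads it as a decimal, which is exactly the kind of case worth spot-checking after any automated pass.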

What If the Data Is Not Quite Ready

Duplicate SKUs have different prices and you need to keep the lowest

Type this prompt

Deduplicate rows by SKU in column B keeping only the row with the lowest numeric price value in column D; write the number of removed duplicates to cell I1
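If you want to sanity-check the result, the rule in that prompt can be sketched in a few lines of Python. This is an illustration of the logic only, under the assumption that each row is a dict with hypothetical `sku` and `price` keys and that prices are already plain numeric strings.

```python
from decimal import Decimal


def dedupe_lowest_price(rows):
    """Keep one row per SKU (the cheapest) and count the rows dropped.

    Assumes each row is a dict with 'sku' and 'price' keys (hypothetical
    names standing in for columns B and D).
    """
    best = {}  # sku -> cheapest row seen so far; keeps first-seen SKU order
    for row in rows:
        sku = row["sku"].strip()  # assumption: SKUs differ only by stray spaces
        current = best.get(sku)
        if current is None or Decimal(row["price"]) < Decimal(current["price"]):
            best[sku] = row
    removed = len(rows) - len(best)
    return list(best.values()), removed


rows = [
    {"sku": "A-1", "price": "9.99"},
    {"sku": "B-2", "price": "5.00"},
    {"sku": "A-1", "price": "8.50"},
    {"sku": "A-1 ", "price": "12.00"},
]
kept, removed = dedupe_lowest_price(rows)
print(kept)     # the 8.50 row for A-1, then the B-2 row
print(removed)  # 2
```

When two rows tie on price, this sketch keeps the one that appeared first.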

Some prices include tax and some don't, and you need to normalize to pre-tax

Type this prompt

For rows where the value in column F indicates tax-inclusive pricing, divide the price in column D by 1.2 (assuming a 20% tax rate) and write the result back to column D; add a note in column G indicating the price was adjusted
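The arithmetic behind that prompt is small enough to check by hand. Dividing by 1.2 only gives the pre-tax price if the tax rate is 20%; a gross price of 120.00 becomes 100.00. A rough Python sketch, where the flag values in column F (`incl`, `inclusive`, `tax-inclusive`) are assumptions, since the article does not say what that column contains:

```python
from decimal import Decimal, ROUND_HALF_UP

TAX_DIVISOR = Decimal("1.2")  # from the prompt; corresponds to a 20% tax rate
# Hypothetical flag values for column F; adjust to whatever your sheet uses.
INCLUSIVE_FLAGS = {"incl", "inclusive", "tax-inclusive"}


def to_pre_tax(price: str, tax_flag: str):
    """Return (price, note); the price is adjusted only for tax-inclusive rows."""
    if tax_flag.strip().lower() in INCLUSIVE_FLAGS:
        net = (Decimal(price) / TAX_DIVISOR).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP
        )
        return str(net), "price adjusted to pre-tax"
    return price, ""


print(to_pre_tax("120.00", "incl"))  # ('100.00', 'price adjusted to pre-tax')
print(to_pre_tax("10.00", "excl"))   # ('10.00', '')
```

If your suppliers sit in different tax regimes, a single divisor will be wrong for some rows, so it is worth confirming the rate before running the adjustment.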

Blank descriptions came from pages that load content via JavaScript

Type this prompt

For all rows where column E contains "MISSING", write the source URL from column A into column H labeled "needs manual review" so I can re-run the scrape with a JavaScript-rendering option
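The same idea, sketched in Python for anyone who wants the re-scrape list outside the workbook. The `url` and `description` keys are hypothetical stand-ins for columns A and E, and the function is an illustration rather than what SheetXAI runs.

```python
def collect_rescrape_urls(rows):
    """Flag blank descriptions as MISSING and return the URLs to re-scrape.

    Assumes each row is a dict with 'url' and 'description' keys
    (hypothetical names standing in for columns A and E).
    """
    review_urls = []
    for row in rows:
        # Treat None, empty strings, and whitespace-only cells as blank.
        if not (row.get("description") or "").strip():
            row["description"] = "MISSING"
        if row["description"] == "MISSING":
            review_urls.append(row["url"])
    return review_urls


rows = [
    {"url": "https://example.com/p/1", "description": "Steel bracket"},
    {"url": "https://example.com/p/2", "description": ""},
    {"url": "https://example.com/p/3", "description": "MISSING"},
]
print(collect_rescrape_urls(rows))
# ['https://example.com/p/2', 'https://example.com/p/3']
```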

Full cleanup in one pass: dedup, normalize prices, flag missing, and summarize

Type this prompt

For the 400-row scraped product workbook: deduplicate by SKU in column B keeping the row with the lowest price; normalize all price values in column D to two-decimal numeric format; replace blank description cells in column E with MISSING; write a summary to cell H1 showing rows deduped, prices normalized, and MISSING flags added; sort the cleaned workbook by SKU ascending

The supply chain manager gets clean data. You don't spend the morning doing manual data janitorial work.

Try It

If you have a bulk scrape result sitting in your Excel workbook with duplicates, inconsistent formats, and blank fields, get the 7-day free trial of SheetXAI and clean it in one prompt. For related tasks, see how to crawl supplier category pages into an Excel workbook or apply a consistent schema across a URL batch.

Stop memorizing formulas. Tell your spreadsheet what to do.