The Scenario
You're a data engineer and you inherited a workbook from a colleague who left last month. It has 200 rows of raw HTML snippets in column A — pulled from a CRM export — each one containing a customer record in unstructured markup. Your job is to get customer name, email address, and phone number out of each snippet and into columns B, C, and D. The CRM migration is blocked until this is done, and your lead asked about it in the standup this morning.
The bad version:
- Open the first HTML snippet in column A, read through the tags, find the name field, extract it, paste into B2
- Find the email in the markup — sometimes it's in an href, sometimes it's in a span with a class you have to scroll to find — extract it, paste into C2
- Find the phone number — except this row uses a different HTML structure than the last one because whoever built the CRM export wasn't consistent
- Repeat 199 more times, making three errors along the way that you won't notice until the import fails
This is the kind of parsing that LLM-powered extraction handles well. The HTML structure varies from row to row, but the semantic content (name, email, phone) is still there to find. What you need is a tool that can read all 200 rows and apply the same extraction to each one.
The Easy Way: One Prompt in SheetXAI
SheetXAI is an AI agent that lives inside your Excel workbook. It reads the workbook and, through its Parsera integration, can parse structured fields out of the raw HTML in each cell and write the extracted values into adjacent columns.
Parse all the raw HTML snippets in column A of the 'Raw Data' sheet using Parsera to extract customer name, email address, and phone number into columns B, C, and D
What You Get
- Column B fills with the customer name extracted from each snippet
- Column C gets the email address
- Column D gets the phone number, in whatever format the HTML contains it
- Rows where a field couldn't be found get an empty cell or a "not found" marker so you know which records need manual review before the CRM import
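If you want to spot-check a few rows yourself, or see why a fixed rule set struggles with this export, here is a minimal rule-based sketch in Python. It is not how Parsera works: it uses only the standard library (`html.parser` and `re`), takes the first `mailto:` link or email-shaped string as the email and the first phone-shaped string as the phone, and assumes the name sits in an element whose `class` contains "name", which is a hypothetical convention your export may not follow. Missing fields come back as a "not found" marker, mirroring the review workflow above.

```python
import re
from html.parser import HTMLParser

NOT_FOUND = "not found"  # marker for fields that need manual review

# Loose patterns: good enough for spot checks, not for every real-world format.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\(?\d[\d\s().-]{7,}\d")


class SnippetParser(HTMLParser):
    """Collects visible text, mailto: targets, and the text of elements whose
    class attribute contains "name" (a hypothetical markup convention)."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.mailto = []
        self.name_parts = []
        self._name_tag = None  # tag we are currently inside, if it is name-classed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if href.startswith("mailto:"):
            # Drop any ?subject=... query that follows the address
            self.mailto.append(href[len("mailto:"):].split("?")[0])
        if self._name_tag is None and "name" in (attrs.get("class") or ""):
            self._name_tag = tag

    def handle_endtag(self, tag):
        if tag == self._name_tag:
            self._name_tag = None

    def handle_data(self, data):
        self.text_parts.append(data)
        if self._name_tag is not None:
            self.name_parts.append(data)


def extract_fields(snippet):
    """Return [name, email, phone] for one HTML snippet, using NOT_FOUND for gaps."""
    parser = SnippetParser()
    parser.feed(snippet)
    parser.close()
    text = " ".join(parser.text_parts)
    name = " ".join(" ".join(parser.name_parts).split())  # collapse whitespace
    email_match = EMAIL_RE.search(text)
    email = parser.mailto[0] if parser.mailto else (email_match.group() if email_match else "")
    phone_match = PHONE_RE.search(text)
    phone = phone_match.group().strip() if phone_match else ""
    return [value or NOT_FOUND for value in (name, email, phone)]
```

Every assumption baked into those patterns is a place where an inconsistently structured row can slip through, which is the gap an LLM-based extractor is meant to close.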
What If the Data Is Not Quite Ready
Phone numbers come back in five different formats
For each HTML snippet in column A, extract customer name, email, and phone number using Parsera. After extraction, normalize all phone numbers in column D to the format +1 (XXX) XXX-XXXX where possible, and flag any number that can't be normalized in column E
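The normalization step in that prompt comes down to a simple rule. Here is a minimal Python sketch, assuming North American numbers (10 digits, or 11 digits with a leading country code 1). Anything else, including numbers with extensions, is returned unchanged with a flag for column E; the flag text is arbitrary.

```python
import re


def normalize_us_phone(raw):
    """Return (value for column D, flag for column E).

    Assumes North American numbers: 10 digits, or 11 digits starting with
    the country code 1. Anything else is returned unchanged and flagged."""
    digits = re.sub(r"\D", "", raw or "")  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return raw, "could not normalize"
    return f"+1 ({digits[:3]}) {digits[3:6]}-{digits[6:]}", ""
```

For example, `555.123.4567` and `+1-555-123-4567` both become `+1 (555) 123-4567`, while `(555) 123-4567 x89` is left alone and flagged because the extension adds extra digits.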
Some rows contain multiple customers in one snippet
For each HTML snippet in column A, use Parsera to extract all customer names, email addresses, and phone numbers present in the snippet. If more than one customer is found, create additional rows below the current row to hold the extra records
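The row-expansion behavior that prompt asks for can be sketched like this, assuming the extraction step has already produced a list of (name, email, phone) tuples per snippet. The first record stays on the original row and each extra record gets a new row directly below. Leaving column A blank on the new rows is an assumption made here so they are not parsed a second time.

```python
def expand_rows(parsed):
    """parsed: list of (snippet, records), where records is a list of
    (name, email, phone) tuples found in that snippet.

    Returns sheet rows [A, B, C, D]: the first record stays on the original
    row and each extra record gets its own row directly below."""
    out = []
    for snippet, records in parsed:
        if not records:
            out.append([snippet, "", "", ""])  # nothing found: keep the row for review
            continue
        first, *extra = records
        out.append([snippet, *first])
        for record in extra:
            out.append(["", *record])  # column A left blank so the row isn't re-parsed
    return out
```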
You only want to parse snippets that haven't been processed yet
Use Parsera to extract customer name, email, and phone from every HTML snippet in column A where columns B through D are all empty. Skip rows that already have data
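The selection rule in that prompt is easy to state precisely. This sketch assumes row 1 is a header, so data starts at row 2, and treats a cell as empty if it is missing, `None`, or whitespace only. It returns the sheet row numbers that still need parsing.

```python
def is_blank(value):
    """True for None or a cell that is empty or whitespace only."""
    return value is None or str(value).strip() == ""


def rows_to_process(rows, first_data_row=2):
    """rows: list of [A, B, C, D] cell values, starting at first_data_row.

    Returns the sheet row numbers whose snippet (A) is present and whose
    B, C, and D are all blank."""
    pending = []
    for row_number, row in enumerate(rows, start=first_data_row):
        # Pad short rows so missing trailing cells count as blank
        a, b, c, d = (list(row) + [None] * 4)[:4]
        if not is_blank(a) and all(is_blank(v) for v in (b, c, d)):
            pending.append(row_number)
    return pending
```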
You want to parse, validate, and flag incomplete records in one pass
Parse all the raw HTML entries in the 'Raw Data' worksheet using Parsera to extract customer name, email address, and phone number into columns B, C, and D. Then flag any row in column E where the email address is missing or malformed, and sort the worksheet so flagged rows appear at the top
The underlying principle: extraction and validation in a single pass means the cleanup is built into the ask, not a second round-trip.
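For reference, the validation-and-sort step in the last prompt can be sketched in a few lines of Python. The email check is a deliberately loose shape test (something@domain.tld), not full RFC 5322 validation, and the flag labels are made up for illustration. Python's `sorted` is stable, so flagged rows move to the top while each group keeps its original order.

```python
import re

# Loose shape check (local@domain.tld); not full RFC 5322 validation.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def email_flag(email):
    """Return a column E flag for one email cell ("" means it looks fine)."""
    email = (email or "").strip()
    if not email:
        return "missing email"
    if not EMAIL_SHAPE.match(email):
        return "malformed email"
    return ""


def flag_and_sort(rows):
    """rows: list of [A, B, C, D]. Returns new rows with column E filled in,
    flagged rows first; sorted() is stable, so order within each group is kept."""
    flagged = [list(row[:4]) + [email_flag(row[2])] for row in rows]
    # key is False for flagged rows and True for clean ones; False sorts first
    return sorted(flagged, key=lambda r: r[4] == "")
```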
Try It
Get the 7-day free trial of SheetXAI and open any Excel workbook with a column of raw HTML or text content you need to parse into structured fields. Ask SheetXAI to extract the specific fields you care about using Parsera. For related tasks, see how to bulk scrape structured fields from live URLs or extract full markdown content from web pages.
