The Scenario
The content strategy deck goes to the leadership team on Friday and slide 12 is a content gap analysis — your domains versus ten competitor blog sites. The researcher who was supposed to run this is out sick, and the handoff note says: "crawl competitor domains with Bright Data, pull page titles and URLs into the workbook, then we analyze."
You open the Excel workbook. Column A has ten competitor domains. The rest of it is empty.
The bad version:
- Log into Bright Data's dashboard, find the crawler product, configure a crawl job for domain 1, set the parameters, hit start, and wait. When it finishes, export the results as JSON, write a quick script to parse out the page titles and URLs, paste them somewhere, and repeat for domain 2.
- Get through three domains and realize the JSON structure differs from site to site, because each site returns its own set of metadata fields. Your parser breaks on domain 4.
- It's now Thursday afternoon, you have seven domains left, and the deck is due tomorrow morning.
The analysis is ten minutes of work once the data is in the workbook. Getting the data into the workbook is the whole problem.
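If you do end up writing that parser by hand, the part that usually breaks is the assumption that every export uses the same field names. Here is a minimal sketch of a more tolerant version in Python. The key names in TITLE_KEYS and URL_KEYS are hypothetical, not Bright Data's actual export schema, so swap in whatever your JSON really contains.

```python
from __future__ import annotations

from typing import Any

# Hypothetical field names. Check your own exports and adjust these lists.
TITLE_KEYS = ("title", "page_title", "og_title")
URL_KEYS = ("url", "page_url", "link")


def first_present(record: dict[str, Any], keys: tuple[str, ...]) -> str | None:
    """Return the first non-empty string found under any of the given keys."""
    for key in keys:
        value = record.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def extract_rows(records: list[dict[str, Any]], domain: str) -> list[tuple[str, str, str]]:
    """Build (title, url, domain) rows, skipping records that have no URL."""
    rows = []
    for record in records:
        url = first_present(record, URL_KEYS)
        if url is None:
            continue  # nothing useful to write without a URL
        title = first_present(record, TITLE_KEYS) or "(no title)"
        rows.append((title, url, domain))
    return rows
```

Looking up several candidate keys means a renamed field degrades to a placeholder title instead of a crash. You still have to discover the key names for each site, which is the tedious part.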
The Easy Way: One Prompt in SheetXAI
SheetXAI is an AI agent that lives inside your Excel workbook and talks to Bright Data's crawler on your behalf. It kicks off the crawl jobs, waits for completion, parses the results, and writes them directly into your workbook — no JSON wrangling, no domain-by-domain repetition.
Start a Bright Data crawl for each domain in column A of my "Competitors" worksheet (rows 2 through 11), wait for each crawl to complete, then write all discovered page titles into column B and their corresponding URLs into column C — add a column D with the source domain for each row
What You Get
- A multi-row output where each crawled page occupies one row in columns B, C, and D
- Column B: page title as returned by the crawl
- Column C: full URL of that page
- Column D: the originating domain from column A, so you can filter or pivot by competitor
- If a domain crawl times out or returns an error, a row is inserted with the domain name in D and an error note in B — so you know which sites to retry
What If the Data Is Not Quite Ready
Some domains in column A include "https://" and some don't
Normalize all domains in column A by stripping any protocol prefix (https://, http://) before triggering the Bright Data crawl jobs, then write results to columns B, C, and D as specified
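For a sense of what that normalization amounts to, here is a rough Python sketch. It illustrates the idea only and is not SheetXAI's actual implementation. Trimming whitespace, dropping trailing slashes, and lowercasing go beyond what the prompt asks for and are assumptions added here.

```python
import re

# Matches a leading http:// or https://, in any letter case.
_PROTOCOL = re.compile(r"^https?://", re.IGNORECASE)


def normalize_domain(raw: str) -> str:
    """Strip surrounding whitespace, a protocol prefix, and trailing slashes."""
    cleaned = _PROTOCOL.sub("", raw.strip())
    # Lowercasing is an extra assumption: hostnames are case-insensitive.
    return cleaned.rstrip("/").lower()
```

With this, "https://Example.com/" and "example.com" both come out as "example.com", so the same domain is not crawled twice under two spellings.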
I only want blog pages, not product or pricing pages
Start Bright Data crawl jobs for each domain in column A, but filter the results to only include pages whose URL contains "/blog/" or "/articles/" or "/insights/" before writing titles and URLs into columns B and C — add the source domain in column D
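The filter in that prompt is a plain substring check on the URL. A minimal sketch of the same rule in Python follows. The function name is made up for illustration, and matching against the path only, rather than the whole URL, is an assumption that keeps a query string like "?ref=/blog/" from counting.

```python
from urllib.parse import urlparse

# The three path markers named in the prompt above.
BLOG_MARKERS = ("/blog/", "/articles/", "/insights/")


def is_blog_url(url: str) -> bool:
    """True when the URL's path contains one of the blog-style markers."""
    path = urlparse(url).path.lower()
    return any(marker in path for marker in BLOG_MARKERS)
```

Note that a bare index page such as "https://example.com/blog" has no trailing slash and will not match. For a gap analysis that is usually what you want, since the index page is not an article.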
One domain has already been crawled and I want to skip it
Start Bright Data crawls for all domains in column A except row 3 (already processed), wait for completion, and write discovered page titles and URLs into columns B and C with the source domain in column D
Normalize domains, filter to blog pages, deduplicate titles, and import in one go
Strip protocol prefixes from all domains in column A, run Bright Data crawls for each, filter results to pages with "/blog/" or "/content/" in the URL, deduplicate by page title, and write unique blog page titles into column B, URLs into column C, and source domains into column D
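"Deduplicate by page title" can be read more than one way, so it helps to see one interpretation spelled out. The Python sketch below keeps the first row for each title, ignoring case and surrounding whitespace, across all domains. Both choices are assumptions, and the function name is hypothetical.

```python
def dedupe_by_title(rows: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Keep the first (title, url, domain) row seen for each distinct title."""
    seen = set()
    unique = []
    for title, url, domain in rows:
        key = title.strip().casefold()  # compare titles case-insensitively
        if key in seen:
            continue
        seen.add(key)
        unique.append((title, url, domain))
    return unique
```

If two competitors publishing a post with the same title should both stay in the sheet, key on (domain, title) instead, and say so in the prompt.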
When you're staring at a Friday deadline, one prompt is the difference between done and not.
Try It
Open any Excel workbook with a list of competitor domains in column A and get the 7-day free trial of SheetXAI — ask it to crawl each domain with Bright Data and populate the page titles and URLs into your workbook. You can also see how to run bulk SERP lookups or view the full Bright Data integration guide.
