The Scenario
You're an e-commerce analyst. Your manager dropped a file with 100 competitor product page URLs on your desk Monday and asked for a competitive intelligence database — product names, descriptions, site names — by end of week. The URLs are already in an Excel workbook, column A. Columns B through D are empty.
The bad version:
- Open the first URL, right-click, view source, search for og:title, copy the value, switch to the workbook, paste it in column B.
- Repeat for og:description and og:site_name.
- Get to row 12 and hit a page with anti-bot protection: the page source comes back without the metadata, or the request returns a CloudFront error. Spend twenty minutes figuring out why before moving on.
A hundred product pages, three fields each, with some of the pages actively resisting scraping. This is a week of work if you do it by hand, and a week from now, some of those pages will have updated their copy.
Nobody hired you to run a manual scraper. The competitive analysis is the job. Getting the data into the workbook should not be.
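For reference, the manual view-source step amounts to reading three meta tags out of each page's HTML. Here is a minimal sketch of that step using only Python's standard library. The names `OGParser` and `extract_og` and the sample HTML are made up for illustration, and the sketch assumes you already have the HTML in hand, which is exactly what anti-bot protection tends to prevent.

```python
from html.parser import HTMLParser

# The three Open Graph properties the scenario asks for.
WANTED = ("og:title", "og:description", "og:site_name")


class OGParser(HTMLParser):
    """Collects the first occurrence of each wanted og:* meta tag."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property")
        if prop in WANTED and prop not in self.found:
            self.found[prop] = attr.get("content") or ""


def extract_og(html):
    """Return a dict with one entry per wanted property ('' if missing)."""
    parser = OGParser()
    parser.feed(html)
    return {key: parser.found.get(key, "") for key in WANTED}


# Illustrative sample page, not real competitor data.
sample = """<html><head>
<meta property="og:title" content="Trail Runner 2">
<meta property="og:description" content="Lightweight shoe for rocky terrain.">
<meta property="og:site_name" content="Example Outfitters">
</head><body></body></html>"""

fields = extract_og(sample)
```

Doing this by hand a hundred times, and debugging every blocked request along the way, is the part worth delegating.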
The Easy Way: One Prompt in SheetXAI
SheetXAI is an AI agent inside your Excel workbook. It reads your URL list, calls OpenGraph.io's scraper for each page — bypassing common anti-bot protections — and writes the returned fields back into your workbook in one operation.
Scrape each URL in column A with OpenGraph.io and write the og:title, og:description, and site name into columns B, C, and D — flag any scrape failures in column E
What You Get
- Column B fills with the og:title from each product page.
- Column C fills with the og:description.
- Column D fills with the site name (the brand or domain name returned with the metadata).
- Column E flags "FAILED" for any URL that returned an empty or error response, so you know what to re-check.
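The column layout above can be sketched as a small mapping function. This is only an illustration of the rule "three fields, or FAILED": the dict shape (keys named after the og properties, plus an optional `error` key) is an assumption made for the example, not SheetXAI's or OpenGraph.io's actual response format.

```python
def build_row(meta):
    """Return [title, description, site_name, flag] for columns B through E.

    `meta` is a hypothetical dict for one scraped URL, or None if the
    scrape produced nothing at all.
    """
    failed = ["", "", "", "FAILED"]
    if not meta or meta.get("error"):
        return failed

    title = (meta.get("og:title") or "").strip()
    description = (meta.get("og:description") or "").strip()
    site_name = (meta.get("og:site_name") or "").strip()

    # An entirely empty response counts as a failure too.
    if not (title or description or site_name):
        return failed
    return [title, description, site_name, ""]
```

Keeping the failure flag in its own column means a filter on column E gives you the re-check list directly.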
What If the Data Is Not Quite Ready
Some URLs use query parameters I want stripped before scraping
For each URL in column A, strip any query parameters (everything after "?") before scraping with OpenGraph.io — then write og:title, og:description, and site name into columns B, C, and D
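If you want to sanity-check what "everything after ?" means for a given URL, the same cleanup can be sketched with Python's standard `urllib.parse`. The function name `strip_query` is made up for this example. Because a fragment (`#...`) also comes after the `?`, this sketch drops it as well.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


cleaned = strip_query("https://shop.example.com/p/123?utm_source=x&ref=y#reviews")
```

One caveat: some sites put the product ID in a query parameter, and for those URLs stripping the query changes which page you scrape.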
I want to deduplicate by domain first, so I'm not scraping the same site twenty times
Remove rows from column A where the base domain has already appeared earlier in the column, then scrape the remaining unique URLs with OpenGraph.io — write og:title, og:description, and site name into columns B, C, and D and flag failures in column E
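The dedupe rule in that prompt can be sketched as "keep the first URL seen for each domain". The version below is deliberately simplified: it treats the hostname with a leading `www.` removed as the domain, so `shop.example.com` and `example.com` count as different sites. Finding the true registrable domain would need the Public Suffix List, which this sketch does not use. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def domain_key(url):
    """Lowercased hostname without a leading 'www.' ('' if none is found)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def first_per_domain(urls):
    """Keep only the first URL for each domain, preserving order."""
    seen = set()
    kept = []
    for url in urls:
        key = domain_key(url)
        # URLs with no parseable host are kept rather than silently dropped.
        if key and key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept


unique = first_per_domain([
    "https://www.example.com/a",
    "https://example.com/b",
    "https://other.example.org/c",
])
```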
I also want the og:image URL so I can see product photography across competitors
Scrape each URL in column A with OpenGraph.io and write og:title, og:description, og:image, and site name into columns B, C, D, and E — flag failures in column F
I want to clean up malformed URLs, scrape the metadata, and flag any row where og:description is under 50 characters
For each URL in column A, fix any missing https:// prefix, then scrape with OpenGraph.io and write og:title, og:description, and site name into columns B, C, and D — in column E, flag "SHORT" if og:description is under 50 characters, and flag "FAILED" for any scrape error
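The cleanup and flagging rules in that prompt are simple enough to write down, which helps when spot-checking the results. The sketch below uses hypothetical helper names and makes two assumptions: a URL that already starts with `http://` is left alone rather than upgraded, and a failed scrape takes priority over a short description.

```python
def ensure_https(url):
    """Add an https:// prefix when the URL has no scheme."""
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url
    if url.startswith("//"):
        # Protocol-relative URL such as //shop.example.com/p/1
        return "https:" + url
    return "https://" + url


def quality_flag(description, scrape_ok, min_len=50):
    """Return 'FAILED', 'SHORT', or '' for the flag column."""
    if not scrape_ok:
        return "FAILED"
    if len(description.strip()) < min_len:
        return "SHORT"
    return ""
```

For example, `ensure_https("shop.example.com/p/1")` gives `https://shop.example.com/p/1`, and a 9-character description on a successful scrape is flagged `SHORT`.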
One prompt handles the URL cleanup, the scrape, and the conditional quality check together.
Try It
Get the 7-day free trial of SheetXAI and open any Excel workbook with competitor product URLs in column A, then ask it to scrape the OG metadata for every row in one shot. You might also want to capture screenshots of those same pages to add a visual layer to your competitive database.
