The Scenario
Your head of content just asked you to audit a competitor's site. Not a quick glance — a real content inventory: every URL, every page title, what topics they are covering, where the gaps are relative to your own content plan. You have the competitor's homepage URL. Everything else needs to be discovered.
The bad version:
- Start manually clicking through the competitor site, copying URLs into a sheet one by one, guessing at what else is linked from each page you visit
- Try a free online crawler tool, hit a 50-page cap, get a CSV with inconsistent formatting, spend 40 minutes cleaning it before it is usable in your sheet
- Give up on completeness and run the audit on a subset you collected manually, knowing the analysis will have blind spots
A content gap analysis built on incomplete crawl data produces incomplete conclusions. Your content calendar for next quarter is going to be built on this audit.
The Easy Way: One Prompt in SheetXAI
SheetXAI is an AI agent that lives inside your Google Sheet. Put the competitor's URL in cell A1, and SheetXAI uses Scrapfly's crawler to discover the full site — returning every internal URL and page title — and writes the results into a new tab without you touching a command line or a third-party crawler interface.
Create a Scrapfly crawler for the website in cell A1 with a limit of 500 pages, retrieve all crawled contents, and write each page's URL and title into columns A and B of a new sheet called Crawl Results
What You Get
- A new sheet called Crawl Results with one row per discovered page
- Column A contains the full URL of each crawled page
- Column B contains the page title tag as found on that page
- Pages that returned errors during crawling are included with an error note rather than silently skipped
- The crawl respects robots.txt and rate limits through Scrapfly's built-in settings — no manual throttling needed
What If the Data Is Not Quite Ready
You only want pages from a specific section of the site
The full 500-page crawl may include legal pages, tag archives, and author profiles you do not care about. Scope it before writing:
Create a Scrapfly crawler for the site in cell A1, limit to 500 pages, then write only the URLs that contain /blog/ or /resources/ in the path into columns A and B of a new sheet called Crawl Results
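If you want to sanity-check what that scoping does, the filter can be sketched in a few lines of Python. This is only an illustration of the path-matching logic, not how SheetXAI or Scrapfly implement it; the function names and the `(url, title)` row shape are assumptions made for the example.

```python
from urllib.parse import urlparse

# Section segments taken from the prompt above; adjust to the site's structure.
SECTIONS = ("/blog/", "/resources/")


def in_scope(url, sections=SECTIONS):
    """Return True if the URL's path contains one of the section segments."""
    path = urlparse(url).path
    # Add a trailing slash so a section root like "/blog" still matches "/blog/".
    if not path.endswith("/"):
        path += "/"
    return any(section in path for section in sections)


def filter_rows(rows):
    """Keep only (url, title) rows whose URL falls inside a wanted section."""
    return [(url, title) for url, title in rows if in_scope(url)]
```

Matching on the parsed path rather than the raw string means a query string that happens to contain /blog/ does not count, and /blogging-tips is not mistaken for the blog section.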
You want to capture the H1 heading in addition to the page title
Title tags are usually written for search results and H1s for the reader on the page, so the two can differ significantly. If you want both for content analysis:
Crawl the site in cell A1 using Scrapfly with a 500-page limit, extract the page title and the H1 heading from each discovered page, and write URL, title, and H1 into columns A, B, and C of a new sheet called Crawl Results
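To make the title versus H1 distinction concrete, here is a minimal sketch of pulling both out of a page's HTML with Python's standard library. It assumes you already have the HTML as a string; `TitleH1Parser` and `extract_title_and_h1` are hypothetical names for this example, and the sketch says nothing about how SheetXAI or Scrapfly do the extraction.

```python
from html.parser import HTMLParser


class TitleH1Parser(HTMLParser):
    """Collect the text of the first <title> and the first <h1> in a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = ""
        self._current = None   # tag whose text is being collected right now
        self._h1_done = False  # set once the first <h1> has closed

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.title:
            self._current = "title"
        elif tag == "h1" and not self._h1_done:
            self._current = "h1"

    def handle_endtag(self, tag):
        if tag == self._current:
            if tag == "h1":
                self._h1_done = True
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags (for example <em> within <h1>) is kept too.
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1 += data


def extract_title_and_h1(html):
    """Return (title, h1) with whitespace collapsed; empty string if absent."""
    parser = TitleH1Parser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title.split()), " ".join(parser.h1.split())
```

Only the first H1 is kept, which matches the one-row-per-page layout of the sheet; pages with no H1 produce an empty cell rather than an error.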
You want to join the crawl results against your own content inventory
Your existing content lives in a tab called Our Content with URLs in column A. After the crawl, you need to see what the competitor covers that you do not:
Crawl the site in cell A1 with Scrapfly, write all URLs and titles into Crawl Results, then in column C note whether each URL's topic appears to overlap with any URL in the Our Content tab
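Topic overlap is a judgment call, which is why the prompt above leaves it to the agent. For a rough cross-check, one crude proxy is to compare the words in each URL's final slug. The sketch below assumes that approach; the stopword list, the 0.5 threshold, and the function names are all illustrative choices, not part of SheetXAI.

```python
import re
from urllib.parse import urlparse

# Small, hypothetical stopword list; extend it for your own slugs.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "for", "in", "how", "with"}


def slug_tokens(url):
    """Return the set of meaningful words in the last path segment of a URL."""
    segments = [s for s in urlparse(url).path.lower().split("/") if s]
    if not segments:
        return set()
    words = re.split(r"[^a-z0-9]+", segments[-1])
    return {w for w in words if w and w not in STOPWORDS}


def best_overlap(competitor_url, our_urls, threshold=0.5):
    """Return the URL of ours with the highest Jaccard similarity, or None.

    Similarity is |shared words| / |all words| across the two slugs, and a
    match is reported only if it reaches the threshold.
    """
    theirs = slug_tokens(competitor_url)
    best_url, best_score = None, 0.0
    for ours in our_urls:
        mine = slug_tokens(ours)
        union = theirs | mine
        if not union:
            continue
        score = len(theirs & mine) / len(union)
        if score > best_score:
            best_url, best_score = ours, score
    return best_url if best_score >= threshold else None
```

Slug matching misses pages that cover the same topic under different wording, so treat a `None` result as "worth a manual look" rather than a confirmed gap.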
You want the crawl data cleaned, categorized, and ready for a content gap report in one shot
Crawl the site in cell A1 using Scrapfly with a 500-page limit, write all URLs and titles into a sheet called Crawl Results, remove any URLs containing /tag/, /author/, or /page/, then in column C categorize each remaining URL as Blog, Guide, Case Study, or Other based on the URL structure and title
One ask, one structured output — the crawl, the filter, and the categorization run together.
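For readers who want to see what the filter and categorization steps amount to, here is a minimal Python sketch under stated assumptions: the excluded segments come from the prompt above, while the keyword rules in `RULES` are hypothetical and much simpler than whatever an AI agent would infer from URL structure and title.

```python
from urllib.parse import urlparse

# Path segments to drop, as listed in the prompt.
EXCLUDED = ("/tag/", "/author/", "/page/")

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("Case Study", ("case-stud", "customer-stor")),
    ("Guide", ("guide", "how-to")),
    ("Blog", ("blog",)),
]


def keep(url):
    """Return False for tag archives, author profiles, and pagination URLs."""
    path = urlparse(url).path
    if not path.endswith("/"):
        path += "/"
    return not any(segment in path for segment in EXCLUDED)


def categorize(url, title):
    """Label a page from keywords in its URL path and title."""
    haystack = (urlparse(url).path + " " + title).lower()
    for label, keywords in RULES:
        if any(keyword in haystack for keyword in keywords):
            return label
    return "Other"


def clean_and_categorize(rows):
    """Turn (url, title) rows into (url, title, category), dropping noise."""
    return [(url, title, categorize(url, title)) for url, title in rows if keep(url)]
```

The rule order matters: a case study published under /blog/ is labeled Case Study because that rule is checked before Blog.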
Try It
Get the 7-day free trial of SheetXAI, put a competitor domain in cell A1 of any Google Sheet, and ask it to run a Scrapfly crawl and populate a new tab with all discovered URLs and titles. If you also need to audit HTTP status codes from the crawl, see the companion guide on exporting crawled URLs with status codes.
