Back to ScrapeGraph AI in Excel
SheetXAI logo
ScrapeGraph AI logo
ScrapeGraph AI · Excel Guide

Extract All Sitemap URLs Into a Google Sheet

2026-05-14
5 min read

The Scenario

You're an SEO strategist and a new client just handed you their website for an audit. They say it has about 500 pages, they think. You need every URL cataloged before you can run your content audit — without the full URL inventory, you can't know what you're auditing.

The client's sitemap URL is in your notes. You've done this enough times to know that "about 500 pages" usually means something more complicated.

The bad version:

  • Navigate to the sitemap.xml, open the raw XML in a browser, start copying URLs
  • Discover the sitemap is a sitemap index — it links to six sub-sitemaps, each with its own URL list
  • Open each sub-sitemap, copy its URLs, paste into your workbook, realize the sub-sitemaps are also nested and you've been copying index entries not page URLs for the last ten minutes

What should have taken five minutes has now consumed a morning. Your audit schedule is already slipping and you haven't looked at a single page yet.

The Easy Way: One Prompt in SheetXAI

SheetXAI is an AI agent inside your Excel workbook. It reads the sitemap URL from your workbook and uses its built-in ScrapeGraph AI integration to extract every URL — including nested sub-sitemaps — and writes them as individual rows.

Paste this into the SheetXAI sidebar:

Use ScrapeGraph AI to extract all URLs from the sitemap at the address in cell A1 and write each discovered URL as a separate row in column B

What You Get

  • Column B fills with every page URL discovered in the sitemap, one per row
  • Nested sitemap indexes are resolved automatically
  • The total count appears at the top so you know immediately whether "about 500" was accurate
  • Non-page entries are written to a separate group at the bottom with a label

What If the Data Is Not Quite Ready

The sitemap includes URLs from multiple subdomains you don't want

After writing all discovered URLs to column B, filter to keep only URLs that start with the primary domain in cell A2 and move non-matching URLs to column C labeled "excluded subdomain"

You need URLs categorized by section before the audit starts

Group the URLs in column B by their URL path prefix: write a category label in column C based on the first path segment

The client wants to know which URLs have been updated recently

For each URL in column B, extract the lastmod date from the sitemap XML if available and write it into column D; flag any URL where lastmod is older than 18 months in column E with "stale"

Full pipeline: extract, categorize, dedup, and flag priority pages

Extract all URLs from the sitemap at cell A1 using ScrapeGraph AI; write them to column B; deduplicate exact matches; add a category label in column C based on the first URL path segment; flag URLs where lastmod is over 18 months old in column D; flag URLs containing "blog" or "news" in column E as "content"; write a summary in cell F1 counting URLs per category

Your audit can start the same morning the client sends you the sitemap URL.

Try It

If you're starting a content audit and need a full URL inventory first, Get the 7-day free trial of SheetXAI and extract the complete sitemap into your Excel workbook in one prompt. For related workflows, see how to convert article URLs to Markdown or apply a consistent schema across a URL batch.

Stop memorizing formulas.
Tell your spreadsheet what to do.

Join 4,000+ professionals saving hours every week with SheetXAI.

Learn more