ScrapeGraph AI · Google Sheets Guide

Convert Article URLs to Markdown in a Google Sheet

2026-05-14

The Scenario

You run content operations for a mid-size media company. A developer built a summarization pipeline that takes Markdown as input, but the raw HTML from your 150 archived blog posts consistently breaks the parser. Somebody on the team suggested collecting clean Markdown versions of every article before the pipeline runs, and that somebody was looking at you when they said it.

You have the 150 URLs already organized in column A of a Google Sheet. What you don't have is any interest in pasting webpage source into a converter tool one hundred and fifty times.

The bad version:

  • Open URL 1, view source or use a browser extension to get the HTML, paste it into an online Markdown converter, copy the output, paste it into column B row 1
  • Repeat for URL 2, realize the converter added a bunch of nav-bar Markdown you have to trim manually, trim it, move to URL 3
  • By URL 20 you have stopped trimming the nav Markdown and the pipeline is going to fail on it anyway

Your summarization pipeline is blocked on this. It has been blocked for four days. The Markdown isn't getting cleaner the longer you wait.

The Easy Way: One Prompt in SheetXAI

SheetXAI is an AI agent inside your Google Sheet. It reads your URL list and uses its built-in ScrapeGraph AI integration to run Markdownify on each URL, then writes the full Markdown output into column B automatically.

Paste this into the SheetXAI sidebar:

For every URL in column A, use ScrapeGraph AI Markdownify to convert each webpage to Markdown and paste the full result into column B

What You Get

  • Column B fills with clean Markdown for each article: headings, paragraph text, inline formatting, and links preserved
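  • Most navigation menus, sidebars, and footer content are stripped by Markdownify's extraction logic, so what lands in column B is primarily article content, not page chrome (site-specific boilerplate that slips through can be trimmed with a follow-up prompt)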
  • Rows where the URL is unreachable or the page returns no article body get a note in column B rather than an empty cell

What If the Data Is Not Quite Ready

Some URLs have moved and redirect to 404 pages

Before running Markdownify, check each URL in column A for HTTP status and write the status code into column C; then run Markdownify only on rows where column C contains 200
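
SheetXAI runs this check for you, but if you want to see what it involves, here is a minimal Python sketch using only the standard library. It is our own approximation, not how SheetXAI or ScrapeGraph AI work internally: it assumes a plain GET request is an acceptable way to probe each site (some servers answer automated requests differently), and fetch_status, status_column, and rows_to_convert are hypothetical names. Because urlopen follows redirects, an article that moved and now lands on a 404 page is reported as 404.

```python
from typing import Callable, Iterable, List, Optional
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status code for url, or None if unreachable.

    Hypothetical helper, not a SheetXAI or ScrapeGraph AI API. Assumes a
    plain GET request is an acceptable way to probe every site.
    """
    try:
        # Request() itself raises ValueError for malformed URLs, so build it
        # inside the try block.
        request = Request(url, headers={"User-Agent": "url-status-check/0.1"})
        # urlopen follows redirects: an article that moved and now lands on
        # a 404 page surfaces as an HTTPError with code 404.
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # 4xx/5xx responses; caught before OSError because HTTPError
        # subclasses URLError, which subclasses OSError.
        return err.code
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def status_column(
    urls: Iterable[str],
    get_status: Callable[[str], Optional[int]] = fetch_status,
) -> List[Optional[int]]:
    """Column C: one status code (or None) per URL from column A."""
    return [get_status(url) for url in urls]


def rows_to_convert(urls: List[str], statuses: List[Optional[int]]) -> List[str]:
    """Keep only URLs whose status is exactly 200 for the Markdownify step."""
    return [url for url, status in zip(urls, statuses) if status == 200]
```

Passing get_status in as a parameter keeps the filtering logic testable without network access; a dictionary lookup works as a stand-in checker.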

The Markdown output includes boilerplate text that appears on every article

After writing Markdownify output to column B, scan for text matching "Subscribe to our newsletter" and "Related articles" in every cell and remove those sections; write the trimmed result back to column B
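
The prompt leaves "section" open to interpretation. As a rough sketch of one reasonable reading, the Python below assumes a boilerplate section starts at any line containing one of the marker phrases and runs until the next Markdown heading or the end of the page. strip_boilerplate, BOILERPLATE_MARKERS, and HEADING are illustrative names, and the marker matching is a simple case-insensitive substring test.

```python
import re
from typing import Iterable

# Illustrative markers taken from the prompt above; real sites need their own.
BOILERPLATE_MARKERS = ("Subscribe to our newsletter", "Related articles")

# An ATX Markdown heading: up to 3 spaces, 1-6 '#', then whitespace or end of line.
HEADING = re.compile(r"^ {0,3}#{1,6}(?:\s|$)")


def strip_boilerplate(markdown: str, markers: Iterable[str] = BOILERPLATE_MARKERS) -> str:
    """Drop every section that starts at a line containing a marker.

    Assumption: a boilerplate section runs from the marker line up to (but not
    including) the next Markdown heading, or to the end of the document.
    """
    lowered = [m.lower() for m in markers]
    kept = []
    skipping = False
    for line in markdown.splitlines():
        if any(m in line.lower() for m in lowered):
            skipping = True  # marker found: start dropping lines
            continue
        if skipping and HEADING.match(line):
            skipping = False  # a new heading ends the boilerplate section
        if not skipping:
            kept.append(line)
    return "\n".join(kept).strip()
```

If a site puts its newsletter box mid-article with no heading after it, this rule would also drop the article text that follows, so spot-check a few rows before trusting it across all 150.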

You need the Markdown truncated to the first 500 words for a preview pipeline

Run ScrapeGraph AI Markdownify on each URL in column A, trim the output to the first 500 words, and write the truncated Markdown into column B; write the full word count of the original page into column C
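
Truncating Markdown by words mostly comes down to defining a word. This sketch assumes a word is any run of non-whitespace characters (so Markdown tokens such as # or - count toward the limit) and cuts at the end of the 500th word, which keeps line breaks and formatting before that point intact. truncate_words is a hypothetical helper; its second return value, the word count of the full Markdown, stands in for the original page's word count in column C.

```python
import re
from typing import Tuple

WORD = re.compile(r"\S+")  # assumption: a "word" is any run of non-whitespace


def truncate_words(markdown: str, limit: int = 500) -> Tuple[str, int]:
    """Return (first `limit` words of markdown, total word count).

    Cuts at the end of the limit-th word so line breaks and inline
    formatting before that point are preserved.
    """
    matches = list(WORD.finditer(markdown))
    total = len(matches)
    if limit <= 0:
        return "", total
    if total <= limit:
        return markdown.strip(), total
    cut = matches[limit - 1].end()  # character index just past the limit-th word
    return markdown[:cut].strip(), total
```

A cut can still land inside a multi-word link or an unclosed bold span; if the preview pipeline is strict about valid Markdown, it needs a smarter cut point than this.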

Full pipeline in one shot: check, convert, trim, flag

For all 150 URLs in column A: verify each URL returns a 200 response, run ScrapeGraph AI Markdownify on those that do, trim boilerplate (nav, footer, subscribe prompts), limit output to the first 800 words, write the result into column B, write the word count into column C, and flag any URL that returned a non-200 status in column D with the status code
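
Put together, the one-shot prompt amounts to a loop like the one below. This is a self-contained Python sketch under stated assumptions, not SheetXAI's implementation: the status checker and the Markdownify conversion are passed in as functions (convert is a placeholder; the actual ScrapeGraph AI client call is not reproduced here), boilerplate is any section starting at a line containing "Subscribe to our newsletter" or "Related articles" and running to the next heading, and column C holds the word count of the cleaned article before it is cut to 800 words. Nav and footer text that Markdownify leaves behind would need site-specific rules this sketch does not attempt.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Hypothetical names throughout; this is not SheetXAI's or ScrapeGraph AI's code.
MARKERS = ("subscribe to our newsletter", "related articles")
HEADING = re.compile(r"^ {0,3}#{1,6}(?:\s|$)")
WORD = re.compile(r"\S+")  # assumption: a word is any run of non-whitespace


@dataclass
class Row:
    markdown: str = ""   # column B
    word_count: int = 0  # column C
    flag: str = ""       # column D: status code (or "unreachable") for non-200 URLs


def strip_boilerplate(markdown: str) -> str:
    # Assumed section rule: drop from a marker line up to the next heading.
    kept, skipping = [], False
    for line in markdown.splitlines():
        if any(m in line.lower() for m in MARKERS):
            skipping = True
            continue
        if skipping and HEADING.match(line):
            skipping = False
        if not skipping:
            kept.append(line)
    return "\n".join(kept).strip()


def process_urls(
    urls: Iterable[str],
    get_status: Callable[[str], Optional[int]],
    convert: Callable[[str], str],  # placeholder for the Markdownify call
    limit: int = 800,
) -> List[Row]:
    if limit < 1:
        raise ValueError("limit must be at least 1")
    rows = []
    for url in urls:
        status = get_status(url)
        if status != 200:
            # Skip conversion entirely and record the reason in column D.
            rows.append(Row(flag="unreachable" if status is None else str(status)))
            continue
        cleaned = strip_boilerplate(convert(url))
        words = list(WORD.finditer(cleaned))
        # Keep the first `limit` words; cut just past the last kept word.
        text = cleaned if len(words) <= limit else cleaned[: words[limit - 1].end()]
        rows.append(Row(markdown=text, word_count=len(words)))
    return rows
```

Injecting get_status and convert keeps the loop testable offline; in practice you would pass a real status checker and whichever Markdownify client call you use.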

One prompt that does what would have taken an afternoon of manual conversion work.

Try It

If you have a sheet of article URLs waiting to be converted, get the 7-day free trial of SheetXAI and run ScrapeGraph AI Markdownify across the whole column in one pass. For related workflows, see how to scrape news articles for a PR report or bulk scrape competitor pricing.
