Scrapfly · Google Sheets Guide

Bulk Extract Article Headline and Author From URLs in a Google Sheet

2026-05-14
5 min read

The Scenario

You took on a content audit three weeks ago. You have 80 industry blog post URLs in column A — a mix of competitor pieces, industry news, and analyst takes your team has been bookmarking for months. Your VP asked for a structured content library by end of week: headline, author, publish date, summary, all organized in a table so the team can reference it during planning.

It is Thursday afternoon.

The bad version:

  • Open each URL and hunt down the headline (sometimes in the title tag, sometimes in an H1, sometimes in an OG tag), the author byline (sometimes missing entirely), and the publish date (sometimes hidden in metadata), then write a two-sentence summary by hand for each article
  • Get through 30 articles and realize your "summary" column is getting inconsistent — some cells have three sentences, some have one, some are more like notes to yourself
  • Spend the final two hours before the deadline reformatting everything to look like it belongs in the same table

Nobody hired you to be a data entry clerk for blog metadata. You have actual analysis to do with this content once it is organized.

The Easy Way: One Prompt in SheetXAI

SheetXAI is an AI agent that lives inside your Google Sheet. It reads the URLs in column A, calls Scrapfly's article extraction model for each one, and writes the structured metadata directly into your sheet — headline, author, date, and summary — without you opening a single browser tab.

For each URL in column A, use Scrapfly's article extraction model to pull the headline, author, publish date, and article summary, then write the four fields into columns B, C, D, and E

What You Get

  • Column B contains the article headline as extracted from the page
  • Column C contains the byline author name where present, or a note indicating no author was found
  • Column D contains the publish date in a consistent format across all rows
  • Column E contains a two-sentence summary of the article content generated during extraction
  • Rows where the URL returned an error or the page was not an article are flagged in column F rather than left blank
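
If it helps to picture the row layout described above, here is a minimal Python sketch of how one URL's result could map onto columns A through F. It is only an illustration: the `build_row` function, the `extracted` dictionary keys, and the "No author found" and "Not an article" wording are all assumptions, not SheetXAI's or Scrapfly's actual output format.

```python
from typing import Optional

# Assumed wording for the column C note; the real note may differ.
NO_AUTHOR_NOTE = "No author found"


def build_row(url: str, extracted: Optional[dict], error: Optional[str] = None) -> list:
    """Return the six cells (columns A-F) for one URL.

    `extracted` is a hypothetical dict with the keys
    "headline", "author", "date", and "summary".
    """
    if error or not extracted:
        # Flag failures in column F instead of leaving the row silently blank.
        return [url, "", "", "", "", error or "Not an article"]
    return [
        url,                                        # A: source URL
        extracted.get("headline", ""),              # B: headline
        extracted.get("author") or NO_AUTHOR_NOTE,  # C: author or note
        extracted.get("date", ""),                  # D: publish date
        extracted.get("summary", ""),               # E: summary
        "",                                         # F: empty when nothing went wrong
    ]
```

The point of the explicit column F value is that a failed row is distinguishable from a row that simply has not been processed yet.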

What If the Data Is Not Quite Ready

Some articles are behind a soft paywall and return truncated content

If your URL list includes publications that show a preview and then cut off, the summary extraction will be partial. Tell SheetXAI to note this rather than silently return incomplete summaries:

For each URL in column A, use Scrapfly article extraction to pull headline, author, date, and summary into columns B through E — if the article content appears truncated or paywalled, write Truncated in column F so I can handle those manually
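
For a sense of what "appears truncated or paywalled" could mean in practice, the following Python sketch shows one rough heuristic. The marker phrases, the 150-word threshold, and the `truncation_flag` name are all assumptions chosen for illustration; the check SheetXAI actually performs is not documented here and may work differently.

```python
# Assumed phrases that often follow a paywall preview; extend for your sources.
PAYWALL_MARKERS = (
    "subscribe to continue",
    "sign in to read",
    "already a subscriber",
)

# Assumed minimum length for a full article body, in words.
MIN_WORDS = 150


def truncation_flag(article_text: str) -> str:
    """Return "Truncated" if the text looks cut off, else an empty string."""
    lowered = article_text.lower()
    if any(marker in lowered for marker in PAYWALL_MARKERS):
        return "Truncated"
    if len(article_text.split()) < MIN_WORDS:
        return "Truncated"
    return ""
```

A heuristic like this will misfire on genuinely short posts, which is one reason to route flagged rows to manual review rather than discard them.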

The publish dates are coming back in inconsistent formats

Some sites return "May 14, 2026" and others return "14-05-26" or a Unix timestamp. If column D is messy:

For each URL in column A, extract headline, author, publish date, and summary using Scrapfly — normalize all dates in column D to YYYY-MM-DD format regardless of how they appear on the page
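
To make the normalization concrete, here is a small Python sketch that converts the three shapes mentioned above into YYYY-MM-DD. The list of formats is an assumption, "14-05-26" is read as day-month-year, month names are assumed to be English, and any purely numeric string is treated as a Unix timestamp in UTC. Unrecognized values come back empty so they can be reviewed by hand.

```python
from datetime import datetime, timezone

# Formats tried in order; this list is an assumption, extend it as needed.
KNOWN_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%b %d, %Y", "%d-%m-%y", "%d/%m/%Y")


def normalize_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, or "" if the format is not recognized."""
    value = raw.strip()
    if value.isdigit():
        # Purely numeric input is treated as a Unix timestamp (UTC).
        timestamp = int(value)
        if timestamp > 10**11:
            # Values this large are assumed to be milliseconds.
            timestamp //= 1000
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return ""


print(normalize_date("May 14, 2026"))  # 2026-05-14
print(normalize_date("14-05-26"))      # 2026-05-14
```

Short numeric dates are inherently ambiguous ("05-06-26" could be May 6 or June 5), so it is worth spot-checking a few rows in column D against the source pages.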

You want to filter by recency before extracting

Your 80 URLs include articles going back several years. You only need content published in the last 18 months for this audit:

For each URL in column A, use Scrapfly article extraction to pull the headline, author, publish date, and summary — only write results into columns B through E if the publish date is after November 2024, and skip older articles
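
The 18-month window works out as follows: counting back 18 calendar months from this guide's date of 2026-05-14 lands on 2024-11-14, which is where "November 2024" in the prompt comes from. The Python sketch below shows that arithmetic and a simple recency check; the `months_before` and `is_recent` helpers are illustrative names, and the check assumes dates have already been normalized to YYYY-MM-DD.

```python
import calendar
from datetime import date


def months_before(anchor: date, months: int) -> date:
    """Return the date `months` calendar months before `anchor`.

    The day is clamped to the last valid day of the target month,
    so March 31 minus one month gives the last day of February.
    """
    total = anchor.year * 12 + (anchor.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(anchor.day, last_day))


def is_recent(iso_date: str, cutoff: date) -> bool:
    """True if a YYYY-MM-DD date falls on or after the cutoff."""
    return date.fromisoformat(iso_date) >= cutoff


cutoff = months_before(date(2026, 5, 14), 18)
print(cutoff)  # 2024-11-14
```

Note that the prompt's "after November 2024" is a looser boundary than an exact cutoff date, so say explicitly in your prompt which one you want if articles from mid-November matter.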

You want the full extraction plus topic tagging and a relevance score in one pass

For each URL in column A, use Scrapfly article extraction to pull headline, author, date, and summary into columns B through E, then in column F assign one of these topic tags: AI Tools, Data Infrastructure, Growth, or Other — and in column G rate the article's relevance to a B2B SaaS audience on a scale of 1 to 5

One prompt covers the extraction, the classification, and the scoring.

Try It

Get the 7-day free trial of SheetXAI and open your sheet with blog URLs in column A. Then ask it to run Scrapfly article extraction across all 80 rows and populate the metadata columns in one pass. If you also need to pull pricing data from SaaS pages, see the guide on extracting SaaS pricing tiers.
