Semantic Scholar · Excel Guide

Resolve Paper Titles to Canonical Semantic Scholar IDs in Excel

2026-05-14
5 min read

The Scenario

A research librarian scraped 200 paper titles from three conference proceedings into an Excel column. The goal is to standardize the bibliography before import into the institution's reference manager — which means each title needs a canonical Semantic Scholar paper ID, DOI, publication year, and venue. The person who set up the original scrape left last month. The titles are inconsistent: some are all-caps, some have HTML entities baked in, some are truncated at 80 characters.

The bad version:

  • Copy title 1 into the Semantic Scholar search bar, find the closest match among several candidates, verify by checking the year and first author, copy the paper ID and DOI from the URL and metadata panel, switch back to the workbook, paste into columns B through E.
  • Title 23 is truncated and returns three plausible matches. You spend 10 minutes verifying which one is correct by cross-referencing another database.
  • After an hour you've processed 18 titles and your wrist hurts.

At that rate, two hundred titles is roughly eleven hours of lookups, not one afternoon's work, and every manual disambiguation judgment is a potential error in your reference database.

The Easy Way: One Prompt in SheetXAI

SheetXAI is an AI agent that lives inside your Excel workbook. It reads the titles in column A, runs each one through Semantic Scholar's title-match endpoint, and writes the canonical paper ID, DOI, year, and venue into columns B through E.
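If you are curious what a single title lookup looks like outside the workbook, here is a minimal Python sketch. It is not SheetXAI's implementation. It assumes the public Semantic Scholar Graph API title-match endpoint (`/graph/v1/paper/search/match`), the `externalIds`, `year`, and `venue` field names, a `matchScore` value on the result, and a 404 response when nothing matches; check those against the current API reference before relying on them. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: endpoint path and field names as documented for the public Graph API.
MATCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search/match"
FIELDS = "paperId,title,externalIds,year,venue"


def build_match_url(title: str) -> str:
    """Return the request URL for one title lookup."""
    query = urllib.parse.urlencode({"query": title, "fields": FIELDS})
    return f"{MATCH_URL}?{query}"


def parse_match(payload: dict) -> Optional[dict]:
    """Pull paper ID, DOI, year, and venue out of a match response."""
    candidates = payload.get("data") or []
    if not candidates:
        return None
    best = candidates[0]
    external_ids = best.get("externalIds") or {}
    return {
        "paper_id": best.get("paperId"),
        "doi": external_ids.get("DOI"),
        "year": best.get("year"),
        "venue": best.get("venue"),
        # Assumption: the response carries a matchScore for the best candidate.
        "match_score": best.get("matchScore"),
    }


def match_title(title: str, timeout: float = 10.0) -> Optional[dict]:
    """Network call. Assumption: the endpoint answers 404 when no title matches."""
    try:
        with urllib.request.urlopen(build_match_url(title), timeout=timeout) as resp:
            return parse_match(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Keeping URL building and response parsing separate from the network call means you can check both against a saved response before sending 200 requests.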

Here is the prompt for this task:

Match every title in the PaperTitles sheet to a Semantic Scholar record and fill in the DOI, year, and first author so I have a clean deduplicated bibliography in Excel

What You Get

  • Columns filled for each title in the PaperTitles worksheet: DOI, Year, First Author.
  • Rows where the title match confidence is low are flagged with a note in a Status column rather than silently assigned to a wrong paper.
  • DOI values arrive in standard format — ready for direct import into your reference manager.
  • Year arrives as a four-digit number, not a string.
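To make the last two guarantees concrete, here is a small Python sketch of the checks involved: coercing a year to a four-digit integer and labelling a row by match confidence. The function names, the status labels other than Needs Review, and the `min_score` cutoff are all hypothetical. The article does not say how SheetXAI scores confidence, so the cutoff is left for the caller to choose.

```python
def coerce_year(value):
    """Return a four-digit int, or None if the value is not a plausible year."""
    # Spreadsheet cells often hand back whole numbers as floats (2019.0).
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    try:
        year = int(str(value).strip())
    except (TypeError, ValueError):
        return None
    return year if 1000 <= year <= 9999 else None


def status_for(match_score, min_score):
    """Label a row by comparing its match score to a caller-chosen cutoff."""
    if match_score is None:
        return "No Match"
    return "OK" if match_score >= min_score else "Needs Review"
```

Returning `None` for an implausible year leaves the cell empty instead of writing a value your reference manager would reject on import.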

What If the Data Is Not Quite Ready

Titles contain HTML entities or encoding artifacts from the scrape

Before matching, clean each title in the PaperTitles worksheet by decoding HTML entities and stripping non-ASCII artifacts, then match each cleaned title to Semantic Scholar and write DOI, year, and first author into adjacent columns
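For reference, the cleanup step in that prompt can be sketched in a few lines of Python using only the standard library. This is an illustration, not what SheetXAI runs internally, and `clean_title` is a hypothetical name. One deliberate choice: it removes invisible control and formatting characters but keeps legitimate non-ASCII letters, since stripping every non-ASCII character would damage titles with accented words.

```python
import html
import re
import unicodedata


def clean_title(raw: str) -> str:
    """Decode HTML entities and remove invisible scrape artifacts from a title."""
    text = html.unescape(raw)                   # "&amp;" becomes "&", "&nbsp;" becomes U+00A0
    text = unicodedata.normalize("NFKC", text)  # fold compatibility forms, e.g. no-break space to space
    text = re.sub(r"\s", " ", text)             # turn tabs and newlines into plain spaces first
    # Drop control and format characters (Unicode categories starting with "C"),
    # such as zero-width spaces left behind by the scrape.
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("C"))
    return " ".join(text.split())               # collapse runs of spaces and trim the ends
```

Whitespace is converted to spaces before control characters are dropped, so a stray newline separates two words instead of fusing them.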

Some titles are clearly truncated at 80 characters and need a fuzzy match strategy

For each title in the PaperTitles worksheet, attempt an exact Semantic Scholar title match first; if the result confidence is low, flag it as Needs Review in a Status column rather than writing a match — for all high-confidence matches, write paper ID, DOI, year, and first author into adjacent columns
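One way to reason about the truncated rows is sketched below in Python. It compares the scraped text with the title of the matched record and separates three cases: the titles agree, the scraped text is an 80-character prefix of the matched title, or the two disagree. The function names and the wording of the status notes are hypothetical, and the 80-character limit comes from the scenario above. A prefix match is still sent to review, because several papers can share the same first 80 characters.

```python
import re

TRUNCATION_LIMIT = 80  # from the scenario: the scrape cut titles at 80 characters


def _key(text):
    """Lowercase and reduce punctuation and spacing so titles compare loosely."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def looks_truncated(title, limit=TRUNCATION_LIMIT):
    """Treat any title that reaches the limit as possibly cut off."""
    return len(title) >= limit


def review_status(scraped, matched_title):
    """Return a Status note for one row, given the matched record's title."""
    if matched_title is None:
        return "Needs Review: no match"
    scraped_key, matched_key = _key(scraped), _key(matched_title)
    if scraped_key == matched_key:
        return "OK"
    if looks_truncated(scraped) and matched_key.startswith(scraped_key):
        return "Needs Review: truncated prefix match"
    return "Needs Review: title mismatch"
```

The comparison key ignores case and punctuation, so an all-caps scrape of a title still counts as an exact match.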

You need to cross-reference matched DOIs against a local database to find duplicates

After matching each title in the PaperTitles worksheet to a Semantic Scholar record and writing the DOI, check each DOI against the ExistingRefs worksheet column A and mark matches as Duplicate in a Status column
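The duplicate check itself is a set lookup once the DOIs are written the same way on both sides. A minimal Python sketch follows, with hypothetical function names. It assumes DOIs in either worksheet may carry a `https://doi.org/` or `doi:` prefix and may differ in letter case, which is safe to ignore because DOIs are case-insensitive.

```python
import re

# Resolver and label prefixes that sometimes precede a bare DOI.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw):
    """Strip prefixes and lowercase so equivalent DOIs compare equal."""
    if raw is None:
        return ""
    return _DOI_PREFIX.sub("", str(raw).strip()).lower()


def mark_duplicates(matched_dois, existing_dois):
    """Return one Status value per matched DOI: 'Duplicate' or an empty string."""
    existing = {normalize_doi(d) for d in existing_dois} - {""}
    return ["Duplicate" if normalize_doi(d) in existing else "" for d in matched_dois]
```

Blank cells are removed from the lookup set, so a row with no DOI is never marked Duplicate just because the ExistingRefs column also contains blanks.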

Clean titles, match, flag low-confidence rows, and check against a master list in one pass

Decode HTML entities in the PaperTitles worksheet titles, run each through Semantic Scholar title-matching, write paper ID, DOI, year, and first author into adjacent columns, mark low-confidence matches as Needs Review in a Status column, and flag any DOI that already appears in the MasterBib worksheet as Already Imported in a separate column

Try It

Get the 7-day free trial of SheetXAI and open any Excel workbook with a column of scraped or inconsistent paper titles. Ask SheetXAI to resolve every title to its canonical Semantic Scholar record — and deliver a clean bibliography ready for import.

See also: Batch Enrich Paper IDs With Metadata and the Semantic Scholar hub overview.
