The Scenario
You are an HR tech developer. Your team's skill-matching engine has been ingesting resume data for six months, and the skills column in the training dataset is a mess. "ML," "machine learning," "Machine Learning," and "deep learning" are all treated as distinct skills. The canonical skill taxonomy your model expects is the one from People Data Labs (PDL). You have an Excel workbook with 100 raw skill strings that need normalizing before the next model training run. Your tech lead mentioned it in standup. It is now the end of the week.
The bad version:
- Write a fuzzy-match script against a static skill list you downloaded — it handles the obvious synonyms but doesn't map to PDL's taxonomy categories, which is what the model actually needs
- Look up each edge case in PDL's API console manually and copy the canonical name and category back into the Excel file
- Get to skill 70 and realize that "data science" and "machine learning," which your static list was treating as equivalent, resolve to different PDL categories, so now you have to re-check the first 70 entries
You've spent a day on a preprocessing step that the model retraining is waiting on.
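To see why the fuzzy-match route stalls, here is a minimal sketch using Python's standard-library difflib. The skill list and the `fuzzy_match` helper are hypothetical stand-ins for whatever static list you downloaded, not anything PDL ships.

```python
import difflib

# Hypothetical static skill list, standing in for the downloaded file.
STATIC_SKILLS = ["machine learning", "deep learning", "data science", "python"]


def fuzzy_match(raw, choices=STATIC_SKILLS, cutoff=0.8):
    """Return the closest static-list entry, or None if nothing clears the cutoff."""
    cleaned = raw.strip().lower()
    matches = difflib.get_close_matches(cleaned, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Casing variants and small typos resolve to a list entry...
print(fuzzy_match("Machine Learning"))
print(fuzzy_match("machine learnin"))
# ...but the result is only a string. There is no taxonomy category attached,
# and an abbreviation like "ML" shares too few characters to match at all.
print(fuzzy_match("ML"))
```

The match gives you a cleaner string and nothing else, so the category the model needs still has to come from somewhere.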
The Easy Way: One Prompt in SheetXAI
SheetXAI is an AI agent that lives inside your Excel workbook. It reads the raw skills column and uses PDL's skill enrichment endpoint to normalize each entry and write back a canonical skill name and skill type.
Normalize every skill in column A of my Resume Skills table using People Data Labs skill enrichment, and populate the Canonical Name and Skill Type columns.
What You Get
- The Canonical Name column populated with PDL's standardized skill name for each raw input
- The Skill Type column populated with PDL's taxonomy category — programming language, framework, tool, domain knowledge, soft skill, and so on
- Entries that PDL cannot confidently classify flagged in a Notes column so you can decide how to handle them in the training pipeline
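If it helps to picture the output, the three columns above could be sketched roughly like this. The `enrich` callable, the stub taxonomy, and the category labels are all assumptions made for illustration; they do not reflect PDL's actual response format or how SheetXAI works internally.

```python
from typing import Callable, Optional, Tuple

# Hypothetical result shape: (canonical_name, skill_type), or None when the
# lookup cannot classify the input. This is NOT PDL's real response format.
EnrichResult = Optional[Tuple[str, str]]


def build_rows(raw_skills, enrich: Callable[[str], EnrichResult]):
    """Build one output row per raw skill, flagging unclassified entries in Notes."""
    rows = []
    for raw in raw_skills:
        result = enrich(raw)
        if result is None:
            # Leave the enriched columns blank and flag the row for review.
            name, skill_type, note = "", "", "Unclassified: review manually"
        else:
            name, skill_type = result
            note = ""
        rows.append({
            "Raw Skill": raw,
            "Canonical Name": name,
            "Skill Type": skill_type,
            "Notes": note,
        })
    return rows


# Stub lookup with made-up entries, used only to exercise the sketch.
FAKE_TAXONOMY = {
    "ml": ("machine learning", "domain knowledge"),
    "pytorch": ("pytorch", "framework"),
}


def fake_enrich(raw):
    return FAKE_TAXONOMY.get(raw.strip().lower())


for row in build_rows(["ML", "PyTorch", "underwater basket weaving"], fake_enrich):
    print(row)
```

Keeping the flagged rows in the sheet, rather than dropping them, leaves the decision about how to handle them with you.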
What If the Data Is Not Quite Ready
Some cells contain comma-separated skill lists rather than individual skills
Before enriching, split any multi-skill cells in column A on commas into separate rows, keeping the row number of the original entry in a Source Row column. Then run PDL skill enrichment on each individual skill and populate Canonical Name and Skill Type.
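The split step in that prompt can be sketched in a few lines of Python. The function name and the `first_row=2` default (which assumes a header in row 1) are illustrative choices, not part of SheetXAI.

```python
def split_multi_skill_cells(cells, first_row=2):
    """Split comma-separated cells into one record per skill.

    cells: column A values in sheet order.
    first_row: sheet row number of the first value (2 assumes a header row).
    """
    records = []
    for offset, cell in enumerate(cells):
        for part in str(cell).split(","):
            skill = part.strip()
            if skill:  # skip empty pieces from trailing or doubled commas
                records.append({"Skill": skill, "Source Row": first_row + offset})
    return records


for record in split_multi_skill_cells(["Python, SQL", "ML", "Excel,, Tableau,"]):
    print(record)
```

Each output record keeps the row number of the cell it came from, so enriched skills can be traced back to the original entry.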
You want to collapse synonyms before they reach the model
Enrich each skill via PDL and populate Canonical Name and Skill Type. Identify any rows in the Canonical Name column that share the same value and mark the duplicates in a Synonym column as "Synonym of row X" so the pipeline can merge them before the training run.
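As a rough sketch of the synonym-marking logic in that prompt: the first row with a given canonical name is kept, and every later row with the same name points back to it. The function name and the header-row assumption are hypothetical.

```python
def mark_synonyms(canonical_names, first_row=2):
    """Return one Synonym-column value per row.

    The first occurrence of a canonical name gets an empty string; later
    occurrences get "Synonym of row X", where X is the first occurrence.
    first_row=2 assumes the sheet has a header in row 1.
    """
    first_seen = {}
    notes = []
    for offset, name in enumerate(canonical_names):
        row = first_row + offset
        key = name.strip().lower()
        if not key:
            notes.append("")  # blank canonical names are never synonyms
        elif key in first_seen:
            notes.append(f"Synonym of row {first_seen[key]}")
        else:
            first_seen[key] = row
            notes.append("")
    return notes


print(mark_synonyms(["machine learning", "python", "Machine Learning", ""]))
```

Comparing on a lowercased, trimmed key is a design choice here; if the canonical names are already consistent in casing, an exact comparison would do.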
Some entries are job titles that ended up in the skills column by mistake
For each entry in column A, attempt PDL skill enrichment. If PDL classifies the entry as a job title rather than a skill, flag it in a Type column as "Job title — exclude" and leave Canonical Name and Skill Type blank.
Normalize skills, add a model weight, flag titles for exclusion, and output a training-ready format
Normalize each skill via PDL and populate Canonical Name and Skill Type. Flag job titles in a Type column as "Exclude." For remaining skills, add a Weight column for the model: 3 for programming languages and frameworks, 2 for tools and platforms, 1 for domain knowledge and soft skills.
One prompt normalizes the taxonomy, adds weights, and filters out noise — the model retraining gets a clean input file without a separate preprocessing step.
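The weighting rule from that prompt amounts to a small lookup table. In the sketch below, the exact skill type labels are assumptions (lowercase, singular); the real values in your Skill Type column may be spelled differently and would need matching keys.

```python
# Weights from the prompt: 3 for programming languages and frameworks,
# 2 for tools and platforms, 1 for domain knowledge and soft skills.
# The label spellings are assumed for illustration.
WEIGHTS = {
    "programming language": 3,
    "framework": 3,
    "tool": 2,
    "platform": 2,
    "domain knowledge": 1,
    "soft skill": 1,
}


def assign_weight(skill_type, excluded=False):
    """Return the model weight for a skill type.

    Returns None for rows flagged for exclusion (job titles) and for skill
    types that are not in the table, so they can be reviewed separately.
    """
    if excluded:
        return None
    return WEIGHTS.get(skill_type.strip().lower())


print(assign_weight("Framework"))
print(assign_weight("soft skill"))
print(assign_weight("programming language", excluded=True))
```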
Try It
Get the 7-day free trial of SheetXAI and open any Excel workbook with a column of raw skill strings from resumes, surveys, or applicant tracking exports. Ask it to normalize each skill via PDL and write canonical names and skill types back. Then see how to deduplicate a merged contact list using PDL identity resolution or go back to the People Data Labs overview.
