How to Automate Client Job Postings as a Recruiting Agency
If your firm is running 5+ active retained or contingency clients, you are probably losing 6 to 10 hours a week to one task nobody talks about: keeping up with which jobs are still open at each client.
Recruiters refresh client career pages twice a day. They paste roles into spreadsheets. They send "is this still open?" emails on Tuesday morning. By Friday, half of those URLs are dead and the candidate they teed up went to a competitor who pulled the listing first.
This guide is the playbook we wired into placement.solutions to kill that tax for good. You can copy the architecture, build the scraper yourself, or just use ours. Either way, the principles are the same.
Why manual job tracking breaks at scale
A solo recruiter with two clients can survive on bookmarks and a Notion page. By client number five, the math falls apart. A typical AmLaw 200 firm runs 30 to 80 active openings at any moment and posts a new role every 3 to 7 days. Multiply that by 8 active retained clients and you are tracking 400+ live URLs that change daily.
Every miss costs you a placement. Every "did this close?" email costs you trust. And every weekend you spend refreshing career pages is a weekend you are not closing candidates.
The agencies that scale past 12 clients always solve this with software. Here is how.
The four-layer architecture for automated job ingestion
A reliable client-side scraping system has four moving parts.
- A scheduler that decides when to pull each client (per-client cadence based on hiring volume).
- A scraper layer that adapts to the client's ATS (Workday, Greenhouse, Lever, iCIMS, Taleo, custom).
- A normalizer that cleans titles, locations, salary ranges, and practice areas into a consistent schema.
- A change detector that compares today's pull to yesterday's and emits events: NEW, UPDATED, REMOVED.
If you skip the change detector, your CRM becomes a graveyard of dead URLs within 60 days. If you skip the normalizer, you cannot run cross-client search ("show me all senior litigation roles in Texas"). If you skip ATS detection, every new client takes 3 days to onboard instead of 30 minutes.
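Here is a minimal Python sketch of how the four layers might plug together. The JobPosting fields, the EventType enum, and run_cycle are hypothetical names chosen for illustration, not our production code. Each layer is passed in as a plain function, so you can swap in your own scheduler, scrapers, normalizer, and change detector.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional


class EventType(Enum):
    NEW = "NEW"
    UPDATED = "UPDATED"
    REMOVED = "REMOVED"


@dataclass(frozen=True)
class JobPosting:
    """One normalized job in a shared cross-client schema (illustrative fields)."""
    client: str
    title: str
    seniority: Optional[str]
    state: Optional[str]
    practice_area: Optional[str]
    salary_min: Optional[int]
    salary_max: Optional[int]
    url: str


@dataclass(frozen=True)
class ChangeEvent:
    kind: EventType
    job: JobPosting


def run_cycle(
    clients: Iterable[str],
    is_due: Callable[[str], bool],                        # layer 1: scheduler
    scrape: Callable[[str], list[dict]],                  # layer 2: ATS-aware scraper (raw dicts)
    normalize: Callable[[str, dict], JobPosting],         # layer 3: normalizer
    detect: Callable[[str, list[JobPosting]], list[ChangeEvent]],  # layer 4: change detector
) -> list[ChangeEvent]:
    """Run one pass of all four layers over every client whose pull is due."""
    events: list[ChangeEvent] = []
    for client in clients:
        if not is_due(client):
            continue  # the scheduler says this client was pulled recently enough
        raw_jobs = scrape(client)
        jobs = [normalize(client, raw) for raw in raw_jobs]
        events.extend(detect(client, jobs))
    return events
```

Keeping the layers behind function boundaries means onboarding a new client only requires adding a scraper recipe. The scheduler and the detector stay untouched.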
ATS detection: the unsexy 80% of the problem
Most career pages run on one of a dozen or so ATS platforms. The trick is detecting which one from a single URL so you can pick the right scraper recipe automatically. Here is the lookup we use:
| ATS | URL signal | Notes |
|---|---|---|
| Workday | {tenant}.wd{N}.myworkdayjobs.com (e.g. acme.wd1.myworkdayjobs.com) | JSON API behind the JS; bypass the SPA |
| Greenhouse | boards.greenhouse.io/{slug} | Stable public JSON at boards-api.greenhouse.io/v1/boards/{slug}/jobs |
| Lever | jobs.lever.co/{slug} | Public API at api.lever.co/v0/postings/{slug} |
| iCIMS | careers-{slug}.icims.com | HTML scrape, anti-bot moderate |
| Taleo | tbe.taleo.net or *.taleo.net | Painful, paginated, session cookies |
| Greenhouse Job Boards | job-boards.greenhouse.io | Newer surface, same JSON |
| SmartRecruiters | careers.smartrecruiters.com | Public API at api.smartrecruiters.com/v1/companies/{slug}/postings |
| BambooHR | {slug}.bamboohr.com/jobs | Decent HTML, light anti-bot |
| Paycom | paycom.com/auth/recruit | SPA, requires real browser |
| UKG | recruiting.ultipro.com | Iframe pattern, requires Playwright |
| Jobvite | jobs.jobvite.com | Mostly clean HTML |
| Ashby | jobs.ashbyhq.com | Public GraphQL endpoint |
For anything else, fall back to a generic Playwright crawl with content extraction.
If you want a working URL-to-ATS detector in code, the open source project job-board-detector is a good starting point. We wrote our own around the same patterns.
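If you would rather start from scratch, the lookup table translates into a short, ordered list of patterns. This is a minimal sketch that assumes the URL shapes in the table above hold for your clients. detect_ats and ATS_PATTERNS are hypothetical names. Anything unmatched returns "generic", so it can fall through to the Playwright crawl.

```python
import re
from urllib.parse import urlparse

# Ordered (ATS, pattern) pairs matched against "host + path".
# Patterns mirror the lookup table; they are illustrative, not exhaustive.
ATS_PATTERNS = [
    ("Workday", re.compile(r"^([a-z0-9-]+\.)*myworkdayjobs\.com/")),
    ("Greenhouse Job Boards", re.compile(r"^job-boards\.greenhouse\.io/")),
    ("Greenhouse", re.compile(r"^boards\.greenhouse\.io/")),
    ("Lever", re.compile(r"^jobs\.lever\.co/")),
    ("iCIMS", re.compile(r"^careers-[a-z0-9-]+\.icims\.com/")),
    ("Taleo", re.compile(r"^([a-z0-9-]+\.)*taleo\.net/")),
    ("SmartRecruiters", re.compile(r"^careers\.smartrecruiters\.com/")),
    ("BambooHR", re.compile(r"^[a-z0-9-]+\.bamboohr\.com/jobs")),
    ("Paycom", re.compile(r"^([a-z0-9-]+\.)*paycom\.com/auth/recruit")),
    ("UKG", re.compile(r"^recruiting\.ultipro\.com/")),
    ("Jobvite", re.compile(r"^jobs\.jobvite\.com/")),
    ("Ashby", re.compile(r"^jobs\.ashbyhq\.com/")),
]


def detect_ats(url: str) -> str:
    """Return the ATS name for a careers URL, or "generic" for the Playwright fallback."""
    parsed = urlparse(url if "://" in url else f"https://{url}")
    host = (parsed.hostname or "").lower()  # hostname is already lowercased; be explicit
    path = parsed.path or "/"
    target = f"{host}{path}"
    for name, pattern in ATS_PATTERNS:
        if pattern.search(target):
            return name
    return "generic"
```

Order matters when patterns overlap, so keep the most specific hosts at the top of the list.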
Scheduling: why you should not just run cron every hour
The instinct is to throw every client into a cron job that fires hourly. Two problems.
First, you will get rate-limited. Workday and iCIMS in particular ban IPs that hit them too aggressively, and once you are banned the recovery timeline is days, not hours.
Second, hiring volume varies wildly. A 600-attorney firm posts new roles daily. A 40-attorney boutique posts twice a quarter. Hitting both on the same cadence wastes compute and risks the boutique's IT team filing a complaint.
The right pattern: track a last_change_at timestamp per client and stretch the pull interval as that timestamp ages, in the spirit of exponential backoff. Pull a client every 30 minutes if its jobs changed in the last day, every 4 hours if nothing has changed in the last day, and once a day if nothing has changed in a week.
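Here is one way to express that tiered cadence. It assumes you store last_pull_at and last_change_at per client, and the function names are hypothetical. Strictly speaking, this is a stepped schedule that approximates exponential backoff. It does not double the interval on every quiet pull.

```python
from datetime import datetime, timedelta


def pull_interval(last_change_at: datetime, now: datetime) -> timedelta:
    """Back off as a client's job list stays unchanged (tiers from the text; tune per client)."""
    quiet_for = now - last_change_at
    if quiet_for < timedelta(days=1):
        return timedelta(minutes=30)  # jobs changed within the last day: pull often
    if quiet_for < timedelta(days=7):
        return timedelta(hours=4)     # quiet for a day or more
    return timedelta(days=1)          # quiet for a week or more


def is_due(last_pull_at: datetime, last_change_at: datetime, now: datetime) -> bool:
    """True when enough time has passed since the last pull for this client's tier."""
    return now - last_pull_at >= pull_interval(last_change_at, now)
```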
Anti-bot survival rules
If you scrape any career page in 2026 you will hit one of: Cloudflare Turnstile, Akamai bot manager, PerimeterX, or a homegrown rate limiter. The four rules that keep you out of trouble:
- Use real Chromium via Playwright, not raw HTTP. The TLS fingerprint matters more than the user agent string.
- Respect robots.txt for any path you cannot prove is a public listing endpoint.
- Throttle to no more than 1 request per 3 seconds per host. We use jittered 5 to 12 second gaps.
- Rotate IPs only if necessary, and never from datacenter ranges. Residential proxies cost more but work.
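The throttle and robots.txt rules are easy to centralize in one gate that every scraper calls before it navigates. Below is a minimal version that uses only the Python standard library and rests on a few assumptions. HostThrottle and robots_allows are hypothetical helpers, and downloading robots.txt is left to you. The clock and sleep functions are injectable, so the logic can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Optional
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # mark the rules as loaded so can_fetch() trusts them
    return parser.can_fetch(user_agent, url)


class HostThrottle:
    """Enforce a jittered gap (5 to 12 s by default) between requests to the same host."""

    def __init__(
        self,
        min_gap: float = 5.0,
        max_gap: float = 12.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
        rng: Optional[random.Random] = None,
    ) -> None:
        if not 3.0 <= min_gap <= max_gap:
            raise ValueError("need 3.0 <= min_gap <= max_gap (3 s per host is the floor)")
        self.min_gap, self.max_gap = min_gap, max_gap
        self.clock, self.sleep = clock, sleep
        self.rng = rng or random.Random()
        self._next_allowed: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until this URL's host may be hit again; return the seconds slept."""
        host = (urlparse(url).hostname or "").lower()
        now = self.clock()
        delay = max(0.0, self._next_allowed.get(host, now) - now)
        if delay > 0:
            self.sleep(delay)
        # Schedule the next allowed slot a random gap after this request goes out.
        self._next_allowed[host] = now + delay + self.rng.uniform(self.min_gap, self.max_gap)
        return delay
```

Call throttle.wait(url) immediately before each Playwright page.goto(url). Because the next slot is drawn at random per request, the gaps between your hits on a host vary instead of landing on a fixed interval.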
For deeper detail on staying invisible, see our companion post How to scrape client career pages without getting blocked.
Normalization: where most agencies cut corners and pay later
Raw scraped data is messy. The same role might show up as "Sr. Associate Attorney - Litigation" at one firm and "Senior Associate, Commercial Litigation" at another. Locations come back as "New York, NY" or "NYC" or "Manhattan" or "Greater New York Area." Salary fields are sometimes ranges, sometimes single numbers, sometimes "DOE."
Build a normalizer that handles the top 40 patterns and falls back to an LLM for the long tail. Specifically: collapse seniority titles into 6 buckets (Of Counsel, Partner, Senior Associate, Associate, Junior Associate, Staff Attorney), map all city aliases to a canonical state code, and parse salary ranges into salary_min and salary_max integers.
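The rule-based layer might look like the sketch below. It covers only a handful of patterns, not our full set of 40, and the alias table and regexes are illustrative assumptions. Anything that comes back as None is what you would hand to the LLM fallback, which is not shown.

```python
import re
from typing import Optional

# Ordered rules: the first pattern that matches wins, so specific titles come first.
SENIORITY_RULES = [
    ("Of Counsel", re.compile(r"\bof counsel\b")),
    ("Partner", re.compile(r"\bpartner\b")),
    ("Staff Attorney", re.compile(r"\bstaff attorney\b")),
    ("Senior Associate", re.compile(r"\b(senior|sr\.?)\s+associate\b")),
    ("Junior Associate", re.compile(r"\b(junior|jr\.?)\s+associate\b")),
    ("Associate", re.compile(r"\bassociate\b")),
]

# Hypothetical alias table; a real one would hold many more entries.
LOCATION_ALIASES = {"nyc": "NY", "manhattan": "NY", "greater new york area": "NY", "dfw": "TX"}

US_STATES = set(
    "AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO "
    "MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY".split()
)
STATE_SUFFIX = re.compile(r",\s*([A-Za-z]{2})$")
SALARY_AMOUNT = re.compile(r"\$?\s*(\d[\d,]*(?:\.\d+)?)\s*([kK])?")


def bucket_seniority(title: str) -> Optional[str]:
    """Collapse a raw title into one of six buckets; None means 'send to the LLM fallback'."""
    lowered = title.lower()
    for bucket, pattern in SENIORITY_RULES:
        if pattern.search(lowered):
            return bucket
    return None


def canonical_state(location: str) -> Optional[str]:
    """Map 'NYC', 'Manhattan', or 'New York, NY' style strings to a two-letter state code."""
    cleaned = location.strip()
    alias = LOCATION_ALIASES.get(cleaned.lower())
    if alias:
        return alias
    match = STATE_SUFFIX.search(cleaned)
    if match and match.group(1).upper() in US_STATES:
        return match.group(1).upper()
    return None


def parse_salary(raw: str) -> tuple[Optional[int], Optional[int]]:
    """Turn '$180,000 - $220,000', '$200k', or 'DOE' into (salary_min, salary_max)."""
    amounts = []
    for number, thousands in SALARY_AMOUNT.findall(raw or ""):
        value = float(number.replace(",", ""))
        if thousands:
            value *= 1000
        amounts.append(int(value))
    if not amounts:
        return None, None  # 'DOE', 'Competitive', and similar
    return min(amounts), max(amounts)
```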
If you skip this you will spend the rest of your career doing string matching in your candidate-search queries. We did, and we regret it.
Change detection that actually works
Hash the title plus location plus first 500 chars of description. Store the hash with the job_id. On the next pull:
- New (title, company, location) tuple? Emit NEW.
- Existing tuple with a new hash? Emit UPDATED.
- Existing tuple no longer in the pull? Emit REMOVED, but only after 2 consecutive missing pulls (a single missed pull is often just a 503).
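Here is a compact version of that detector. It assumes each scraped job arrives as a dict with title, company, location, and description keys. ChangeDetector and content_hash are hypothetical names. For brevity, the sketch keys its stored state on the (title, company, location) tuple instead of a separate job_id, and it leaves out persistence to a database.

```python
import hashlib
from dataclasses import dataclass, field

Key = tuple[str, str, str]  # (title, company, location)


def content_hash(title: str, location: str, description: str) -> str:
    """SHA-256 over title + location + the first 500 characters of the description."""
    payload = "\x1f".join([title, location, description[:500]])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class ChangeDetector:
    removed_after_misses: int = 2  # one missed pull is often just a 503
    _hashes: dict = field(default_factory=dict, init=False, repr=False)
    _misses: dict = field(default_factory=dict, init=False, repr=False)

    def diff(self, jobs: list[dict]) -> list[tuple[str, Key]]:
        """Compare this pull to the stored state and return (event, key) pairs."""
        events: list[tuple[str, Key]] = []
        seen: set[Key] = set()
        for job in jobs:
            key = (job["title"], job["company"], job["location"])
            seen.add(key)
            new_hash = content_hash(job["title"], job["location"], job["description"])
            old_hash = self._hashes.get(key)
            if old_hash is None:
                events.append(("NEW", key))
            elif old_hash != new_hash:
                events.append(("UPDATED", key))
            self._hashes[key] = new_hash
            self._misses[key] = 0  # seen again, so reset the consecutive-miss counter
        for key in list(self._hashes):
            if key in seen:
                continue
            self._misses[key] = self._misses.get(key, 0) + 1
            if self._misses[key] >= self.removed_after_misses:
                events.append(("REMOVED", key))
                del self._hashes[key]
                del self._misses[key]
        return events
```

Title and location are part of the key, so in this sketch a hash change effectively signals a description edit. A retitled role surfaces as a NEW event, plus a REMOVED event once the old tuple has been missing for two consecutive pulls.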
Pipe these events into your CRM, your candidate-matching system, and ideally a Slack channel for the partner who owns the client. Recruiters love getting a real-time ping when a new role drops at their account.
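For the Slack leg, a Slack incoming webhook accepts a JSON body with a text field, and <@USER_ID> in that text mentions a user. The sketch below assumes you have already created such a webhook and know the owning partner's Slack user ID. slack_payload, post_to_slack, and fan_out are hypothetical helpers, and the CRM and matching sinks would be your own functions.

```python
import json
import urllib.request
from typing import Callable, Optional


def slack_payload(event: str, client: str, title: str, url: str, owner_id: Optional[str] = None) -> dict:
    """Build an incoming-webhook body; '<@USER_ID>' is Slack's user-mention syntax."""
    mention = f"<@{owner_id}> " if owner_id else ""
    return {"text": f"{mention}[{event}] {client}: {title} {url}"}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the JSON payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def fan_out(event: dict, sinks: list[Callable[[dict], None]]) -> None:
    """Hand the same change event to every downstream consumer (CRM, matcher, Slack, etc.)."""
    for sink in sinks:
        sink(event)
```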
Build vs buy: the honest tradeoff
Building this in-house takes 4 to 6 weeks of senior engineering for the first 80% and another 6 to 9 months to harden the long tail. The ongoing maintenance is roughly 8 hours a month of an engineer's time as ATSes update their UIs and break your scrapers.
If you have engineering bandwidth and like control, build it. If you would rather spend that time placing candidates, placement.solutions ships this out of the box for $99 to $499 per month with auto-recovery scrapers and 100+ ATS adapters already built. Try the pricing page for the full breakdown.
What to track in the first 30 days after you wire it up
- Coverage: percent of client jobs that appear in your system within 24 hours of being posted.
- Stale rate: percent of jobs in your system marked active that are actually closed on the client site.
- Time to first candidate: median hours from a NEW event to the first candidate submission.
A healthy system runs 95%+ coverage, under 3% stale rate, and median time-to-first-candidate inside 6 hours. If you miss any of these targets, look at your scheduler cadence first, then your normalizer.
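If you want these numbers computed rather than eyeballed, the sketch below shows one way to do it. It assumes you can export posting times from the client sites, ingestion and NEW-event times from your system, and first-submission times from your CRM. All function names are hypothetical, and the thresholds are the targets stated above.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


def coverage(posted_at: dict[str, datetime], ingested_at: dict[str, datetime]) -> float:
    """Share of client jobs that reached our system within 24 hours of being posted."""
    if not posted_at:
        return 1.0
    on_time = sum(
        1
        for job_id, posted in posted_at.items()
        if job_id in ingested_at and ingested_at[job_id] - posted <= timedelta(hours=24)
    )
    return on_time / len(posted_at)


def stale_rate(active_in_system: set[str], open_on_client_site: set[str]) -> float:
    """Share of jobs we mark active that the client has actually closed."""
    if not active_in_system:
        return 0.0
    return len(active_in_system - open_on_client_site) / len(active_in_system)


def median_hours_to_first_candidate(
    new_event_at: dict[str, datetime], first_submission_at: dict[str, datetime]
) -> Optional[float]:
    """Median hours from a NEW event to the first submission, over jobs that have one."""
    hours = [
        (first_submission_at[job_id] - created).total_seconds() / 3600
        for job_id, created in new_event_at.items()
        if job_id in first_submission_at
    ]
    return median(hours) if hours else None


def missed_targets(cov: float, stale: float, ttfc: Optional[float]) -> list[str]:
    """Targets: 95%+ coverage, under 3% stale, median time-to-first-candidate within 6 hours."""
    misses = []
    if cov < 0.95:
        misses.append("coverage")
    if stale >= 0.03:
        misses.append("stale_rate")
    if ttfc is None or ttfc > 6:
        misses.append("time_to_first_candidate")
    return misses
```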
The real ROI
Recruiters using automated job tracking submit candidates 2 to 4 days faster on average than the manual cohort. In contingency placement at $25k average fee, that translates to roughly one extra placement per recruiter per quarter. For a 5-recruiter shop that is $500k in additional revenue per year from a system that costs less than one recruiter's coffee budget.
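The revenue figure follows from the numbers above: one extra placement per recruiter per quarter, four quarters, five recruiters, and a $25k average fee. For comparison, the $99 to $499 monthly price quoted earlier annualizes as follows:

```latex
\begin{aligned}
\text{extra revenue} &= 5 \text{ recruiters} \times 4 \tfrac{\text{placements}}{\text{recruiter}\cdot\text{year}} \times \$25{,}000 = \$500{,}000 \text{ per year} \\
\text{subscription cost} &= (\$99 \text{ to } \$499 \text{ per month}) \times 12 = \$1{,}188 \text{ to } \$5{,}988 \text{ per year}
\end{aligned}
```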
About placement.solutions: We are an AI recruiting platform built specifically for legal recruiting agencies. 11,000+ active law firm jobs scraped daily, FAISS-indexed semantic candidate matching, and one-click client portal sharing. Sign up free to wire your client career pages into our system in 30 seconds.
Want to see this in action?
Sign up for free and connect your first client career page in under a minute. We will scrape it tonight and have all open jobs in your dashboard before your morning coffee.
Related reading: