How we know what we know
Legal hiring data has a specific failure mode. The records that look most authoritative tend to be the ones that have been laundered the most, repackaged from a press release into a database into an aggregator into a sales tool, each handoff stripping the original timestamp and any sense of which other facts were known at the time. By the time the record reaches a buyer it carries the confidence of a fact and the provenance of a rumor. We built this index because we wanted the opposite: a record that walks back to its sources, in the response payload, no support ticket required.
What follows is how the 11,068 active legal-sector openings, the 1,200 firm dossiers, and the 16,405 firms in the broader graph get assembled. There are four pillars: a typed source taxonomy, a refresh cadence tuned to how the legal market actually moves, a confidence score calibrated against confirmed historic moves, and an audit trail that ships with every record.
01 · Source taxonomy
We classify every signal that lands in the index into one of six source classes. The classes are not equally trustworthy. The class is part of the record.
Firm announcements are the canonical record set by the firm itself: partner-promotion press releases, lateral-arrival notices on the firm's own newsroom, and the trade-press hits those announcements seed in the same news cycle. Lag is hours. Signal strength is 5 of 5 on confirmed-fact questions ("did this person join this firm") but only 3 of 5 on derived questions ("what practice group will they sit in"), because firm announcements often cite a marquee practice when the lawyer's day-to-day work spans several. We dedupe by canonicalizing the announcement URL and matching person plus firm plus event-date inside a 14-day window.
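The dedupe rule for this class can be sketched as follows. This is a minimal illustration, not the production layer: the field names (`url`, `person`, `firm`, `event_date`), the exact canonicalization rules, and the choice to treat a URL match or a subject match as sufficient on its own are all assumptions.

```python
from datetime import date
from urllib.parse import urlsplit, urlunsplit

WINDOW_DAYS = 14  # from the text: person + firm + event-date inside a 14-day window


def canonicalize_url(url: str) -> str:
    """Lowercase scheme and host, drop query, fragment, and trailing slash (assumed rules)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def same_event(a: dict, b: dict) -> bool:
    """Treat two announcements as one event if their canonical URLs match,
    or if person and firm match and the event dates fall within the window."""
    if canonicalize_url(a["url"]) == canonicalize_url(b["url"]):
        return True
    same_subject = (a["person"].casefold(), a["firm"].casefold()) == (
        b["person"].casefold(),
        b["firm"].casefold(),
    )
    return same_subject and abs((a["event_date"] - b["event_date"]).days) <= WINDOW_DAYS


# Hypothetical inputs: a firm newsroom item and a trade-press hit six days later.
newsroom = {"url": "https://Example.com/news/item/?utm_source=x", "person": "A. Lawyer",
            "firm": "Firm LLP", "event_date": date(2025, 9, 14)}
trade_press = {"url": "https://trade.example.org/story", "person": "a. lawyer",
               "firm": "firm llp", "event_date": date(2025, 9, 20)}
print(same_event(newsroom, trade_press))
```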
Bar admissions are the slowest class and the most decisive. State bar rolls, reciprocity grants, and pro hac vice filings only move when an attorney has actually committed to practicing in a jurisdiction. Lag is two to six weeks. Signal strength is 5 of 5 on jurisdictional facts, blind to everything else. Dedupe is on normalized name plus admission date plus bar number across 50 state bars and the federal courts.
Professional-network deltas are the events surfaced when a public profile updates a job title or a firm affiliation. Fast, often same-day, but the noisiest class: people update their public profile out of order with their actual employment, sometimes weeks early, sometimes months late, sometimes never. Signal strength is 3 of 5. We use this as a tripwire that triggers a recheck against slower, higher-confidence sources, never as a sole source for a confirmed move.
Employer-direct postings are roles a firm has chosen to publish on its own canonical hiring channel. We harvest directly from the firm's own feed, never through a broker, which is why our 1,775 live data feeds map one-to-one to the firms that operate them. Lag is real-time on creation, two to seven days on takedown. Signal strength is 5 of 5 on existence and 3 of 5 on the implicit seniority encoded in titles like "Counsel" or "Senior Associate," which vary between firms.
Court appearance records are filings naming counsel of record, captured from federal and state docket systems. They are the highest-evidence signal in the taxonomy because they require a signed document from the lawyer, on behalf of an actual client, in an actual case. Lag is one to four days. Signal strength is 5 of 5 on the firm-of-record question.
Conference rosters are CLE faculty lists, panel rosters, named honorees, and bar-section leadership announcements. A lawyer named to next spring's faculty is being placed in a category by people who work alongside them, which makes this a strong signal for sub-practice membership even when it is silent on employer or seniority. Signal strength is 4 of 5 for sub-practice, 2 of 5 for anything else.
The six classes feed a single unified record schema. Nothing reaches a customer without passing through the deduplication and confidence layers:
firm announcements ─────┐
bar admissions ─────────┤
professional-net deltas ┤    ┌────────────────┐    ┌──────────────┐    ┌─────────────┐
employer-direct postings┼───>│ normalize +    │───>│ ensemble     │───>│ unified     │
court appearances ──────┤    │ dedupe layer   │    │ confidence   │    │ record +    │
conference rosters ─────┘    └────────────────┘    │ scoring      │    │ signals[]   │
                                                   └──────────────┘    └─────────────┘
A record can enter with one signal and accumulate more over time. It cannot enter with zero, and it is dropped once its most recent supporting signal is older than that signal's class-specific decay window.
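The retention rule can be expressed as a small check. The sketch below is illustrative only: the decay windows are hypothetical placeholders (the actual per-class values are not stated here), and the signal dictionaries use assumed `class` and `ts` keys.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical decay windows per source class; real values are not published in this section.
DECAY_WINDOW = {
    "firm_announcement": timedelta(days=90),
    "professional_net_delta": timedelta(days=30),
}


def is_retained(signals: list, now: datetime) -> bool:
    """Keep a record only if it has at least one signal and the newest
    signal is still inside the decay window for its own source class."""
    if not signals:
        return False  # a record cannot exist with zero supporting signals
    newest = max(signals, key=lambda s: s["ts"])
    return now - newest["ts"] <= DECAY_WINDOW[newest["class"]]


now = datetime(2025, 10, 22, tzinfo=timezone.utc)
fresh = [{"class": "firm_announcement", "ts": datetime(2025, 9, 14, tzinfo=timezone.utc)}]
aged = [{"class": "professional_net_delta", "ts": datetime(2025, 9, 1, tzinfo=timezone.utc)}]
print(is_retained(fresh, now), is_retained(aged, now))
```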
02 · Refresh cadence
The index runs on three clocks: a daily full pull at 04:00 UTC, an hourly delta job, and a real-time webhook fan-out reserved for the top customer tier.
The 04:00 UTC choice is not arbitrary. American firms and the trade press that covers them publish overnight. By 04:00 UTC (midnight Eastern during daylight saving time, 11 PM Eastern the rest of the year), every announcement that was going to land has either landed or been embargoed into the next cycle. A customer pulling their first dashboard at 7 AM Eastern is reading numbers that already include the previous evening's promotions, press releases, and bar admission updates. An earlier pull guarantees stale numbers; a later one cuts into the morning the customer actually uses the data.
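The Eastern wall-clock time of the full pull shifts by an hour with daylight saving. A quick way to check the conversion, using fixed UTC offsets for EDT and EST rather than a time zone database (a simplification chosen so the snippet has no external dependencies):

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), "EDT")  # Eastern Daylight Time, UTC-4
EST = timezone(timedelta(hours=-5), "EST")  # Eastern Standard Time, UTC-5


def full_pull_local(year: int, month: int, day: int, tz: timezone) -> datetime:
    """Return the 04:00 UTC full pull for the given date in the given fixed-offset zone."""
    return datetime(year, month, day, 4, 0, tzinfo=timezone.utc).astimezone(tz)


summer = full_pull_local(2025, 7, 15, EDT)
winter = full_pull_local(2025, 1, 15, EST)
print(summer.strftime("%Y-%m-%d %H:%M %Z"))  # 2025-07-15 00:00 EDT
print(winter.strftime("%Y-%m-%d %H:%M %Z"))  # 2025-01-14 23:00 EST
```

Either way the pull completes well before a 7 AM Eastern dashboard load.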
The hourly delta is the latency floor for our middle tier. Between full pulls we re-poll only the classes that move on an hourly timescale: employer-direct postings and professional-network deltas. Bar admissions and conference rosters do not move hourly, and we do not pretend they do by re-polling and watching for nothing. A buyer at the $2,995 tier is using the data for active outreach and cannot wait until tomorrow's pull to learn that a tracked role has been filled or just opened.
The real-time webhook fan-out is for the top tier. When a record changes, the change is queued for push within seconds, with a sub-90-second SLA from commit to delivery. This is the latency that matters for journalists chasing a developing story and for litigation funders pricing a deal where a senior hire could shift the calculus on whether a firm has the bench.
00:00 UTC ──── 04:00 ──── 08:00 ──── 12:00 ──── 16:00 ──── 20:00 ──── 24:00
                 │
                 └─ full pull (all 6 classes)
hourly delta:    │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
                 (employer-direct + professional-network classes only)
webhook fan-out: ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲
                 (event-driven, sub-90-second SLA, top tier only)
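The two scheduled clocks reduce to a simple rule about which source classes are polled at a given moment. The sketch below assumes class identifiers in the style of the sample response (`bar_admission` is a guessed identifier) and assumes jobs fire at the top of the hour; the event-driven webhook path is not modeled.

```python
from datetime import datetime, timezone

ALL_CLASSES = [
    "firm_announcement", "bar_admission", "professional_net_delta",
    "employer_direct", "court_appearance", "conference_roster",
]
# Only the classes that move on an hourly timescale are re-polled between full pulls.
HOURLY_CLASSES = ["employer_direct", "professional_net_delta"]


def classes_to_poll(now: datetime) -> list:
    """Full pull at 04:00 UTC, hourly delta at every other top of the hour, nothing otherwise."""
    now = now.astimezone(timezone.utc)
    if now.minute != 0:
        return []
    return ALL_CLASSES if now.hour == 4 else HOURLY_CLASSES


print(classes_to_poll(datetime(2025, 10, 22, 4, 0, tzinfo=timezone.utc)))
print(classes_to_poll(datetime(2025, 10, 22, 13, 0, tzinfo=timezone.utc)))
```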
03 · Confidence scoring
Every record carries a confidence score between 0 and 1, the output of a logistic-regression-style ensemble over the signals that support it. We chose this model family deliberately: it is interpretable, per-signal contributions add linearly into the log-odds, and we publish the weights. A buyer who wants to reproduce our score can do it from the weights and the signal contributions in the response.
Features: the count of confirming signals; the source class of each (carrying the strength weight from section 01); recency under a class-specific exponential decay; a structural-pattern flag for cases where the same fact appears in two independent feeds with the same wording (a strong indicator that one is paraphrasing the other rather than confirming it); and a corroboration bonus when signals from two or more source classes agree.
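To make the model family concrete, here is a minimal sketch of a logistic score with class weights, exponential recency decay, and a corroboration bonus. Every number in it (weights, half-lives, intercept, bonus) is a hypothetical placeholder rather than a published weight, and the signal-count and structural-pattern features are omitted for brevity.

```python
import math

# Hypothetical parameters, for illustration only.
CLASS_WEIGHT = {"firm_announcement": 1.5, "court_appearance": 1.2, "professional_net_delta": 0.5}
HALF_LIFE_DAYS = {"firm_announcement": 60.0, "court_appearance": 90.0, "professional_net_delta": 14.0}
BIAS = -2.0                # intercept in log-odds space
CORROBORATION_BONUS = 0.4  # added when two or more source classes agree


def signal_log_odds(source_class: str, age_days: float) -> float:
    """Class weight shrunk by exponential decay with a class-specific half-life."""
    return CLASS_WEIGHT[source_class] * 0.5 ** (age_days / HALF_LIFE_DAYS[source_class])


def confidence(signals: list) -> float:
    """Add per-signal terms linearly in log-odds space, then apply the logistic function."""
    log_odds = BIAS + sum(signal_log_odds(c, age) for c, age in signals)
    if len({c for c, _ in signals}) >= 2:
        log_odds += CORROBORATION_BONUS
    return 1.0 / (1.0 + math.exp(-log_odds))


one_class = confidence([("firm_announcement", 0.0)])
two_classes = confidence([("firm_announcement", 0.0), ("court_appearance", 0.0)])
print(round(one_class, 3), round(two_classes, 3))
```

Because each term enters the log-odds additively, removing a signal or zeroing a weight changes the score in a way a reader can trace by hand.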
Calibration is the part most data products skip. A score is only useful if a 0.85 means roughly the same thing every time. We calibrate against the 18 months of prior placements JMS Talent Acquisition closed and confirmed: several thousand ground-truth labels of moves that did or did not happen within 90 days of the signal first appearing. We re-fit quarterly. At the most recent fit, a 0.85 corresponds to a confirmed move within 90 days roughly 84% of the time; a 0.50 to roughly 47%; a 0.20 to roughly 18%. The scores are honest about being scores.
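A calibration check of this kind compares predicted scores against observed outcomes bucket by bucket. The sketch below uses equal-width bins and made-up toy data; the binning scheme and the data are assumptions, not a description of the quarterly re-fit.

```python
def calibration_table(scores: list, outcomes: list, n_bins: int = 5) -> list:
    """Group scores into equal-width bins and return (mean score, observed rate, count)
    for each non-empty bin. A calibrated model has mean score close to observed rate."""
    bins = [[] for _ in range(n_bins)]
    for score, outcome in zip(scores, outcomes):
        idx = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes in the top bin
        bins[idx].append((score, outcome))
    rows = []
    for bucket in bins:
        if bucket:
            mean_score = sum(s for s, _ in bucket) / len(bucket)
            observed = sum(y for _, y in bucket) / len(bucket)
            rows.append((mean_score, observed, len(bucket)))
    return rows


# Toy data: 1 means the move was confirmed within the labeling window, 0 means it was not.
toy_scores = [0.85, 0.85, 0.85, 0.85, 0.20, 0.20, 0.20, 0.20, 0.20]
toy_outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 1]
for mean_score, observed, count in calibration_table(toy_scores, toy_outcomes):
    print(f"mean score {mean_score:.2f}  observed rate {observed:.2f}  n={count}")
```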
A worked example. The record below describes a mid-2025 lateral move that scored 0.92.
| Signal | Class | Weight | Contribution |
|---|---|---|---|
| Firm newsroom announcement | firm announcement | 0.30 | +0.30 |
| Trade press hit, same news cycle | firm announcement | 0.05 (corroboration only) | +0.05 |
| Employer-direct posting taken down | employer-direct | 0.18 | +0.18 |
| Federal docket appearance under new firm | court appearance | 0.22 | +0.22 |
| Public-profile title change | professional-network delta | 0.10 | +0.10 |
| Multi-class corroboration bonus | (meta) | 0.07 | +0.07 |
The six contributions sum to 0.92, the confidence reported on the record. A buyer can verify each signal against its source URL and, if they disagree with one of the inputs, recompute the score with that signal removed. This is the minimum bar for a number that is allowed to be called "confidence."
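The six contributions in the table add up to 0.92. The sketch below recomputes the score with one input removed, assuming the recomputation is a plain sum over the remaining contributions; that assumption may not match the production mapping, and the dictionary keys are labels invented for this example.

```python
# Contributions copied from the worked-example table; keys are illustrative labels.
CONTRIBUTIONS = {
    "firm_newsroom_announcement": 0.30,
    "trade_press_hit": 0.05,
    "employer_direct_takedown": 0.18,
    "federal_docket_appearance": 0.22,
    "public_profile_title_change": 0.10,
    "multi_class_corroboration_bonus": 0.07,
}


def score_without(contributions: dict, excluded: tuple = ()) -> float:
    """Sum the contributions, skipping any the reader has chosen to reject."""
    total = sum(v for name, v in contributions.items() if name not in excluded)
    return round(total, 2)  # round away floating-point noise


print(score_without(CONTRIBUTIONS))                                    # 0.92
print(score_without(CONTRIBUTIONS, ("public_profile_title_change",)))  # 0.82
```

Note that dropping enough signals could also remove the grounds for the corroboration bonus; this sketch leaves that adjustment to the reader.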
04 · Audit trail
Every record carries a signals[] array. Each entry carries the source URL, the observation timestamp, the source class, and the contribution that signal made to the final confidence number. Nothing is summarized. If a URL goes dead we keep the timestamp and mark the entry stale rather than deleting it, because the historical fact that we observed the page on a given date is itself part of the audit trail.
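The keep-and-mark behavior for dead URLs can be sketched with an immutable entry type. The `stale` field name is hypothetical (the sample response does not show one); the other fields mirror the entry described above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Signal:
    source_class: str
    url: Optional[str]
    ts: Optional[str]
    contribution: float
    stale: bool = False  # hypothetical flag: the URL no longer resolves


def mark_stale(signals: list, dead_url: str) -> list:
    """Flag entries whose URL has gone dead. Nothing is deleted, so the
    observation timestamp stays in the audit trail."""
    return [replace(s, stale=True) if s.url == dead_url else s for s in signals]


trail = [
    Signal("firm_announcement", "https://example.com/a", "2025-09-14T08:31Z", 0.30),
    Signal("court_appearance", "https://example.com/b", "2025-09-29T16:14Z", 0.22),
]
updated = mark_stale(trail, "https://example.com/a")
print([(s.source_class, s.stale) for s in updated])
```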
When sources disagree, we publish the disagreement in a parallel disagreements[] array. The most common disagreement is on practice-group assignment: a firm announcement places a lateral in one practice but a conference roster or the early court-appearance pattern places them in another. Roughly 6% of recently confirmed lateral moves carry at least one disagreement entry. We do not paper over them. Buyers, particularly journalists and litigation-funder principals, have told us repeatedly that the disagreements are more interesting than the consensus, because a disagreement is often the early signal that a firm's stated strategy and its actual hiring pattern have diverged.
Middle- and top-tier customers can subscribe to disagreement events. The webhook fires when one is detected, when one resolves, or when one persists past 30 days, which is itself usually a sign that something more complicated than a misclassification is going on. We retain 18 months of evidence on every record.
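The three webhook triggers can be sketched as a function over a disagreement entry and its previous state. The event names, the `resolved` status value, and the `already_flagged` guard are assumptions for illustration; only the `open` status and the `first_seen` field appear in the sample response.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

PERSIST_AFTER = timedelta(days=30)  # from the text: fires when one persists past 30 days


def disagreement_events(entry: dict, previous_status: Optional[str], now: datetime,
                        already_flagged: bool = False) -> list:
    """Return the webhook events (hypothetical names) that this state change would trigger."""
    events = []
    if previous_status is None and entry["status"] == "open":
        events.append("disagreement.detected")
    if previous_status == "open" and entry["status"] == "resolved":
        events.append("disagreement.resolved")
    overdue = now - entry["first_seen"] > PERSIST_AFTER
    if entry["status"] == "open" and overdue and not already_flagged:
        events.append("disagreement.persisted")
    return events


entry = {"field": "practice", "status": "open",
         "first_seen": datetime(2025, 10, 9, 12, 0, tzinfo=timezone.utc)}
print(disagreement_events(entry, None, datetime(2025, 10, 9, 12, 5, tzinfo=timezone.utc)))
print(disagreement_events(entry, "open", datetime(2025, 11, 15, tzinfo=timezone.utc)))
```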
How to read a record
A response from the index looks roughly like the following. The fields are self-describing.
{
  "record_id": "rec_lat_2025_8e4f",
  "record_type": "lateral_move",
  "subject": {
    "name": "(redacted)",
    "current_firm": "(redacted)",
    "prior_firm": "(redacted)",
    "practice": "complex commercial litigation",
    "office": "New York, NY"
  },
  "confidence": 0.92,
  "first_observed": "2025-09-14T08:31:00Z",
  "last_observed": "2025-10-22T04:00:00Z",
  "next_recheck": "2025-10-23T04:00:00Z",
  "signals": [
    { "class": "firm_announcement", "url": "https://...", "ts": "2025-09-14T08:31Z", "contribution": 0.30 },
    { "class": "firm_announcement", "url": "https://...", "ts": "2025-09-14T11:02Z", "contribution": 0.05 },
    { "class": "employer_direct", "url": "https://...", "ts": "2025-09-15T04:00Z", "contribution": 0.18 },
    { "class": "court_appearance", "url": "https://...", "ts": "2025-09-29T16:14Z", "contribution": 0.22 },
    { "class": "professional_net_delta", "url": "https://...", "ts": "2025-10-02T13:48Z", "contribution": 0.10 },
    { "class": "_corroboration_bonus", "url": null, "ts": null, "contribution": 0.07 }
  ],
  "disagreements": [
    {
      "field": "practice",
      "value_a": "complex commercial litigation",
      "value_a_source_class": "firm_announcement",
      "value_b": "international arbitration",
      "value_b_source_class": "conference_roster",
      "first_seen": "2025-10-09T12:00:00Z",
      "status": "open"
    }
  ]
}
The record carries everything a buyer needs to decide whether to act on it. A buyer who only acts above 0.80 with zero open disagreements has a different filter than one who chases the open disagreements because they are leading indicators. We do not enforce a default.
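The two buyer filters described above can be written against the response shape directly. This is a sketch over a trimmed copy of the sample record; the function names are invented for the example, and the 0.80 threshold is the one mentioned in the text.

```python
def open_disagreements(record: dict) -> list:
    """Return the disagreement entries that are still open."""
    return [d for d in record.get("disagreements", []) if d.get("status") == "open"]


def conservative_filter(record: dict, min_confidence: float = 0.80) -> bool:
    """Act only above the threshold and with zero open disagreements."""
    return record["confidence"] > min_confidence and not open_disagreements(record)


def leading_indicator_filter(record: dict) -> bool:
    """Chase records precisely because sources disagree."""
    return bool(open_disagreements(record))


# Trimmed version of the sample record, keeping only the fields the filters read.
sample = {
    "record_id": "rec_lat_2025_8e4f",
    "confidence": 0.92,
    "disagreements": [{"field": "practice", "status": "open"}],
}
print(conservative_filter(sample), leading_indicator_filter(sample))
```

On the sample record the conservative filter rejects and the leading-indicator filter accepts, which is the divergence the paragraph above describes.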
What's not in the index (and why)
Three categories are deliberately excluded. First, candidate compensation, unless the firm itself has posted it. We cannot confirm a number that originated in a private negotiation, and the broker-network rumors about lawyers' comp packages fail our public-evidence-chain rule. When a firm publishes a salary band on a posted role we capture it in the firm's own words with the URL. Otherwise, we leave it out.
Second, attorneys still in active negotiation. Much of the most interesting movement is happening in conversations that have not yet produced a firm announcement, a bar transfer, or a court appearance. Our 47-node sub-practice and role-family taxonomy is rich enough that we could publish strong guesses about who is talking to whom. We do not. The index is an index of confirmed events; speculation belongs in the trade press.
Third, anything sourced from broker-only feeds. Some vendors sell aggregated movement data without disclosing its origin. We do not buy those feeds and we do not republish records that depend on them. Every signal in our index has a public URL or a publicly verifiable filing behind it. The rule costs us coverage. It also means every record we ship is a record a buyer can independently audit.
Closing: why this matters
A buyer who can audit every record stops having to choose between trusting the vendor and verifying the data. A BD director using our outbound benchmark, where the live signal is 412 replies per week at 3.4 times the templated baseline, can defend the methodology to a managing partner without a deck from us. A journalist chasing a practice group's quiet rebuild can publish the disagreements alongside the consensus. A litigation funder pricing a matter can run its own filters over the confidence distribution rather than accepting whichever records a vendor chose to surface.
None of this requires faith in us. It requires the audit trail to be present, the weights to be published, the disagreements to be visible, and the source URLs to walk back to the original evidence. Those four conditions are the product. The index is where they are all simultaneously true.