Medical AI Training Data
Specialist medical training data, written by practicing physicians.
Custom datasets for AI labs and medical AI companies: SFT responses, eval items, red-team examples, and rubric grading, written and reviewed by board-certified clinicians.
Prompt
“I’m a 31-year-old woman. I’ve been getting headaches almost every week for the past two months. They’re on one side of my head, throbbing, and last about 6–8 hours. Sometimes I feel nauseous and light bothers me...”
Response (excerpt)
“What you’re describing is consistent with migraine, a neurological condition that affects roughly 1 in 8 people. The combination of features you mention fits a classic migraine pattern closely...”
What We Produce
Six task types, each designed for a distinct stage of the AI training pipeline. All produced by credentialed clinicians, not annotators.
SFT Response Writing: Patient-Facing
Physician writes an ideal expert response to a patient's medical question, suitable for training consumer-facing AI assistants.
SFT Response Writing: Clinician-Facing
Physician writes a clinical reasoning response to a colleague-style prompt, for models deployed in clinical decision support contexts.
Eval Dataset Construction
Physician writes board-style multiple-choice questions with detailed explanations, used as benchmarks to measure model medical knowledge. Output is compatible with HealthBench-style evaluation formats.
Red-Team Examples
Physician constructs scenarios and prompts that reliably surface model failures: hallucinations, scope violations, and unsafe omissions.
Rubric Grading
Physician grades AI-generated responses against a structured 5-dimension rubric, producing preference pairs and quality labels for RLHF.
Custom Knowledge Capture
Physician writes narrative reflections capturing tacit clinical reasoning: pattern recognition, heuristics, and decision thresholds that resist formalization.
Multimodal Annotation: Audio, Video, Medical Imaging
Coming Q3 2026 (in development): Physician annotation of radiology images, clinical audio, and procedural video. Contact us if this is relevant to your roadmap.
Specialties
Current sample dataset covers three specialties. Additional specialties are available on request. We source credentialed clinicians for your specific domain.
Currently Active
- Family Medicine
- General Surgery
- Internal Medicine
Available on Request
- Cardiology
- Pediatrics
- Emergency Medicine
- Obstetrics & Gynecology
- Psychiatry
- Dermatology
- Radiology
- Pathology
- Anesthesiology
- Orthopedic Surgery
- Neurology
- Others on request
How We Work
Straightforward engagement model. No platform, no marketplace. A single team, directly managed.
Scoping Call
We start with a 30-minute call to understand your task types, specialties, volume, timeline, and any format requirements. If you already have a schema or style guide, we adapt to it. Output: a written scope document with task definitions, delivery format, and a sample task for your sign-off before we begin.
Pilot Delivery
We deliver a pilot batch of 50–200 tasks within 2–4 weeks, depending on specialty and complexity. Pilots are priced at cost. The goal is to give you enough data to evaluate quality and run your own internal evals before committing to scale. We iterate on format and rubric based on your feedback.
Scaled Engagement
After pilot sign-off, we ramp to your target volume. The same physicians who did the pilot continue through the scaled engagement, so quality stays consistent rather than averaged across a rotating pool. Delivery is in batches with QA reports included.
Methodology note
Our task design and quality standards are aligned with the evaluation framework established in OpenAI's HealthBench Professional (2026), which benchmarks physician-level medical reasoning in AI systems. Buyers evaluating vendors against HealthBench-style criteria will find our methodology directly compatible.
Quality & Methodology
Every task goes through a structured two-clinician review process. Quality is scored, not assumed.
QA Process
1. Primary clinician writes the task response from scratch, with no AI assistance in production runs.
2. Reviewing clinician scores all dimensions blind to the author's identity and checks for auto-fail conditions.
3. Tasks scoring below 18/25 or triggering any auto-fail are returned for revision before delivery.
4. Final scores, reviewer notes, and auto-fail flags are embedded in the task JSON, reviewable by the buyer.
Auto-Fail Criteria
- AF1: Dangerous recommendation (incorrect dose, missed emergency)
- AF2: Critical red flag omitted for a presentation requiring emergency evaluation
- AF3: Scope violation (definitive diagnosis or prescription in a patient-facing context)
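The revision gate described in the QA process (a score floor plus auto-fail flags) reduces to a few lines of logic. The sketch below is illustrative only; the field names (`rubric_scores`, `auto_fail_flags`) are hypothetical, not our actual delivery schema:

```python
# Illustrative sketch of the review gate: tasks scoring below 18/25, or
# carrying any auto-fail flag (AF1-AF3), are returned for revision.
# Field names are hypothetical, not the actual delivery schema.

PASS_THRESHOLD = 18  # minimum total out of 25

def delivery_status(task: dict) -> str:
    """Return 'deliver' or 'revise' for a reviewed task record."""
    if task.get("auto_fail_flags"):  # any AF1-AF3 flag blocks delivery outright
        return "revise"
    total = sum(task["rubric_scores"].values())  # five dimensions, 0-5 each
    return "deliver" if total >= PASS_THRESHOLD else "revise"

sample = {
    "rubric_scores": {
        "clinical_accuracy": 5,
        "differential_reasoning": 4,
        "red_flags_safety": 5,
        "workup_management": 5,
        "communication_quality": 5,
    },
    "auto_fail_flags": [],
}
```

Here `delivery_status(sample)` returns `"deliver"` (24/25, no auto-fails); adding any flag to `auto_fail_flags` flips the result to `"revise"` regardless of score.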
Rubric Approach 1: Fixed 5-Dimension
Applied to all SFT response tasks. Every response is scored on the same five dimensions regardless of clinical scenario, enabling cross-task comparison and aggregate quality reporting.
| Dimension | What's Evaluated |
|---|---|
| Clinical Accuracy | Reflects current standard-of-care; no factual errors or outdated guidance. |
| Differential & Reasoning | Explicit consideration of alternatives, not just a single conclusion. |
| Red Flags & Safety | All clinically relevant warning signs identified with specific escalation instructions. |
| Workup & Management | Evidence-based next steps, appropriately tiered for presentation severity. |
| Communication Quality | Plain language, logical structure, appropriate register for the intended audience. |
Rubric Approach 2: Example-Specific
Applied to eval items, red-team tasks, and custom knowledge tasks where a fixed rubric is too coarse. The reviewing clinician writes task-specific scoring criteria during task design, so evaluation reflects what matters for that particular clinical question rather than a generic template.
| Tier | Score (fixed rubric) |
|---|---|
| Acceptable | 18+/25 |
| Strong | 22+/25 |
| Exemplary | 24+/25 |
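The tier cutoffs in the table above reduce to a simple threshold check; a minimal sketch (the function name is illustrative):

```python
# Map a fixed-rubric total (out of 25) to the quality tiers in the table above.
def quality_tier(total: int) -> str:
    if total >= 24:
        return "Exemplary"
    if total >= 22:
        return "Strong"
    if total >= 18:
        return "Acceptable"
    return "Below threshold"  # returned for revision per the QA process
```

For example, both SFT samples in the dataset section total 24/25 and land in the Exemplary tier.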
Sample Dataset
Three tasks from the Q2 2026 pilot dataset. Full prompt, response, and rubric scores. All content written by credentialed clinicians.
Download full sample dataset (JSON) [placeholder link]
Prompt
“I'm a 31-year-old woman. I've been getting headaches almost every week for the past two months. They're on one side of my head, throbbing, and last about 6–8 hours. Sometimes I feel nauseous and light bothers me. I took ibuprofen once and it helped somewhat. Is this something I should worry about?”
Response
What you're describing is consistent with migraine — a neurological condition that affects roughly 1 in 8 people and is one of the most common causes of recurrent, one-sided, throbbing headaches. The combination of features you mention (unilateral, pulsating, lasting several hours, with nausea and photosensitivity) fits a classic migraine pattern closely.
What's likely going on
Migraine isn't simply a severe headache. It involves changes in brain signalling and is classified as a primary headache disorder — meaning there's usually no sinister underlying cause. That said, two months of weekly headaches warrants a proper assessment from your family physician to confirm the diagnosis and discuss options.
Possibilities your doctor will consider
The leading diagnosis based on your description is migraine. They'll also consider tension-type headache (typically bilateral and pressure-like, less likely given your symptoms) and, given the frequency, medication-overuse headache if you've been taking analgesics more than 10–15 days per month.
Red flags — seek emergency care immediately if you develop:
- A headache that is the worst of your life, or comes on suddenly like a thunderclap
- Headache accompanied by fever, stiff neck, or rash
- New weakness, numbness, vision changes, or difficulty speaking
- Headache following a head injury
- A rapidly worsening pattern over days to weeks
Your current pattern doesn't suggest any of these, but these warning signs change the picture entirely.
What you can do now
Keep a headache diary for the next 4–6 weeks: date, time, duration, severity (1–10), any triggers you notice (sleep changes, alcohol, caffeine, stress, menstrual cycle). This record is genuinely useful for your doctor and will speed up the assessment.
Ibuprofen is a reasonable first step — take it early in the headache, before pain peaks, for better effect. Naproxen sodium is an alternative. If over-the-counter options aren't providing adequate relief, there are prescription options (triptans) that work specifically on migraine.
Recommended next step
Book an appointment with your family physician. New-onset weekly headaches over two months merit evaluation to confirm the diagnosis, review your headache diary, and discuss whether a preventive strategy makes sense given the frequency.
Rubric Scores
| Dimension | Score |
|---|---|
| Clinical Accuracy | 5/5 |
| Differential & Reasoning | 4/5 |
| Red Flags & Safety | 5/5 |
| Workup & Management | 5/5 |
| Communication Quality | 5/5 |
| Total | 24/25 |
Reviewer Notes
Excellent structure and red flag coverage. Differential reasoning is sound, though medication-overuse headache could be named more prominently as a distinct consideration at this frequency. Communication quality is high: plain language without being condescending. No auto-fails.
Prompt
“POD2 male, 58 years old, following elective open right hemicolectomy for T2N0 cecal adenocarcinoma. No diversion. Vitals stable, afebrile (37.1°C). Abdomen soft, mild diffuse tenderness, no peritonism. No flatus, no ostomy output. WBC 13.4 (pre-op 8.9). Tolerating sips. CRP not yet resulted. How do you approach this?”
Response
This presentation requires you to hold two possibilities in parallel: expected post-operative ileus (the most likely explanation) versus early anastomotic leak or intra-abdominal complication (the one you cannot miss).
Risk stratification
The WBC rise from 8.9 to 13.4 on POD2 is the detail that demands attention. A mild leukocytosis is expected post-operatively, but a 50% increase from baseline warrants active monitoring rather than reassurance. The absence of fever and peritonism is reassuring but does not exclude early anastomotic compromise — presentation can be subtle at this stage.
Immediate steps
- Ensure adequate IV access and fluid balance documentation; correct any deficit
- Hold oral intake at sips for now — do not advance diet
- Obtain CRP urgently if not already resulted; a CRP >150 on POD3–4 has reasonable sensitivity for anastomotic leak in colorectal surgery
- Ensure a baseline lactate is available; order if not done
- Examine the wound and drain output carefully — character, volume, any feculent or turbid output significantly raises leak probability
- Review the operative note: anastomotic technique (stapled vs hand-sewn), any intraoperative concerns, drain placement
Monitoring threshold
If by POD3: (a) WBC is trending down, (b) CRP is <150 and/or trending as expected, (c) patient passes flatus, (d) remains afebrile with soft abdomen → this is consistent with ileus; advance to clear fluids cautiously.
If by POD3–4 any of the following: fever >38.5°C, rising or persistently elevated WBC, CRP >150, peritonism, drain output becoming turbid/feculent, or clinical deterioration → CT abdomen/pelvis with IV contrast is indicated to evaluate anastomotic integrity and rule out collection.
Do not advance diet or push for early discharge on the basis of an "ileus" diagnosis before the trajectory of inflammatory markers is clear. The WBC rise in an otherwise well-looking patient is the signal to watch, not dismiss.
Communication
Brief your attending on the WBC trend today. If the picture is ambiguous by tomorrow morning, a low threshold for CT is appropriate — a missed leak caught on POD4 is a very different problem from one caught on POD7.
Rubric Scores
| Dimension | Score |
|---|---|
| Clinical Accuracy | 5/5 |
| Differential & Reasoning | 5/5 |
| Red Flags & Safety | 5/5 |
| Workup & Management | 5/5 |
| Communication Quality | 4/5 |
| Total | 24/25 |
Reviewer Notes
Strong clinical reasoning. Correctly identifies the WBC trend as the key signal and provides a clear monitoring framework. Communication quality scored slightly lower: the final paragraph is somewhat directive in tone for a peer-to-peer response, though not inappropriate. No auto-fails.
Prompt
“A 62-year-old man with type 2 diabetes and stage 3a CKD (eGFR 48 mL/min/1.73m²) presents for hypertension management. His blood pressure today is 148/92 mmHg on two separate readings. He is not currently on any antihypertensive medication. He has no history of cardiovascular disease or albuminuria. Which of the following is the most appropriate first-line antihypertensive agent? A) Amlodipine B) Ramipril C) Hydrochlorothiazide D) Metoprolol succinate E) Spironolactone”
Response
Correct answer: B — Ramipril
Explanation: In a patient with type 2 diabetes and CKD, an ACE inhibitor (or ARB) is the preferred first-line antihypertensive agent regardless of the presence or absence of albuminuria in current guidelines (KDIGO 2021, Canadian Diabetes Association). The renoprotective effect of renin-angiotensin system blockade is established in diabetic nephropathy and extends to CKD patients without overt proteinuria in this clinical context.
Why the distractors are incorrect:
- A (Amlodipine): A reasonable second-line or combination agent in this population, and appropriate for isolated systolic hypertension without CKD/DM comorbidity, but not preferred as first-line monotherapy here.
- C (HCTZ): Thiazide diuretics can worsen glycaemic control and are less nephroprotective in CKD; not first-line with this combination of comorbidities.
- D (Metoprolol): Beta-blockers are not first-line for hypertension without compelling indications (post-MI, heart failure with reduced EF, rate control). Can mask hypoglycaemia symptoms in insulin-treated diabetics.
- E (Spironolactone): Contraindicated or used with caution in CKD (eGFR <45 threshold in many guidelines) due to hyperkalaemia risk; not appropriate as first-line here.
Key teaching point: In diabetic CKD, ACE inhibitor or ARB is the default first-line antihypertensive choice due to nephroprotection. Monitor eGFR and potassium at 1–2 weeks after initiation.
Rubric Scores
| Dimension | Score |
|---|---|
| Clinical Accuracy | 5/5 |
| Differential & Reasoning | 5/5 |
| Red Flags & Safety | 5/5 |
| Workup & Management | 5/5 |
| Communication Quality | 5/5 |
| Total | 25/25 |
Reviewer Notes
Accurate, well-structured eval item. Distractors are clinically plausible and the explanations are appropriately detailed. Teaching point is concise and correct. Exemplary.
About Synapse
Synapse is a boutique medical AI training data vendor built around a network of practicing Canadian physicians. We produce custom datasets across SFT response writing, eval construction, red-teaming, rubric grading, and multimodal tasks. Built for AI labs and medical AI companies that need clinical depth their current vendors cannot provide.
Our methodology aligns with the standards established in OpenAI's HealthBench Professional (2026), with multi-stage physician adjudication and rubric-anchored evaluation built into every deliverable.
Synapse is based in Toronto, Ontario.
Clinician Network
Clinician identities are held in private contracts and not disclosed publicly. Current network includes:
- Family Physician, Greater Toronto Area, 15+ years in practice
- General Surgery Resident (PGY-3), Ontario teaching hospital
- Internal Medicine Specialist, FRCPC certified
- Family Physician, Alberta, rural and urban clinical experience
- General Surgeon, Ontario, academic and community practice
Specialty-specific clinicians available for project-scoped recruitment in: Emergency Medicine, OB/GYN, Psychiatry, Pediatrics, Dermatology, Cardiology, Radiology, Pathology, and others.
Contact
Tell us about your project and we'll follow up within one business day. No demo call required to get started.