Service
Clinical RL Environments
AI companies training agent-based medical AI systems, particularly teams whose RL training (PPO, GRPO, or online DPO-style methods) needs interactive simulators rather than static preference data. Useful for both reward modelling and post-training agent fine-tuning, and for any team that needs a clinical-grade scoring harness its RL loop can call.
Simulated clinical workflows your agent can train in.
- 01
We design and operate clinical RL environments — simulated medical workflows where AI agents take actions, receive observations, and earn rewards based on clinician-defined criteria.
- 02
Each environment is built around a real clinical workflow (triage queue, prescribing flow, ED handover, patient conversation) with reward functions designed by practising clinicians and trajectory scoring run by our Phase 2-calibrated evaluator network.
- 03
Environments ship as Docker images with documented APIs, deterministic seeds, and full audit trails, so every agent run is reproducible and every reward can be traced back to the clinician-defined criterion that produced it.
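The act/observe/reward loop described above can be sketched as a toy Python environment with a Gym-style `reset`/`step` interface. Everything in it is hypothetical and for illustration only: the `TriageQueueEnv` class, the 1 to 5 acuity scale, the `SEVERITY_WEIGHTS` table and the rule that seeing a less urgent patient first is a violation stand in for clinician-defined criteria, and none of it is the documented API of the shipped Docker images.

```python
import random
from dataclasses import dataclass

# Hypothetical severity weights; a real environment would load these from
# the clinician-signed reward specification.
SEVERITY_WEIGHTS = {"minor": 1.0, "moderate": 3.0, "critical": 10.0}


@dataclass
class Patient:
    patient_id: int
    acuity: int  # hypothetical scale: 1 = most urgent, 5 = least urgent


class TriageQueueEnv:
    """Toy triage queue with a Gym-style reset/step interface."""

    def __init__(self, n_patients=5):
        self.n_patients = n_patients
        self.queue = []
        self.log = []  # audit trail of every reset and step

    def reset(self, seed):
        rng = random.Random(seed)  # private RNG: same seed, same episode
        self.queue = [Patient(i, rng.randint(1, 5)) for i in range(self.n_patients)]
        self.log = [("reset", seed)]
        return self._observation()

    def _observation(self):
        return [(p.patient_id, p.acuity) for p in self.queue]

    @staticmethod
    def _severity(skipped_acuity):
        # Hypothetical criterion: the more urgent the patient left waiting,
        # the more severe the violation.
        if skipped_acuity == 1:
            return "critical"
        if skipped_acuity == 2:
            return "moderate"
        return "minor"

    def step(self, patient_id):
        """Action: choose which waiting patient to see next."""
        chosen = next((p for p in self.queue if p.patient_id == patient_id), None)
        if chosen is None:
            raise ValueError(f"patient {patient_id} is not in the queue")
        most_urgent = min(p.acuity for p in self.queue)
        if chosen.acuity == most_urgent:
            reward, violation = 1.0, None
        else:
            violation = self._severity(most_urgent)
            reward = -SEVERITY_WEIGHTS[violation]
        self.queue.remove(chosen)
        self.log.append(("step", patient_id, reward, violation))
        done = not self.queue
        return self._observation(), reward, done, {"violation": violation}


env = TriageQueueEnv()
obs, done, total = env.reset(seed=42), False, 0.0
while not done:
    # Greedy policy: always see the most urgent waiting patient.
    patient_id = min(obs, key=lambda item: item[1])[0]
    obs, reward, done, info = env.step(patient_id)
    total += reward
```

The greedy policy never leaves a more urgent patient waiting, so it earns +1 per step; any other order is penalised by the weight of the violated criterion, and `env.log` keeps a per-step record of action, reward and violation.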
Every engagement, audit-ready.
Structured outputs you can take to clinical safety reviews, procurement, and regulators — with the underlying methodology referenced throughout.
- 01
Clinical workflow simulator (Docker image + documented HTTP API)
- 02
Clinician-defined reward function with severity weights, signed off by a practising specialist
- 03
Calibrated trajectory scoring from Phase 2 evaluators — with per-step and per-episode reliability scores
- 04
Reward Reliability Report with Beta-Binomial or bootstrap confidence intervals
- 05
Deterministic seeds and reproducible run logs for safety-case audit
- 06
Failure-mode coverage matrix mapped to the 10-category clinical safety taxonomy
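The list above does not say how the Reliability Report's intervals are computed. A minimal sketch, assuming reliability is expressed as the share of episodes on which the evaluator's verdict matches a clinician's reference verdict: the Beta-Binomial interval updates a Beta(a, b) prior with k agreements out of n episodes, giving a Beta(a + k, b + n - k) posterior, and the percentile bootstrap resamples per-episode scores. The function names, the uniform Beta(1, 1) prior and the Monte Carlo quantiles (used so the sketch needs only the standard library) are all assumptions.

```python
import random
import statistics


def beta_binomial_interval(k, n, level=0.95, prior=(1.0, 1.0), draws=20_000, seed=0):
    """Equal-tailed credible interval for an agreement rate.

    Posterior is Beta(a + k, b + n - k); its quantiles are estimated from
    seeded posterior draws rather than computed exactly.
    """
    a, b = prior
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a + k, b + n - k) for _ in range(draws))
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return samples[lo_idx], samples[hi_idx]


def percentile_bootstrap(scores, level=0.95, resamples=5_000, seed=0):
    """Percentile bootstrap interval for the mean per-episode score."""
    rng = random.Random(seed)
    # Resample episodes with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(resamples)
    )
    lo_idx = int((1 - level) / 2 * resamples)
    hi_idx = int((1 + level) / 2 * resamples) - 1
    return means[lo_idx], means[hi_idx]
```

For example, `beta_binomial_interval(46, 50)` gives a 95% interval around an observed agreement rate of 0.92. Swapping the sampled quantiles for `scipy.stats.beta.ppf` would give exact ones; the bootstrap variant also works for non-binary per-episode scores.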
Clinicians design the rewards. Clinicians score the agent.
Our environments aren't built by ML engineers in a vacuum — every reward function is signed off by practising clinicians who actually run the workflow being simulated. Combined with calibrated Phase 2 evaluators scoring agent trajectories, you get an environment that is both technically sound and clinically grounded. The same Reliability Report methodology that backs every evaluator backs the environments themselves — so the rewards your agent learns from are auditable end-to-end.
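End-to-end auditability depends on run logs that cannot be silently edited. One common way to get that property, offered here as an assumption rather than a description of how the environments actually log runs, is to hash-chain log entries with SHA-256 so that changing any recorded reward invalidates every later digest. The entry fields and the names `chain_entries` and `verify_chain` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder digest that starts every chain


def _digest(prev, entry):
    # Canonical JSON (sorted keys) so identical entries always hash identically.
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev.encode("utf-8") + payload).hexdigest()


def chain_entries(entries):
    """Return (entry, digest) pairs; each digest also covers the previous one."""
    chained, prev = [], GENESIS
    for entry in entries:
        prev = _digest(prev, entry)
        chained.append((entry, prev))
    return chained


def verify_chain(chained):
    """Recompute every digest; any edited entry breaks the chain."""
    prev = GENESIS
    for entry, digest in chained:
        if _digest(prev, entry) != digest:
            return False
        prev = digest
    return True


run_log = [
    {"step": 0, "action": 3, "reward": 1.0, "violation": None},
    {"step": 1, "action": 1, "reward": -10.0, "violation": "critical"},
]
chained = chain_entries(run_log)
```

Because the digests are deterministic, two runs replayed from the same seed can be compared by their final digest alone.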
Other services
Engagements often combine evaluation, annotation, red-teaming, and advisory across the medical AI lifecycle.