CertusHire
Autonomous Multi-Agent Interview Platform
“How do AI agents coordinate under uncertainty in real-time adaptive evaluation systems?”
85%
Human Agreement
<200ms
Response Latency
100+
Concurrent Sessions
90%
Screening Effort Reduced
01. The Problem
Technical hiring is broken. Teams spend 40+ hours per position on manual phone screens, yet studies consistently show that unstructured interviews predict job performance no better than chance. Existing AI tools are scripted chatbots — they ask the same questions regardless of whether the candidate aces or struggles with the previous answer. The result: high-performers are screened out by rigid scripts, and low-performers slip through when the difficulty never adapts.
02. Why It's Hard
Real-time adaptive evaluation is hard because it requires multiple AI components to coordinate in milliseconds while maintaining consistent grading criteria. A naive single-agent approach either hallucinates technical facts or becomes so slow that candidates experience unnatural pauses. Multi-agent designs introduce their own problem: agents disagree, produce inconsistent assessments, and require arbitration logic — which itself becomes a bottleneck.
03. Our Approach: DAR3 Protocol
DAR3 (Dynamic Adaptive Role-Reassignment) is a state-machine–based coordination protocol where three specialized agents — Orchestrator, Interviewer, and Evaluator — transition through defined states based on candidate signals. The Orchestrator monitors performance trajectories and triggers role reassignments: if the Evaluator flags a candidate struggling, the Interviewer agent transitions to a scaffolded difficulty mode, drawing from a RAG-grounded question bank. All inter-agent messages are typed, bounded in size, and processed within a sub-50ms budget per hop, enabling end-to-end <200ms response latency.
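The typed, size-bounded messages and state transitions described above can be sketched in a few lines. All names here (states, the signal vocabulary, the 4&nbsp;KB bound) are illustrative assumptions, not the production DAR3 definitions:

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical state set; the real DAR3 protocol may define more states.
class InterviewState(Enum):
    AWAITING_RESPONSE = auto()
    EVALUATING = auto()
    GENERATING_QUESTION = auto()
    SCAFFOLDED_MODE = auto()

MAX_MESSAGE_BYTES = 4096  # "bounded in size" (illustrative bound)

@dataclass(frozen=True)
class AgentMessage:
    sender: str            # "orchestrator" | "interviewer" | "evaluator"
    state: InterviewState
    payload: str

    def __post_init__(self):
        # Enforce the bounded-size invariant at construction time.
        if len(self.payload.encode()) > MAX_MESSAGE_BYTES:
            raise ValueError("message exceeds bounded size")

# Transition table: (current state, evaluator signal) -> next state.
TRANSITIONS = {
    (InterviewState.EVALUATING, "struggling"): InterviewState.SCAFFOLDED_MODE,
    (InterviewState.EVALUATING, "on_track"): InterviewState.GENERATING_QUESTION,
    (InterviewState.SCAFFOLDED_MODE, "recovered"): InterviewState.GENERATING_QUESTION,
}

def next_state(current: InterviewState, signal: str) -> InterviewState:
    """Orchestrator transition function; unknown signals keep the state."""
    return TRANSITIONS.get((current, signal), current)
```

Keeping transitions in an explicit table, rather than scattered conditionals, is what makes role reassignment auditable: the "struggling → scaffolded mode" path is a single lookup.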
Architecture — The system is composed of four layers: Agent Coordination (DAR3 state machine), Knowledge Grounding (hybrid dense + sparse RAG), Infrastructure (FastAPI + Celery/RabbitMQ), and Safety (Constitutional AI guardrails).
1. Candidate message arrives via WebSocket → Orchestrator evaluates state transition
2. Orchestrator dispatches to Interviewer agent (generates next question) or Evaluator agent (grades response)
3. Interviewer queries hybrid RAG: dense embeddings (semantic recall) + TF-IDF (exact-match precision)
4. Evaluator scores response against rubric → sends signal back to Orchestrator
5. Orchestrator updates difficulty trajectory → issues next state
6. Response streamed to candidate via React frontend with constitutional safety check
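The per-hop latency budget that makes the end-to-end &lt;200ms target possible can be sketched as a timed dispatch loop. The handler names and 50ms constant mirror the numbers above, but this is a simplified sequential sketch, not the production async pipeline:

```python
import time

HOP_BUDGET_MS = 50  # sub-50ms budget per hop, as stated above

def timed_hop(handler, message):
    """Run one agent hop and report whether it stayed within budget."""
    start = time.perf_counter()
    result = handler(message)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms <= HOP_BUDGET_MS

def handle_candidate_message(msg, route, handlers):
    """Pass a message along a route of agents (hypothetical helper).

    `route` is an ordered list of agent names, e.g.
    ["interviewer", "evaluator"]; `handlers` maps names to callables.
    """
    hops = []
    current = msg
    for agent in route:
        current, within_budget = timed_hop(handlers[agent], current)
        hops.append((agent, within_budget))
    return current, hops
```

With four hops at 50ms each, the worst case stays under the 200ms end-to-end budget; a real deployment would also need to account for network and model-inference time inside each hop.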
04. Key Results
- ▹ Designed DAR3 — a state-machine–based multi-agent coordination protocol enabling dynamic Interviewer ↔ Evaluator role assignment based on real-time candidate performance
- ▹ Implemented hybrid dense–sparse RAG pipeline (embeddings + TF-IDF) achieving 85% agreement with human interviewer judgments
- ▹ Scaled to 100+ concurrent sessions with fault-tolerant FastAPI + Celery/RabbitMQ backend at <200ms latency
- ▹ Built Constitutional AI safety guardrails and a low-latency React/WebSocket frontend
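The dense + sparse fusion behind the hybrid RAG result can be sketched with stdlib-only stand-ins. The question bank, the bigram "embedding" (substituting for a real neural encoder), and the `alpha` blend weight are all illustrative assumptions:

```python
import math
from collections import Counter

# Toy question bank (hypothetical; the real bank is RAG-grounded).
BANK = [
    "Explain how a hash map handles collisions",
    "Describe the difference between TCP and UDP",
    "Walk through a binary search implementation",
]

def tfidf_scores(query, docs):
    """Sparse exact-match scoring: simple TF-IDF dot product."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = lambda t: math.log((n + 1) / (df.get(t, 0) + 1)) + 1
    q = query.lower().split()
    return [sum(doc.count(t) * idf(t) for t in q) for doc in tokenized]

def dense_scores(query, docs):
    """Stand-in for embedding similarity: character-bigram Jaccard.
    A real system would use a neural embedding model here."""
    bigrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    q = bigrams(query.lower())
    return [len(q & bigrams(d.lower())) / max(len(q | bigrams(d.lower())), 1)
            for d in docs]

def hybrid_retrieve(query, docs, alpha=0.5):
    """Blend dense (semantic recall) and sparse (exact-match) scores."""
    dense, sparse = dense_scores(query, docs), tfidf_scores(query, docs)
    max_s = max(sparse) or 1.0  # normalize sparse scores to [0, 1]
    fused = [alpha * d + (1 - alpha) * s / max_s
             for d, s in zip(dense, sparse)]
    return docs[max(range(len(docs)), key=fused.__getitem__)]
```

The design point is the complementarity: dense scoring recalls paraphrases, while TF-IDF keeps exact technical terms (function names, protocol acronyms) from being washed out.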
| Method | Agreement |
|---|---|
| Human Interviewer (baseline) | 100% |
| Single-agent (no RAG) | 61% |
| Static interviewer + dense RAG | 74% |
| DAR3 + hybrid RAG (ours) | 85% |
05. What I Learned & Open Questions
The hardest part wasn't building the agents — it was defining the state machine. Too few states and the system is inflexible; too many and transitions become unpredictable.
Constitutional AI guardrails must run in the hot path, not as an async post-check, or they create an inconsistent user experience.
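What "in the hot path" means in practice can be sketched as a synchronous check applied to each chunk before it is streamed, rather than a post-hoc scan. The pattern list and redaction string are hypothetical; the real guardrail would call a constitutional-AI critique model:

```python
# Hypothetical blocked topics; a production guardrail would use a
# critique model, not substring matching.
BLOCKED_PATTERNS = ("final hiring decision", "personal data")

def safe_stream(chunks):
    """Yield response chunks only after an in-path safety check.

    Because the check runs before each yield, a candidate never sees
    text that is later retracted, which is the inconsistency an async
    post-check would create.
    """
    for chunk in chunks:
        lowered = chunk.lower()
        if any(p in lowered for p in BLOCKED_PATTERNS):
            yield "[redacted by safety layer]"
        else:
            yield chunk
```

The cost is that the check's latency lands inside the response budget, which is exactly why the per-hop budget discipline matters.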
Open question: Can formal verification methods (like TLA+) be applied to DAR3's state transitions to prove safety properties before deployment?
Open question: How does DAR3 generalize to collaborative evaluation settings, where multiple human interviewers observe the same session?