CertusHire
Autonomous Multi-Agent Interview Platform
“How do AI agents coordinate under uncertainty in real-time adaptive evaluation systems?”
85%
Human Agreement
<200ms
Response Latency
100+
Concurrent Sessions
90%
Screening Effort Reduced
01. The Problem
Technical hiring is broken. Teams spend 40+ hours per position on manual phone screens, yet studies consistently show that unstructured interviews predict job performance no better than chance. Existing AI tools are scripted chatbots — they ask the same questions regardless of whether the candidate aces or struggles with the previous answer. The result: high-performers are screened out by rigid scripts, and low-performers slip through when the difficulty never adapts.
02. Why It's Hard
Real-time adaptive evaluation is hard because it requires multiple AI components to coordinate in milliseconds while maintaining consistent grading criteria. A naive single-agent approach either hallucinates technical facts or becomes so slow that candidates experience unnatural pauses. Multi-agent designs introduce their own problem: agents disagree, produce inconsistent assessments, and require arbitration logic — which itself becomes a bottleneck.
03. Our Approach: DAR3 Protocol
DAR3 (Dynamic Adaptive Role-Reassignment) is a state-machine–based coordination protocol where three specialized agents — Orchestrator, Interviewer, and Evaluator — transition through defined states based on candidate signals. The Orchestrator monitors performance trajectories and triggers role reassignments: if the Evaluator flags a candidate struggling, the Interviewer agent transitions to a scaffolded difficulty mode, drawing from a RAG-grounded question bank. All inter-agent messages are typed, bounded in size, and processed within a sub-50ms budget per hop, enabling end-to-end <200ms response latency.
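The typed, size-bounded messages and state transitions described above can be sketched in a few lines. All names here (states, the signal vocabulary, the 4&nbsp;KB bound) are illustrative assumptions, not the production DAR3 definitions:

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical state set; the real DAR3 protocol may define more states.
class InterviewState(Enum):
    AWAITING_RESPONSE = auto()
    EVALUATING = auto()
    GENERATING_QUESTION = auto()
    SCAFFOLDED_MODE = auto()

MAX_MESSAGE_BYTES = 4096  # "bounded in size" (illustrative bound)

@dataclass(frozen=True)
class AgentMessage:
    sender: str            # "orchestrator" | "interviewer" | "evaluator"
    state: InterviewState
    payload: str

    def __post_init__(self):
        # Enforce the bounded-size invariant at construction time.
        if len(self.payload.encode()) > MAX_MESSAGE_BYTES:
            raise ValueError("message exceeds bounded size")

# Transition table: (current state, evaluator signal) -> next state.
TRANSITIONS = {
    (InterviewState.EVALUATING, "struggling"): InterviewState.SCAFFOLDED_MODE,
    (InterviewState.EVALUATING, "on_track"): InterviewState.GENERATING_QUESTION,
    (InterviewState.SCAFFOLDED_MODE, "recovered"): InterviewState.GENERATING_QUESTION,
}

def next_state(current: InterviewState, signal: str) -> InterviewState:
    """Orchestrator transition function; unknown signals keep the state."""
    return TRANSITIONS.get((current, signal), current)
```

Keeping transitions in an explicit table, rather than scattered conditionals, is what makes role reassignment auditable: the "struggling → scaffolded mode" path is a single lookup.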
Architecture — The system is composed of four layers: Agent Coordination (DAR3 state machine), Knowledge Grounding (hybrid dense + sparse RAG), Infrastructure (FastAPI + Celery/RabbitMQ), and Safety (Constitutional AI guardrails).
1. Candidate message arrives via WebSocket → Orchestrator evaluates state transition
2. Orchestrator dispatches to Interviewer agent (generates next question) or Evaluator agent (grades response)
3. Interviewer queries hybrid RAG: dense embeddings (semantic recall) + TF-IDF (exact-match precision)
4. Evaluator scores response against rubric → sends signal back to Orchestrator
5. Orchestrator updates difficulty trajectory → issues next state
6. Response streamed to candidate via React frontend with constitutional safety check
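The per-hop latency budget that makes the end-to-end &lt;200ms target possible can be sketched as a timed dispatch loop. The handler names and 50ms constant mirror the numbers above, but this is a simplified sequential sketch, not the production async pipeline:

```python
import time

HOP_BUDGET_MS = 50  # sub-50ms budget per hop, as stated above

def timed_hop(handler, message):
    """Run one agent hop and report whether it stayed within budget."""
    start = time.perf_counter()
    result = handler(message)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms <= HOP_BUDGET_MS

def handle_candidate_message(msg, route, handlers):
    """Pass a message along a route of agents (hypothetical helper).

    `route` is an ordered list of agent names, e.g.
    ["interviewer", "evaluator"]; `handlers` maps names to callables.
    """
    hops = []
    current = msg
    for agent in route:
        current, within_budget = timed_hop(handlers[agent], current)
        hops.append((agent, within_budget))
    return current, hops
```

With four hops at 50ms each, the worst case stays under the 200ms end-to-end budget; a real deployment would also need to account for network and model-inference time inside each hop.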
04. Key Results
- ▹ Designed DAR3 — a state-machine–based multi-agent coordination protocol enabling dynamic Interviewer ↔ Evaluator role assignment based on real-time candidate performance
- ▹ Implemented hybrid dense–sparse RAG pipeline (embeddings + TF-IDF) achieving 85% agreement with human interviewer judgments
- ▹ Scaled to 100+ concurrent sessions with fault-tolerant FastAPI + Celery/RabbitMQ backend at <200ms latency
- ▹ Built Constitutional AI safety guardrails and a low-latency React/WebSocket frontend
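The dense + sparse fusion behind the hybrid RAG result can be sketched with stdlib-only stand-ins. The question bank, the bigram "embedding" (substituting for a real neural encoder), and the `alpha` blend weight are all illustrative assumptions:

```python
import math
from collections import Counter

# Toy question bank (hypothetical; the real bank is RAG-grounded).
BANK = [
    "Explain how a hash map handles collisions",
    "Describe the difference between TCP and UDP",
    "Walk through a binary search implementation",
]

def tfidf_scores(query, docs):
    """Sparse exact-match scoring: simple TF-IDF dot product."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = lambda t: math.log((n + 1) / (df.get(t, 0) + 1)) + 1
    q = query.lower().split()
    return [sum(doc.count(t) * idf(t) for t in q) for doc in tokenized]

def dense_scores(query, docs):
    """Stand-in for embedding similarity: character-bigram Jaccard.
    A real system would use a neural embedding model here."""
    bigrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    q = bigrams(query.lower())
    return [len(q & bigrams(d.lower())) / max(len(q | bigrams(d.lower())), 1)
            for d in docs]

def hybrid_retrieve(query, docs, alpha=0.5):
    """Blend dense (semantic recall) and sparse (exact-match) scores."""
    dense, sparse = dense_scores(query, docs), tfidf_scores(query, docs)
    max_s = max(sparse) or 1.0  # normalize sparse scores to [0, 1]
    fused = [alpha * d + (1 - alpha) * s / max_s
             for d, s in zip(dense, sparse)]
    return docs[max(range(len(docs)), key=fused.__getitem__)]
```

The design point is the complementarity: dense scoring recalls paraphrases, while TF-IDF keeps exact technical terms (function names, protocol acronyms) from being washed out.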
| Method | Agreement |
|---|---|
| Human Interviewer (baseline) | 100% |
| Single-agent (no RAG) | 61% |
| Static interviewer + dense RAG | 74% |
| DAR3 + hybrid RAG (ours) | 85% |
05. What I Learned & Open Questions
The hardest part wasn't building the agents — it was defining the state machine. Too few states and the system is inflexible; too many and transitions become unpredictable.
Constitutional AI guardrails must run in the hot path, not as an async post-check, or they create an inconsistent user experience.
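What "in the hot path" means in practice can be sketched as a synchronous check applied to each chunk before it is streamed, rather than a post-hoc scan. The pattern list and redaction string are hypothetical; the real guardrail would call a constitutional-AI critique model:

```python
# Hypothetical blocked topics; a production guardrail would use a
# critique model, not substring matching.
BLOCKED_PATTERNS = ("final hiring decision", "personal data")

def safe_stream(chunks):
    """Yield response chunks only after an in-path safety check.

    Because the check runs before each yield, a candidate never sees
    text that is later retracted, which is the inconsistency an async
    post-check would create.
    """
    for chunk in chunks:
        lowered = chunk.lower()
        if any(p in lowered for p in BLOCKED_PATTERNS):
            yield "[redacted by safety layer]"
        else:
            yield chunk
```

The cost is that the check's latency lands inside the response budget, which is exactly why the per-hop budget discipline matters.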
Open question: Can formal verification methods (like TLA+) be applied to DAR3's state transitions to prove safety properties before deployment?
Open question: How does DAR3 generalize to collaborative evaluation settings, where multiple human interviewers observe the same session?