How AI Scoring Works
Overview
SageScreen uses a multi-agent architecture. A single screening currently involves 13 specialized AI agents, each responsible for a distinct stage of the process, and this number grows regularly as capabilities expand. From the reviewer's perspective, two agents are the most visible: the interview agent that conducts the conversation and the evaluation agent that produces scored results. But behind the scenes, additional agents handle tasks such as framework generation, transcript analysis, language processing, and quality checks. This article focuses on the components most relevant to understanding scores and reports.
The separation between interviewing and evaluating is intentional: the agent that interviews is not the agent that scores. Each operates with a distinct objective, and the evaluation agent works from the full transcript without the biases that can emerge from live interaction dynamics.
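The two-stage separation described above can be sketched in code. The class and function names below (Transcript, InterviewAgent, EvaluationAgent, run_screening) are hypothetical stand-ins for illustration, not SageScreen's actual API; the point is only that the scoring stage receives a finished transcript and nothing else.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """The complete, ordered record of the interview exchange."""
    turns: list[str]


class InterviewAgent:
    """Gathers information only; produces a transcript, never a score."""

    def conduct(self, framework: dict) -> Transcript:
        # In the real system this is an adaptive AI conversation;
        # here we return a canned exchange as a placeholder.
        return Transcript(turns=["Q: Tell me about a recent project.", "A: ..."])


class EvaluationAgent:
    """Scores from the transcript alone; it was not present in the conversation."""

    def evaluate(self, transcript: Transcript, framework: dict) -> dict:
        # Placeholder scoring: one entry per category from the framework.
        return {category: 0 for category in framework["categories"]}


def run_screening(framework: dict) -> dict:
    transcript = InterviewAgent().conduct(framework)          # stage 1: gather
    return EvaluationAgent().evaluate(transcript, framework)  # stage 2: score
```

The key design property is visible in run_screening: the evaluation stage's only inputs are the transcript and the framework, so no live interaction state can leak into the scores.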
The Interview Agent
The interview agent is the Sage itself: the AI screening agent that candidates interact with during the screening. It conducts a structured, adaptive conversation based on the interview framework generated during the sage build.
During the interview, the agent's role is to gather information. It does not score, rank, or evaluate the candidate during the conversation. Its output is a complete transcript of the exchange.
The Evaluation Agent
After the interview concludes, the evaluation agent receives the full transcript and the sage's evaluation framework (categories, rubric, and scoring criteria). It processes the transcript as a single unit and produces the structured results that appear in the report.
The evaluation agent operates independently from the interview agent. It was not present during the conversation. It evaluates what was said, not how the conversation felt in the moment. This separation reduces the risk of anchoring effects, where early impressions from the interview might disproportionately influence later scoring.
How Categories Are Generated
Evaluation categories are not drawn from a fixed template. They are generated dynamically during the sage build, derived from the role context and cultural alignment inputs you provide.
When you create a sage, you supply a job description, role title, seniority level, evaluation guideline, tone preferences, and any additional instructions. During the build phase, the system analyzes these inputs and generates a set of evaluation categories that reflect the dimensions most relevant to the role. A sage built for a senior software engineer will have different categories than one built for a customer support representative.
Examples of categories might include technical competence, communication clarity, problem-solving approach, cultural alignment, or leadership potential, but the specific set varies per sage. The categories are locked at build time and remain fixed for the life of the deployed sage. Every candidate screened by that sage is evaluated against the same categories.
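A minimal sketch of the build-and-lock behavior, using hypothetical names (Sage, build_sage) and invented derivation logic; the real system generates categories with AI from the full set of inputs, not keyword matching:

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: categories cannot change after build
class Sage:
    role_title: str
    categories: tuple[str, ...]  # the same fixed set for every candidate


def build_sage(role_title: str, job_description: str) -> Sage:
    # Stand-in for the AI build step that derives categories from the
    # role context; the real derivation is not a keyword check.
    if "engineer" in role_title.lower():
        categories = ("technical competence", "problem-solving approach")
    else:
        categories = ("communication clarity", "cultural alignment")
    return Sage(role_title=role_title, categories=categories)
```

The frozen dataclass and tuple mirror the article's guarantee: once built, the category set is immutable for the life of the deployed sage.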
Scoring: 0–100 Per Category
Each evaluation category receives a percentage score from 0 to 100. The score is accompanied by two supporting elements: a written explanation of the reasoning behind the score, and direct quotes from the transcript that support it.
The combination of score, explanation, and quotes gives reviewers a transparent basis for each evaluation dimension. Reviewers can assess whether the score aligns with their own reading of the quoted material.
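The per-category result described above can be pictured as a simple record. CategoryResult and its field names are illustrative, not the product's actual schema:

```python
from dataclasses import dataclass


@dataclass
class CategoryResult:
    """One evaluation dimension as it might appear in a report (illustrative)."""
    category: str
    score: int          # 0-100 percentage score
    explanation: str    # the agent's reasoning for this score
    quotes: list[str]   # supporting excerpts from the transcript


result = CategoryResult(
    category="communication clarity",
    score=85,
    explanation="Answers were structured and concise throughout.",
    quotes=["First, I'd confirm the requirements, then outline the approach."],
)
```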
Overall Score
The overall score is the unweighted arithmetic mean of all individual category scores. Every category contributes equally to the aggregate.
If a sage has four evaluation categories and a candidate scores 92, 85, 78, and 88, the overall score is (92 + 85 + 78 + 88) / 4 = 85.75%.
The overall score provides a single-number summary. It should always be read alongside the individual category breakdowns. A candidate with an 80% overall could have consistent performance across all categories, or they could have a 95% in one area offset by a 65% in another. The category-level detail is where the actionable signal lives.
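The aggregation is a plain unweighted mean, reproducing the worked example above:

```python
def overall_score(category_scores: list[float]) -> float:
    """Unweighted arithmetic mean: every category contributes equally."""
    return sum(category_scores) / len(category_scores)


print(overall_score([92, 85, 78, 88]))  # 85.75
```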
The Recommendation
The evaluation agent produces a pass or do-not-pass recommendation based on the evaluation guideline selected during sage creation. Four guideline levels are available, each setting a different performance threshold.
The recommendation considers the overall score, individual category performance, and any evaluation rules configured in the sage. All conditions must be met for a positive recommendation.
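The all-conditions-must-hold logic can be sketched as follows. The guideline names, thresholds, and the 50-point category minimum here are invented for illustration; the article does not specify the actual values or rule set:

```python
# Hypothetical guideline levels and thresholds; the real values are
# configured during sage creation and are not documented here.
GUIDELINE_THRESHOLDS = {"lenient": 60, "balanced": 70, "strict": 80, "very_strict": 90}


def recommend(overall: float, category_scores: dict[str, float],
              guideline: str, category_minimum: float = 50) -> bool:
    """Pass only if every condition holds (illustrative logic)."""
    meets_overall = overall >= GUIDELINE_THRESHOLDS[guideline]
    meets_categories = all(s >= category_minimum for s in category_scores.values())
    return meets_overall and meets_categories
```

Note that a strong overall score is not enough on its own: a single category below the minimum fails the recommendation, matching the "all conditions must be met" rule.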
The recommendation is a signal, not a verdict. It is designed to inform the human reviewer's decision, not replace it.
What AI Scoring Does Not Do
Understanding the boundaries of the system is as important as understanding its capabilities.