How AI Scoring Works

The architecture behind how SageScreen conducts interviews, evaluates transcripts, and produces scored results.

13 Specialized AI Agents
3 Isolated Pipelines
0–100 Score Range Per Category

Overview

SageScreen uses a multi-agent architecture. A single screening currently involves 13 specialized AI agents, each responsible for a distinct stage of the process, and this number grows regularly as capabilities expand. From the reviewer's perspective, two agents are the most visible: the interview agent that conducts the conversation and the evaluation agent that produces scored results. But behind the scenes, additional agents handle tasks such as framework generation, transcript analysis, language processing, and quality checks. This article focuses on the components most relevant to understanding scores and reports.

The separation between interviewing and evaluating is intentional: the agent that interviews is not the agent that scores. Each operates with a distinct objective, and the evaluation agent works from the full transcript without the biases that can emerge from live interaction dynamics.

The Interview Agent

The interview agent is the sage itself: the AI agent that candidates interact with during the screening. It conducts a structured, adaptive conversation based on the interview framework generated during the sage build.

During the interview, the agent:

Opens with the intro question defined in the sage's cultural alignment configuration.
Asks contextually relevant questions aligned to the role, seniority, and evaluation criteria.
Adapts follow-up questions in real time based on the candidate's responses.
Maintains the conversational tone selected during sage creation (e.g., Professional, Friendly, Direct).
Manages the interview within the configured duration.

The interview agent's role is to gather information. It does not score, rank, or evaluate the candidate during the conversation. Its output is a complete transcript of the exchange.
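As a rough illustration of what the interview side consumes, the sketch below models the configuration points listed above. The class and field names are assumptions made for this article, not SageScreen's actual schema.

```python
from dataclasses import dataclass

# Hypothetical configuration object for the interview agent. Field names are
# illustrative only; they mirror the behaviors described above (intro question,
# role context, tone, duration) rather than SageScreen's internal schema.
@dataclass
class InterviewConfig:
    intro_question: str      # opening question from the cultural alignment setup
    role_title: str          # role the sage was built for
    seniority: str           # e.g. "Senior"
    tone: str                # e.g. "Professional", "Friendly", "Direct"
    duration_minutes: int    # configured interview length
```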

The Evaluation Agent

After the interview concludes, the evaluation agent receives the full transcript and the sage's evaluation framework (categories, rubric, and scoring criteria). It processes the transcript as a single unit and produces the structured results that appear in the report.

The evaluation agent operates independently from the interview agent. It was not present during the conversation. It evaluates what was said, not how the conversation felt in the moment. This separation reduces the risk of anchoring effects, where early impressions from the interview might disproportionately influence later scoring.
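A minimal sketch of that hand-off, assuming hypothetical names: the evaluation step sees only the finished transcript and the sage's framework, and returns structured results.

```python
from typing import Any

# Hypothetical entry point for the evaluation stage. The signature encodes the
# separation described above: input is the complete transcript plus the sage's
# evaluation framework, output is structured results. The real evaluation work
# is performed by an AI agent; a placeholder stands in for it here.
def evaluate_transcript(transcript: str, framework: dict[str, Any]) -> dict[str, Any]:
    """Return per-category scores, explanations, quotes, an overall score,
    and a recommendation for one candidate's completed interview."""
    raise NotImplementedError("the evaluation agent performs this step")
```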

Important

The agent that interviews is never the agent that scores. This architectural separation is deliberate and prevents live interaction dynamics from influencing the evaluation outcome.

How Categories Are Generated

Evaluation categories are not drawn from a fixed template. They are generated dynamically during the sage build, derived from the role context and cultural alignment inputs you provide.

When you create a sage, you supply a job description, role title, seniority level, evaluation guideline, tone preferences, and any additional instructions. During the build phase, the system analyzes these inputs and generates a set of evaluation categories that reflect the dimensions most relevant to the role. A sage built for a senior software engineer will have different categories than one built for a customer support representative.

Examples of categories might include technical competence, communication clarity, problem-solving approach, cultural alignment, or leadership potential, but the specific set varies per sage. The categories are locked at build time and remain fixed for the life of the deployed sage. Every candidate screened by that sage is evaluated against the same categories.
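To make "locked at build time" concrete, here is a minimal sketch using an immutable structure. The type, field, and category names are assumptions for illustration, not the platform's real types.

```python
from dataclasses import dataclass

# Illustrative only: a frozen dataclass models the fact that categories are
# fixed when the sage is built and stay the same for every candidate it screens.
@dataclass(frozen=True)
class EvaluationFramework:
    sage_id: str
    categories: tuple[str, ...]   # generated during the build, then never changed

framework = EvaluationFramework(
    sage_id="senior-backend-engineer",
    categories=("Technical Competence", "Communication Clarity",
                "Problem-Solving Approach", "Cultural Alignment"),
)
# framework.categories = (...)  # would raise FrozenInstanceError: the set is locked
```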

Scoring: 0–100 Per Category

Each evaluation category receives a percentage score from 0 to 100. The score is accompanied by two supporting elements:

Explanation: a written rationale for the score, describing what the evaluation agent observed in the candidate's responses and why it resulted in that rating.
Quotes: direct excerpts from the candidate's actual responses that support the assessment. These provide a traceable link between the score and the candidate's own words.

The combination of score, explanation, and quotes gives reviewers a transparent basis for each evaluation dimension. Reviewers can assess whether the score aligns with their own reading of the quoted material.
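The shape of a single category result can be sketched as follows (hypothetical names; the three documented parts are the score, the explanation, and the supporting quotes).

```python
from dataclasses import dataclass

# Hypothetical result record for one evaluation category. It carries exactly
# the three elements described above: a 0-100 score, a written rationale, and
# direct quotes from the candidate's responses that support the rating.
@dataclass
class CategoryResult:
    category: str
    score: int             # 0-100
    explanation: str       # why the evaluation agent assigned this score
    quotes: list[str]      # excerpts from the candidate's own words
```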

Overall Score

The overall score is the unweighted arithmetic mean of all individual category scores. Every category contributes equally to the aggregate.

Example

If a sage has four evaluation categories and a candidate scores 92, 85, 78, and 88 across them, the overall score is (92 + 85 + 78 + 88) / 4 = 85.75%.

The overall score provides a single-number summary. It should always be read alongside the individual category breakdowns. A candidate with an 80% overall could have consistent performance across all categories, or they could have a 95% in one area offset by a 65% in another. The category-level detail is where the actionable signal lives.
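Because the aggregate is an unweighted mean, the worked example above can be reproduced directly. This is a sketch using the example's scores, not platform code.

```python
from statistics import mean

# Unweighted arithmetic mean: every category contributes equally.
category_scores = [92, 85, 78, 88]       # the four scores from the example above
overall = mean(category_scores)
print(f"Overall score: {overall:.2f}%")  # -> Overall score: 85.75%
```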

The Recommendation

The evaluation agent produces a pass or do-not-pass recommendation based on the evaluation guideline selected during sage creation. Four guideline levels are available, each setting a different performance threshold.

Flexible: 65% threshold. Open to a range of approaches; lower bar for a positive recommendation.
Relaxed: 75% threshold. Moderate expectations with room for growth areas.
Balanced: 80% threshold. Solid performance expected across categories.
Strict: 85% threshold. High bar; strong performance required in all areas.

The recommendation considers the overall score, individual category performance, and any evaluation rules configured in the sage. All conditions must be met for a positive recommendation.
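As a sketch of how the guideline thresholds map to a recommendation: the thresholds come from the list above, while the rule-check parameter and function name are hypothetical, and the real decision also weighs individual category performance.

```python
# Thresholds from the guideline levels listed above.
GUIDELINE_THRESHOLDS = {
    "Flexible": 65,
    "Relaxed": 75,
    "Balanced": 80,
    "Strict": 85,
}

def recommend(overall_score: float, guideline: str, rules_satisfied: bool) -> str:
    """Illustrative sketch only: a positive recommendation requires the overall
    score to clear the guideline's threshold and any configured evaluation
    rules to pass. The actual agent also considers per-category performance."""
    if overall_score >= GUIDELINE_THRESHOLDS[guideline] and rules_satisfied:
        return "Pass"
    return "Do not pass"

print(recommend(85.75, "Balanced", rules_satisfied=True))  # -> Pass
print(recommend(72.0, "Strict", rules_satisfied=True))     # -> Do not pass
```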

Note

The recommendation is a signal, not a verdict. It is designed to inform the human reviewer's decision, not replace it.

What AI Scoring Does Not Do

Understanding the boundaries of the system is as important as understanding its capabilities.

1. It does not make screening decisions.
The platform produces structured data: scores, explanations, recommendations, and transcripts. A human reviewer always makes the final determination.

2. It does not weight categories.
All categories contribute equally to the overall score. There is no mechanism to assign higher importance to specific evaluation dimensions.

3. It does not compare candidates against each other.
Each candidate is scored in isolation, against the sage's fixed rubric. Scores are absolute, not relative.

4. It does not learn from results.
There is no feedback loop where human decisions influence future scoring. Each sage's evaluation framework is fixed at build time and does not adapt based on outcomes.

5. It does not train on candidate data.
Candidate responses, transcripts, and evaluation data are not used to train AI models.

6. It does not evaluate identifying information.
The evaluation agent assesses candidates based on the content of their responses. Candidate names and locations are hidden from the evaluation process.