The Problem with Human-Only Interview Evaluation
Research consistently shows that unstructured interviews are poor predictors of job performance, largely because interviewers unconsciously evaluate candidates on irrelevant factors (confidence, accent, physical appearance, familiarity bias). InnoHire.ai's transcript system provides a structured, language-grounded signal that reduces reliance on opinion and surfaces evidence.
Supported Interview Formats
The system accepts interview inputs in three forms:
- Audio upload: MP3, M4A, or WAV files, transcribed via a Whisper-based ASR (Automatic Speech Recognition) model.
- Video upload: the audio track is extracted and passed to the same ASR pipeline.
- Text paste: interviewers can paste a transcript directly for analysis without ASR.
Transcription Pipeline
For audio/video inputs:
- Audio extraction: video files are stripped to audio only via FFmpeg.
- Speaker diarisation: the audio is split into interviewer and candidate segments using a diarisation model, so that only candidate responses are analysed, not the questions.
- Transcription: the Whisper ASR model transcribes each segment with timestamps.
- Post-processing: filler words ("um", "uh", "like") are flagged but not removed, because they inform the fluency metric.
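The post-processing step can be illustrated with a minimal sketch. It assumes a simple word-boundary match over the three fillers named above; the names `FILLERS`, `FillerFlag`, `flag_fillers`, and `filler_rate` are hypothetical and not part of InnoHire.ai's actual implementation. A match this naive also flags "like" when it is used as a verb, which a production system would likely need to disambiguate.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed filler list, taken from the examples in the text; a real list would be longer.
FILLERS = ("um", "uh", "like")
_FILLER_RE = re.compile(r"\b(" + "|".join(FILLERS) + r")\b", re.IGNORECASE)


@dataclass
class FillerFlag:
    word: str   # the filler, lower-cased
    start: int  # character offset where the filler begins
    end: int    # character offset just past the filler


def flag_fillers(text: str) -> List[FillerFlag]:
    """Return the position of every filler word; the transcript itself is left untouched."""
    return [
        FillerFlag(m.group(0).lower(), m.start(), m.end())
        for m in _FILLER_RE.finditer(text)
    ]


def filler_rate(text: str) -> float:
    """Fillers per 100 words, as one possible input to a fluency metric."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    return 100.0 * len(flag_fillers(text)) / len(words)
```

Keeping the flags as character offsets, rather than deleting the words, means the original transcript stays quotable while the fluency metric can still count them.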
NLP Evaluation Pipeline
Once a transcript is available (via ASR or direct paste), the NLP evaluation module runs the following analyses:
Competency Signal Extraction
Each candidate answer is mapped to a set of competency tags from a predefined competency ontology (e.g. "problem solving", "leadership", "technical depth", "communication clarity", "ownership"). A zero-shot classifier assigns a relevance score for each competency per answer.
STAR Framework Scoring
For behavioural questions, the model evaluates whether the candidate's response contains all four STAR components:
- Situation: was context provided?
- Task: was the candidate's responsibility clear?
- Action: were specific actions described?
- Result: was a measurable outcome stated?
Each component is scored from 0 to 1, and the four scores are summed, so a complete STAR response scores 4.0. The STAR completeness score correlates strongly with answer quality in most competency-based interview frameworks.
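The arithmetic behind the STAR score can be sketched as follows. This is an illustration only: the `StarScores` class and `star_completeness_avg` function are hypothetical names, and averaging the per-answer totals across an interview is an assumption inferred from the `star_completeness_avg` field in the report, not a documented formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StarScores:
    """Per-answer STAR component scores, each expected to lie in [0, 1]."""
    situation: float
    task: float
    action: float
    result: float

    def __post_init__(self) -> None:
        # Reject out-of-range component scores early.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    @property
    def completeness(self) -> float:
        """Sum of the four components; 4.0 means a complete STAR response."""
        return self.situation + self.task + self.action + self.result


def star_completeness_avg(answers: List[StarScores]) -> float:
    """Mean completeness over all behavioural answers (assumed aggregation)."""
    if not answers:
        return 0.0
    return sum(a.completeness for a in answers) / len(answers)
```

For example, one complete answer (4.0) and one answer that describes the situation and task fully, the action partially (0.5), and no result (0.0) would average to 3.25 under this assumption.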
Technical Depth Scoring
For technical questions, the model identifies technical entity mentions (tools, concepts, methodologies) and checks their contextual accuracy against a domain knowledge graph. A candidate who says "I used Redis as a write-through cache to reduce DB load" scores higher on technical depth than one who says "I used some caching".
Language & Communication Analysis
- Clarity score: reading ease, sentence length variance, and the ratio of concrete to vague language
- Fluency score: filler word frequency and self-correction rate (from the ASR transcript)
- Confidence signal: frequency of hedging language ("I think", "maybe", "sort of") relative to assertive language
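As a rough illustration of the confidence signal, the sketch below counts hedging and assertive phrases and reports the assertive share. The hedging phrases come from the text above; the `ASSERTIVE` list, the function names, and the choice of a simple share (with 0.5 as a neutral value when nothing matches) are all assumptions made for this example, not the system's actual metric.

```python
import re
from typing import Dict, Sequence

HEDGES = ("i think", "maybe", "sort of")       # examples given in the text
ASSERTIVE = ("i decided", "i led", "i built")  # hypothetical examples


def count_phrases(text: str, phrases: Sequence[str]) -> int:
    """Count whole-phrase, case-insensitive occurrences of each phrase."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(p) + r"\b", lowered)) for p in phrases
    )


def confidence_signal(text: str) -> Dict[str, float]:
    """Compare hedging against assertive language in one candidate answer."""
    hedges = count_phrases(text, HEDGES)
    assertive = count_phrases(text, ASSERTIVE)
    total = hedges + assertive
    # Share of assertive phrases among all matched phrases; neutral 0.5 if none matched.
    share = assertive / total if total else 0.5
    return {"hedges": hedges, "assertive": assertive, "assertive_share": share}
```

A phrase list this small is only a starting point; in practice the lists would need to be much broader and probably weighted by answer length.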
Output: Candidate Interview Report
The system generates a structured report per interview:
{
  "candidateId": "abc123",
  "interviewDate": "2026-02-20",
  "overall_interview_score": 76,
  "competency_scores": {
    "technical_depth": 82,
    "communication_clarity": 74,
    "problem_solving": 80,
    "leadership": 65,
    "ownership": 71
  },
  "star_completeness_avg": 3.2,
  "fluency_score": 78,
  "highlight_quotes": [...],
  "concern_flags": ["low ownership signals", "vague on metrics"],
  "interviewer_summary": "Candidate demonstrated strong technical knowledge..."
}
Privacy & Data Handling
Interview audio and video are processed in memory and not stored beyond the transcription step unless the recruiter explicitly saves the transcript. Transcripts are stored encrypted and are not used to train any external models. Candidates can request transcript deletion under applicable data protection regulations.