Interview Transcript System

The interview transcript system applies AI-driven analysis to one of the most subjectively evaluated parts of hiring: the candidate's interview responses.

The Problem with Human-Only Interview Evaluation

Research on hiring has repeatedly found that unstructured interviews predict job performance poorly, largely because interviewers unconsciously weigh irrelevant factors such as confidence, accent, physical appearance, and familiarity. InnoHire.ai's transcript system provides a structured, language-grounded signal that reduces the role of opinion and surfaces evidence.

Supported Interview Formats

The system accepts interview inputs in three forms:

Audio upload — MP3, M4A, WAV files transcribed via a Whisper-based ASR (Automatic Speech Recognition) model.
Video upload — Audio track is extracted and passed to the same ASR pipeline.
Text paste — Interviewers can paste a transcript directly for analysis without ASR.

Transcription Pipeline

For audio/video inputs:

Audio extraction — Video files are stripped to audio only via FFmpeg.
Speaker diarisation — The audio is split into interviewer vs. candidate segments using a diarisation model (important for analysing only candidate responses, not questions).
Transcription — The Whisper ASR model transcribes each segment with timestamps.
Post-processing — Filler words ("um", "uh", "like") are flagged but not removed. They inform the fluency metric.
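The extraction and post-processing steps above might be sketched as follows. This is a minimal illustration, not the system's actual code: the names `ffmpeg_extract_command`, `Segment`, and `flag_fillers` are hypothetical, the 16 kHz mono output is an assumption (a common ASR input format), and diarisation and Whisper transcription are left out.

```python
import re
from dataclasses import dataclass, field

# Assumed filler list; the text names "um", "uh", and "like" as examples.
FILLERS = {"um", "uh", "like"}


def ffmpeg_extract_command(video_path: str, audio_path: str) -> list[str]:
    """Build an FFmpeg command that drops the video stream and keeps audio.

    -vn disables video, -ac 1 downmixes to mono, -ar 16000 resamples to
    16 kHz. The mono/16 kHz choice is an assumption, not a stated requirement.
    """
    return ["ffmpeg", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_path]


@dataclass
class Segment:
    """One diarised, transcribed span of speech (hypothetical structure)."""
    speaker: str   # e.g. "candidate" or "interviewer"
    start: float   # seconds from the start of the recording
    end: float
    text: str
    filler_positions: list[int] = field(default_factory=list)


def flag_fillers(segment: Segment) -> Segment:
    """Record the word indices of fillers; the text itself is left unchanged."""
    words = re.findall(r"[a-z']+", segment.text.lower())
    segment.filler_positions = [i for i, w in enumerate(words) if w in FILLERS]
    return segment
```

Note that this naive match also flags "like" when it is used as a verb; a production filter would need context to tell the two apart.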

NLP Evaluation Pipeline

Once a transcript is available (via ASR or direct paste), the NLP evaluation module runs the four analyses described below.

Competency Signal Extraction

Each candidate answer is mapped to a set of competency tags from a predefined competency ontology (e.g. "problem solving", "leadership", "technical depth", "communication clarity", "ownership"). A zero-shot classifier assigns a relevance score for each competency per answer.
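The shape of this step, one relevance score per competency per answer, can be sketched as below. The scorer is passed in so that a real zero-shot model could be swapped in. `keyword_scorer` is only a toy stand-in used to make the example runnable and is not a zero-shot classifier. All function names here are hypothetical.

```python
from typing import Callable

# Competency labels taken from the examples in the text.
COMPETENCIES = [
    "problem solving",
    "leadership",
    "technical depth",
    "communication clarity",
    "ownership",
]

# A scorer maps (answer, competency label) to a relevance score in [0, 1].
Scorer = Callable[[str, str], float]


def keyword_scorer(answer: str, competency: str) -> float:
    """Toy stand-in: fraction of the label's words that appear in the answer."""
    answer_words = set(answer.lower().split())
    label_words = competency.split()
    return sum(w in answer_words for w in label_words) / len(label_words)


def score_answer(answer: str, scorer: Scorer) -> dict[str, float]:
    """Return a relevance score for every competency in the ontology."""
    return {c: scorer(answer, c) for c in COMPETENCIES}
```

For example, `score_answer("I took ownership of the rollout", keyword_scorer)` gives `ownership` a score of 1.0 and the other labels 0.0. One option for a real scorer would be a Hugging Face `zero-shot-classification` pipeline wrapped to match `Scorer`; whether the system uses that library is not stated.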

STAR Framework Scoring

For behavioural questions, the model evaluates whether the candidate's response contains all four STAR components:

Situation — was context provided?
Task — was the candidate's responsibility clear?
Action — were specific actions described?
Result — was a measurable outcome stated?

Each component is scored from 0 to 1 and the four scores are summed, so a complete STAR response scores 4.0. STAR completeness is commonly used as a proxy for answer quality in competency-based interview frameworks.
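The arithmetic can be made concrete with a small sketch. It assumes the completeness score is the plain sum of the four component scores (consistent with a complete answer scoring 4.0) and that a per-interview average is the mean across answers; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StarScores:
    """Per-answer STAR component scores, each expected in [0, 1]."""
    situation: float
    task: float
    action: float
    result: float

    def __post_init__(self) -> None:
        # Reject values outside the 0 to 1 range described in the text.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")

    def completeness(self) -> float:
        """Sum of the four components; 4.0 means a complete STAR answer."""
        return self.situation + self.task + self.action + self.result


def star_completeness_avg(answers: list[StarScores]) -> float:
    """Mean completeness across all behavioural answers in one interview."""
    if not answers:
        raise ValueError("at least one answer is required")
    return sum(a.completeness() for a in answers) / len(answers)
```

Two answers scoring 3.5 and 3.0 would average to 3.25 under these assumptions.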

Technical Depth Scoring

For technical questions, the model identifies technical entity mentions (tools, concepts, methodologies) and checks their contextual accuracy against a domain knowledge graph. A candidate who says "I used Redis as a write-through cache to reduce DB load" scores higher on technical depth than one who says "I used some caching", because the first answer names a specific tool and describes a use that is accurate for it.
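The Redis comparison can be illustrated with a deliberately simplified sketch. A small dictionary stands in for the domain knowledge graph, and substring matching stands in for entity recognition; both are assumptions for illustration, as are the entries in `KNOWLEDGE_GRAPH` and the point scheme.

```python
# Toy stand-in for a domain knowledge graph: each entity maps to context
# terms that would be accurate when used alongside it. Entries are
# illustrative assumptions, not the system's actual graph.
KNOWLEDGE_GRAPH = {
    "redis": {"cache", "write-through", "in-memory", "pub/sub"},
    "kafka": {"stream", "topic", "partition", "consumer"},
}


def technical_depth_points(answer: str) -> int:
    """Award 1 point per known entity plus 1 per accurate context term."""
    text = answer.lower()
    points = 0
    for entity, contexts in KNOWLEDGE_GRAPH.items():
        if entity in text:
            points += 1 + sum(term in text for term in contexts)
    return points
```

Under this scheme the write-through cache answer earns 3 points (the entity plus two matching context terms), while "I used some caching" earns 0 because it names no known entity.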

Language & Communication Analysis

Clarity score — reading ease, sentence length variance, concrete vs. vague language ratio
Fluency score — filler word frequency, self-correction rate (from ASR transcript)
Confidence signal — hedging language frequency ("I think", "maybe", "sort of") vs. assertive language
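The frequency-based inputs to the fluency and confidence signals above could be computed along these lines. This is a sketch under assumptions: the phrase lists contain only the examples given in the text, the per-100-words normalisation is a choice made here, and the function name is hypothetical.

```python
import re

# Phrase lists limited to the examples named in the text.
HEDGES = ("i think", "maybe", "sort of")
FILLERS = ("um", "uh", "like")


def rate_per_100_words(text: str, phrases: tuple[str, ...]) -> float:
    """Occurrences of the given phrases per 100 words of text."""
    lowered = text.lower()
    n_words = len(re.findall(r"[a-z']+", lowered))
    if n_words == 0:
        return 0.0
    # Word boundaries keep "um" from matching inside words such as "summer".
    hits = sum(
        len(re.findall(rf"\b{re.escape(p)}\b", lowered)) for p in phrases
    )
    return 100.0 * hits / n_words
```

For the ten-word answer "I think we maybe shipped it, um, sort of late", the hedging rate is 30.0 and the filler rate is 10.0 per 100 words. How these rates are mapped onto the 0 to 100 scores in the report is not specified in the text.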

Output: Candidate Interview Report

The system generates a structured report per interview:

{
  "candidateId": "abc123",
  "interviewDate": "2026-02-20",
  "overall_interview_score": 76,
  "competency_scores": {
    "technical_depth": 82,
    "communication_clarity": 74,
    "problem_solving": 80,
    "leadership": 65,
    "ownership": 71
  },
  "star_completeness_avg": 3.2,
  "fluency_score": 78,
  "highlight_quotes": [...],
  "concern_flags": ["low ownership signals", "vague on metrics"],
  "interviewer_summary": "Candidate demonstrated strong technical knowledge..."
}
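A consumer of this report might read it as in the following sketch, which loads a subset of the sample fields and lists the competencies that fall below a cut-off. The helper name `weakest_competencies` and the default threshold of 70 are assumptions for illustration, as is the reading of scores as values out of 100.

```python
import json

# Subset of the sample report above; highlight_quotes is omitted because
# its contents are elided in the sample.
REPORT_JSON = """
{
  "candidateId": "abc123",
  "overall_interview_score": 76,
  "competency_scores": {
    "technical_depth": 82,
    "communication_clarity": 74,
    "problem_solving": 80,
    "leadership": 65,
    "ownership": 71
  },
  "concern_flags": ["low ownership signals", "vague on metrics"]
}
"""


def weakest_competencies(report: dict, below: int = 70) -> list[str]:
    """Competency names scoring under the threshold, lowest score first."""
    scores = report["competency_scores"]
    return sorted((name for name, s in scores.items() if s < below), key=scores.get)


report = json.loads(REPORT_JSON)
print(weakest_competencies(report))
```

With the sample values, only `leadership` (65) falls below the default threshold.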

Privacy & Data Handling

Interview audio and video are processed in memory and not stored beyond the transcription step unless the recruiter explicitly saves the transcript. Transcripts are stored encrypted and are not used to train any external models. Candidates can request transcript deletion under applicable data protection regulations.