Voice Tutor

TutorQ's voice tutor provides real-time speech-to-speech tutoring powered by AWS Nova Sonic. Students speak naturally, and the AI responds with voice — grounded in the course materials.

How It Works

  1. Student speaks a question.
  2. Audio is streamed to TutorQ (16 kHz PCM).
  3. Nova Sonic processes the speech and decides whether to search course materials.
  4. If needed, a RAG search finds relevant passages from the uploaded materials.
  5. Nova Sonic generates a spoken response using the retrieved content.
  6. Audio is streamed back to the student (24 kHz PCM).

Key Features

Curriculum-Grounded

Answers are grounded in the professor's uploaded materials. The AI uses a search_course_materials tool to find relevant passages before responding, which reduces the risk of hallucinated answers drawn from generic training data.
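As a rough illustration of what a handler behind the search_course_materials tool might do, the sketch below ranks passages by keyword overlap with the question. Only the tool name comes from this page. The Passage type, the top_k parameter, the sample materials, and the overlap scoring are assumptions made for illustration, and the scoring is a stand-in for the actual RAG search, which is not specified here.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical: file name of the uploaded material
    text: str


def _tokens(text: str) -> set[str]:
    """Lowercase word set used for the naive overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_course_materials(query: str, passages: list[Passage], top_k: int = 2) -> list[Passage]:
    """Return up to top_k passages sharing the most words with the query.

    Keyword overlap is a placeholder for the real retrieval step.
    """
    query_tokens = _tokens(query)
    scored = [(len(query_tokens & _tokens(p.text)), p) for p in passages]
    # Keep only passages with at least one shared word, best match first.
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [p for _, p in hits[:top_k]]


# Hypothetical uploaded materials.
materials = [
    Passage("week1.pdf", "Photosynthesis converts light energy into chemical energy."),
    Passage("week2.pdf", "Cellular respiration releases energy stored in glucose."),
    Passage("syllabus.pdf", "Office hours are held on Tuesdays."),
]
results = search_course_materials("How does photosynthesis use light energy?", materials)
print([p.source for p in results])  # ['week1.pdf', 'week2.pdf']
```

Passages with no overlap are dropped rather than returned with a zero score, so the model receives nothing when the materials do not cover the question.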

Voice-First

Students speak naturally — no typing. The AI responds with voice. This is especially valuable for:

  • Students who struggle with written text
  • Mobile users
  • Hands-free study sessions
  • Students with reading difficulties

Adaptive Teaching

Six discussion modes adapt to how the student wants to learn, from direct explanation to Socratic dialogue to guided reading.

Barge-In Support

Students can interrupt the AI mid-response (barge-in), just like in a real conversation. The AI stops immediately and listens.
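One common way to make barge-in feel immediate on the client is to discard any AI audio that has been received but not yet played. The sketch below assumes that approach. The PlaybackQueue class and its method names are hypothetical, and this page does not say how TutorQ signals or handles an interruption.

```python
from collections import deque
from typing import Optional


class PlaybackQueue:
    """Holds AI audio chunks waiting to be played on the client."""

    def __init__(self) -> None:
        self._chunks: deque = deque()

    def enqueue(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is queued."""
        return self._chunks.popleft() if self._chunks else None

    def barge_in(self) -> int:
        """Drop all unplayed audio and return how many chunks were discarded."""
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped


queue = PlaybackQueue()
for i in range(3):
    queue.enqueue(bytes([i]) * 4)  # three small dummy chunks

first = queue.next_chunk()   # playback of the first chunk starts
dropped = queue.barge_in()   # student starts speaking: discard the rest
print(dropped)               # 2
```

Clearing the queue stops the AI's voice as soon as the current chunk finishes, so shorter chunks give a faster perceived stop.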

Multi-Language

The voice is selected based on the session language. Currently supported: English (voices: matthew, tiffany, gregory, stephen) and Hindi.

Technical Details

Feature        Specification
Speech model   AWS Nova Sonic (amazon.nova-2-sonic-v1:0)
Input audio    16 kHz, 16-bit PCM, mono
Output audio   24 kHz, 16-bit PCM, mono
Latency        ~1-2 seconds for first response
Max session    10 minutes (configurable)
VAD            Built-in voice activity detection
Protocol       WebSocket (bidirectional streaming)
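The audio formats above translate directly into byte rates, which is useful when sizing buffers. The sketch below works through that arithmetic and shows one way to produce 16-bit PCM from float samples. The 100 ms chunk duration and little-endian byte order are assumptions, not values stated on this page.

```python
import struct

INPUT_RATE_HZ = 16_000
OUTPUT_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 1          # mono


def bytes_per_second(rate_hz: int) -> int:
    return rate_hz * BYTES_PER_SAMPLE * CHANNELS


def chunk_size(rate_hz: int, chunk_ms: int) -> int:
    """Bytes in one chunk lasting chunk_ms milliseconds."""
    return bytes_per_second(rate_hz) * chunk_ms // 1000


def floats_to_pcm16(samples: list[float]) -> bytes:
    """Convert samples in [-1.0, 1.0] to 16-bit PCM (little-endian assumed)."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


input_chunk = chunk_size(INPUT_RATE_HZ, 100)                    # 3200 bytes per 100 ms
output_rate = bytes_per_second(OUTPUT_RATE_HZ)                  # 48000 bytes per second
session_input_bytes = bytes_per_second(INPUT_RATE_HZ) * 10 * 60  # 19,200,000 bytes
pcm = floats_to_pcm16([0.0, 1.0, -1.0])
```

At these rates, a full 10-minute session carries roughly 19 MB of uncompressed microphone audio upstream, before any protocol overhead.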

Session Flow

  1. Connect — WebSocket connection established
  2. Init — Client sends init message with course ID and language
  3. Session start — Nova Sonic session created with system prompt and course context
  4. Audio loop — Bidirectional audio streaming (student ↔ AI)
  5. Tool calls — AI searches course materials when relevant
  6. End — Client sends end_session, metrics saved to database
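The control messages in this flow can be sketched as a small client-side state machine that enforces the ordering above. The end_session name comes from this page. The JSON layout, the type, course_id, and language field names, and the VoiceSession class are assumptions for illustration, and the actual wire format may differ.

```python
import json
from enum import Enum


class State(Enum):
    CONNECTED = "connected"
    STREAMING = "streaming"
    ENDED = "ended"


class VoiceSession:
    """Tracks client-side session state and builds outgoing control messages."""

    def __init__(self) -> None:
        # Step 1: the WebSocket connection is assumed to be open already.
        self.state = State.CONNECTED

    def init_message(self, course_id: str, language: str) -> str:
        """Step 2: build the init message (hypothetical field names)."""
        if self.state is not State.CONNECTED:
            raise RuntimeError("init must be the first message")
        self.state = State.STREAMING
        return json.dumps({"type": "init", "course_id": course_id, "language": language})

    def end_message(self) -> str:
        """Step 6: build the end_session message."""
        if self.state is not State.STREAMING:
            raise RuntimeError("no active session to end")
        self.state = State.ENDED
        return json.dumps({"type": "end_session"})


session = VoiceSession()
init = session.init_message("bio-101", "en")  # hypothetical course ID
# ... audio chunks would be exchanged here (steps 4 and 5) ...
end = session.end_message()
```

Keeping the state check next to message construction means an out-of-order call fails locally instead of sending a message the server would reject.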