Lumi — Emotionally Intelligent Voice Concierge
Team led by a Contrivance Inc software engineer with USF and CUNY degrees, specializing in multimodal AI, Spring Boot, and pgvector-based RAG pipelines.
Project Description
Lumi is an emotion-aware voice concierge for a fictional luxury hotel, The Lumen. It hears HOW a guest speaks — not just the words — and responds differently because of what it heard.
EMOTIONAL ACCURACY: A real acoustic pipeline reads emotion from the voice itself (emotion2vec+ embeddings + openSMILE prosody) and fuses it with a lexical read from an LLM into a Valence-Arousal-Dominance (V-A-D) state. Crucially, Lumi distinguishes service-anger from genuine distress — anger triggers de-escalation and service recovery, while real distress trips a safety guardrail — proving we model emotion, not keywords.
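A minimal sketch of the fusion and routing idea described above. Everything specific here is an assumption for illustration, not Lumi's actual code: the `VAD` class, the `fuse` and `route` function names, the 0.7 acoustic weight, the [-1, 1] value range, and the thresholds. It also assumes that dominance is what separates service-anger (negative, activated, high dominance) from distress (negative, activated, low dominance), which is a common reading of the V-A-D space but is not confirmed by the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VAD:
    """Valence, arousal, dominance; each assumed to lie in [-1, 1]."""
    valence: float
    arousal: float
    dominance: float


def fuse(acoustic: VAD, lexical: VAD, acoustic_weight: float = 0.7) -> VAD:
    """Late fusion: a weighted average of the acoustic and lexical reads."""
    w = acoustic_weight
    return VAD(
        valence=w * acoustic.valence + (1 - w) * lexical.valence,
        arousal=w * acoustic.arousal + (1 - w) * lexical.arousal,
        dominance=w * acoustic.dominance + (1 - w) * lexical.dominance,
    )


def route(state: VAD) -> str:
    """Pick a response strategy from the fused state (thresholds are illustrative)."""
    if state.valence < -0.3 and state.arousal > 0.3:
        # Negative and activated: dominance separates anger from distress.
        return "de_escalate" if state.dominance >= 0.0 else "safety_guardrail"
    return "standard_reply"


# Same mild wording in both cases; only the (hypothetical) acoustic read differs.
lexical = VAD(-0.2, 0.1, 0.0)
angry = fuse(VAD(-0.7, 0.8, 0.6), lexical)
distressed = fuse(VAD(-0.8, 0.7, -0.6), lexical)

print(route(angry))       # de_escalate
print(route(distressed))  # safety_guardrail
```

Because the lexical read is identical in both calls, the different routes come entirely from the acoustic side, which is the behavior the paragraph above claims.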
REAL-TIME ADAPTATION: On every turn, the detected emotion drives tone, pacing, word choice, on-screen media, and escalation strategy. A V-A-D trajectory tracks the guest’s emotional arc across a multi-turn conversation (e.g. anxious → angry → relieved → happy), and each reply is spoken back inline. A tiered fallback ladder (model → DSP heuristic → manual toggle) keeps the system responding end-to-end when a tier fails.
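The fallback ladder can be sketched as an ordered list of estimators that are tried until one returns a label. This is an illustration under stated assumptions: `run_ladder`, `model_tier`, and `dsp_tier` are hypothetical names, the model tier simply simulates an outage, and the "DSP heuristic" is a toy stand-in (mean byte value), not a real prosody measure.

```python
from typing import Callable, Optional, Sequence, Tuple

# An estimator takes raw audio bytes and returns a label, or None if it cannot decide.
Estimator = Callable[[bytes], Optional[str]]


def run_ladder(
    audio: bytes,
    tiers: Sequence[Tuple[str, Estimator]],
    manual_label: str,
) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, label) from the first that succeeds."""
    for name, estimator in tiers:
        try:
            label = estimator(audio)
        except Exception:
            # A crashed or unreachable tier should not take the conversation down.
            continue
        if label is not None:
            return name, label
    # Last rung: whatever the operator selected by hand.
    return "manual", manual_label


def model_tier(audio: bytes) -> Optional[str]:
    # Stand-in for the GPU inference call; here it simulates an outage.
    raise TimeoutError("inference service unavailable")


def dsp_tier(audio: bytes) -> Optional[str]:
    # Toy heuristic: a high mean byte value is treated as a loud, high-arousal signal.
    if not audio:
        return None
    return "high_arousal" if sum(audio) / len(audio) > 128 else "calm"


tiers = [("model", model_tier), ("dsp", dsp_tier)]
print(run_ladder(bytes([200, 220, 210]), tiers, manual_label="neutral"))  # ('dsp', 'high_arousal')
print(run_ladder(b"", tiers, manual_label="neutral"))                     # ('manual', 'neutral')
```

Returning the tier name alongside the label lets the UI show which rung produced the read, which is useful when demonstrating that the ladder actually degrades step by step.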
MULTILINGUAL: Expressive replies are voiced with ElevenLabs Flash v2.5, a low-latency multilingual speech model. The stability and style values in voice_settings are selected from the detected emotion, so the spoken delivery changes with the guest's state.
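One way the emotion-to-voice mapping might look, as a sketch only. The emotion labels, the numeric values, and the `build_tts_payload` helper are assumptions for illustration. The payload fields (`text`, `model_id`, `voice_settings` with `stability`, `similarity_boost`, `style`) and the `eleven_flash_v2_5` model id follow the ElevenLabs text-to-speech request body as commonly documented; check them against the current API reference before relying on them. No network call is made here.

```python
MODEL_ID = "eleven_flash_v2_5"

# Hypothetical presets: steadier delivery for upset guests, livelier for happy ones.
VOICE_SETTINGS = {
    "angry": {"stability": 0.75, "style": 0.10},
    "anxious": {"stability": 0.65, "style": 0.20},
    "happy": {"stability": 0.35, "style": 0.60},
    "neutral": {"stability": 0.50, "style": 0.30},
}

# Held constant across emotions in this sketch.
SIMILARITY_BOOST = 0.75


def build_tts_payload(text: str, emotion: str) -> dict:
    """Build a text-to-speech request body whose voice settings depend on the emotion."""
    preset = VOICE_SETTINGS.get(emotion, VOICE_SETTINGS["neutral"])
    return {
        "text": text,
        "model_id": MODEL_ID,
        "voice_settings": {**preset, "similarity_boost": SIMILARITY_BOOST},
    }


payload = build_tts_payload("I'm so sorry about the wait. Let me fix this now.", "angry")
print(payload["voice_settings"])
```

Unknown labels fall back to the neutral preset, so a new emotion class upstream cannot produce an invalid request.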
CREATIVE EXPRESSION: Emotion is a first-class output: voice settings, emotion-gated food/venue media tiles, a live emotional-trajectory visualization, and a red-alert safety mode are all driven by the acoustic read. An evaluation dashboard shows that the system reacts to tone, not words: when the wording stays the same, the text-only signal flatlines while the acoustic signal tracks the emotion.
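The "text flatlines, acoustic tracks" check can be expressed as a simple comparison of spreads. This is a hedged sketch, not the dashboard's actual metric: `spread`, `reacts_to_tone`, and both tolerances are hypothetical, and the numbers below are made-up placeholders standing in for one sentence spoken in four tones, not measurements from Lumi.

```python
from typing import Sequence


def spread(values: Sequence[float]) -> float:
    """Range of a signal: how far it moves between its lowest and highest reading."""
    return max(values) - min(values)


def reacts_to_tone(
    text_valence: Sequence[float],
    acoustic_valence: Sequence[float],
    flat_tol: float = 0.1,
    min_swing: float = 0.5,
) -> bool:
    """True when the text signal stays flat while the acoustic signal moves."""
    return spread(text_valence) <= flat_tol and spread(acoustic_valence) >= min_swing


# Placeholder values only: identical wording, four different deliveries.
text_valence = [0.02, 0.00, 0.03, 0.01]
acoustic_valence = [0.60, -0.70, -0.40, 0.10]

print(reacts_to_tone(text_valence, acoustic_valence))  # True
```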
TECHNOLOGIES: ElevenLabs (Flash v2.5 expressive TTS + Voice Design); emotion2vec+ and openSMILE for acoustic emotion/prosody; faster-whisper for STT; qwen3 LLM via Ollama for tone-adapted replies; FastAPI inference service on GPU; browser mic capture + web UI; SearXNG for media search.
Prior Work
None. All code, design, and the emotion pipeline were built during the hackathon. The only pre-existing pieces are open-source models (qwen3 served by Ollama) that were already running on a personal GPU server; the project uses them unmodified, purely as inference infrastructure.