TIGP (SNHCC) -- Virtual Musician: An Automated System for Generating Expressive Virtual Violin Performances from Music
- Lecturer: Dr. Ting-Wei Lin (Institute of Information Science, Academia Sinica)
- Host: TIGP (SNHCC)
- Time: 2025-10-13 (Mon.) 14:00 ~ 16:00
- Location: Auditorium 106 at IIS New Building
Abstract
Motion-capture (MOCAP)-free music-to-performance generation using deep generative models has emerged as a promising solution for the next generation of animation technologies, enabling the creation of animated musical performances without relying on motion capture. However, building such systems presents substantial challenges, particularly in integrating multiple independent models responsible for different aspects of avatar control, such as facial expression generation for emotive dynamics and fingering generation for instrumental articulation. Moreover, most existing approaches primarily focus on human-only performance generation, overlooking the critical role of human-instrument interactions in achieving expressive and realistic musical performances.
To address these limitations, this dissertation proposes a comprehensive system for generating expressive virtual violin performances. The system integrates five key modules—expressive music synthesis, facial expression generation, fingering generation, body movement generation, and video shot generation—into a unified framework. By eliminating the need for MOCAP and explicitly modeling human-instrument interactions, this work advances the field of MOCAP-free content-to-performance generation. Extensive experiments, including quantitative analyses and user studies, demonstrate the system's ability to produce realistic, expressive, and synchronized virtual performances, paving the way for interactive applications such as VTubing, music education, and virtual concerts.
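To illustrate how the five modules listed above might fit into a unified framework, the sketch below chains them as sequential stages whose later outputs condition on earlier ones. This is a minimal, hypothetical outline for exposition only; the class, function, and parameter names are assumptions and do not reflect the dissertation's actual interfaces or models.

```python
# Hypothetical sketch of a five-stage music-to-performance pipeline.
# Each placeholder function stands in for one generative module from the abstract.

from dataclasses import dataclass, field
from typing import List


@dataclass
class PerformanceAssets:
    audio: List[float] = field(default_factory=list)              # expressive violin audio samples
    face_params: List[List[float]] = field(default_factory=list)  # per-frame facial expression parameters
    fingering: List[int] = field(default_factory=list)            # per-note string/finger assignments
    body_motion: List[List[float]] = field(default_factory=list)  # per-frame skeletal poses
    camera_cuts: List[int] = field(default_factory=list)          # frame indices of shot boundaries


# Placeholder stages standing in for the five generative models (illustrative only).
def synthesize_expressive_audio(score: str) -> List[float]:
    return []

def generate_facial_expressions(audio: List[float]) -> List[List[float]]:
    return []

def generate_fingering(score: str) -> List[int]:
    return []

def generate_body_motion(audio: List[float], fingering: List[int]) -> List[List[float]]:
    return []

def plan_video_shots(audio: List[float], body: List[List[float]]) -> List[int]:
    return []


def generate_performance(score: str) -> PerformanceAssets:
    """Run the five stages in sequence; later stages condition on earlier outputs."""
    audio = synthesize_expressive_audio(score)        # 1. expressive music synthesis
    face = generate_facial_expressions(audio)         # 2. facial expression generation
    fingering = generate_fingering(score)             # 3. fingering generation (human-instrument interaction)
    body = generate_body_motion(audio, fingering)     # 4. body movement generation
    cuts = plan_video_shots(audio, body)              # 5. video shot generation
    return PerformanceAssets(audio, face, fingering, body, cuts)
```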