Spotlight Projects



The AMCA Project: Automatic Music Concert Animation

Principal Investigator: Dr. Li Su
Project Period: 2019/1~2021/12

Incorporating artificial intelligence into the multimedia industry is a complex undertaking. It involves integrating cross-modal data such as images, sounds, and even emotions into multimedia content. In animation production, for example, making a video match perfectly with performed or background music demands a great deal of effort from animation producers. The purpose of this project is to overcome these difficulties. We are endeavoring to teach machines to automatically understand music content and then match the body movements of virtual characters to that content. This would ultimately allow virtual musicians to perform with real people. We anticipate that, in the future, our technology will make the animation industry more efficient and expand the possibilities of interactive multimedia presentations.

In this project, we are generating virtual musicians that can play music along with live musicians. Our proposed system can be divided into three elements: audio analysis, motion generation, and real-time synchronization. Audio analysis primarily involves automatic music transcription, melody detection, and musical instrument recognition. In the past, the diversity of signal characteristics and data labels made it difficult to establish a systematic solution for automatic music analysis. Nowadays, deep learning-based systems that simultaneously detect multiple pitches, timings, and instrument types have become possible due to the development of neural networks (NN) in multi-task learning (MTL) approaches. Moreover, we can now superimpose different types of signal representations, allowing the convolution kernels in an NN to automatically select the desired features. Consequently, the trained model exhibits enhanced robustness, achieves transposition invariance, and suppresses the challenging overtone errors usually generated in audio processing. More specifically, our proposed method recasts music transcription as a semantic segmentation problem from computer vision. Our U-Net-based architecture applies convolution kernels with attention or dilation mechanisms to process objects of different sizes simultaneously, for example to identify both short and long musical notes.
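To make the segmentation view concrete, the sketch below shows a deliberately small U-Net-style network in PyTorch. It is a minimal illustration only: the class name, channel counts, depth, and the two stacked input representations are our own assumptions for exposition, not the project's actual architecture.

# A minimal sketch (not the AMCA code) of music transcription treated as
# semantic segmentation: stacked spectral "channels" go in, and a per-bin
# note-activation map comes out. A dilated convolution in the bottleneck
# widens the receptive field so short and long notes are handled together.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, dilation=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TranscriptionUNet(nn.Module):  # hypothetical name, illustrative sizes
    def __init__(self, in_channels=2, base=32):
        super().__init__()
        self.enc1 = conv_block(in_channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        # Dilated kernel: a wider view along time for long note durations.
        self.bottleneck = conv_block(base * 2, base * 2, dilation=4)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = conv_block(base * 2 + base, base)
        self.head = nn.Conv2d(base, 1, 1)  # note on/off per time-frequency bin

    def forward(self, x):  # x: (batch, channels, freq, time)
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(e2)
        d1 = self.dec1(torch.cat([self.up(b), e1], dim=1))  # U-Net skip connection
        return torch.sigmoid(self.head(d1))

# Example: two superimposed signal representations (e.g., a spectrogram and
# a cepstrum-like feature) stacked as input channels.
x = torch.randn(1, 2, 128, 256)
print(TranscriptionUNet()(x).shape)  # torch.Size([1, 1, 128, 256])

The output has the same time-frequency shape as the input, so each "pixel" is a note-activation decision, which is exactly what makes segmentation-style architectures a natural fit for transcription.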
To generate animated body movement, we have achieved preliminary results based on the motion of a violin player. Using a recording of a violin solo as the input signal, we automatically generate coordinate values of the body joints for a virtual violinist. Long-term body rhythms can also be determined by our music emotion recognition model. Instead of employing an end-to-end NN, we are focusing on more interpretable and controllable body movement.

Figure 1: The virtual musician system.
