Multimedia shapes our future

Multimedia Technologies Lab

Research Faculty
Hong-Yuan Mark Liao, Distinguished Research Fellow
Chu-Song Chen, Research Fellow
Wen-Liang Hwang, Research Fellow
Tyng-Luh Liu, Research Fellow
Chun-Shien Lu, Research Fellow

Multimedia technology is considered to be one of the three most promising industries
of the twenty-first century, along with biotechnology and nanotechnology.
Indeed, over the past two decades, we have witnessed how multimedia technology
can influence various aspects of our daily life. Its wide spectrum of applications
naturally calls for constant development of a broad range of multimedia techniques,
including those related to music, video, image, text, or 3D animation.

The main research interests of the members of the Multimedia Technology Group
at IIS include multimedia signal processing, computer vision, and machine learning.
In addition to the research interests of each individual investigator, our group
is engaged in joint research activity, which can be best characterized by two ongoing
major projects: 1) Integration of Video and Audio Intelligence for Multimedia
Applications, and 2) Compressive Sensing and Sparse Representation. These joint
projects are described in detail below.

A. Integration of Video and Audio Intelligence for Multimedia Applications

This project concerns our research efforts to explore new multimedia techniques
and applications that require the fusion of video and audio intelligence. Specifically,
we are considering how a system might first extract key emotion elements from a
short music clip, and then carry out emotion transfer to the key targets in a video
sequence. Accomplishing such a task would require at least three core techniques.
First, we would need to process the video sequence to have access to the geometric
and appearance information pertaining to meaningful and representative targets.
Second, we need a systematic way to reliably classify and identify important emotions
from the music. Third, to complete the emotion transfer, we are considering
using computer graphic methods to manipulate the video targets according to the
extracted music emotions.

The main challenges in carrying out this project are threefold. First, efficiently
extracting and then manipulating video targets from an RGB-based video sequence is
very challenging. We are addressing the information gap between 2D and
3D by converting a 2D video target sequence into its corresponding 3D skeleton
sequence, which allows systematic motion manipulation. Second, we need to establish
a well-defined formulation to “quantize” a given piece of music. That is, from an
arbitrary music excerpt, we should be able to “calculate” its corresponding emotion
intensity and tempo. Third, because manipulation of the video target is based on the
corresponding 3D skeleton sequence, increasing its visual impact will require
seamlessly incorporating into the proposed system 3D texture synthesis that accounts
for the varying emotion intensity and tempo of the intended music.
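
To make the second challenge concrete, the following is a minimal sketch of how tempo and a rough intensity value might be "calculated" from an audio excerpt. It is not the group's formulation, which is not described here. It assumes a generic approach: tempo is taken from the autocorrelation of a short-time energy onset curve, and mean RMS energy stands in as a hypothetical proxy for emotion intensity. All function names, frame sizes, and the synthetic click track are illustrative assumptions.

```python
import numpy as np


def energy_envelope(signal, frame=200, hop=100):
    """Short-time RMS energy, one value per analysis frame."""
    n_frames = 1 + (len(signal) - frame) // hop
    return np.array([
        np.sqrt(np.mean(signal[i * hop:i * hop + frame] ** 2))
        for i in range(n_frames)
    ])


def estimate_tempo(signal, sr, frame=200, hop=100, bpm_range=(60.0, 180.0)):
    """Estimate tempo in BPM from the autocorrelation of an onset curve."""
    env = energy_envelope(signal, frame, hop)
    # Onset strength: keep only increases in energy between frames.
    onset = np.maximum(np.diff(env), 0.0)
    onset = onset - onset.mean()
    # Autocorrelation at non-negative lags (lag 0 is the first entry).
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
    frames_per_second = sr / hop
    # Convert the allowed BPM range into a range of lags (in frames).
    min_lag = max(1, int(round(frames_per_second * 60.0 / bpm_range[1])))
    max_lag = min(len(ac) - 1,
                  int(round(frames_per_second * 60.0 / bpm_range[0])))
    best_lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60.0 * frames_per_second / best_lag


def intensity_proxy(signal, frame=200, hop=100):
    """Hypothetical stand-in for 'emotion intensity': mean RMS energy."""
    return float(np.mean(energy_envelope(signal, frame, hop)))


def click_track(bpm, sr=8000, seconds=8.0, click_len=200):
    """Synthetic test signal: a decaying 440 Hz burst on every beat."""
    signal = np.zeros(int(sr * seconds))
    t = np.arange(click_len) / sr
    burst = np.sin(2 * np.pi * 440.0 * t) * np.exp(-t * 60.0)
    period = int(round(sr * 60.0 / bpm))
    for start in range(0, len(signal) - click_len, period):
        signal[start:start + click_len] += burst
    return signal


if __name__ == "__main__":
    sr = 8000
    excerpt = click_track(120, sr=sr)
    print("tempo (BPM):", estimate_tempo(excerpt, sr))
    print("intensity proxy:", intensity_proxy(excerpt))
```

Real music would need a more robust onset detector and a learned mapping from audio features to emotion; the energy-based proxy here only illustrates the idea of reducing an excerpt to a pair of numbers that can drive the motion manipulation.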

B. Compressive Sensing and Sparse Representation

We have already made several significant accomplishments through this project. To
solve the signal separation problem, we designed a re-weighting scheme that
appreciably outperforms other methods. We also addressed the closely related problem
of analysis operator learning, and introduced a two-stage iterative method with
