
Multimedia shapes our future





Multimedia Technologies Lab




Research Faculty

Hong-Yuan Mark Liao
Distinguished Research Fellow

Chu-Song Chen
Research Fellow

Wen-Liang Hwang
Research Fellow

Tyng-Luh Liu
Research Fellow

Chun-Shien Lu
Research Fellow

Multimedia technology is considered to be one of the three most promising industries of the twenty-first century, along with biotechnology and nanotechnology. Indeed, over the past two decades, we have witnessed how multimedia technology can influence various aspects of our daily life. Its wide spectrum of applications naturally calls for constant development of a broad range of multimedia techniques, including those related to music, video, images, text, and 3D animation.

The main research interests of the members of the Multimedia Technology Group at IIS include multimedia signal processing, computer vision, and machine learning. In addition to the research interests of each individual investigator, our group is engaged in joint research activity, which can best be characterized by two ongoing major projects: 1) Integration of Video and Audio Intelligence for Multimedia Applications, and 2) Compressive Sensing and Sparse Representation. These joint projects are described in detail below.

A. Integration of Video and Audio Intelligence for Multimedia Applications

This project concerns our research efforts to explore new multimedia techniques and applications that require the fusion of video and audio intelligence. Specifically, we are considering how a system might first extract key emotion elements from a short music clip, and then carry out emotion transfer to the key targets in a video sequence. Accomplishing such a task requires at least three core techniques. First, we need to process the video sequence to obtain the geometric and appearance information pertaining to meaningful and representative targets. Second, we need a systematic way to reliably classify and identify important emotions from the music. Third, to complete the emotion transfer, we are considering using computer graphics methods to manipulate the video targets according to the extracted music emotions.
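
The following Python sketch outlines how these three components might fit together. Every name in it (EmotionProfile, VideoTarget, extract_video_targets, classify_music_emotion, retarget_motion) is a hypothetical placeholder introduced for illustration, not part of an existing implementation.

```python
# Hypothetical pipeline sketch for music-driven emotion transfer to video targets.
# All names below are illustrative placeholders, not an existing implementation.
from dataclasses import dataclass
from typing import List


@dataclass
class EmotionProfile:
    """Key emotion elements extracted from a short music clip."""
    label: str          # e.g. "happy", "sad", "tense"
    intensity: float    # normalized emotion intensity in [0, 1]
    tempo_bpm: float    # estimated tempo of the excerpt


@dataclass
class VideoTarget:
    """Geometric and appearance description of a representative target."""
    track_id: int
    skeleton_2d: List   # per-frame 2D joint positions
    appearance: dict    # texture / color statistics used for rendering


def extract_video_targets(frames) -> List[VideoTarget]:
    """Technique 1: detect and track meaningful targets in the video."""
    raise NotImplementedError


def classify_music_emotion(audio_clip) -> EmotionProfile:
    """Technique 2: classify and quantify the dominant emotion of the music."""
    raise NotImplementedError


def retarget_motion(target: VideoTarget, emotion: EmotionProfile):
    """Technique 3: manipulate the target with graphics methods so that its
    motion and appearance reflect the extracted music emotion."""
    raise NotImplementedError


def emotion_transfer(frames, audio_clip):
    """End-to-end flow: video analysis -> music emotion -> emotion transfer."""
    targets = extract_video_targets(frames)
    emotion = classify_music_emotion(audio_clip)
    return [retarget_motion(t, emotion) for t in targets]
```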

The main challenges in carrying out this project are threefold. First, efficiently extracting and then manipulating video targets from an RGB-based video sequence is a very challenging task. We are addressing the information gap between 2D and 3D by converting a 2D video target sequence into its corresponding 3D skeleton sequence, which allows systematic motion manipulation. Second, we need to establish a well-defined formulation to “quantize” a given piece of music; that is, from an arbitrary music excerpt, we should be able to “calculate” its corresponding emotion intensity and tempo. Third, because manipulation of the video target is based on its corresponding 3D skeleton sequence, increasing the visual impact will require seamlessly incorporating into the proposed system a 3D texture synthesis step that accounts for the varying emotion intensity and tempo of the intended music.
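
As a toy illustration of this “quantization” step, the sketch below (pure NumPy, written for this profile rather than taken from the project) uses frame-level RMS energy as a crude proxy for emotion intensity and estimates tempo from the autocorrelation of that energy envelope. A real system would rely on far richer acoustic and learned features.

```python
import numpy as np


def quantize_music(signal: np.ndarray, sr: int, frame_len: int = 2048, hop: int = 512):
    """Crude "quantization" of a music excerpt into (intensity, tempo_bpm).

    Intensity proxy: mean frame RMS energy, normalized to [0, 1].
    Tempo proxy: dominant period in the autocorrelation of the energy envelope.
    Illustrative sketch only, not the group's actual formulation.
    """
    # Frame-level RMS energy envelope.
    n_frames = 1 + (len(signal) - frame_len) // hop
    rms = np.array([
        np.sqrt(np.mean(signal[i * hop:i * hop + frame_len] ** 2))
        for i in range(n_frames)
    ])
    intensity = float(rms.mean() / (rms.max() + 1e-12))

    # Autocorrelation of the zero-mean envelope.
    env = rms - rms.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]

    # Search for the strongest periodicity between 40 and 200 BPM.
    frames_per_sec = sr / hop
    lo = int(frames_per_sec * 60 / 200)   # lag of the fastest tempo considered
    hi = int(frames_per_sec * 60 / 40)    # lag of the slowest tempo considered
    lag = lo + int(np.argmax(ac[lo:hi]))
    tempo_bpm = 60.0 * frames_per_sec / lag
    return intensity, tempo_bpm


# Example: a synthetic pulse train with two bursts per second (~120 BPM).
sr = 22050
t = np.arange(sr * 10) / sr
clicks = np.sin(2 * np.pi * 440 * t) * (np.sin(2 * np.pi * 2.0 * t) > 0.99)
print(quantize_music(clicks, sr))
```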

B. Compressive Sensing and Sparse Representation

We have already achieved several significant accomplishments in this project. To solve the signal separation problem, we designed a re-weighting scheme that appreciably outperforms other methods. We also addressed the closely related problem of analysis operator learning, and introduced a two-stage iterative method with




