
Research Laboratories



Data Management and Information Discovery Laboratory




Research Faculty

MengChang Chen, Research Fellow
Ming-Syan Chen, Distinguished Research Fellow
Hong-Yuan Mark Liao, Research Fellow
Mi-Yen Yeh, Assistant Research Fellow
Yuan-Hao Chang, Assistant Research Fellow
Chun-Nan Hsu, Research Fellow
De-Nian Yang, Assistant Research Fellow




               Group Profile

In the era of data explosion, data of various types (e.g., sensor data, trajectory data, transaction data, multimedia data, and Web browsing data) are generated at an increasing rate. Because hardware and networks are now abundant and inexpensive, there has never been a better time to explore emerging opportunities to use these data to enhance existing applications or create new ones. The Data Management and Information Discovery Group was therefore formed with the main objectives of initiating innovative research and strengthening scientific and technological excellence in (1) effective collection, representation, storage, and processing of massive data, and (2) data mining technologies that discover valuable knowledge efficiently and effectively from various types of data. Currently, the group's research focuses on the following categories: (1) Time Series Data Analysis and Mining, (2) Social Network Analysis and Query Processing, (3) Location-based Data Collection Platform and Applications, and (4) Data Centric Storage System Designs. The research projects are described below.

With 10^17 bytes of web-based information, large collections of scientific and sensor-based data, and skyrocketing multimedia and location-based data, the great challenge is all about the unfathomable amount of information being generated, stored, discovered and utilized on the Internet.

1. Time Series Data Analysis and Mining

A time series is a sequence of data points at consecutive time instants, spaced at uniform or non-uniform intervals. Examples include hourly readings from many sensors, daily stock trading data in financial markets, and GPS traces of moving objects. By analyzing and mining time series data, we aim to capture the characteristics of the data and find interesting knowledge for developing further services and applications. The technical challenge is to deal with growing, high-dimensional, huge-volume data generated as streams; the main challenge is to develop algorithms with high processing efficiency that still provide high-quality results.

As many types of data can be modeled as time series, our techniques apply to many applications. For example, co-evolving trends mined from stock data can be provided to program traders as decision support, and the movement behavior learned from huge GPS trajectories of humans and vehicles is useful for developing location-based services or for urban planning. We have designed offline and online clustering algorithms for multiple streams, and similarity search algorithms within one time series stream or across multiple ones, under constraints such as distributed streams, data with uncertain noise, and various distance measures. We have also designed trajectory mining and search algorithms to acquire knowledge from huge collections of historical trajectories.

2. Social Network Analysis and Query Processing

Analysis of a large social network is a challenging problem, since enumerating all possible graph patterns is expensive and intractable. Many existing graph analysis methods are designed for homogeneous social networks. In contrast, the major challenge in analyzing heterogeneous social networks comes from the multiple types of roles associated with the nodes, while the link relationships are also allowed to differ. In addition, query processing and optimization in social networks is still in its infancy. Finding a solution that satisfies multiple constraints in a huge social network within limited time is difficult, due to the complicated network structure and the parameters associated with nodes and links.

Observing that such patterns are essential for social services and applications, we have identified unique characteristics of heterogeneous networks, such as node and link type distributions, and studied how well existing sampling algorithms, such as random-based and exploration-based ones, capture these characteristics. Our goal is to design adaptive sampling algorithms that efficiently identify heterogeneous graph patterns and network characteristics; role-based information, such as role-based community detection, will also be examined. Noticing the growing importance of social queries, which have the potential to be very useful in various social applications, we have proposed a new social query that automatically identifies a group of mutually familiar individuals and finds their common available time slot. The query is issued by an initiator who specifies the group size, the activity length, and an acquaintance parameter that can be set appropriately for different kinds of activities. We will continue to formulate new query problems and to design efficient query optimization algorithms and techniques for finding optimal or approximate solutions in a short time.

3. Location-based Data Collection and Application Deployment Platform

Location-based data contains useful information that can be mined to support or enhance various applications, or to solve difficult location-based problems. However, it is difficult to collect large volumes of data from ordinary users. In this research project, we proposed the PLASH platform, designed to help location-based service (LBS) providers deploy their applications conveniently so that users can contribute their efforts and location-related data by using the services. This is the main difference from traditional location-aware services.

The PLASH system provides a GUI that allows users to construct their LAS applications and generate programs for both smartphone and server, while taking scalability and compatibility into account. It also allows users to donate software components to be mashed up into an integrated LBS application, which unavoidably introduces inherent security problems as well as other system risks. The data collected by PLASH can be used for further analysis to enhance existing applications or to solve difficult tasks.

4. Data Centric Storage System Designs

Flash-based storage systems play an important role in mobile storage. In recent years, the flash-based solid-state drive (SSD) has become a popular candidate for replacing hard disk drives. Enterprises are also designing new storage systems that use flash memory as the cache or the main storage medium to reduce energy consumption and improve the performance and reliability of their data centers. However, with the advance of manufacturing technologies, reliability and performance have become critical issues for flash-based storage systems. Meanwhile, emerging storage media such as phase-change memory provide alternatives in storage system design, but the key issue is how to improve the performance, reliability, and energy efficiency of storage systems that integrate the new storage media.

Our research focuses on solving the performance, reliability, and energy-efficiency issues of storage systems. We exploit file-system designs in operating systems and management firmware in storage devices. For example, we developed new designs for native flash file systems to improve the performance and reliability of data stored on flash-based storage systems. For enterprise data centers, given their energy consumption and huge amounts of data, we are investigating the indexing problem for huge, fast-growing data sets (so-called big data), and are studying solutions that avoid the inherent issues of hard disk drives by adopting new storage media in enterprise data centers. Meanwhile, various technologies such as Bloom filters and data deduplication will be studied and redesigned to fully utilize the capability of data centers that adopt new storage media to work alongside or replace traditional hard drives.
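
As a purely illustrative aside on the Bloom filters and data deduplication mentioned above, the following minimal Python sketch shows how a Bloom filter can screen lookups against a fingerprint index during deduplication. It is not the group's design: the class name BloomFilter, the function deduplicate, the SHA-256 fingerprints, and the default sizes are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Illustrative fixed-size Bloom filter (hypothetical, not from the source).

    It never reports a stored item as absent, but it may report an
    unseen item as present (a false positive).
    """

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def deduplicate(chunks, seen: BloomFilter, index: set):
    """Return the chunks not stored before.

    The filter answers "definitely new" cheaply; only a "maybe seen"
    answer triggers the exact (assumed slower) index lookup.
    """
    unique = []
    for chunk in chunks:
        fingerprint = hashlib.sha256(chunk).digest()
        if seen.might_contain(fingerprint) and fingerprint in index:
            continue  # confirmed duplicate: skip storing it again
        seen.add(fingerprint)
        index.add(fingerprint)
        unique.append(chunk)
    return unique
```

In this sketch the in-memory set stands in for an on-disk fingerprint index; the filter only reduces how often that index must be consulted, and a false positive costs one extra lookup rather than a lost chunk.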

