
Research Laboratories



Data Management and Information Discovery Laboratory (資料處理與探勘實驗室)




 Research Faculty

 MengChang Chen, Research Fellow
 Ming-Syan Chen, Distinguished Research Fellow
 Hong-Yuan Mark Liao, Research Fellow
 Mi-Yen Yeh, Assistant Research Fellow
 Yuan-Hao Chang, Assistant Research Fellow
 Chun-Nan Hsu, Research Fellow
 De-Nian Yang, Assistant Research Fellow




 Group Profile

 In the data explosion era, data of various types (e.g., sensor data, trajectory data, transaction data, multimedia data, and Web browsing data) are generated at an increasing rate. Because hardware and network resources are abundant and inexpensive, there has never been a better time to explore emerging opportunities to use these data to enhance existing applications or create new ones. The Data Management and Information Discovery Group was therefore formed with the main objectives of initiating innovative research and strengthening scientific and technological excellence in (1) the effective collection, representation, storage, and processing of massive data, and (2) data mining technologies that discover valuable knowledge efficiently and effectively from various types of data. Currently, the group's research focuses on the following categories: (1) Time Series Data Analysis and Mining, (2) Social network analysis and query processing, (3) Location-based Data Collection Platform and Applications, and (4) Data Centric Storage System Designs. The research projects are described below.

 "With 10^17 bytes of web-based information, large collections of scientific and sensor-based data, skyrocketing multimedia and location-based data, the great challenge is all about the unfathomable amount of information being generated, stored, discovered and utilized on the Internet."

 1. Time Series Data Analysis and Mining
 A time series is a sequence of data recorded at consecutive time instants, spaced at uniform or non-uniform intervals. Examples include hourly readings from many sensors, daily stock trading data in financial markets, and GPS traces of moving objects. By analyzing and mining time series data, we want to capture the characteristics of the data and find interesting knowledge for developing further services and applications. The technical challenge of this research is to deal with growing, high-dimensional, huge-volume data generated as streams; the main challenge is to develop algorithms with high processing efficiency that still provide high-quality results.

 As many types of data can be modeled as time series, our techniques apply to many applications. For example, the co-evolving trends mined from stock data can be provided to stock program traders as decision support, and the moving behavior learned from huge GPS trajectories of humans and vehicles is useful for developing location-based services or for urban planning. We have designed offline and online clustering algorithms for multiple streams, as well as similarity search algorithms within one time series stream or across multiple streams, under constraints such as distributed streams, data with uncertain noise, and various distance measures. We have also designed trajectory mining and search algorithms to acquire knowledge from huge collections of historical trajectories.

 2. Social network analysis and query processing
 Analysis of a large social network is a challenging problem, since enumerating all the possible graph patterns is expensive and intractable. Many existing graph analysis methods are designed for homogeneous social networks. In contrast, the major challenge in analyzing heterogeneous social networks comes from the multiple types of roles associated with the nodes, while the link relationships are also allowed to differ. Meanwhile, query processing and optimization in social networks are still in their infancy. Finding a solution that satisfies multiple constraints in a huge social network within limited time is difficult, due to the complicated network structure and the parameters associated with nodes and links.

 Observing that these patterns are essential for social services and applications, we have identified unique characteristics of heterogeneous networks, such as node/link type distributions, and studied how well existing sampling algorithms, such as random-based and exploration-based ones, capture these characteristics. Our goal is to design adaptive sampling algorithms that efficiently identify heterogeneous graph patterns and network characteristics; role-based information, such as role-based community detection, will also be examined. Noticing the growing importance of social queries, which have the potential to be very useful in various social applications, we have proposed a new social query that automatically identifies a group of familiar individuals and finds their common available time slot. The query is issued by an initiator who specifies the group size, the activity length, and an acquaintance parameter that can be set appropriately for different kinds of activities. We will continue to formulate new query problems and to design efficient query optimization algorithms and techniques for finding optimal or approximate solutions in a short time.

 3. Location-based Data Collection and Application Deployment Platform
 Location-based data contains useful information that can be mined to support or enhance various applications, or to solve difficult location-based problems. However, it is difficult to collect a large volume of data from ordinary users. In this research project, we proposed the PLASH platform, designed to help location-based service (LBS) providers deploy their applications conveniently so that users can contribute their efforts and location-related data by using the services. This is the main difference from traditional location-aware services.

 The PLASH system provides a GUI that allows users to construct their LAS application and generate programs for both smartphone and server, while considering scalability and compatibility. It also allows users to donate software components to be mashed up into an integrated LBS application, which unavoidably brings inherent security problems as well as other system risks. The data collected by PLASH can be used for further analysis to enhance existing applications or to solve difficult tasks.

 4. Data Centric Storage System Designs
 Flash-based storage systems play an important role in mobile storage. In recent years, the flash-based solid-state drive (SSD) has become a popular candidate for replacing hard disk drives. Enterprises are also designing new storage systems with flash memory as the cache or the main storage medium, to reduce the energy consumption and improve the performance and reliability of their data centers. However, with the advance of manufacturing technologies, reliability and performance have become critical issues for flash-based storage systems. Meanwhile, emerging storage media such as phase-change memory provide alternatives in storage system design, but the key issue is how to improve the performance, reliability, and energy efficiency of storage systems when integrating the new storage media.

 Our research focuses on solving the performance, reliability, and energy-efficiency issues of storage systems. We have explored file-system designs in operating systems and management firmware in storage devices. For example, we developed new designs for native flash file systems to improve the performance and reliability of the data stored on flash-based storage systems. For enterprise data centers, given their energy consumption and huge amounts of data, we are exploring the indexing problem for huge amounts of data (so-called big data) with fast-growing capacity, and are studying solutions that remove the inherent issues of hard disk drives by adopting new storage media in enterprise data centers. Meanwhile, various technologies such as bloom filters and data deduplication will be studied and redesigned to fully utilize the capability of data centers that adopt new storage media to cooperate with or replace traditional hard drives.
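
 For readers unfamiliar with the bloom filters mentioned in the storage section, the following is a minimal generic sketch of the data structure, not the group's redesigned version. The class name, the bit-array size, the number of hash functions, the SHA-256-based hashing, and the block identifiers are all illustrative assumptions. A bloom filter answers "possibly present" or "definitely absent" for a key using a small bit array, which is why it is attractive for indexing very large data sets.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch: false positives possible, false negatives not."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        # Set every bit position derived from the key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # True only if all derived bits are set; a clear bit proves absence.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Hypothetical usage: remember which data blocks have been indexed.
index = BloomFilter()
for block_id in ("block-17", "block-42"):
    index.add(block_id)
```

 A key that was added is always reported as possibly present, while a lookup that finds any clear bit can skip the slower storage access entirely. The false-positive rate depends on the bit-array size, the number of hash functions, and the number of stored keys.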

