Research Laboratories
Data Management and Information Discovery Laboratory
Research Faculty
MengChang Chen, Research Fellow
Ming-Syan Chen, Distinguished Research Fellow
Hong-Yuan Mark Liao, Research Fellow
Mi-Yen Yeh, Assistant Research Fellow
Yuan-Hao Chang, Assistant Research Fellow
Chun-Nan Hsu, Research Fellow
De-Nian Yang, Assistant Research Fellow
Group Profile
In the era of data explosion, data of various types (e.g., sensor data, trajectory data, transaction data, multimedia data, and Web browsing data) are generated at an increasing rate. With hardware and network resources now abundant and inexpensive, there has never been a better time to explore emerging opportunities to use these data to enhance existing applications or create new ones. The Data Management and Information Discovery Group was therefore formed with the main objectives of initiating innovative research and strengthening scientific and technological excellence in (1) the effective collection, representation, storage, and processing of massive data, and (2) data mining technologies that discover valuable knowledge efficiently and effectively from various types of data. Currently, the group's research focuses on the following categories: (1) Time Series Data Analysis and Mining, (2) Social Network Analysis and Query Processing, (3) Location-based Data Collection Platform and Applications, and (4) Data Centric Storage System Designs. The research projects are described below.

With 10^17 bytes of web-based information, large collections of scientific and sensor-based data, and skyrocketing multimedia and location-based data, the great challenge is all about the unfathomable amount of information being generated, stored, discovered, and utilized on the Internet.

1. Time Series Data Analysis and Mining
A time series is a sequence of data points at consecutive time instants, spaced at uniform or non-uniform intervals. Examples include hourly readings from many sensors, daily stock trading data in financial markets, and GPS traces of moving objects. By analyzing and mining time series data, we want to capture the characteristics of the data and find interesting knowledge for developing further services and applications. The technical challenge is to deal with growing, high-dimensional, huge-volume data generated as streams; in particular, the algorithms must achieve high processing efficiency while still providing high-quality results.

As many types of data can be modeled as time series, our techniques apply to many applications. For example, co-evolving trends mined from stock data can be provided to program traders as decision support, and the movement behavior learned from huge GPS trajectories of people and vehicles is useful for developing location-based services or for urban planning. We have designed offline and online clustering algorithms for multiple streams, as well as similarity search algorithms within one time series stream or across multiple streams, under constraints such as distributed streams, data with uncertain noise, and various distance measures. We have also designed trajectory mining and search algorithms to acquire knowledge from huge historical trajectories.

2. Social Network Analysis and Query Processing
Analysis of a large social network is a challenging problem, since enumerating all possible graph patterns is expensive and intractable. Many existing graph analysis methods are designed for homogeneous social networks. In contrast, the major challenge in analyzing heterogeneous social networks comes from the multiple types of roles associated with the nodes, while the link relationships may also differ. Query processing and optimization in social networks, meanwhile, are still in their infancy. Finding a solution that satisfies multiple constraints in a huge social network within limited time is difficult, owing to the complicated network structure and the parameters associated with nodes and links.

Observing that these patterns are essential for social services and applications, we have identified unique characteristics of heterogeneous networks, such as node and link type distributions, and studied how well existing sampling algorithms, such as random-based and exploration-based ones, capture these characteristics. Our goal is to design adaptive sampling algorithms that efficiently identify heterogeneous graph patterns and network characteristics; role-based information, such as role-based community detection, will also be examined. Noting the growing importance of social queries, which have the potential to be very useful in various social applications, we have proposed a new social query that automatically identifies a group of familiar individuals and finds their common available time slot. The query is issued by an initiator who specifies the group size, the activity length, and an acquaintance parameter that can be set appropriately for different kinds of activities. We will continue to formulate new query problems and design efficient query optimization algorithms and techniques for finding optimal or approximate solutions in a short time.

3. Location-based Data Collection and Application Deployment Platform
Location-based data contains useful information that can be mined to support or enhance various applications, or to solve difficult location-based problems. However, it is difficult to collect large volumes of data from ordinary users. In this research project, we proposed the PLASH platform, designed to help location-based service (LBS) providers deploy their applications conveniently, so that users contribute their efforts and location-related data simply by using the services. This is the main difference from traditional location-aware services.

The PLASH system provides a GUI that allows users to construct their LAS applications and generate programs for both smartphone and server, while considering scalability and compatibility. It also allows users to donate software components to be mashed up into an integrated LBS application, which unavoidably carries inherent security problems as well as other system risks. The data collected by PLASH can be used for further analysis to enhance existing applications or to solve difficult tasks.

4. Data Centric Storage System Designs
Flash-based storage systems play an important role in mobile storage. In recent years, the flash-based solid-state drive (SSD) has become a popular candidate to replace hard disk drives. Enterprises are also designing new storage systems that use flash memory as the cache or the main storage medium to reduce energy consumption and improve the performance and reliability of their data centers. However, as manufacturing technologies advance, reliability and performance have become critical issues for flash-based storage systems. Meanwhile, emerging storage media such as phase-change memory provide alternatives in storage system design, but the key issue is how to improve the performance, reliability, and energy efficiency of storage systems that integrate the new storage media.

Our research focuses on solving the performance, reliability, and energy-efficiency issues of storage systems. We have explored file-system designs in operating systems and management firmware in storage devices. For example, we developed new designs for native flash file systems to improve the performance and reliability of data stored on flash-based storage systems. For enterprise data centers, given their energy consumption and huge amounts of data, we are exploring the indexing problem for huge, fast-growing amounts of data (so-called big data), and are studying ways to overcome the inherent issues of hard disk drives by adopting new storage media in enterprise data centers. Meanwhile, various technologies such as Bloom filters and data deduplication will be studied and redesigned to fully utilize the capability of data centers that adopt new storage media to work alongside or replace traditional hard drives.
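
As a rough illustration of two of the technologies named above, the following minimal Python sketch shows how a Bloom filter can act as a cheap pre-check in block-level data deduplication, so that the slower fingerprint index is consulted only for possible duplicates. It is not the group's design: the names BloomFilter and dedup_blocks, the filter size, the number of hash functions, and the in-memory dictionary standing in for an on-disk index are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; size and hash count are illustrative assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive one bit position per hash by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def dedup_blocks(blocks):
    """Return (unique_blocks, index_lookups) for a list of byte blocks."""
    bloom = BloomFilter()
    index = {}  # fingerprint -> block; hypothetical stand-in for a slow on-disk index
    lookups = 0
    for block in blocks:
        fingerprint = hashlib.sha256(block).digest()
        if bloom.might_contain(fingerprint):
            lookups += 1  # possible duplicate, so the index must be consulted
            if fingerprint in index:
                continue  # confirmed duplicate: store nothing
        bloom.add(fingerprint)
        index[fingerprint] = block
    return list(index.values()), lookups


if __name__ == "__main__":
    unique, lookups = dedup_blocks([b"a", b"b", b"a", b"c", b"b"])
    print(len(unique), "unique blocks,", lookups, "index lookups")
```

Because a Bloom filter never reports a stored item as absent, every true duplicate still reaches the index check; the saving comes from new blocks, which usually skip the index lookup entirely.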