Institute of Information Science, Academia Sinica

Research

Recent Research Results

Function Clustering to Optimize Resource Utilization on Container Platform

IEEE International Conference on Parallel and Distributed Systems, December 2023

Chao-Yu Lee, Ding-Yong Hong, Pangfeng Liu and Jan-Jan Wu

Ding-Yong Hong Jan-Jan Wu

Abstract

In recent years, container technology has gained significant attention in the software industry, with many businesses opting for its elasticity and cost-effectiveness. However, it is not without challenges. The "cold-start" problem is the most critical issue in deploying containers. The cold-start time is the delay from a container being provisioned on a physical server to its being ready to run the application. End users must endure this delay when they run the application just after the container has started. These delays cause a negative user experience and may hurt the business's profitability. The most common way to ensure a seamless user experience is to keep a substantial number of containers active throughout the day, which causes resource over-provisioning. Conversely, closing the container right after handling the requests reduces memory consumption but causes a cold start whenever a request arrives. Cold-start occurrence and resource usage thus form a trade-off, which presents a significant challenge on container platforms. To address this challenge, we observe that serving consecutive requests with the same container can notably decrease the number of cold starts. We propose TAC, a Temporal Adjacency Function Clustering algorithm. Using historical data, TAC packs functions that serve temporally adjacent requests into a cluster to reduce cold starts and enable efficient resource utilization. Experimental results on real-world traces show that TAC reduces cold-start occurrences by 8% and memory usage by 53% compared to state-of-the-art methods, e.g., Defuse and the Hybrid Histogram policy.
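The observation that serving consecutive requests with the same container reduces cold starts can be illustrated with a toy simulation. This is a minimal sketch and not the TAC algorithm itself: the function names (`resize`, `upload`), the request trace, the keep-alive window, and the helper `count_cold_starts` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def count_cold_starts(
    requests: List[Tuple[int, str]],
    container_of: Dict[str, str],
    keep_alive: int,
) -> int:
    """Count cold starts for a time-sorted trace of (timestamp, function) pairs.

    A request is a cold start when its container has never run, or has been
    idle for longer than `keep_alive` time units (assumed eviction policy).
    """
    last_used: Dict[str, int] = {}
    cold = 0
    for timestamp, function in requests:
        container = container_of[function]
        if container not in last_used or timestamp - last_used[container] > keep_alive:
            cold += 1
        last_used[container] = timestamp
    return cold


# Hypothetical trace: "upload" is always requested shortly after "resize".
trace = [(0, "resize"), (5, "upload"), (60, "resize"),
         (65, "upload"), (120, "resize"), (125, "upload")]

separate = {"resize": "c1", "upload": "c2"}   # one container per function
clustered = {"resize": "c1", "upload": "c1"}  # temporally adjacent functions share one

cold_separate = count_cold_starts(trace, separate, keep_alive=10)
cold_clustered = count_cold_starts(trace, clustered, keep_alive=10)
```

In this toy trace, the shared container is still warm when the second function of each pair arrives, so clustering halves the cold starts while also keeping one container alive instead of two.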

Enabling Highly-Efficient DNA Sequence Mapping via ReRAM-based TCAM

ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2023

Yu-Shao Lai, Shuo-Han Chen, and Yuan-Hao Chang

Shuo-Han Chen Yuan-Hao Chang

Abstract

In the post-pandemic era, third-generation DNA sequencing (TGS) has received increasing attention from both academia and industry. As TGS technologies have become a requisite for extracting DNA sequences, DNA sequence mapping, which is the most basic bioinformatics application and the core of polymerase chain reaction (PCR) tests, faces great challenges due to the large size and noisy nature of TGS data. In addition, the ever-increasing data volume of DNA sequences induces the memory wall issue when large datasets are moved between the memory and the computing units. However, little effort has been devoted to accelerating DNA sequence mapping while considering both the memory wall issue and the challenges of TGS technologies. To enable highly efficient DNA sequence mapping, this study proposes a novel resistive random-access memory (ReRAM)-based ternary content-addressable memory (TCAM) and exploits the intrinsic parallelism of the ReRAM crossbar for efficient mapping acceleration. Promising results have been demonstrated through a series of experiments with datasets of different scales.
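For readers unfamiliar with ternary content-addressable memory, the following sketch emulates its lookup semantics in software: each stored entry may contain don't-care positions that match any query symbol, and every entry is compared against the query. It is only an illustration under stated assumptions, not the paper's ReRAM design: storing nucleotide letters directly, using `X` as the don't-care symbol, and the helper names `tcam_match` and `tcam_search` are all hypothetical. A hardware TCAM compares all rows in parallel, whereas this emulation loops over them.

```python
from typing import List


def tcam_match(pattern: str, query: str) -> bool:
    """Return True if `query` matches `pattern`, where 'X' is a don't-care."""
    if len(pattern) != len(query):
        return False
    return all(p == "X" or p == q for p, q in zip(pattern, query))


def tcam_search(entries: List[str], query: str) -> List[int]:
    """Return the indices of all stored entries that match `query`.

    Hardware performs these comparisons simultaneously across rows;
    the list comprehension below emulates that sequentially.
    """
    return [i for i, entry in enumerate(entries) if tcam_match(entry, query)]


# Hypothetical stored patterns; 'X' tolerates a noisy or unknown base.
stored = ["ACGT", "AXGT", "TTTT", "ACXX"]

exact_hits = tcam_search(stored, "ACGT")
noisy_hits = tcam_search(stored, "AGGT")
```

The don't-care positions are what make ternary matching attractive for noisy reads: the query `AGGT` differs from `ACGT` in one base, yet it still hits the entry `AXGT`.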

Sky-NN: Enabling Efficient Neural Network Data Processing with Skyrmion Racetrack Memory

ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2023

Yong-Cheng Liao, Shuo-Han Chen, Yuan-Hao Chang, and Yu-Pei Liang

Shuo-Han Chen Yuan-Hao Chang

Abstract

The rise of artificial intelligence has spurred numerous efforts to build stronger and more sophisticated neural network models to resolve almost all kinds of problems in different academic fields. Owing to the growing complexity and size of neural networks, nonvolatile random access memory (NVRAM) has been utilized to avoid excessive data movements between volatile memory and persistent storage. Among various NVRAM alternatives, skyrmion racetrack memory (SK-RM) is regarded as a promising candidate owing to its high memory density and efficient reads and writes. Nevertheless, due to the distinct shift operation of SK-RM, directly applying existing neural network data processing methods on SK-RM hinders the benefits and performance of both SK-RM and neural networks. To resolve this issue, this paper proposes Sky-NN to enable efficient NN data processing methods on SK-RM by utilizing the distinct shift and re-assemblability capabilities of skyrmions. A series of experiments were conducted to demonstrate the capability of Sky-NN.

Improving quantitation accuracy in isobaric-labeling mass spectrometry experiments with spectral library searching and feature-based peptide-spectrum match filter

Scientific Reports, August 2023

Tzu-Yun Kuo, Jen-Hung Wang, Yung-Wen Huang, Ting-Yi Sung* and Ching-Tai Chen*

Jen-Hung Wang Ting-Yi Sung Ching-Tai Chen

Abstract

Isobaric labeling relative quantitation is one of the dominant proteomic quantitation technologies. Traditional quantitation pipelines for isobaric-labeled mass spectrometry data are based on sequence database searching. In this study, we present a novel quantitation pipeline that integrates sequence database searching, spectral library searching, and a feature-based peptide-spectrum-match (PSM) filter using various spectral features for filtering. The combined database and spectral library searching results in larger quantitation coverage, and the filter removes PSMs with larger quantitation errors, retaining those with higher quantitation accuracy. Quantitation results show that the proposed pipeline can improve the overall quantitation accuracy at the PSM and protein levels. To our knowledge, this is the first study that utilizes spectral library searching to improve isobaric labeling-based quantitation. So that users can conveniently run the proposed pipeline, we have implemented the feature-based filter as an executable for both Windows and Linux platforms; its executable files, user manual, and sample data sets are freely available at https://ms.iis.sinica.edu.tw/comics/Software_FPF.html. Furthermore, with the developed filter, the proposed pipeline is fully compatible with the Trans-Proteomic Pipeline.

REFROM: Responsive, Energy-efficient Frame Rendering for Mobile Devices

ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2023

Tsung-Yen Hsu, Yi-Shen Chen, Yun-Chih Chen, Yuan-Hao Chang, and Tei-Wei Kuo

Yuan-Hao Chang

Abstract

The increasing demand for high-quality graphics on mobile devices necessitates a high frame rate for display refresh. However, current process scheduling and memory management policies fail to consider the computation demands of frame rendering because they are optimized for energy savings and resource utilization. This leads to unresponsive displays for mobile users due to rendering delays. Accurately estimating computation demands is challenging for the mobile operating system, particularly under memory pressure, without display-specific semantics from user space. Moreover, the complexity of frame rendering makes it infeasible to schedule frames with real-time policies. To address these issues, we propose a new framework called REFROM that utilizes a history-based frame time estimator to analyze frame time samples from UI threads and predict the computation requirements of upcoming frames. Experimental results demonstrate that REFROM reduces the number of delayed frames by up to 40% and improves energy efficiency by up to 4% compared to existing approaches.
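The idea of a history-based frame time estimator can be sketched as a sliding window over recent frame time samples. The abstract does not specify REFROM's estimator, so everything here is an assumption made for illustration: the class name `FrameTimeEstimator`, the window size, and the use of a plain moving average as the prediction rule.

```python
from collections import deque


class FrameTimeEstimator:
    """Predict the next frame time from a sliding window of past samples.

    Assumption: a simple moving average stands in for whatever estimator
    the actual framework uses.
    """

    def __init__(self, window: int = 8) -> None:
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def record(self, frame_time_ms: float) -> None:
        """Store the measured rendering time of a finished frame."""
        self.samples.append(frame_time_ms)

    def predict(self) -> float:
        """Return the mean of the retained samples, or 0.0 with no history."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


# Hypothetical frame times in milliseconds from a UI thread.
estimator = FrameTimeEstimator(window=3)
for sample in (10.0, 20.0, 30.0, 40.0):
    estimator.record(sample)

predicted_ms = estimator.predict()
```

A scheduler could compare such a prediction against the display's frame budget to decide whether the rendering thread needs more CPU time; that policy step is outside this sketch.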

Attention Discriminant Sampling for Point Clouds

International Conference on Computer Vision (ICCV), October 2023

Cheng-Yao Hong, Yu-Ying Chou and Tyng-Luh Liu

Tyng-Luh Liu

Abstract

This paper describes an attention-driven approach to 3-D point cloud sampling. Our method is established based on a structure-aware attention discriminant analysis that explores geometric and semantic relations embodied among points and their clusters. The proposed attention discriminant sampling (ADS) starts by efficiently decomposing a given point cloud into clusters to implicitly encode its structural and geometric relatedness among points. By treating each cluster as a structural component, ADS then evaluates two levels of self-attention: within-cluster and between-cluster. The former reflects the semantic complexity entailed by the learned features of points within each cluster, while the latter reveals the semantic similarity between clusters. Driven by structurally preserving the point distribution, these two aspects of self-attention help avoid sampling redundancy and decide the number of sampled points in each cluster. Extensive experiments demonstrate that ADS significantly improves classification performance to 95.1% on ModelNet40 and 87.5% on ScanObjectNN and achieves 86.9% mIoU on ShapeNet Part Segmentation. For scene segmentation, ADS yields 91.1% accuracy on S3DIS with higher mIoU than the state of the art and 75.6% mIoU on ScanNetV2. Furthermore, ADS surpasses the state of the art with 55.0% mAP50 on ScanNetV2 object detection.

Shape-Guided Dual-Memory Learning for 3D Anomaly Detection

40th International Conference on Machine Learning (ICML), July 2023

Yu-Min Chu, Chieh Liu, Ting-I Hsieh, Hwann-Tzong Chen and Tyng-Luh Liu

Tyng-Luh Liu

Abstract

We present a shape-guided expert-learning framework to tackle the problem of unsupervised 3D anomaly detection. Our method is established on the effectiveness of two specialized expert models and their synergy to localize anomalous regions from color and shape modalities. The first expert utilizes geometric information to probe 3D structural anomalies by modeling the implicit distance fields around local shapes. The second expert considers the 2D RGB features associated with the first expert to identify color appearance irregularities on the local shapes. We use the two experts to build the dual memory banks from the anomaly-free training samples and perform shape-guided inference to pinpoint the defects in the testing samples. Owing to the per-point 3D representation and the effective fusion scheme of complementary modalities, our method efficiently achieves state-of-the-art performance on the MVTec 3D-AD dataset with better recall and lower false positive rates, as preferred in real applications. 

ABC-Norm Regularization for Fine-Grained and Long-Tailed Image Classification

IEEE Transactions on Image Processing, July 2023

Yen-Chi Hsu, Cheng-Yao Hong, Ming-Sui Lee, Davi Geiger and Tyng-Luh Liu

Tyng-Luh Liu

Abstract

Image classification for real-world applications often involves complicated data distributions, such as fine-grained and long-tailed ones. To address the two challenging issues simultaneously, we propose a new regularization technique that yields an adversarial loss to strengthen the model learning. Specifically, for each training batch, we construct an adaptive batch prediction (ABP) matrix and establish its corresponding adaptive batch confusion norm (ABC-Norm). The ABP matrix is a composition of two parts: an adaptive component that encodes the imbalanced data distribution class-wise, and another component that assesses the softmax predictions batch-wise. The ABC-Norm leads to a norm-based regularization loss, which can be theoretically shown to be an upper bound for an objective function closely related to rank minimization. By coupling with the conventional cross-entropy loss, the ABC-Norm regularization could introduce adaptive classification confusion and thus trigger adversarial learning to improve the effectiveness of model learning. Unlike most state-of-the-art techniques, which solve either fine-grained or long-tailed problems, our method is characterized by its simple and efficient design and, most distinctively, provides a unified solution. In the experiments, we compare ABC-Norm with relevant techniques and demonstrate its efficacy on several benchmark datasets, including (CUB-LT, iNaturalist2018); (CUB, CAR, AIR); and (ImageNet-LT), which respectively correspond to the real-world, fine-grained, and long-tailed scenarios.

Extreme Event Discovery with Self-Attention for PM2.5 Anomaly Prediction

IEEE Intelligent Systems, January 2023

Hsin-Chih Yang, Ming-Chuan Yang, Guo-Wei Wong, Meng Chang Chen

Ming-Chuan Yang Meng-Chang Chen

Abstract

Fine particulate matter (PM2.5) values at a particular location form a time series whose prediction is challenging due to the complicated interactions among numerous factors, including meteorological measurements, terrain conditions, and industrial and human habitation activities; such prediction has attracted considerable attention from the deep learning community. Although deep learning approaches to PM2.5 prediction generally achieve acceptable accuracy, they have difficulty predicting PM2.5 anomalies, and mispredictions prevent the authorities from issuing proper instructions to reduce the impact on public health. We use extreme value theory (EVT) to formulate the PM2.5 prediction problem and implement it with a self-attention-based neural network. The EVT-based loss accounts for the rarity of anomalous data, and self-attention captures global information. Experiments demonstrate that the proposed model achieves improvements of 478% in F1 score and 286% in Matthews correlation coefficient (MCC) over the fully connected network, and 229% in F1 and 148% in MCC over the typical Transformer trained with the traditional loss function.
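The two metrics reported above are both computed from the confusion matrix of a binary classifier (anomaly versus normal), and both are better suited to rare-event data than plain accuracy. The sketch below uses the standard definitions of F1 and MCC; the function names and the example counts are hypothetical and are not taken from the paper's experiments.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal sum is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical counts for a rare-anomaly setting: 16 anomalies in 1000 samples.
f1 = f1_score(tp=8, fp=4, fn=8)
mcc = matthews_corrcoef(tp=8, tn=980, fp=4, fn=8)

# A classifier that never predicts an anomaly would reach 98.4% accuracy on
# these counts, yet both metrics expose that it finds nothing.
f1_never = f1_score(tp=0, fp=0, fn=16)
mcc_never = matthews_corrcoef(tp=0, tn=984, fp=0, fn=16)
```

Because anomalies are rare, a model can score high accuracy while missing every anomaly; F1 ignores true negatives and MCC balances all four cells, which is why gains are reported in these two metrics.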

A feature extraction free approach for protein interactome inference from co-elution data

Briefings in Bioinformatics, June 2023

Chen, Y.H., Chao, K.H., Wong, J.Y., Liu, C.F., Leu, J.Y. and Tsai, H.K.

Yu-Hsin Chen Jin Yung Wong Chien-Fu Liu Huai-Kuang Tsai

Abstract

Protein complexes are key functional units in cellular processes. High-throughput techniques, such as co-fractionation coupled with mass spectrometry (CF-MS), have advanced protein complex studies by enabling global interactome inference. However, dealing with complex fractionation characteristics to define true interactions is not a simple task, since CF-MS is prone to false positives due to the co-elution of non-interacting proteins by chance. Several computational methods have been designed to analyze CF-MS data and construct probabilistic protein-protein interaction (PPI) networks. Current methods usually first infer PPIs based on handcrafted CF-MS features, and then use clustering algorithms to form potential protein complexes. While powerful, these methods have two weaknesses: the handcrafted features based on domain knowledge might introduce bias, and the methods tend to overfit due to the severely imbalanced PPI data. To address these issues, we present a balanced end-to-end learning architecture, SPIFFED, which integrates feature representation from raw CF-MS data and interactome prediction by a convolutional neural network. SPIFFED outperforms the state-of-the-art methods in predicting PPIs under the conventional imbalanced training. When trained with balanced data, SPIFFED had greatly improved sensitivity for true PPIs. Moreover, the ensemble SPIFFED model provides different voting schemes to integrate predicted PPIs from multiple CF-MS data. Using clustering software (i.e., ClusterONE), SPIFFED allows users to infer high-confidence protein complexes depending on the CF-MS experimental designs. The source code of SPIFFED is freely available at: https://github.com/bio-it-station/SPIFFED.