
Academia Sinica, Institute of Information Science



Recent Research Results


Telomere-to-Telomere, Haplotype-Resolved Chromosome-Level Genome Assembly and Annotation of Taiwan Hard Clam (Meretrix taiwanica)

Scientific Data, May 2026

Ching-Huei Huang, Po-Cheng Hsu, San-Tzu Hsieh, Fu-Shen Tseng, Chung-Yen Lin


Abstract

The Taiwan hard clam (Meretrix taiwanica) is an economically important aquaculture species in Taiwan, yet genomic resources for this species have remained fragmented. We present a telomere-to-telomere (T2T), haplotype-resolved, chromosome-level genome assembly for M. taiwanica, generated using PacBio HiFi long reads and Hi-C sequencing. The two haploid assemblies (hap1 and hap2) span 1,006.48 Mb and 1,007.28 Mb, comprising 126 and 66 sequences, respectively, each containing 19 chromosomes. Hap1 and hap2 exhibit sequence N50 values of 53.87 Mb and 51.57 Mb, with average scaffold lengths of 7.99 Mb and 15.26 Mb, and contain 0.0176% and 0.1313% ambiguous bases. Comparative analyses revealed 81.59% and 83.78% syntenic regions between haplotypes and identified 10,175 structural variations. Repetitive elements constitute 47.06% and 47.02% of the hap1 and hap2 genomes. We annotated 23,320 and 23,598 protein-coding gene models, with median gene lengths of 7,721 bp and 7,657.5 bp, respectively. The mitochondrial genome was assembled at 21,164 bp and encodes 13 protein-coding genes, 22 tRNAs, and 2 rRNAs. Functional annotation covered 16.23% and 16.33% of the nuclear and mitochondrial gene sets. BUSCO analysis indicated genome completeness of 92.4% and 92.5%, and proteome completeness of 95.4% and 94.5% for hap1 and hap2. By providing the first T2T-level reference, this dataset enables precise identification of trait-associated markers for marker-assisted selection (MAS), thereby facilitating genetic improvement of growth and stress-resistance traits. Furthermore, it serves as a robust genomic framework for conservation genomics to assess the genetic diversity of both wild and hatchery populations of this economically vital species.


Regret-Guided Search Control for Efficient Learning in AlphaZero

The Fourteenth International Conference on Learning Representations (ICLR), April 2026

Yun-Jui Tsai, Wei-Yu Chen, Yan-Ru Ju, Yu-Hung Chang, Ti-Rong Wu


Abstract

Reinforcement learning (RL) agents achieve remarkable performance but remain far less learning-efficient than humans. While RL agents require extensive self-play games to extract useful signals, humans often need only a few games, improving rapidly by repeatedly revisiting states where mistakes occurred. This idea, known as search control, aims to restart from valuable states rather than always from the initial state. In AlphaZero, prior work Go-Exploit applies this idea by sampling past states from self-play or search trees, but it treats all states equally, regardless of their learning potential. We propose Regret-Guided Search Control (RGSC), which extends AlphaZero with a regret network that learns to identify high-regret states, where the agent's evaluation diverges most from the actual outcome. These states are collected from both self-play trajectories and MCTS nodes, stored in a prioritized regret buffer, and reused as new starting positions. Across 9x9 Go, 10x10 Othello, and 11x11 Hex, RGSC outperforms AlphaZero and Go-Exploit by an average of 77 and 89 Elo, respectively. When training on a well-trained 9x9 Go model, RGSC further improves the win rate against KataGo from 69.3% to 78.2%, while both baselines show no improvement. These results demonstrate that RGSC provides an effective mechanism for search control, improving both efficiency and robustness of AlphaZero training. Our code is available at https://rlg.iis.sinica.edu.tw/papers/rgsc.
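
The core data structure the abstract describes, a prioritized buffer of high-regret states reused as start positions, can be sketched as below. This is a minimal illustrative reading, not the paper's implementation: the class name, capacity handling, and proportional sampling scheme are all assumptions; regret here is simply the gap between the agent's predicted value and the actual game outcome.

```python
import heapq
import random

class RegretBuffer:
    """Illustrative sketch of a prioritized regret buffer (hypothetical
    API; the paper's data structure and priority scheme may differ).
    States where |predicted value - game outcome| is large are more
    likely to be sampled as new self-play start states."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.items = []  # list of (regret, state) pairs

    def add(self, state, predicted_value, outcome):
        regret = abs(predicted_value - outcome)
        self.items.append((regret, state))
        # Keep only the highest-regret states when over capacity.
        if len(self.items) > self.capacity:
            self.items = heapq.nlargest(self.capacity, self.items,
                                        key=lambda x: x[0])

    def sample_start_state(self):
        # Sample proportionally to regret (prioritized sampling).
        total = sum(r for r, _ in self.items)
        weights = [r / total for r, _ in self.items]
        return random.choices([s for _, s in self.items],
                              weights=weights, k=1)[0]

buf = RegretBuffer(capacity=3)
buf.add("s1", predicted_value=0.9, outcome=-1.0)  # regret 1.9: big blunder
buf.add("s2", predicted_value=0.1, outcome=0.0)   # regret 0.1
buf.add("s3", predicted_value=-0.5, outcome=1.0)  # regret 1.5
start = buf.sample_start_state()
```

In the actual system such states would be collected from both self-play trajectories and MCTS nodes, with the regret signal produced by the learned regret network rather than computed post hoc.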

HSIC Bottleneck for Cross-Generator and Domain-Incremental Synthetic Image Detection

The Fourteenth International Conference on Learning Representations (ICLR), April 2026

Chin-Chia Yang, Yung-Yu Chuang, Hwann-Tzong Chen and Tyng-Luh Liu


Abstract

Synthetic image generators evolve rapidly, challenging detectors to generalize across current methods and adapt to new ones. We study domain-incremental synthetic image detection with a two-phase evaluation. Phase I trains on either diffusion- or GAN-based data and tests on the combined group to quantify bidirectional cross-generator transfer. Phase II sequentially introduces renders from 3D Gaussian Splatting (3DGS) head avatar pipelines, requiring adaptation while preserving earlier performance. We observe that CLIP-based detectors inherit text-image alignment semantics that are irrelevant to authenticity and hinder generalization. We introduce a Hilbert-Schmidt Independence Criterion (HSIC) bottleneck loss on intermediate CLIP ViT features, encouraging representations predictive of real versus synthetic while independent of generator identity and caption alignment. For domain-incremental learning, we propose HSIC-Guided Replay (HGR), which selects per-class exemplars via a hybrid score combining HSIC relevance with k-center coverage, yielding compact memories that mitigate forgetting. Empirically, the HSIC bottleneck improves transfer between diffusion and GAN families, and HGR sustains prior accuracy while adapting to 3DGS renders. These results underscore the value of information-theoretic feature shaping and principled replay for resilient detection under shifting generative regimes.
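
The HSIC quantity underlying the bottleneck loss can be illustrated with the standard biased empirical estimator. This is a generic sketch of the estimator only (function name, Gaussian kernel choice, and bandwidth are assumptions, not the paper's configuration); the loss described above would maximize dependence between features and real-vs-synthetic labels while minimizing dependence on generator identity and caption alignment.

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimator with Gaussian kernels.
    Values near zero indicate the two feature sets are close to
    statistically independent; larger values indicate dependence."""
    n = X.shape[0]
    def gram(Z):
        sq = np.sum(Z ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * Z @ Z.T
        return np.exp(-d2 / (2 * sigma ** 2))
    K, L = gram(X), gram(Y)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
dependent = hsic(X, X + 0.01 * rng.normal(size=(200, 4)))
independent = hsic(X, rng.normal(size=(200, 4)))
# Dependent features should score markedly higher than independent ones.
```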

Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement

IEEE Transactions on Audio, Speech and Language Processing, February 2026

Chien-Chun Wang, Hung-Shin Lee, Hsin-Min Wang, and Berlin Chen


Abstract

Pre-trained models for automatic speech recognition (ASR) and speech enhancement (SE) have exhibited remarkable capabilities under matched noise and channel conditions. However, these models often suffer from severe performance degradation when confronted with domain shifts, particularly in the presence of unseen noise and channel distortions. In view of this, we present URSA-GAN, a unified and domain-aware generative framework specifically designed to mitigate mismatches in both noise and channel conditions. URSA-GAN leverages a dual-embedding architecture that consists of a noise encoder and a channel encoder, each pre-trained with limited in-domain data to capture domain-relevant representations. These embeddings condition a GAN-based speech generator, facilitating the synthesis of speech that is acoustically aligned with the target domain while preserving phonetic content. To enhance generalization further, we propose dynamic stochastic perturbation, a novel regularization technique that introduces controlled variability into the embeddings during generation, promoting robustness to unseen domains. Empirical results demonstrate that URSA-GAN effectively reduces character error rates in ASR and improves perceptual metrics in SE across diverse noisy and mismatched channel scenarios. Notably, evaluations on compound test conditions with both channel and noise degradations confirm the generalization ability of URSA-GAN, yielding relative improvements of 16.16% in ASR performance and 15.58% in SE metrics.

Cross-Attention Reprogramming for ASR: Bridging Discrete Speech Units and Pretrained Language Models

IEEE Access, January 2026

Pei-Jun Liao, Hung-Yi Lee, and Hsin-Min Wang


Abstract

In automatic speech recognition (ASR), an emerging trend involves converting continuous speech features into sequences of discrete speech units (DSUs) via quantization. A key advantage of DSU representations is their compatibility with pretrained language models (PLMs), where DSUs are directly mapped to PLM token indices and the embedding layer is fine-tuned. However, this conventional strategy often relies heavily on large-scale training data to mitigate the inherent modality mismatch. In light of this, we explore a more effective way to exploit the PLM embedding dictionary. Drawing inspiration from Time-LLM, a recent time-series forecasting model, we propose a cross-attention reprogramming mechanism that incorporates codebook information from the DSU quantizer to better align the DSUs with the PLM embeddings. Compared to direct fine-tuning of PLM embeddings, our method consistently achieves improvements on the Discrete Audio and Speech Benchmark (DASB), reaching state-of-the-art performance across most DASB-style settings. We also evaluate our method on LibriSpeech-960, LibriLight-10, and Swedish, Czech, and Hungarian data from Common Voice, and observe similar trends. Notably, the proposed reprogramming method demonstrates significant gains over the fine-tuning baseline, particularly in cross-lingual and low-resource scenarios. This study proposes a new approach to using PLM embedding dictionaries in DSU-based ASR, and lays a foundation for combining speech representations with large language models in other discriminative tasks of speech processing such as speech emotion recognition and spoken question answering.
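
The reprogramming idea, re-expressing each discrete speech unit as a soft mixture over pretrained token embeddings rather than a hard index mapping, can be sketched with a single cross-attention head. All shapes, projections, and names below are illustrative assumptions in the spirit of Time-LLM-style reprogramming, not the paper's architecture.

```python
import numpy as np

def reprogram(dsu_feats, plm_vocab_emb, d_k=16, seed=0):
    """Single-head cross-attention sketch (hypothetical shapes and
    random projections): DSU-side features act as queries over the PLM
    embedding table, so each unit's representation becomes a softmax-
    weighted mixture of pretrained token embeddings."""
    rng = np.random.default_rng(seed)
    d_in, d_out = dsu_feats.shape[1], plm_vocab_emb.shape[1]
    Wq = rng.normal(size=(d_in, d_k)) / np.sqrt(d_in)
    Wk = rng.normal(size=(d_out, d_k)) / np.sqrt(d_out)
    Q = dsu_feats @ Wq                 # (T, d_k) queries from DSU side
    K = plm_vocab_emb @ Wk             # (V, d_k) keys from PLM table
    scores = Q @ K.T / np.sqrt(d_k)    # (T, V) attention logits
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ plm_vocab_emb        # (T, d_out) reprogrammed embeddings

dsu = np.random.default_rng(1).normal(size=(5, 8))       # 5 speech units
vocab = np.random.default_rng(2).normal(size=(100, 32))  # toy PLM table
out = reprogram(dsu, vocab)
```

In the proposed method the query side would additionally incorporate codebook information from the DSU quantizer, and the projections would be learned rather than random.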

Can We Formalise Type Theory Intrinsically without Any Compromise? A Case Study in Cubical Agda

Proceedings of the 15th ACM SIGPLAN International Conference on Certified Programs and Proofs (CPP '26), January 2026

Liang-Ting Chen, Fredrik Nordvall Forsberg, Tzu-Chun Tsai


Abstract

We present an intrinsic representation of type theory in the proof assistant Cubical Agda, inspired by Awodey’s natural models of type theory. The initial natural model is defined as quotient inductive-inductive-recursive types, leading us to a syntax accepted by Cubical Agda without using any transports, postulates, or custom rewrite rules. We formalise some meta-properties such as the standard model, normalisation by evaluation for typed terms, and strictification constructions. Since our formalisation is carried out using Cubical Agda's native support for quotient inductive types, all our constructions compute at a reasonable speed. When we try to develop more sophisticated metatheory, however, the 'transport hell' problem reappears. Ultimately, it remains a considerable struggle to develop the metatheory of type theory using an intrinsic representation that lacks strict equations. The effort required is about the same whether or not the notion of natural model is used.

Efficient Column-Wise N:M Pruning on RISC-V CPU

Journal of Systems Architecture (JSA), March 2026

Chi-Wei Chu, Ding-Yong Hong, Jan-Jan Wu


Abstract

In deep learning frameworks, weight pruning is a widely used technique for improving computational efficiency by reducing the size of large models. This is especially critical for convolutional operators, which often act as performance bottlenecks in convolutional neural networks (CNNs). However, the effectiveness of pruning heavily depends on how it is implemented, as different methods can significantly impact both computational performance and memory footprint. In this work, we propose a column-wise N:M pruning strategy applied at the tile level and modify XNNPACK to enable efficient execution of pruned models on the RISC-V vector architecture. Additionally, we propose fusing the operations of im2col and data packing to minimize redundant memory accesses and memory overhead. To further optimize performance, we incorporate AITemplate’s profiling technique to identify the optimal implementation for each convolutional operator. Our proposed approach effectively increases ResNet inference throughput by as much as 4×, and preserves ImageNet top-1 accuracy within 2.1% of the dense baseline.
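
The N:M sparsity pattern itself is easy to illustrate: within every group of M consecutive weights along a column, only the N largest-magnitude entries survive. The sketch below shows just this pattern selection; the tile-level application, the im2col/packing fusion, and the RISC-V vector execution path described above are omitted.

```python
import numpy as np

def nm_prune_columns(W, n=2, m=4):
    """Column-wise N:M pruning sketch (tile handling and data packing
    omitted): in each group of m consecutive entries down a column,
    keep the n largest-magnitude weights and zero the rest."""
    W = W.copy()
    rows, cols = W.shape
    assert rows % m == 0
    for c in range(cols):
        for r in range(0, rows, m):
            group = W[r:r + m, c]
            # Indices of the (m - n) smallest-magnitude entries.
            drop = np.argsort(np.abs(group))[:m - n]
            group[drop] = 0.0
    return W

W = np.arange(1, 17, dtype=float).reshape(8, 2)  # 8x2 toy weight matrix
P = nm_prune_columns(W, n=2, m=4)
# Each column now has exactly 2 nonzeros in every group of 4 rows.
```

The fixed per-group nonzero count is what makes the format hardware-friendly: each tile compresses to a dense block of kept values plus small index metadata, which maps naturally onto vector load/gather instructions.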

Uncertainty-Guided Exploration for Efficient AlphaZero Training

Annual Conference on Neural Information Processing Systems (NeurIPS), December 2025

Scott Cheng, Meng-Yu Tsai, Ding-Yong Hong, Mahmut Kandemir


Abstract

AlphaZero has achieved remarkable success in complex decision-making problems through self-play and neural network training. However, its self-play process remains inefficient due to limited exploration of high-uncertainty positions, the overlooked runner-up decisions in Monte Carlo Tree Search (MCTS), and high variance in value labels. To address these challenges, we propose and evaluate uncertainty-guided exploration by branching from high-uncertainty positions using our proposed Label Change Rate (LCR) metric, which is further refined by a Bayesian inference framework. Our proposed approach leverages runner-up MCTS decisions to create multiple variations, and ensembles value labels across these variations to reduce variance. We investigate three key design parameters for our branching strategy: where to branch, how many variations to branch, and which move to play in the new branch. Our empirical findings indicate that branching with 10 variations per game provides the best performance-exploration balance. Overall, our end-to-end results show an improved sample efficiency over the baseline by 58.5% on 9x9 Go in the early stage of training and by 47.3% on 19x19 Go in the late stage of training.
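
Two of the ingredients above, ensembling value labels across branched variations and a label-change-rate signal, can be sketched in a few lines. Both functions are loudly simplified stand-ins: the averaging scheme is a plausible reading of "ensembles value labels across these variations", and the LCR formulation here is a hypothetical illustration, not the paper's Bayesian-refined metric.

```python
import statistics

def ensemble_value_label(variation_outcomes):
    """Variance-reduction sketch: label a branched position with the
    mean outcome over all variations started from it, instead of the
    +/-1 outcome of a single self-play game (illustrative reading;
    the paper's exact weighting may differ)."""
    return statistics.mean(variation_outcomes)

def label_change_rate(labels_before, labels_after):
    """Hypothetical stand-in for the LCR idea: the fraction of
    positions whose value label flips between training iterations,
    used as an uncertainty signal for where to branch."""
    changed = sum(b != a for b, a in zip(labels_before, labels_after))
    return changed / len(labels_before)

# One position, 10 branched variations with win/loss outcomes in {-1, +1}.
outcomes = [1, 1, -1, 1, -1, 1, 1, -1, 1, 1]
label = ensemble_value_label(outcomes)                  # 0.4, less noisy
lcr = label_change_rate([1, -1, 1, 1], [1, 1, 1, -1])  # 2 of 4 flipped
```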

A Grouping Algorithm for Training Tree-Shaped Models on Multiple GPUs with High Efficiency

IEEE International Conference on Computers, Software, and Applications (COMPSAC), July 2025

Cai-Feng Lin, Ding-Yong Hong, Tzu-Hsien Tsai, Pangfeng Liu, Jan-Jan Wu


Abstract

Graph Neural Networks (GNNs) are an important tool in deep learning for handling structured data, where graphs with nodes and edges represent entities and their relationships. Various challenges arise when the GNN is tree-shaped, with irregular connectivity patterns and varying depth. It is difficult to distribute and process the dynamic structure for parallel execution on multiple GPUs. In addition, tree data dependency demands the processing of parent nodes before their children, severely limiting execution parallelism. This research aims to improve the training speed of tree-shaped GNNs on multi-GPU systems. First, we introduce a cost model that estimates the running time of the training across multiple GPUs. Then, we demonstrate that finding an optimal way to distribute tree-structured data across GPUs is an NP-complete problem under this cost model. We then propose a practical heuristic method for distributing data that improves efficiency while maintaining training quality. The heuristic method first assigns data to batches based on our cost model and then assigns data in each batch to the devices. We also show that our device assignment algorithm is a 4-approximation algorithm; that is, it guarantees that its cost is at most four times the optimal running time in each training batch, ensuring that it performs effectively in practice. We implement the algorithm and conduct experiments. The results show that our algorithm achieves a significant speedup in training: up to 1.86× for two GPUs, 3.43× for four GPUs, and 7.25× for eight GPUs.
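
The shape of the device-assignment subproblem, spreading per-tree cost estimates across GPUs so the most-loaded device finishes as early as possible, can be illustrated with classic greedy list scheduling. This toy heuristic is NOT the paper's algorithm (whose 4-approximation analysis applies to its own assignment scheme under the proposed cost model); it only shows the flavor of the problem.

```python
import heapq

def greedy_assign(costs, num_gpus):
    """Illustrative load-balancing sketch: sort estimated tree costs in
    decreasing order and repeatedly give the next cost to the currently
    least-loaded GPU. Returns the assignment and the makespan (the
    maximum per-GPU load, i.e. the batch's running time)."""
    loads = [(0.0, g) for g in range(num_gpus)]
    heapq.heapify(loads)
    assignment = {g: [] for g in range(num_gpus)}
    for c in sorted(costs, reverse=True):
        load, g = heapq.heappop(loads)   # least-loaded GPU so far
        assignment[g].append(c)
        heapq.heappush(loads, (load + c, g))
    makespan = max(sum(v) for v in assignment.values())
    return assignment, makespan

costs = [7, 5, 4, 3, 2, 2, 1]            # hypothetical per-tree costs
assignment, makespan = greedy_assign(costs, num_gpus=2)
# Both GPUs end up with load 12, a perfectly balanced split here.
```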