Institute of Information Science, Academia Sinica

Research Overview



Recent Research Results


Scaled-YOLOv4: Scaling Cross Stage Partial Network

IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2021

Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao


We show that the YOLOv4 object detection neural network, based on the CSP approach, scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy. We propose a network scaling approach that modifies not only the depth, width, and resolution but also the structure of the network. The YOLOv4-large model achieves state-of-the-art results: 55.4% AP (73.3% AP50) on the MS COCO dataset at a speed of 15 FPS on a Tesla V100, while with test-time augmentation it achieves 55.8% AP (73.2% AP50). To the best of our knowledge, this is currently the highest accuracy on the COCO dataset among any published work. The YOLOv4-tiny model achieves 22.0% AP (42.0% AP50) at a speed of 443 FPS on an RTX 2080Ti, and with TensorRT, batch size 4, and FP16 precision, YOLOv4-tiny achieves 1774 FPS.
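The scaling idea can be pictured as applying depth and width multipliers to a base network specification. The following is only an illustrative sketch of compound scaling; the stage sizes and multipliers are made-up values, not the paper's actual configuration.

```python
def scale_model(base_depths, base_widths, depth_mult, width_mult):
    """Scale a network spec by depth/width multipliers (illustrative only).

    base_depths: block repeats per stage; base_widths: channels per stage.
    """
    depths = [max(1, round(d * depth_mult)) for d in base_depths]
    # Round widths to the nearest multiple of 8, a common hardware-friendly choice.
    widths = [max(8, int(round(w * width_mult / 8)) * 8) for w in base_widths]
    return depths, widths

# Hypothetical base spec scaled down for a "tiny" variant.
depths, widths = scale_model([1, 2, 8, 8, 4], [64, 128, 256, 512, 1024],
                             depth_mult=0.33, width_mult=0.5)
```

In a real scaling study, resolution and structural choices (e.g., which stages use CSP blocks) would be varied jointly with these two multipliers.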

Perceptual Indistinguishability-Net (PI-Net): Facial Image Obfuscation with Manipulable Semantics

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Jia-Wei Chen, Li-Ju Chen, Chia-Mu Yu, and Chun-Shien Lu


With the growing use of camera devices, industry holds many image datasets that offer opportunities for collaboration between the machine learning community and industry. However, the sensitive information in these datasets discourages data owners from releasing them. Despite recent research devoted to removing sensitive information from images, existing methods provide neither a meaningful privacy-utility trade-off nor provable privacy guarantees. In this study, taking perceptual similarity into consideration, we propose perceptual indistinguishability (PI) as a formal privacy notion particularly for images. We also propose PI-Net, a privacy-preserving mechanism that achieves image obfuscation with a PI guarantee. Our study shows that PI-Net achieves a significantly better privacy-utility trade-off through public image data.

Adaptive Image Transformer for One-Shot Object Detection

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021

Ding-Jie Chen, He-Yen Hsieh and Tyng-Luh Liu


One-shot object detection tackles the challenging task of identifying within a target image all object instances of the same class implied by a query image patch. The main difficulty lies in the fact that the class label of the query patch and its respective examples are not available in the training data. Our main idea leverages the concept of language translation to boost metric-learning-based detection methods. Specifically, we emulate the language translation process to adaptively translate the feature of each object proposal to better correlate with the given query feature for discriminating the class similarity among proposal-query pairs. To this end, we propose the Adaptive Image Transformer (AIT) module, which deploys an attention-based encoder-decoder architecture to simultaneously explore intra-coder and inter-coder (i.e., each proposal-query pair) attention. The adaptive nature of our design turns out to be flexible and effective in addressing the one-shot learning scenario. With the informative attention cues, the proposed model excels in predicting the class similarity between the target image proposals and the query image patch. Though conceptually simple, our model significantly outperforms a state-of-the-art technique, improving the unseen-class object classification from 63.8 mAP and 22.0 AP50 to 72.2 mAP and 24.3 AP50 on the PASCAL VOC and MS COCO benchmark datasets, respectively.
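The translation-style re-encoding of proposal features can be pictured as plain scaled dot-product cross-attention from proposal tokens to query tokens. This NumPy sketch is illustrative only and omits the full encoder-decoder machinery of the actual AIT module; the function name and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def translate_proposal(proposal, query):
    """Re-express ("translate") each proposal token as a mixture of query tokens.

    proposal: (P, d) features of one object proposal
    query:    (Q, d) features of the query image patch
    """
    d = proposal.shape[-1]
    attn = softmax(proposal @ query.T / np.sqrt(d), axis=-1)  # (P, Q), rows sum to 1
    return attn @ query  # (P, d): convex combinations of query features
```

The translated proposal features can then be compared with the query features by any metric-learning head to score class similarity.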

Referring Image Segmentation via Language-Driven Attention

International Conference on Robotics and Automation (ICRA), May 2021

Ding-Jie Chen, He-Yen Hsieh and Tyng-Luh Liu


This paper aims to tackle the problem of referring image segmentation, which targets reasoning about the region of interest referred to by a query natural language sentence. One key issue in referring image segmentation is how to establish a cross-modal representation encoding the two modalities, namely, the query sentence and the input image. Most existing methods are designed to concatenate the features from each modality or to gradually encode the cross-modal representation with respect to each word's effect. In contrast, our approach leverages the correlation between the two modalities to construct the cross-modal representation. To make the resulting cross-modal representation more discriminative for the segmentation task, we propose a novel mechanism of language-driven attention to encode the cross-modal representation, reflecting the attention between every single visual element and the entire query sentence. The proposed mechanism, named Language-Driven Attention (LDA), first decouples the cross-modal correlation into channel attention and spatial attention, and then integrates the two attentions to obtain the cross-modal representation. The channel attention and the spatial attention respectively reveal how sensitive each channel or each pixel of a particular feature map is with respect to the query sentence. With a proper fusion of the two kinds of feature attention, the proposed LDA model can effectively guide the generation of the final cross-modal representation. The resulting representation is further strengthened to capture multi-receptive-field and multi-level semantic information for the intended segmentation. We assess our referring image segmentation model on four public benchmark datasets, and the experimental results show that our model achieves state-of-the-art performance.
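A toy NumPy sketch of the decoupled channel/spatial attention idea follows. The gating functions here are simplified assumptions for illustration, not the paper's exact formulation; only the decomposition into a per-channel gate and a per-pixel gate, both driven by the sentence embedding, reflects the abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def language_driven_attention(feat, sent):
    """Gate a visual feature map by channel and spatial attention from a sentence.

    feat: (C, H, W) visual feature map; sent: (C,) query-sentence embedding.
    Both gating formulas below are illustrative placeholders.
    """
    C, H, W = feat.shape
    # Channel attention: sensitivity of each channel to the sentence.
    chan = sigmoid(feat.reshape(C, -1).mean(axis=1) * sent)          # (C,)
    # Spatial attention: sensitivity of each pixel to the sentence.
    spat = sigmoid(np.einsum('chw,c->hw', feat, sent) / np.sqrt(C))  # (H, W)
    return feat * chan[:, None, None] * spat[None, :, :]
```

The gated map plays the role of the cross-modal representation that the segmentation head would consume.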

ATACgraph: profiling genome wide chromatin accessibility from ATAC-seq

Frontiers in Genetics, January 2021

Rita Jui-Hsein Lu, Yen-Ting Liu, Chih Wei Huang, Ming-Ren Yen, Chung-Yen Lin and Pao-Yang Chen


Assay for transposase-accessible chromatin using sequencing (ATAC-seq) is an efficient and precise method for revealing chromatin accessibility across the genome. Most current ATAC-seq tools follow chromatin immunoprecipitation sequencing (ChIP-seq) strategies that do not consider ATAC-seq-specific properties. To incorporate ATAC-seq-specific quality control and the underlying biology of chromatin accessibility, we developed a bioinformatics software package named ATACgraph for analyzing and visualizing ATAC-seq data. ATACgraph profiles accessible chromatin regions and provides ATAC-seq-specific information, including definitions of nucleosome-free regions (NFRs) and nucleosome-occupied regions. ATACgraph also allows identification of differentially accessible regions between two ATAC-seq datasets. ATACgraph incorporates a Docker image with the Galaxy platform to provide an intuitive user experience via a graphical interface. Without tedious installation on a local machine or cloud, users can analyze data through activated websites using pre-designed workflows or customized pipelines composed of ATACgraph modules. Overall, ATACgraph is an effective tool designed for ATAC-seq that allows biologists with minimal bioinformatics knowledge to analyze chromatin accessibility. ATACgraph can be run on any ATAC-seq data with no limit to specific genomes. As validation, we demonstrated ATACgraph on the human genome to showcase its functions for ATAC-seq interpretation. This software is publicly accessible and can be downloaded at

Adaptive and Generative Zero-Shot Learning

Ninth International Conference on Learning Representations (ICLR), May 2021

Yu-Ying Chou, Hsuan-Tien Lin and Tyng-Luh Liu


We address the problem of generalized zero-shot learning (GZSL) where the task is to predict the class label of a target image whether its label belongs to the seen or unseen category. Similar to ZSL, the learning setting assumes that all class-level semantic features are given, while only the images of seen classes are available for training. By exploring the correlation between image features and the corresponding semantic features, the main idea of the proposed approach is to enrich the semantic-to-visual (S2V) embeddings via a seamless fusion of adaptive and generative learning. To this end, we extend the semantic features of each class by supplementing image-adaptive attention so that the learned S2V embedding can account for not only inter-class but also intra-class variations. In addition, to break the limit of training with images only from seen classes, we design a generative scheme to simultaneously generate virtual class labels and their visual features by sampling and interpolating over seen counterparts. In inference, a testing image will give rise to two different S2V embeddings, seen and virtual. The former is used to decide whether the underlying label is of the unseen category or otherwise a specific seen class; the latter is to predict an unseen class label. To demonstrate the effectiveness of our method, we report state-of-the-art results on four standard GZSL datasets, including an ablation study of the proposed modules. 
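The virtual-class generation step described above can be pictured as convex interpolation over seen-class prototypes. The sketch below is a simplified assumption of that sampling idea (function name, interpolation range, and the use of class prototypes are all illustrative choices, not the paper's exact scheme).

```python
import numpy as np

def make_virtual_classes(seen_protos, n_virtual=5, seed=0):
    """Create virtual class prototypes by interpolating random pairs of
    seen-class prototypes. seen_protos: (n_seen, d) array."""
    rng = np.random.default_rng(seed)
    n = len(seen_protos)
    out = []
    for _ in range(n_virtual):
        i, j = rng.choice(n, size=2, replace=False)  # two distinct seen classes
        lam = rng.uniform(0.2, 0.8)                  # stay away from the endpoints
        out.append(lam * seen_protos[i] + (1.0 - lam) * seen_protos[j])
    return np.stack(out)
```

In the paper's setting, both virtual labels and virtual visual features would be generated this way, letting the model train against classes never seen as images.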

Efficient Video Captioning on Heterogeneous System Architectures

35th IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021

Horng-Ruey Huang, Ding-Yong Hong, Jan-Jan Wu, Pangfeng Liu, Wei-Chung Hsu


Video captioning is the core technology driving the development of many important multidisciplinary applications, such as AI-assisted medical diagnosis, storytelling through videos, video question answering, and lip-reading, to name a few. Video captioning employs a hybrid CNN+RNN neural network model to translate video scenes into natural language descriptions. For deep learning inference, a typical approach is to run both the CNN and the RNN on a GPU. Such a GPU-only approach often suffers long inference time due to underutilization of the computing power offered by the CPU+GPU heterogeneous system architecture, which is common in modern computers. This work is an early effort to tackle the performance issue of deep learning inference with a hybrid CNN+RNN model on a heterogeneous system with a CPU and a GPU. This is challenging because (1) the CNN and the RNN exhibit very different computing behaviors, which raises the question of how to split the two models into computing tasks and properly assign the tasks to the CPU and the GPU to minimize the inference time for a video frame, and (2) data dependencies exist between the CNN and the RNN within a video frame, as well as between the adjacent RNNs across two video frames; these dependencies prohibit full parallelization of the hybrid model. To solve these two problems, we propose two optimizations: a fine-grained scheduling scheme for mapping computation to devices within a video frame, and a pipeline scheduling scheme to exploit maximum parallelism between the execution of video frames. To facilitate our optimizations, we also develop an accurate regression-based cost model to predict the computation time of CNN/RNN operations and the communication time for moving data between the CPU and the GPU. Experimental results show that our optimizations improve the performance of video captioning by up to 3.24× on the CPU+GPU system, compared with GPU-only execution.
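The dependency pattern described above (frame i's RNN waits on frame i's CNN and on frame i-1's RNN) yields a simple recurrence for the makespan of a two-stage pipeline. This is a hypothetical simplification with the CNN and RNN each pinned to one device; the paper's fine-grained scheduler splits work more flexibly.

```python
def pipeline_makespan(cnn_times, rnn_times):
    """Makespan of a two-stage pipeline over frames.

    Frame i's RNN can start only after frame i's CNN finishes
    AND frame i-1's RNN finishes (the cross-frame RNN dependency).
    """
    cnn_done = 0.0
    rnn_done = 0.0
    for c, r in zip(cnn_times, rnn_times):
        cnn_done += c                           # CNN stages run back-to-back on one device
        rnn_done = max(rnn_done, cnn_done) + r  # RNN waits on both dependencies
    return rnn_done
```

With three frames of 1-unit CNN work and 2-unit RNN work, the pipeline finishes in 7 units versus 9 for fully serial execution, which is the kind of overlap the pipeline scheduling scheme exploits.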

Not by Equations Alone: Reasoning with Extensible Effects

Journal of Functional Programming, January 2021

Oleg Kiselyov, Shin-Cheng Mu and Amr Sabry


The challenge of reasoning about programs with (multiple) effects such as mutation, jumps, or IO dates back to the inception of program semantics in the works of Strachey and Landin. Using monads to represent individual effects and the associated equational laws to reason about them proved exceptionally effective. Even then it is not always clear which laws are to be associated with a monad — for good reason, as we show for non-determinism. Combining expressions using different effects brings challenges not just for monads, which do not compose, but also for equational reasoning: the interaction of effects may invalidate their individual laws, as well as induce emergent properties that are not apparent in the semantics of the individual effects. Overall, the problems are judging the adequacy of a law; determining if or when a law continues to hold upon the addition of new effects; and obtaining and easily verifying emergent laws. We present a solution relying on the framework of (algebraic, extensible) effects, which has already proved itself for writing programs with multiple effects. Equipped with a fairly conventional denotational semantics, this framework turns out, as we demonstrate, to be useful also for reasoning about and optimizing programs with multiple interacting effects. Unlike the conventional approach, equational laws are not imposed on programs/effect handlers, but induced from them: our starting point hence is a program (model), whose denotational semantics, besides being used directly, suggests and justifies equational laws and clarifies side conditions. The main technical result is the introduction of the notion of equivalence modulo handlers ("modulo observation"), or a particular combination of handlers, together with a proof that it is a congruence. It is hence usable for reasoning in any context, not just evaluation contexts, provided particular conditions are met.
Concretely, we describe several realistic handlers for non-determinism and elucidate their laws (some of which hold in the presence of any other effect). We demonstrate appropriate equational laws of non-determinism in the presence of global state, which have been a challenge to state and prove before.

Multi-Q 2 software facilitates isobaric labeling quantitation analysis with improved accuracy and coverage

Scientific Reports, January 2021

Ching-Tai Chen, Jen-Hung Wang, Cheng-Wei Cheng, Wei-Che Hsu, Chu-Ling Ko, Wai-Kok Choong, and Ting-Yi Sung


Mass spectrometry-based proteomics using isobaric labeling for multiplex quantitation has become a popular approach for proteomic studies. We present Multi-Q 2, an isobaric-labeling quantitation tool that yields the largest quantitation coverage and improved quantitation accuracy compared to three state-of-the-art methods. Multi-Q 2 supports identification results from several popular proteomic data analysis platforms for quantitation, offering up to 12% improvement in quantitation coverage by accepting identification results from multiple search engines when compared with MaxQuant and PatternLab. It is equipped with various quantitation algorithms, including a ratio compression correction algorithm, yielding up to 336 algorithmic combinations. Systematic evaluation shows that different algorithmic combinations have different strengths and are suitable for different situations. We also demonstrate that the flexibility of Multi-Q 2 in customizing algorithmic combinations can lead to improved quantitation accuracy over existing tools. Moreover, the use of complementary algorithmic combinations can be an effective strategy to enhance sensitivity when searching for biomarkers from differentially expressed proteins in proteomic experiments. Multi-Q 2 provides interactive graphical interfaces to process quantitation and to display ratios at the protein, peptide, and spectrum levels. It also supports a heatmap module, enabling users to cluster proteins based on their abundance ratios and to visualize the clustering results. Multi-Q 2 executable files, sample data sets, and a user manual are freely available at

Parallel Asynchronous Stochastic Dual Coordinate Descent Algorithms for Efficiency and Convergence

29th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2021), March 2021

Yung-Chen Chen, Pangfeng Liu, Jan-Jan Wu


The parallel asynchronous stochastic dual coordinate descent algorithm (PASSCoDe) is an efficient method for training linear models on multi-core shared-memory systems. PASSCoDe enjoys a good speedup when the number of threads is less than 8 on sparse datasets, i.e., when the percentage of nonzero elements in the training data is relatively small. However, due to memory conflicts and the delayed parameter access problem in parallel execution, it often diverges or does not converge to the best accuracy that a serial dual coordinate descent algorithm achieves. In this paper, we propose two algorithms, Adaptive Hybrid and Lazy-Sync, to overcome these convergence issues in parallel execution. Experimental results indicate that both algorithms converge to the same high accuracy as a sequential program on all datasets we tested, except one extremely small dataset. In contrast, PASSCoDe sometimes converges to a less accurate value, or does not converge at all on some datasets. Our methods also outperform PASSCoDe-Fix, an improved version of PASSCoDe, in stable convergence, execution speed, and scalability.
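For reference, the serial dual coordinate descent baseline that PASSCoDe parallelizes can be sketched as follows for a linear L1-SVM. This is the textbook single-threaded formulation, not the paper's code; in the parallel setting, concurrent updates to the shared weight vector `w` are exactly what causes the convergence problems the paper addresses.

```python
import numpy as np

def dcd_svm(X, y, C=1.0, epochs=50, seed=0):
    """Serial dual coordinate descent for a linear L1-SVM (hinge loss).

    X: (n, d) data, y: (n,) labels in {+1, -1}. Maintains the primal
    weight vector w = sum_i alpha_i * y_i * x_i alongside the duals.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):       # one coordinate (example) at a time
            G = y[i] * (X[i] @ w) - 1.0    # gradient of the dual objective
            q = X[i] @ X[i]
            if q == 0.0:
                continue
            new = min(max(alpha[i] - G / q, 0.0), C)  # project onto [0, C]
            w += (new - alpha[i]) * y[i] * X[i]       # keep w consistent
            alpha[i] = new
    return w
```

Asynchronous variants let many threads run the inner loop at once; the stale reads of `w` that result are the "delayed parameter access" issue mentioned above.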

PASSLEAF: A Pool-bAsed Semi-Supervised LEArning Framework for Uncertain Knowledge Graph Embedding

The 35th AAAI Conference on Artificial Intelligence (AAAI 2021), February 2021

Zhu-Mu Chen, Mi-Yen Yeh, and Tei-Wei Kuo


In this paper, we study the problem of embedding uncertain knowledge graphs, where each relation between entities is associated with a confidence score. Observing that existing embedding methods may discard the uncertainty information, only incorporate a specific type of score function, or cause many false-negative samples in training, we propose the PASSLEAF framework to solve these issues. PASSLEAF consists of two parts: a model that can incorporate different types of scoring functions to predict the relation confidence scores, and a semi-supervised learning model that exploits both positive and negative samples associated with the estimated confidence scores. Furthermore, PASSLEAF leverages a sample pool as a relay for generated samples to further augment the semi-supervised learning. Experiment results show that our proposed framework can learn better embeddings, achieving higher accuracy in both confidence score prediction and tail entity prediction.

Accelerating Continuous Normalizing Flow with Trajectory Polynomial Regularization

The 35th AAAI Conference on Artificial Intelligence (AAAI 2021), February 2021

Han-Hsien Huang and Mi-Yen Yeh


In this paper, we propose an approach to effectively accelerate the computation of continuous normalizing flow (CNF), which has been proven to be a powerful tool for tasks such as variational inference and density estimation. The training time of CNF can be extremely high because the required number of function evaluations (NFE) for solving the corresponding ordinary differential equations (ODEs) is very large. We attribute the high NFE to large truncation errors in solving the ODEs. To address the problem, we propose to add a regularization that penalizes the difference between the trajectory of the ODE and its fitted polynomial regression. The trajectory of the ODE will then approximate a polynomial function, and thus the truncation error will be smaller. Furthermore, we provide two proofs and claim that the additional regularization does not harm training quality. Experimental results show that our proposed method can achieve a 42.3% to 71.3% reduction of NFE on the task of density estimation, and a 19.3% to 32.1% reduction of NFE on variational auto-encoders, while the testing losses are not affected at all.
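The regularizer can be pictured as the mean squared gap between sampled trajectory points and their best polynomial fit. The NumPy sketch below is a reading of that idea, not the paper's implementation (in training, this term would be differentiated through and added to the flow's loss).

```python
import numpy as np

def trajectory_poly_penalty(ts, zs, degree=3):
    """Penalty = mean squared residual of a per-dimension polynomial fit.

    ts: (T,) time points along the ODE solve; zs: (T, d) state samples.
    A trajectory that is exactly polynomial of the given degree incurs ~0 penalty.
    """
    penalty = 0.0
    for j in range(zs.shape[1]):
        coeffs = np.polyfit(ts, zs[:, j], degree)
        penalty += np.mean((np.polyval(coeffs, ts) - zs[:, j]) ** 2)
    return penalty / zs.shape[1]
```

Driving this penalty toward zero makes the trajectory nearly polynomial, so an adaptive ODE solver can take larger steps with small truncation error, which is where the NFE reduction comes from.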

Positions, Channels, and Layers: Fully Generalized Non-Local Network for Singer Identification

Thirty-Fifth AAAI Conference on Artificial Intelligence, February 2021

I-Yuan Kuo, Wen-Li Wei, and Jen-Chun Lin


Recently, a non-local (NL) operation has been designed as the central building block for deep-net models to capture long-range dependencies (Wang et al. 2018). Despite its excellent performance, it does not consider the interaction between positions across channels and layers, which is crucial in fine-grained classification tasks. To address this limitation, we target the singer identification (SID) task and present a fully generalized non-local (FGNL) module to help identify fine-grained vocals. Specifically, we first propose an FGNL operation, which extends the NL operation to explore the correlations between positions across channels and layers. Secondly, we further apply a depth-wise convolution with a Gaussian kernel in the FGNL operation to smooth feature maps for better generalization. Moreover, we incorporate the squeeze-and-excitation (SE) scheme into the FGNL module to adaptively emphasize correlated feature channels, helping uncover relevant feature responses and eventually the target singer. Evaluation results on the benchmark artist20 dataset show that the FGNL module significantly improves the accuracy of deep-net models in SID. Codes are available at

A Flexible Template Generation and Matching Method with Applications for Publication Reference Metadata Extraction

Journal of the Association for Information Science and Technology, To Appear

Ting-Hao Yang, Yu-Lun Hsieh, Shih-Hung Liu, Yung-Chun Chang, and Wen-Lian Hsu


Conventional rule-based approaches use exact template matching to capture linguistic information and necessarily need to enumerate all variations. We propose a novel flexible template generation and matching scheme called the principle-based approach (PBA) based on sequence alignment, and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation method that can capture prominent patterns using the dominating set algorithm. Second, we devise an alignment-based template-matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule-based approaches. Last, we apply PBA to RME on extensive cross-domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only delivers consistent performance on various unseen domains, but also surpasses hand-crafted knowledge (templates). We use four independent journal-style test sets and one conference-style test set in the experiments. When compared to renowned machine learning methods, such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi-directional long short-term memory with a CRF layer, Bi-LSTM-CRF), PBA has the best performance for all datasets.

LBERT: Lexically-aware Transformers based Bidirectional Encoder Representation model for learning Universal Bio-Entity Relations

Bioinformatics, To Appear

Neha Warikoo, Yung-Chun Chang, and Wen-Lian Hsu


Natural language processing techniques are constantly being advanced to accommodate the influx of data as well as to provide exhaustive and structured knowledge dissemination. Within the biomedical domain, relation detection between bio-entities, known as the biomedical relation extraction (BRE) task, has a critical function in knowledge structuring. Although recent advances in deep learning-based biomedical domain embedding have improved BRE predictive analytics, these works are often task-selective or employ external knowledge-based pre/post-processing. In addition, deep learning-based models do not account for local syntactic contexts, which have improved data representation in many kernel classifier-based models. In this study, we propose a universal BRE model, i.e., LBERT, a lexically aware Transformer-based bidirectional encoder representation model, which explores both local and global context representations for sentence-level classification tasks. This paper presents one of the most exhaustive BRE studies ever conducted, covering five different bio-entity relation types. Our model outperforms state-of-the-art deep learning models in protein-protein (PPI), drug-drug (DDI), and protein-bio-entity (REL) relation classification tasks by 0.02%, 11.2%, and 41.4%, respectively. LBERT representations show a statistically significant improvement over BioBERT in detecting true bio-entity relations for large corpora like PPI. Our ablation studies clearly indicate the contribution of the lexical features and distance-adjusted attention in improving prediction performance by learning additional local semantic context along with the bi-directionally learned global context.

Text-guided Graph Neural Networks for Referring 3D Instance Segmentation

Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), February 2021

Pin-Hao Huang, Han-Hung Lee, Hwann-Tzong Chen and Tyng-Luh Liu


This paper addresses a new task called referring 3D instance segmentation, which aims to segment out the target instance in a 3D scene given a query sentence. Previous work on scene understanding has explored visual grounding with natural language guidance, yet the emphasis is mostly constrained to images and videos. We propose a Text-guided Graph Neural Network for referring 3D instance segmentation on point clouds. Given a query sentence and the point cloud of a 3D scene, our method learns to extract per-point features and predicts an offset to shift each point toward its object center. Based on the point features and the offsets, we cluster the points to produce fused features and coordinates for the candidate objects. The resulting clusters are modeled as nodes in a Graph Neural Network (GNN) to learn representations that encompass the relation structure for each candidate object. The GNN layers leverage each object's features and its relations with neighbors to generate an attention heatmap for the input sentence expression. Finally, the attention heatmap is used to "guide" the aggregation of information from neighborhood nodes. Our method achieves state-of-the-art performance on the tasks of referring 3D instance segmentation and 3D localization on the ScanRefer, Nr3D, and Sr3D benchmarks.

Comparison of different variant sequence types coupled with decoy generation methods used in concatenated target-decoy database searches for proteogenomic research

Journal of Proteomics, January 2021

Wai-Kok Choong and Ting-Yi Sung


Concatenated target-decoy database searches are commonly used in proteogenomic research for variant peptide identification. Currently, protein-based and peptide-based sequence databases are used to store variant sequences for database searches. A protein-based database records the full-length wild-type protein sequence with the given variant events replacing the original amino acids, whereas a peptide-based database retains only the in silico digested peptides containing the variants. However, the performance of applying various decoy generation methods to the peptide-based variant sequence database, compared to the protein-based database, is still unclear. In this paper, we conduct a thorough comparison of target-decoy databases constructed from the above two types of databases coupled with various decoy generation methods for proteogenomic analyses. The results show that for the protein-based variant sequence database, the reverse and the pseudo-reverse methods achieve similar performance for variant peptide identification. Furthermore, for the peptide-based database, the pseudo-reverse method is more suitable than the widely used reverse method, as shown by identifying 6% more variant PSMs in a HEK293 cell line data set.
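The two decoy generation methods compared here differ only in whether the C-terminal residue stays in place. A minimal sketch (the peptide string is a hypothetical example):

```python
def reverse_decoy(peptide):
    """Decoy by plain reversal of the whole peptide sequence."""
    return peptide[::-1]

def pseudo_reverse_decoy(peptide):
    """Decoy that reverses all residues except the C-terminal one, so a tryptic
    peptide keeps its K/R terminus and stays target-like at the cleavage site."""
    return peptide[:-1][::-1] + peptide[-1]

# For a hypothetical tryptic peptide "PEPTIDEK":
#   reverse_decoy        gives "KEDITPEP" (terminus moves to the front)
#   pseudo_reverse_decoy gives "EDITPEPK" (K stays at the C-terminus)
```

Preserving the enzymatic terminus matters for peptide-based databases because each entry is itself a digested peptide; a plain reversal changes its digestion characteristics, which is consistent with the pseudo-reverse method performing better in that setting.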

Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent

Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 2020

Yun-Hsuan Jen, Chieh-Yang Huang, MeiHua Chen, Ting-Hao Huang and Lun-Wei Ku


Many language learners have trouble using near-synonyms (e.g., small vs. little; briefly vs. shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but has difficulty adapting such scores to all near-synonyms, as near-synonyms differ in various ways. We observe that the helpfulness of a learning material is reflected in the learner's performance. Thus, we propose an inference-based learner-like agent to mimic learner behavior and identify good learning materials by examining the agent's performance. To enable the agent to behave like a learner, we leverage entailment modeling's capability of inferring answers from the provided materials. Experimental results show that the proposed agent is equipped with good learner-like behavior and achieves the best performance in both fill-in-the-blank (FITB) and good example sentence selection tasks. We further conduct a classroom user study with college ESL learners. The results of the user study show that the proposed agent can find example sentences that help students learn more easily and efficiently. Compared to other models, the proposed agent improves the scores of more than 17% of students after learning.

Reactive Supervision: A New Method for Collecting Sarcasm Data

Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 2020

Boaz Shmueli, Lun-Wei Ku and Soumya Ray


Sarcasm detection is an important task in affective computing, requiring large amounts of labeled data. We introduce reactive supervision, a novel data collection method that utilizes the dynamics of online conversations to overcome the limitations of existing data collection techniques. We use the new method to create and release a first-of-its-kind large dataset of tweets with sarcasm perspective labels and new contextual features. The dataset is expected to advance sarcasm detection research. Our method can be adapted to other affective computing domains, thus opening up new research opportunities.

Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition

IEEE/ACM Transactions on Audio, Speech, and Language Processing, November 2020

Hung-Shin Lee, Yu Tsao, Shyh-Kang Jeng, and Hsin-Min Wang


Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution over phone events. In the present study, we propose a new learning mechanism based on subspace-based representation, which can extract concealed phonotactic structures from utterances, for language verification and dialect/accent identification. The framework mainly involves two successive parts. The first part involves subspace construction. Specifically, it decodes each utterance into a sequence of vectors filled with phone posteriors and transforms the vector sequence into a linear orthogonal subspace based on low-rank matrix factorization or dynamic linear modeling. The second part involves subspace learning based on kernel machines, such as support vector machines and the newly developed subspace-based neural networks (SNNs). The input layer of SNNs is specifically designed for samples represented by subspaces. The topology ensures that the same output can be derived from identical subspaces by modifying the conventional feed-forward pass to fit the mathematical definition of subspace similarity. Evaluated on the "General LR" test of NIST LRE 2007, the proposed method achieved up to 52%, 46%, 56%, and 27% relative reductions in equal error rates over the sequence-based PPR-LM, PPR-VSM, and PPR-IVEC methods and the lattice-based PPR-LM method, respectively. Furthermore, on the dialect/accent identification task of NIST LRE 2009, the SNN-based system performed better than the aforementioned four baseline methods.
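The subspace construction step can be pictured as a low-rank factorization of the phone-posterior sequence, with subspace similarity computed from principal angles. This NumPy sketch is a simplified illustration (SVD of the centered sequence, cosine-squared similarity), not the paper's exact factorization or the SNN input layer.

```python
import numpy as np

def utterance_subspace(posteriors, rank=2):
    """Map a (T, P) phone-posterior sequence to a (P, rank) orthonormal basis.

    The right singular vectors of the centered sequence span the dominant
    directions of phonotactic variation within the utterance.
    """
    centered = posteriors - posteriors.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:rank].T

def subspace_similarity(A, B):
    """Similarity of two subspaces: sum of squared cosines of principal angles.

    The singular values of A^T B are exactly those cosines.
    """
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return float(np.sum(s ** 2))
```

A kernel built from such a similarity is basis-invariant: any two orthonormal bases of the same subspace score identically, which is the property the SNN topology is designed to preserve.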

Learning From Music to Visual Storytelling of Shots: A Deep Interactive Learning Mechanism

ACM Multimedia Conference, October 2020

Jen-Chun Lin, Wen-Li Wei, Yen-Yu Lin, Tyng-Luh Liu, and Hong-Yuan Mark Liao

Jen-Chun Lin Tyng-Luh Liu Hong-Yuan Mark Liao

Learning from music to visual storytelling of shots is an interesting and emerging task. It produces a coherent visual story in the form of a shot type sequence, which not only expands the storytelling potential for a song but also facilitates the automatic concert video mashup process and storyboard generation. In this study, we present a deep interactive learning (DIL) mechanism for building a compact yet accurate sequence-to-sequence model to accomplish the task. Different from the one-way transfer between a pre-trained teacher network (or ensemble network) and a student network in knowledge distillation (KD), the proposed method enables collaborative learning between an ensemble teacher network and a student network. Namely, the student network also teaches. Specifically, our method first learns a teacher network that is composed of several assistant networks to generate a shot type sequence and produce the soft target (shot types) distribution accordingly through KD. It then constructs the student network that learns from both the ground truth label (hard target) and the soft target distribution to alleviate the difficulty of optimization and improve generalization capability. As the student network gradually advances, it in turn feeds back knowledge to the assistant networks, thereby improving the teacher network in each iteration. Owing to such interactive designs, the DIL mechanism bridges the gap between the teacher and student networks and yields superior capability for both. Objective and subjective experimental results demonstrate that both the teacher and student networks can generate more accurate shot type sequences, improving overall performance.
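The student-side objective can be sketched with a standard KD formulation; the temperature and mixing weight below are hypothetical choices for illustration, not values taken from the paper:

```python
import numpy as np

def softmax(z, t=1.0):
    """Numerically stable softmax with temperature t."""
    e = np.exp(z / t - np.max(z / t))
    return e / e.sum()

def dil_student_loss(student_logits, teacher_logits, hard_label,
                     t=2.0, alpha=0.5):
    """Student objective: cross-entropy on the ground-truth hard
    target plus distillation toward the teacher's softened shot-type
    distribution (generic KD form)."""
    hard = -np.log(softmax(student_logits)[hard_label])
    p_teacher = softmax(teacher_logits, t)            # soft target
    log_p_student = np.log(softmax(student_logits, t))
    soft = -np.sum(p_teacher * log_p_student)         # soft cross-entropy
    return float(alpha * hard + (1.0 - alpha) * soft)
```

In the interactive design described above, the same distillation idea runs in the reverse direction as well, with the advancing student feeding knowledge back to the assistant networks.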

Self-similarity Student for Partial Label Histopathology Image Segmentation

16th European Conference on Computer Vision (ECCV), August 2020

Hsien-Tzu Cheng, Chun-Fu Yeh, Po-Chen Kuo, Andy Wei, Keng-Chi Liu, Mong-Chi Ko, Kuan-Hua Chao, Yu-Ching Peng, and Tyng-Luh Liu

Tyng-Luh Liu

Delineation of cancerous regions in gigapixel whole slide images (WSIs) is a crucial diagnostic procedure in digital pathology. This process is time-consuming because of the large search space in the gigapixel WSIs, increasing the chances of omission and misinterpretation of indistinct tumor lesions. To tackle this, the development of an automated cancerous region segmentation method is imperative. We frame this issue as a modeling problem with partial label WSIs, where some cancerous regions may be misclassified as benign and vice versa, producing patches with noisy labels. To learn from these patches, we propose Self-similarity Student, combining the teacher-student model paradigm with similarity learning. Specifically, for each patch, we first sample its similar and dissimilar patches according to spatial distance. A teacher-student model is then introduced, featuring the exponential moving average on both student model weights and teacher predictions ensemble. While our student model takes patches, the teacher model takes all their corresponding similar and dissimilar patches to learn robust representations against noisy label patches. Following this similarity learning, our similarity ensemble merges similar patches' ensembled predictions as the pseudo-label of a given patch to counteract its noisy label. On the CAMELYON16 dataset, our method substantially outperforms state-of-the-art noise-aware learning methods by 5% and the supervised-trained baseline by 10% at various degrees of noise. Moreover, our method is superior to the baseline on our TVGH TURP dataset with a 2% improvement, demonstrating its generalizability to more clinical histopathology segmentation tasks.
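The two teacher-student ingredients can be illustrated with a toy sketch; the decay value and the simple averaging used for the similarity ensemble are assumptions for illustration only:

```python
import numpy as np

def ema_update(teacher_w, student_w, decay=0.99):
    """Exponential moving average of student weights into the teacher
    (the teacher-student paradigm's slow-moving copy)."""
    return {k: decay * teacher_w[k] + (1.0 - decay) * student_w[k]
            for k in teacher_w}

def similarity_pseudo_label(patch_pred, similar_preds):
    """Merge a patch's prediction with its spatially similar patches'
    ensembled predictions to counteract a possibly noisy label."""
    return np.mean(np.vstack([patch_pred] + list(similar_preds)), axis=0)
```

The EMA keeps the teacher's predictions stable against label noise, while the merged pseudo-label lets consistent neighbouring patches outvote a single mislabeled one.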

An Adaptive Layer Expansion Algorithm for Efficient Training of Deep Neural Networks

IEEE International Conference on Big Data, December 2020

Leo Chen, Pangfeng Liu, and Jan-Jan Wu

Jan-Jan Wu

In this paper, we propose an adaptive layer expansion algorithm to reduce the training time of deep neural networks without noticeable loss of accuracy. Neural networks have become deeper and wider to improve accuracy. The size of such networks makes them time-consuming to train. Hence, we propose an adaptive layer expansion algorithm that reduces training time by dynamically adding nodes where necessary, improving training efficiency without losing accuracy. We start with a smaller model with only a fraction of the parameters of the original model, then train the network and add nodes to specific layers determined by the stability of gradients. The algorithm repeatedly adds nodes until a threshold is reached, and trains the model until the accuracy converges. The experimental results indicate that our algorithm uses only a quarter of the computation time of a full model, and achieves 64.1% accuracy on MobileNet with the CIFAR100 dataset, which is only 2% less than the 66.2% accuracy of a full model. The algorithm stops adding nodes when it has only half of the parameters of the original model. As a result, this new model is well suited for fast inference in environments where both computation power and memory storage are very limited, such as mobile devices.
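A toy illustration of the gradient-stability criterion; the variance-based score and the expand-the-least-stable-layer rule are one hypothetical reading of the idea, not the paper's exact formulation:

```python
import numpy as np

def gradient_stability(grad_history):
    """Score a layer by the variance of its recent gradient snapshots;
    lower variance means more stable (hypothetical criterion)."""
    return float(np.var(np.vstack(grad_history), axis=0).mean())

def pick_layer_to_expand(histories):
    """Add nodes to the layer whose gradients are least stable, i.e.
    the layer that still changes the most during training."""
    return max(histories, key=lambda name: gradient_stability(histories[name]))
```

The expansion loop would alternate between a few epochs of training, scoring each layer's gradient history this way, and widening the winner until the parameter threshold is reached.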

Automated Graph Generation at Sentence Level for Reading Comprehension Based on Conceptual Graphs

The 28th International Conference on Computational Linguistics (COLING), December 2020

Wan-Hsuan Lin and Chun-Shien Lu

Wan-Hsuan Lin Chun-Shien Lu

This paper proposes a novel miscellaneous-context-based method to convert a sentence into a knowledge embedding in the form of a directed graph. We adopt the idea of conceptual graphs to frame miscellaneous textual information into a compact conceptual form. We first empirically observe that this graph representation method can (1) accommodate the slot-filling challenges in typical question answering and (2) provide access to the sentence-level graph structure in order to explicitly capture the neighbouring connections of reference concept nodes. Secondly, we propose a task-agnostic semantics-measured module, which cooperates with the graph representation method, in order to (3) project an edge of a sentence-level graph to the space of semantic relevance with respect to the corresponding concept nodes. In experiments on QA-type relation extraction, the combination of the graph representation and the semantics-measured module achieves high accuracy in answer prediction and offers human-comprehensible graphical interpretation for every well-formed sample. To our knowledge, our approach is the first step towards an interpretable process of learning vocabulary representations with experimental evidence.
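One way a sentence-level conceptual graph might be represented; the triple-based input and adjacency-list layout are illustrative assumptions (the paper's actual graph construction from raw text is not shown):

```python
def sentence_to_graph(triples):
    """Build a directed conceptual graph from (concept, relation,
    concept) triples extracted from a single sentence."""
    graph = {}
    for head, rel, tail in triples:
        graph.setdefault(head, []).append((rel, tail))
        graph.setdefault(tail, [])  # ensure every concept is a node
    return graph

def neighbours(graph, node):
    """Neighbouring connections of a reference concept node."""
    return [tail for _, tail in graph.get(node, [])]
```

Explicit access to a node's neighbours is what lets a slot-filling question be answered by walking the labeled edges around the reference concept.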

Determinizing Crash Behavior with a Verified Snapshot-Consistent Flash Translation Layer

USENIX Symposium on Operating Systems Design and Implementation (OSDI), November 2020

Yun-Sheng Chang, Yao Hsiao, Tzu-Chi Lin, Che-Wei Tsao, Chun-Feng Wu, Yuan-Hao Chang, Hsiang-Shang Ko, and Yu-Fang Chen

Yun-Sheng Chang Che-Wei Tsao Chun-Feng Wu Yuan-Hao Chang Hsiang-Shang Ko Yu-Fang Chen

We introduce the design of a snapshot-consistent flash translation layer (SCFTL) for flash disks, which has a stronger guarantee about the possible behaviors after a crash than conventional designs. More specifically, the flush operation of SCFTL also has the functionality of making a “disk snapshot.” When a crash occurs, the flash disk is guaranteed to recover to the state right before the last flush. The major benefit of SCFTL is that it allows a more efficient design of upper layers in the storage stack. For example, the file system hosted by SCFTL does not require the use of a journal for crash recovery. Instead, it only needs to perform a flush operation of SCFTL at the end of each atomic transaction. We use a two-layer approach, combining a proof assistant, a symbolic executor, and an SMT solver, to formally verify the correctness of our prototype SCFTL implementation. We optimize the xv6 file system by utilizing SCFTL’s stronger crash guarantee. Evaluation results show that the optimized xv6 is 3 to 30 times faster than the original version.
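The crash guarantee can be modeled abstractly; this toy Python model (not the verified implementation) captures only the semantics that flush doubles as a disk snapshot and a crash rolls the visible state back to the last flush:

```python
class SnapshotDisk:
    """Toy model of SCFTL's crash semantics."""

    def __init__(self):
        self._snapshot = {}  # state persisted by the last flush
        self._pending = {}   # writes issued since the last flush

    def write(self, addr, data):
        self._pending[addr] = data

    def flush(self):
        # The flush operation also makes a "disk snapshot".
        self._snapshot.update(self._pending)
        self._pending.clear()

    def crash(self):
        # Everything after the last flush is lost; the disk recovers
        # to exactly the last snapshotted state.
        self._pending.clear()

    def read(self, addr):
        return self._pending.get(addr, self._snapshot.get(addr))
```

This is why a file system hosted on SCFTL needs no journal: ending each atomic transaction with a flush is enough to make the transaction all-or-nothing across crashes.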

Cross-batch Reference Learning for Deep Retrieval

IEEE Transactions on Neural Networks and Learning Systems, September 2020

Huei-Fang Yang, Kevin Lin, Ting-Yen Chen, and Chu-Song Chen

Chu-Song Chen

Learning effective representations that exhibit semantic content is crucial to image retrieval applications. Recent advances in deep learning have made significant improvements in performance on a number of visual recognition tasks. Studies have also revealed that visual features extracted from a deep network learned on a large-scale image data set (e.g., ImageNet) for classification are generic and perform well on new recognition tasks in different domains. Nevertheless, when applied to image retrieval, such deep representations do not attain performance as impressive as that achieved in classification. This is mainly because the deep features are optimized for classification rather than for the desired retrieval task. We introduce the cross-batch reference (CBR), a novel training mechanism that enables the optimization of deep networks with a retrieval criterion. With the CBR, the networks leverage both the samples in a single minibatch and the samples in other minibatches for weight updates, enhancing the stochastic gradient descent (SGD) training by enabling interbatch information passing. This interbatch communication is implemented as a cross-batch retrieval process in which the networks are trained to maximize the mean average precision (mAP), a popular performance measure in retrieval. Maximizing the cross-batch mAP is equivalent to centralizing the samples relevant to each other in the feature space and separating the samples irrelevant to each other. The learned features can discriminate between relevant and irrelevant samples and thus are suitable for retrieval. To circumvent the discrete, nondifferentiable mAP maximization, we derive an approximate, differentiable lower bound that can be easily optimized in deep networks. Furthermore, the mAP loss can be used alone or with a classification loss. Experiments on several data sets demonstrate that our CBR learning provides favorable performance, validating its effectiveness.
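The idea of replacing the discrete mAP with a differentiable surrogate can be sketched as follows; this sigmoid-relaxed soft ranking is a generic illustration of the approach, not the paper's actual lower bound:

```python
import numpy as np

def smooth_ap(scores, relevant, tau=0.1):
    """Generic differentiable surrogate for average precision: the
    hard rank indicator is relaxed with a sigmoid of temperature tau,
    so gradients can flow through the ranking."""
    scores = np.asarray(scores, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    sig = lambda x: 1.0 / (1.0 + np.exp(np.clip(-x / tau, -60.0, 60.0)))
    ap = 0.0
    for i in np.flatnonzero(relevant):
        diff = scores - scores[i]
        soft_rank = 1.0 + sig(diff).sum() - sig(0.0)       # among all items
        soft_rel_rank = 1.0 + sig(diff[relevant]).sum() - sig(0.0)
        ap += soft_rel_rank / soft_rank                    # soft precision@rank
    return float(ap / relevant.sum())
```

As tau shrinks, the sigmoid approaches the hard ranking indicator and the surrogate approaches the true AP; maximizing it pulls relevant samples above irrelevant ones in score, which mirrors the centralize/separate behavior described above.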

EpiMOLAS: An Intuitive Web-based Framework for Genome-wide DNA Methylation Analysis

BMC Genomics, April 2020

Sheng-Yao Su, I-Hsuan Lu, Wen-Chih Cheng, Wei-Chun Chung, Pao-Yang Chen, Jan-Ming Ho, Shu-Hwa Chen, Chung-Yen Lin

Jan-Ming Ho Chung-Yen Lin

DNA methylation is a crucial epigenomic mechanism in various biological processes. Using whole-genome bisulfite sequencing (WGBS) technology, methylated cytosine sites can be revealed at the single nucleotide level. However, the WGBS data analysis process is usually complicated and challenging.
To alleviate the associated difficulties, we integrated the WGBS data processing steps and downstream analysis into a two-phase approach. First, we set up the required tools in Galaxy and developed workflows to calculate the methylation level from raw WGBS data and generate a methylation status summary, the mtable. This computation environment is wrapped into the Docker container image DocMethyl, which allows users to rapidly deploy an executable environment without tedious software installation and library dependency problems. Next, the mtable files were uploaded to the web server EpiMOLAS_web to link with the gene annotation databases that enable rapid data retrieval and analyses.
To our knowledge, the EpiMOLAS framework, consisting of DocMethyl and EpiMOLAS_web, is the first approach to include containerization technology and a web-based system for WGBS data analysis from raw data processing to downstream analysis. EpiMOLAS will help users cope with their WGBS data and also conduct reproducible analyses of publicly available data, thereby gaining insights into the mechanisms underlying complex biological phenomena. The Galaxy Docker image DocMethyl and the web server EpiMOLAS_web are both publicly accessible online.
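The methylation level that such workflows compute from bisulfite read counts is, per cytosine site, the fraction of reads reporting methylation; the mtable row layout below is a hypothetical illustration of the summary format, not DocMethyl's exact schema:

```python
def methylation_level(meth_reads, total_reads):
    """Per-cytosine methylation level: methylated reads divided by
    total covered reads; None when the site has no coverage."""
    return meth_reads / total_reads if total_reads else None

def mtable(sites):
    """Hypothetical mtable rows: (chromosome, position, context, level)
    built from (chromosome, position, context, meth, total) counts."""
    return [(chrom, pos, ctx, methylation_level(m, t))
            for chrom, pos, ctx, m, t in sites]
```

A summary table in this spirit is what EpiMOLAS_web would link against gene annotation databases for downstream retrieval and analysis.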