DiffUMI: Training-Free Universal Model Inversion via Unconditional Diffusion for Face Recognition
IEEE Transactions on Information Forensics and Security, April 2026
Hanrui Wang, Shuo Wang, Chun-Shien Lu, and Isao Echizen
Abstract
Face recognition poses serious privacy risks due to its reliance on sensitive and immutable biometric data. While modern systems mitigate privacy risks by mapping facial images to embeddings (commonly regarded as privacy-preserving), model inversion attacks reveal that identity information can still be recovered, exposing critical vulnerabilities. However, existing attacks are often computationally expensive and lack generalization, especially those requiring target-specific training. Even training-free approaches suffer from limited identity controllability, hindering faithful reconstruction of nuanced or unseen identities. In this work, we propose DiffUMI, the first diffusion-driven, training-free model inversion attack. DiffUMI introduces a novel pipeline combining robust latent code initialization, a ranked adversarial refinement strategy, and a statistically grounded, confidence-aware optimization objective. DiffUMI applies directly to unseen target identities and face recognition models, offering greater adaptability than training-dependent approaches while significantly reducing computational overhead. Our method achieves 84.42%–92.87% attack success rates against inversion-resilient systems and outperforms the best prior training-free GAN-based approach by 4.01%–9.82%. The implementation is available at https://github.com/azrealwang/DiffUMI.
Harnessing Sequence Embedding and Ensemble Learning to Identify Antifungal Peptides with Low Hemolytic Risk
ACS Omega, April 2026
Chung-Yen Lin, Wen-Chih Cheng, U-Lin Chen, Tzu-Tang Lin, Li-Hang Hsu, Yang-Hsin Shih, I-Hsuan Lu, Ying-Lien Chen, Shu-Hwa Chen
Abstract
The increasing prevalence of fungal infections represents a growing threat to human health, driven in part by the misuse of antibiotics and the rising incidence of resistance to conventional antifungal agents. Antifungal peptides (AFPs) have emerged as promising alternatives due to their diverse mechanisms of action and their relatively low propensity to develop resistance. To facilitate the systematic discovery of AFPs, we developed AI4AFP. This computational framework integrates curated antifungal peptide resources with advanced machine learning approaches to predict antifungal potential directly from peptide sequences.
Using a comprehensive dataset, we constructed a seven-model ensemble that combines multiple sequence encoding strategies, including ProtBERT-BFD, PC6, and Doc2Vec, with diverse learning algorithms, including random forests, support vector machines, convolutional neural networks, and fine-tuned BERT models. This ensemble demonstrated robust performance on an independent test set, achieving 0.94 in accuracy and 0.89 in Matthews correlation coefficient, outperforming existing AFP prediction methods. Importantly, the predicted AFP score is intended to reflect the general antifungal potential rather than species-specific potency.
Experimental validation against representative fungal pathogens, including Candida albicans, Candida glabrata, and Cryptococcus neoformans, revealed that peptides with high predicted AFP scores exhibited context-dependent antifungal activity. Several candidates displayed pronounced inhibitory effects against specific species, despite limited activity against others, highlighting the inherent species-dependence of antifungal efficacy and supporting the role of AI4AFP as a prioritization tool rather than a species-specific predictor.
To complement antifungal prediction, we further developed a hemolysis classifier that incorporates both peptide sequence and applied concentration as continuous inputs, enabling explicit modeling of the dose-dependent nature of hemolytic toxicity. Experimental determination of the minimum concentration inducing 10% hemolysis (MHC₁₀) provided an empirical safety reference, allowing antifungal activity to be interpreted alongside concentration-dependent toxicity. All models and validation results are implemented on a user-friendly web server, AI4AFP (https://axp.iis.sinica.edu.tw/AI4AFP), providing an accessible platform for the discovery and prioritization of antifungal peptides, with consideration of both efficacy and safety.
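The abstract does not state how the seven models' outputs are combined; assuming simple soft voting over per-model probabilities (the rule and the scores below are illustrative assumptions, not the paper's algorithm), the decision step can be sketched in Python as:

```python
def soft_vote(model_probs, threshold=0.5):
    """Soft-voting ensemble: average the probability each model assigns
    to the positive (antifungal) class, then threshold the mean score.
    NOTE: illustrative sketch; AI4AFP's actual combination rule may differ."""
    score = sum(model_probs) / len(model_probs)
    return score, score >= threshold
```

A peptide scored by seven models as, say, `[0.9, 0.8, 0.7, 0.6, 0.9, 0.95, 0.85]` would receive a mean AFP score of about 0.81 and be flagged as a candidate.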
Overcoming Copyright Barriers in Corpus Distribution Through Non-Reversible Hashing
The 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), Main Conference, July 2026
Arthur Amalvy, Vincent Labatut, Xavier Bost, and Hen-Hsen Huang
Abstract
While annotated corpora are crucial in the field of natural language processing (NLP), those containing copyrighted material are difficult to exchange among researchers. Yet, such corpora are necessary to fully represent the diversity of data found in the wild in the context of NLP tasks. We tackle this issue by proposing a method to lawfully and publicly share the annotations of copyrighted literary texts. The corpus creator shares the annotations in the clear, along with a non-reversible hashed version of the source material. The corpus user must own the source material and apply the same hash function to their own tokens in order to match them to the shared annotations. Crucially, our method is robust to reasonable divergences in the version of the copyrighted data owned by the user. As an illustration, we present alignment experiments on different editions of novels. Our results show that our method correctly aligns 98.7% to 99.79% of tokens depending on the novel, provided the user version is sufficiently close to the corpus creator's version. We publicly release novelshare, a Python implementation of our method.
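As a much-simplified sketch of the sharing scheme (the choice of hash function, the salting, and the divergence-robust alignment of novelshare are assumptions here, not the paper's actual algorithm):

```python
import hashlib

def hash_tokens(tokens, salt="corpus-v1"):
    """One-way hash of each token: equal tokens map to equal digests,
    but the surface text cannot be read off the shared file.
    (Hash function and salt are illustrative choices.)"""
    return [hashlib.sha256((salt + t).encode("utf-8")).hexdigest()
            for t in tokens]

def align_annotations(user_tokens, creator_hashes, annotations,
                      salt="corpus-v1"):
    """Match the user's own tokens to the creator's hashed tokens and
    transfer the shared annotations. This naive sequential matcher
    leaves None at divergent positions; the real method tolerates
    larger edition differences."""
    aligned, j = [], 0
    for h in hash_tokens(user_tokens, salt):
        if j < len(creator_hashes) and h == creator_hashes[j]:
            aligned.append(annotations[j])
            j += 1
        else:
            aligned.append(None)  # divergent token: no annotation transferred
    return aligned
```

A user who owns the same edition recovers every label; a user with a slightly different edition recovers labels only where the token streams agree.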
Rethinking Forgery Attacks on Semantic Watermarks in Black-Box Settings: A Geometric Distortion Perspective
Forty-third International Conference on Machine Learning (ICML), July 2026
Cheng-Yi Lee, Yichi Zhang, Yuchen Yang, Chun-Shien Lu, and Jun-Cheng Chen
Abstract
Recent studies have shown that semantic watermarks, which embed information into the initial noise of latent diffusion models (LDMs), are vulnerable to black-box forgery attacks. However, existing methods primarily rely on empirical evidence and lack a rigorous theoretical understanding of the conditions under which such attacks succeed or fail. To bridge this gap, we rethink the nature of such attacks through the lens of rate-distortion in the latent space. Our analysis identifies an irreducible distortion floor due to structural mismatches between proxy and target models, which fundamentally limits the fidelity of forged watermarks. We further characterize this distortion as structured geometric deviations on the latent manifold, in the form of global drift and local deformation rather than stochastic noise. Leveraging these insights, we propose a scheme-agnostic detection method that distinguishes forged samples before watermark verification. Extensive experiments demonstrate the effectiveness of our method across diverse black-box scenarios, while preserving robustness to common distortions.
Submodular Optimization for Minimal Augmentation in Robust Language Model Alignment
Forty-third International Conference on Machine Learning (ICML), July 2026
Ching-Chia Kao, Chia-Mu Yu, Chun-Shien Lu, and Chu-Song Chen
Abstract
Safety alignment of large language models is fragile: even small fine-tuning perturbations elastically revert behaviors toward those of the pretrained model, with degradation inversely proportional to the size of the alignment set. We ask how to achieve safety alignment with minimal augmentation. To this end, we model augmentation as a set of group actions on sequences and formalize robustness gains as a normalized, monotone submodular function over transformations. We then leverage submodular optimization to select minimal augmentations that provably improve robustness. Experiments confirm that our approach efficiently restores safety alignment while minimizing the overhead of augmentation.
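The selection step relies on the classic greedy algorithm for monotone submodular maximization, which carries a (1 − 1/e) approximation guarantee. A generic sketch with a toy coverage-style gain function (the transformation names and the gain function are illustrative, not the paper's):

```python
def greedy_submodular(candidates, gain, budget):
    """Greedy maximization of a monotone submodular set function:
    repeatedly add the candidate with the largest marginal gain,
    stopping at the budget or when no candidate still helps."""
    selected = set()
    for _ in range(budget):
        best, best_gain = None, 0.0
        for c in candidates:
            if c in selected:
                continue
            g = gain(selected | {c}) - gain(selected)  # marginal gain
            if g > best_gain:
                best, best_gain = c, g
        if best is None:  # no positive marginal gain left
            break
        selected.add(best)
    return selected

# Toy example: each augmentation "covers" a set of attack patterns,
# and the gain is the number of patterns covered (a coverage function,
# which is monotone and submodular).
coverage = {"synonym": {1, 2}, "paraphrase": {2, 3, 4}, "backtrans": {4, 5}}

def covered_count(S):
    covered = set()
    for c in S:
        covered |= coverage[c]
    return len(covered)
```

With a budget of two, the greedy rule first picks `paraphrase` (covers three patterns) and then the augmentation with the best remaining marginal gain, rather than the two individually strongest ones.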
Understanding Audiovisual Deepfake Detection: Techniques, Challenges, Human Factors, and Perceptual Insights
IEEE Computational Intelligence Magazine, May 2026
Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, and Hsin-Min Wang
Abstract
Deep learning has been successfully applied in various fields, and its impact on deepfake detection is no exception. Deepfakes are fake, yet realistic synthetic content that can be used deceitfully for political impersonation, phishing, slander, or the spread of misinformation. Despite extensive research on unimodal deepfake detection, the identification of complex deepfakes through joint analysis of audio and visual streams remains relatively unexplored. To fill this gap, this survey first provides an overview of audiovisual deepfake generation techniques, applications, and their consequences, and then provides a comprehensive review of state-of-the-art methods that combine audio and visual modalities to increase detection accuracy, summarizing and critically analyzing their strengths and limitations. Furthermore, we discuss existing open source datasets for a deeper understanding, which can contribute to the research community and provide necessary information for beginners who want to analyze deep learning-based audiovisual methods for video forensics. By bridging the gap between unimodal and multimodal approaches, this paper aims to improve the effectiveness of deepfake detection strategies and guide future research on cybersecurity and media integrity.
Telomere-to-Telomere, Haplotype-Resolved Chromosome-Level Genome Assembly and Annotation of Taiwan Hard Clam (Meretrix taiwanica)
Scientific Data, May 2026
Ching-Huei Huang, Po-Cheng Hsu, San-Tzu Hsieh, Fu-Shen Tseng, Chung-Yen Lin
Abstract
Taiwan Hard Clam (Meretrix taiwanica) is an economically important aquaculture species in Taiwan, yet genomic resources for this species have remained fragmented. We present a telomere-to-telomere (T2T), haplotype-resolved, chromosome-level genome assembly for M. taiwanica, generated using PacBio HiFi long reads and Hi-C sequencing. The two haploid assemblies (hap1 and hap2) span 1,006.48 Mb and 1,007.28 Mb, comprising 126 and 66 sequences, respectively, and each containing 19 chromosomes. Hap1 and hap2 exhibit sequence N50 values of 53.87 Mb and 51.57 Mb, with average scaffold lengths of 7.99 Mb and 15.26 Mb, and contain 0.0176% and 0.1313% ambiguous bases. Comparative analyses revealed 81.59% and 83.78% syntenic regions between haplotypes and identified 10,175 structural variations. Repetitive elements constitute 47.06% and 47.02% of the hap1 and hap2 genomes. We annotated 23,320 and 23,598 protein-coding gene models, with median gene lengths of 7,721 bp and 7,657.5 bp, respectively. The mitochondrial genome was assembled at 21,164 bp and encodes 13 protein-coding genes, 22 tRNAs, and 2 rRNAs. Functional annotation covered 16.23% and 16.33% of the nuclear and mitochondrial gene sets. BUSCO analysis indicated genome completeness of 92.4% and 92.5%, and proteome completeness of 95.4% and 94.5% for hap1 and hap2. By providing the first T2T-level reference, this dataset enables precise identification of trait-associated markers for marker-assisted selection (MAS), thereby facilitating genetic improvement of growth and stress-resistance traits. Furthermore, it serves as a robust genomic framework for conservation genomics to assess the genetic diversity of both wild and hatchery populations of this economically vital species.
Regret-Guided Search Control for Efficient Learning in AlphaZero
The Fourteenth International Conference on Learning Representations (ICLR), April 2026
Yun-Jui Tsai, Wei-Yu Chen, Yan-Ru Ju, Yu-Hung Chang, Ti-Rong Wu
Abstract
Reinforcement learning (RL) agents achieve remarkable performance but remain far less learning-efficient than humans. While RL agents require extensive self-play games to extract useful signals, humans often need only a few games, improving rapidly by repeatedly revisiting states where mistakes occurred. This idea, known as search control, aims to restart from valuable states rather than always from the initial state. In AlphaZero, prior work Go-Exploit applies this idea by sampling past states from self-play or search trees, but it treats all states equally, regardless of their learning potential. We propose Regret-Guided Search Control (RGSC), which extends AlphaZero with a regret network that learns to identify high-regret states, where the agent's evaluation diverges most from the actual outcome. These states are collected from both self-play trajectories and MCTS nodes, stored in a prioritized regret buffer, and reused as new starting positions. Across 9x9 Go, 10x10 Othello, and 11x11 Hex, RGSC outperforms AlphaZero and Go-Exploit by an average of 77 and 89 Elo, respectively. When training on a well-trained 9x9 Go model, RGSC further improves the win rate against KataGo from 69.3% to 78.2%, while both baselines show no improvement. These results demonstrate that RGSC provides an effective mechanism for search control, improving both efficiency and robustness of AlphaZero training. Our code is available at https://rlg.iis.sinica.edu.tw/papers/rgsc.
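A minimal sketch of a prioritized regret buffer, assuming regret-proportional sampling and lowest-regret eviction (details the abstract does not specify):

```python
import random

class RegretBuffer:
    """Prioritized buffer keyed by predicted regret: states where the
    agent's evaluation diverged most from the actual outcome are
    sampled more often as alternative starting positions.
    NOTE: sampling and eviction policies are illustrative assumptions."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.items = []  # list of (regret, state) pairs

    def add(self, state, regret):
        self.items.append((regret, state))
        if len(self.items) > self.capacity:
            # evict the lowest-regret entry to keep high-value states
            self.items.remove(min(self.items, key=lambda x: x[0]))

    def sample(self):
        """Draw a state with probability proportional to its regret."""
        total = sum(r for r, _ in self.items)
        pick = random.uniform(0.0, total)
        acc = 0.0
        for r, s in self.items:
            acc += r
            if acc >= pick:
                return s
        return self.items[-1][1]
```

During self-play, high-regret states harvested from trajectories and MCTS nodes are `add`ed, and new games occasionally restart from a `sample`d state instead of the initial position.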
HSIC Bottleneck for Cross-Generator and Domain-Incremental Synthetic Image Detection
The Fourteenth International Conference on Learning Representations (ICLR), April 2026
Chin-Chia Yang, Yung-Yu Chuang, Hwann-Tzong Chen and Tyng-Luh Liu
Abstract
Synthetic image generators evolve rapidly, challenging detectors to generalize across current methods and adapt to new ones. We study domain-incremental synthetic image detection with a two-phase evaluation. Phase I trains on either diffusion- or GAN-based data and tests on the combined group to quantify bidirectional cross-generator transfer. Phase II sequentially introduces renders from 3D Gaussian Splatting (3DGS) head avatar pipelines, requiring adaptation while preserving earlier performance. We observe that CLIP-based detectors inherit text-image alignment semantics that are irrelevant to authenticity and hinder generalization. We introduce a Hilbert-Schmidt Independence Criterion (HSIC) bottleneck loss on intermediate CLIP ViT features, encouraging representations predictive of real versus synthetic while independent of generator identity and caption alignment. For domain-incremental learning, we propose HSIC-Guided Replay (HGR), which selects per-class exemplars via a hybrid score combining HSIC relevance with k-center coverage, yielding compact memories that mitigate forgetting. Empirically, the HSIC bottleneck improves transfer between diffusion and GAN families, and HGR sustains prior accuracy while adapting to 3DGS renders. These results underscore the value of information-theoretic feature shaping and principled replay for resilient detection under shifting generative regimes.
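The biased empirical HSIC estimator used in such bottleneck losses is standard: center the two Gram matrices and take the normalized trace of their product. A NumPy sketch (the Gaussian kernel and its bandwidth are assumptions; the paper applies the criterion to intermediate CLIP ViT features):

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian (RBF) Gram matrix over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2, where H is
    the centering matrix. Close to zero when X and Y are independent,
    larger when they are statistically dependent."""
    n = X.shape[0]
    K, L = rbf_kernel(X, sigma), rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2
```

As a loss term, HSIC between features and a nuisance variable (e.g., generator identity) is minimized, while HSIC between features and the real-versus-synthetic label is kept high.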
Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
IEEE Transactions on Audio, Speech, and Language Processing, February 2026
Chien-Chun Wang, Hung-Shin Lee, Hsin-Min Wang, and Berlin Chen
Abstract
Pre-trained models for automatic speech recognition (ASR) and speech enhancement (SE) have exhibited remarkable capabilities under matched noise and channel conditions. However, these models often suffer from severe performance degradation when confronted with domain shifts, particularly in the presence of unseen noise and channel distortions. In view of this, we present URSA-GAN, a unified and domain-aware generative framework specifically designed to mitigate mismatches in both noise and channel conditions. URSA-GAN leverages a dual-embedding architecture that consists of a noise encoder and a channel encoder, each pre-trained with limited in-domain data to capture domain-relevant representations. These embeddings condition a GAN-based speech generator, facilitating the synthesis of speech that is acoustically aligned with the target domain while preserving phonetic content. To enhance generalization further, we propose dynamic stochastic perturbation, a novel regularization technique that introduces controlled variability into the embeddings during generation, promoting robustness to unseen domains. Empirical results demonstrate that URSA-GAN effectively reduces character error rates in ASR and improves perceptual metrics in SE across diverse noisy and mismatched channel scenarios. Notably, evaluations on compound test conditions with both channel and noise degradations confirm the generalization ability of URSA-GAN, yielding relative improvements of 16.16% in ASR performance and 15.58% in SE metrics.
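A minimal sketch of the dynamic stochastic perturbation idea, assuming zero-mean Gaussian noise whose scale is itself drawn at random per call (the actual noise distribution and schedule are not specified in the abstract):

```python
import numpy as np

def dynamic_stochastic_perturbation(embedding, max_scale=0.1, rng=None):
    """Inject Gaussian noise with a randomly sampled standard deviation
    into a domain embedding, so the generator sees controlled
    variability rather than a fixed perturbation level.
    NOTE: illustrative sketch of the regularization idea only."""
    rng = rng if rng is not None else np.random.default_rng()
    scale = rng.uniform(0.0, max_scale)          # per-call noise strength
    noise = rng.normal(0.0, 1.0, size=embedding.shape)
    return embedding + scale * noise
```

Applied to the noise and channel embeddings during generation, this prevents the generator from overfitting to the exact embeddings seen in the limited in-domain data.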
Cross-Attention Reprogramming for ASR: Bridging Discrete Speech Units and Pretrained Language Models
IEEE Access, January 2026
Pei-Jun Liao, Hung-Yi Lee, and Hsin-Min Wang
Abstract
In automatic speech recognition (ASR), an emerging trend involves converting continuous speech features into sequences of discrete speech units (DSUs) via quantization. A key advantage of DSU representations is their compatibility with pretrained language models (PLMs), where DSUs are directly mapped to PLM token indices and the embedding layer is fine-tuned. However, this conventional strategy often relies heavily on large-scale training data to mitigate the inherent modality mismatch. In light of this, we explore a more effective way to exploit the PLM embedding dictionary. Drawing inspiration from Time-LLM, a recent time-series forecasting model, we propose a cross-attention reprogramming mechanism that incorporates codebook information from the DSU quantizer to better align the DSUs with the PLM embeddings. Compared to direct fine-tuning of PLM embeddings, our method consistently achieves improvements on the Discrete Audio and Speech Benchmark (DASB), reaching state-of-the-art performance across most DASB-style settings. We also evaluate our method on LibriSpeech-960, LibriLight-10, and Swedish, Czech, and Hungarian data from Common Voice, and observe similar trends. Notably, the proposed reprogramming method demonstrates significant gains over the fine-tuning baseline, particularly in cross-lingual and low-resource scenarios. This study proposes a new approach to using PLM embedding dictionaries in DSU-based ASR, and lays a foundation for combining speech representations with large language models in other discriminative tasks of speech processing such as speech emotion recognition and spoken question answering.
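A bare-bones sketch of cross-attention over a PLM embedding dictionary: DSU-side features act as queries, and the dictionary supplies keys and values, yielding PLM-aligned representations. The actual method additionally incorporates the DSU quantizer's codebook and learned projections, which are omitted here:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reprogram(dsu_features, plm_dictionary):
    """Single-head cross-attention: each DSU feature (query) attends
    over the PLM embedding dictionary (keys/values), producing a
    convex mixture of PLM embeddings as its reprogrammed form.
    dsu_features: (T, d); plm_dictionary: (V, d); returns (T, d)."""
    d = dsu_features.shape[-1]
    attn = softmax(dsu_features @ plm_dictionary.T / np.sqrt(d))
    return attn @ plm_dictionary
```

Because each output row is a convex combination of dictionary rows, the reprogrammed features live in the span of the PLM's own embedding space, which is the alignment the method exploits.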
Can We Formalise Type Theory Intrinsically without Any Compromise? A Case Study in Cubical Agda
Proceedings of the 15th ACM SIGPLAN International Conference on Certified Programs and Proofs (CPP '26), January 2026
Liang-Ting Chen, Fredrik Nordvall Forsberg, Tzu-Chun Tsai
Abstract
We present an intrinsic representation of type theory in the proof assistant Cubical Agda, inspired by Awodey’s natural models of type theory. The initial natural model is defined as quotient inductive-inductive-recursive types, leading us to a syntax accepted by Cubical Agda without using any transports, postulates, or custom rewrite rules. We formalise some meta-properties such as the standard model, normalisation by evaluation for typed terms, and strictification constructions. Since our formalisation is carried out using Cubical Agda's native support for quotient inductive types, all our constructions compute at a reasonable speed. When we try to develop more sophisticated metatheory, however, the 'transport hell' problem reappears. Ultimately, it remains a considerable struggle to develop the metatheory of type theory using an intrinsic representation that lacks strict equations. The effort required is about the same whether or not the notion of natural model is used.
Efficient Column-Wise N:M Pruning on RISC-V CPU
Journal of Systems Architecture (JSA), March 2026
Chi-Wei Chu, Ding-Yong Hong, Jan-Jan Wu
Abstract
In deep learning frameworks, weight pruning is a widely used technique for improving computational efficiency by reducing the size of large models. This is especially critical for convolutional operators, which often act as performance bottlenecks in convolutional neural networks (CNNs). However, the effectiveness of pruning heavily depends on how it is implemented, as different methods can significantly impact both computational performance and memory footprint. In this work, we propose a column-wise N:M pruning strategy applied at the tile level and modify XNNPACK to enable efficient execution of pruned models on the RISC-V vector architecture. Additionally, we propose fusing the operations of im2col and data packing to minimize redundant memory accesses and memory overhead. To further optimize performance, we incorporate AITemplate’s profiling technique to identify the optimal implementation for each convolutional operator. Our proposed approach effectively increases ResNet inference throughput by as much as 4×, and preserves ImageNet top-1 accuracy within 2.1% of the dense baseline.
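Column-wise N:M pruning keeps the n largest-magnitude weights in every group of m consecutive entries along each column of the weight matrix. A reference sketch of the pruning pattern itself (the tile-level application, im2col fusion, and XNNPACK/RISC-V execution are beyond this snippet):

```python
import numpy as np

def prune_nm_columnwise(W, n=2, m=4):
    """Apply an N:M sparsity pattern down each column: within every
    group of m consecutive entries, zero the (m - n) weights of
    smallest magnitude and keep the n largest."""
    rows, cols = W.shape
    assert rows % m == 0, "column length must be a multiple of m"
    Wp = W.copy()
    for c in range(cols):
        for r in range(0, rows, m):
            group = np.abs(Wp[r:r + m, c])
            drop = np.argsort(group)[: m - n]  # smallest magnitudes
            Wp[r + drop, c] = 0.0
    return Wp
```

The resulting regular sparsity (exactly n nonzeros per m-element group) is what lets a vector unit skip zeros with fixed-stride gathers instead of irregular indexing.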
Complete end-to-end learning from protein feature representation to protein interactome inference
GigaScience, November 2025
Yu-Hsin Chen, Chien-Fu Liu, Jun-Yi Leu*, and Huai-Kuang Tsai*
Abstract
Co-fractionation coupled with mass spectrometry (CF-MS) is a powerful strategy for mapping protein-protein interactions (PPIs) under near-physiological conditions. Despite recent progress, existing analysis pipelines remain constrained by reliance on handcrafted features, sensitivity to experimental noise, and an inherent focus on pairwise interactions, which limit their scalability and generalizability. To address these difficulties, we introduce FREEPII (Feature Representation Enhancement End-to-End Protein Interaction Inference), a unified deep learning framework that integrates CF-MS data with sequence-derived features to learn biologically meaningful protein-level representations for accurate and efficient inference of PPIs and protein complexes. FREEPII employs a convolutional neural network (CNN) architecture to learn protein-level representations directly from raw data, enabling feature sharing across interaction pairs and reducing computational complexity. To enhance robustness against CF-MS noise, protein sequences are introduced as auxiliary input to enrich the feature space with complementary biological cues. The supervised protein embeddings further encode network-level context derived from complex annotations, allowing the model to capture higher-order interactions and enhance the expressive power of protein representations. Extensive benchmarking demonstrates that FREEPII consistently outperforms state-of-the-art CF-MS analysis tools, capturing more biologically coherent and discriminative protein features. Cross-dataset evaluations further reveal that integrating multi-modal data from diverse experimental contexts substantially improves the generalization and sensitivity of data-driven models, offering a scalable, cross-species strategy for reliable protein interaction inference.
GreedyPixel: Fine-Grained Black-Box Adversarial Attack Via Greedy Algorithm
IEEE Transactions on Information Forensics and Security, November 2025
Hanrui Wang, Ching-Chun Chang, Chun-Shien Lu, Christopher Leckie, and Isao Echizen
Abstract
Deep neural networks are highly vulnerable to adversarial examples, which are inputs with small, carefully crafted perturbations that cause misclassification, making adversarial attacks a critical tool for evaluating robustness. Existing black-box methods typically entail a trade-off between precision and flexibility: pixel-sparse attacks (e.g., single- or few-pixel attacks) provide fine-grained control but lack adaptability, whereas patch- or frequency-based attacks improve efficiency or transferability, but at the cost of producing larger and less precise perturbations. We present GreedyPixel, a fine-grained black-box attack method that performs brute-force-style, per-pixel greedy optimization guided by a surrogate-derived priority map and refined by means of query feedback. It evaluates each coordinate directly without any gradient information, guaranteeing monotonic loss reduction and convergence to a coordinate-wise optimum, while also yielding near white-box-level precision, pixel-wise sparsity, and perceptual quality. On the CIFAR-10 and ImageNet datasets, spanning convolutional neural networks (CNNs) and Transformer models, GreedyPixel achieved state-of-the-art success rates with visually imperceptible perturbations, effectively bridging the gap between black-box practicality and white-box performance. The implementation is available at https://github.com/azrealwang/greedypixel.
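The per-pixel greedy loop can be sketched as follows, assuming a scalar attack loss queried in black-box fashion; the surrogate priority-map construction and the refinement details of GreedyPixel are simplified away here:

```python
import numpy as np

def greedy_pixel_attack(x, loss, priority, budget=100, eps=8 / 255):
    """Greedy per-pixel search (simplified sketch): visit pixels in
    descending priority order; for each, try +eps and -eps and keep a
    change only if it increases the attack loss (query feedback).
    The accepted loss never decreases, so the search converges to a
    coordinate-wise optimum."""
    x_adv = x.copy()
    best = loss(x_adv)
    order = np.argsort(priority.ravel())[::-1][:budget]  # high priority first
    for idx in order:
        pos = np.unravel_index(idx, x.shape)
        for delta in (eps, -eps):
            trial = x_adv.copy()
            trial[pos] = np.clip(x[pos] + delta, 0.0, 1.0)
            l = loss(trial)
            if l > best:          # monotonic improvement only
                best, x_adv = l, trial
                break
    return x_adv
```

Each pixel costs at most two queries, and rejected perturbations are discarded, so the perturbation stays sparse wherever the loss is insensitive.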
Chromosome-Level Genome Assembly and Annotation of the Japanese Cutlassfish (Trichiurus japonicus): A High-Quality Genomic Resource Featuring Nuclear and Mitochondrial Completeness for Future Studies
Scientific Data, November 2025
Po-Cheng Hsu, Chung-Yen Lin, Ping-Heng Hsieh, Wei-Hsuan Chuang, Mei-Yeh Lu, Chaolun Allen Chen, Shu-Hwa Chen
Abstract
The Japanese cutlassfish (Trichiurus japonicus) is a commercially important marine species across Asia. Here, we present a high-quality, chromosome-level genome assembly generated using PacBio HiFi, Hi-C, and Nanopore ONT reads. The nuclear genome comprised 24 chromosomes with 160 scaffolds totaling 1,138 Mb, with a scaffold N50 of 47.10 Mb and an average scaffold length of 6.18 Mb. A complete mitochondrial genome of 16,796 bp was also assembled, comprising 13 protein-coding and 23 non-coding RNA (ncRNA) genes, with 99.32% sequence identity to the reference in the NCBI database. The nuclear genome encodes 26,541 protein-coding genes (median length: 7,391 base pairs) and 16,383 non-coding RNA (ncRNA) genes. The ncRNA genes account for approximately 0.1694% of the genome's total length. BUSCO analysis indicated 99.4% and 99.2% completeness against the Actinopterygii ortholog set for the genome and proteome. Functional annotation covered 98.15% of genes. Recognized repeat elements and ncRNA regions accounted for 61.10% of the nuclear genome. With high mapping rates from external datasets, this assembly offers a valuable foundation for future sequencing-based studies.