Institute of Information Science, Academia Sinica

Research

Recent Research Results

Secretome from estrogen‑responding human placenta‑derived mesenchymal stem cells rescues ovarian function and circadian rhythm in mice with cyclophosphamide‑induced primary ovarian insufficiency

Journal of Biomedical Science, October 2024

Duy-Cuong Le, Mai-Huong T Ngo, Yung-Che Kuo, Shu-Hwa Chen, Chung-Yen Lin, Thai-Yen Ling, Quoc Thao Trang Pham, Heng-Kien Au, Jihwan Myung, Yen-Hua Huang

Chung-Yen Lin

Abstract

Background

Primary ovarian insufficiency (POI) is an early decline in ovarian function that leads to ovarian failure. Conventional treatments for POI are inadequate, and treatments based on mesenchymal stem cells (MSCs) have emerged as an option. However, failure to account for the estrogen niche in ovarian tissue significantly reduces therapeutic efficacy, and the mechanism by which MSCs act in POI treatment remains unclear. Furthermore, the disruption of circadian rhythm associated with POI has not been previously addressed.


Methods

Conditioned medium (CM) and estradiol-conditioned medium (E2-CM) were generated from estrogen receptor positive MSCs (ER+pcMSCs). Chemotherapy-induced POI models were established using C57BL/6 mice (in vivo) and KGN cells (in vitro) treated with cyclophosphamide (CTX) or 4-hydroperoxycyclophosphamide (4-OOH-CP). Gene/protein expressions were detected using RT-qPCR, Western blotting, and immunohistochemistry assays. Locomotor activity was monitored for behavioral circadian rhythmicity. Cytokine arrays and miRNA analysis were conducted to analyze potential factors within CM/E2-CM.


Results

The secretome of ER+pcMSCs (CM and E2-CM) significantly reduced the CTX-induced defects in ovarian folliculogenesis and circadian rhythm. CM/E2-CM also reduced granulosa cell apoptosis and rescued angiogenesis in POI ovarian tissues. E2-CM had a more favorable effect than the CM. Notably, ER+pcMSC secretome restored CTX-induced circadian rhythm defects, including the gene expressions associated with the ovarian circadian clock (e.g., Rora, E4bp4, Rev-erbα, Per2 and Dbp) and locomotor activity. Additionally, the cytokine array analysis revealed a significant increase in cytokines and growth factors associated with immunomodulation and angiogenesis, including angiogenin. Neutralizing the angiogenin in CM/E2-CM significantly reduced its ability to promote HUVEC tube formation in vitro. Exosomal miRNA analysis revealed the miRNAs involved in targeting the genes associated with POI rescue (PTEN and PDCD4), apoptosis (caspase-3, BIM), estrogen synthesis (CYP19A1), ovarian clock regulation (E4BP4, REV-ERBα) and fibrosis (COL1A1).


Conclusion

This study is the first to demonstrate that, in considering the estrogen niche in ovarian tissue, an estrogen-priming ER+pcMSC secretome achieved ovarian regeneration and restored the circadian rhythm in a CTX-induced POI mouse model. The potential factors involved include angiogenin and exosomal miRNAs in the ER+pcMSC secretome. These findings offer insights into potential stem cell therapies for chemotherapy-induced POI and circadian rhythm disruption.

Predicting splicing patterns from the transcription factor binding sites in the promoter with deep learning

BMC Genomics, September 2024

Lin, C.H., Tsai, C.H., Shiau, C.K., Huang, J.H. and Tsai, H.K.*

H.K.

Abstract

Background: Alternative splicing is a pivotal mechanism of post-transcriptional modification that contributes to the transcriptome plasticity and proteome diversity in metazoan cells. Although many splicing regulations around the exon/intron regions are known, the relationship between promoter-bound transcription factors and the downstream alternative splicing largely remains unexplored.

Results: In this study, we present computational approaches to unravel the regulatory relationship between promoter-bound transcription factor binding sites (TFBSs) and the splicing patterns. We curated a fine dataset that includes DNase I hypersensitive site sequencing and transcriptomes across fifteen human tissues from ENCODE. Specifically, we proposed different representations of TF binding context and splicing patterns to examine the associations between the promoter and downstream splicing events. While machine learning models demonstrated potential in predicting splicing patterns based on TFBS occupancies, careful examination using different cross-validation methods revealed limitations in generalizing predictions of the splicing forms of singleton genes across diverse tissues. We further investigated the association between alterations in individual TFBSs at promoters and shifts in exon splicing efficiency. Our results demonstrate that the convolutional neural network (CNN) models, trained on TF binding changes in the promoters, can predict the changes in splicing patterns. Furthermore, a systematic in silico substitution analysis on the CNN models highlighted several potential splicing regulators. Notably, through empirical validation using K562 CTCFL shRNA knock-down data, we showed the significant role of CTCFL in splicing regulation.

Conclusion: Our findings highlight the potential role of promoter-bound TFBSs in influencing the regulation of downstream splicing patterns and provide insights for discovering alternative splicing regulations.

Speculative Monte-Carlo Tree Search

Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024

Scott Cheng, Mahmut Kandemir, Ding-Yong Hong

Scott Cheng Ding-Yong Hong

Abstract

Monte-Carlo tree search (MCTS) is an influential sequential decision-making algorithm notably employed in AlphaZero. Despite its success, the primary challenge in AlphaZero training lies in its prolonged time-to-solution due to the high latency imposed by the sequential MCTS process. To address this challenge, this paper proposes and evaluates an inter-decision parallelization strategy called speculative MCTS, a new type of parallelism in AlphaZero which implements speculative execution. This approach allows for the parallel execution of future moves before the current MCTS computations are completed, thus reducing the latency. Additionally, we analyze factors contributing to the overall speedup by studying the synergistic effects of speculation and neural network caching in MCTS. We also provide an analytical model that can be used to evaluate the potential of different speculation strategies before they are implemented and deployed. Our empirical findings indicate that the proposed speculative MCTS can reduce training latency by 5.81x in 9x9 Go games. Moreover, our study shows that speculative execution can enhance the NN cache hit rate by 26% during midgame. Overall, our end-to-end evaluation indicates 1.91x speedup in 19x19 Go training time, compared to the state-of-the-art KataGo program.

Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation

IEEE Workshop on Spoken Language Technology (SLT2024), December 2024

Chien-Chun Wang, Li-Wei Chen, Hung-Shin Lee, Berlin Chen, and Hsin-Min Wang

Hsin-Min Wang

Abstract

Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain, leading to a mismatch between training and test conditions. This study puts forward a novel data simulation method to address this issue, leveraging noise-extractive techniques and generative adversarial networks (GANs) with only limited target noisy speech data. Notably, our method employs a noise encoder to extract noise embeddings from target-domain data. These embeddings aptly guide the generator to synthesize utterances acoustically fitted to the target domain while authentically preserving the phonetic content of the input clean speech. Furthermore, we introduce the notion of dynamic stochastic perturbation, which can inject controlled perturbations into the noise embeddings during inference, thereby enabling the model to generalize well to unseen noise conditions. Experiments on the VoiceBank-DEMAND benchmark dataset demonstrate that our domain-adaptive SE method outperforms an existing strong baseline based on data simulation.

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction

IEEE Workshop on Spoken Language Technology (SLT2024), December 2024

Wen-Chin Huang, Szu-Wei Fu, Erica Cooper, Ryandhimas E. Zezario, Tomoki Toda, Hsin-Min Wang, Junichi Yamagishi, and Yu Tsao

Hsin-Min Wang

Abstract

We present the third edition of the VoiceMOS Challenge, a scientific initiative designed to advance research into automatic prediction of human speech ratings. There were three tracks. The first track was on predicting the quality of "zoomed-in" high-quality samples from speech synthesis systems. The second track was to predict ratings of samples from singing voice synthesis and voice conversion with a large variety of systems, listeners, and languages. The third track was semi-supervised quality prediction for noisy, clean, and enhanced speech, where a very small amount of labeled training data was provided. Among the eight teams from both academia and industry, we found that many were able to outperform the baseline systems. Successful techniques included retrieval-based methods and the use of non-self-supervised representations like spectrograms and pitch histograms. These results showed that the challenge has advanced the field of subjective speech rating prediction.

Interaction of the Gut Microbiota and Brain Functional Connectivity in Late Life Depression

Journal of Psychiatry and Neuroscience, September 2024

Chia-Fen Tsai, Chia-Hsien Chuang, Pei-Chi Tu, Wan-Chen Chang, Yen-Po Wang, Pei-Yi Liu, Po-Shan Wu, Chung-Yen Lin, Ching-Liang Lu

Chia-Hsien Chuang Chung-Yen Lin

Abstract

Background: Increasing evidence suggests an important role of the gut microbiome in the pathogenesis of mental disorders, including depression, along the microbiota-gut-brain axis. The interactions between gut microbe composition and neural circuits in late-life depression (LLD) remain to be elucidated.

Methods: We performed fecal 16S rRNA sequencing and resting-state functional magnetic resonance imaging in a case-control cohort of 32 older adults with LLD, defined as major depressive disorder (MDD), and 16 healthy controls (HCs) to characterize the association of gut microbiota and brain functional connectivity (FC). The Hamilton Depression Rating Scale (HAMD) was used to assess depressive symptoms.

Results: At the genus level, the relative abundances of Enterobacter, Akkermansiaceae, Haemophilus, Burkholderia, and Rothia were significantly higher in depressive patients than in HCs. Reduced FC within mood regulation circuits was mainly found in the frontal cortex (such as the right superior and inferior frontal gyrus, right lateral occipital cortex, left middle frontal gyrus, and left caudate) in the depressive patients compared with the HCs. The group-characterized gut microbes in HCs and LLD patients showed opposite correlations with seed-based FC, which may account for the aberrant emotion regulation in depressive patients. The abundance of Enterobacter (dominant genus in LLD) was positively correlated with both HAMD scores and group-characterized FC, while Odoribacter (dominant genus in HC) was negatively correlated with both HAMD scores and group-characterized FC.

Conclusion: Significant correlations were identified between depression-characterized gut microbes and brain FC and depression severity, which may contribute to the pathophysiology of depression development in LLD patients.

Test-Time Stain Adaptation with Diffusion Models for Histopathology Image Classification

European Conference on Computer Vision (ECCV), September 2024

Cheng-Chang Tsai, Yuan-Chih Chen, and Chun-Shien Lu

Chun-Shien Lu Cheng-Chang Tsai

Abstract

Stain shifts are prevalent in histopathology images and are typically dealt with by normalization or augmentation. Because training-time methods are limited in dealing with unseen stains, we propose a test-time stain adaptation method (TT-SaD) with diffusion models that achieves stain adaptation by solving a nonlinear inverse problem during testing. TT-SaD is promising in that it needs only a single domain for training but can adapt well to data from other domains during testing, sparing models from retraining whenever new data become available. For tumor classification, stain adaptation by TT-SaD outperforms state-of-the-art diffusion model-based test-time methods. Moreover, TT-SaD beats training-time methods when testing on data that are inaccessible during training. To our knowledge, stain adaptation with diffusion models at test time is relatively unexplored.

Transcriptomics and gut microbiome analysis of the edible herb Bidens pilosa as a functional feed additive to promote growth and metabolism in tilapia (Oreochromis spp.)

BMC Genomics, August 2024

Che-Chun Chen, Chung-Yen Lin, Hsin-Yun Lu, Chyng-Hwa Liou, Ying-Ning Ho, Chang-Wen Huang, Zhong-Fu Zhang, Chih-Hsin Kao, Wen-Chin Yang, Hong-Yi Gong

Chung-Yen Lin Chih-Hsin Kao

Abstract

Background

To reduce the use of antibiotics and chemicals in aquaculture, an edible herb, B. pilosa, has been selected as a multifunctional feed additive. Although there has been considerable research into the effects of B. pilosa on poultry, the wider effects, particularly on the growth and gut microbiota in fish, remain largely unexplored. We aim to investigate the interactions between host growth and the gut microbiota in B. pilosa-fed tilapia using transcriptomics and gut microbiome analysis.

Results

In this study, we added 0.5% and 1% B. pilosa to the diet and observed that the growth performance of tilapia was significantly increased after 8 weeks of feeding. Comparative transcriptome analysis was performed on RNA sequence profiles obtained from liver and muscle tissues. Functional enrichment analysis showed that B. pilosa regulates several pathways and genes including amino acid metabolism, lipid metabolism, carbohydrate metabolism, endocrine system, signal transduction and metabolism of other amino acids. The expression of selected growth-associated genes was validated by qRT-PCR. The qRT-PCR result indicated that B. pilosa may enhance growth performance by activating the expression of liver igf1 and muscle igf1rb genes and inhibiting the expression of the muscle negative regulator myostatin b (mstnb). Enhancement of endocrine Igf1/Igf1rb signaling and suppression of Mstn signaling both induced the expression of myogenic regulatory factors (MRFs), myod1, myogenin and mrf4, to promote muscle growth in tilapia. The predicted function of the gut microbiota showed several significantly different pathways that overlapped with the KEGG enrichment results of differentially expressed genes in the liver transcriptomes. This suggests that gut microbiota may be able to influence liver metabolism through the gut-liver axis in B. pilosa-fed tilapia.

Conclusions

In conclusion, dietary B. pilosa can regulate endocrine igf1 signaling and myostatin signaling to activate expression of MRFs to promote muscle growth, and alter the composition of gut bacteria, which can then affect liver amino acid metabolism, carbohydrate metabolism, the endocrine system, lipid metabolism, metabolism of other amino acids, and signal transduction of the host, ultimately enhancing growth performance. Our results suggest that B. pilosa has the potential to be a functional feed additive that reduces the use of antibiotics as growth promoters in aquaculture organisms.

Learnable Layer Selection and Model Fusion for Speech Self-Supervised Learning Models

Interspeech2024, September 2024

Sheng-Chieh Chiu, Chia-Hua Wu, Jih-Kang Hsieh, Yu Tsao, and Hsin-Min Wang

Yu Tsao Hsin-Min Wang

Abstract

In this paper, we investigate methods for fusing feature representations derived from multiple speech self-supervised learning (SSL) models, along with techniques to determine the optimal layer within each model. We evaluate five fusing strategies, finding that temporal interleaved concatenation is the most robust and effective for the SUPERB ASR task. Additionally, we demonstrate that Gumbel layer selection can automatically select the most appropriate SSL layer with better performance than the commonly used weighted sum method. Furthermore, dimension-wise Gumbel layer selection shows promise in adaptive combination of layers of a single SSL model. Finally, we show that joint SSL model fusion and dimension-wise Gumbel layer selection further enhances effectiveness.
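The contrast between the weighted-sum baseline and Gumbel-based layer selection can be illustrated with a small sketch. It uses the generic Gumbel-softmax trick on per-layer logits and is not taken from the paper: the function names, the toy features, and the temperature value are all assumptions, and the dimension-wise variant is not shown.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(layers: Sequence[Sequence[float]],
                 logits: Sequence[float]) -> List[float]:
    """Baseline: blend all layers with softmax-normalized weights."""
    weights = softmax(logits)
    dim = len(layers[0])
    return [sum(w * layer[d] for w, layer in zip(weights, layers))
            for d in range(dim)]


def gumbel_softmax(logits: Sequence[float], temperature: float,
                   rng: random.Random) -> List[float]:
    """Perturb logits with Gumbel(0, 1) noise, then apply a tempered softmax."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    return softmax(noisy)


def gumbel_select(layers: Sequence[Sequence[float]],
                  logits: Sequence[float],
                  temperature: float = 1.0,
                  rng: Optional[random.Random] = None) -> Tuple[int, List[float]]:
    """Hard selection: return the index and features of the sampled layer."""
    rng = rng or random.Random()
    probs = gumbel_softmax(logits, temperature, rng)
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, list(layers[index])


# Toy features for three hypothetical SSL layers (two dimensions each).
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
blended = weighted_sum(layers, [0.0, 0.0, 0.0])
chosen_index, chosen = gumbel_select(layers, [0.0, 0.0, 50.0],
                                     rng=random.Random(0))
```

In a trainable model the logits would be learned parameters and the hard choice would typically be paired with a straight-through gradient estimator; this sketch covers only the forward sampling step.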

Learning Diffusion Models for Multi-View Anomaly Detection

18th European Conference on Computer Vision, September 2024

Chieh Liu, Yu-Min Chu, Ting-I Hsieh, Hwann-Tzong Chen and Tyng-Luh Liu

Tyng-Luh Liu

Abstract

We are exploring an emerging formulation in anomaly detection (AD) where multiple instances of the same object are produced simultaneously and distinctly to address the limitation that using only a single instance may not effectively capture any underlying defects. More specifically, we concentrate on a specific scenario where each object of interest is linked to seven distinct data views/representations. The first six views involve capturing images with a stationary camera under six different lighting conditions, while the seventh view pertains to the 3D normal information. We refer to our intended task as multi-view anomaly detection. To tackle this problem, our approach involves training a view-invariant ControlNet that can produce consistent feature maps regardless of the data views. This training strategy enables us to mitigate the impact of varying lighting conditions and to fuse information from both the RGB color appearance and the 3D normal geometry effectively. Moreover, as the diffusion process is not deterministic, we utilize the DDIM scheme to improve the applicability of our established memory banks of diffusion-based features for anomaly detection inference. To demonstrate the efficacy of our approach, we present extensive ablation studies and state-of-the-art experimental results on the Eyecandies dataset.

Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata

Interspeech2024, September 2024

Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, and Yu Tsao

Yu Tsao Hsin-Min Wang

Abstract

Automated speech intelligibility assessment is pivotal for hearing aid (HA) development. In this paper, we present three novel methods to improve intelligibility prediction accuracy and introduce MBI-Net+, an enhanced version of MBI-Net, the top-performing system in the 1st Clarity Prediction Challenge. MBI-Net+ leverages Whisper’s embeddings to create cross-domain acoustic features and includes metadata from speech signals by using a classifier that distinguishes different enhancement methods. Furthermore, MBI-Net+ integrates the hearing-aid speech perception index (HASPI) as a supplementary metric into the objective function to further boost prediction performance. Experimental results demonstrate that MBI-Net+ surpasses several intrusive baseline systems and MBI-Net on the Clarity Prediction Challenge 2023 dataset, validating the effectiveness of incorporating Whisper embeddings, speech metadata, and related complementary metrics to improve prediction performance for HA.

Pseudo-Embedding for Generalized Few-Shot 3D Segmentation

18th European Conference on Computer Vision, September 2024

Chih-Jung Tsai, Hwann-Tzong Chen and Tyng-Luh Liu

Tyng-Luh Liu

Abstract

Existing generalized few-shot 3D segmentation (GFS3DS) methods typically prioritize enhancing the training of base-class prototypes while neglecting the rich semantic information within background regions for future novel classes. We introduce a novel GFS3DS learner that strategically leverages background context to improve both base prototype training and few-shot adaptability. Our method employs foundation models to extract semantic features from background points and grounds on text embeddings to cluster background points into pseudo-classes. This approach facilitates clearer base/novel class differentiation and generates pseudo prototypes that effectively mimic novel support samples. Comprehensive experiments on S3DIS and ScanNet datasets demonstrate the state-of-the-art performance of our method in both 1-shot and 5-shot tasks. Our approach significantly advances GFS3DS by unlocking the potential of background context, offering a promising avenue for broader applications.

SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models

Interspeech2024, September 2024

Chun Yin, Tai-Shih Chi, Yu Tsao, and Hsin-Min Wang

Yu Tsao Hsin-Min Wang

Abstract

Representations from pre-trained speech foundation models (SFMs) have shown impressive performance in many downstream tasks. However, the potential benefits of incorporating pre-trained SFM representations into speaker voice similarity assessment have not been thoroughly investigated. In this paper, we propose SVSNet+, a model that integrates pre-trained SFM representations to improve performance in assessing speaker voice similarity. Experimental results on the Voice Conversion Challenge 2018 and 2020 datasets show that SVSNet+ incorporating WavLM representations shows significant improvements compared to baseline models. In addition, while fine-tuning WavLM with a small dataset of the downstream task does not improve performance, using the same dataset to learn a weighted-sum representation of WavLM can substantially improve performance. Furthermore, when WavLM is replaced by other SFMs, SVSNet+ still outperforms the baseline models and exhibits strong generalization ability.

Effective Compression of Language Models by Combining Pruning and Knowledge Distillation

IEEE International Conference on Computers, Software, and Applications (COMPSAC), July 2024

Chi-Yu Chiu, Ding-Yong Hong, Pangfeng Liu and Jan-Jan Wu

Ding-Yong Hong Jan-Jan Wu

Abstract

Weight pruning is a prominent model compression technique that removes some weights in a model. However, pruning Transformer models faces a challenge. After pruning, Transformer models require repeating the whole training process, including pre-training on a large generalized data set and fine-tuning on a small downstream data set, to recover the accuracy. The whole training process takes a long time and consumes substantial computational resources. To address the challenge, we propose a pruning method combined with knowledge distillation to avoid a long re-training time while recovering the accuracy. We use 2:4 pruning as our basic pruning method. 2:4 pruning is a method proposed by NVIDIA that keeps the two largest absolute values in every four consecutive elements in every row in a weight matrix. We generalize 2:4 pruning to N:M pruning, which refers to keeping the N largest absolute values in every M consecutive elements in every row in a weight matrix. Knowledge distillation is another model compression method that makes a small model, which is referred to as a student, learn from a large model, which is referred to as a teacher. With our method, we use N:M pruning to uniformly prune the model into an N:M structure. Next, we use two-stage fine-tuning on the downstream dataset with knowledge distillation. By using our method, the pruned models can achieve comparable accuracy by using only downstream datasets and take much less time than traditional retraining. We run our experiments on SQuAD and GLUE datasets using DistilBERT. The experimental results show that DistilBERT in a 1:4 structure can achieve comparable accuracy on the SQuAD v1.1 and SQuAD v2.0 datasets and 1.7x speedup on inference compared to the original dense model.
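The N:M rule described in this abstract, keeping the N largest absolute values in every M consecutive elements of each row, can be sketched in a few lines. This is an illustrative stand-alone version on plain Python lists, not the authors' implementation; the function names and the toy weight row are assumptions, and the knowledge-distillation fine-tuning stages are not shown.

```python
from typing import List


def nm_prune_row(row: List[float], n: int = 2, m: int = 4) -> List[float]:
    """Zero all but the n largest-magnitude values in each group of m."""
    if not 0 <= n <= m:
        raise ValueError("n must satisfy 0 <= n <= m")
    if len(row) % m != 0:
        raise ValueError("row length must be divisible by m")
    pruned: List[float] = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Positions of the n largest absolute values within this group.
        ranked = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(ranked[:n])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def nm_prune(matrix: List[List[float]], n: int = 2, m: int = 4) -> List[List[float]]:
    """Apply N:M pruning independently to every row of a weight matrix."""
    return [nm_prune_row(row, n, m) for row in matrix]


# Hypothetical 1x8 weight matrix, pruned to 2:4 and to 1:4 structure.
weights = [[0.1, -0.9, 0.4, 0.05, 2.0, -0.3, 0.2, 1.5]]
pruned_2_4 = nm_prune(weights, n=2, m=4)
pruned_1_4 = nm_prune(weights, n=1, m=4)
```

Each group of four keeps exactly N nonzero entries, so a 2:4 structure is 50% sparse and a 1:4 structure is 75% sparse regardless of how the magnitudes are distributed across the row.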

Automatic Construction of a Chinese Review Dataset for Aspect Sentiment Triplet Extraction via Iterative Weak Supervision

LREC-COLING 2024, May 2024

Chia-Wen Lu, Ching-Wen Yang, Wei-Yun Ma

Chia-Wen Lu Wei-Yun Ma

Abstract

Aspect Sentiment Triplet Extraction (ASTE), introduced in 2020, is a task that involves the extraction of three key elements: target aspects, descriptive opinion spans, and their corresponding sentiment polarity. This process, however, faces a significant hurdle, particularly when applied to Chinese languages, due to the lack of sufficient datasets for model training, largely attributable to the arduous manual labeling process. To address this issue, we present an innovative framework that facilitates the automatic construction of ASTE datasets via Iterative Weak Supervision, negating the need for manual labeling, aided by a discriminator to weed out subpar samples. The objective is to successively improve the quality of this raw data and generate supplementary data. The effectiveness of our approach is underscored by our results, which include the creation of a substantial Chinese review dataset. This dataset encompasses over 60,000 Google restaurant reviews in Chinese and features more than 200,000 extracted triplets. Moreover, we have also established a robust baseline model by leveraging a novel method of weak supervision. Both our dataset and model are openly accessible to the public.

Decoding the genome of bloodsucking midge Forcipomyia taiwana (diptera: Ceratopogonidae): Insights into odorant receptor expansion

Insect Biochemistry and Molecular Biology, May 2024

Ming-Der Lin, Chia-Hsien Chuang, Chih-Hsin Kao, Shu-Hwa Chen, Szu-Chieh Wang, Ping-Heng Hsieh, Guan-Yu Chen, Chun-Chia Mao, Jeng-Yi Li, Mei-Yeh Jade Lu, Chung-Yen Lin

Ming-Der Lin Chia-Hsien Chuang Shu-Hwa Chen Szu-Chieh Wang Ping-Heng Hsieh Chun-Chia Mao Chung-Yen Lin

Abstract

Biting midges, notably those within the Ceratopogonidae family, have long been recognized for their epidemiological significance, both as nuisances and vectors for disease transmission in vertebrates. Despite their impact, genomic insights into these insects, particularly beyond the Culicoides genus, remain limited. In this study, we assembled the Forcipomyia taiwana (Shiraki) genome, comprising 113 scaffolds covering 130.4 Mbps—with the longest scaffold reaching 7.6 Mbps and an N50 value of 2.6 Mbps—marking a pivotal advancement in understanding the genetic architecture of ceratopogonid biting midges. Phylogenomic analyses reveal a shared ancestry between F. taiwana and Culicoides sonorensis Wirth & Jones, dating back approximately 124 million years, and highlight a dynamic history of gene family expansions and contractions within the Ceratopogonidae family. Notably, a substantial expansion of the odorant receptor (OR) gene family was observed, which is crucial for the chemosensory capabilities that govern biting midges' interactions with their environment, including host seeking and oviposition behaviors. The distribution of OR genes across the F. taiwana genome displays notable clusters on scaffolds, indicating localized tandem gene duplication events. Additionally, several collinear regions were identified, hinting at segmental duplications, inversions, and translocations, contributing to the olfactory system's evolutionary complexity. Among the 156 ORs identified in F. taiwana, 134 are biting midge-specific ORs, distributed across three distinct clades, each exhibiting unique motif features that distinguish them from the others. Through weighted gene co-expression network analysis, we correlated distinct gene modules with sex and reproductive status, laying the groundwork for future investigations into the interplay between gene expression and adaptive behaviors in F. taiwana. 
In conclusion, our study not only highlights the unique olfactory repertoire of ceratopogonid biting midges but also sets the stage for future studies into the genetic underpinnings of their unique biological traits and ecological strategies.
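For readers unfamiliar with the N50 statistic quoted for the assembly, the following sketch shows how it is conventionally computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly size. The scaffold lengths below are made-up toy values, not data from the F. taiwana assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first scaffold that brings the running sum to >= 50%.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy assembly: total length 20, so the half-way point is 10.
toy_scaffolds = [8, 5, 4, 2, 1]
toy_n50 = n50(toy_scaffolds)  # 8 + 5 = 13 >= 10, so N50 is 5
```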