Institute of Information Science, Academia Sinica

Research Overview


Recent Research Results

Parallel Asynchronous Stochastic Dual Coordinate Descent Algorithms for Efficiency and Convergence

29th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2021), March 2021

Yung-Chen Chen, Pangfeng Liu, Jan-Jan Wu

Jan-Jan Wu

Parallel asynchronous stochastic dual coordinate descent algorithm (PASSCoDe) is an efficient method to train linear models in multi-core shared memory systems.
{\\tt PASSCoDe} enjoys a good speedup when the number of threads is less than 8 on sparse datasets, i.e., the percentage of nonzero elements in the training data is relatively small.
However, due to the memory conflict and delayed parameter access problem in parallel execution, it often diverges or does not converge to the best accuracy as a serial dual coordinate descent algorithm does.
In this paper, we proposed two algorithms -- {\\em Adaptive Hybrid} algorithm and {\\em Lazy-Sync} algorithm, to overcome the convergence issues in parallel execution.
Experiment results indicate that both algorithms converge to the {\\em same} high accuracy as a sequential program does on {\\em all}  datasets we tested, except on one extremely small dataset.
On the other hand, PASSCoDe sometimes converges to a less accurate value, or does not converge at all on some datasets.
Our methods also outperform PASSCoDe-Fix, an improved version of PASSCoDe, in stable convergence, execution speed, and scalability.

PASSLEAF: A Pool-bAsed Semi-Supervised LEArning Framework for Uncertain Knowledge Graph Embedding

The 35th AAAI Conference on Artificial Intelligence (AAAI 2021), February 2021

Zhu-Mu Chen, Mi-Yen Yeh, and Tei-Wei Kuo

Mi-Yen Yeh

In this paper, we study the problem of embedding uncertain knowledge graphs, where each relation between entities is associated with a confidence score. Observing the existing embedding methods may discard the uncertainty information, only incorporate a specific type of score function, or cause many false-negative samples in the training, we propose the PASSLEAF framework to solve the above issues. PASSLEAF consists of two parts, one is a model that can incorporate different types of scoring functions to predict the relation confidence scores and the other is the semi-supervised learning model by exploiting both positive and negative samples associated with the estimated confidence scores. Furthermore, PASSLEAF leverages a sample pool as a relay of generated samples to further augment the semi-supervised learning. Experiment results show that our proposed framework can learn better embedding in terms of having higher accuracy in both the confidence score prediction and tail entity prediction.

Accelerating Continuous Normalizing Flow with Trajectory Polynomial Regularization

The 35th AAAI Conference on Artificial Intelligence (AAAI 2021), February 2021

Han-Hsien Huang and Mi-Yen Yeh

Mi-Yen Yeh

In this paper, we propose an approach to effectively accelerating the computation of continuous normalizing flow (CNF), which has been proven to be a powerful tool for the tasks such as variational inference and density estimation. The training time cost of CNF can be extremely high because the required number of function evaluations (NFE)  for solving corresponding ordinary differential equations (ODE) is very large. We think that the high NFE results from large truncation errors of solving ODEs. To address the problem, we propose to add a regularization. The regularization penalizes the difference between the trajectory of the ODE and its fitted polynomial regression. The trajectory of ODE will approximate a polynomial function, and thus the truncation error will be smaller. Furthermore, we provide two proofs and claim that the additional regularization does not harm training quality. Experimental results show that our proposed method can result in 42.3\%  to  71.3\% reduction of NFE on the task of density estimation, and 19.3\%  to  32.1\% reduction of NFE on  variational auto-encoder, while the testing losses are not affected at all.

Positions, Channels, and Layers: Fully Generalized Non-Local Network for Singer Identification

Thirty-Fifth AAAI Conference on Artificial Intelligence, February 2021

I-Yuan Kuo, Wen-Li Wei, and Jen-Chun Lin

I-Yuan Kuo Jen-Chun Lin

Recently, a non-local (NL) operation has been designed as the central building block for deep-net models to capture long-range dependencies (Wang et al. 2018). Despite its excellent performance, it does not consider the interaction between positions across channels and layers, which is crucial in fine-grained classification tasks. To address the limitation, we target at singer identification (SID) task and present a fully generalized non-local (FGNL) module to help identify finegrained vocals. Specifically, we first propose a FGNL operation, which extends the NL operation to explore the correlations between positions across channels and layers. Secondly, we further apply a depth-wise convolution with Gaussian kernel in the FGNL operation to smooth feature maps for better generalization. More, we modify the squeeze-and-excitation (SE) scheme into the FGNL module to adaptively emphasize correlated feature channels to help uncover relevant feature responses and eventually the target singer. Evaluating results on the benchmark artist20 dataset shows that the FGNL module significantly improves the accuracy of the deep-net models in SID. Codes are available at

Scaled-YOLOv4: Scaling Cross Stage Partial Network

arXiv:2011.08036vl, November 2020

Chien-Yao Wang, Alexey Bochkovskiy, and H. Y. Mark Liao

Chien-Yao Wang Hong-Yuan Mark Liao

We show that the YOLOv4 object detection neural network based on the CSP approach, scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy. We propose a network scaling approach that modifies not only the depth, width, resolution, but also structure of the network. YOLOv4- large model achieves state-of-the-art results: 55.4% AP (73.3% AP50) for the MS COCO dataset at a speed of 15 FPS on Tesla V100, while with the test time augmentation, YOLOv4-large achieves 55.8% AP (73.2 AP50). To the best of our knowledge, this is currently the highest accuracy on the COCO dataset among any published work. The YOLOv4-tiny model achieves 22.0% AP (42.0% AP50) at a speed of 443 FPS on RTX 2080Ti, while by using TensorRT, batch size = 4 and FP16-precision the YOLOv4-tiny achieves 1774 FPS.

A Flexible Template Generation and Matching Method with Applications for Publication Reference Metadata Extraction

Journal of the Association for Information Science and Technology, To Appear

Ting-Hao Yang, Yu-Lun Hsieh, Shih-Hung Liu, Yung-Chun Chang, and Wen-Lian Hsu

Yu-Lun Hsieh Yung-Chun Chang Wen-Lian Hsu

Conventional rule‐based approaches use exact template matching to capture linguistic information and necessarily need to enumerate all variations. We propose a novel flexible template generation and matching scheme called the principle‐based approach (PBA) based on sequence alignment, and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation that can capture prominent patterns using the dominating set algorithm. Second, we devise an alignment‐based template‐matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule‐based approaches. Last, we apply PBA to RME on extensive cross‐domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only deliver consistent performance on various unseen domains, but also surpass hand‐crafted knowledge (templates). We use four independent journal style test sets and one conference style test set in the experiments. When compared to renowned machine learning methods, such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi‐directional long short‐term memory with a CRF layer, Bi‐LSTM‐CRF), PBA has the best performance for all datasets.

LBERT: Lexically-aware Transformers based Bidirectional Encoder Representation model for learning Universal Bio-Entity Relations

Bioinformatics, To Appear

Neha Warikoo, Yung-Chun Chang, and Wen-Lian Hsu

Yung-Chun Chang Wen-Lian Hsu

Natural Language Processing techniques are constantly being advanced to accommodate the influx of data as well as to provide exhaustive and structured knowledge dissemination. Within the biomedical domain, relation detection between bio-entities known as the Biomedical relation extraction (BRE) task has a critical function in knowledge structuring. Although recent advances in deep learning-based biomedical domain embedding have improved BRE predictive analytics, these works are often task selective or employ external knowledge-based pre/post processing. In addition, deep learning-based models do not account for local syntactic contexts, which have improved data representation in many kernel classifier-based models. In this study, we propose a universal BRE model, i.e. LBERT, which is a Lexically-aware Transformer-based Bidirectional Encoder Representation model, and which explores both local and global contexts representations for sentence level classification tasks. This paper presents one of the most exhaustive BRE studies ever conducted over five different bio-entity relation types. Our model outperforms state-of-the-art deep learning models in protein-protein (PPI), drug-drug (DDI) and protein-bio-entity (REL) relation classification tasks by 0.02%, 11.2% and 41.4% respectively. LBERT representations show a statistically significant improvement over BioBERT in detecting true bio-entity relation for large corpora like PPI. Our ablation studies clearly indicate the contribution of the lexical features and distance-adjusted attention in improving prediction performance by learning additional local semantic context along with bi-directionally learned global context.

Text-guided Graph Neural Networks for Referring 3D Instance Segmentation

Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), February 2021

Pin-Hao Huang, Han-Hung Lee, Hwann-Tzong Chen and Tyng-Luh Liu

Tyng-Luh Liu

This paper addresses a new task called referring 3D instance segmentation, which aims to segment out the target instance in a 3D scene given a query sentence. Previous work on scene understanding has explored visual grounding with natural language guidance, yet the emphasis is mostly constrained on images and videos. We propose a Text-guided Graph Neural Network for referring 3D instance segmentation on point clouds. Given a query sentence and the point cloud of a 3D scene, our method learns to extract per-point features and predicts an offset to shift each point toward its object center. Based on the point features and the offset, we cluster the points to produce fused features and coordinates for the candidate objects. The resulting clusters are modeled as nodes in a Graph Neural Network (GNN) to learn the representations that encompass the relation structure for each candidate object. The GNN layers leverage each object's features and its relations with neighbors to generate an attention heatmap for the input sentence expression. Finally, the attention heatmap is used to ``guide" the aggregation of information from neighborhood nodes. Our method achieves state-of-the-art performance on the tasks of referring 3D instance segmentation and 3D localization on ScanRefer, Nr3D, and Sr3D benchmarks.

Comparison of different variant sequence types coupled with decoy generation methods used in concatenated target-decoy database searches for proteogenomic research

Journal of Proteomics, January 2021

Wai-Kok Choong and Ting-Yi Sung

Wai-Kok Choong Ting-Yi Sung

Concatenated target-decoy database searches are commonly used in proteogenomic research for variant peptide identification. Currently, protein-based and peptide-based sequence databases are applied to store variant sequences for database searches. The protein-based database records a full-length wild-type protein sequence but using the given variant events to replace the original amino acids, whereas the peptide-based database retains only the in silico digested peptides containing the variants. However, the performance of applying various decoy generation methods on the peptide-based variant sequence database is still unclear, compared to the protein-based database. In this paper, we conduct a thorough comparison on target-decoy databases constructed by the above two types of databases coupled with various decoy generation methods for proteogenomic analyses. The results show that for the protein-based variant sequence database, using the reverse or the pseudo reverse method achieves similar performance for variant peptide identification. Furthermore, for the peptide-based database, the pseudo reverse method is more suitable than the widely used reverse method, as shown by identifying 6% more variant PSMs in a HEK293 cell line data set.

Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent

Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 2020

Yun-Hsuan Jen, Chieh-Yang Huang, MeiHua Chen, Ting-Hao Huang and Lun-Wei Ku

Lun-Wei Ku

Many language learners have trouble using near-synonym words (e.g.,small vs.little; briefly vs.shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but has difficulty in adopting such scores to all the near-synonyms as near-synonyms differ in various ways. We notice that the helpfulness of the learning material would reflect on the learners performance. Thus, we propose the inference-based learner-like agent to mimic learner behavior and identify good learning materials by examining the agents performance. To enable the agent to behave like a learner, we leverage entailment modelings capability of inferring answers from the provided materials. Experimental results show that the proposed agentis equipped with good learner-like behavior to achieve the best performance in both fill-in-the-blank (FITB) and good example sentence selection tasks. We further conduct a classroom user study with college ESL learners.The results of the user study show that the proposed agent can find out example sentencesthat help students learn more easily and efficiently. Compared to other models, the proposed agent improves the score of more than 17% of students after learning.

Reactive Supervision: A New Method for Collecting Sarcasm Data

Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 2020

Boaz Shmueli, Lun-Wei Ku and Soumya Ray

Lun-Wei Ku

Sarcasm detection is an important task in affective computing, requiring large amounts of labeled data. We introduce reactive supervision, a novel data collection method that utilizes the dynamics of online conversations to overcome the limitations of existing data collection techniques. We use the new method to create and release a first-of-its-kind large dataset of tweets with sarcasm perspective labels and new contextual features. The dataset is expected to advance sarcasm detection research. Our method can be adapted to other affective computing domains, thus opening up new research opportunities.

Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition

IEEE/ACM Transactions on Audio, Speech, and Language Processing, November 2020

Hung-Shin Lee, Yu Tsao, Shyh-Kang Jeng, and Hsin-Min Wang

Hsin-Min Wang

Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution or phone events. In the present study, we propose a new learning mechanism based on subspace-based representation, which can extract concealed phonotactic structures from utterances, for language verification and dialect/accent identification. The framework mainly involves two successive parts. The first part involves subspace construction. Specifically, it decodes each utterance into a sequence of vectors filled with phone-posteriors and transforms the vector sequence into a linear orthogonal subspace based on low-rank matrix factorization or dynamic linear modeling. The second part involves subspace learning based on kernel machines, such as support vector machines and the newly developed subspace-based neural networks (SNNs). The input layer of SNNs is specifically designed for the sample represented by subspaces. The topology ensures that the same output can be derived from identical subspaces by modifying the conventional feed-forward pass to fit the mathematical definition of subspace similarity. Evaluated on the "General LR" test of NIST LRE 2007, the proposed method achieved up to 52\\%, 46%, 56%, and 27% relative reductions in equal error rates over the sequence-based PPR-LM, PPR-VSM, and PPR-IVEC methods and the lattice-based PPR-LM method, respectively. Furthermore, on the dialect/accent identification task of NIST LRE 2009, the SNN-based system performed better than the aforementioned four baseline methods.

Learning From Music to Visual Storytelling of Shots: A Deep Interactive Learning Mechanism

ACM Multimedia Conference, October 2020

Jen-Chun Lin, Wen-Li Wei, Yen-Yu Lin, Tyng-Luh Liu, and Hong-Yuan Mark Liao

Jen-Chun Lin Tyng-Luh Liu Hong-Yuan Mark Liao

Interesting and emerging task. It produces a coherent visual story in the form of a shot type sequence, which not only expands the storytelling potential for a song but also facilitates automatic concert video mashup process and storyboard generation. In this study, we present a deep interactive learning (DIL) mechanism for building a compact yet accurate sequence-to-sequence model to accomplish the task. Different from the one-way transfer between a pre-trained teacher network (or ensemble network) and a student network in knowledge distillation (KD), the proposed method enables collaborative learning between an ensemble teacher network and a student network. Namely, the student network also teaches. Specifically, our method first learns a teacher network that is composed of several assistant networks to generate a shot type sequence and produce the soft target (shot types) distribution accordingly through KD. It then constructs the student network that learns from both the ground truth label (hard target) and the soft target distribution to alleviate the difficulty of optimization and improve generalization capability. As the student network gradually advances, it turns to feed back knowledge to the assistant networks, thereby improving the teacher network in each iteration. Owing to such interactive designs, the DIL mechanism bridges the gap between the teacher and student networks and produces more superior capability for both networks. Objective and subjective experimental results demonstrate that both the teacher and student networks can generate more accurate for improving the performance.

Self-similarity Student for Partial Label Histopathology Image Segmentation

16th European Conference on Computer Vision (ECCV), August 2020

Hsien-Tzu Cheng, Chun-Fu Yeh, Po-Chen Kuo, Andy Wei, Keng-Chi Liu, Mong-Chi Ko, Kuan-Hua Chao, Yu-Ching Peng, and Tyng-Luh Liu

Tyng-Luh Liu

Delineation of cancerous regions in gigapixel whole slide images (WSIs) is a crucial diagnostic procedure in digital pathology. This process is time-consuming because of the large search space in the gigapixel WSIs, causing chances of omission and misinterpretation at indistinct tumor lesions. To tackle this, the development of an automated cancerous region segmentation method is imperative. We frame this issue as a modeling problem with partial label WSIs, where some cancerous regions may be misclassified as benign and vice versa, producing patches with noisy labels. To learn from these patches, we propose Self-similarity Student, combining teacher-student model paradigm with similarity learning. Specifically, for each patch, we first sample its similar and dissimilar patches according to spatial distance. A teacher-student model is then introduced, featuring the exponential moving average on both student model weights and teacher predictions ensemble. While our student model takes patches, teacher model takes all their corresponding similar and dissimilar patches for learning robust representation against noisy label patches. Following this similarity learning, our similarity ensemble merges similar patches' ensembled predictions as the pseudo-label of a given patch to counteract its noisy label. On the CAMELYON16 dataset, our method substantially outperforms state-of-the-art noise-aware learning methods by 5% and the supervised-trained baseline by 10% in various degrees of noise. Moreover, our method is superior to the baseline on our TVGH TURP dataset with 2% improvement, demonstrating the generalizability to more clinical histopathology segmentation tasks.

An Adaptive Layer Expansion Algorithm for Efficient Training of Deep Neural Networks

IEEE International Conference on Big Data, December 2020

Leo Chen, Pangfeng Liu, and Jan-Jan Wu

Jan-Jan Wu

In this paper, we propose an adaptive layer expansion algorithm to reduce the training time of deep neural networks without noticeable loss of accuracy. Neu- ral networks have become deeper and wider to improve accuracy. The size of such networks makes them time- consuming to train. Hence, we propose an adaptive layer expansion algorithm that reduces training time by dynam- ically adding nodes in where is necessary, to improve the training efficiency while not losing accuracy. We start with a smaller model of only a fraction of parameters of the original model, then train the network and add nodes to specific layers determined by the stability of gradients. The algorithm repeatedly adds nodes until a threshold is reached, and trains model until the accuracy converges. The experiment results indicate that our algorithm only uses a quarter of computation time of a full model, and achieves 64.1% accuracy on MobileNet with dataset CIFAR100, which is only 2% less than 66.2% accuracy form a full model. The algorithm stops adding nodes when it has only half of the parameters of the original model. As a result, this new model is good for fast inference in those environments where both the computation power and memory storage are very limited, such as mobile devices.

Automated Graph Generation at Sentence Level for Reading Comprehension Based on Conceptual Graphs

The 28th International Conference on Computation Linguistics (COLING), December 2020

Wan-Hsuan Lin and Chun-Shien Lu

Wan-Hsuan Lin Chun-Shien Lu

This paper proposes a novel miscellaneous-context-based method to convert a sentence into a knowledge embedding in the form of a directed graph. We adopt the idea of conceptual graphs to frame for the miscellaneous textual information into conceptual compactness. We first empirically observe that this graph representation method can (1) accommodate the slot-filling challenges in typical question answering and (2) access to the sentence-level graph structure in order to explicitly capture the neighbouring connections of reference concept nodes. Secondly, we propose a task-agnostic semantics-measured module, which cooperates with the graph representation method, in order to (3) project an edge of a sentence-level graph to the space of semantic relevance with respect to the corresponding concept nodes. As a result of experiments on the QA-type relation extraction, the combination of the graph representation and the semantics-measured module achieves the high accuracy of answer prediction and offers humancomprehensible graphical interpretation for every well-formed sample. To our knowledge, our approach is the first towards the interpretable process of learning vocabulary representations with the experimental evidence.

Determinizing Crash Behavior with a Verified Snapshot-Consistent Flash Translation Layer

USENIX Symposium on Operating Systems Design and Implementation (OSDI), November 2020

Yun-Sheng Chang, Yao Hsiao, Tzu-Chi Lin, Che-Wei Tsao, Chun-Feng Wu, Yuan-Hao Chang, Hsiang-Shang Ko, and Yu-Fang Chen

Yun-Sheng Chang Che-Wei Tsao Chun-Feng Wu Yuan-Hao Chang Hsiang-Shang Ko Yu-Fang Chen

We introduce the design of a snapshot-consistent flash translation layer (SCFTL) for flash disks, which has a stronger guarantee about the possible behaviors after a crash than conventional designs. More specifically, the flush operation of SCFTL also has the functionality of making a “disk snapshot.” When a crash occurs, the flash disk is guaranteed to recover to the state right before the last flush. The major benefit of SCFTL is that it allows a more efficient design of upper layers in the storage stack. For example, the file system hosted by SCFTL does not require the use of a journal for crash recovery. Instead, it only needs to perform a flush operation of SCFTL at the end of each atomic transaction. We use a two-layer approach, combining a proof assistant, a symbolic executor, and an SMT solver, to formally verify the correctness of our prototype SCFTL implementation. We optimize the xv6 file system by utilizing SCFTL’s stronger crash guarantee. Evaluation results show that the optimized xv6 is 3 to 30 times faster than the original version.

Cross-batch Reference Learning for Deep Retrieval

IEEE Transactions on Neural Networks and Learning Systems, September 2020

Huei-Fang Yang, Kevin Lin, Ting-Yen Chen, and Chu-Song Chen

Chu-Song Chen

Learning effective representations that exhibit semantic content is crucial to image retrieval applications. Recent advances in deep learning have made significant improvements in performance on a number of visual recognition tasks. Studies have also revealed that visual features extracted from a deep network learned on a large-scale image data set (e.g., ImageNet) for classification are generic and perform well on new recognition tasks in different domains. Nevertheless, when applied to image retrieval, such deep representations do not attain performance as impressive as used for classification. This is mainly because the deep features are optimized for classification rather than for the desired retrieval task. We introduce the cross-batch reference (CBR), a novel training mechanism that enables the optimization of deep networks with a retrieval criterion. With the CBR, the networks leverage both the samples in a single minibatch and the samples in the others for weight updates, enhancing the stochastic gradient descent (SGD) training by enabling interbatch information passing. This interbatch communication is implemented as a cross-batch retrieval process in which the networks are trained to maximize the mean average precision (mAP) that is a popular performance measure in retrieval. Maximizing the cross-batch mAP is equivalent to centralizing the samples relevant to each other in the feature space and separating the samples irrelevant to each other. The learned features can discriminate between relevant and irrelevant samples and thus are suitable for retrieval. To circumvent the discrete, nondifferentiable mAP maximization, we derive an approximate, differentiable lower bound that can be easily optimized in deep networks. Furthermore, the mAP loss can be used alone or with a classification loss. Experiments on several data sets demonstrate that our CBR learning provides favorable performance, validating its effectiveness.

EpiMOLAS: An Intuitive Web-based Framework for Genome-wide DNA Methylation Analysis

BMC Genomics, April 2020

Sheng-Yao Su, I-Hsuan Lu, Wen-Chih Cheng, Wei-Chun Chung, Pao-Yang Chen, Jan-Ming Ho, Shu-Hwa Chen, Chung-Yen Lin

Jan-Ming Ho Chung-Yen Lin

DNA methylation is a crucial epigenomic mechanism in various biological processes. Using whole-genome bisulfite sequencing (WGBS) technology, methylated cytosine sites can be revealed at the single nucleotide level. However, the WGBS data analysis process is usually complicated and challenging.
To alleviate the associated difficulties, we integrated the WGBS data processing steps and downstream analysis into a two-phase approach. First, we set up the required tools in Galaxy and developed workflows to calculate the methylation level from raw WGBS data and generate a methylation status summary, the mtable. This computation environment is wrapped into the Docker container image DocMethyl, which allows users to rapidly deploy an executable environment without tedious software installation and library dependency problems. Next, the mtable files were uploaded to the web server EpiMOLAS_web to link with the gene annotation databases that enable rapid data retrieval and analyses.
To our knowledge, the EpiMOLAS framework, consisting of DocMethyl and EpiMOLAS_web, is the first approach to include containerization technology and a web-based system for WGBS data analysis from raw data processing to downstream analysis. EpiMOLAS will help users cope with their WGBS data and also conduct reproducible analyses of publicly available data, thereby gaining insights into the mechanisms underlying complex biological phenomenon. The Galaxy Docker image DocMethyl is available at piMOLAS_web is publicly accessible at

Piwi Reduction in the Aged Niche Eliminates Germline Stem Cells via Toll-GSK3 Signaling

Nature Communications, June 2020

Kun-Yang Lin, Wen-Der Wang, Chi-Hung Lin, Elham Rastegari, Yu-Han Su, Yi-Chieh Chang, Yu-Tzu Chang, Yung-Feng Liao, Haiwei Pi, Bo-Yi Yu, Shu-Hwa Chen, Chung-Yen Lin, Mei-Yeh Lu, Tsu-Yi Su, Fei-Yang Tzou, Chih-Chiang Chan, and Hwei-jan Hsu

Chung-Yen Lin

Transposons are known to participate in tissue aging, but their effects on aged stem cells remain unclear. Here, we report that in the Drosophila ovarian germline stem cell (GSC) niche, aging-related reductions in expression of Piwi (a transposon silencer) derepress retrotransposons and cause GSC loss. Suppression of Piwi expression in the young niche mimics the aged niche, causing retrotransposon depression and coincident activation of Toll-mediated signaling, which promotes Glycogen synthase kinase 3 activity to degrade β-catenin. Disruption of β-catenin-E-cadherin-mediated GSC anchorage then results in GSC loss. Knocking down gypsy (a highly active retrotransposon) or toll, or inhibiting reverse transcription in the piwi-deficient niche, suppresses GSK3 activity and β-catenin degradation, restoring GSC-niche attachment. This retrotransposon-mediated impairment of aged stem cell maintenance may have relevance in many tissues, and could represent a viable therapeutic target for aging-related tissue degeneration.

Learning to Visualize Music Through Shot Sequence for Automatic Concert Video Mashup

IEEE Transactions on Multimedia, To Appear

W. L. Wei, J. C. Lin, T. L. Liu, H. R. Tyan, H. M. Wang, and H. Y. Mark Liao

Jen-Chun Lin Tyng-Luh Liu Hsin-Min Wang Hong-Yuan Mark Liao

An experienced director usually switches among different types of shots to make visual storytelling more touching. When filming a musical performance, appropriate switching shots can produce some special effects, such as enhancing the expression of emotion or heating up the atmosphere. However, while the visual storytelling technique is often used in making professional recordings of a live concert, amateur recordings of audiences often lack such storytelling concepts and skills when filming the same event. Thus a versatile system that can perform video mashup to create a refined high-quality video from such amateur clips is desirable. To this end, we aim at translating the music into an attractive shot (type) sequence by learning the relation between music and visual storytelling of shots. The resulting shot sequence can then be used to better portray the visual storytelling of a song and guide the concert video mashup process. To achieve the task, we first introduces a novel probabilistic-based fusion approach, named as multi-resolution fused recurrent neural networks (MF-RNNs) with film-language, which integrates multiresolution fused RNNs and a film-language model for boosting the translation performance. We then distill the knowledge in MFRNNs with film-language into a lightweight RNN, which is more efficient and easier to deploy. The results from objective and subjective experiments demonstrate that both MF-RNNs with film-language and lightweight RNN can generate attractive shot sequences for music, thereby enhancing the viewing and listening experience.

Temporally Guided Music-to-Body-Movement Generation

ACM International Conference on Multimedia (ACM MM), October 2020

Hsuan-Kai Kao and Li Su

Li Su

This paper presents a neural network model to generate virtual violinist’s 3-D skeleton movements from music audio. Improved from the conventional recurrent neural network models for generating 2-D skeleton data in previous works, the proposed model incorporates an encoder-decoder architecture, as well as the self-attention mechanism to model the complicated dynamics in body movement sequences. To facilitate the optimization of self-attention model, beat tracking is applied to determine effective sizes and boundaries of the training examples. The decoder is accompanied with a refining network and a bowing attack inference mechanism to emphasize the right-hand behavior and bowing attack timing. Both objective and subjective evaluations reveal that the proposed model outperforms the state-of-the-art methods. To the best of our knowledge, this work represents the first attempt to generate 3-D violinists’ body movements considering key features in musical body movement.

DeepPrefetcher: A Deep Learning Framework for Data Prefetching in Flash Storage Devices

ACM/IEEE International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), September 2020

Gaddisa Olani Ganfure, Chun-Feng Wu, Yuan-Hao Chang, and Wei-Kuan Shih

Gaddisa Olani Ganfure Chun-Feng Wu Yuan-Hao Chang

In today’s data-driven world, applications access to storage device constitutes the high cost of processing a user request. Data prefetching is a technique used to alleviate storage access latency by predicting future data access and initiate a data fetch. However, the block access requests received by the storage device show poor spatial locality because most file-related locality is absorbed in the higher layers of the memory hierarchy, including the CPU cache and main memory. Besides, the utilization of multithreading strategies in today’s applications typically leads to interleaved block accesses, which makes detecting an access pattern at storage level very challenging for the existing prefetching techniques. Towards this, we propose and assess DeepPrefetcher, a novel Deep Neural Network inspired context-aware prefetching method that adapts to arbitrary memory access patterns. Under DeepPrefetcher, we capture block access pattern contexts using distributed representation and leverage Long Short Tem Memory neural architecture for context-aware prediction to improve the effectiveness of data prefetching. Instead of using the logical block address (LBA) value directly, we model the difference between successive access requests, which contains more patterns than LBA value for modeling. By targeting access pattern sequence in this manner, the DeepPrefetcher can learn the vital context from a long input LBA sequence and learn to predict both the previously seen and unseen access patterns. The experiment result reveals that DeepPrefetcher can increase an average prefetch accuracy, coverage, and speedup by 21.5%, 19.5%, and 17.2%, respectively, contrasted with the baseline prefetching strategies. Overall, the proposed prefetching approach performs better than the conventional prefetching studied on all benchmarks, and the results are encouraging.

Index of Cancer-Associated Fibroblasts Is Superior to the Epithelial-Mesenchymal Transition Score in Prognosis Prediction

Cancers, July 2020

Ying-Chieh Ko, Ting-Yu Lai, Shu-Ching Hsu,Fu-Hui Wang, Sheng-Yao Su, Yu-Lian Chen, Min-Lung Tsai, Chung-Chun Wu, Jenn-Ren Hsiao, Jang-Yang Chang, Yi-Mi Wu, Dan R Robinson, Chung-Yen Lin, Su-Fang Lin

Chung-Yen Lin

In many solid tumors, tissue of the mesenchymal subtype is frequently associated with epithelial-mesenchymal transition (EMT), strong stromal infiltration, and poor prognosis. Emerging evidence from tumor ecosystem studies has revealed that the two main components of tumor stroma, namely, infiltrated immune cells and cancer-associated fibroblasts (CAFs), also express certain typical EMT genes and are not distinguishable from intrinsic tumor EMT, where bulk tissue is concerned. Transcriptomic analysis of xenograft tissues provides a unique advantage in dissecting genes of tumor (human) or stroma (murine) origins. By transcriptomic analysis of xenograft tissues, we found that oral squamous cell carcinoma (OSCC) tumor cells with a high EMT score, the computed mesenchymal likelihood based on the expression signature of canonical EMT markers, are associated with elevated stromal contents featured with fibronectin 1 (Fn1) and transforming growth factor-β (Tgfβ) axis gene expression. In conjugation with meta-analysis of these genes in clinical OSCC datasets, we further extracted a four-gene index, comprising FN1, TGFB2, TGFBR2, and TGFBI, as an indicator of CAF abundance. The CAF index is more powerful than the EMT score in predicting survival outcomes, not only for oral cancer but also for the cancer genome atlas (TCGA) pan-cancer cohort comprising 9356 patients from 32 cancer subtypes. Collectively, our results suggest that a further distinction and integration of the EMT score with the CAF index will enhance prognosis prediction, thus paving the way for curative medicine in clinical oncology.

How to Cultivate a Green Decision Tree without Loss of Accuracy?

ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2020

Tseng-Yi Chen, Yuan-Hao Chang, Ming-Chang Yang, and Huang-Wei Chen

Tseng-Yi Chen Yuan-Hao Chang Ming-Chang Yang

that has been widely applied to classification and regression problems in the machine learning field. For avoiding underfitting, a decision tree algorithm will stop growing its tree model when the model is a fully-grown tree. However, a fully-grown tree will result in an overfitting problem reducing the accuracy of a decision tree. In such a dilemma, some post-pruning strategies have been proposed to reduce the model complexity of the fully-grown decision tree. Nevertheless, such a process is very energy-inefficiency over an non-volatile-memory-based (NVM-based) system because NVM generally have high writing costs (i.e., energy consumption and I/O latency). In other words, the nodes which will be pruned in the post-pruning process are redundant data. Such unnecessary data will induce high writing energy consumption and long I/O latency onNVM-based architectures, especially for low-power-oriented embedded systems. In order to establish a green decision tree (i.e., a tree model with minimized construction energy consumption), this study rethinks a pruning algorithm, namely duo-phase pruning framework, which can significantly decrease the energy consumption on the NVM-based computing system without loss of accuracy.

How to Cut Out Expired Data with Nearly Zero Overhead for Solid-State Drives

ACM/IEEE Design Automation Conference (DAC), July 2020

Wei-Lin Wang, Tseng-Yi Chen, Yuan-Hao Chang, Hsin-Wen Wei, and Wei-Kuan Shih

Wei-Lin Wang Tseng-Yi Chen Yuan-Hao Chang

Modern flash memory always encounters the issues of huge performance overhead caused by garbage collection process. The most effective solution for minimizing garbage collection overhead is to lower the number of live pages in a flash block. However, current garbage collection strategies will copy all live pages in a to-be-erased flash block to another flash block even through some live pages will no longer be accessed. This is because present flash translation layer (FTL) designs cannot identify disused data from valid pages. In other words, if written data has the lifetime information, the problem can be resolved. Fortunately, an emerging write technology, also known as multi-streamed write technology, can bring additional information (e.g., data lifetime) from host-side system to flash memory device. By such observations, this work propose a dual-time referencing FTL (DTR-FTL) design to deal with disused data and minimize the overhead of garbage collection by referring to data lifetime information and block retention time. Moreover, as the DTR-FTL can store written data to appropriate flash block in the very first beginning, flash lifespan is also extremely lengthened by our proposed design. According to the experimental results, the overhead of live-page copying has been significantly reduced and the flash lifespan has been unbelievably prolonged by the DTR-FTL.

DSTL: A Demand-based Shingled Translation Layer for Enabling Adaptive Address Mapping on SMR Drives

ACM Transactions on Embedded Computing Systems (TECS), July 2020

Yi-Jing Chuang, Shuo-Han Chen, Yuan-Hao Chang, Yu-Pei Liang, Hsin-Wen Wei, and Wei-Kuan Shih

Shuo-Han Chen Yuan-Hao Chang

Shingled magnetic recording (SMR) is regarded as a promising technology for resolving the areal density limitation of conventional magnetic recording hard disk drives. Among different types of SMR drives, drivemanaged SMR (DM-SMR) requires no changes on the host software and is widely used in today’s consumer market. DM-SMR employs a shingled translation layer (STL) to hide its inherent sequential-write constraint from the host software and emulate the SMR drive as a block device via maintaining logical to physical block address mapping entries. However, because most existing STL designs do not simultaneously consider the access pattern and the data update frequency of incoming workloads, those mapping entries maintained within the STL cannot be effectively managed, thus inducing unnecessary performance overhead. To resolve< the inefficiency of existing STL designs, this article proposes a demand-based STL (DSTL) to simultaneously consider the access pattern and update frequency of incoming data streams to enhance the access performance of DM-SMR. The proposed design was evaluated by a series of experiments, and the results show that the proposed DSTL can outperform other SMR management approach by up to 86.69% in terms of read/write performance.

Proteogenomics of non-smoking lung cancer in East Asia delineates molecular signatures of pathogenesis and progression

Cell, July 2020

Yi-Ju Chen, Theodoros I Roumeliotis, Ya-Hsuan Chang, Ching-Tai Chen, Chia-Li Han*, Miao-Hsia Lin, Huei-Wen Chen, Gee-Chen Chang, Yih-Leong Chang, Chen-Tu Wu, Mong-Wei Lin, Min-Shu Hsieh, Yu-Tai Wang, Yet-Ran Chen, Inge Jonassen, Fatemeh Zamanzad Ghavidel, Ze-Shiang Lin, Kuen-Tyng Lin, Ching-Wen Chen, Pei-Yuan Sheu, Chen-Ting Hung, Ke-Chieh Huang, Hao-Chin Yang, Pei-Yi Lin, Ta-Chi Yen, Yi-Wei Lin, Jen-Hung Wang, Lovely Raghav, Chien-Yu Lin, Yan-Si Chen, Pei-Shan Wu, Chi-Ting Lai, Shao-Hsing Weng, Kang-Yi Su, Wei-Hung Chang, Pang-Yan Tsai, Ana I Robles, Henry Rodriguez, Yi-Jing Hsiao, Wen-Hsin Chang, Ting-Yi Sung*, Jin-Shing Chen*, Sung-Liang Yu*, Jyoti S Choudhary*, Hsuan-Yu Chen*, Pan-Chyr Yang*, and Yu-Ju Chen*

Ching-Tai Chen Jen-Hung Wang Ting-Yi Sung

Lung cancer in East Asia is characterized by a high percentage of never-smokers, early onset and predominantEGFR mutations. To illuminate the molecular phenotype of this demographically distinct disease, we performed a deep ㄏcomprehensive proteogenomic study on a prospectively collected cohort in Taiwan, representing early stage, predominantly female, non-smoking lung adenocarcinoma. Integrated genomic, proteomic, and phosphoproteomic analysis delineated the demographically distinct molecular attributes and hallmarks of tumor progression. Mutational signature analysis revealed age- and gender-related mutagenesis mechanisms, characterized by high prevalence of APOBEC mutational signature in younger females and over-representation of environmental carcinogen-like mutational signatures in older females. A proteomics-informed classification distinguished the clinical characteristics of early stage patients with EGFR mutations. Furthermore, integrated protein network analysis revealed the cellular remodeling underpinning clinical trajectories and nominated candidate biomarkers for patient stratification and therapeutic intervention. This multi-omic molecular architecture may help develop strategies for management of early stage never-smoker lung adenocarcinoma.