Institute of Information Science, Academia Sinica

Research Overview



Recent Research Results


Realising intensional S4 and GL modalities

Conference on Computer Science Logic (CSL), February 2022

Liang-Ting Chen and Hsiang-Shang Ko

Liang-Ting Chen Hsiang-Shang Ko


There have been investigations into type-theoretic foundations for metaprogramming, notably Davies and Pfenning’s (2001) treatment in S4 modal logic, where code evaluating to values of type A is given the modal type Code A (□A in the original paper). Recently Kavvos (2017) extended PCF with Code A and intensional recursion, understood as the deductive form of the GL (Gödel-Löb) axiom in provability logic, but the resulting type system is logically inconsistent. Inspired by staged computation, we observe that a term of type Code A is, in general, code to be evaluated in a next stage, whereas S4 modal type theory is a special case where code can be evaluated in the current stage, and the two types of code should be discriminated. Consequently, we use two separate modalities ⊠ and □ to model S4 and GL respectively in a unified categorical framework while retaining logical consistency. Following Kavvos’ (2017) novel approach to the semantics of intensionality, we interpret the two modalities in the 𝒫-category of assemblies and trackable maps. For the GL modality □ in particular, we use guarded type theory to articulate what is meant by a ‘next’ stage and to model intensional recursion by guarded recursion together with Kleene’s second recursion theorem. Besides validating the S4 and GL axioms, our model better captures the essence of intensionality by refuting congruence, i.e. two extensionally equal terms may not be intensionally equal, and the generic internal quoting A → □A as well as A → ⊠A. Our results are developed in (guarded) homotopy type theory and formalised in Agda.

Mandarin Electrolaryngeal Speech Voice Conversion with Sequence-to-Sequence Modeling

IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2021), December 2021

Ming-Chi Yen, Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Shu-Wei Tsai, Yu Tsao, Tomoki Toda, Jyh-Shing Jang, and Hsin-Min Wang

Hsin-Min Wang


The electrolaryngeal speech (EL speech) is typically spoken with an electrolarynx device that generates excitation signals to substitute human vocal fold vibrations. Because the excitation signals cannot perfectly characterize sound sources generated by vocal folds, the naturalness and intelligibility of the EL speech are inevitably worse than that of the natural speech (NL speech). To improve speech naturalness, statistical models, such as Gaussian mixture models and deep-learning-based models, have been employed for EL speech voice conversion (ELVC). The ELVC task aims to convert EL speech into NL speech through an ELVC model. To implement a frame-wise ELVC system, accurate feature alignment is crucial for model training. However, the abnormal acoustic characteristics of the EL speech cause misalignments and accordingly limit the ELVC performance. To address this issue, we propose a novel ELVC system based on sequence-to-sequence (seq2seq) modeling with text-to-speech (TTS) pretraining. The seq2seq model involves an attention mechanism to concurrently perform representation learning and alignment. Meanwhile, TTS pretraining provides efficient training with limited data. Experimental results show that the proposed ELVC system yields notable improvements in terms of standardized evaluation metrics and subjective listening tests over a well-known frame-wise ELVC system.

Somatic mutation subtypes of lung adenocarcinoma in East Asian reveal divergent biological characteristics and therapeutic vulnerabilities

iScience, June 2021

Wai-Kok Choong and Ting-Yi Sung*

Wai-Kok Choong Ting-Yi Sung


Lung adenocarcinoma (LUAD) patients in East Asia predominantly harbor oncogenic EGFR mutations. However, there remains a limited understanding of the biological characteristics and therapeutic vulnerabilities of the concurrent mutations of EGFR and other genes in LUAD. Here, we performed comprehensive bioinformatics analyses on 88 treatment-naïve East Asian LUAD patients. Based on somatic mutation clustering, we identified three somatic mutation subtypes: EGFR + TP53 co-mutation, EGFR mutation, and multiple-gene mutation. A proteogenomic analysis among subtypes revealed varying degrees of dysregulation in cell-cycle-related and immune-related processes. An immune-characteristic analysis revealed higher PDL1 protein expression in the EGFR + TP53 co-mutation subtype than in the EGFR mutation subtype, which may affect the therapeutic efficacy of anti-PD-L1 therapy. Moreover, integrating known and potential therapeutic target analysis reveals therapeutic vulnerabilities of specific subtypes and nominates candidate biomarkers for therapeutic intervention. This study provides new biological insight and therapeutic opportunities with respect to EGFR-mutant LUAD subtypes.

A greedy algorithm for dropping digits (Functional Pearl)

Journal of Functional Programming, 2021

Richard Bird and Shin-Cheng Mu

Shin-Cheng Mu


Consider the following puzzle: given a number, remove k digits such that the resulting number is as large as possible. Various techniques are employed to derive a linear-time solution to the puzzle: we justify the structure of a greedy algorithm by predicate logic, give a constructive proof of the greedy condition using a dependently-typed proof assistant, and calculate the greedy step as well as the final, linear-time optimisation by equational reasoning.
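
The puzzle can be illustrated with a short sketch. The code below is the widely known stack-based greedy formulation, written in Python for illustration; it is not the program derived in the paper, and the function name `drop_digits` is an assumption made here. Each digit is pushed and popped at most once, so the running time is linear in the number of digits.

```python
def drop_digits(digits: str, k: int) -> str:
    """Remove k digits from `digits` so that the remaining number is as large as possible."""
    if not 0 <= k <= len(digits):
        raise ValueError("k must be between 0 and the number of digits")
    kept = []          # digits kept so far, used as a stack
    remaining = k      # removals still available
    for d in digits:
        # Greedy step: a smaller digit followed by a larger one is always worth dropping.
        while remaining > 0 and kept and kept[-1] < d:
            kept.pop()
            remaining -= 1
        kept.append(d)
    # The kept digits are now non-increasing; spend leftover removals at the end.
    return "".join(kept[:len(kept) - remaining])
```

For example, `drop_digits("4325", 1)` returns `"435"`: the 2 is dropped because it is immediately followed by a larger digit.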

Lying Through One’s Teeth: A Study on Verbal Leakage Cues

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), November 2021

Min-Hsuan Yeh and Lun-Wei Ku

Min-Hsuan Yeh Lun-Wei Ku


Although many studies use the LIWC lexicon to show the existence of verbal leakage cues in lie detection datasets, none mention how verbal leakage cues are influenced by means of data collection, or the impact thereof on the performance of models. In this paper, we study verbal leakage cues to understand the effect of the data construction method on their significance, and examine the relationship between such cues and models' validity. The LIWC word-category dominance scores of seven lie detection datasets are used to show that audio statements and lie-based annotations indicate a greater number of strong verbal leakage cue categories. Moreover, we evaluate the validity of state-of-the-art lie detection models with cross- and in-dataset testing. Results show that in both types of testing, models trained on a dataset with more strong verbal leakage cue categories (as opposed to only a greater number of strong cues) yield superior results, suggesting that verbal leakage cues are a key factor for selecting lie detection datasets.

Exploring the Power of Lightweight YOLOv4

IEEE International Conference on Computer Vision (ICCV) “Low Power Computer Vision” Workshop, October 2021

C. Y. Wang, H. Y. Mark Liao, I. H. Yeh, Y. Y. Chuang, and Y. L. Lin

Chien-Yao Wang Hong-Yuan Mark Liao


Research on deep learning has always had two main streams: (1) design a powerful network architecture and train it with existing learning methods to achieve the best results, and (2) design better learning methods so that the existing network architecture can achieve the best capability after training. In recent years, because mobile devices have become popular, low power consumption has become a must. Under this requirement, we hope to design low-cost lightweight networks that can be effectively deployed at the edge, where the available resources must suffice to run them and inference must be fast enough. In this work, we set a very ambitious goal of exploring the power of lightweight neural networks. We utilize the analysis of data space, model’s representational capacity, and knowledge projection space to construct an automated machine learning pipeline. Through this mechanism, we systematically derive the most suitable knowledge projection space between the data and the model. Our method can automatically find learning strategies suitable for the target model and target application through exploration. Experimental results show that the proposed method can significantly enhance the accuracy of lightweight neural networks for object detection. We directly apply the lightweight model trained by our proposed method to a Jetson Xavier NX embedded module and a Kneron KL720 edge AI SoC as system solutions.

You Only Learn One Representation: Unified Network for Multiple Tasks

arXiv:2105.04206v1, May 2021

C. Y. Wang, I. H. Yeh, and H. Y. Mark Liao

Chien-Yao Wang Hong-Yuan Mark Liao


People understand the world via vision, hearing, touch, and past experience. Human experience can be learned through normal learning (we call it explicit knowledge), or subconsciously (we call it implicit knowledge). These experiences, learned through normal learning or subconsciously, will be encoded and stored in the brain. Using this abundant experience as a huge database, human beings can effectively process data, even data that were unseen beforehand. In this paper, we propose a unified network to encode implicit knowledge and explicit knowledge together, just as the human brain can learn knowledge from normal learning as well as subconscious learning. The unified network can generate a unified representation to simultaneously serve various tasks. We can perform kernel space alignment, prediction refinement, and multi-task learning in a neural network. As for the additional resources used in embedding implicit knowledge (including the number of parameters and calculations), the overall extra cost is less than one ten thousandth. The results demonstrate that when implicit knowledge is introduced into the neural network, it benefits the performance of all tasks. For example, the proposed unified network achieved accuracy comparable to Scaled-YOLOv4 on object detection, and the inference speed was increased by 88%.

Composite Neural Network: Theory and Application to PM2.5 Prediction

IEEE Transactions on Knowledge and Data Engineering, To Appear

Ming-Chuan Yang and Meng Chang Chen

Ming-Chuan Yang Meng-Chang Chen


This work investigates the framework and statistical performance guarantee of the composite neural network, which is composed of a collection of pre-trained and non-instantiated neural network models connected as a rooted directed acyclic graph, for solving complicated applications. A pre-trained neural network model is generally well trained, targeted to approximate a specific function. The advantages of adopting a pre-trained model as a component in composing a complicated neural network are two-fold. One is benefiting from the intelligence and diligence of domain experts, and the other is saving effort in data acquisition as well as computing resources and time for model training. Despite a general belief that a composite neural network may perform better than any single component, the overall performance characteristics are not clear. In this work, we propose the framework of a composite network, and prove that a composite neural network performs better than any of its pre-trained components with a high probability. In the study, we explore a complicated application, PM2.5 prediction, to support the correctness of the proposed composite network theory. In the empirical evaluations of PM2.5 prediction, the constructed composite neural network models perform better than other machine learning models.

SurpriseNet: Melody Harmonization Conditioning on User-controlled Surprise Contours

ISMIR2021, November 2021

Yi-Wei Chen, Hung-Shin Lee, Yen-Hsing Chen, and Hsin-Min Wang

Hsin-Min Wang


The surprisingness of a song is an essential and seemingly subjective factor in determining whether the listener likes it. With the help of information theory, it can be described as the transition probability of a music sequence modeled as a Markov chain. In this study, we introduce the concept of deriving entropy variations over time, so that the surprise contour of each chord sequence can be extracted. Based on this, we propose a user-controllable framework that uses a conditional variational autoencoder (CVAE) to harmonize the melody based on the given chord surprise indication. Through explicit conditions, the model can randomly generate various and harmonic chord progressions for a melody, and the Spearman’s correlation and p-value significance show that the resulting chord progressions match the given surprise contour quite well. The vanilla CVAE model was evaluated in a basic melody harmonization task (no surprise control) in terms of six objective metrics. The results of experiments on the Hooktheory Lead Sheet Dataset show that our model achieves performance comparable to the state-of-the-art melody harmonization model.
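
As a rough illustration of the information-theoretic idea above, the sketch below estimates first-order Markov transition probabilities from chord sequences and reports the surprisal, -log2 P(chord | previous chord), of each transition. This is a minimal sketch under stated assumptions: the function names, the toy corpus, and the `floor` probability for unseen transitions are all hypothetical, and the paper's exact definition of the surprise contour may differ.

```python
import math
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Estimate P(current chord | previous chord) from raw transition counts."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return {
        prev: {cur: n / sum(nexts.values()) for cur, n in nexts.items()}
        for prev, nexts in counts.items()
    }

def surprise_contour(seq, probs, floor=1e-6):
    """Surprisal in bits of each transition; unseen transitions get the assumed floor probability."""
    return [
        -math.log2(probs.get(prev, {}).get(cur, floor))
        for prev, cur in zip(seq, seq[1:])
    ]

# Toy corpus of chord sequences (hypothetical data, for illustration only).
corpus = [["C", "G", "C", "F"], ["C", "G", "Am", "F"]]
probs = fit_transitions(corpus)
contour = surprise_contour(["C", "G", "Am", "F"], probs)
```

In this toy corpus the transition Am to F always occurs after Am, so its surprisal is zero, while G to Am occurs half of the time after G and so carries one bit.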

Learning Unsupervised Metaformer for Anomaly Detection

International Conference on Computer Vision (ICCV), October 2021

Jhih-Ciang Wu, Ding-Jie Chen, Chiou-Shann Fuh and Tyng-Luh Liu

Tyng-Luh Liu


Anomaly detection (AD) aims to address the task of classification or localization of image anomalies. This paper addresses two pivotal issues of reconstruction-based approaches to AD in images, namely, model adaptation and reconstruction gap. The former generalizes an AD model to tackling a broad range of object categories, while the latter provides useful clues for localizing abnormal regions. At the core of our method is an unsupervised universal model, termed as Metaformer, which leverages both meta-learned model parameters to achieve high model adaptation capability and instance-aware attention to emphasize the focal regions for localizing abnormal regions, i.e., to explore the reconstruction gap at those regions of interest. We justify the effectiveness of our method with SOTA results on the MVTec AD dataset of industrial images and highlight the adaptation flexibility of the universal Metaformer with multi-class and few-shot scenarios.

Happy Dance, Slow Clap: Using Reaction GIFs to Predict Induced Affect on Twitter

in Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), August 2021

Boaz Shmueli, Soumya Ray and Lun-Wei Ku

Boaz Shmueli Lun-Wei Ku


Datasets with induced emotion labels are scarce but of utmost importance for many NLP tasks. We present a new, automated method for collecting texts along with their induced reaction labels. The method exploits the online use of reaction GIFs, which capture complex affective states. We show how to augment the data with induced emotion and induced sentiment labels. We use our method to create and publish ReactionGIF, a first-of-its-kind affective dataset of 30K tweets. We provide baselines for three new tasks, including induced sentiment prediction and multilabel classification of induced emotions. Our method and dataset open new research opportunities in emotion detection and affective computing.

Plot and Rework: Modeling Storylines for Visual Storytelling

in Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), ACL Findings, August 2021

Chi-yang Hsu, Yun-Wei Chu, Ting-Hao Huang and Lun-Wei Ku

Chi-Yang Hsu Yun-Wei Chu Ting-Hao Huang Lun-Wei Ku


Writing a coherent and engaging story is not easy. Creative writers use their knowledge and worldview to put disjointed elements together to form a coherent storyline, and work and rework iteratively toward perfection. Automated visual storytelling (VIST) models, however, make poor use of external knowledge and iterative generation when attempting to create stories. This paper introduces PR-VIST, a framework that represents the input image sequence as a story graph in which it finds the best path to form a storyline. PR-VIST then takes this path and learns to generate the final story via an iterative training process. This framework produces stories that are superior in terms of diversity, coherence, and humanness, per both automatic and human evaluations. An ablation study shows that both plotting and reworking contribute to the model's superiority.

H-FND: Hierarchical False-Negative Denoising for Distant Supervision Relation Extraction

Annual Meeting of Association for Computational Linguistics 2021, August 2021

Jhih-Wei Chen, Tsu-Jui Fu, Chen-Kang Lee, Wei-Yun Ma

Jhih-Wei Chen Tsu-Jui Fu Chen-Kang Lee Wei-Yun Ma


Although distant supervision automatically generates training data for relation extraction, it also introduces false-positive (FP) and false-negative (FN) training instances to the generated datasets. Whereas both types of errors degrade the final model performance, previous work on distant supervision denoising focuses more on suppressing FP noise and less on resolving the FN problem. We here propose H-FND, a hierarchical false-negative denoising framework for robust distant supervision relation extraction, as an FN denoising solution. H-FND uses a hierarchical policy which first determines whether non-relation (NA) instances should be kept, discarded, or revised during the training process. For those learning instances which are to be revised, the policy further reassigns them appropriate relations, making them better training inputs. Experiments on SemEval-2010 and TACRED were conducted with controlled FN ratios that randomly turn the relations of training and validation instances into negatives to generate FN instances. In this setting, H-FND can revise FN instances correctly and maintains high F1 scores even when 50% of the instances have been turned into negatives. A further experiment on NYT10 shows that H-FND is applicable in a realistic setting.

Space-efficient Graph Data Placement to Save Energy of ReRAM Crossbar

ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), July 2021

Ting-Hsuan Lo, Chun-Feng Wu, Yuan-Hao Chang, Tei-Wei Kuo, and Wei-Chen Wang

Chun-Feng Wu Yuan-Hao Chang


While Processing-In-Memory (PIM) offers a promising approach to running graph applications, crossbar accelerators with Resistive Random-Access Memory (ReRAM) have received attention from academia. However, to match the property of bitline current summation, graph data are mapped to adjacency matrices before being processed, which incurs severe sparsity and random access issues. This work provides an offline adjacency matrix index remapping scheme. The strategy targets sparsity and spatial locality improvement with reasonable computation overhead and better energy consumption for any given graph partition configuration in adjacency matrix format.

Androgenic sensitivities and ovarian gene expression profiles prior to treatment in Japanese eel (Anguilla japonica)

Marine Biotechnology, June 2021

Yung-Sen Huang, Wen-Chih Cheng, Chung-Yen Lin

Wen-Chih Cheng Chung-Yen Lin


Androgens stimulate ovarian development in eels. Our previous report indicated a correlation between the initial (debut) ovarian status (determined by kernel density estimation (KDE), presented as a probability density of oocyte size) and the consequence of 17MT treatment (change in ovary). The initial ovarian status appeared to be an important factor influencing ovarian androgenic sensitivity. We postulated that the sensitivities of initial ovaries are correlated with their gene expression profiles. Japanese eels underwent an operation to sample the initial ovarian tissues, and the samples were stored in liquid nitrogen. Using high-throughput next-generation sequencing (NGS) technology, ovarian transcriptomic data were mined and analyzed based on functional gene classification with cutoff-based differentially expressed genes (DEGs); the ovarian status was transformed into gene expression profiles globally or was represented by a gene list. Our results also implied that the initial ovary might be an important factor influencing the outcomes of 17MT treatments, and the genes related to neuronal activities or neurogenesis seemed to play an essential role in the positive effect.

Sequence to General Tree: Knowledge-Guided Geometry Word Problem Solving

ACL-IJCNLP2021, August 2021

Shih-hung Tsai, Chao-Chun Liang, Hsin-Min Wang, and Keh-Yih Su

Hsin-Min Wang Keh-Yih Su


With the recent advancements in deep learning, neural solvers have gained promising results in solving math word problems. However, these SOTA solvers only generate binary expression trees that contain basic arithmetic operators and do not explicitly use the math formulas. As a result, the expression trees they produce are lengthy and uninterpretable because they need to use multiple operators and constants to represent one single formula. In this paper, we propose sequence-to-general tree (S2G) that learns to generate interpretable and executable operation trees where the nodes can be formulas with an arbitrary number of arguments. With nodes now allowed to be formulas, S2G can learn to incorporate mathematical domain knowledge into problem-solving, making the results more interpretable. Experiments show that S2G can achieve better performance than strong baselines on problems that require domain knowledge.

AlloST: Low-resource Speech Translation without Source Transcription

Interspeech2021, August 2021

Yao-Fei Cheng, Hung-Shin Lee, and Hsin-Min Wang

Hsin-Min Wang


The end-to-end architecture has made promising progress in speech translation (ST). However, the ST task is still challenging under low-resource conditions. Most ST models have shown unsatisfactory results, especially in the absence of word information from the source speech utterance. In this study, we survey methods to improve ST performance without using source transcription, and propose a learning framework that utilizes a language-independent universal phone recognizer. The framework is based on an attention-based sequence-to-sequence model, where the encoder generates the phonetic embeddings and phone-aware acoustic representations, and the decoder controls the fusion of the two embedding streams to produce the target token sequence. In addition to investigating different fusion strategies, we explore the specific usage of byte pair encoding (BPE), which compresses a phone sequence into a syllable-like segmented sequence. Due to the conversion of symbols, a segmented sequence represents not only pronunciation but also language-dependent information lacking in phones. Experiments conducted on the Fisher Spanish-English and Taigi-Mandarin drama corpora show that our method outperforms the conformer-based baseline, and the performance is close to that of the existing best method using source transcription.
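
The BPE step mentioned above can be sketched generically: repeatedly merge the most frequent adjacent pair of symbols, so that frequent phone pairs fuse into longer, syllable-like units. The code below is a minimal illustration of standard BPE merge learning applied to phone sequences, not the authors' implementation; the function names, the `_` joiner, and the toy phone sequences are assumptions made here.

```python
from collections import Counter

def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` in `seq` with one fused symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + "_" + seq[i + 1])  # "_" joiner is an arbitrary choice
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_bpe(sequences, num_merges):
    """Learn up to `num_merges` merges; return the merge list and the segmented sequences."""
    seqs = [list(s) for s in sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself to stay deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        seqs = [merge_pair(s, best) for s in seqs]
    return merges, seqs

# Toy phone sequences (hypothetical data, for illustration only).
merges, segmented = learn_bpe([["k", "a", "t", "a"], ["k", "a", "m", "i"]], num_merges=1)
```

With a single merge, the pair (`k`, `a`) is the most frequent, so both sequences begin with the fused unit `k_a`.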

iTARGEX analysis of yeast deletome reveals novel regulators of transcriptional buffering in S phase and protein turnover

Nucleic Acids Research, July 2021

Huang J.H., Liao, Y.R., Lin, T.C., Tsai, C.H., Lai, W.Y., Chou, Y.K., Leu, J.Y., Tsai, H.K.*, and Kao, C.F.*

Huai-Kuang Tsai


Integrating omics data with quantification of biological traits provides unparalleled opportunities for discovery of genetic regulators by in silico inference. However, current approaches to analyze genetic-perturbation screens are limited by their reliance on annotation libraries for prioritization of hits and subsequent targeted experimentation. Here, we present iTARGEX (identification of Trait-Associated Regulatory Genes via mixture regression using EXpectation maximization), an association framework with no requirement of a priori knowledge of gene function. After creating this tool, we used it to test associations between gene expression profiles and two biological traits in single-gene deletion budding yeast mutants, including transcription homeostasis during S phase and global protein turnover. For each trait, we discovered novel regulators without prior functional annotations. The functional effects of the novel candidates were then validated experimentally, providing solid evidence for their roles in the respective traits. Hence, we conclude that iTARGEX can reliably identify novel factors involved in given biological traits. As such, it is capable of converting genome-wide observations into causal gene function predictions. Further application of iTARGEX in other contexts is expected to facilitate the discovery of new regulators and provide observations for novel mechanistic hypotheses regarding different biological traits and phenotypes.

Scaled-YOLOv4: Scaling Cross Stage Partial Network

Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021

C. Y. Wang, Alexey Bochkovskiy, H. Y. Mark Liao

Chien-Yao Wang Hong-Yuan Mark Liao


We show that the YOLOv4 object detection neural network based on the CSP approach scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy. We propose a network scaling approach that modifies not only the depth, width, and resolution, but also the structure of the network. The YOLOv4-large model achieves state-of-the-art results: 55.4% AP (73.3% AP50) for the MS COCO dataset at a speed of 15 FPS on Tesla V100, while with test-time augmentation, YOLOv4-large achieves 55.8% AP (73.2 AP50). To the best of our knowledge, this is currently the highest accuracy on the COCO dataset among any published work. The YOLOv4-tiny model achieves 22.0% AP (42.0% AP50) at a speed of 443 FPS on RTX 2080Ti, while by using TensorRT, batch size = 4, and FP16 precision, YOLOv4-tiny achieves 1774 FPS.

Enabling Write-reduction Multiversion Scheme with Efficient Dual-range Query over NVRAM

IEEE Transactions on Very Large Scale Integration Systems (TVLSI), June 2021

I-Ju Wang, Yu-Pei Liang, Tseng-Yi Chen, Yuan-Hao Chang, Bo-Jun Chen, Hsin-Wen Wei, and Wei-Kuan Shih

Tseng-Yi Chen Yuan-Hao Chang


Due to cyber-physical systems, a large-scale multiversion indexing scheme has garnered significant attention in recent years. However, modern multiversion indexing schemes have significant drawbacks (e.g., heavy write traffic and weak key- or version-range-query performance) while being applied to a computer system with a nonvolatile random access memory (NVRAM) as its main memory. Unfortunately, with the considerations of high memory cell density and zero-static power consumption, NVRAM has been regarded as a promising candidate to substitute for dynamic random access memory (DRAM) in future computer systems. Therefore, it is critical to make a multiversion indexing scheme friendly for an NVRAM-based system. For tackling this issue with modern multiversion indexing schemes, this article proposes a write-reduction multiversion indexing scheme with efficient dual-range queries. According to the experiments, our scheme effectively reduces the amount of write traffic generated by the multiversion indexing scheme to NVRAM. It offers efficient dual-range queries by consolidating the proposed version forest and the multiversion tree.

On Minimizing Internal Data Migrations of Flash Devices via Lifetime-Retention Harmonization

IEEE Transactions on Computers (TC), March 2021

Ming-Chang Yang, Chun-Feng Wu, Shuo-Han Chen, Yi-Ling Lin, Che-Wei Chang, and Yuan-Hao Chang

Ming-Chang Yang Chun-Feng Wu Shuo-Han Chen Yuan-Hao Chang


With the emergence of high-density triple-level-cell (TLC) and 3D NAND flash, the access performance and endurance of flash devices are degraded due to the downscaling of flash cells. In addition, we observe that the mismatch between data lifetime requirement and flash block retention capability could further worsen the access performance and endurance. This is because the “lifetime-retention mismatch” could result in massive internal data migrations during garbage collection and data refreshing, and further aggravate the already-worsened access performance and endurance of high-density NAND flash devices. Such an observation motivates us to resolve the lifetime-retention mismatch problem by proposing a “time harmonization strategy”, which coordinates the flash block retention capability with the data lifetime requirement to enhance the performance of flash devices with very limited endurance degradation. Specifically, this study aims to lower the amount of internal data migrations caused by garbage collection and data refreshing by storing data with different lifetime requirements in flash blocks with suitable retention capability. The trace-driven evaluation results reveal that the proposed design can effectively reduce the average response time by about 99 percent without sacrificing the overall endurance, as compared with the state-of-the-art designs.

Optimizing Lifetime Capacity and Read Performance of Bit-Alterable 3D NAND Flash

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), February 2021

Shuo-Han Chen, Ming-Chang Yang, and Yuan-Hao Chang

Shuo-Han Chen Ming-Chang Yang Yuan-Hao Chang


With the technological advance of bit-alterable 3-D NAND flash, bit-level program and erase operations have been realized and provide the possibility of “bit-level rewrite.” Bit-level rewrite is predicted to be highly beneficial to the performance of the densely packed, bit-error-prone 3-D NAND flash because bit-level rewrites can remove error bits at bit-level granularity, shorten the error correction latency, and boost the read performance. In particular, bit-level rewrite can curtail the lifetime expense of refresh operations by correcting the error bit stored in the individual flash cell directly without a full-page rewrite, which is employed by previous refresh techniques. However, because bit-level rewrite is predicted to have similar latency and wearing as conventional full-page rewrites, the throughput of bit-level rewrites needs to be examined to avoid low rewrite efficiency. This observation inspires us to investigate and propose the bit-level error removal (BER) scheme to utilize the bit-level rewrites for optimizing both the read performance and lifetime capacity in the most efficient way. The experimental results are encouraging and show that the read performance can be improved by an average of 25.22% with a 40.39% reduction of lifetime expense.

A data-independent acquisition-based global phosphoproteomics system enables deep profiling

Nature Communications, May 2021

Reta Birhanu Kitata, Wai-Kok Choong, Chia-Feng Tsai, Pei-Yi Lin, Bo-Shiun Chen, Yun-Chien Chang, Alexey I. Nesvizhskii, Ting-Yi Sung and Yu-Ju Chen

Wai-Kok Choong Ting-Yi Sung


Phosphoproteomics can provide insights into cellular signaling dynamics. To achieve deep and robust quantitative phosphoproteomics profiling for minute amounts of sample, we here develop a global phosphoproteomics strategy based on data-independent acquisition (DIA) mass spectrometry and hybrid spectral libraries derived from data-dependent acquisition (DDA) and DIA data. Benchmarking the method using 166 synthetic phosphopeptides shows high sensitivity (<0.1 ng), accurate site localization and reproducible quantification (~5% median coefficient of variation). As a proof-of-concept, we use lung cancer cell lines and patient-derived tissue to construct a hybrid phosphoproteome spectral library covering 159,524 phosphopeptides (88,107 phosphosites). Based on this library, our single-shot streamlined DIA workflow quantifies 36,350 phosphosites (19,755 class 1) in cell line samples within two hours. Application to drug-resistant cells and patient-derived lung cancer tissues delineates site-specific phosphorylation events associated with resistance and tumor progression, showing that our workflow enables the characterization of phosphorylation signaling with deep coverage, high sensitivity and low between-run missing values.

Null Space Component Analysis of One-Shot Single-Channel Source Separation Problem

IEEE Transactions on Signal Processing, To Appear

Wen-Liang Hwang and Jinn Ho

Wen-Liang Hwang Jinn Ho


Extracting multiple unknown sources from a single observation of a single channel is an ill-posed problem encountered in a variety of applications. This paper characterizes the ambiguity of solutions to the source separation problem, and then proposes a novel adaptive-operator-based approach to deriving solutions based on a combination of separation operators and domain-specific knowledge related to sources. The proposed scheme involves transforming the original problem into a new problem, in which data-dependent operators and the unknown sources are variables to be optimized. We demonstrate that a solution to the proposed optimization problem must reside in the null spaces of the operators, and any such solution also provides an optimal value to the original problem. We then demonstrate the applicability of the proposed method to the separation of sparse sources as well as AM-FM sources. The proposed scheme outperformed corresponding state-of-the-art methods in noiseless as well as noisy environments. Finally, we demonstrate the efficacy of the proposed scheme in separation tasks based on real-world ECG data (i.e., extracting fetal ECG signals from noisy observations in which maternal and fetal ECG recordings are superimposed) and electrical data (i.e., separating singularities from harmonic components in an observation of noisy data related to surges in electrical current).

Knowledge Based Hyperbolic Propagation

in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), July 2021

Chang-You Tai, Chienkun Huang, Liangying Huang, Lun-Wei Ku

Chang-You Tai Chien-Kun Huang Lun-Wei Ku


There has been significant progress in utilizing heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems. However, existing KG-aware recommendation models rely solely on Euclidean space, neglecting hyperbolic space, which has already been shown to possess a superior ability to separate embeddings by providing more “room”. We propose a knowledge based hyperbolic propagation framework (KBHP) which includes hyperbolic components for calculating the importance of KG attributes' relatives to achieve better knowledge propagation. In addition to the original relations in the knowledge graph, we propose a user purchase relation to better represent logical patterns in hyperbolic space, which bridges users and items for modeling user preference. Experiments on four real-world benchmarks show that KBHP is significantly more accurate than state-of-the-art models. We further visualize the generated embeddings to demonstrate that the proposed model successfully clusters attributes that are relevant to items and highlights those that contain useful information for recommendation.
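
To make the remark about extra room in hyperbolic space concrete, the sketch below compares Euclidean distance with distance in the Poincaré ball. The abstract does not state which model of hyperbolic space KBHP uses; the Poincaré ball is assumed here only because it is a common choice, and the function names are hypothetical.

```python
import math


def euclidean_distance(u, v):
    """Ordinary Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def poincare_distance(u, v):
    """Distance between two points of the open unit ball (Poincare model).

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Two points near the boundary of the ball: close together in Euclidean
# terms, but far apart in hyperbolic terms.
p, q = [0.9, 0.0], [0.0, 0.9]
flat = euclidean_distance(p, q)
curved = poincare_distance(p, q)
```

Under this assumed model, distances grow rapidly toward the boundary of the ball, which is one way to picture how embeddings can be separated more easily than in Euclidean space.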

User-Centric Path Reasoning towards Explainable Recommendation

in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), July 2021

Chang-You Tai, Liangying Huang, Chienkun Huang, Lun-Wei Ku

Chang-You Tai Chien-Kun Huang Lun-Wei Ku


There has been significant progress in the utilization of heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems. Reasoning over KG paths sheds light on the user's decision-making process. Previous methods focus on formulating this process as a multi-hop reasoning problem. However, without some form of guidance in the reasoning process, such a huge search space results in poor accuracy and little explanation diversity. In this paper, we propose UCPR, a user-centric path reasoning network that constantly guides the search from the aspect of user demand and enables explainable recommendation. In this network, a multi-view structure leverages not only local sequence reasoning information but also a panoramic view of the user's demand portfolio while inferring subsequent user decision-making steps. Experiments on five real-world benchmarks show UCPR is significantly more accurate than state-of-the-art methods. In addition, we show that the proposed model successfully identifies users' concerns and increases reasoning diversity to enhance explainability.

Beyond Fair Pay: Ethical Implications of Crowdsourcing NLP Task

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2021), June 2021

Boaz Shmueli, Jan Fell, Soumya Ray, Lun-Wei Ku

Boaz Shmueli Lun-Wei Ku


The use of crowdworkers in NLP research is growing rapidly, in tandem with the exponential increase in research production in machine learning and AI. Ethical discussion regarding the use of crowdworkers within the NLP research community is typically confined in scope to issues related to labor conditions, such as fair pay. We draw attention to the lack of risk mitigation related to the various tasks performed by workers, including data labeling, text evaluation, and text production. We find that the Final Rule, the common ethical framework used by researchers, did not anticipate the use of online crowdsourcing platforms for data collection, and this results in potential gaps between the spirit and practice of human-subjects ethics in NLP research. We enumerate common scenarios where crowdworkers performing NLP tasks are at risk of harm. We thus recommend that researchers evaluate these risks by considering the three ethical principles set up by the Belmont Report. We also clarify some common misconceptions regarding the Institutional Review Board review process. We hope this paper will serve to reopen the discussion within our community regarding the ethical use of crowdworkers.

End-to-end Recurrent Cross-Modality Attention for Video Dialogue

IEEE/ACM Transactions on Audio, Speech and Language Processing, March 2021

Yun-Wei Chu, Kuan-Yen Lin, Chao-Chun Hsu, Lun-Wei Ku

Yun-Wei Chu Lun-Wei Ku


Visual dialogue systems need to understand dynamic visual scenes and comprehend semantics in order to converse with users. Constructing video dialogue systems is more challenging than constructing traditional image dialogue systems because the large feature space of videos makes it difficult to capture semantic information. Furthermore, the dialogue system also needs to precisely answer users’ questions based on a comprehensive understanding of the videos and the previous dialogue. To improve the performance of video dialogue systems, we propose an end-to-end recurrent cross-modality attention (ReCMA) model to answer a series of questions about a video from both the visual and textual modalities. The answer representation of the question is updated based on both the visual representation and the textual representation in each step of the reasoning process, so that information from both modalities is better understood. We evaluate our method on the challenging DSTC7 video scene-aware dialog dataset, and the proposed ReCMA achieves a relative 20.8% improvement over the baseline on CIDEr.