Institute of Information Science, Academia Sinica




Learning to Visualize Music Through Shot Sequence for Automatic Concert Video Mashup

IEEE Transactions on Multimedia, To Appear

W. L. Wei, J. C. Lin, T. L. Liu, H. R. Tyan, H. M. Wang, and H. Y. Mark Liao

Jen-Chun Lin Tyng-Luh Liu Hsin-Min Wang Hong-Yuan Mark Liao

An experienced director usually switches among different types of shots to make visual storytelling more touching. When filming a musical performance, appropriate shot switching can produce special effects, such as enhancing the expression of emotion or heating up the atmosphere. However, while this visual storytelling technique is often used in professional recordings of live concerts, amateur recordings made by audience members at the same event often lack such storytelling concepts and skills. Thus, a versatile system that can perform video mashup to create a refined, high-quality video from such amateur clips is desirable. To this end, we aim to translate music into an attractive shot (type) sequence by learning the relation between music and the visual storytelling of shots. The resulting shot sequence can then be used to better portray the visual storytelling of a song and guide the concert video mashup process. To achieve this, we first introduce a novel probabilistic fusion approach, named multi-resolution fused recurrent neural networks (MF-RNNs) with film-language, which integrates multi-resolution fused RNNs and a film-language model to boost translation performance. We then distill the knowledge in MF-RNNs with film-language into a lightweight RNN, which is more efficient and easier to deploy. The results of objective and subjective experiments demonstrate that both MF-RNNs with film-language and the lightweight RNN can generate attractive shot sequences for music, thereby enhancing the viewing and listening experience.

Temporally Guided Music-to-Body-Movement Generation

ACM International Conference on Multimedia (ACM MM), October 2020

Hsuan-Kai Kao and Li Su

Li Su

This paper presents a neural network model that generates a virtual violinist’s 3-D skeleton movements from music audio. Improving on the conventional recurrent neural network models used in previous works to generate 2-D skeleton data, the proposed model incorporates an encoder-decoder architecture as well as a self-attention mechanism to model the complicated dynamics in body movement sequences. To facilitate the optimization of the self-attention model, beat tracking is applied to determine effective sizes and boundaries of the training examples. The decoder is accompanied by a refining network and a bowing attack inference mechanism to emphasize the right-hand behavior and bowing attack timing. Both objective and subjective evaluations reveal that the proposed model outperforms the state-of-the-art methods. To the best of our knowledge, this work represents the first attempt to generate 3-D violinists’ body movements considering key features in musical body movement.

Learning From Music to Visual Storytelling of Shots: A Deep Interactive Learning Mechanism

ACM Multimedia Conference, October 2020

Jen-Chun Lin, Wen-Li Wei, Yen-Yu Lin, Tyng-Luh Liu, and Hong-Yuan Mark Liao

Jen-Chun Lin Tyng-Luh Liu Hong-Yuan Mark Liao

Learning from music to visual storytelling of shots is an interesting and emerging task. It produces a coherent visual story in the form of a shot type sequence, which not only expands the storytelling potential of a song but also facilitates the automatic concert video mashup process and storyboard generation. In this study, we present a deep interactive learning (DIL) mechanism for building a compact yet accurate sequence-to-sequence model to accomplish the task. Unlike the one-way transfer between a pre-trained teacher network (or ensemble network) and a student network in knowledge distillation (KD), the proposed method enables collaborative learning between an ensemble teacher network and a student network. Namely, the student network also teaches. Specifically, our method first learns a teacher network that is composed of several assistant networks to generate a shot type sequence and produce the soft target (shot types) distribution accordingly through KD. It then constructs the student network, which learns from both the ground truth label (hard target) and the soft target distribution to alleviate the difficulty of optimization and improve generalization capability. As the student network gradually advances, it in turn feeds knowledge back to the assistant networks, thereby improving the teacher network in each iteration. Owing to this interactive design, the DIL mechanism bridges the gap between the teacher and student networks and yields superior capability for both networks. Objective and subjective experimental results demonstrate that both the teacher and student networks can generate more attractive shot sequences from music, thereby enhancing the viewing and listening experience.
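The idea of a student learning from both a hard target and a soft target distribution can be illustrated with a minimal sketch. This is not the loss used in the paper, whose exact form the abstract does not give; the function names, the weighting factor `alpha`, and the example values are assumptions made for illustration, and the feedback step from the student to the assistant networks is not modeled here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def student_loss(student_logits: List[float],
                 teacher_probs: List[float],
                 true_index: int,
                 alpha: float = 0.5) -> float:
    """Weighted sum of two cross-entropy terms (alpha is a hypothetical weight).

    hard term: cross-entropy against the ground-truth shot type
    soft term: cross-entropy against the teacher's soft target distribution
    """
    probs = softmax(student_logits)
    hard = -math.log(probs[true_index])
    soft = -sum(t * math.log(p) for t, p in zip(teacher_probs, probs))
    return alpha * hard + (1.0 - alpha) * soft


# Hypothetical example with three shot types; the teacher favors type 0.
teacher = [0.9, 0.05, 0.05]
agreeing = student_loss([5.0, 0.0, 0.0], teacher, true_index=0)
disagreeing = student_loss([0.0, 5.0, 0.0], teacher, true_index=0)
print(agreeing < disagreeing)
```

A student whose prediction agrees with both the label and the teacher receives a lower loss than one that disagrees with both, which is the behavior the combined objective is meant to encourage.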

DeepPrefetcher: A Deep Learning Framework for Data Prefetching in Flash Storage Devices

ACM/IEEE International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), September 2020

Gaddisa Olani Ganfure, Chun-Feng Wu, Yuan-Hao Chang, and Wei-Kuan Shih

Gaddisa Olani Ganfure Chun-Feng Wu Yuan-Hao Chang

In today’s data-driven world, applications’ accesses to storage devices account for a large share of the cost of processing a user request. Data prefetching is a technique used to alleviate storage access latency by predicting future data accesses and initiating a data fetch in advance. However, the block access requests received by the storage device show poor spatial locality because most file-related locality is absorbed in the higher layers of the memory hierarchy, including the CPU cache and main memory. In addition, the use of multithreading strategies in today’s applications typically leads to interleaved block accesses, which makes detecting an access pattern at the storage level very challenging for existing prefetching techniques. To address this, we propose and assess DeepPrefetcher, a novel deep-neural-network-inspired, context-aware prefetching method that adapts to arbitrary memory access patterns. Under DeepPrefetcher, we capture block access pattern contexts using a distributed representation and leverage the Long Short-Term Memory (LSTM) neural architecture for context-aware prediction to improve the effectiveness of data prefetching. Instead of using the logical block address (LBA) value directly, we model the difference between successive access requests, which exposes more patterns than the raw LBA values. By targeting the access pattern sequence in this manner, DeepPrefetcher can learn the vital context from a long input LBA sequence and learn to predict both previously seen and unseen access patterns. The experimental results reveal that DeepPrefetcher increases average prefetch accuracy, coverage, and speedup by 21.5%, 19.5%, and 17.2%, respectively, compared with the baseline prefetching strategies. Overall, the proposed prefetching approach performs better than the conventional prefetching approaches studied on all benchmarks, and the results are encouraging.
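The point about modeling differences rather than raw LBA values can be seen in a small sketch. The helper name `lba_deltas` and the two request streams are hypothetical; the sketch shows only the preprocessing idea, not the paper's model.

```python
from typing import List


def lba_deltas(lbas: List[int]) -> List[int]:
    """Return the differences between successive logical block addresses."""
    return [b - a for a, b in zip(lbas, lbas[1:])]


# Two hypothetical request streams that touch different regions of the device
# but follow the same stride pattern.
stream_a = [1000, 1008, 1016, 1024]
stream_b = [52000, 52008, 52016, 52024]

deltas_a = lba_deltas(stream_a)
deltas_b = lba_deltas(stream_b)
print(deltas_a)              # [8, 8, 8]
print(deltas_a == deltas_b)  # True
```

Although the raw addresses of the two streams never coincide, their delta sequences are identical, so a sequence model trained on deltas can reuse what it learned from one region when it sees another.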

Index of Cancer-Associated Fibroblasts Is Superior to the Epithelial-Mesenchymal Transition Score in Prognosis Prediction

Cancers, July 2020

Ying-Chieh Ko, Ting-Yu Lai, Shu-Ching Hsu, Fu-Hui Wang, Sheng-Yao Su, Yu-Lian Chen, Min-Lung Tsai, Chung-Chun Wu, Jenn-Ren Hsiao, Jang-Yang Chang, Yi-Mi Wu, Dan R Robinson, Chung-Yen Lin, Su-Fang Lin

Chung-Yen Lin

In many solid tumors, tissue of the mesenchymal subtype is frequently associated with epithelial-mesenchymal transition (EMT), strong stromal infiltration, and poor prognosis. Emerging evidence from tumor ecosystem studies has revealed that the two main components of tumor stroma, namely, infiltrated immune cells and cancer-associated fibroblasts (CAFs), also express certain typical EMT genes and are not distinguishable from intrinsic tumor EMT where bulk tissue is concerned. Transcriptomic analysis of xenograft tissues provides a unique advantage in dissecting genes of tumor (human) or stroma (murine) origin. Using this analysis, we found that oral squamous cell carcinoma (OSCC) tumor cells with a high EMT score, the computed mesenchymal likelihood based on the expression signature of canonical EMT markers, are associated with elevated stromal content characterized by fibronectin 1 (Fn1) and transforming growth factor-β (Tgfβ) axis gene expression. In conjunction with a meta-analysis of these genes in clinical OSCC datasets, we further extracted a four-gene index, comprising FN1, TGFB2, TGFBR2, and TGFBI, as an indicator of CAF abundance. The CAF index is more powerful than the EMT score in predicting survival outcomes, not only for oral cancer but also for The Cancer Genome Atlas (TCGA) pan-cancer cohort comprising 9356 patients from 32 cancer subtypes. Collectively, our results suggest that a further distinction and integration of the EMT score with the CAF index will enhance prognosis prediction, thus paving the way for curative medicine in clinical oncology.

How to Cultivate a Green Decision Tree without Loss of Accuracy?

ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), August 2020

Tseng-Yi Chen, Yuan-Hao Chang, Ming-Chang Yang, and Huang-Wei Chen

Tseng-Yi Chen Yuan-Hao Chang Ming-Chang Yang

that has been widely applied to classification and regression problems in the machine learning field. To avoid underfitting, a decision tree algorithm stops growing its tree model only when the model is a fully-grown tree. However, a fully-grown tree results in an overfitting problem that reduces the accuracy of the decision tree. To resolve this dilemma, some post-pruning strategies have been proposed to reduce the model complexity of the fully-grown decision tree. Nevertheless, such a process is very energy-inefficient on a non-volatile-memory-based (NVM-based) system because NVM generally has high write costs (i.e., energy consumption and I/O latency). In other words, the nodes that will be pruned in the post-pruning process are redundant data. Such unnecessary data induce high write energy consumption and long I/O latency on NVM-based architectures, especially for low-power-oriented embedded systems. In order to establish a green decision tree (i.e., a tree model with minimized construction energy consumption), this study rethinks the pruning algorithm and proposes a duo-phase pruning framework, which can significantly decrease the energy consumption on NVM-based computing systems without loss of accuracy.

How to Cut Out Expired Data with Nearly Zero Overhead for Solid-State Drives

ACM/IEEE Design Automation Conference (DAC), July 2020

Wei-Lin Wang, Tseng-Yi Chen, Yuan-Hao Chang, Hsin-Wen Wei, and Wei-Kuan Shih

Wei-Lin Wang Tseng-Yi Chen Yuan-Hao Chang

Modern flash memory suffers from the large performance overhead caused by the garbage collection process. The most effective way to minimize garbage collection overhead is to lower the number of live pages in a flash block. However, current garbage collection strategies copy all live pages in a to-be-erased flash block to another flash block even though some live pages will no longer be accessed. This is because present flash translation layer (FTL) designs cannot distinguish disused data from valid pages. In other words, if written data carry lifetime information, the problem can be resolved. Fortunately, an emerging write technology, known as multi-streamed write technology, can bring additional information (e.g., data lifetime) from the host-side system to the flash memory device. Based on these observations, this work proposes a dual-time referencing FTL (DTR-FTL) design to deal with disused data and minimize the overhead of garbage collection by referring to data lifetime information and block retention time. Moreover, as the DTR-FTL can store written data in an appropriate flash block from the very beginning, flash lifespan is also considerably lengthened by the proposed design. According to the experimental results, the overhead of live-page copying is significantly reduced and the flash lifespan is greatly prolonged by the DTR-FTL.
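The benefit of lifetime information during garbage collection can be sketched as follows. This is only an illustration of the general idea, not the DTR-FTL design itself, whose details the abstract does not give; the `Page` fields, the `pages_to_copy` helper, and the timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    valid: bool       # still mapped by the FTL
    written_at: int   # hypothetical write timestamp
    lifetime: int     # hypothetical lifetime hint supplied by the host


def pages_to_copy(block: List[Page], now: int, lifetime_aware: bool) -> List[Page]:
    """Select the pages that must be migrated before the block is erased."""
    live = [p for p in block if p.valid]
    if not lifetime_aware:
        # Without lifetime information, every valid page has to be copied.
        return live
    # With lifetime information, valid pages whose data has expired are skipped.
    return [p for p in live if p.written_at + p.lifetime > now]


block = [
    Page(valid=True, written_at=0, lifetime=10),    # valid but expired at t=50
    Page(valid=True, written_at=0, lifetime=100),   # valid and still needed
    Page(valid=False, written_at=0, lifetime=100),  # already invalidated
]
print(len(pages_to_copy(block, now=50, lifetime_aware=False)))  # 2
print(len(pages_to_copy(block, now=50, lifetime_aware=True)))   # 1
```

In this toy block, lifetime awareness halves the number of pages copied, which is the kind of saving in live-page copying that the abstract refers to.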

DSTL: A Demand-based Shingled Translation Layer for Enabling Adaptive Address Mapping on SMR Drives

ACM Transactions on Embedded Computing Systems (TECS), July 2020

Yi-Jing Chuang, Shuo-Han Chen, Yuan-Hao Chang, Yu-Pei Liang, Hsin-Wen Wei, and Wei-Kuan Shih

Shuo-Han Chen Yuan-Hao Chang

Shingled magnetic recording (SMR) is regarded as a promising technology for resolving the areal density limitation of conventional magnetic recording hard disk drives. Among different types of SMR drives, drive-managed SMR (DM-SMR) requires no changes to the host software and is widely used in today’s consumer market. DM-SMR employs a shingled translation layer (STL) to hide its inherent sequential-write constraint from the host software and emulate the SMR drive as a block device by maintaining logical-to-physical block address mapping entries. However, because most existing STL designs do not simultaneously consider the access pattern and the data update frequency of incoming workloads, the mapping entries maintained within the STL cannot be effectively managed, thus inducing unnecessary performance overhead. To resolve the inefficiency of existing STL designs, this article proposes a demand-based STL (DSTL) that simultaneously considers the access pattern and update frequency of incoming data streams to enhance the access performance of DM-SMR. The proposed design was evaluated by a series of experiments, and the results show that the proposed DSTL can outperform other SMR management approaches by up to 86.69% in terms of read/write performance.

Proteogenomics of non-smoking lung cancer in East Asia delineates molecular signatures of pathogenesis and progression

Cell, July 2020

Yi-Ju Chen, Theodoros I Roumeliotis, Ya-Hsuan Chang, Ching-Tai Chen, Chia-Li Han*, Miao-Hsia Lin, Huei-Wen Chen, Gee-Chen Chang, Yih-Leong Chang, Chen-Tu Wu, Mong-Wei Lin, Min-Shu Hsieh, Yu-Tai Wang, Yet-Ran Chen, Inge Jonassen, Fatemeh Zamanzad Ghavidel, Ze-Shiang Lin, Kuen-Tyng Lin, Ching-Wen Chen, Pei-Yuan Sheu, Chen-Ting Hung, Ke-Chieh Huang, Hao-Chin Yang, Pei-Yi Lin, Ta-Chi Yen, Yi-Wei Lin, Jen-Hung Wang, Lovely Raghav, Chien-Yu Lin, Yan-Si Chen, Pei-Shan Wu, Chi-Ting Lai, Shao-Hsing Weng, Kang-Yi Su, Wei-Hung Chang, Pang-Yan Tsai, Ana I Robles, Henry Rodriguez, Yi-Jing Hsiao, Wen-Hsin Chang, Ting-Yi Sung*, Jin-Shing Chen*, Sung-Liang Yu*, Jyoti S Choudhary*, Hsuan-Yu Chen*, Pan-Chyr Yang*, and Yu-Ju Chen*

Ching-Tai Chen Jen-Hung Wang Ting-Yi Sung

Lung cancer in East Asia is characterized by a high percentage of never-smokers, early onset and predominant EGFR mutations. To illuminate the molecular phenotype of this demographically distinct disease, we performed a deep comprehensive proteogenomic study on a prospectively collected cohort in Taiwan, representing early stage, predominantly female, non-smoking lung adenocarcinoma. Integrated genomic, proteomic, and phosphoproteomic analysis delineated the demographically distinct molecular attributes and hallmarks of tumor progression. Mutational signature analysis revealed age- and gender-related mutagenesis mechanisms, characterized by high prevalence of APOBEC mutational signature in younger females and over-representation of environmental carcinogen-like mutational signatures in older females. A proteomics-informed classification distinguished the clinical characteristics of early stage patients with EGFR mutations. Furthermore, integrated protein network analysis revealed the cellular remodeling underpinning clinical trajectories and nominated candidate biomarkers for patient stratification and therapeutic intervention. This multi-omic molecular architecture may help develop strategies for management of early stage never-smoker lung adenocarcinoma.

MVIN: Learning multiview items for recommendation

The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), July 2020

Chang-You Tai, Meng-Ru Wu, Yun-Wei Chu, Shao-Yu Chu and Lun-Wei Ku

Chang-You Tai Meng-Ru Wu Yun-Wei Chu Lun-Wei Ku

Researchers have begun to utilize heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems to mitigate the cold start and sparsity issues. However, using a graph neural network (GNN) to capture information in a KG and apply it in recommendation systems (RS) remains problematic, as it cannot see each item’s properties from multiple perspectives. To address these issues, we propose the multi-view item network (MVIN), a GNN-based recommendation model that provides superior recommendations by describing items from a unique mixed view that combines user and entity angles. MVIN learns item representations from both the user view and the entity view. From the user view, user-oriented modules score and aggregate features to make recommendations from a personalized perspective constructed according to KG entities, which incorporates user click information. From the entity view, the mixing layer contrasts layer-wise GCN information to further obtain comprehensive features from internal entity-entity interactions in the KG. We evaluate MVIN on three real-world datasets: MovieLens-1M (ML-1M), LFM-1b 2015 (LFM-1b), and Amazon-Book (AZ-book). Results show that MVIN significantly outperforms state-of-the-art methods on these three datasets. In addition, from user-view cases, we find that MVIN indeed captures entities that attract users. Figures further illustrate that mixing layers in a heterogeneous KG play a vital role in neighborhood information aggregation.

Biomedical Named Entity Recognition and Linking Datasets: Survey and Our Recent Development

Briefings in Bioinformatics, July 2020

Ming-Siang Huang, Po-Ting Lai, Pei-Yen Lin, Yu-Ting You, Richard Tzong-Han Tsai and Wen-Lian Hsu

Wen-Lian Hsu

Natural language processing (NLP) is widely applied in biological domains to retrieve information from publications. Systems to address numerous applications exist, such as biomedical named entity recognition (BNER), named entity normalization (NEN) and protein–protein interaction extraction (PPIE). High-quality datasets can assist the development of robust and reliable systems; however, due to the endless applications and evolving techniques, the annotations of benchmark datasets may become outdated and inappropriate. In this study, we first review commonly used BNER datasets and their potential annotation problems such as inconsistency and low portability. Then, we introduce a revised version of the JNLPBA dataset that solves potential problems in the original and use state-of-the-art named entity recognition systems to evaluate its portability to different kinds of biomedical literature, including protein–protein interaction and biology events. Lastly, we introduce an ensembled biomedical entity dataset (EBED) by extending the revised JNLPBA dataset with PubMed Central full-text paragraphs, figure captions and patent abstracts. This EBED is a multi-task dataset that covers annotations including gene, disease and chemical entities. In total, it contains 85000 entity mentions, 25000 entity mentions with database identifiers and 5000 attribute tags. To demonstrate the usage of the EBED, we review the BNER track from the AI CUP Biomedical Paper Analysis challenge. Availability: The revised JNLPBA dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/Revised_JNLPBA.zip. The EBED dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/AICUP_EBED_dataset.rar. Contact: Email: thtsai@g.ncu.edu.tw, Tel. 886-3-4227151 ext. 35203, Fax: 886-3-422-2681; Email: hsu@iis.sinica.edu.tw, Tel. 886-2-2788-3799 ext. 2211, Fax: 886-2-2782-4814. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.

Request Flow Coordination for Growing-Scale Solid-State Drives

IEEE Transactions on Computers (TC), June 2020

Ming-Chang Yang, Yuan-Hao Chang, Tei-Wei Kuo, and Chun-Feng Wu

Ming-Chang Yang Yuan-Hao Chang Chun-Feng Wu

With the emergence of high-density triple-level-cell (TLC) and 3D NAND flash, the access performance and endurance of flash devices have degraded due to the downscaling of flash cells. In addition, we observe that the mismatch between the data lifetime requirement and the flash block retention capability could further worsen the access performance and endurance. This is because the "lifetime-retention mismatch" could result in massive internal data migrations during garbage collection and data refreshing, and further aggravate the already-worsened access performance and endurance of high-density NAND flash devices. Such an observation motivates us to resolve the lifetime-retention mismatch problem by proposing a "time harmonization strategy", which coordinates the flash block retention capability with the data lifetime requirement to enhance the performance of flash devices with very limited endurance degradation. Specifically, this study aims to lower the amount of internal data migration caused by garbage collection and data refreshing by storing data with different lifetime requirements in flash blocks with suitable retention capability. The trace-driven evaluation results reveal that the proposed design can effectively reduce internal data migrations by about 33% on average with nearly no degradation of the overall endurance, as compared with the state-of-the-art designs.

Beyond Address Mapping: A User-Oriented Multi-Regional Space Management Design for 3D NAND Flash Memory

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), June 2020

Shuo-Han Chen, Che-Wei Tsao, and Yuan-Hao Chang

Shuo-Han Chen Che-Wei Tsao Yuan-Hao Chang

Due to the ever-growing demand for larger-capacity flash storage devices, various new manufacturing techniques have been proposed to provide high-density and large-capacity NAND flash devices. Among these new techniques, 3D NAND flash is regarded as one of the most promising candidates for next-generation flash storage devices. 3D NAND flash brings high bit density and significant cost savings by stacking memory cells vertically. However, the read/write and erase units of 3D NAND flash also grow larger than those of traditional planar flash devices. This growing trend of read/write and erase units for 3D NAND flash imposes significant management difficulties, such as the increased size of mapping information, decreased garbage collection efficiency, and a worsened write amplification issue. To alleviate these negative impacts of the growing read/write and erase units, this paper proposes a multi-regional space management design that achieves subpage-level management while adaptively adjusting mapping granularity by considering user behaviors. The proposed design was evaluated by a series of experiments, and the results show that the access performance can be improved by 64%.

Joint Management of CPU and NVDIMM for Breaking Down the Great Memory Wall

IEEE Transactions on Computers (TC), May 2020

Chun-Feng Wu, Yuan-Hao Chang, Ming-Chang Yang, and Tei-Wei Kuo

Chun-Feng Wu Yuan-Hao Chang Ming-Chang Yang

NVDIMM is a production-ready device that provides larger memory space at lower cost. However, directly using NVDIMM as main memory would seriously degrade system performance because of the "great memory wall": in NVDIMM, the slow memory (e.g., flash memory) is several orders of magnitude slower than the fast memory (e.g., DRAM). In this paper, we present a joint management framework of host/CPU and NVDIMM to break down the great memory wall by bridging the process information gap between host/CPU and NVDIMM. In this framework, a page semantic-aware strategy is proposed to precisely predict, mark, and relocate data or memory pages to the fast memory in advance by exploiting the process access patterns, so that the frequency of slow memory accesses can be further reduced. The proposed framework with the proposed strategy was evaluated with several well-known benchmarks, and the results are encouraging.

YOLOv4: Optimal Speed and Accuracy of Object Detection

arXiv:2004.10934v1, April 2020

Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao

Chien-Yao Wang Hong-Yuan Mark Liao

There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a real-time speed of ∼65 FPS on Tesla V100. Source code is at https://github.com/AlexeyAB/darknet.

MinProtMaxVP: Generating a minimized number of protein variant sequences containing all possible variant peptides for proteogenomic analysis

Journal of Proteomics, July 2020

Wai-Kok Choong, Jen-Hung Wang, Ting-Yi Sung

Wai-Kok Choong Jen-Hung Wang Ting-Yi Sung

Identifying single-amino-acid variants (SAVs) from mass spectrometry-based experiments is critical for validating single-nucleotide variants (SNVs) at the protein level to facilitate biomedical research. Currently, two approaches are usually applied to convert SNV annotations into SAV-harboring protein sequences. One approach generates one sequence containing exactly one SAV, and the other all SAVs. However, they may neglect the possibility of SAV combinations, e.g., haplotypes, existing in bio-samples. Therefore, it is necessary to consider all SAV combinations of a protein when generating SAV-harboring protein sequences. In this paper, we propose MinProtMaxVP, a novel approach which selects a minimized number of SAV-harboring protein sequences generated from the exhaustive approach, while still accommodating all possible variant peptides, by solving a classic set covering problem. Our study on known haplotype variations of TAS2R38 justifies the necessity for MinProtMaxVP to consider all combinations of SAVs. The performance of MinProtMaxVP is demonstrated by an in silico study on OR2T27 with five SAVs and real experimental data of the HEK293 cell line. Furthermore, assuming simulated somatic and germline variants of OR2T27 in tumor and normal tissues demonstrates that when adopting the appropriate somatic and germline SAV integration strategy, MinProtMaxVP is adaptable to labeling and label-free mass spectrometry-based experiments.
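The set covering formulation mentioned above can be illustrated with a short sketch. The greedy heuristic used here is a standard approximation for set cover and is not necessarily the solver used by MinProtMaxVP; the sequence and peptide names are hypothetical placeholders.

```python
from typing import Dict, List, Set


def greedy_set_cover(candidates: Dict[str, Set[str]]) -> List[str]:
    """Pick candidate sequences until every variant peptide is covered.

    candidates maps a sequence name to the set of variant peptides it contains.
    The greedy rule repeatedly takes the sequence covering the most peptides
    that are still uncovered; ties are broken by sequence name.
    """
    uncovered: Set[str] = set().union(*candidates.values())
    chosen: List[str] = []
    while uncovered:
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical SAV-harboring sequences and the variant peptides they yield.
candidates = {
    "seq1": {"pepA", "pepB"},
    "seq2": {"pepB", "pepC"},
    "seq3": {"pepA", "pepB", "pepC"},
    "seq4": {"pepD"},
}
print(greedy_set_cover(candidates))  # ['seq3', 'seq4']
```

Here two of the four candidate sequences suffice to cover all four peptides, which mirrors the goal of keeping the sequence database small while retaining every possible variant peptide.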

CSPNet: A New Backbone that can Enhance Learning Capability of CNN

IEEE International Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) on "Low Power Computer Vision", June 2020

C. Y. Wang, H. Y. Mark Liao, Y. H. Wu, P. Y. Chen, J. W. Hsieh, and I. H. Yeh

Chien-Yao Wang Hong-Yuan Mark Liao

Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection. However, such success greatly relies on costly computation resources, which hinders people with cheap devices from appreciating the advanced technology. In this paper, we propose Cross Stage Partial Network (CSPNet) to mitigate the problem that previous works require heavy inference computations from the network architecture perspective. We attribute the problem to the duplicate gradient information within network optimization. The proposed networks respect the variability of the gradients by integrating feature maps from the beginning and the end of a network stage, which, in our experiments, reduces computations by 20% with equivalent or even superior accuracy on the ImageNet dataset, and significantly outperforms state-of-the-art approaches in terms of AP50 on the MS COCO object detection dataset. The CSPNet is easy to implement and general enough to cope with architectures based on ResNet, ResNeXt, and DenseNet.

Distributed Compressive Sensing: Performance Analysis with Diverse Signal Ensembles

IEEE Trans. on Signal Processing, June 2020

Sung-Hsien Hsieh, Wei-Jie Liang, Chun-Shien Lu, and Soo-Chang Pei

Chun-Shien Lu

Distributed compressive sensing (DCS) is a framework that considers joint sparsity within signal ensembles along with multiple measurement vectors (MMVs). However, current theoretical bounds of the probability of perfect recovery for MMVs are derived to be essentially identical to that of a single MV (SMV); this is because characteristics of the signal ensemble are ignored. In this paper, we introduce two key ingredients, called "Euclidean distances between signals" and "decay rate of signal ensemble," to conduct a performance analysis of a deterministic signal model under the MMVs framework. We show that, by taking the size of signal ensembles into consideration, MMVs indeed exhibit better performance than SMV. Although our extension can be broadly applied to CS algorithms with MMVs, a case study conducted on a greedy solver, which is commonly known as simultaneous orthogonal matching pursuit (SOMP), will be explored in this paper. When incorporated with our concept by modifying the steps of support detection and signal estimation, we show that the performance of SOMP will be improved to a meaningful extent, especially for short Euclidean distances between signals. Performance of the modified SOMP is verified to meet our theoretical prediction. Moreover, we design a new method based on modified SOMP algorithms for a key application known as cooperative spectrum sensing (CSS). The simulation results demonstrate that our method can benefit from more than one measurement vector, especially when the length of the measurement vectors is smaller than the sparsity of the signals, which is where traditional CS algorithms fail.

Difference-Seeking Generative Adversarial Network--Unseen Sample Generation

International Conference on Learning Representations (ICLR), April 2020

Yi-Lin Sung, Sung-Hsien Hsieh, Soo-Chang Pei, and Chun-Shien Lu

Chun-Shien Lu

Unseen data, which are not samples from the distribution of the training data and are difficult to collect, have proven important in numerous applications (e.g., novelty detection, semi-supervised learning, and adversarial training). In this paper, we introduce a general framework called the difference-seeking generative adversarial network (DSGAN) to generate various types of unseen data. Its novelty is the consideration of the probability density of the unseen data distribution as the difference between two distributions, $p_{\bar{d}}$ and $p_{d}$, whose samples are relatively easy to collect. The DSGAN can learn the target distribution $p_{t}$ (i.e., the unseen data distribution) from only the samples from the two distributions $p_{d}$ and $p_{\bar{d}}$. In our scenario, $p_{d}$ is the distribution of the seen data, and $p_{\bar{d}}$ can be obtained from $p_{d}$ via simple operations, so that we only need the samples of $p_{d}$ during training. Two key applications, semi-supervised learning and novelty detection, are taken as case studies to illustrate that the DSGAN enables the production of various unseen data. We also provide theoretical analyses of the convergence of the DSGAN.

GSAlign: an efficient sequence alignment tool for intra-species genomes

BMC Genomics, February 2020

Hsin-Nan Lin, Wen-Lian Hsu

Hsin-Nan Lin Wen-Lian Hsu

Background: Personal genomics and comparative genomics are becoming more important in clinical practice and genome research. Both fields require sequence alignment to discover sequence conservation and variation. Though many methods have been developed, some are designed for small genome comparison while others are not efficient for large genome comparison. Moreover, the correctness of the sequence alignments produced by most existing genome comparison tools has not been evaluated systematically. A wrong sequence alignment would produce false sequence variants. Results: In this study, we present GSAlign, an efficient sequence alignment tool for intra-species genomes that handles large genome sequence alignment efficiently and identifies sequence variants from the alignment result. We estimate performance by measuring the correctness of predicted sequence variations. The experimental results demonstrate that GSAlign is not only faster than most existing state-of-the-art methods, but also identifies sequence variants with high accuracy. Conclusions: As more genome sequences become available, the demand for genome comparison is increasing. Therefore, an efficient and robust algorithm is most desirable. We believe GSAlign can be a useful tool. It exhibits ultra-fast alignment as well as high accuracy and sensitivity in detecting sequence variations.

Inferring the transcriptional regulatory mechanism of signal-dependent gene expression via an integrative computational approach

FEBS Letters, February 2020

Chiang, S., Shinohara, H., Huang, J.H., Tsai, H. K., and Okada, M.

Huai-Kuang Tsai

Eukaryotic transcription factors (TFs) coordinate different upstream signals to regulate their target genes. To unveil this network regulation in B cell receptor signaling, we developed a computational pipeline to systematically analyze the ERK- and IKK-dependent transcriptome response. We combined a linear regression method and kinetic modeling to identify the signal-to-TF and TF-to-gene dynamics, respectively, from time-course experimental data. We show that the combination of TFs differentially controlled by ERK and IKK could contribute to divergent expression dynamics in orchestrating the B cell response. Our findings elucidate the regulatory mechanism of the signal-dependent gene expression responsible for eukaryotic cell development.