Institute of Information Science
Recent Research Results
Current Research Results
Authors: Andreas Heinecke, Jinn Ho, Wen-Liang Hwang

Abstract:
We construct a highly regular and simply structured class of sparsely connected convolutional neural networks with rectifier activations that provide universal function approximation in a coarse-to-fine manner with an increasing number of layers. The networks are localized in the sense that local changes in the function to be approximated only require local changes in the final layer of weights. At the core of the construction lies the fact that the characteristic function can be derived from a convolution of characteristic functions at the next coarser resolution by passing it through a rectifier. The latter refinement result holds for all higher order univariate B-splines.
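The refinement idea can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions, not necessarily the construction used in the paper: it takes the characteristic function of [0, 2) as the coarse-scale function, combines two of its integer translates with the two-tap filter (1, -1), and applies a rectifier, which yields the characteristic function of [0, 1). All function names are hypothetical.

```python
def relu(t):
    """Rectifier activation."""
    return max(0.0, t)


def chi_coarse(x):
    """Characteristic function of the coarse interval [0, 2)."""
    return 1.0 if 0.0 <= x < 2.0 else 0.0


def chi_fine(x):
    """Characteristic function of [0, 1), built from coarse translates.

    chi_coarse(x) - chi_coarse(x - 1) equals +1 on [0, 1), 0 on [1, 2)
    and -1 on [2, 3); the rectifier removes the negative part.
    """
    return relu(chi_coarse(x) - chi_coarse(x - 1.0))


samples = [-0.5, 0.0, 0.25, 0.99, 1.0, 1.5, 2.0, 2.5, 3.0]
values = [chi_fine(x) for x in samples]
```

Only the samples inside [0, 1) evaluate to 1; every other sample evaluates to 0.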
Authors: Wen-Li Wei, Jen-Chun Lin, Tyng-Luh Liu, Hsiao-Rong Tyan, Hsin-Min Wang, and Hong-Yuan Mark Liao

Abstract:
An experienced director usually switches among different types of shots to make visual storytelling more touching. When filming a musical performance, appropriate shot switching can produce special effects, such as enhancing the expression of emotion or heating up the atmosphere. However, while this visual storytelling technique is often used in professional recordings of live concerts, amateur recordings made by audience members often lack such storytelling concepts and skills when filming the same event. Thus, a versatile system that can perform video mashup to create a refined, high-quality video from such amateur clips is desirable. To this end, we aim to translate music into an attractive shot (type) sequence by learning the relation between music and the visual storytelling of shots. The resulting shot sequence can then be used to better portray the visual storytelling of a song and to guide the concert video mashup process. To achieve this, we first introduce a novel probabilistic fusion approach, named multi-resolution fused recurrent neural networks (MF-RNNs) with film-language, which integrates multi-resolution fused RNNs and a film-language model to boost translation performance. We then distill the knowledge in MF-RNNs with film-language into a lightweight RNN, which is more efficient and easier to deploy. The results of objective and subjective experiments demonstrate that both MF-RNNs with film-language and the lightweight RNN can generate attractive shot sequences for music, thereby enhancing the viewing and listening experience.
"MVIN: Learning multiview items for recommendation," the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), July 2020.
Authors: Chang-You Tai, Meng-Ru Wu, Yun-Wei Chu, Shao-Yu Chu and Lun-Wei Ku

Abstract:
Researchers have begun to utilize heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems to mitigate the cold start and sparsity issues. However, utilizing a graph neural network (GNN) to capture information in a KG and then apply it in a recommendation system (RS) is still problematic, as it is unable to see each item's properties from multiple perspectives. To address these issues, we propose the multi-view item network (MVIN), a GNN-based recommendation model which provides superior recommendations by describing items from a unique mixed view from user and entity angles. MVIN learns item representations from both the user view and the entity view. From the user view, user-oriented modules score and aggregate features to make recommendations from a personalized perspective constructed according to KG entities which incorporates user click information. From the entity view, the mixing layer contrasts layer-wise GCN information to further obtain comprehensive features from internal entity-entity interactions in the KG. We evaluate MVIN on three real-world datasets: MovieLens-1M (ML-1M), LFM-1b 2015 (LFM-1b), and Amazon-Book (AZ-book). Results show that MVIN significantly outperforms state-of-the-art methods on these three datasets. In addition, from user-view cases, we find that MVIN indeed captures entities that attract users. Figures further illustrate that mixing layers in a heterogeneous KG play a vital role in neighborhood information aggregation.
Authors: Ming-Siang Huang, Po-Ting Lai, Pei-Yen Lin, Yu-Ting You, Richard Tzong-Han Tsai and Wen-Lian Hsu

Abstract:
Natural language processing (NLP) is widely applied in biological domains to retrieve information from publications. Systems to address numerous applications exist, such as biomedical named entity recognition (BNER), named entity normalization (NEN) and protein–protein interaction extraction (PPIE). High-quality datasets can assist the development of robust and reliable systems; however, due to the endless applications and evolving techniques, the annotations of benchmark datasets may become outdated and inappropriate. In this study, we first review commonly used BNER datasets and their potential annotation problems such as inconsistency and low portability. Then, we introduce a revised version of the JNLPBA dataset that solves potential problems in the original and use state-of-the-art named entity recognition systems to evaluate its portability to different kinds of biomedical literature, including protein–protein interaction and biology events. Lastly, we introduce an ensembled biomedical entity dataset (EBED) by extending the revised JNLPBA dataset with PubMed Central full-text paragraphs, figure captions and patent abstracts. This EBED is a multi-task dataset that covers annotations including gene, disease and chemical entities. In total, it contains 85000 entity mentions, 25000 entity mentions with database identifiers and 5000 attribute tags. To demonstrate the usage of the EBED, we review the BNER track from the AI CUP Biomedical Paper Analysis challenge.
Availability: The revised JNLPBA dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/Revised_JNLPBA.zip. The EBED dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/AICUP_EBED_dataset.rar.
Contact: Email: thtsai@g.ncu.edu.tw, Tel. 886-3-4227151 ext. 35203, Fax: 886-3-422-2681; Email: hsu@iis.sinica.edu.tw, Tel. 886-2-2788-3799 ext. 2211, Fax: 886-2-2782-4814.
Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
"Request Flow Coordination for Growing-Scale Solid-State Drives," IEEE Transactions on Computers (TC), June 2020.
Authors: Ming-Chang Yang, Yuan-Hao Chang, Tei-Wei Kuo, and Chun-Feng Wu

Abstract:
With the emergence of high-density triple-level-cell (TLC) and 3D NAND flash, the access performance and endurance of flash devices are degraded due to the downscaling of flash cells. In addition, we observe that the mismatch between data lifetime requirement and flash block retention capability could further worsen the access performance and endurance. This is because the "lifetime-retention mismatch" could result in massive internal data migrations during garbage collection and data refreshing, and further aggravate the already-worsened access performance and endurance of high-density NAND flash devices. Such an observation motivates us to resolve the lifetime-retention mismatch problem by proposing a "time harmonization strategy", which coordinates the flash block retention capability with the data lifetime requirement to enhance the performance of flash devices with very limited endurance degradation. Specifically, this study aims to lower the amount of internal data migrations caused by garbage collection and data refreshing by storing data with different lifetime requirements in flash blocks with suitable retention capability. The trace-driven evaluation results reveal that the proposed design can effectively reduce the internal data migrations by about 33% on average with nearly no degradation of the overall endurance, as compared with the state-of-the-art designs.
"Beyond Address Mapping: A User-Oriented Multi-Regional Space Management Design for 3D NAND Flash Memory," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), June 2020.
Authors: Shuo-Han Chen, Che-Wei Tsao, and Yuan-Hao Chang

Abstract:
Due to the ever-growing demand for larger-capacity flash storage devices, various new manufacturing techniques have been proposed to provide high-density and large-capacity NAND flash devices. Among these new techniques, 3D NAND flash is regarded as one of the most promising candidates for the next-generation flash storage devices. 3D NAND flash brings high bit density and significant cost savings by stacking memory cells vertically. However, the read/write and erase units of 3D NAND flash also grow larger than those of traditional planar flash devices. This growing trend of read/write and erase units for 3D NAND flash imposes significant management difficulties, such as the increased size of mapping information, decreased garbage collection efficiency, and a worsened write amplification issue. To alleviate these negative impacts of the growing read/write and erase units, this paper proposes a multi-regional space management design to achieve subpage-level management while adaptively adjusting mapping granularity by considering user behaviors. The proposed design was evaluated by a series of experiments, and the results show that the access performance can be improved by 64%.
"Joint Management of CPU and NVDIMM for Breaking Down the Great Memory Wall," IEEE Transactions on Computers (TC), May 2020.
Authors: Chun-Feng Wu, Yuan-Hao Chang, Ming-Chang Yang, and Tei-Wei Kuo

Abstract:
NVDIMM is a production-ready device that provides larger memory space at lower cost. However, directly using NVDIMM as the main memory would seriously degrade system performance because of the "great memory wall": in NVDIMM, the slow memory (e.g., flash memory) is several orders of magnitude slower than the fast memory (e.g., DRAM). In this paper, we present a joint management framework of host/CPU and NVDIMM to break down the great memory wall by bridging the process information gap between host/CPU and NVDIMM. In this framework, a page semantic-aware strategy is proposed to precisely predict, mark, and relocate data or memory pages to the fast memory in advance by exploiting the process access patterns, so that the frequency of the slow memory accesses can be further reduced. The proposed framework with the proposed strategy was evaluated with several well-known benchmarks and the results are encouraging.
"YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv:2004.10934v1, April 2020.
Authors: Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao

Abstract:
There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a real-time speed of ∼65 FPS on Tesla V100. Source code is at https://github.com/AlexeyAB/darknet.
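The abstract names the Mish activation without defining it. As a minimal sketch, assuming the commonly cited definition mish(x) = x * tanh(softplus(x)) (the abstract itself does not state the formula), it can be written with the standard library alone. The helper names are illustrative.

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))
```

For large positive inputs the function behaves almost like the identity, and for large negative inputs it approaches zero from below, which is what distinguishes it from a plain rectifier.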
Authors: Wai-Kok Choong, Jen-Hung Wang, Ting-Yi Sung

Abstract:
Identifying single-amino-acid variants (SAVs) from mass spectrometry-based experiments is critical for validating single-nucleotide variants (SNVs) at the protein level to facilitate biomedical research. Currently, two approaches are usually applied to convert SNV annotations into SAV-harboring protein sequences. One approach generates a sequence containing exactly one SAV, and the other generates a sequence containing all SAVs. However, they may neglect the possibility of SAV combinations, e.g., haplotypes, existing in bio-samples. Therefore, it is necessary to consider all SAV combinations of a protein when generating SAV-harboring protein sequences. In this paper, we propose MinProtMaxVP, a novel approach which selects a minimized number of SAV-harboring protein sequences generated from the exhaustive approach, while still accommodating all possible variant peptides, by solving a classic set covering problem. Our study on known haplotype variations of TAS2R38 justifies the necessity for MinProtMaxVP to consider all combinations of SAVs. The performance of MinProtMaxVP is demonstrated by an in silico study on OR2T27 with five SAVs and real experimental data of the HEK293 cell line. Furthermore, a study assuming simulated somatic and germline variants of OR2T27 in tumor and normal tissues demonstrates that, when adopting the appropriate somatic and germline SAV integration strategy, MinProtMaxVP is adaptable to labeling and label-free mass spectrometry-based experiments.
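The selection step can be pictured as a set covering problem in which each candidate protein sequence covers the variant peptides it yields. The sketch below uses the standard greedy approximation, which is an assumption made for illustration; the abstract does not say which solver MinProtMaxVP uses, and the sequence and peptide names are made up.

```python
def greedy_set_cover(universe, candidates):
    """Pick candidate sets until every element of `universe` is covered.

    `candidates` maps a (hypothetical) sequence identifier to the set of
    variant peptides that sequence yields. Greedy selection is an
    approximation; it is not guaranteed to be minimal.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Take the candidate covering the most still-uncovered peptides;
        # ties are broken by identifier so the result is deterministic.
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("some peptides cannot be covered by any candidate")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Toy data: four haplotype-like sequences over two SAV sites.
candidates = {
    "seq_ref": {"pepA", "pepB"},
    "seq_sav1": {"pepA*", "pepB"},
    "seq_sav2": {"pepA", "pepB*"},
    "seq_sav1_sav2": {"pepA*", "pepB*"},
}
universe = set().union(*candidates.values())
selected = greedy_set_cover(universe, candidates)
```

In this toy case two of the four candidate sequences are enough to cover all four peptides, which is the kind of reduction the abstract describes.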
"CSPNet: A New Backbone that can Enhance Learning Capability of CNN," IEEE International Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) on "Low power computer vision", June 2020.
Authors: C. Y. Wang, H. Y. Mark Liao, Y. H. Wu, P. Y. Chen, J. W. Hsieh, and I. H. Yeh

Abstract:
Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection. However, such success greatly relies on costly computation resources, which hinders people with cheap devices from appreciating the advanced technology. In this paper, we propose Cross Stage Partial Network (CSPNet) to mitigate the problem that previous works require heavy inference computations from the network architecture perspective. We attribute the problem to the duplicate gradient information within network optimization. The proposed networks respect the variability of the gradients by integrating feature maps from the beginning and the end of a network stage, which, in our experiments, reduces computations by 20% with equivalent or even superior accuracy on the ImageNet dataset, and significantly outperforms state-of-the-art approaches in terms of AP50 on the MS COCO object detection dataset. The CSPNet is easy to implement and general enough to cope with architectures based on ResNet, ResNeXt, and DenseNet.
Authors: Sung-Hsien Hsieh, Wei-Jie Liang, Chun-Shien Lu, and Soo-Chang Pei

Abstract:
Distributed compressive sensing (DCS) is a framework that considers joint sparsity within signal ensembles along with multiple measurement vectors (MMVs).
However, current theoretical bounds of the probability of perfect recovery for MMVs are derived to be essentially identical to that of a single MV (SMV); this is because characteristics of the signal ensemble are ignored.
In this paper, we introduce two key ingredients, called "Euclidean distances between signals" and "decay rate of signal ensemble," to conduct a performance analysis of a deterministic signal model under the MMVs framework.
We show that, by taking the size of signal ensembles into consideration, MMVs indeed exhibit better performance than SMV.
Although our extension can be broadly applied to CS algorithms with MMVs, a case study conducted on a greedy solver, which is commonly known as simultaneous orthogonal matching pursuit (SOMP), will be explored in this paper.
When incorporated with our concept by modifying the steps of support detection and signal estimation, we show that the performance of SOMP will be improved to a meaningful extent, especially for short Euclidean distances between signals.
Performance of the modified SOMP is verified to meet our theoretical prediction.
Moreover, we design a new method based on modified SOMP algorithms for a key application known as cooperative spectrum sensing (CSS).
The simulation results demonstrate that our method can benefit from more than one measurement vector, especially when the length of the measurement vectors is smaller than the sparsity of the signals, which is where traditional CS algorithms fail.
"Difference-Seeking Generative Adversarial Network--Unseen Sample Generation," International Conference on Learning Representations (ICLR), April 2020.
Authors: Yi-Lin Sung, Sung-Hsien Hsieh, Soo-Chang Pei, and Chun-Shien Lu

Abstract:
Unseen data, which are not samples from the distribution of training data and are difficult to collect, are important in numerous applications (e.g., novelty detection, semi-supervised learning, and adversarial training). In this paper, we introduce a general framework called difference-seeking generative adversarial network (DSGAN) to generate various types of unseen data. Its novelty is the consideration of the probability density of the unseen data distribution as the difference between two distributions $p_{\bar{d}}$ and $p_{d}$ whose samples are relatively easy to collect.
The DSGAN can learn the target distribution $p_{t}$ (the unseen data distribution) from only the samples from the two distributions $p_{d}$ and $p_{\bar{d}}$. In our scenario, $p_{d}$ is the distribution of the seen data, and $p_{\bar{d}}$ can be obtained from $p_{d}$ via simple operations, so that we only need the samples of $p_{d}$ during training.
Two key applications, semi-supervised learning and novelty detection, are taken as case studies to illustrate that the DSGAN enables the production of various unseen data. We also provide theoretical analyses of the convergence of the DSGAN.
Authors: Hsin-Nan Lin, Wen-Lian Hsu

Abstract:
Background: Personal genomics and comparative genomics are becoming more important in clinical practice and genome research. Both fields require sequence alignment to discover sequence conservation and variation. Though many methods have been developed, some are designed for small genome comparison while others are not efficient for large genome comparison. Moreover, most existing genome comparison tools have not been systematically evaluated for the correctness of their sequence alignments. A wrong sequence alignment would produce false sequence variants. Results: In this study, we present GSAlign, an efficient sequence alignment tool for intra-species genomes that handles large genome sequence alignment efficiently and identifies sequence variants from the alignment result. We estimate performance by measuring the correctness of predicted sequence variations. The experimental results demonstrate that GSAlign is not only faster than most existing state-of-the-art methods, but also identifies sequence variants with high accuracy. Conclusions: As more genome sequences become available, the demand for genome comparison is increasing. Therefore, an efficient and robust algorithm is most desirable. We believe GSAlign can be a useful tool. It exhibits ultra-fast alignment as well as high accuracy and sensitivity for detecting sequence variations.
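To make "identifies sequence variants from the alignment result" concrete, the sketch below reads substitutions, insertions and deletions off a gapped pairwise alignment. It is a simplified illustration, not GSAlign's implementation: the input format (two equal-length strings with '-' for gaps), the function name and the reporting convention are all assumptions.

```python
def call_variants(ref_aln, qry_aln):
    """List differences between two aligned sequences of equal length.

    Gaps are written as '-'. Positions are 1-based reference coordinates;
    an insertion is reported at the last reference base before it
    (0 if it precedes the first reference base).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    variants = []
    ref_pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue
        if r == "-":
            variants.append(("INS", ref_pos, q))
        elif q == "-":
            variants.append(("DEL", ref_pos, r))
        else:
            variants.append(("SNV", ref_pos, r + ">" + q))
    return variants


variants = call_variants("ACGT-ACGT", "ACCTGAC-T")
```

A misplaced gap in the alignment would shift or invent entries in this list, which is why the abstract stresses that a wrong alignment produces false variants.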
Authors: Chiang S., Shinohara H., Huang J.H., Tsai H. K., and Okada M.

Abstract:
Eukaryotic transcription factors (TFs) coordinate different upstream signals to regulate their target genes. To unveil this network regulation in B cell receptor signaling, we developed a computational pipeline to systematically analyze the ERK- and IKK-dependent transcriptome response. We combined a linear regression method and kinetic modeling to identify the signal-to-TF and TF-to-gene dynamics, respectively, from time-course experimental data. We show that the combination of TFs differentially controlled by ERK and IKK could contribute to divergent expression dynamics in orchestrating the B cell response. Our finding elucidates the regulatory mechanism of the signal-dependent gene expression responsible for eukaryotic cell development.
"Hardware-Assisted MMU Redirection for In-guest Monitoring and API Profiling," IEEE Transactions on Information Forensics & Security, To Appear.
Authors: Shun-Wen Hsiao, Yeali Sun, Meng Chang Chen

Abstract:
With the advance of hardware, network, and virtualization technologies, cloud computing has prevailed and become the target of security threats such as the cross virtual machine (VM) side channel attack, with which malicious users exploit vulnerabilities to gain information or access to other guest virtual machines. Among the many virtualization technologies, the hypervisor manages the shared resource pool to ensure that the guest VMs can be properly served and isolated from each other. However, while managing the shared hardware resources, due to the presence of the virtualization layer and different CPU modes (root and non-root mode), when a CPU is switched to non-root mode and is occupied by a guest machine, a hypervisor cannot intervene with a guest at runtime. Thus, the execution status of a guest is like a black box to a hypervisor, and the hypervisor cannot mediate possible malicious behavior at runtime. To rectify this, we propose a hardware-assisted VMI (virtual machine introspection) based in-guest process monitoring mechanism which supports monitoring and management applications such as process profiling. The mechanism allows hooks to be placed within a target process (which the security expert selects to monitor and profile) of a guest virtual machine and handles hook invocations via the hypervisor. In order to facilitate the needed monitoring and/or management operations in the guest machine, the mechanism redirects access to in-guest memory space to a controlled, self-defined memory within the hypervisor by modifying the extended page table (EPT) to minimize guest and host machine switches. The advantages of the proposed mechanism include transparency, high performance, and comprehensive semantics. To demonstrate the capability of the proposed mechanism, we develop an API profiling system (APIf) to record the API invocations of the target process. The experimental results show an average performance degradation of about 2.32%, far better than existing similar systems.
"Declarative pearl: deriving monadic Quicksort," Functional and Logic Programming (FLOPS 2020), 2020.
Authors: Shin-Cheng Mu and Tsung-Ju Chiang

Abstract:
To demonstrate derivation of monadic programs, we present a specification of sorting using the non-determinism monad, and derive pure quicksort on lists and state-monadic quicksort on arrays. In the derivation one may switch between point-free and pointwise styles, and deploy techniques familiar to functional programmers such as pattern matching and induction on structures or on sizes. Derivation of stateful programs resembles reasoning backwards from the postcondition.
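The relationship between a non-deterministic specification and a derived program can be mimicked in a small sketch. The paper works with monads in a functional setting; the Python below is only an analogy under that caveat, with hypothetical names: the specification is the set of all permutations of the input that are sorted, and quicksort is correct if its single result lies in that set.

```python
from itertools import permutations


def is_sorted(xs):
    """True if the sequence is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def sort_spec(xs):
    """Specification: every permutation of xs that happens to be sorted."""
    return {p for p in permutations(xs) if is_sorted(p)}


def quicksort(xs):
    """Pure quicksort on lists: partition around the head, then recurse."""
    if not xs:
        return []
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x <= pivot]
    larger = [x for x in rest if x > pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)


data = [3, 1, 4, 1, 5, 9, 2, 6]
result = quicksort(data)
```

Checking `tuple(result) in sort_spec(data)` tests the refinement on one input only; the derivation in the paper establishes it for all inputs by calculation.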
"Attractive or Faithful? Popularity-Reinforced Learning for Inspired Headline Generation," the 34th AAAI Conference on Artificial Intelligence (AAAI 2020), February 2020.
Authors: YunZhu Song, Hong-Han Shuai, Sung-Lin Yeh, Yi-Lun Wu, Lun-Wei Ku, Wen-Chih Peng

Abstract:
With the rapid proliferation of online media sources and published news, headlines have become increasingly important for attracting readers to news articles, since users may be overwhelmed with the massive information. In this paper, we generate inspired headlines that preserve the nature of news articles and catch the eye of the reader simultaneously. The task of inspired headline generation can be viewed as a specific form of Headline Generation (HG) task, with the emphasis on creating an attractive headline from a given news article. To generate inspired headlines, we propose a novel framework called POpularity-Reinforced Learning for inspired Headline Generation (PORL-HG). PORL-HG exploits the extractive-abstractive architecture with 1) Popular Topic Attention (PTA) for guiding the extractor to select the attractive sentence from the article and 2) a popularity predictor for guiding the abstractor to rewrite the attractive sentence. Moreover, since the sentence selection of the extractor is not differentiable, techniques of reinforcement learning (RL) are utilized to bridge the gap with rewards obtained from a popularity score predictor. Through quantitative and qualitative experiments, we show that the proposed PORL-HG significantly outperforms the state-of-the-art headline generation models in terms of attractiveness evaluated by both human (71.03%) and the predictor (at least 27.60%), while the faithfulness of PORL-HG is also comparable to the state-of-the-art generation model.
"Knowledge-Enriched Visual Storytelling," the 34th AAAI Conference on Artificial Intelligence (AAAI 2020), February 2020.
Authors: Chao-Chun Hsu, Zi-Yuan Chen, Chi-Yang Hsu, Chih-Chia Li, Tzu-Yuan Lin, Ting-Hao Huang, Lun-Wei Ku

Abstract:
Stories are diverse and highly personalized, resulting in a large possible output space for story generation. Existing end-to-end approaches produce monotonous stories because they are limited to the vocabulary and knowledge in a single training dataset. This paper introduces KG-Story, a three-stage framework that allows the story generation model to take advantage of external Knowledge Graphs to produce interesting stories. KG-Story distills a set of representative words from the input prompts, enriches the word set by using external knowledge graphs, and finally generates stories based on the enriched word set. This distill-enrich-generate framework allows the use of external resources not only for the enrichment phase, but also for the distillation and generation phases. In this paper, we show the superiority of KG-Story for visual storytelling, where the input prompt is a sequence of five photos and the output is a short story. Per the human ranking evaluation, stories generated by KG-Story are on average ranked better than those of the state-of-the-art systems. Our code and output stories are available at https://github.com/zychen423/KE-VIST.
"A Partial Page Cache Strategy for NVRAM-Based Storage Devices," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), February 2020.
Authors: Shuo-Han Chen, Tseng-Yi Chen, Yuan-Hao Chang, Hsin-Wen Wei, and Wei-Kuan Shih

Abstract:
Non-volatile random access memory (NVRAM) is becoming a popular alternative as the memory and storage medium in battery-powered embedded systems because of its fast read/write performance, byte-addressability, and non-volatility. A well-known example is phase-change memory (PCM), which has a much longer life expectancy and faster access performance than NAND flash. When NVRAM is considered as both main memory and storage in battery-powered embedded systems, existing page cache mechanisms cause too many unnecessary data movements between main memory and storage. To tackle this issue, we propose the concept of a "union page cache" to jointly manage data of the page cache in both main memory and storage. To realize this concept, we design a partial page cache strategy that considers both main memory and storage as its management space. This strategy can eliminate unnecessary data movements between main memory and storage without sacrificing the data integrity of file systems. A series of experiments was conducted on an embedded platform. The results show that the proposed strategy can improve file access performance by up to 85.62% when PCM is used as a case study.
"Un-rectifying Non-linear Networks for Signal Representation," IEEE Transactions on Signal Processing, December 2019.
Authors: Wen-Liang Hwang, Andreas Heinecke

Abstract:
We consider deep neural networks with rectifier activations and max-pooling from a signal representation perspective. In this view, such representations mark the transition from using a single linear representation for all signals to utilizing a large collection of affine linear representations that are tailored to particular regions of the signal space. We propose a novel technique to “un-rectify” the nonlinear activations into data-dependent linear equations and constraints, from which we derive explicit expressions for the affine linear operators, their domains and ranges in terms of the network parameters. We show how increasing the depth of the network refines the domain partitioning and derive atomic decompositions for the corresponding affine mappings that process data belonging to the same partitioning region. In each atomic decomposition the connections over all hidden network layers are summarized and interpreted in a single matrix. We apply the decompositions to study the Lipschitz regularity of the networks and give sufficient conditions for network-depth-independent stability of the representation, drawing a connection to compressible weight distributions. Such analyses may facilitate and promote further theoretical insight and exchange from both the signal processing and machine learning communities.
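The un-rectifying idea can be illustrated on a tiny network. The sketch below assumes one hidden rectifier layer and uses made-up weights; it relies only on the standard identity relu(z) = D z, where D is a diagonal 0/1 matrix determined by the sign pattern of z for the given input. It is not the paper's general construction (which also covers max-pooling and deeper networks), and all names are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(u, v):
    """Element-wise vector sum."""
    return [a + b for a, b in zip(u, v)]


def network(x, W1, b1, W2, b2):
    """Two-layer rectifier network: W2 * relu(W1 x + b1) + b2."""
    hidden = [max(0.0, h) for h in add(matvec(W1, x), b1)]
    return add(matvec(W2, hidden), b2)


def unrectify(x, W1, b1, W2, b2):
    """Return (A, c) with network(x') == A x' + c on x's activation region."""
    pre = add(matvec(W1, x), b1)
    d = [1.0 if h > 0 else 0.0 for h in pre]  # diagonal of D
    # A = W2 D W1 and c = W2 D b1 + b2
    DW1 = [[di * w for w in row] for di, row in zip(d, W1)]
    Db1 = [di * bi for di, bi in zip(d, b1)]
    A = [[sum(W2[i][k] * DW1[k][j] for k in range(len(DW1)))
          for j in range(len(x))] for i in range(len(W2))]
    c = add(matvec(W2, Db1), b2)
    return A, c


W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, 2.0, -1.0]]
b2 = [0.25]
x = [2.0, 1.0]

A, c = unrectify(x, W1, b1, W2, b2)
affine_out = add(matvec(A, x), c)
direct_out = network(x, W1, b1, W2, b2)
```

The pair (A, c) is valid for every input that shares the activation pattern of `x`; inputs with a different pattern get a different affine map, which is the domain partitioning the abstract refers to.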
