Accepted Papers

  • Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing
  • Incremental Deep Hidden Attribute Learning
  • Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector
  • Visual Domain Adaptation with Manifold Embedded Distribution Alignment
  • Object-Difference Attention: A simple relational attention for Visual Question Answering
  • Robust Billboard-based, Free-viewpoint Video Synthesis Algorithm to Overcome Occlusions under Challenging Outdoor Sport Scenes
  • Multi-Human Parsing Machines
  • Deep Priority Hashing
  • CropNet: Real-Time Thumbnailing
  • Learning to Transfer: Generalizable Attribute Learning with Multitask Neural Model Search
  • Supervised Online Hashing via Hadamard Codebook Learning
  • Shared Linear Encoder-based Gaussian Process Latent Variable Model for Visual Classification
  • Learning Semantic Structure-preserved Embeddings for Cross-modal Retrieval
  • Fine-grained Grocery Product Recognition by One-shot Learning
  • Fine-grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding
  • Style Separation and Synthesis via Generative Adversarial Networks
  • Attention-based Pyramid Aggregation Network for Visual Place Recognition
  • Dance with Melody: An LSTM-autoencoder Approach on Music-oriented Dance Synthesis
  • Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering
  • Semi-supervised Deep Generative Modelling of Incomplete Multi-Modality Emotional Data
  • Post Tuned Hashing: A New Approach to Indexing High-dimensional Data
  • Joint Sign Language Recognition and Education System with ST-Net
  • Aesthetic-Driven Image Enhancement by Adversarial Learning
  • Cascaded Feature Augmentation with Diffusion for Image Retrieval
  • Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM
  • Temporal Sequence Distillation: Towards Few Frame Action Recognition
  • Joint Global and Co-Attentive Representation Learning for Image-Sentence Retrieval
  • Multi-View Image Generation from a Single-View
  • Slackliner — An Interactive Slackline Training Assistant
  • Hierarchical Memory Modelling for Video Captioning
  • Group Re-Identification: Leveraging and Integrating Multi-Grain Information
  • Collaborative Annotation of Semantic Objects in Images with Multi-granularity Supervisions
  • Multi-modal Preference Modeling for Product Search
  • GraphNet: Learning Image Pseudo Annotations for Weakly-Supervised Semantic Segmentation
  • Deep Triplet Quantization
  • Previewer for Multiple-Scale Object Detector
  • QARC: Video Quality Aware Rate Control for Real-Time Video Streaming based on Deep Reinforcement Learning
  • What dress fits me best? Fashion Recommendation on the Clothing Style for Personal Body Shape
  • SCRATCH: A Scalable Discrete Matrix Factorization Hashing for Cross-Modal Retrieval
  • OSMO: Online Specific Models for Occlusion in Multiple Object Tracking under Surveillance Scene
  • Cross-modal Moment Localization in Videos
  • Attribute-Aware Attention Model for Fine-grained Representation Learning
  • Video Forecasting with Forward-Backward-Net: Delving Deeper into Spatiotemporal Consistency
  • Learning Discriminative Features with Multiple Granularities for Person Re-Identification
  • StripNet: Towards Topology Consistent Strip Structure Segmentation
  • Attention-based Multi-Patch Aggregation for Image Aesthetic Assessment
  • An End-to-End Quadrilateral Regression Network for Comic Panel Extraction
  • CLS: A Cross-user Learning based System for Improving QoE in 360-degree Video Adaptive Streaming
  • Only Learn One Sample: Fine-Grained Visual Categorization with One Sample Training
  • Life-long Cross-media Correlation Learning
  • Text-to-image Synthesis via Symmetrical Distillation Networks
  • Multi-Scale Correlation for Sequential Cross-modal Hashing Learning
  • Jaguar: Low Latency Mobile Augmented Reality with Flexible Tracking
  • Feature Constrained by Pixel: Hierarchical Adversarial Deep Domain Adaptation
  • Explore Multi-Step Reasoning in Video Question Answering
  • Monocular Camera Based Real-Time Dense Mapping Using Generative Adversarial Network
  • Learning Collaborative Generation Correction Modules for Blind Image Deblurring and Beyond
  • Watch, Think and Attend: End-to-End Video Classification via Dynamic Knowledge Evolution Modeling
  • Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection
  • Fast and Light Manifold CNN based 3D Facial Expression Recognition across Pose Variations
  • Unregularized Auto-Encoder with Generative Adversarial Networks for Image Generation
  • Real-time 3D Face-Eye Performance Capture of a Person Wearing VR Headset
  • Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling
  • Participation-Contributed Temporal Dynamic Model for Group Activity Recognition
  • A Unified Generative Adversarial Framework for Image Generation and Person Re-identification
  • Facial Expression Recognition in the Wild: A Cycle-Consistent Adversarial Attention Transfer Approach
  • Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs
  • Mining Semantics-Preserving Attention for Group Activity Recognition
  • Causally Regularized Learning on Data with Agnostic Bias
  • I read, I saw, I tell: Texts Assisted Fine-Grained Visual Classification
  • Context-Aware Unsupervised Text Stylization
  • Bridge The Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Database and A Deep Learning Model
  • When to Learn What: Deep Cognitive Subspace Clustering
  • Look Deeper See Richer: Depth-aware Image Paragraph Captioning
  • Depth Structure Preserving Scene Image Generation
  • CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification
  • Learning Multimodal Taxonomy via Variational Deep Graph Embedding and Clustering
  • Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training
  • GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning
  • A Distributed Approach for Bitrate Selection in HTTP Adaptive Streaming
  • Generative Adversarial Product Quantisation
  • EmotionGAN: Unsupervised Domain Adaptation for Learning Discrete Probability Distributions of Image Emotions
  • Few-Shot Adaptation for Video Semantic Indexing
  • Historical Context-based Style Classification of Painting Images via Label Distribution Learning
  • Sparsely Grouped Multi-task Generative Adversarial Networks for Facial Attribute Manipulation
  • High-Quality Exposure Correction of Underexposed Photos
  • Fashion Sensitive Clothing Recommendation using Hierarchical Collocation Model
  • A Margin-based MLE for Crowdsourced Partial Ranking
  • Personalized Serious Games for Cognitive Intervention with Lifelog Visual Analytics
  • PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation
  • iHuman3D: Intelligent Human Body 3D Reconstruction using a Single Flying Camera
  • Face-Voice Matching using Cross-modal Embeddings
  • Multi-Scale Context Attention Network for Image Retrieval
  • When Deep Fool Meets Deep Prior: Adversarial Attack on Image Super-Resolution
  • Musicality-Novelty Generative Adversarial Nets for Algorithmic Composition
  • Knowledge-aware Multimodal Dialogue Systems
  • Cross-Domain Adversarial Feature Learning for Sketch Re-identification
  • Comprehensive Distance-Preserving Autoencoders for Cross-Modal Retrieval
  • Facial Expression Recognition Enhanced by Thermal Images through Adversarial Learning
  • CSAN: Contextual Self-Attention Network for User Sequential Recommendation
  • Semantic Human Matting
  • Visual Spatial Attention Network for Relationship Detection
  • Geometry Guided Adversarial Facial Expression Synthesis
  • Personalized multiple facial action unit recognition through generative adversarial recognition network
  • Learning Joint Multimodal Representation with Adversarial Attention Networks
  • Detecting Abnormality without Knowing Normality: A Two-stage Approach for Unsupervised Video Abnormal Event Detection
  • WildFish: A Large Benchmark for Fish Recognition in the Wild
  • Temporal Hierarchical Attention at Category- and Item-Level for Micro-Video Click-Through Prediction
  • BeautyGAN: Instance-level Facial Makeup Transfer with Deep Generative Adversarial Network
  • Songle Sync: A Large-Scale Web-based Platform for Controlling Various Devices in Synchronization with Music
  • CloudVR: Cloud Accelerated Interactive Mobile Virtual Reality
  • RGCNN: Regularized Graph CNN for Point Cloud Segmentation
  • Video-based Person Re-identification via Self-Paced Learning and Deep Reinforcement Learning Framework
  • Photo Squarization by Deep Multi-Operator Retargeting
  • Predicting Visual Context for Unsupervised Event Segmentation in Continuous Photo-streams
  • Semantic Image Inpainting with Progressive Generative Networks
  • Attentive Interactive Convolutional Matching for Community Question Answering in Social Multimedia
  • Deep Understanding of Cooking Procedure for Cross-modal Recipe Retrieval
  • LA-Net: Layout-Aware Dense Network for Monocular Depth Estimation
  • Direction-aware Neural Style Transfer
  • Reconfigurable Inverted Index
  • Learning and Fusing Multimodal Deep Features for Acoustic Scene Categorization
  • Context-Aware Visual Policy Network for Sequence-Level Image Captioning
  • A Unified Framework for Multimodal Domain Adaptation
  • Trusted Guidance Pyramid Network for Human Parsing
  • USAR: an interactive user-specific aesthetic ranking framework for images
  • Non-locally Enhanced Encoder-Decoder Network for Single Image De-raining
  • Structure Guided Photorealistic Style Transfer
  • Tracking-assisted Weakly Supervised Online Visual Object Segmentation in Unconstrained Videos
  • An ADMM-Based Universal Framework for Adversarial Attacks on Deep Neural Networks
  • Decoupled Novel Object Captioner
  • ThoughtViz: Visualizing Human Thoughts Using Generative Adversarial Network
  • Optimizing Personalized Interaction Experience in Crowd-Interactive Livecast: A Cloud-Edge Approach
  • End-to-End Blind Quality Assessment of Compressed Video Using Deep Neural Networks
  • Dynamic Sound Field Synthesis for Speech and Music Optimization
  • Local Convolutional Neural Networks for Person Re-Identification
  • Interpretable Multimodal Retrieval for Fashion Products
  • Conditional Expression Synthesis with Face Parsing Transformation
  • A Feature-Adaptive Semi-Supervised Framework for Co-Saliency Detection
  • Attentive Recurrent Neural Network for Weak-supervised Multi-label Image Classification
  • iSPA-Net: Iterative Semantic Pose Alignment Network
  • Extractive Video Summarizer with Memory Augmented Neural Networks
  • ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations
  • Fully Point-wise Convolutional Neural Network for Modeling Statistical Regularities in Natural Images
  • From data to knowledge: deep learning model compression, transmission and communication
  • ChipGAN: A Generative Adversarial Network for Chinese Ink Wash Painting Style Transfer
  • Dest-ResNet: a Deep Spatiotemporal Residual Network for Hotspot Traffic Speed Prediction
  • Boosting Scene Parsing Performance via Reliable Scale Prediction
  • Deep Cross modal learning for Caricature Verification and Identification (CaVINet)
  • Online Action Tube Detection via Resolving the Spatio-temporal Context Pattern
  • Adaptive Temporal Encoding Network for Video Instance-level Human Parsing
  • User-Guided Deep Anime Line Art Colorization with Conditional Adversarial Networks
  • Enhancing Visual Question Answering Using Dropout
  • Online Inter-Camera Trajectory Association Exploiting Person Re-Identification and Camera Topology
  • Improving QoE of ABR Streaming Sessions through QUIC Retransmissions
  • Temporal Cross-Media SubSpaces Learning with Soft-Constraints
  • Learning Local Descriptors with Adversarial Enhancer from Volumetric Geometry Patches
  • SibNet: Sibling Convolutional Encoder for Video Captioning
  • Context-Dependent Diffusion Network for Visual Relationship Detection
  • Your Attention is Unique: Detecting 360-Degree Video Saliency in Head-Mounted Display for Head Movement Prediction
  • Generating Defensive Plays in Basketball Games
  • Connectionist temporal fusion for Sign Language Translation
  • JPEG Decompression in the Homomorphic Encryption Domain
  • BitStream: Efficient Computing Architecture for Real-Time Low-Power Inference of Binary Neural Networks on CPUs
  • Support Neighbor Loss for Person Re-Identification
  • A Large Scale RGB-D Database for Arbitrary-view Human Action Recognition
  • FlexStream: Towards Flexible Adaptive Video Streaming on End Devices using Extreme SDN
  • Spotting and Aggregating Salient Regions for Video Captioning
  • Structural inpainting
  • Partial Multi-View Subspace Clustering
  • FoV-Aware Edge Caching for Adaptive 360° Video Streaming
  • Attentive LSTM Crowd Flow Machines
  • Perceptual Temporal Incoherence Aware Stereo Video Retargeting
  • Fast Discrete Cross-modal Hashing With Regressing From Semantic Labels
  • Dense Auto-Encoder Hashing for Robust Cross-Modality Retrieval
  • Investigation of Small Group Social Interactions using Deep Visual Activity-Based Nonverbal Features
  • Dissimilarity Representation Learning for Generalized Zero-Shot Recognition
  • Examine before You Answer: Multi-task Learning with Adaptive-attentions for Multiple-choice VQA
  • Cumulative Nets for Edge Detection
  • Beyond the Product: Discovering Image Posts for Brands in Social Media
  • Robustness and Discrimination Oriented Hashing Combining Texture and Invariant Vector Distance
  • SLIONS: A Karaoke Application to Enhance Foreign Language Learning
  • Drawing in a Virtual 3D Space – Introducing VR Drawing in Elementary School Art Education
  • Semi-Supervised DFF: Decoupling Detection and Feature Flow for Video Object Detectors
  • Residual-Guide Feature Fusion Network for Single Image Deraining
  • Paragraph generation network with visual relationship detection
  • Hybrid Point Cloud Attribute Compression Using Slice-based Layered Structure and Block-based Intra Prediction
  • CIRCE: Real-Time Caching for Instance Recognition on Cloud Environments and Multi-Core Architectures
  • From Volcano to Toyshop: Adaptive Discriminative Region Discovery for Scene Recognition
  • Unsupervised Learning of 3D Model Reconstruction from Hand-Drawn Sketches
  • Learning to Synthesize 3D Indoor Scenes from Monocular Images
  • DASH for 3D Networked Virtual Environment
  • PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition
  • The Effect of Foveation on High Dynamic Range Video Perception
  • GestureGAN for Hand Gesture-to-Gesture Translation in the Wild
  • MiniView Layout for Bandwidth-Efficient 360-Degree Video
  • An Efficient Deep Quantized Compressed Sensing Coding Framework of Natural Images
  • Deep Multimodal Image-Repurposing Detection
  • Video-to-Video Translation with Global Temporal Consistency
  • Robust Correlation Filter Tracking with Shepherded Instance-Aware Proposals
  • Cross-Species Learning: A Low-Cost Approach to Learning Human Fight from Animal Fight
  • PoB: Toward Reasoning Patterns of Beauty in Image Data
  • Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval
  • Deep Adaptive Temporal Pooling for Activity Recognition
  • Human Conversation Analysis Using Attentive Multimodal Networks with Hierarchical Encoder-Decoder
  • Pseudo Transfer with Marginalized Corrupted Attribute for Zero-shot Learning
  • Crossing-Domain Generative Adversarial Networks for Unsupervised Multi-Domain Image-to-Image Translation
  • Person Re-identification with Hierarchical Deep Learning Feature and efficient XQDA Metric
  • EmoCeleb: Emotion recognition in speech using Cross-Modal Transfer in the wild