Full Papers

Tuesday 23 October – Conference Day 2


15:30 – 16:30 Crystal Ballroom1

Oral Session: Deep – 1

  • Structure Guided Photorealistic Style Transfer Yuheng Zhi (Shanghai Jiao Tong University), Huawei Wei (Shanghai Jiao Tong University), Bingbing Ni (Shanghai Jiao Tong University)
  • Crossing-Domain Generative Adversarial Networks for Unsupervised Multi-Domain Image-to-Image Translation Xuewen Yang (Stony Brook University), Dongliang Xie (Beijing University of Posts and Telecommunications), Xin Wang (Stony Brook University)
  • Multi-View Image Generation from a Single-View Bo Zhao (Southwest Jiaotong University and National University of Singapore), Xiao Wu (Southwest Jiaotong University), Zhi-Qi Cheng (Southwest Jiaotong University), Hao Liu (Tencent YouTu Lab), Zequn Jie (Tencent AI Lab), Jiashi Feng (National University of Singapore)
  • Sparsely Grouped Multi-task Generative Adversarial Networks for Facial Attribute Manipulation Jichao Zhang (Shandong University), Yezhi Shu (Shandong University), Songhua Xu (Xi’an Jiaotong University), Gongze Cao (Zhejiang University), Fan Zhong (Shandong University), Meng Liu (Shandong University), Xueying Qin (Shandong University)


15:30 – 16:30 Crystal Ballroom2

Oral Session: Vision – 1

  • Visual Domain Adaptation with Manifold Embedded Distribution Alignment Jindong Wang (Chinese Academy of Sciences), Wenjie Feng (Chinese Academy of Sciences), Yiqiang Chen (Chinese Academy of Sciences), Han Yu (Nanyang Technological University), Meiyu Huang (China Academy of Space Technology), Philip S. Yu (University of Illinois at Chicago)
  • Causally Regularized Learning with Agnostic Data Selection Bias Zheyan Shen (Tsinghua University), Peng Cui (Tsinghua University), Kun Kuang (Tsinghua University), Bo Li (Tsinghua University), Peixuan Chen (Tencent)
  • Robust Correlation Filter Tracking with Shepherded Instance-Aware Proposals Yanjie Liang (Xiamen University), Qiangqiang Wu (Xiamen University), Yi Liu (Xiamen University), Yan Yan (Xiamen University), Hanzi Wang (Xiamen University)
  • A Unified Framework for Multimodal Domain Adaptation Fan Qi (Hefei University of Technology), Xiaoshan Yang (Chinese Academy of Sciences and University of Chinese Academy of Sciences), Changsheng Xu (Hefei University of Technology, Chinese Academy of Sciences, and University of Chinese Academy of Sciences)


15:30 – 16:30 Crystal Ballroom3

Oral Session: Multimedia – 1

  • What dress fits me best? Fashion Recommendation on the Clothing Style for Personal Body Shape Shintami Chusnul Hidayati (Academia Sinica), Cheng-Chun Hsu (National Taiwan University of Science and Technology), Yu-Ting Chang (Academia Sinica), Kai-Lung Hua (National Taiwan University of Science and Technology), Jianlong Fu (Microsoft Research), Wen-Huang Cheng (National Chiao Tung University)
  • CSAN: Contextual Self-Attention Network for User Sequential Recommendation Xiaowen Huang (Chinese Academy of Sciences and University of Chinese Academy of Sciences), Shengsheng Qian (Chinese Academy of Sciences), Quan Fang (Chinese Academy of Sciences), Jitao Sang (Beijing Jiaotong University and Nanjing University), Changsheng Xu (Chinese Academy of Sciences and University of Chinese Academy of Sciences)
  • Attentive Interactive Convolutional Matching for Community Question Answering in Social Multimedia Jun Hu (Hefei University of Technology), Shengsheng Qian (Chinese Academy of Sciences), Quan Fang (Chinese Academy of Sciences), Changsheng Xu (Hefei University of Technology, Chinese Academy of Sciences, and University of Chinese Academy of Sciences)
  • Beyond the Product: Discovering Image Posts for Brands in Social Media Francesco Gelli (National University of Singapore), Tiberio Uricchio (Università degli Studi di Firenze), Xiangnan He (National University of Singapore), Alberto Del Bimbo (Università degli Studi di Firenze), Tat-Seng Chua (National University of Singapore)


17:00 – 18:00 Crystal Ballroom1

Oral Session: Vision – 2

  • Collaborative Annotation of Semantic Objects in Images with Multi-granularity Supervisions Lishi Zhang (Beihang University), Chenghan Fu (Beihang University), Jia Li (Beihang University)
  • GraphNet: Learning Image Pseudo Annotations for Weakly-Supervised Semantic Segmentation Mengyang Pu (Beijing Jiaotong University), Yaping Huang (Beijing Jiaotong University), Qingji Guan (Beijing Jiaotong University), Qi Zou (Beijing Jiaotong University)
  • Boosting Scene Parsing Performance via Reliable Scale Prediction Hengcan Shi (University of Electronic Science and Technology of China), Hongliang Li (University of Electronic Science and Technology of China), Qingbo Wu (University of Electronic Science and Technology of China), Fanman Meng (University of Electronic Science and Technology of China), King N. Ngan (Chinese University of Hong Kong and University of Electronic Science and Technology of China)
  • Learning to Synthesize 3D Indoor Scenes from Monocular Images Fan Zhu (Inception Institute of Artificial Intelligence), Li Liu (Inception Institute of Artificial Intelligence), Jin Xie (Nanjing University of Science and Technology), Fumin Shen (University of Electronic Science and Technology of China), Ling Shao (Inception Institute of Artificial Intelligence), Yi Fang (New York University Abu Dhabi)


17:00 – 18:00 Crystal Ballroom2

Oral Session: Multimodal – 1

  • Visual Spatial Attention Network for Relationship Detection Chaojun Han (University of Electronic Science and Technology of China), Fumin Shen (University of Electronic Science and Technology of China), Li Liu (Inception Institute of Artificial Intelligence), Yang Yang (University of Electronic Science and Technology of China), Heng Tao Shen (University of Electronic Science and Technology of China)
  • Object-Difference Attention: A simple relational attention for Visual Question Answering Chenfei Wu (Beijing University of Posts and Telecommunications), Jinlai Liu (Beijing University of Posts and Telecommunications), Xiaojie Wang (Beijing University of Posts and Telecommunications), Xuan Dong (Beijing University of Posts and Telecommunications)
  • Life-long Cross-media Correlation Learning Jinwei Qi (Peking University), Yuxin Peng (Peking University), Yunkan Zhuo (Peking University)
  • Human Conversation Analysis Using Attentive Multimodal Networks with Hierarchical Encoder-Decoder Yue Gu (Rutgers University), Xinyu Li (Rutgers University and Amazon Inc.), Kaixiang Huang (Meitu Inc. and Rutgers University), Shiyu Fu (Rutgers University), Kangning Yang (Rutgers University), Shuhong Chen (Rutgers University), Moliang Zhou (Amazon Inc.), Ivan Marsic (Rutgers University)


17:00 – 18:00 Crystal Ballroom3

Oral Session: System – 1

  • End-to-End Blind Quality Assessment of Compressed Video Using Deep Neural Networks Wentao Liu (University of Waterloo), Zhengfang Duanmu (University of Waterloo), Zhou Wang (University of Waterloo)
  • FlexStream: Towards Flexible Adaptive Video Streaming on End Devices using Extreme SDN Ibrahim Ben Mustafa (Old Dominion University), Tamer Nadeem (Virginia Commonwealth University), Emir Halepovic (AT&T Labs – Research)
  • CLS: A Cross-user Learning based System for Improving QoE in 360-degree Video Adaptive Streaming Lan Xie (Peking University and Beijing Hulu Software Technology Development Co., Ltd.), Xinggong Zhang (Peking University and Cooperative Medianet Innovation Center), Zongming Guo (Peking University and Cooperative Medianet Innovation Center)
  • A Distributed Approach for Bitrate Selection in HTTP Adaptive Streaming Abdelhak Bentaleb (National University of Singapore), Ali C. Begen (Ozyegin University), Saad Harous (United Arab Emirates University), Roger Zimmermann (National University of Singapore)

 

Wednesday 24 October – Conference Day 3

10:30 – 11:30 Crystal Ballroom1-3

Best Paper Session

  • GestureGAN for Hand Gesture-to-Gesture Translation in the Wild Hao Tang (University of Trento), Wei Wang (École Polytechnique Fédérale de Lausanne and University of Trento), Dan Xu (University of Oxford and University of Trento), Yan Yan (Texas State University), Nicu Sebe (University of Trento)
  • Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training Bei Liu (Kyoto University), Jianlong Fu (Microsoft Research Asia), Makoto P. Kato (Kyoto University), Masatoshi Yoshikawa (Kyoto University)
  • Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing Jian Zhao (National University of Singapore and National University of Defense Technology), Jianshu Li (National University of Singapore), Yu Cheng (National University of Singapore), Terence Sim (National University of Singapore), Shuicheng Yan (National University of Singapore and Qihoo 360 AI Institute), Jiashi Feng (National University of Singapore)
  • Knowledge-aware Multimodal Dialogue Systems Lizi Liao (National University of Singapore), Yunshan Ma (National University of Singapore), Xiangnan He (National University of Singapore), Richang Hong (Hefei University of Technology), Tat-Seng Chua (National University of Singapore)

15:30 – 16:30 Crystal Ballroom1

Oral Session: Deep – 2

  • Mining Semantics-Preserving Attention for Group Activity Recognition Yansong Tang (Tsinghua University), Zian Wang (Tsinghua University), Peiyang Li (Tsinghua University), Jiwen Lu (Tsinghua University), Ming Yang (Horizon Robotics, Inc.), Jie Zhou (Tsinghua University)
  • Participation-Contributed Temporal Dynamic Model for Group Activity Recognition Rui Yan (Nanjing University of Science and Technology), Jinhui Tang (Nanjing University of Science and Technology), Xiangbo Shu (Nanjing University of Science and Technology), Zechao Li (Nanjing University of Science and Technology), Qi Tian (Huawei and University of Texas at San Antonio)
  • WildFish: A Large Benchmark for Fish Recognition in the Wild Peiqin Zhuang (Chinese Academy of Sciences), Yali Wang (Chinese Academy of Sciences), Yu Qiao (Chinese Academy of Sciences)
  • PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition Haoxuan You (Tsinghua University), Yifan Feng (Xiamen University), Rongrong Ji (Xiamen University), Yue Gao (Tsinghua University)


15:30 – 16:30 Crystal Ballroom2

Oral Session: Multimedia – 2

  • EmotionGAN: Unsupervised Domain Adaptation for Learning Discrete Probability Distributions of Image Emotions Sicheng Zhao (University of California, Berkeley), Xin Zhao (Tsinghua University), Guiguang Ding (Tsinghua University), Kurt Keutzer (University of California, Berkeley)
  • USAR: an interactive user-specific aesthetic ranking framework for images Pei Lv (Zhengzhou University), Meng Wang (Zhengzhou University), Yongbo Xu (Zhengzhou University), Ze Peng (Zhengzhou University), Junyi Sun (Zhengzhou University), Shimei Su (Zhengzhou University), Bing Zhou (Zhengzhou University), Mingliang Xu (Zhengzhou University)
  • Deep Multimodal Image-Repurposing Detection Ekraam Sabir (University of Southern California), Wael AbdAlmageed (University of Southern California), Yue Wu (University of Southern California), Prem Natarajan (University of Southern California)
  • Facial Expression Recognition Enhanced by Thermal Images through Adversarial Learning Bowen Pan (University of Science and Technology of China), Shangfei Wang (University of Science and Technology of China)


17:00 – 18:00 Crystal Ballroom1

Oral Session: Vision – 3

  • Only Learn One Sample: Fine-Grained Visual Categorization with One Sample Training Xiangteng He (Peking University), Yuxin Peng (Peking University)
  • LA-Net: Layout-Aware Dense Network for Monocular Depth Estimation Kecheng Zheng (University of Science and Technology of China), Zheng-Jun Zha (University of Science and Technology of China), Yang Cao (University of Science and Technology of China), Xuejin Chen (University of Science and Technology of China), Feng Wu (University of Science and Technology of China)
  • Robustness and Discrimination Oriented Hashing Combining Texture and Invariant Vector Distance Ziqing Huang (Tianjin University), Shiguang Liu (Tianjin University)
  • Joint Global and Co-Attentive Representation Learning for Image-Sentence Retrieval Shuhui Wang (Institute of Computing Technology, CAS), Yangyu Chen (CAS and University of Chinese Academy of Sciences), Junbao Zhuo (Institute of Computing Technology, CAS and University of Chinese Academy of Sciences), Qingming Huang (Institute of Computing Technology, CAS and University of Chinese Academy of Sciences), Qi Tian (Huawei Noah’s Ark Lab and University of Texas at San Antonio)


17:00 – 18:00 Crystal Ballroom2

Oral Session: Multimodal – 2

  • Text-to-image Synthesis via Symmetrical Distillation Networks Mingkuan Yuan (Peking University), Yuxin Peng (Peking University)
  • Context-Aware Visual Policy Network for Sequence-Level Image Captioning Daqing Liu (University of Science and Technology of China), Zheng-Jun Zha (University of Science and Technology of China), Hanwang Zhang (Nanyang Technological University), Yongdong Zhang (University of Science and Technology of China), Feng Wu (University of Science and Technology of China)
  • SibNet: Sibling Convolutional Encoder for Video Captioning Sheng Liu (State University of New York at Buffalo), Zhou Ren (Snap Research), Junsong Yuan (State University of New York at Buffalo)
  • Paragraph Generation Network with Visual Relationship Detection Wenbin Che (Harbin Institute of Technology), Xiaopeng Fan (Harbin Institute of Technology), Ruiqin Xiong (Peking University), Debin Zhao (Harbin Institute of Technology)

 

Thursday 25 October – Conference Day 4


10:30 – 11:30 Crystal Ballroom1

Oral Session: Multimedia – 3

  • Supervised Online Hashing via Hadamard Codebook Learning Mingbao Lin (Xiamen University), Rongrong Ji (Xiamen University), Hong Liu (Xiamen University), Yongjian Wu (Tencent Youtu Lab, Tencent Technology (Shanghai) Co., Ltd.)
  • Cascaded Feature Augmentation with Diffusion for Image Retrieval Yuanqiang Fang (University of Science and Technology of China), Wengang Zhou (University of Science and Technology of China), Yijuan Lu (Texas State University), Jinhui Tang (Nanjing University of Science and Technology), Qi Tian (Huawei and University of Texas at San Antonio), Houqiang Li (University of Science and Technology of China)
  • Deep Triplet Quantization Bin Liu (Tsinghua University and Beijing National Research Center for Information Science and Technology), Yue Cao (Tsinghua University and Beijing National Research Center for Information Science and Technology), Mingsheng Long (Tsinghua University and Beijing National Research Center for Information Science and Technology), Jianmin Wang (Tsinghua University and Beijing National Research Center for Information Science and Technology), Jingdong Wang (Microsoft Research Asia)
  • Fast Discrete Cross-modal Hashing With Regressing From Semantic Labels Xingbo Liu (Shandong University), Xiushan Nie (Shandong University of Finance and Economics), Wenjun Zeng (Microsoft Research Asia), Chaoran Cui (Shandong University of Finance and Economics), Lei Zhu (Shandong Normal University), Yilong Yin (Shandong University)


10:30 – 11:30 Crystal Ballroom2

Oral Session: Experience – 1

  • ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations Shuai Zheng (eBay Inc.), Fan Yang (eBay Inc.), M. Hadi Kiapour (eBay Inc.), Robinson Piramuthu (eBay Inc.)
  • SLIONS: A Karaoke Application to Enhance Foreign Language Learning Dania Murad (National University of Singapore), Riwu Wang (National University of Singapore), Douglas Turnbull (Ithaca College), Ye Wang (National University of Singapore)
  • Context-Aware Unsupervised Text Stylization Shuai Yang (Peking University), Jiaying Liu (Peking University), Wenhan Yang (Peking University), Zongming Guo (Peking University)
  • Songle Sync: A Large-Scale Web-based Platform for Controlling Various Devices in Synchronization with Music Jun Kato (National Institute of Advanced Industrial Science and Technology (AIST)), Masa Ogata (National Institute of Advanced Industrial Science and Technology (AIST)), Takahiro Inoue (National Institute of Advanced Industrial Science and Technology (AIST)), Masataka Goto (National Institute of Advanced Industrial Science and Technology (AIST))


10:30 – 11:30 Crystal Ballroom3

Oral Session: System – 2

  • Fine-grained Grocery Product Recognition by One-shot Learning Weidong Geng (Zhejiang University), Feilin Han (Zhejiang University), Jiangke Lin (Zhejiang University), Liuyi Zhu (Zhejiang University), Jieming Bai (Zhejiang University), Suzhen Wang (Zhejiang University), Lin He (Zhejiang University), Qiang Xiao (Zhejiang University), Zhangjiong Lai (Zhejiang University)
  • Reconfigurable Inverted Index Yusuke Matsui (National Institute of Informatics), Ryota Hinami (University of Tokyo), Shin’ichi Satoh (National Institute of Informatics)
  • Robust Billboard-based, Free-viewpoint Video Synthesis Algorithm to Overcome Occlusions under Challenging Outdoor Sport Scenes Hiroshi Sankoh (KDDI Research, Inc.), Sei Naito (KDDI Research, Inc.), Keisuke Nonaka (KDDI Research, Inc.), Houari Sabirin (KDDI Research, Inc.), Jun Chen (KDDI Research, Inc.)
  • iHuman3D: Intelligent Human Body 3D Reconstruction using a Single Flying Camera Wei Cheng (Tsinghua University and Hong Kong University of Sci. and Tech.), Lan Xu (Tsinghua University and Hong Kong University of Sci. and Tech.), Lei Han (Tsinghua University and Hong Kong University of Sci. and Tech.), Yuanfang Guo (Tsinghua University), Lu Fang (Tsinghua University)


15:30 – 16:30 Crystal Ballroom1

Oral Session: Deep – 3

  • Learning Collaborative Generation Correction Modules for Blind Image Deblurring and Beyond Risheng Liu (Dalian University of Technology and Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province), Yi He (Dalian University of Technology and Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province), Shichao Cheng (Dalian University of Technology and Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province), Xin Fan (Dalian University of Technology and Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province), Zhongxuan Luo (Dalian University of Technology and Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province)
  • When Deep Fool Meets Deep Prior: Adversarial Attack on Super-Resolution Network Minghao Yin (Tsinghua University), Yongbing Zhang (Tsinghua University), Xiu Li (Tsinghua University), Shiqi Wang (City University of Hong Kong)
  • Semantic Image Inpainting with Progressive Generative Networks Haoran Zhang (Hefei University of Technology), Zhenzhen Hu (Hefei University of Technology), Changzhi Luo (Hefei University of Technology), Wangmeng Zuo (Harbin Institute of Technology), Meng Wang (Hefei University of Technology)
  • Structural inpainting Huy V. Vo (Ecole Polytechnique), Ngoc Q. K. Duong (Technicolor), Patrick Pérez (Valeo.ai)


17:00 – 18:00 Crystal Ballroom1

Oral Session: Vision – 4

  • Fine-grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding Tianshui Chen (Sun Yat-sen University), Wenxi Wu (Sun Yat-sen University), Yuefang Gao (South China Agricultural University), Le Dong (University of Electronic Science and Technology of China), Xiaonan Luo (Guilin University of Electronic Technology), Liang Lin (Sun Yat-sen University)
  • Dissimilarity Representation Learning for Generalized Zero-Shot Recognition Gang Yang (Renmin University of China), Jinlu Liu (Renmin University of China), Jieping Xu (Multimedia Computing Lab, School of Information), Xirong Li (Multimedia Computing Lab, School of Information)
  • Attribute-Aware Attention Model for Fine-grained Representation Learning Kai Han (Peking University and Alibaba Group), Jianyuan Guo (Peking University), Chao Zhang (Peking University), Mingjian Zhu (Peking University)
  • GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning Siyu Huang (Zhejiang University), Xi Li (Zhejiang University), Zhi-Qi Cheng (Southwest Jiaotong University), Zhongfei Zhang (Zhejiang University), Alexander Hauptmann (Carnegie Mellon University)