Regular Papers


ACM Multimedia is the premier conference in multimedia, a research field that discusses emerging computing methods from a perspective in which each medium — e.g. images, text, audio — is a strong component of the complete, integrated exchange of information. The multimedia community has a tradition of handling big data; it has been a pioneer in large-scale evaluations and dataset creation, and it is uniquely oriented towards novel applications and cutting-edge industrial challenges. As such, the conference openly embraces new intellectual angles from both industry and academia, and welcomes submissions from related fields such as data science, HCI, and signal processing. ACM Multimedia 2017 calls for research papers presenting novel theoretical and algorithmic solutions to problems across the domain of multimedia and related applications. The conference also calls for papers presenting novel, thought-provoking ideas and promising (preliminary) results in realizing these ideas. From 2017 on, the conference invites research papers of varying length, from 6 to 8 pages, plus additional pages for references; i.e., the reference pages do not count toward the 6-to-8-page limit. Please note that there is no longer a distinction between long and short papers: authors may themselves decide on the appropriate length of their paper. All papers will undergo the same review process and review period. The conference invites papers in four major themes of multimedia: Understanding, Engagement, Experience, and Systems. Each theme will be led by a program chair of the conference. Please find in the call a description of the themes and the topics of interest.

Submissions are invited in the following 15 topic areas grouped into 4 themes:


Understanding

Multimedia data types by their very nature are complex and often involve intertwined instances of different kinds of information. We can leverage this multi-modal perspective to extract meaning and understanding from the world, often with surprising results. Specific topics addressed this year include:

  • Multimedia and Vision
  • Multimodal Analysis and Description
  • Deep Learning for Multimedia


Engagement

The engagement of multimedia with society as a whole requires research that addresses how multimedia can be used to connect people with multimedia artifacts that meet their needs in a variety of contexts. The topic areas under this theme are:

  • Emotional and Social Signals in Multimedia
  • Multimedia Search and Recommendation
  • Social Multimedia


Experience

One of the core tenets of our research community is that multimedia data contributes to the user experience in a rich and meaningful manner. The topics organized under this theme are concerned with innovative uses of multimedia to enhance the user experience, how this experience is manifested in specific domains, and metrics for qualitatively and quantitatively measuring that experience in useful and meaningful ways. Specific topic areas addressed this year include:

  • Multimedia HCI and Quality of Experience
  • Multimedia Art, Entertainment and Culture
  • Multimedia for Collaboration in Education and Distributed Environments
  • Music, Speech and Audio Processing in Multimedia


Systems

Research in multimedia systems is generally concerned with understanding fundamental tradeoffs between competing resource requirements, developing practical techniques and heuristics for realizing complex optimization and allocation strategies, and demonstrating innovative mechanisms and frameworks for building large-scale multimedia applications. Within this theme, we have focused on five target topic areas:

  • Mobile Multimedia
  • Multimedia Scalability and Management System
  • Multimedia Systems and Middleware
  • Multimedia Telepresence and Virtual/Augmented Reality
  • Multimedia Transport and Delivery

Multimedia and Vision

Vision plays an important role in multimedia content understanding and enables various emerging applications such as surveillance, gaming, and healthcare. Nevertheless, vision often tells only one side of the story, and how to integrate vision with other modalities (e.g., language, audio, sensor information, user-related data) remains a big challenge for context-aware multimedia understanding. On the other hand, in some application scenarios, wisely leveraging different modalities and sensors to complement vision analysis can tackle ill-posed problems in computer vision. Therefore, this area seeks high-quality papers that either propose novel multimedia approaches (e.g., through multi-modal analysis) to computer vision problems or adapt vision techniques to multimedia problems (e.g., search, annotation, coding).

Topics of interest include, but are not restricted to:

– Multi-modal analysis for vision problems
– Scene analysis and object recognition for multimedia understanding
– 3D analysis and modeling for multimedia
– Vision and language for multimedia
– Multi-modal domain adaptation and translation
– Vision-based multimedia content editing and human-computer interaction
– Vision-based multimedia analytics
– Vision-driven multimedia applications (e.g., forensics, healthcare, medicine, games, cultural heritage, automotive)
– Visual / drone / multi-camera surveillance
– Vision-directed compression and quality enhancement

Area Chairs:

  • Marco Bertini, University of Florence, Italy
  • Bogdan Ionescu, University Politehnica of Bucharest, Romania
  • Jianfei Cai, Nanyang Technological University, Singapore
  • Rita Cucchiara, Università degli Studi di Modena e Reggio Emilia, Italy
  • Alex Hauptmann, Carnegie Mellon University, USA
  • Wei Liu, Tencent AI Lab, China
  • Chong-Wah Ngo, City University of Hong Kong, Hong Kong
  • Elisa Ricci, University of Perugia, Italy
  • Nicu Sebe, University of Trento, Italy
  • Jingdong Wang, Microsoft Research Asia, China
  • Junsong Yuan, State University of New York at Buffalo, USA
  • Jian Zhang, University of Technology Sydney, Australia
  • Zheng-Jun Zha, University of Science and Technology of China, China

Multimodal Analysis and Description

Analysis of multimedia content enables us to better understand what the content is about in order to improve its indexing, representation, and consumption for the purpose of retrieval, content creation/enhancement and interactive applications. Research so far has mostly focused on mono-modal analysis of multimedia content, such as looking only into images, only into text, or only into video, while ignoring other modalities like the text surrounding an image on a web page or the audio accompanying the video.

The goal of this area is to attract novel multimodal/multisensor analysis research that takes multiple modalities into account when it comes to multimedia content analysis and better description of the multimedia content. The different modalities may be temporally synchronized (e.g., video clips and corresponding audio transcripts, animations, multimedia presentations), spatially related (images embedded in text, object relationships in 3D space), or otherwise semantically connected (combined analysis of collections of videos, set of images created by one’s social network).

This area calls for submissions that reveal the information encoded in different modalities, combine this information in a non-trivial way and exploit the combined information to significantly expand the current possibilities for handling and interacting with multimedia content. In addition, the submitted works are expected to support effective and efficient interaction with large-scale multimedia collections and to stretch across mobile and desktop environments in order to address changing demands of multimedia consumers.

Topics of interest include, but are not restricted to:

– Multimodal feature extraction, representation, and fusion
– Multimodal semantic concept detection, object recognition and segmentation
– Multimodal approaches to complex activity detection and event analysis
– Multimodal approaches to temporal or structural analysis of multimedia data
– Cross-modal image/video captioning/storytelling or caption to image/video generation
– Cross-modal metric learning and retrieval
– Simple yet effective multimodal features and similarity functions
– Scalable processing and scalability issues in multimodal content analysis
– Evaluation metrics and methodologies for multimodal analysis
– Multimodal/multisensor 360 degree video content analysis

Area Chairs:

  • Phoebe Chen, La Trobe University, Australia
  • Benoit Huet, Eurecom, France
  • Qin Jin, Renmin University of China, China
  • Gunhee Kim, Seoul National University, Korea
  • Liqiang Nie, Shandong University, China
  • Heng Tao Shen, University of Electronic Science and Technology of China, China
  • Qi Tian, University of Texas San Antonio, USA
  • Marcel Worring, University of Amsterdam, The Netherlands
  • Lei Zhang, Microsoft Research, USA
  • Toshihiko Yamasaki, The University of Tokyo, Japan

Deep Learning for Multimedia

Deep Learning has recently found success in a large variety of domains, from computer vision to speech recognition, natural language processing, web search ranking, and even online advertising. Deep Learning’s power comes from learning rich representations of data that can be tuned for the task of interest. The ability of Deep Learning methods to capture the semantics of data is, however, limited by both the complexity of the models and the intrinsic richness of the input to the system. In particular, most current methods consider only a single modality, leading to an impoverished model of the world. Sensory data are instead inherently multimodal: images are often associated with text; videos contain both visual and audio signals; text is often related to social content from public media; etc. Considering cross-modality structure may yield a big leap forward in machine understanding of the world.

Learning from multimodal inputs is technically challenging because different modalities have different statistics and different kinds of representation. For instance, text is discrete and often represented by very large and sparse vectors, while images are represented by dense tensors that exhibit strong local correlations. Fortunately, Deep Learning has the promise to learn adaptive representations from the input, potentially bridging the gap between these different modalities.

In this area, we encourage submissions that effectively deploy Deep Learning to advance the state of the art across the domain of multimedia and related applications.

Topics of interest include, but are not restricted to:

– Deep learning applications involving multiple modalities, such as images, videos, audio, text, clicks, or any other kind of (social) content and context
– Deploying deep learning to learn features from multimodal inputs
– Deploying deep learning to generate one modality from other modalities
– Deploying deep learning to increase the robustness to missing modalities
– Deep learning based methods that leverage multiple modalities and also account for temporal dynamics
– Novel deep learning theories, network architectures and/or learning methods for multimodal data analytics
– Cross-modality transfer learning for efficient data analysis
– Novel deep learning techniques for multimedia data management and retrieval
– Novel adversarial deep learning techniques for generative multimedia applications

Area Chairs:

  • Shuqiang Jiang, Chinese Academy of Sciences, China
  • Yu-Gang Jiang, Fudan University, China
  • Vasileios Mezaris, Information Technologies Institute, Greece
  • Mei-Ling Shyu, University of Miami, USA
  • Lamberto Ballan, University of Padova, Italy
  • Changsheng Xu, Chinese Academy of Sciences, China
  • Ting Yao, Microsoft Research Asia, China

Emotional and Social Signals in Multimedia

Machine understanding of human communicative behaviors, in particular social and emotional signals, is one of the most active areas in human-centered multimedia computing. The interpretation, analysis, and synthesis of social and emotional signals requires expertise drawing from a combination of signal processing, machine learning, pattern recognition, behavioral and social psychology, and cognitive science. Analyzing multimedia content, where humans spontaneously express and respond to social or affective signals, helps to attribute meaning to users’ attitudes, preferences, relationships, feelings, personality, etc., as well as to understand the social and affective context of activities occurring in people’s everyday life.

This area focuses on the analysis of emotional, cognitive (e.g. brain-based) and interactive social behavior in the spectrum of individual to small group settings. It calls for novel contributions with a strong human-centered focus specializing in supporting or developing automated techniques for analyzing, processing, interpreting, synthesizing, or exploiting human social, affective and cognitive signals for multimedia applications. Special emphasis is put on multimodal approaches leveraging multiple streams when analyzing the verbal and/or non-verbal social and emotional signals during interactions. These interactions could be remote or co-located, and can include e.g. interactions between multiple people, humans with computer systems/robots, or humans with conversational agents.

Topics of interest include, but are not restricted to:

– Human social, emotional, and/or affective cue extraction
– Cross-media and/or multimodal fusion of interactive social and/or affective signals
– The analysis of social and/or emotional behavior
– Novel methods for the interpretation of interactive social and/or affective signals
– Novel methods for the classification and representation of interactive social and/or emotional signals
– Real-time processing of interactive social and emotional signals for interactive/assistive multimedia systems
– Emotionally and socially aware dialogue modeling
– Affective (emotionally sensitive) interfaces
– Socially interactive and/or emotional multimedia content tagging
– Social interactions and/or affective behavior for quality of delivery of multimedia systems
– Collecting large scale affective and/or social signal data
– Multimedia tools for affective or interactive social behavior
– Facilitating and understanding ecological validity for emotionally and socially aware multimedia
– Annotation, evaluation measures, and benchmarking

Area Chairs:

  • Elisabeth Andre, University of Augsburg, Germany
  • Wen-Huang Cheng, Academia Sinica, Taiwan
  • Jia Jia, Tsinghua University, China
  • Hayley Hung, Delft University of Technology, The Netherlands

Multimedia Search and Recommendation

Navigating, indexing, searching and discovering content in large collections of multimedia is a key concern for users and service providers. In spite of great progress in this area, several technical challenges still persist, in particular with respect to performing semantically aware search and recommendations. In the past decade, there has been an explosive growth of multimedia contents on the Web, desktops, and mobile devices. The deluge of multimedia leads to “information overload” and poses new challenges and requirements for effective and efficient access to multimedia content. Multimedia search and recommendation techniques are essential in order to provide information relevant to users’ information needs.

This area calls for contributions on reporting novel problems, solutions, models, and/or theories that tackle the key issues in searching, recommending, and discovering multimedia content, as well as a variety of multimedia applications based on search and recommendation technologies.

Topics of interest include, but are not restricted to:

– Large-scale multimedia indexing, ranking, and re-ranking
– Novel representation and scalable quantization for efficient multimedia retrieval
– Interactive and collaborative multimedia search
– Search, ranking and recommendation for social media data
– User intent modeling, query suggestion and feedback mechanisms
– Multimedia search in specific domains (e.g., scientific, enterprise, social, fashion)
– Summarization and organization of multimedia collections
– Knowledge discovery from massive multimedia data
– Learning to rank multimedia content
– Cross-modal and multi-modal representation learning
– Context and location-aware search
– Data representations for recommendation tasks
– Multimedia-focused recommendation models
– Cross-modal recommendations
– Personalization in recommendation
– New interaction strategies for recommendation
– Diversity in search and recommendation
– Generalization of recommendations to new users and/or new content (cold start)

Area Chairs:

  • Xian-Sheng Hua, Alibaba, China
  • Alexis Joly, INRIA, France
  • Yiannis Kompatsiaris, Information Technologies Institute, Greece
  • Jitao Sang, Beijing Jiaotong University, China

Social Multimedia

This area seeks novel contributions investigating online social interactions around multimedia systems, streams, and collections. Social media platforms (such as Facebook, Twitter, Flickr, and YouTube) have substantially and pervasively changed communication among organizations, communities, and individuals. Sharing of multimedia objects, such as images, videos, music, associated text messages, and recently even digital activity traces such as fitness tracking measurements, constitutes a prime aspect of many online social systems nowadays. This gives us valuable opportunities to understand user-multimedia interaction mechanisms, to predict user behavior, to model the evolution of multimedia content and social graphs, or to design human-centric multimedia applications and services informed by social media, such as analyzing and predicting related real-world phenomena.

The submissions in this area should look specifically at methods and systems wherein social factors, such as user profiles, user behaviors and activities, and social relations, are organically integrated with online multimedia data to understand media content and media use in an online social environment. Alternatively, they should leverage socially created data to solve challenging problems in traditional multimedia computing, enable applications addressing real-world problems (e.g. sales prediction, brand and environmental monitoring), or address new research problems emerging in the online social media scenario.

The proposed contributions are expected to scale up to serve large online user communities. They should exploit massive online collective behavior by looking at e.g., large-group online interactions and group sentiments aggregated across many users in an online community. They should also be able to handle large, heterogeneous and noisy multimedia collections typical for social media environments. Special emphasis is put on multimodal approaches leveraging multiple information sources and modalities to be found in the social media context.

Topics of interest include, but are not restricted to:

– Social media data collection, filtering, and indexing
– Social media data representation and understanding
– User profiling from social media
– Personal information disclosure and privacy aspects of social media
– Modeling collective behavior in social media
– Multimedia propagation in online social environments
– Spatial-temporal context analysis in social media
– Monitoring, sensing, prediction and forecasting applications with social media
– Multimedia-enabled social sharing of information
– Detection and analysis of emergent events in social media collections
– Verification of social media content
– Evaluation of user engagement around shared media
– Convergence between Internet of Things, wearables and social media
– Systems and analysis of location-based social media
– Network theory and algorithms in social multimedia systems
– Models for the spatiotemporal characteristics of social media
– Models and systems for analyzing large-scale online sentiments

Area Chairs:

  • Peng Cui, Tsinghua University, China
  • Hayley Hung, Delft University of Technology, The Netherlands
  • Vivek Singh, Rutgers University, USA

Multimedia HCI and Quality of Experience

There is a growing evolution of media towards interactivity, prompted by the intrinsically interactive nature of devices used for media consumption, as well as by progress in media content description that makes content amenable to direct access and manipulation. The same advances also provide a basis for capturing and understanding user experience. The Multimedia HCI and QoE area follows on from similar areas in previous editions of the conference, which have been dedicated to Human-Centered Multimedia, Media Interactions, or Multimedia HCI. The specific inclusion of Quality of Experience (QoE) recognizes the need for user studies that focus on the specific aspects of media consumption.

Topics of interest include, but are not restricted to:

– Design and implementation of novel interactive media: interactive films and narrative, storyfication of social media
– Human-centred multimedia, including immersive multimedia
– Novel interaction modalities for accessing media content, including multimodal, affective, and brain-computer interfaces
– Systems and architectures for multimodal and multimedia integration
– User interaction for media content authoring or media production
– Subjective assessment methodologies to estimate the QoE in multimedia systems
– Influencing parameters, models and objective metrics to measure QoE in multimedia systems
– System designs and implementations taking advantage of direct multimedia and multimodal QoE measurements
– Datasets, benchmarks and validation of multimedia quality of experience

Area Chairs:

  • Max Mühlhäuser, Technische Universität Darmstadt, Germany
  • Kuk-Jin Yoon, Gwangju Institute of Science and Technology, Korea

Multimedia Art, Entertainment and Culture

Multimedia plays a significant role in engaging the public with art and other forms of cultural expression, as well as in facilitating the development of tools that provide rich entertainment experiences to users. A key challenge is therefore to develop techniques that enable effective engagement within these applications. The focus of this area is on the innovative use of digital multimedia technology in arts, entertainment, and culture, to support the creation of multimedia content, creative applications of multimedia technologies, artistic interactive and multimodal installations, the analysis of media consumption and user experience, or cultural preservation.

We seek papers presenting a broad range of integrated artistic and scientific statements describing digital systems for arts, entertainment, and culture. Successful papers should achieve a balance between sophisticated technical content and artistic or cultural purpose.

Topics of interest include, but are not restricted to:

– Models of interactivity specifically addressing arts and entertainment
– Active experience of multimedia artistic content by means of socio-mobile multimodal systems
– Analysis of spectator experience in interactive systems or digitally-enhanced performances
– Virtual and augmented reality artworks, including hybrid physical/digital installations
– Dynamic, generative and interactive multimedia artworks
– Creativity support tools
– Computational creativity (case studies and systems) using multimedia technologies
– Computational aesthetics in multimedia and multimodal systems
– Tools for or case studies on cultural preservation or curation

Area Chairs:

  • Rongrong Ji, Xiamen University, China
  • Dhiraj Joshi, IBM T J Watson, USA
  • Jongseok Lee, Yonsei University, Korea
  • Miriam Redi, Bell Labs Cambridge, UK

Multimedia for Collaboration in Education and Distributed Environments 

Remote collaboration has become a common element of consumer and enterprise communication. At the same time, video-conferencing practices built on steadily improving networking and computing infrastructure have yet to move significantly beyond the audio-driven “talking-heads” experience. We solicit papers that describe new interactions and technologies that more fully exploit improvements in underlying technologies to enhance and transform the end-user experience. Better integration of context-detection methods can help to drive novel interaction designs. Immersive and tangible technologies can also be considered, as well as software solutions for both synchronous and asynchronous collaboration.

Other cooperative settings build on sharing multimedia in collaborative or public spaces using a mix of shared and personal devices, as well as automatically sensed data streams. Different modalities must be explored in interfaces for groups and their members, whether they cooperate tightly or loosely, in public or in private, synchronously or asynchronously.

Educational scenarios represent an application area with structured content for student consumption and potentially large student communities that need the ability to interact effectively with each other and with instructors, often asynchronously. Interfaces that allow students efficient access both to content and to collaborative opportunities are needed. As multimedia communication expands its importance in our private, professional, and social lives, the design space for technologies that allow its effective use needs to expand in novel and disruptive directions. We welcome submissions addressing any of these challenges.

Topics of interest include, but are not restricted to:

– Gesture- and tangible-based interaction for remote collaboration
– Multi-device, multi-display interaction for collaboration
– Multimedia interaction in public spaces
– Interaction techniques for synchronous and asynchronous collaboration
– Novel interactions with multimedia in-the-wild
– Mobile collaborative techniques
– Multimedia-based interaction techniques for remote assistance
– Context-aware multimedia experiences
– Enabling technologies and novel modalities for remote collaborations with and through multimedia
– Field trials, user studies and quality of experience in remote collaboration
– Augmented reality enhancing remote collaboration
– Multi-modal information processing for enhanced remote collaboration

Area Chairs:

  • Matt Cooper, FXPAL, USA
  • Youn-ah Kang, Yonsei University, Korea
  • Mark Liao, Academia Sinica, Taiwan

Music, Speech and Audio Processing in Multimedia

As a core part of multimedia data, the acoustic modality is of great importance as a source of information that is orthogonal to other modalities like video or text. Incorporating this modality in multimedia systems allows for richer information to be extracted when performing multimedia content analysis and provides richer means for interacting with multimedia data collections.

In this area, we call for strong technical submissions revolving around music, speech and audio processing in multimedia. These submissions may address analysis of the acoustic signals in order to extract information from multimedia content (e.g. what notes are being played, what is being said, or what sounds appear), or from the context (e.g. vocal timbre, language spoken, age and gender of the speaker, localization using sound). They may also address the synthesis of acoustic content for multimedia purposes (e.g. speech synthesis, expressive singing synthesis, acoustic scene synthesis), or novel ways to represent acoustic data as multimedia, for example, by combining audio analysis with recordings of gestures in the visual channel. We are also interested in submissions addressing novel multimedia interfaces and interaction concepts enabled by the inclusion of acoustics, as well as the changing paradigms of analyzing, indexing, organizing, retrieving, recommending, consuming, creating, and enjoying music by taking into account contextual, social, and affective aspects and content from other modalities or other information sources in general.

While the acoustic modality is central in this area, the submissions should consider this modality in the multimedia context, for instance by developing methods and approaches addressing multimedia items, applications or systems, or by explicitly considering information sources from different modalities.

Topics of interest include, but are not restricted to:

– Multimodal approaches to music, speech, and audio analysis and synthesis
– Multimodal approaches to music, speech, and audio indexing, classification, retrieval, and recommendation
– Multimodal and multimedia context models for music, speech, and audio
– Computational approaches to music, speech, and audio inspired by other domains (e.g. computer vision, information retrieval, musicology, psychology)
– Multimedia localization using acoustic information
– Social data, user models, and personalization in music, speech, and audio
– Music, audio, and aural aspects in multimedia user interfaces
– Multimedia and/or interactive musical instruments and systems
– Multimedia applications, services, or platforms around music, speech, and audio
– Music, speech, and audio coding, transmission, and storage for multimedia applications

Area Chairs:

  • Masataka Goto, National Institute of Advanced Industrial Science and Technology, Japan
  • Jialie Shen, Newcastle University, UK
  • Xiao-Ping Zhang, Ryerson University, Canada

Mobile Multimedia

With a multitude of sensors, such as accelerometers, GPS, multiple cameras (video and still), microphones, and speakers, mobile devices are arguably the truest embodiment of multimedia. Additionally, emerging wearable devices, such as smart glasses, head-mounted displays, wristbands, and smartwatches, have opened opportunities for new multimedia applications in different fields, including but not limited to healthcare, education, and computational social science.

As processing power, sensor quality, and display resolution continue to improve at an exponential pace, the potential of these devices is seemingly unbounded. At the very same time, however, much more limited growth in battery life and communication capacity creates distinct systems-level challenges that must be addressed to tap that potential. Likewise, visions such as the Internet of Things (IoT), where billions of physical devices are equipped with cameras, displays, or other sensors, enable endless opportunities but also pose tremendous challenges for multimedia system design and implementation.

Topics of interest include, but are not restricted to:

– Media streaming to/from/between mobile devices
– Low-power and power-aware media services and multimedia applications
– Mobility models and simulations in service of multimedia research
– Innovative uses of single or multiple sensor data (e.g., location) for novel applications
– Performance and quality-of-experience studies of multimedia applications in a mobile context
– Real-time interactive mobile-to-mobile conferencing
– Crowdsourcing within mobile multimedia applications
– Sensor-based interaction with mobile multimedia (including touch, gestures, kinesthetic, and vision-based interaction)
– Vehicle-based multimedia systems
– Peer-to-peer mobile media and applications
– Wearable device systems and applications
– Multimedia systems for IoT (Internet of Things)

Area Chairs:

  • Winston Hsu, National Taiwan University, Taiwan
  • Wolfgang Hurst, Utrecht University, The Netherlands
  • Zhisheng Yan, Georgia State University, USA

Multimedia Scalability and Management System

The past few years have witnessed several breakthroughs in deep learning algorithms, explosive growth of multimedia data, and scalable distributed computing platforms. Such breakthroughs not only have resulted in a blossom of novel multimedia applications and systems, but also pose new challenges particularly in large-scale multimedia understanding, search, sharing, and management.

With the aim of bridging the gap between long-term research and fast-evolving large-scale real systems, this area calls for submissions from both academia and industry that either explore novel solutions or describe solid implementations for addressing the scalability and management challenges in multimedia systems. This includes efficient algorithms for multimedia content processing, indexing, and serving; new programming models for multimedia computing and communication; novel tools and platforms for developing multimedia cloud services; and scalable storage systems for managing explosively growing multimedia data.

Topics of interest include, but are not restricted to:

– Scalable systems for multimedia indexing, search, and mining
– Scalable techniques for multimedia data storage and management
– Distributed systems for processing, fusion, and analysis of large-scale multimedia data
– Real-time processing, aggregation, and analysis of streaming multimedia data
– Novel scenarios and solutions in multimedia verticals (e.g., mobile visual recognition of landmarks, products, book covers, barcodes, etc.)
– Tools and infrastructure for developing multimedia services on cloud and mobile platforms
– Reliability, availability, serviceability of multimedia services
– AI-oriented large-scale video management for smart cities: technologies, standards, and beyond
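To give a flavour of the scalable indexing and search problems this area covers, below is a minimal sketch of locality-sensitive hashing with random hyperplanes, one classic technique for sublinear similarity search over feature vectors. All names, dimensions, and data here are purely illustrative, not a reference implementation.

```python
import random

def lsh_signature(vec, planes):
    """Hash a feature vector to a bit signature via random hyperplanes:
    similar vectors are likely to land in the same bucket."""
    return tuple(sum(p * v for p, v in zip(plane, vec)) > 0
                 for plane in planes)

random.seed(0)
DIM, BITS = 64, 12
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

# Index a small database of (hypothetical) feature vectors by signature.
index = {}
database = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]
for i, vec in enumerate(database):
    index.setdefault(lsh_signature(vec, planes), []).append(i)

# An identical query is guaranteed to hit the same bucket; a mildly
# perturbed one usually does, which is the point of the technique.
candidates = index[lsh_signature(database[42], planes)]
```

At scale, schemes of this kind are typically sharded across machines and combined with re-ranking of the candidate set, which is exactly the systems territory the topic list above targets.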


Area Chairs:

  • Ling-Yu Duan, Peking University, China
  • Mohan Kankanhalli, National University of Singapore, Singapore

Multimedia Systems and Middleware

The Multimedia Systems and Middleware area targets applications, mechanisms, algorithms, and tools that enable the design and development of efficient, robust, and scalable multimedia systems. In general, it includes solutions at various levels in the software and hardware stacks.

We call for submissions that explore the design of architectures and software for large-scale and/or distributed multimedia systems, multimedia in pervasive computing applications, as well as mobile multimedia. This includes tools and middleware to build multimedia applications, such as content adaptation and transcoding, stream processing, scheduling and synchronization, and cloud multimedia systems.

Multimedia architectures and systems are continuously evolving, and hardware technology changes influence middleware and applications. We therefore also solicit submissions of new research on host-device interaction in heterogeneous systems, applications for transactional memory, and multi-level memory architectures (e.g., RAM – SSDs – spinning disks) for operating systems and storage functions.
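To illustrate the multi-level memory point, here is a toy two-tier store in the spirit of RAM-over-SSD/disk hierarchies: a small LRU fast tier that demotes cold items to a slower tier and promotes them back on access. The class name and capacities are hypothetical, intended only to sketch the mechanism.

```python
from collections import OrderedDict

class TieredStore:
    """Toy two-level store: a bounded LRU fast tier (standing in for RAM)
    in front of an unbounded slow tier (standing in for SSD/disk)."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # fast tier, kept in LRU order
        self.slow = {}              # slow tier; a dict stands in for disk
        self.capacity = fast_capacity

    def put(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        if len(self.fast) > self.capacity:
            # Demote the least-recently-used item to the slow tier.
            old_key, old_val = self.fast.popitem(last=False)
            self.slow[old_key] = old_val

    def get(self, key):
        if key in self.fast:        # fast-tier hit: refresh recency
            self.fast.move_to_end(key)
            return self.fast[key]
        value = self.slow.pop(key)  # slow-tier hit: promote to fast tier
        self.put(key, value)        # (raises KeyError if absent anywhere)
        return value
```

Real middleware adds write-back policies, admission control, and concurrency, but the promote/demote skeleton above is the common core.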

Topics of interest include, but are not restricted to:

– Efficient implementations of processing frameworks for multimedia workloads
– System and middleware implementation with graphics processing units (GPUs), network processors, and field-programmable gate arrays (FPGAs)
– Multimedia systems over new generation computing platforms, e.g., social, cloud, Software-Defined Everything (SDx), and human-based computing
– Middleware systems for mobile multimedia, especially for emerging wireless technologies, e.g., 5G mobile Internet, Low-Power Wide-Area Network (LPWAN), and LiFi
– Real-time multimedia systems and middleware
– Embedded multimedia systems and middleware
– QoS and QoE of multimedia systems and middleware
– System prototypes, deployment, and measurements
– Energy-efficiency for multimedia systems and middleware
– Multimedia storage systems
– Immersive, 360° and virtual world systems and middleware
– Augmented and virtual reality systems and middleware, e.g. visual SLAM and visual odometry
– Performance and energy optimization of multimedia applications on mobile embedded systems, e.g. binarization of convolutional neural networks

Area Chairs:

  • Xin Yang, Huazhong University of Science and Technology, China
  • Ketan Mayer-Patel, University of North Carolina, USA
  • Roger Zimmermann, National University of Singapore, Singapore

Multimedia Telepresence and Virtual/Augmented Reality

Telepresence and virtual/augmented reality have long been grand challenges for researchers and industry alike. High-resolution 3D telepresence can dramatically improve the sense of presence for interpersonal and group communication. This is paramount for supporting non-verbal and subconscious communication that is currently lost in video and audio conferencing environments. Realistic virtual/augmented reality enables a wide spectrum of important applications, including tele-medicine, training for hazardous situations, scientific visualization, and engineering prototyping. Addressing the challenges of telepresence and virtual/augmented reality requires the development of new media representation, processing, and streaming techniques, as well as innovations in human-computer interaction.

Topics of interest include, but are not restricted to:

– Multi-camera coding and streaming
– 3D video coding
– Image-based rendering for virtual/augmented environments
– Virtual/augmented reality user interface design and evaluation
– Haptic interfaces for virtual/augmented reality
– Virtual-world design and authoring tools
– 3D sound rendering in virtual/augmented environments
– Multi-viewpoint stereo for group telepresence
– Automated group telepresence capture and control
– Distributed multi-user virtual/augmented reality systems
– Real-time bandwidth adaptation for VR and telepresence
– Innovative VR and telepresence applications
– Quality of experience models and evaluation for VR and telepresence

Area Chairs:

  • Pablo Cesar, CWI, The Netherlands
  • James She, Hong Kong University of Science and Technology, Hong Kong
  • Zhu Li, University of Missouri, Kansas City, USA

Multimedia Transport and Delivery

Today’s internet traffic is dominated by multimedia content, which calls for the novel paradigms and techniques that are the subject of this area. The Multimedia Transport and Delivery area invites contributions presenting technological advancements and innovations in the transmission of multimedia of all kinds over computer networks (the internet). Contributions are expected to provide new theoretical or experimental insights into transport and delivery mechanisms, whether as enhancements to one or more system components or as complete systems or applications.

Topics of interest include, but are not restricted to:

– Dynamic adaptive multimedia streaming within heterogeneous environments
– Content placement and distribution mechanisms
– Quality of Service (QoS) and Quality of Experience (QoE)
– Future multimedia internetworking: information-centric networking and access networks (5G)
– Next-generation/future video coding support for networked media applications and services
– Network-distributed video coding and network-based media processing
– Multimedia content-aware pre-fetching and caching, multimedia analysis and recommendations for media distribution and caching
– New deployment concepts, such as network function virtualization and software defined networking in the context of multimedia transport and delivery
– Transport and delivery of immersive media like virtual reality (VR), augmented reality (AR), 360° video and multi-sensory systems
– Machine learning (ML) and artificial intelligence (AI) technologies for multimedia transport and delivery
– Cloud-based multimedia transport and delivery
– Hybrid approaches to multimedia transport and video analytics
– Standardization: DASH, MMT, CMAF, OMAF, WebRTC, HTTP/2, QUIC, MPTCP, MSE/EME, WebVR, Hybrid Media, WAVE, etc.
– Applications: social media, game streaming, personal broadcast, healthcare, industry 4.0, education, transportation, etc.
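As a concrete taste of the adaptive-streaming work listed above, the core decision in a throughput-driven bitrate adaptation scheme (in the spirit of DASH-style players) fits in a few lines. The ladder, safety margin, and function name below are illustrative assumptions, not a reference implementation of any standard.

```python
def select_bitrate(ladder_kbps, throughput_kbps, safety=0.8):
    """Pick the highest rung of the bitrate ladder that fits within a
    safety margin of the measured throughput (simple rate-based ABR)."""
    feasible = [r for r in ladder_kbps if r <= safety * throughput_kbps]
    return max(feasible) if feasible else min(ladder_kbps)

# Hypothetical DASH representation ladder, in kbit/s.
ladder = [300, 750, 1500, 3000, 6000]
```

For example, `select_bitrate(ladder, 4000)` keeps a 20% headroom and so settles on the 3000 kbit/s representation. Production players refine this with buffer-occupancy signals and throughput smoothing, which is where much of the research in this area lives.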

Area Chairs:

  • Christian Timmerer, Alpen-Adria-Universität Klagenfurt, Austria
  • Britta Meixner, CWI, The Netherlands
  • Yonggang Wen, Nanyang Technological University, Singapore