Research Panel Discussion

The Future of Audio Multimedia

6th November, 2pm-3:30pm Palm 2


Media sharing sites on the Internet and the one-click upload capability of smartphones have led to a deluge of online multimedia content. In one month, more video content is uploaded to YouTube than all the US media companies produced in 60 years, creating an ever-growing demand for methods that make this content easier to retrieve, search, and index. While visual information is a very important part of a video, acoustic information often complements it. This is especially true for the analysis of consumer-produced, “unconstrained” videos from social media networks, such as YouTube uploads or Flickr content. In past years, however, the content track of ACM Multimedia has focused on the machine vision tasks of video and image analysis. It is time to induce a shift in this conference and introduce audio as an equally weighted focus. By audio, we mean any audio that a computer might encounter in YouTube videos or on mobile device microphones: speech, music, environmental sounds, and noise. Collectively, the analysis of these signals is known as “machine listening” or “computational audition” and is a growing area of research.

This panel will bring together experts in machine listening to discuss the future of the field and its relationship to multimedia analysis. It will be aimed at the general multimedia audience, including both researchers who study machine listening and those who study image and video analysis, with the hope of seeding new cross-disciplinary ideas and collaborations. Panelists will specifically address the information complementary to video that is easily obtained from audio, including the linguistic and emotional content of speech, characteristics of talkers, characteristics of acoustic scenes and events, and characteristics of wildlife and natural events.

Potential Discussion Topics:

  • The promise of audio analysis on YouTube and mobile phones
  • Emerging topics in audio-driven interfaces for multimedia systems and applications
  • What are the most synergistic topics and techniques between audio and vision?
  • How could audio research be presented more usefully to ACM Multimedia?
  • How could ACM Multimedia attract more audio researchers?
  • What are the killer apps of audio multimedia?

Looking for Emotional and Social Signals in Multimedia: Where art thou?

4th November, 2pm-3:30pm Palm 2

The panel discussion will focus on the following questions:

Where and What? Where are emotional and social signals in multimedia? What are they? Do the two areas, namely Emotional and Social Signals in MM and Social Media and Presence, cover all the different types of emotional and social signals in MM?

Context or Content? What if the meaning of the content can be better obtained from the context surrounding the content? Does introducing emotional and social dimensions help? Couldn’t we simply use social media to solve the content analysis problem? For images, for example, rather than exploiting GPS, time stamp, and Bluetooth information, should we be exploiting socially or emotionally relevant context, such as who you were with when you took the photo?

The Scale Issue: At what scale are we formulating and providing solutions to the problem of emotional and social behaviour analysis? At the level of individuals, small groups, or large-scale populations? Does the treatment of emotional and social signals at all scales essentially address the same research problems? Or are the expertise and techniques fundamentally different? Are the expected findings and underlying theory different, and if so, why?

Closing the Gap: Has anything changed since 2006? Could or should MM systems be designed to incorporate emotional and social signals? Where are the gaps? What are the problems that the community should be addressing? What topics should we be recommending that our PhD students focus on?

The Future: If you were starting a PhD in this area, which thesis topic (or topics) would you choose?




