ACM Multimedia 2014 Tutorials address the state-of-the-art research and developments regarding all aspects of multimedia, and will be of interest to the entire multimedia community, from novices in the world of multimedia to the most seasoned researchers, from people working in academia to industry professionals.

Immersive 3D Communication

The last few decades have witnessed tremendous advances in telecommunication, with the invention of technologies such as radio, telephone, voice-over-IP, and video conferencing. While all these communication tools are useful and valuable, the ultimate goal of telecommunication is to enable fully immersive remote interaction in a way that simulates or even surpasses the face-to-face experience. Immersive 3D communication technologies are being developed with that goal in mind. The objective of this tutorial is to present an overview of the recent advances in immersive 3D communication. Topics include the basics of human 3D perception, new systems and algorithms in real-time 3D scene capture and reconstruction, 3D data compression and dissemination, and 3D displays. We intend to provide insights into the latest immersive 3D communication technologies, and highlight some open research challenges for the future.


Wanmin Wu

Wanmin is an Advisory Research Scientist at Ricoh Innovations Corp. in Menlo Park, Silicon Valley. She received her Ph.D. from the University of Illinois at Urbana-Champaign. Her research lies at the intersection of 3D multimedia systems and human-centric computing. She is the recipient of several research and industry awards, among them ACM Computing Reviews' Best Articles of 2012, SIGMM Best PhD Thesis Award 2012, ACM Multimedia Best Student Paper Award 2011, IEEE ICME Best Paper Award Nominee 2009, IBM Watson Emerging Leaders in Multimedia 2008, and Yahoo Key Technical Challenge Award 2007. She is on the editorial board of the ACM Transactions on Multimedia Computing, Communications and Applications journal. She also served as an area chair for ACM Multimedia 2013, and as a reviewer/TPC member for a number of conferences and journals.

Cha Zhang

Cha Zhang is a Senior Researcher in the Multimedia, Interaction and Communication Group at Microsoft Research. He received the B.S. and M.S. degrees from Tsinghua University, Beijing, China in 1998 and 2000, respectively, both in Electronic Engineering, and the Ph.D. degree in Electrical and Computer Engineering from Carnegie Mellon University, in 2004. His current research focuses on applying various audio/image/video processing and machine learning techniques to multimedia applications, in particular, multimedia teleconferencing. Dr. Zhang has published more than 80 technical papers and holds 20+ U.S. patents. He won the best paper award at ICME 2007, the top 10% award at MMSP 2009, and the best student paper award at ICME 2010.

Dr. Zhang is a Senior Member of IEEE. He was the Program Co-Chair for the first Immersive Telecommunication Conference (IMMERSCOM) in 2007, and the Program Co-Chair for VCIP 2012. He currently serves as an Associate Editor for IEEE Trans. on Circuits and Systems for Video Technology, and IEEE Trans. on Multimedia.

Social Multimedia Computing

The emergence of social multimedia has brought challenges as well as opportunities to computing. On the one hand, most social multimedia services are user-oriented, making it important to understand user demands from their interactions with the multimedia content. On the other hand, while multimedia content analysis remains an open problem, the participatory nature of social multimedia offers a new perspective on solving it. Social multimedia computing, a multi-disciplinary research and application field, has been developed to understand social multimedia content and connect that content with users by exploiting their various social interactions. The potential applications range from information services, communication, and entertainment to healthcare and security.

Thanks to the wide prevalence of social multimedia data and the increasing demand for social multimedia services, there has been a growing body of research on social multimedia computing, evidenced by the volume of papers produced and the many related tracks and special issues in prestigious multimedia conferences and journals. This tutorial reviews recent progress in social multimedia computing from two perspectives: social-sensed multimedia computing (3 hours) and user-centric social multimedia computing (3 hours).


Peng Cui

Peng Cui is an Assistant Professor at Tsinghua University, China. He received his PhD degree from Tsinghua University in 2010. He is an active researcher dedicated to novel algorithms and systems in social multimedia computing, and he is keen to promote the convergence of social media data mining and multimedia computing technologies. Dr. Cui has a strong background in both the data mining and multimedia communities. He has published more than 30 papers in prestigious conferences and journals in data mining and multimedia, including ACM MM, SIGKDD, SIGIR, AAAI, IEEE TMM, IEEE TKDE, and IEEE TIP. His recent research won the ACM MM12 Grand Challenge Multimodal Award and the MMM13 Best Paper Award. He is an Area Chair of ACM MM 2014 and ICASSP 2013, an Associate Editor of the Frontiers of Computer Science journal, and a Guest Editor of the Information Retrieval journal, and he has co-organized several special sessions and workshops on social multimedia at ICMR, ICME, ACM MM and WSDM.

Lexing Xie

Lexing Xie is a Senior Lecturer in the Research School of Computer Science at the Australian National University. She was a research staff member at IBM T.J. Watson Research Center in New York from 2005 to 2010, and an adjunct assistant professor at Columbia University from 2007 to 2009. She received the B.S. degree from Tsinghua University, Beijing, China, and the M.S. and Ph.D. degrees from Columbia University, all in Electrical Engineering. Her research interests are in applied machine learning, multimedia, and social media. Her recent projects are on multimedia analysis, social media tracking, visual semantics, large-scale image and video search, and geo-spatial event prediction and recommendation. Lexing's research has received five best student paper and best paper awards between 2002 and 2011, and a Grand Challenge Multimodal Prize at ACM Multimedia 2012. She was the 2005 IBM Research Josef Raviv Memorial Postdoc fellow in Computer Science and Engineering. She is an associate editor of IEEE Transactions on Multimedia, and ACM Transactions on Multimedia Computing, Communications and Applications (TOMM).

Jitao Sang

Dr. Jitao Sang is an assistant professor at the National Laboratory of Pattern Recognition at the Chinese Academy of Sciences (CAS). He graduated with the highest honor for CAS PhD students, the Special Prize of the CAS President Scholarship. His research interest is in social multimedia computing, where his recent research in user-centric social multimedia computing has attracted increasing attention, with award-winning publications in prestigious conferences (best paper finalist at MM2012 and MM2013, best student paper at MMM2013). He has practiced the idea of “From user, On user, For user” in various multimedia research projects. He designed a general framework to introduce the user factor into multimedia content analysis, by understanding the social multimedia sharing ecosystem from the user-multimedia-tag ternary perspective. Among the first to conduct social network analysis in the multimedia community, he discovered advanced social relations and exploited them towards personalized multimedia services. Most recently, his work on user-based cross-network analysis has extended the potential of social multimedia applications. This has overcome the limitations of utilizing separate social media information and brought disruptive models for understanding “multimedia” in the big data era.

Changsheng Xu

Dr. Changsheng Xu is a professor at the National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences and Executive Director of the China-Singapore Institute of Digital Media. His research interests include multimedia content analysis/indexing/retrieval, pattern recognition and computer vision. He holds 30 granted/pending patents and has published over 200 refereed research papers in these areas. Dr. Xu is an Associate Editor of IEEE Trans. on Multimedia, ACM Trans. on Multimedia Computing, Communications and Applications and ACM/Springer Multimedia Systems Journal. He received the Best Associate Editor Award of ACM Trans. on Multimedia Computing, Communications and Applications in 2012 and the Best Editorial Member Award of ACM/Springer Multimedia Systems Journal in 2008. He served as Program Chair of ACM Multimedia 2009. He has served as associate editor, guest editor, general chair, program chair, area/track chair, special session organizer, session chair and TPC member for over 20 prestigious IEEE and ACM multimedia journals, conferences and workshops. He is an IEEE Fellow and an ACM Distinguished Scientist.

Learning Knowledge Bases for Text and Multimedia

Knowledge acquisition, representation, and reasoning have long been among the central challenges in artificial intelligence and related application areas. Only in the past few years have massive amounts of structured and semi-structured data that directly or indirectly encode human knowledge become widely available, turning the knowledge representation problems into a computational grand challenge with feasible solutions in sight. Research and development on knowledge bases is becoming a lively fusion area among web information extraction, machine learning, databases and information retrieval, with knowledge over images and multimedia emerging as another new frontier of representation and acquisition. This tutorial aims to present a gentle overview of knowledge bases on text and multimedia, including representation, acquisition, and inference. The content of this tutorial is intended for surveying the field, as well as for educating practitioners and aspiring researchers.


Haixun Wang

Haixun Wang is a research scientist at Google Research. Before joining Google, he was a senior researcher at Microsoft Research Asia in Beijing, China, where he managed the group of Data Management, Analytics, and Services. He had also been a research staff member at IBM T. J. Watson Research Center for 9 years. Haixun Wang has published more than 120 research papers in refereed international journals and conference proceedings. He is on the editorial board of Distributed and Parallel Databases (DAPD), IEEE Transactions on Knowledge and Data Engineering (TKDE), Knowledge and Information Systems (KAIS), and Journal of Computer Science and Technology (JCST). He was PC co-Chair of WWW 2013 (P&E), ICDE 2013 (Industry), CIKM 2012, ICMLA 2011, and WAIM 2011. Haixun Wang received the ICDM 10-Year Highest Impact Paper Award in 2014, the ER 2008 best paper award (DKE 25 year award), and the ICDM 2009 Best Student Paper runner-up award.

Emerging Topics on Personalized and Localized Multimedia Information Systems

We are experiencing an era with a rapid increase of data relevant to different aspects of users’ daily life. On the one hand, such data contains personal information of each individual user. On the other hand, it also reflects user behaviors related to society as the data of more users is aggregated. These data could not only be very beneficial for studying various lifestyle patterns, but also be used to generate more descriptive and explanatory analysis across the landscape of diverse multimedia data. Using personal mobile devices and web services to systematically explore interesting aspects of people’s world has attracted much attention recently. This is a full-day tutorial that addresses emerging topics on personalized and localized multimedia technologies and applications and emphasizes knowledge sensing and discovery in the multimedia landscape. This tutorial aims to deliver an overall introduction to multimedia landscapes, covering multimedia processing, contextual data acquisition, people’s activity logs, data analytics, and geographic-aware multimedia sharing and delivery, and serves as an important lecture on fundamental and advanced research areas of personalized and localized multimedia information systems.


Yi Yu

Yi Yu currently works at the School of Computing at the National University of Singapore. Her research covers a diverse set of topics including large-scale multimedia information processing, social media analysis, and physical analytics. She was among the top winners (out of 29 teams) of the ACM SIGSPATIAL GIS Cup 2013, and received a best paper award from IEEE ISM 2012. She is a co-chair of the 1st International Workshop on Internet-Scale Multimedia Management, co-located with ACM Multimedia 2014. Yu received a Ph.D. in Information and Computer Science from Nara Women’s University, Japan.

Kiyoharu Aizawa

Kiyoharu Aizawa received the B.E., the M.E., and the Dr.Eng. degrees in Electrical Engineering, all from the University of Tokyo, in 1983, 1985, and 1988, respectively. He is currently a Professor at the Department of Information and Communication Engineering of the University of Tokyo. He was a Visiting Assistant Professor at University of Illinois from 1990 to 1992. His research interest is in image processing and multimedia applications. He is one of the pioneers of the life-logging field, and very recently he has been focusing on multimedia food logging for dietary assessment. The outcome of his research, FoodLog, is made available to the general public. He received the 1987 Young Engineer Award, the 1990 and 1998 Best Paper Awards, the 1991 Achievement Award, and the 1999 Electronics Society Award from IEICE Japan, and the 1998 Fujio Frontier Award, the 2002 and 2009 Best Paper Awards, and the 2013 Achievement Award from ITE Japan. He received the IBM Japan Science Prize in 2002. He is currently on the Editorial Board of ACM TOMM and Journal of Visual Communications, Image Processing, APSIPA Transactions on Signal and Information Processing, and International Journal of Multimedia Information Retrieval. He served as the Editor in Chief of Journal of ITE Japan, and as an Associate Editor of IEEE Trans. Image Processing, IEEE Trans. CSVT and IEEE Trans. Multimedia. He has served a number of international and domestic conferences; he was a General co-Chair of ACM Multimedia 2012 and IEEE VCIP2012.

Toshihiko Yamasaki

Toshihiko Yamasaki received the B.S. degree in electronic engineering, the M.S. degree in information and communication engineering, and the Ph.D. degree from The University of Tokyo in 1999, 2001, and 2004, respectively. He is currently an Associate Professor at the Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo. He was a visiting scientist at Cornell University from 2011 to 2013. His current research interests include 3D video processing, object recognition, and medical image analysis. Dr. Yamasaki is a member of IEICE, IEEE, and ACM, among other societies.

Roger Zimmermann

Roger Zimmermann received the MS and PhD degrees from the University of Southern California (USC) in 1994 and 1998. He is currently an associate professor in the Department of Computer Science at the National University of Singapore (NUS). He is also a deputy director with the Interactive and Digital Media Institute (IDMI) at NUS and a co-director of the Centre of Social Media Innovations for Communities (COSMIC). His research interests are in the areas of streaming media architectures, distributed and peer-to-peer systems, mobile and geo-referenced video management, collaborative environments, spatio-temporal information management, and mobile location-based services. He has co-authored a book, six patents, and more than 150 conference publications, journal articles, and book chapters. He is a senior member of the IEEE and a member of ACM.

An Introduction to Arts and Digital Culture inside Multimedia

The Arts and Digital Culture program has offered a high-quality forum for the presentation of interactive and arts-based multimedia applications at the annual ACM Multimedia conference for over a decade. This tutorial will explore the evolution of this program as a guide for new authors considering future participation. By surveying both past technical contributions and past exhibited works, this tutorial will offer guidance to artists, researchers and practitioners on succeeding at this multifaceted, interdisciplinary forum at ACM Multimedia.


David A. Shamma

David A. Shamma (Yahoo! Labs, USA) is a senior research scientist and head of the HCI Research group at Yahoo! Labs. His personal research investigates synchronous environments and connected experiences both online and in-the-world. Focusing on creative expression and sharing frameworks, he designs and prototypes systems for multimedia-mediated communication, and develops targeted methods and metrics for understanding how people communicate online in small environments and at web scale. Ayman is the creator and lead investigator on the Yahoo! Zync project, is the scientific liaison to Flickr, and is on the iSchool at Berkeley’s Data Science Advisory board. Additionally, Ayman serves on the ACM MM Steering Committee, the ACM TVx Steering Committee, and is a co-editor for Arts & Digital Culture for SIGMM. He recently was a Visiting Senior Research Fellow at the National University of Singapore’s CUTE Center in the Interactive Digital Media Institute. In the past, he has worked at the Medill School of Journalism and NASA Ames Research Center. He has a Ph.D. in Computer Science from Northwestern University and an M.S./B.S. in Computer Science from the University of West Florida.

Daragh Byrne

Daragh Byrne is the Intel Integrative Design Fellow at the School of Design, Carnegie Mellon University, as well as a Research Scientist in the Visible Process Lab, where he explores the design of experiential media systems through process-oriented methods. Both at CMU and in his previous role as an Assistant Research Professor at Arizona State University’s School of Arts, Media, and Engineering, he has managed the NSF-funded XSEAD project, which seeks to support interdisciplinary collaboration by bridging arts and design perspectives with science and engineering to foster innovation and advance outcomes. He defended his PhD at Dublin City University in August 2011, and holds an M.Res. degree in Design and Evaluation of Advanced Interactive Systems from Lancaster University and a BSc. in Computer Applications from DCU. During his research career, he has published over 40 papers and has been engaged with the life-logging community to explore the capture and representation of personal experience through digital means. His doctoral work represents a first-of-its-kind exploration in which long-term multimodal life-log collections were established to explore the creation of personal digital stories. This research interest continues with a current focus on experience capture, participatory documentation, and community curation.

Video Hyperlinking

Video hyperlinking is the introduction of links that originate from pieces of video material and point to other relevant content, be it video or any other form of digital content. The tutorial presents the state of the art in video hyperlinking approaches and in relevant enabling technologies, such as video analysis and multimedia indexing and retrieval. Several alternative strategies, based on text, visual and/or audio information are introduced, evaluated and discussed, providing the audience with details on what works and what doesn’t on real broadcast material.


Vasileios Mezaris

Dr. Vasileios Mezaris is a Senior Researcher (Researcher B) with the Information Technologies Institute / Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece. He received his bachelor’s degree and Ph.D. in Electrical and Computer Engineering from the Aristotle University of Thessaloniki, Thessaloniki, Greece, in 2001 and 2005, respectively. His research interests include image and video analysis, event detection in multimedia, machine learning for multimedia analysis, content-based and semantic image and video retrieval, and the application of image and video analysis technologies in specific domains (medical images, ecological data). He is the co-author of 27 papers in refereed international journals, 12 book chapters, two patents and more than 100 papers in international conferences. He has co-organized a number of events in the area of multimedia processing and understanding, including the Winter School on Multimedia Processing and Applications (WMPA’14; Dublin, Ireland) and the 1st Int. Workshop on Social Events in Web Multimedia (SEWM'14) at ACM ICMR (Glasgow, UK) in 2014, and workshops held in conjunction with ACM Multimedia, IEEE ICME, EuroITV and the World Wide Web (WWW) conferences during 2013. He also co-organized the "MediaMixer/VideoLectures.net" Grand Challenge at ACM Multimedia 2013, and has been one of the organizers of the annual Social Event Detection (SED) Task at the MediaEval benchmarking initiative since the SED Task’s introduction in 2011. He has participated in many European and National research projects, and is at present the Scientific Responsible of CERTH in the EU FP7 large-scale integrating projects “LinkedTV: Television Linked to the Web” and “ForgetIT: Concise Preservation by combining Managed Forgetting and Contextualized Remembering”. He currently serves as an Associate Editor for the IEEE Transactions on Multimedia and as a Guest Editor for special issues in other journals. He is a Senior Member of the IEEE.

Benoit Huet

Dr. Benoit Huet is an Associate Professor in the multimedia information processing group of Eurecom (France). He received his BSc degree in computer science and engineering from the École Supérieure de Technologie Électrique (Groupe ESIEE, France) in 1992. In 1993, he was awarded the MSc degree in Artificial Intelligence from the University of Westminster (UK) with distinction, where he then spent two years working as a research and teaching assistant. He received his DPhil degree in Computer Science from the University of York (UK) for his research on the topic of object recognition from large databases. He was awarded the HDR (Habilitation to Direct Research) from the University of Nice Sophia Antipolis, France, in October 2012 on the topic of Multimedia Content Understanding: Bringing Context to Content. He is an associate editor for IEEE Multimedia, Multimedia Tools and Applications (Springer) and Multimedia Systems (Springer) and has been a guest editor for a number of special issues (EURASIP Journal on Image and Video Processing, IEEE Multimedia). He regularly serves on the technical program committees of the top conferences in the field (ACM MM/ICMR, IEEE ICME/ICIP). He chairs the IEEE MMTC Interest Group on Visual Analysis, Interaction and Content Management (VAIG). He is vice-chair of the IAPR Technical Committee 14 Signal Analysis for Machine Intelligence. His research interests include computer vision, large-scale multimedia data mining and indexing (still and/or moving images), content-based retrieval, semantic labelling and annotation of multimedia content, multimodal fusion, and pattern recognition.

Over-the-Top Content Delivery: State of the Art and Challenges Ahead

In this tutorial, we present the state of the art and the challenges ahead in over-the-top content delivery. In particular, the goal of this tutorial is to provide an overview of adaptive media delivery, specifically in the context of HTTP adaptive streaming (HAS), including the recently ratified MPEG-DASH standard. The main focus of the tutorial will be on the common problems in HAS deployments such as client design, QoE optimization, multi-screen and hybrid delivery scenarios, and synchronization issues. For each problem, we will examine proposed solutions along with their pros and cons. In the last part of the tutorial, we will look into the open issues and review the work in progress and future research directions.


Christian Timmerer

Christian Timmerer is an Associate Professor at the Institute of Information Technology (ITEC), Multimedia Communication Group (MMC), Alpen-Adria-Universität Klagenfurt, Austria. His research interests include immersive multimedia communication, streaming, adaptation, and Quality of Experience (QoE). He was the general chair of QoMEX’13, WIAMIS’08, AVSTP2P’10 (co-located with ACMMM’10), WoMAN’11 (co-located with ICME’11), and TPC co-chair of QoMEX’12. He has participated in several EC-funded projects, notably DANAE, ENTHRONE, P2P-Next, ALICANTE, SocialSensor, and ICoSOLE. He is an Associate Editor for IEEE Computer Society Computing Now, Area Editor for Elsevier Signal Processing: Image Communication, Review Board Member of IEEE MMTC, editor of ACM SIGMM Records, and member of the ACM SIGMM Open Source Software Committee. He also participated in ISO/MPEG work for several years, notably in the areas of MPEG-21, MPEG-M, MPEG-V, and DASH (incl. DASH Industry Forum). He received his PhD in 2006 from Klagenfurt University. Follow him on http://www.twitter.com/timse7 and subscribe to his blog http://blog.timmerer.com.

Ali C. Begen

Ali C. Begen is with the Video and Content Platforms Research and Advanced Development Group at Cisco. His interests include networked entertainment, Internet multimedia, transport protocols and content delivery. Ali is currently working on architectures and protocols for next-generation video transport and distribution over IP networks. He is an active contributor in the IETF and MPEG, and has given a number of keynotes, tutorials and guest lectures in these areas. Ali holds a Ph.D. degree in electrical and computer engineering from Georgia Tech. He received the Best Student-paper Award at IEEE ICIP 2003, the Most-cited Paper Award from Elsevier Signal Processing: Image Communication in 2008, and the Best-paper Award at Packet Video Workshop 2012. Ali has been an editor for the Consumer Communications and Networking series in the IEEE Communications Magazine since 2011 and an associate editor for the IEEE Transactions on Multimedia since 2013. He served as a general co-chair for ACM Multimedia Systems 2011 and Packet Video Workshop 2013. He is a senior member of the IEEE and a senior member of the ACM. Further information on Ali’s projects, publications, presentations and professional activities can be found at http://ali.begen.net.


