ACM Multimedia 2015 Tutorials will address the state-of-the-art research and developments regarding all aspects of multimedia, and will be of interest to the entire multimedia community, from novices in the world of multimedia to the most seasoned researchers, from people working in academia to industry professionals.
All tutorials will be held on 26 October 2015.
The following tutorials have been confirmed.
- VM Hub: Building Cloud Service and Mobile Application for Image/Video/Multimedia Services
- Interactive Video Search
- Learning Knowledge Bases for Multimedia in 2015
- An Introduction to Arts and Digital Culture inside Multimedia
- Human-centric Images and Videos Analysis
- User-centric Cross-OSN Multimedia Computing
- Image Tag Assignment, Refinement and Retrieval
- Emotional and Social Signals for Multimedia Research
This is an advanced tutorial intended for professionals, researchers and students who are interested in building: 1) image/video/multimedia recognition services hosted in the cloud, and 2) cross-platform apps (on iOS, Android, and Windows Phone) that consume their own image/video/multimedia recognition services or services provided by other parties. The class will leverage VHub, a largely open-sourced image/video/multimedia hub hosted in Azure. We expect the audience to have basic knowledge of multimedia programming.
Speaker: Jin Li
Dr. Jin Li is a Partner Research Manager of the Cloud Computing and Storage (CCS) group in Microsoft Research. He leads a small yet high-performance group of researchers engaged in research with an end-to-end approach, and believes that the ultimate milestone of cool research is a product of significant impact. He and his group have architected (and in many cases written the code for) the solutions they have shipped in Microsoft. His work on Local Reconstruction Code (LRC) in Windows Azure Storage has led to hundreds of millions of dollars of savings for Microsoft, a Best Paper Award at USENIX ATC 2012, and a 2013 Microsoft Technical Community Network Storage Technical Achievement Award. LRC also shipped in Windows Storage Space in Windows 8 and Windows Server 2012 R2. His group architected and implemented the Primary Data Deduplication feature in Windows Server 2012 and End-to-End Deduplication for Storage Virtualization in Windows Server 2012 R2, which was among the top three features introduced for Windows File Server in Windows Server 2012 and received rave reviews from the press, with evidence that some customers upgraded to Windows Server 2012 for the primary data deduplication feature alone. His group also helped architect and implement the RemoteFX for WAN feature in Windows 8 and Windows Server 2012, which provides a fast and fluid user experience in remote sessions running over WAN and wireless networks. Dr. Li received the Young Investigator Award at Visual Communication and Image Processing (VCIP) 1998 and the ICME 2009 Best Paper Award. He was the General Chair of PV 2009 and the lead Program Chair of ICME 2011. He currently serves as ICME Steering Committee Chair and a TPC Co-Chair of ACM Multimedia 2016. He is an IEEE Fellow.
With an increasing amount of video data in our daily life, the need for content-based search in videos increases as well. Although much research has been devoted to video retrieval tools and methods that allow automatic search in videos through content-based queries, the performance of automatic video retrieval is still far from optimal. One problem of automatic video retrieval is that the actual search process is performed by the video retrieval engine, which is a black box for the user. More importantly, there are situations where the common query-and-browse-results approach cannot be employed, for example when users are unable to formulate their search needs as a query, or when they simply want to browse the content without any concrete query in mind. Interactive video search tools provide a more flexible way of content-based search in videos. They provide various content interaction features and give full control of the search process to the user, who knows best which features to use, and how, in order to solve a search problem. In this tutorial we will discuss (i) proposed solutions for improved video content navigation, (ii) typical interactive content-based querying features, and (iii) advanced video content visualization methods. Moreover, we will discuss and demonstrate interactive video search systems and ways to evaluate their performance.
Speakers: Klaus Schoeffmann, Frank Hopfgartner
Klaus Schoeffmann is an Associate Professor at the Institute of Information Technology (ITEC) at Klagenfurt University, Austria, where he also received his Ph.D. degree in Computer Science. His current research focuses on human-computer interaction with multimedia data (e.g., image and video browsing), mobile multimedia, and video processing. He has co-authored more than 60 publications on various topics in multimedia and has co-organized international conferences, special sessions, and workshops (e.g., MMM 2012, CBMI 2013, VisHMC 2014, MMC 2014, and MMC 2015). He is the organizer of the Video Browser Showdown (VBS) evaluation competition. Further, he is an editorial board member of the Springer International Journal on Multimedia Tools and Applications (MTAP) and a steering committee member of the International Conference on MultiMedia Modelling (MMM). Additionally, he is a member of the IEEE and the ACM and a regular reviewer for international conferences and journals in the field of multimedia. Prof. Schoeffmann teaches various courses in computer science, including interactive multimedia applications, media technology, multimedia systems, operating systems, and distributed systems.
Frank Hopfgartner is a Lecturer in Information Studies at the University of Glasgow. He received a PhD in Computing Science from the same university with a thesis on multimedia information retrieval. His research to date can be placed at the intersection of interactive systems and multimedia content access. He has (co-)authored over 100 publications in the above-mentioned research fields, including a book on smart information systems, various book chapters, and papers in peer-reviewed journals, conferences, and workshops. Frank co-organizes LifeLog, a shared task at NTCIR-12 on different methods of retrieval and access of multimedia lifelogging data. In addition, he has held various roles in the organization of multimedia conferences (MMM’17, MeasuringBehaviour’16, MMM’14, ICMR’14, MMM’12) and has co-organized workshops, sessions, and tutorials at major venues such as SIGIR, RecSys, ICME, Ubicomp, ECIR, Hypertext, iConference, and IIiX. Moreover, he was involved in the organization of a summer school on multimedia semantics (SSMS’07). He is a regular reviewer for various renowned journals and has been a PC member of international conferences (e.g., SIGIR, MM, ESWC, WWW, RecSys) and workshops.
Knowledge acquisition, representation, and reasoning have long been challenges in artificial intelligence and related application areas. Only in the past few years have massive amounts of structured and semi-structured data that directly or indirectly encode human knowledge become widely available, turning knowledge representation problems into a computational grand challenge with feasible solutions in sight. Research and development on knowledge bases is becoming a lively fusion area among web information extraction, machine learning, databases, and information retrieval, with knowledge over images and multimedia emerging as another new frontier of representation and acquisition. This tutorial aims to present a gentle overview of knowledge bases on text and multimedia, including representation, acquisition, and inference. In particular, the 2015 edition of the tutorial will include recent progress from several active research communities: web, natural language processing, and computer vision and multimedia.
Speakers: Lexing Xie, Haixun Wang
Lexing Xie is a Senior Lecturer of Computer Science at the Australian National University. She was a research staff member at the IBM T.J. Watson Research Center in New York from 2005 to 2010, and an adjunct assistant professor at Columbia University from 2007 to 2009. She received her B.S. from Tsinghua University, China, and her M.S. and Ph.D. degrees from Columbia University, all in Electrical Engineering. Her research interests are in applied machine learning, multimedia, and social media. Lexing’s research has received six best student paper and best paper awards between 2002 and 2015, and a Grand Challenge Multimodal Prize at ACM Multimedia 2012. Her service roles include associate editorships for both the IEEE and ACM Transactions on Multimedia, and the program and organizing committees of major multimedia, machine learning, web, and social media conferences.
Haixun Wang is a research scientist at Facebook. Before joining Facebook, he was a research scientist at Google Research and a senior researcher at Microsoft Research Asia in Beijing, China, where he managed the Data Management, Analytics, and Services group; he had also been a research staff member at the IBM T. J. Watson Research Center for 9 years. Haixun Wang has published more than 120 research papers in refereed international journals and conference proceedings. He is on the editorial boards of Distributed and Parallel Databases (DAPD), IEEE Transactions on Knowledge and Data Engineering (TKDE), Knowledge and Information Systems (KAIS), and the Journal of Computer Science and Technology (JCST). He was PC Co-Chair of WWW 2013 (P&E), ICDE 2013 (Industry), CIKM 2012, ICMLA 2011, and WAIM 2011. Haixun Wang received the ICDM 10-Year Highest Impact Paper Award in 2014, the ER 2008 Best Paper Award (DKE 25-Year Award), the ICDM 2009 Best Student Paper Runner-Up Award, and the ICDE 2015 Best Paper Award.
The Arts and Digital Culture program has offered a high-quality forum for the presentation of interactive and arts-based multimedia applications at the annual ACM Multimedia conference for over a decade. This tutorial will explore the evolution of the program as a guide to new authors considering future participation. By surveying both past technical and past exhibited contributions, this tutorial will offer guidance to artists, researchers, and practitioners on how to succeed at this multifaceted, interdisciplinary forum at ACM Multimedia.
Speakers: David A. Shamma, Daragh Byrne
David A. Shamma (Yahoo! Labs, USA) is a senior research scientist and head of the HCI Research group at Yahoo! Labs and Flickr. His personal research investigates synchronous environments and connected experiences both online and in the world. Focusing on creative expression and sharing frameworks, he designs and prototypes systems for multimedia-mediated communication, as well as develops targeted methods and metrics for understanding how people communicate online, in small environments and at web scale. Ayman is the creator and lead investigator on the Yahoo! Zync project, the scientific liaison to Flickr, and a member of the Data Science Advisory Board of the iSchool at Berkeley. Additionally, Ayman serves on the ACM MM Steering Committee and the ACM TVX Steering Committee, and is a co-editor for Arts & Digital Culture for SIGMM. He was recently a Visiting Senior Research Fellow at the National University of Singapore’s CUTE Center in the Interactive Digital Media Institute. In the past he has worked at the Medill School of Journalism and NASA Ames Research Center. He has a Ph.D. in Computer Science from Northwestern University and an M.S./B.S. in Computer Science from the University of West Florida.
Daragh Byrne is Intel Special Faculty for Physical Computing, Responsive Environments and Emerging Media within the IDeATe Network and at the School of Architecture at Carnegie Mellon University, where he explores the design of experiential media systems through process-oriented methods. Both at CMU and in his previous role as an Assistant Research Professor at Arizona State University’s School of Arts, Media and Engineering, he has managed the NSF-funded XSEAD project. He also leads the recently launched MakeSchools.org effort to catalog Making in higher education. He defended his PhD at Dublin City University in August 2011, and holds an M.Res. degree in Design and Evaluation of Advanced Interactive Systems from Lancaster University and a BSc. in Computer Applications from DCU. During his research career, he has published over 40 scientific papers, and his doctoral work represents a first-of-its-kind exploration in which long-term multimodal lifelog collections were established to explore the creation of personal digital stories. This research interest continues with a current focus on process-oriented design research into experience capture, participatory documentation, and, in particular, digital curation.
This tutorial reviews recent progress in human-centric image and video analysis: 1) fashion analysis: parsing, attribute prediction, and retrieval; 2) action analysis: discriminative feature selection, pooling, and fusion; 3) person verification: cross-domain person verification via learning a generalized similarity measure, and bit-scalable deep hashing with regularized similarity learning.
Speakers: Si Liu, Liang Lin, Bingbing Ni
Dr. Si Liu is an Associate Professor in the Institute of Information Engineering, Chinese Academy of Sciences. She was previously a Research Fellow at the Department of Electrical and Computer Engineering, National University of Singapore (NUS). She obtained her PhD degree from the Institute of Automation, Chinese Academy of Sciences (CASIA) in 2012, and her Bachelor’s degree from the Experimental Class of Beijing Institute of Technology (BIT). Her current research interests include attribute prediction, object detection, and image parsing. She is also interested in applications such as makeup and clothes recommendation and online product retrieval. She received the Best Paper Award from ACM MM’13 and the Best Demo Award from ACM MM’12.
Liang Lin is a Professor with the School of Advanced Computing, Sun Yat-Sen University (SYSU), China. He received the B.S. and Ph.D. degrees from the Beijing Institute of Technology (BIT), Beijing, China, in 1999 and 2008, respectively. From 2006 to 2007, he was a joint Ph.D. student with the Department of Statistics, University of California, Los Angeles (UCLA). His Ph.D. dissertation was nominated for the China National Excellent Ph.D. Thesis Award in 2010. He was a Post-Doctoral Research Fellow with the Center for Vision, Cognition, Learning, and Art of UCLA. His research focuses on new models, algorithms, and systems for intelligent processing and understanding of visual data such as images and videos. He has published more than 70 papers in top-tier academic journals and conferences. His work has been supported by several promotive programs and funds, such as the “Program for New Century Excellent Talents” of the Ministry of Education (China) in 2012 and the Guangdong NSF for Distinguished Young Scholars in 2013. He received the Best Paper Runner-Up Award at ACM NPAR 2010, a Google Faculty Award in 2012, and the Best Student Paper Award at IEEE ICME 2014. He has served as an Associate Editor for Neurocomputing and The Visual Computer.
Dr. Bingbing Ni received his B.Eng. degree in Electrical Engineering from Shanghai Jiao Tong University (SJTU), China in 2005 and his Ph.D. from the National University of Singapore (NUS), Singapore in 2011. Dr. Ni is currently a research scientist at the Advanced Digital Sciences Center, Singapore. His research interests are in the areas of computer vision, machine learning, and multimedia. Dr. Ni worked at Microsoft Research Asia, Beijing as a research intern in 2009, and as a software engineer intern at Google Inc., Mountain View, CA in 2010. He received the Best Paper Award from PCM’11 and the Best Student Paper Award from PREMIA’08. He won first prize in the International Contest on Human Activity Recognition and Localization (HARL), held in conjunction with the International Conference on Pattern Recognition 2012, and second prize in the ChaLearn Action Recognition Challenge, held in conjunction with the European Conference on Computer Vision (ECCV) 2014.
The explosion of social media has led to various Online Social Networking (OSN) services. Today’s typical netizens use a multitude of OSN services. Exploring user-contributed, cross-OSN heterogeneous data is critical to connecting the separated data islands and facilitating value mining from big social multimedia. From the perspective of content analysis, understanding the associations among heterogeneous cross-OSN data is fundamental to advanced social media analysis and applications. From the perspective of user modeling, exploiting the available user data on different OSNs contributes to an integrated online user profile and thus improved customized social media services. This tutorial will introduce several pilot works on two basic tasks in cross-OSN multimedia computing: (1) from users: cross-OSN knowledge association mining, and (2) for users: cross-OSN user modeling and collaborative applications.
Speaker: Jitao Sang
Jitao Sang is an assistant professor in the National Laboratory of Pattern Recognition at the Institute of Automation, Chinese Academy of Sciences (CAS). He received his PhD from CAS with the highest honor, the special prize of the CAS president scholarship. His research interest is in social multimedia computing, where his recent work on user-centric social multimedia computing has attracted increasing attention, with award-winning publications in prestigious multimedia conferences (best paper finalist at MM 2012 and MM 2013, best student paper at MMM 2013, best student paper at ICMR 2015). So far, he has authored one book, filed three patents, and co-authored more than 40 peer-reviewed papers in multimedia-related journals and conferences. He is program co-chair of PCM 2015 and ICIMCS 2015, publicity chair of MMM 2015, publication chair of ICIMCS 2013 and 2014, special session organizer at ICME 2015, MMM 2013, and ICIMCS 2013, and a program committee member for many conferences (MM 2013, MM 2014, CIKM 2014, etc.). He is an associate editor of Neurocomputing and a guest editor of MMSJ and MTA. He has been a tutorial speaker at MM 2014, MMM 2015, ICME 2015, and ICMR 2015.
This tutorial focuses on challenges and solutions for content-based image annotation and retrieval in the context of online image sharing and tagging. We present a unified review of three closely linked problems, i.e., tag assignment, tag refinement, and tag-based image retrieval. We introduce a taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations. Moreover, we present an open-source testbed, with training sets of varying sizes and three test datasets, to evaluate methods of varied learning complexity. A selected set of eleven representative works has been implemented and evaluated. During the tutorial we provide a practice session for hands-on experience with the methods, software, and datasets. For repeatable experiments, all data and code are online at http://www.micc.unifi.it/tagsurvey
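To make the tag assignment task concrete, the following is a minimal sketch of a generic neighbor-voting baseline: a query image inherits the tags most frequently used by its visually nearest neighbors in a socially tagged collection. The feature vectors, data, and function names here are illustrative assumptions, not the tutorial's actual testbed code or any of the eleven evaluated methods.

```python
# Hypothetical neighbor-voting tag assignment (generic k-NN baseline).
import math
from collections import Counter

def assign_tags(query_vec, labeled_images, k=2, top_n=2):
    """Rank candidate tags for a query image by votes from its
    k visually nearest labeled neighbors."""
    def dist(a, b):
        # Euclidean distance between two feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Sort the collection by visual distance to the query and keep k neighbors.
    neighbors = sorted(labeled_images, key=lambda item: dist(query_vec, item[0]))[:k]
    # Count how often each tag appears among the neighbors.
    votes = Counter(tag for _, tags in neighbors for tag in tags)
    return [tag for tag, _ in votes.most_common(top_n)]

# Toy collection of (feature vector, user tags) pairs.
collection = [
    ([0.9, 0.1], ["beach", "sea"]),
    ([0.8, 0.2], ["beach", "sunset"]),
    ([0.1, 0.9], ["city", "night"]),
]
print(assign_tags([0.85, 0.15], collection))
```

Real systems replace the toy vectors with learned visual features and weight votes by distance or tag rarity; the tutorial's taxonomy organizes exactly such design choices.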
Speakers: Xirong Li, Tiberio Uricchio, Lamberto Ballan, Marco Bertini, Cees G.M. Snoek, Alberto Del Bimbo
Xirong Li is currently an assistant professor at the Key Lab of Data Engineering and Knowledge Engineering, Renmin University of China. He received his Bachelor’s (2005) and Master’s (2007) degrees from Tsinghua University, and his PhD degree from the University of Amsterdam (2012), all in computer science. His research focuses on multimedia retrieval. He has been awarded the ACM SIGMM Best PhD Thesis Award 2013, the IEEE Transactions on Multimedia Prize Paper Award 2012, the Best Paper Award of ACM CIVR 2010, and the PCM 2014 Outstanding Reviewer Award. He has served as publicity co-chair for ACM ICMR 2013 and publication co-chair for ACM ICMR 2015.
Tiberio Uricchio is currently a Ph.D. candidate in computer science at the Media Integration and Communication Centre (MICC), University of Florence, Italy. He received his B.S. and M.S. degrees both in computer engineering from the University of Florence, Italy in 2009 and 2012, respectively. His research interests include image and video understanding, social media analysis and machine learning.
Lamberto Ballan is currently a postdoctoral researcher at Stanford University, supported by a prestigious Marie Curie Fellowship from the European Commission. He received the Laurea degree in computer engineering in 2006 and the PhD degree in computer science in 2011, both from the University of Florence, Italy. He was also a visiting scholar at the Signal and Image Processing department at Telecom Paristech, in 2010. His research interests lie at the intersection of multimedia and computer vision, particularly in the areas of image/video understanding and social media analysis. His work was conducted in the context of several EU and national projects, and his results have led to more than 30 publications in international journals and conferences, mainly in multimedia and image analysis. He has been awarded the best paper award by the ACM-SIGMM Workshop on Social Media in 2010. He was also the lead organizer of the Web-scale Vision and Social Media Workshops at ECCV 2012 and CVPR 2014.
Marco Bertini is currently an Assistant Professor at the University of Florence, Italy, working at the Media Integration and Communication Center of the University of Florence. His interests focus on image and video analysis, addressing semantic annotation, retrieval, and transcoding. He is the author of 20 journal papers and more than 90 peer-reviewed conference papers, and has been involved in 9 EU research projects as a WP coordinator and researcher. Dr. Bertini is a member of the editorial board of IEEE Transactions on Multimedia, and has been awarded the best paper award by the ACM-SIGMM Workshop on Social Media in 2010. He was a co-organizer of the Web-scale Vision and Social Media Workshops at ECCV 2012 and CVPR 2014.
Cees G.M. Snoek is currently an Associate Professor in the Intelligent Systems Lab at the University of Amsterdam and a Principal Engineer at Qualcomm Research Netherlands. He was previously at Carnegie Mellon University, USA, UC Berkeley, and head of R&D at University spin-off Euvision Technologies (acquired by Qualcomm). His research interests focus on video and image retrieval. Dr. Snoek is the lead researcher of the award-winning MediaMill Semantic Video Search Engine, which is the most consistent top performer in the yearly NIST TRECVID evaluations. Dr. Snoek is a senior member of IEEE and ACM and member of the editorial boards for IEEE MultiMedia and IEEE Transactions on Multimedia. Cees is recipient of an NWO Veni award, a Fulbright Junior Scholarship, an NWO Vidi award, and the Netherlands Prize for ICT Research. Several of his Ph.D. students and Post-docs have won awards, including the IEEE Transactions on Multimedia Prize Paper Award, the SIGMM Best Ph.D. Thesis Award, and Best Paper Award of ACM Multimedia. He is general co-chair of ACM Multimedia 2016.
Alberto Del Bimbo is a full professor at the University of Florence, Italy, where he is the Director of MICC – Media Integration and Communication Center, leading a research team on cutting-edge solutions in the fields of computer vision, multimedia content analysis, indexing and retrieval, and multimedia and multimodal interactivity. He is the author of more than 300 research papers that have appeared in the most prestigious scientific journals and conference proceedings. He is a Founding Member of ACM EuroMM, the European Chapter of ACM SIGMM, a member of the ACM Steering Committees of the ACM Int’l Conf. on Multimedia and the ACM Int’l Conf. on Multimedia Retrieval, and has served as Associate Editor of some of the most important journals in the field, among which are Pattern Recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence, and IEEE Trans. on Multimedia. He was the General Chair of ECCV’12 (the European Conf. on Computer Vision), ACM ICMR’11 (the Int’l Conf. on Multimedia Retrieval), ACM MM’10 (the Int’l Conf. on Multimedia), ACM MIR’08 (the Int’l Conf. on Multimedia Information Retrieval), IEEE ISM’08 (the Int’l Symposium on Multimedia), and IEEE ICMCS’99 (the Int’l Conf. on Multimedia Computing & Systems).
A challenge for human-centred multimedia is the analysis of human communicative behaviour in multimedia content, especially the spontaneous non-verbal signals that humans generate when interacting with each other. These signals require a different approach to multimedia computing, in which the methods developed draw on findings from other disciplines such as social and behavioural psychology, affective computing, and social signal processing. This tutorial aims to address the gaps in understanding between these disciplines, providing core knowledge of each domain and disseminating basic foundational concepts in emotional and social signal research in a very practical and interactive manner.
The tutorial content can be downloaded from https://essmmblog.wordpress.com/tutorial/
Speakers: Hayley Hung, Hatice Gunes
Hayley Hung has been an Assistant Professor and Delft Technology Fellow in the Pattern Recognition and Bioinformatics group at TU Delft, The Netherlands, since 2013. Between 2010 and 2013, she held a Marie Curie Intra-European Fellowship at the Intelligent Systems Lab at the University of Amsterdam. Between 2007 and 2010, she was a post-doctoral researcher at the Idiap Research Institute in Switzerland. She obtained her PhD in Computer Vision from Queen Mary University of London, UK in 2007 and her first degree in Electrical and Electronic Engineering from Imperial College, UK. Her research interests are in social computing, social signal processing, machine learning, and ubiquitous computing. She is local arrangements chair for ACM MM 2016, was workshop co-chair of ACM ICMI 2015, area chair for emotional and social signals at ACM MM (2014-2015), co-organiser of the panel on Emotional and Social Signals in Multimedia (ACM MM 2014), and doctoral symposium co-chair of ACM MM 2013. She has organized workshops on human behavior understanding, including InterHUB (AmI 2011), Measuring Behaviour in Open Spaces (MB 2012), and HBU (ACM MM 2013). She is also a special issue guest editor for ACM Transactions on Interactive Intelligent Systems. She received first prize in the IET Written Premium competition 2009, was nominated for outstanding paper at ICMI 2011, and was named outstanding reviewer at ICME 2014.
Hatice Gunes is a Senior Lecturer (Associate Professor) at Queen Mary University of London, leading the Affective and Human Computing Lab. Her research interests lie in the multidisciplinary areas of affective computing and social signal processing, focusing on automatic analysis of emotional and social behavior and human aesthetic canons, multimodal interaction, computer vision, machine learning, and human-computer and human-robot interaction. She has published over 75 technical papers in these areas (Google Scholar citations > 1700, h-index = 20) and has received awards for Outstanding Paper (IEEE FG’11), Quality Reviewer (IEEE ICME’11), and Best Demo (IEEE ACII’09). She serves as an Associate Editor of the IEEE Transactions on Affective Computing, on the Management Board of the Association for the Advancement of Affective Computing, and on the Steering Committee of the IEEE Transactions on Affective Computing. She has also served as a Guest Editor of special issues of the Int’l J. of Synthetic Emotions, Image and Vision Computing, and ACM Transactions on Interactive Intelligent Systems, as a member of the Editorial Advisory Board for the Affective Computing and Interaction book (IGI Global, 2011), as co-founder and main organizer of the EmoSPACE Workshops at IEEE FG’15, FG’13, and FG’11, as workshop chair of MAPTRAITS’14, HBU’13, and AC4MobHCI’12, and as area chair for ACM Multimedia’15, ACM Multimedia’14, IEEE ICME’13, ACM ICMI’13, and ACII’13. She has been involved as PI and Co-I in several projects funded by the Engineering and Physical Sciences Research Council UK (EPSRC) and the British Council.