The confirmed tutorials are listed below:
|Tutorial|Speakers|Date|Session|
|---|---|---|---|
|Deep Learning Interpretation|Jitao Sang (Beijing Jiaotong University, China)|22/10/2018|AM|
|Interactive Video Search: Where is the User in the Age of Deep Learning?|Klaus Schoeffmann (Klagenfurt University, Austria), Werner Bailer (Joanneum Research, Austria), Jakub Lokoč (Charles University, Prague, Czech Republic), Cathal Gurrin (DCU, Ireland), George Awad (NIST, USA)|22/10/2018|AM|
|Similarity-Based Processing of Motion Capture Data|Pavel Zezula (Masaryk University, Czech Republic), Jan Sedmidubsky (Masaryk University, Czech Republic)|22/10/2018|PM|
|The Importance of Medical Multimedia|Michael Riegler (University of Oslo, Norway), Pål Halvorsen (University of Oslo, Norway), Bernd Münzer (Klagenfurt University, Austria), Klaus Schoeffmann (Klagenfurt University, Austria)|22/10/2018|PM|
|Social and Political Event Analysis using Rich Media|Jungseock Joo (UCLA), Zachary Steinert-Threlkeld (UCLA), Jiebo Luo (University of Rochester)|26/10/2018|AM|
|From Action Recognition to Complex Event Detection|Ting Yao (MSRA), Jingen Liu (SRI International)|26/10/2018|AM|
|Structured Deep Learning for Pixel-level Understanding|Yunchao Wei (UIUC, USA), Xiaodan Liang (CMU, USA), Si Liu (Beihang University, China), Liang Lin (Sun Yat-sen University, China)|26/10/2018|PM|
|Recognizing Families In the Wild|Joseph P. Robinson (Northeastern University), Ming Shao (UMass Dartmouth), Yun Fu (Northeastern University, UMass Dartmouth)|26/10/2018|PM|
Deep Learning Interpretation
Deep Learning Interpretation: from identifying what problems deep learning can address to exploring what problems deep learning can NOT address.
Abstract: Deep learning has been successfully applied to a wide range of multimedia topics in recent years, from object detection, semantic classification, and entity annotation to multimedia captioning, question answering, and storytelling. Academic researchers are now shifting their attention from identifying what problems deep learning can address to exploring what problems deep learning can NOT address. This tutorial starts with a summary of six ‘NOT’ problems deep learning fails to solve at the current stage: low stability, debugging difficulty, poor parameter transparency, poor incrementality, poor reasoning ability, and machine bias. These problems share a common origin in the lack of deep learning interpretation. The tutorial then maps the six ‘NOT’ problems onto three levels of deep learning interpretation: (1) Locating: accurately and efficiently identifying which features contribute most to the output. (2) Understanding: bidirectional semantic access between human knowledge and the deep learning algorithm. (3) Expandability: effectively storing, accumulating, and reusing the models learned by deep learning. Existing studies at each of these three levels will be reviewed in detail, followed by a discussion of interesting future directions.
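The "Locating" level can be illustrated with a minimal input-gradient saliency sketch. This is a toy NumPy example, not code from the tutorial: the "network" is a single logistic unit, and the weights and inputs are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def saliency(x, w, b):
    """Gradient of the scalar output p = sigmoid(w.x + b) w.r.t. the input x.
    A larger |gradient| marks a feature that moves the output more."""
    p = sigmoid(w @ x + b)
    return p * (1.0 - p) * w  # chain rule: dp/dx

# Toy weights: feature 0 dominates, feature 2 is ignored by the model.
w = np.array([2.0, -0.5, 0.0])
x = np.array([1.0, 1.0, 1.0])

s = saliency(x, w, 0.0)
ranking = np.argsort(-np.abs(s))  # most influential feature first
```

Deep-network attribution methods (e.g., gradient × input or Grad-CAM) apply the same gradient-ranking principle through many layers; the one-layer version above only shows the idea.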
Speaker: Jitao Sang
Jitao Sang is a full Professor at Beijing Jiaotong University. He received his PhD from the Chinese Academy of Sciences (CAS) with its highest honor, the special prize of the CAS president scholarship. His research interests are in multimedia data mining and machine learning, with award-winning publications at prestigious multimedia conferences (best paper at PCM 2016, best paper finalist at MM 2012 and MM 2013, best student paper at MMM 2013, best student paper at ICMR 2015). He received the ACM China Rising Star Award in 2016. So far, he has authored one book and co-authored more than 60 peer-reviewed papers in multimedia-related journals and conferences. He was program co-chair of PCM 2015 and ICIMCS 2015, and area chair of ACM MM 2018 and ICPR 2018. He has been a tutorial speaker at ACM MM 2014 & 2015, MMM 2015, ICME 2015, and ICMR 2015.
Interactive Video Search: Where is the User in the Age of Deep Learning?
Interactive Video Search Tools, Evaluation Approaches, Datasets and Task Design, Evaluation Metrics, Lessons Learned from Evaluation Campaigns
Abstract: In this tutorial, we will discuss interactive video search tools and methods, examine why they are still needed in the age of deep learning, and explore video and multimedia search challenges and their role as evaluation benchmarks in the field of multimedia information retrieval. We will cover three campaigns (TRECVID, the Video Browser Showdown, and the Lifelog Search Challenge), discuss their goals and rules, and present their findings over the last half-decade. Moreover, we will talk about datasets, tasks, evaluation procedures, and examples of interactive video search tools, as well as how they have evolved over the years. Participants will gain collective insights from all three challenges and can use them to focus their research efforts on problems that remain unsolved in this area.
Speakers: Klaus Schoeffmann, Werner Bailer, Jakub Lokoč, Cathal Gurrin, George Awad
Klaus Schoeffmann is an Associate Professor at Klagenfurt University, Austria. He received his PhD in 2009 and his habilitation (venia docendi) in 2015, both in computer science and from Klagenfurt University. His research focuses on video analytics and interactive multimedia systems, particularly in the medical domain. He has co-authored more than 110 publications on various topics in multimedia and co-organized several international conferences, workshops, and special sessions in the field. He is co-founder of the Video Browser Showdown (VBS) and he has given several tutorials at ACM and IEEE conferences in the past few years.
Werner Bailer is a key researcher at DIGITAL – Institute for Information and Communication Technologies at JOANNEUM RESEARCH in Graz, Austria. He received a degree in Media Technology and Design in 2002 for his diploma thesis on motion estimation and segmentation for film/video standards conversion. His research interests include digital film restoration, audiovisual content analysis and retrieval, and multimedia metadata. He regularly contributes to standardization in MPEG and to EBU working groups, has co-organized the Video Browser Showdown since 2012, and has contributed to the TRECVID and MediaEval benchmarks.
Jakub Lokoč received his doctoral degree in software systems from Charles University, Prague, Czech Republic. He is an assistant professor in the Department of Software Engineering at the Faculty of Mathematics and Physics, Charles University. His research interests include metric indexing, multimedia databases, interactive video retrieval, known-item search, deep learning, and similarity modeling. He has co-authored more than 60 publications on various topics in multimedia retrieval/indexing and co-organized several international workshops.
Cathal Gurrin is an Associate Professor at the School of Computing at Dublin City University and a principal co-investigator at the Insight Centre for Data Analytics. His research interests are information retrieval, personal data analytics and lifelogging. Lifelogging integrates personal sensing, computer science, cognitive science and data-driven healthcare analytics to realize the next-generation of digital records for the individual. He has co-authored more than 200 publications in the area of multimedia data analytics and he is the founder of the Lifelog Search Challenge, as well as the NTCIR-Lifelog task and ImageCLEF-Lifelog.
George Awad is a senior computer scientist at Dakota Consulting, Inc. and the TRECVID project leader at the National Institute of Standards and Technology, USA. He has been supporting the TRECVID project since 2007. He holds an MSc in Computer Engineering (2000) from AASTMT (Arab Academy for Science, Technology & Maritime Transport) and a PhD in Computer Science (2007) from Dublin City University. His current research activities include evaluating video search engines in different real-world use-case scenarios using real-world datasets.
Similarity-Based Processing of Motion Capture Data
The tutorial presents challenges and state-of-the-art techniques for similarity-based processing of 3D human motion data such as subsequence searching, action recognition or semantic segmentation.
Abstract: Motion capture technologies digitize human movements by tracking the 3D positions of specific skeleton joints over time. Such spatio-temporal data have enormous application potential in many fields, ranging from computer animation through security and sports to medicine, but their computerized processing is a difficult problem. The recorded data can be imprecise and voluminous, and the same movement action can be performed by various subjects in a number of variants that differ in speed, timing, or position in space. This requires data-processing paradigms completely different from those of traditional domains such as attributes, text, or images. The objective of this tutorial is to introduce fundamental principles and technologies designed for similarity comparison, searching, subsequence matching, classification, and action detection in motion capture data. Specifically, we emphasize the importance of similarity measures needed to express the degree of accordance between pairs of motion sequences and also discuss machine-learning approaches that can automatically acquire content-descriptive movement features. We explain how the concept of similarity, together with the learned features, can be employed to search for occurrences of actions of interest within a long motion sequence. Assuming a user-provided categorization of example motions, we discuss techniques able to recognize specific types of movement actions and detect such actions within continuous motion sequences. Selected operations will be demonstrated through online web applications.
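To give a flavor of why motion similarity must tolerate differences in speed and timing, here is a minimal dynamic time warping (DTW) sketch over 1-D joint trajectories. This is an illustrative toy, not the tutorial's method: real motion capture similarity operates on full 3D skeleton sequences, and the data below is made up.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences.
    Warping the time axis lets the same action, performed at different
    speeds, align frame-to-frame with low cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible alignment steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# The same rising motion at two tempos: DTW matches them perfectly,
# whereas a rigid frame-by-frame comparison could not (unequal lengths).
slow = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0])
fast = np.array([0.0, 1.0, 2.0])
```

This quadratic-time formulation is the textbook baseline; the indexing and subsequence-matching techniques covered in the tutorial exist precisely because it does not scale to long recordings.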
Speakers: Jan Sedmidubsky, Pavel Zezula
Jan Sedmidubsky is a computer science researcher at Masaryk University (Czech Republic), where he received his Ph.D. in 2011 and was awarded the dean’s and rector’s prizes for a distinguished dissertation. His research activities primarily concentrate on developing effective and efficient similarity-based processing techniques, with a special emphasis on 3D motion capture data. He was a member of the team that received the IBM Shared University Research (SUR) Award for the “Web-scale Similarity Search in Multimedia Data” project. He has participated in several research and application-oriented projects and is a co-author of more than 30 research publications.
Pavel Zezula is a professor of computer science at Masaryk University (Czech Republic) and head of Data Intensive Systems and Applications (DISA) laboratory. His professional interests primarily concern multimedia content-based retrieval, large-scale similarity search, and big data analysis. He is a co-author of the seminal similarity search structure, the “M-Tree”, and the book “Similarity Search: The Metric Space Approach” by Springer US. His research was supported by numerous national and international grants and his results were recognized by both academia, e.g., Masaryk University rector’s prize, and industry, e.g., IBM prize. He is a co-author of more than 150 research publications with about 6,000 citations. He is a steering committee member of the Similarity Search and Applications (SISAP) series of international conferences.
Social and Political Event Analysis using Rich Media
Systematic computational analysis of real world social and political events from multimodal media data.
Abstract: This tutorial aims to provide a comprehensive overview of the applications of rich social media data to real-world social and political event analysis, an emerging topic in multimedia research. We will discuss the recent evolution of social media as a venue for social and political interaction and its impact on real-world events, using specific examples. We will introduce large-scale datasets drawn from social media sources and review concrete research projects that build on computer vision and deep learning methods. Existing research on social media has examined various patterns of information diffusion and contagion, user activities and networking, and social media-based predictions of real-world events. Most existing works, however, rely on non-content or text-based features and do not fully leverage the rich modalities (visuals and acoustics) that are prevalent in most online social media. Such approaches underutilize the vibrant and integrated characteristics of social media, especially as audiences are increasingly drawn to visually centered media. This tutorial highlights the impact of rich multimodal data on real-world events and elaborates on relevant recent research projects (their concrete development, data governance, technical details, and implications for politics and society) on the following topics: 1) decoding non-verbal content to identify intent and impact in political messages in mass and social media, such as political advertisements, debates, or news footage; 2) recognizing emotion, expressions, and viewer perception from communicative gestures, gazes, and facial expressions; 3) geo-coded Twitter image analysis for protest and social movement analysis; 4) election outcome prediction and voter understanding using social media posts; and 5) detecting misinformation, rumors, and fake news and analyzing their impact on major political events such as the U.S. presidential election.
Speakers: Jungseock Joo, Zachary Steinert-Threlkeld, Jiebo Luo
Jungseock Joo is an assistant professor of Communication and Statistics at UCLA. His research primarily focuses on understanding multimodal human communication with computer vision and machine learning methods. In particular, his research employs various types of large-scale multimodal media data, such as TV news and online social media, and examines how multimodal cues in these domains relate to public opinion and real-world events. He holds a Ph.D. in Computer Science from UCLA. He was a research scientist in computer vision at Facebook before joining UCLA in 2015.
Zachary Steinert-Threlkeld is an assistant professor of public policy at the University of California, Los Angeles’ Luskin School of Public Affairs. He uses computational methods and big data to study protest dynamics, with a particular interest in how social networks affect individuals’ decision to protest. Early work argues that mobilization comes from action occurring on the periphery of countries’ social networks, and subsequent work has measured conflict dynamics in Ukraine and the social media use of activists. New work explores how to measure the kinds of people who protest based on images shared on social media; the effect of internet outages on political beliefs; and simulations to study the interaction between protest and repression.
Jiebo Luo joined the University of Rochester in Fall 2011 after over fifteen prolific years at Kodak Research Laboratories. He has been involved in numerous technical conferences, including serving as the program co-chair of ACM Multimedia 2010, IEEE CVPR 2012 and IEEE ICIP 2017. He has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Big Data, ACM Transactions on Intelligent Systems and Technology, Pattern Recognition, Machine Vision and Applications, and Journal of Electronic Imaging. Dr. Luo is a Fellow of the SPIE, IEEE, and IAPR, and a member of ACM, AAAI, and AAAS. In addition, he is a Board Member of the Greater Rochester Data Science Industry Consortium.
Structured Deep Learning for Pixel-level Understanding
Semantic segmentation, weakly supervised learning, human parsing, part/face parsing, depth estimation
Abstract: Pixel-level understanding is a significant research topic in computer vision. With the emergence of many artificial intelligence companies, researchers are putting more effort into studying the semantics of pixels, e.g., scene/human/part/face parsing and depth estimation, which can benefit many real-world multimedia applications such as intelligent surveillance, robotics, autonomous driving, fashion recommendation, and augmented reality. The past decade has witnessed the rapid development of pixel-level understanding, owing to deep convolutional neural networks and human-annotated training samples (images or videos). However, localizing objects/parts of interest, learning to segment new concepts, and estimating depth remain very challenging. The obstacles mainly come from: 1) computers still struggle to understand the interdependency of objects in a scene as a whole; 2) fine-grained object parts are difficult to understand due to the large diversity of object appearance, scale, and viewpoint; 3) learning to understand new concepts requires additional pixel-level annotations, which are very costly in terms of both money and human effort; 4) the applicability of monocular depth estimation is greatly limited in single-camera settings. In this tutorial, we will discuss promising solutions to the issues raised above.
Speakers: Yunchao Wei, Xiaodan Liang, Si Liu, Liang Lin
Yunchao Wei is currently a Postdoctoral Researcher at the Beckman Institute, University of Illinois at Urbana-Champaign, working with Prof. Thomas Huang. He received his Ph.D. degree from Beijing Jiaotong University in 2016. He received the Excellent Doctoral Dissertation Award of the Chinese Institute of Electronics (CIE) in 2016, the winner prize of the object detection task (1a) in ILSVRC 2014, and the runner-up prizes of all the video object detection tasks in ILSVRC 2017. He has published over 30 papers in top-tier conferences/journals such as CVPR and T-PAMI. His current research focuses on computer vision techniques for large-scale data analysis. Specifically, he has worked on weakly and semi-supervised object recognition, multi-label image classification, video object detection, and multi-modal analysis.
Xiaodan Liang is currently a Project Scientist in the Machine Learning Department at Carnegie Mellon University, working with Prof. Eric Xing. She received her Ph.D. degree from Sun Yat-sen University in 2016. She has served as a program committee member at AAAI 2017, CVPR 2017, and IJCAI 2017. She has published over 40 cutting-edge papers on human-centric analysis, including human parsing, pedestrian detection and instance segmentation, 2D/3D human pose estimation and activity recognition, and weakly supervised and few-shot learning. She and her collaborators have also released the largest human parsing dataset to advance research on human understanding, and successfully organized the 1st Look Into Person (LIP) workshop and challenge at CVPR 2017.
Si Liu is an Associate Professor in the School of Computer Science and Engineering, Beihang University. She was previously a Research Fellow in the Department of Electrical and Computer Engineering, National University of Singapore (NUS). She obtained her PhD degree from the Institute of Automation, Chinese Academy of Sciences (CASIA) in 2012, and her Bachelor's degree from the Experimental Class of Beijing Institute of Technology (BIT). Her current research interests include attribute prediction, object detection, and image parsing. She is also interested in applications such as makeup and clothes recommendation and online product retrieval. She received a Best Paper Award at ACM MM '13 and a Best Demo Award at ACM MM.
Liang Lin is the Executive Director of SenseTime Research and a full Professor at Sun Yat-sen University. He currently leads SenseTime R&D teams developing cutting-edge, deliverable solutions in computer vision, data analysis, and intelligent robotic systems. He has authored and co-authored more than 100 papers in top-tier academic journals and conferences (e.g., 15 papers in TPAMI/IJCV). He serves as an Associate Editor of IEEE Transactions on Human-Machine Systems and has served as an Area Chair for numerous conferences, including CVPR, ICME, ACCV, and ICMR. He was the recipient of the Best Paper Diamond Award at IEEE ICME 2017, the Best Paper Runner-Up Award at ACM NPAR 2010, a Google Faculty Award in 2012, the Best Student Paper Award at IEEE ICME 2014, and the Hong Kong Scholars Award in 2014. He is a Fellow of IET.