Tutorials

 

The confirmed tutorials are listed below.

Deep Learning Interpretation | Jitao Sang (Beijing Jiaotong University, China) | 22/10/2018 AM
Interactive Video Search: Where is the User in the Age of Deep Learning? | Klaus Schoeffmann (Klagenfurt University, Austria), Werner Bailer (Joanneum Research, Austria), Jakub Lokoč (Charles University, Prague, Czech Republic), Cathal Gurrin (DCU, Ireland), George Awad (NIST, USA) | 22/10/2018 AM
Similarity-Based Processing of Motion Capture Data | Pavel Zezula (Masaryk University, Czech Republic), Jan Sedmidubsky (Masaryk University, Czech Republic) | 22/10/2018 PM
The Importance of Medical Multimedia | Michael Riegler (University of Oslo, Norway), Pål Halvorsen (University of Oslo, Norway), Bernd Münzer (Klagenfurt University, Austria), Klaus Schoeffmann (Klagenfurt University, Austria) | 22/10/2018 PM
Social and Political Event Analysis based on Rich Media | Jungseock Joo (UCLA), Zachary Steinert-Threlkeld (UCLA), Jiebo Luo (University of Rochester) | 26/10/2018 AM
From Action Recognition to Complex Event Detection | Ting Yao (MSRA), Jingen Liu (SRI International) | 26/10/2018 AM
Structured Deep Learning for Pixel-level Understanding | Yunchao Wei (UIUC, USA), Xiaodan Liang (CMU, USA), Si Liu (Beihang University, China), Liang Lin (Sun Yat-sen University, China) | 26/10/2018 PM
To Recognize Families In the Wild: A Machine Vision Tutorial | Joseph P Robinson (Northeastern University), Ming Shao (UMass Dartmouth), Yun Fu (Northeastern University) | 26/10/2018 PM

Deep Learning Interpretation

Deep Learning Interpretation: from identifying what problems deep learning can address to exploring what problems it can NOT address.

Abstract: Deep learning has been successfully applied to a range of multimedia topics in recent years, from object detection, semantic classification, and entity annotation to multimedia captioning, multimedia question answering, and storytelling. Academic researchers are now shifting their attention from identifying what problems deep learning can address to exploring what problems deep learning can NOT address. This tutorial starts with a summary of six ‘NOT’ problems that deep learning currently fails to solve: low stability, debugging difficulty, poor parameter transparency, poor incrementality, poor reasoning ability, and machine bias. These problems share a common origin: the lack of deep learning interpretation. The tutorial maps the six ‘NOT’ problems onto three levels of deep learning interpretation: (1) Locating: accurately and efficiently locating which features contribute most to the output. (2) Understanding: bidirectional semantic access between human knowledge and the deep learning algorithm. (3) Expandability: storing, accumulating, and reusing the models learned by deep learning. Existing studies at each of these three levels will be reviewed in detail, followed by a discussion of interesting future directions.
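As a minimal illustration of level (1), the Python sketch below (our own toy example, not part of the tutorial material; the two-layer model and random input are assumptions) computes a simple gradient-based saliency score per input feature, i.e., it locates which input features most influence a chosen output:

```python
# Gradient-based input attribution: one simple way to "locate" which input
# features contribute most to a given output (toy model, for illustration only).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))  # toy classifier
x = torch.randn(1, 10, requires_grad=True)                             # toy input

score = model(x)[0, 0]             # output score for class 0
score.backward()                   # compute d(score)/d(x)
saliency = x.grad.abs().squeeze()  # larger magnitude = larger local influence on the output
print(saliency)
```

More elaborate locating techniques (e.g., class activation maps or layer-wise relevance propagation) follow the same idea of tracing an output back to the inputs responsible for it.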

Speaker: Jitao Sang

Jitao Sang is a full Professor at Beijing Jiaotong University. He received his PhD from the Chinese Academy of Sciences (CAS) with the highest honor, the Special Prize of the CAS President Scholarship. His research interests are multimedia data mining and machine learning, with award-winning publications at prestigious multimedia conferences (best paper at PCM 2016, best paper finalist at MM 2012 and MM 2013, best student paper at MMM 2013, best student paper at ICMR 2015). He received the ACM China Rising Star Award in 2016. So far, he has authored one book and co-authored more than 60 peer-reviewed papers in multimedia-related journals and conferences. He served as program co-chair of PCM 2015 and ICIMCS 2015 and area chair of ACM MM 2018 and ICPR 2018, and has given tutorials at ACM MM 2014 & 2015, MMM 2015, ICME 2015, and ICMR 2015.

 

Interactive Video Search: Where is the User in the Age of Deep Learning?

Interactive Video Search Tools, Evaluation Approaches, Datasets and Task Design, Evaluation Metrics, Lessons Learned from Evaluation Campaigns

Slides: https://www.slideshare.net/klschoef/interactive-video-search-where-is-the-user-in-the-age-of-deep-learning

Abstract: In this tutorial, we will discuss interactive video search tools and methods, review their need in the age of deep learning, and explore video and multimedia search challenges and their role as evaluation benchmarks in the field of multimedia information retrieval. We will cover three different campaigns (TRECVID, Video Browser Showdown, and the Lifelog Search Challenge), discuss their goals and rules, and present their achieved findings over the last half-decade. Moreover, we will talk about data sets, tasks, evaluation procedures, and examples of interactive video search tools, as well as how they evolved over the years. Participants of this tutorial will be able to gain collective insights from all three challenges and use them for focusing their research efforts on outstanding problems that still remain unsolved in this area.

Speakers: Klaus Schoeffmann, Werner Bailer, Jakub Lokoč, Cathal Gurrin, George Awad

Klaus Schoeffmann is an Associate Professor at Klagenfurt University, Austria. He received his PhD in 2009 and his habilitation (venia docendi) in 2015, both in computer science and from Klagenfurt University. His research focuses on video analytics and interactive multimedia systems, particularly in the medical domain. He has co-authored more than 110 publications on various topics in multimedia and co-organized several international conferences, workshops, and special sessions in the field. He is co-founder of the Video Browser Showdown (VBS) and he has given several tutorials at ACM and IEEE conferences in the past few years.

Werner Bailer is a key researcher at DIGITAL – Institute for Information and Communication Technologies at JOANNEUM RESEARCH in Graz, Austria. He received a degree in Media Technology and Design in 2002 for his diploma thesis on motion estimation and segmentation for film/video standards conversion. His research interests include digital film restoration, audiovisual content analysis and retrieval as well as multimedia metadata. He regularly contributes to standardization in MPEG and to EBU working groups, has co-organized Video Browser Showdown since 2012 and contributed to the TRECVID and MediaEval benchmarks.

Jakub Lokoč received his doctoral degree in software systems from Charles University, Prague, Czech Republic. He is an assistant professor in the Department of Software Engineering at Charles University, Faculty of Mathematics and Physics. His research interests include metric indexing, multimedia databases, interactive video retrieval, known-item search, deep learning, and similarity modeling. He has co-authored more than 60 publications on various topics in multimedia retrieval/indexing and co-organized several international workshops.

Cathal Gurrin is an Associate Professor at the School of Computing at Dublin City University and a principal co-investigator at the Insight Centre for Data Analytics. His research interests are information retrieval, personal data analytics and lifelogging. Lifelogging integrates personal sensing, computer science, cognitive science and data-driven healthcare analytics to realize the next-generation of digital records for the individual.  He has co-authored more than 200 publications in the area of multimedia data analytics and he is the founder of the Lifelog Search Challenge, as well as the NTCIR-Lifelog task and ImageCLEF-Lifelog.

George Awad is a senior computer scientist at Dakota Consulting, Inc. and the TRECVID project leader at the National Institute of Standards and Technology, USA. He has been supporting the TRECVID project since 2007. He holds an MSc in Computer Engineering (2000) from AASTMT (Arab Academy for Science, Technology & Maritime Transport) and a PhD in Computer Science (2007) from Dublin City University. His current main research activities include evaluating video search engines on different real-world use-case scenarios using real-world data sets.

 

Similarity-Based Processing of Motion Capture Data

The tutorial presents challenges and state-of-the-art techniques for similarity-based processing of 3D human motion data such as subsequence searching, action recognition or semantic segmentation.

Slides: https://www.fi.muni.cz/~xsedmid/download/MM18-tutorial.pdf

Abstract: Motion capture technologies digitize human movements by tracking 3D positions of specific skeleton joints in time. Such spatio-temporal data have enormous application potential in many fields, ranging from computer animation, through security and sports, to medicine, but their computerized processing is a difficult problem. The recorded data can be imprecise and voluminous, and the same movement action can be performed by various subjects in a number of alternatives that vary in speed, timing, or position in space. This requires employing completely different data-processing paradigms compared with traditional domains such as attribute, text, or image data. The objective of this tutorial is to introduce fundamental principles and technologies designed for similarity comparison, searching, subsequence matching, classification, and action detection in motion capture data. Specifically, we emphasize the importance of similarity, needed to express the degree of accordance between pairs of motion sequences, and also discuss machine-learning approaches that can automatically acquire content-descriptive movement features. We explain how the concept of similarity, together with the learned features, can be employed to search for occurrences of actions of interest within a long motion sequence. Assuming a user-provided categorization of example motions, we discuss techniques able to recognize types of specific movement actions and detect such actions within continuous motion sequences. Selected operations will be demonstrated using online web applications.
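To make the notion of similarity between motion sequences of different speeds concrete, the sketch below uses dynamic time warping over per-frame joint features; this is a classic, generic choice shown purely for illustration, not necessarily the similarity measure presented in the tutorial:

```python
# Dynamic time warping (DTW): compares two motion sequences that may differ in
# speed or timing by finding the cheapest frame-to-frame alignment (illustrative only).
import numpy as np

def dtw_distance(a, b):
    """a, b: arrays of shape (T, D) holding per-frame joint-coordinate features."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # distance between two frames
            cost[i, j] = d + min(cost[i - 1, j],      # skip a frame in a
                                 cost[i, j - 1],      # skip a frame in b
                                 cost[i - 1, j - 1])  # match the two frames
    return cost[n, m]

# The "same" motion performed slowly (20 frames) and quickly (10 frames) still aligns well.
slow = np.repeat(np.linspace(0.0, 1.0, 20)[:, None], 3, axis=1)
fast = np.repeat(np.linspace(0.0, 1.0, 10)[:, None], 3, axis=1)
print(dtw_distance(slow, fast))
```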

Speakers: Jan Sedmidubsky, Pavel Zezula

Jan Sedmidubsky is a computer science researcher at Masaryk University (Czech Republic), where he received his Ph.D. in 2011 and was awarded the dean's and rector's prizes for a distinguished dissertation. His research activities are primarily concentrated on developing effective and efficient similarity-based processing techniques, with a special emphasis on the domain of 3D motion capture data. He was a member of the team that received the IBM Shared University Research (SUR) Award for the “Web-scale Similarity Search in Multimedia Data” project. He has participated in several research and application-oriented projects and is a co-author of more than 30 research publications.

Pavel Zezula is a professor of computer science at Masaryk University (Czech Republic) and head of Data Intensive Systems and Applications (DISA) laboratory. His professional interests primarily concern multimedia content-based retrieval, large-scale similarity search, and big data analysis. He is a co-author of the seminal similarity search structure, the “M-Tree”, and the book “Similarity Search: The Metric Space Approach” by Springer US. His research was supported by numerous national and international grants and his results were recognized by both academia, e.g., Masaryk University rector’s prize, and industry, e.g., IBM prize. He is a co-author of more than 150 research publications with about 6,000 citations. He is a steering committee member of the Similarity Search and Applications (SISAP) series of international conferences.

 

The Importance of Medical Multimedia

Slides: https://www.slideshare.net/klschoef/the-importance-of-medical-multimedia-120369152

Abstract: Multimedia research is becoming more and more important for the medical domain, where an increasing number of videos and images are integrated in the daily routine of surgical and diagnostic work. While the collection of medical multimedia data is not an issue, appropriate tools for efficient use of this data are missing. This includes management and inspection of the data, visual analytics, as well as learning relevant semantics and using recognition results to optimize surgical and diagnostic processes. The characteristics and requirements in this interesting but challenging field differ from those in classic multimedia domains. Therefore, this tutorial gives a general introduction to the field, provides a broad overview of specific requirements and challenges, discusses existing work and open challenges, and elaborates in detail on how machine learning approaches can help in multimedia-related fields to improve the performance of surgeons and clinicians.

Speakers:  Michael Riegler, Pål Halvorsen, Bernd Münzer, Klaus Schoeffmann

Michael Alexander Riegler is a senior research scientist at Simula Research Laboratory and the University of Oslo. He received his Master's degree from Klagenfurt University with distinction and finished his PhD at the University of Oslo in two and a half years. His PhD thesis topic was efficient processing of medical multimedia workloads. His research interests are medical multimedia data analysis and understanding, image processing, image retrieval, parallel processing, gamification and serious games, crowdsourcing, social computing, and user intentions. Furthermore, he is involved in several initiatives such as the MediaEval Benchmarking Initiative for Multimedia Evaluation, which this year runs the Medico task (automatic analysis of colonoscopy videos).

Pål Halvorsen is a chief research scientist at Simula Research Laboratory, a professor at the Department of Informatics, University of Oslo, Norway, and the CEO of ForzaSys AS. He received his doctoral degree (Dr.Scient.) in 2001. His research focuses mainly on distributed multimedia systems, including operating systems, processing, storage and retrieval, communication, and distribution from a performance and efficiency point of view. He has co-authored more than 200 publications on various topics in multimedia. More information about authored papers, projects, supervised students, teaching, community services, etc. can be found at http://home.ifi.uio.no/paalh.

Bernd Münzer is a postdoc researcher at Klagenfurt University, Austria. He received his PhD from Klagenfurt University with distinction in 2015. His research focus is on content-based analysis of videos from endoscopic surgeries in order to enable comprehensive video documentation and efficient post-procedural analysis. In particular, he investigates topics such as domain-specific video compression exploiting domain-specific characteristics, perceptual aspects of video quality in the endoscopy domain as well as interactive aspects. He is the main author of a recent comprehensive survey article that aggregates research results from various research fields and communities that are active in the field of endoscopic image and video processing.

Klaus Schoeffmann is an Associate Professor at Klagenfurt University, Austria. He received his PhD in 2009 and his habilitation (venia docendi) in 2015, both in computer science and from Klagenfurt University. His research focuses on video analytics and interactive multimedia systems, particularly in the medical domain. He has co-authored more than 110 publications on various topics in multimedia, inclusive of more than 30 on different aspects of medical multimedia systems. He has co-organized several international conferences, workshops, and special sessions in the field of multimedia. Moreover, he has given several tutorials at ACM and IEEE conferences in the past few years and he has been a project leader (PI) for several research projects in the medical multimedia domain.

 

Social and Political Event Analysis using Rich Media

Systematic computational analysis of real-world social and political events from multimodal media data.

Slides: https://www.dropbox.com/s/40tb6ojgw68mavc/ACM_Multimedia_ZST_Presentation10262018.pdf?dl=0

Abstract: This tutorial aims to provide a comprehensive overview of the applications of rich social media data to real-world social and political event analysis, an emerging topic in multimedia research. We will discuss the recent evolution of social media as a venue for social and political interaction and its impact on real-world events using specific examples. We will introduce large-scale datasets drawn from social media sources and review concrete research projects that build on computer vision and deep learning based methods. Existing research on social media has examined various patterns of information diffusion and contagion, user activities and networking, and social media-based prediction of real-world events. Most existing work, however, relies on non-content or text-based features and does not fully leverage the rich modalities, visual and acoustic, that are prevalent in most online social media. Such approaches underutilize the vibrant and integrated character of social media, especially as audiences are increasingly drawn to visually centered media. This tutorial highlights the impact of rich multimodal data on real-world events and elaborates on relevant recent research projects (their concrete development, data governance, technical details, and implications for politics and society) on the following topics: 1) decoding non-verbal content to identify intent and impact in political messages in mass and social media, such as political advertisements, debates, or news footage; 2) recognition of emotion, expressions, and viewer perception from communicative gestures, gazes, and facial expressions; 3) geo-coded Twitter image analysis for protest and social movement analysis; 4) election outcome prediction and voter understanding using social media posts; and 5) detection of misinformation, rumors, and fake news and analysis of their impact on major political events such as the U.S. presidential election.

Speakers: Jungseock Joo, Zachary Steinert-Threlkeld, Jiebo Luo

Jungseock Joo is an assistant professor in Communication and Statistics at UCLA. His research primarily focuses on understanding multimodal human communication with computer vision and machine learning based methods. In particular, his research employs various types of large-scale multimodal media data, such as TV news or online social media, and examines how multimodal cues in these domains relate to public opinion and real-world events. He holds a Ph.D. in Computer Science from UCLA. He was a research scientist in Computer Vision at Facebook prior to joining UCLA in 2015.

Zachary Steinert-Threlkeld is an assistant professor of public policy at the University of California, Los Angeles’ Luskin School of Public Affairs. He uses computational methods and big data to study protest dynamics, with a particular interest in how social networks affect individuals’ decision to protest. Early work argues that mobilization comes from action occurring on the periphery of countries’ social networks, and subsequent work has measured conflict dynamics in Ukraine and the social media use of activists. New work explores how to measure the kinds of people who protest based on images shared on social media; the effect of internet outages on political beliefs; and simulations to study the interaction between protest and repression.

Jiebo Luo joined the University of Rochester in Fall 2011 after over fifteen prolific years at Kodak Research Laboratories. He has been involved in numerous technical conferences, including serving as the program co-chair of ACM Multimedia 2010, IEEE CVPR 2012 and IEEE ICIP 2017. He has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Big Data, ACM Transactions on Intelligent Systems and Technology, Pattern Recognition, Machine Vision and Applications, and Journal of Electronic Imaging. Dr. Luo is a Fellow of the SPIE, IEEE, and IAPR, and a member of ACM, AAAI, and AAAS. In addition, he is a Board Member of the Greater Rochester Data Science Industry Consortium.

 

From Action Recognition to Complex Event Detection

Abstract: Understanding human behavior from videos has been one of the critical problems in video analysis. Detecting simple actions and complex events in videos is a very challenging task due to large variations in content, motion, and viewpoint. In the past decade, research on human behavior understanding has made significant progress thanks to various feature encoding and pooling methods, as well as recent deep learning techniques. After a problem introduction and brief history review, this tutorial will present approaches to learning video representations, ranging from frame-level representations combined with different pooling strategies or quantization mechanisms to end-to-end spatio-temporal video-level representation learning frameworks, followed by action recognition, which categorizes video content into human action classes. It will focus on the differences between techniques and give practical guidance on dealing with real problems. Recognizing complex events, which can consist of sequences of actions, is attracting increasing attention. In the second part of this tutorial, we will present research achievements on both multimedia event detection and surveillance event detection. The instructors will share practical lessons learned from their experience in various international action recognition and event detection evaluation challenges.
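The sketch below shows, on toy data, the simplest form of the frame-level-plus-pooling strategy mentioned above: per-frame features are aggregated into a single video-level descriptor by average and max pooling (the dimensions and random features are assumptions for illustration, not the instructors' pipeline):

```python
# From frame-level features to a video-level representation via simple pooling.
import numpy as np

frame_features = np.random.rand(64, 2048)   # 64 frames, 2048-dim feature per frame (e.g., CNN outputs)
avg_pooled = frame_features.mean(axis=0)    # average pooling: overall content of the clip
max_pooled = frame_features.max(axis=0)     # max pooling: strongest responses across frames
video_repr = np.concatenate([avg_pooled, max_pooled])  # one fixed-size video descriptor
print(video_repr.shape)                     # (4096,) regardless of the number of frames
```

A classifier trained on such fixed-size descriptors can then assign human action classes; this is one common baseline against which end-to-end spatio-temporal frameworks are compared.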

Speakers: Ting Yao, Jingen Liu

Dr. Ting Yao is currently a Principal Researcher in the Vision and Multimedia Lab at JD AI Research, Beijing, China. His research interests include video understanding, large-scale multimedia search, and deep learning. Prior to joining JD AI Research, he was a Researcher with Microsoft Research Asia in Beijing, China. Ting is an active participant in several benchmark evaluations. He is the principal designer of several top-performing multimedia analytic systems in worldwide competitions such as COCO Image Captioning, the Visual Domain Adaptation Challenge 2017, the ActivityNet Large Scale Activity Recognition Challenge 2018, 2017 and 2016, the THUMOS Action Recognition Challenge 2015, and the MSR-Bing Image Retrieval Challenge 2014 and 2013. He is one of the organizers of the MSR Video to Language Challenge 2017 and 2016. For his contributions to Multimedia Search by Self, External and Crowdsourcing Knowledge, he was awarded the 2015 SIGMM Outstanding Ph.D. Thesis Award.

Dr. Jingen Liu is a computer vision researcher in the Vision and Multimedia Lab at JD AI Research, Mountain View, CA, USA. He was a Sr. Computer Scientist at SRI International, USA from 2011 to 2018, and a Research Fellow at the University of Michigan, Ann Arbor from 2010 to 2011. He received his PhD degree from CRCV at UCF. His research is focused on computer vision, multimedia processing, and machine learning. His expertise lies in video analysis, human action recognition, and multimedia event detection. He has published 40+ papers at prestigious computer vision conferences such as CVPR, ICCV, ECCV, and AAAI. His works have been cited about 4,000 times. As an outstanding young scientist, he was invited by the NAE to present his work on video content analysis at the Japan-America Frontiers of Engineering Symposium 2012. He has been an Associate Editor for Machine Vision and Applications (Springer) since 2014, was an Area Chair of the IEEE Winter Conference on Applications of Computer Vision 2015 and 2016, and was a program chair of the THUMOS workshop on large-scale activity recognition in 2013 and 2014. He has been a senior member of the IEEE since 2014.

 

Structured Deep Learning for Pixel-level Understanding

Semantic Segmentation, Weakly Supervised Learning, Human Parsing, Part/face parsing, Depth Estimation

Abstract: Pixel-level understanding is a significant research topic in computer vision. With the emergence of many artificial intelligence companies, researchers are putting more effort into studying the semantics of pixels, e.g., scene/human/part/face parsing and depth estimation, which can benefit many real multimedia applications such as intelligent surveillance, robotics, autonomous driving, fashion recommendation, and augmented reality. The past decade has witnessed the rapid development of pixel-level understanding, owing to deep convolutional neural networks and human-annotated training samples (images or videos). However, localizing objects/parts of interest, learning to segment new concepts, and depth estimation are still very challenging. The obstacles mainly come from: 1) computers still struggle to understand the interdependency of objects in the scene as a whole; 2) fine-grained object parts are difficult to understand due to the large diversity of object appearance, scale, and viewpoint; 3) learning to understand new concepts requires additional pixel-level annotations, which are very costly in terms of both money and human effort; 4) the applicability of monocular depth estimation is greatly limited in the single-camera setting. In this tutorial, we will discuss promising solutions to the issues raised above.
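For readers new to pixel-level prediction, the sketch below shows a toy fully convolutional network that outputs a class label for every pixel, the basic output form of semantic segmentation; the tiny architecture and class count are assumptions for illustration, not models covered in the tutorial:

```python
# A minimal fully convolutional head: per-pixel class scores and a per-pixel prediction.
import torch
import torch.nn as nn

num_classes = 21
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, num_classes, kernel_size=1),   # 1x1 conv: class scores for every pixel
)

image = torch.randn(1, 3, 64, 64)    # toy RGB image
logits = net(image)                  # shape (1, 21, 64, 64): one score map per class
pred = logits.argmax(dim=1)          # shape (1, 64, 64): predicted class label per pixel
print(pred.shape)
```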

Speakers: Yunchao Wei, Xiaodan Liang, Si Liu, Liang Lin

Yunchao Wei is currently a Postdoctoral Researcher at the Beckman Institute at the University of Illinois at Urbana-Champaign, working with Prof. Thomas Huang. He received his Ph.D. degree from Beijing Jiaotong University in 2016. He received the Excellent Doctoral Dissertation Award of the Chinese Institute of Electronics (CIE) in 2016, the winner prize of the object detection task (1a) in ILSVRC 2014, and the runner-up prizes of all video object detection tasks in ILSVRC 2017. He has published over 30 papers in top-tier conferences/journals such as CVPR and T-PAMI. His current research interest focuses on computer vision techniques for large-scale data analysis. Specifically, he has worked on weakly- and semi-supervised object recognition, multi-label image classification, video object detection, and multi-modal analysis.

Xiaodan Liang is currently a Project Scientist in the Machine Learning Department at Carnegie Mellon University, working with Prof. Eric Xing. She received her Ph.D. degree from Sun Yat-sen University in 2016. She has served as a program committee member at AAAI 2017, CVPR 2017, and IJCAI 2017. She has published over 40 cutting-edge papers on human-related analysis, including human parsing, pedestrian detection and instance segmentation, 2D/3D human pose estimation and activity recognition, and weakly-supervised and few-shot learning. She and her collaborators have also published the largest human parsing dataset to advance research on human understanding, and successfully organized the 1st Look Into Person (LIP) workshop and challenge at CVPR 2017.

Si Liu is an Associate Professor in the School of Computer Science and Engineering, Beihang University. She was previously a Research Fellow at the Department of Electrical and Computer Engineering, National University of Singapore (NUS). She obtained her PhD degree from the Institute of Automation, Chinese Academy of Sciences (CASIA) in 2012 and her Bachelor's degree from the Experimental Class of Beijing Institute of Technology (BIT). Her current research interests include attribute prediction, object detection, and image parsing. She is also interested in applications such as makeup and clothing recommendation and online product retrieval. She received a Best Paper Award at ACM MM 2013 and a Best Demo Award at ACM MM.

Liang Lin is the Executive Director of SenseTime Research and a full Professor at Sun Yat-sen University. He currently leads the SenseTime R&D teams developing cutting-edge and deliverable solutions in computer vision, data analysis, and intelligent robotic systems. He has authored and co-authored more than 100 papers in top-tier academic journals and conferences (e.g., 15 papers in TPAMI/IJCV). He serves as an Associate Editor of IEEE Transactions on Human-Machine Systems and has served as an Area Chair for numerous conferences such as CVPR, ICME, ACCV, and ICMR. He was the recipient of the Best Paper Diamond Award at IEEE ICME 2017, the Best Paper Runner-Up Award at ACM NPAR 2010, a Google Faculty Award in 2012, the Best Student Paper Award at IEEE ICME 2014, and the Hong Kong Scholars Award in 2014. He is a Fellow of IET.

 

To Recognize Families In the Wild: A Machine Vision Tutorial

Slides: https://web.northeastern.edu/smilelab/acm_mm_2018_tutorial_reduced.pdf

Visual Kinship Understanding, Family Recognition, Deep Learning, Familiar Feature, Big Data

Abstract: Automatic kinship recognition has relevance in an abundance of applications. For starters, it can aid forensic investigations, as kinship is a powerful cue that could narrow the search space (e.g., knowledge that the “Boston Bombers” were brothers could have helped identify the suspects sooner). In short, many could benefit from such technologies: the consumer (e.g., automatic photo library management), the scholar (e.g., historic lineage & genealogical studies), the data analyst (e.g., social-media-based analysis), the investigator (e.g., cases of missing children and human trafficking; for instance, it is unlikely that a missing child found online would be in any database, but more than likely a family member would be), or even refugees. Besides application-based problems, and as already hinted, kinship is a powerful cue that could serve as a face attribute capable of greatly reducing the search space in more general face-recognition problems. In this tutorial, we will introduce the background information, the progress leading up to this point, and several current state-of-the-art algorithms spanning various views of the kinship recognition problem (e.g., verification, classification, tri-subject). We will then cover our large-scale Families In the Wild (FIW) image collection, several challenge competitions it has been used in, along with the top-performing deep learning approaches. The tutorial will end with a discussion about future research directions and practical use-cases.

Speakers: Joseph P Robinson, Ming Shao, Yun Fu

Joseph P Robinson received a BS in electrical & computer engineering ('14) and is pursuing a PhD in computer engineering at Northeastern University (NEU). His research is in applied machine vision, with emphasis on faces, deep learning, multimedia, and large databases. He led efforts in a TRECVid début. He built several image & video databases, most notably FIW. Robinson has served as organizing chair and host of various workshops & challenges (e.g., NECV '17, RFIW@ACMMM '17, RFIW@FG 2018, AMFG@CVPR '18, FacesMM@ICME '18), PC member (e.g., FG, MIRP, MMEDIA), reviewer (e.g., IEEE Transactions on Biomedical Circuits and Systems, Image Processing, and Pattern Analysis and Machine Intelligence), and in leading positions such as president of IEEE@NEU and Relations Officer of the IEEE SAC R1 Region. He did two NSF REUs ('10 & '11), co-ops at Analogic Corporation & BBN Technology, and internships at MIT-LL ('14), STR ('16 & '17), and Snap Inc. (i.e., Snapchat) ('18).

Ming Shao received the B.E. degree in computer science, the B.S. degree in applied mathematics, and the M.E. degree in computer science from Beihang University, Beijing, China, in 2006, 2007, and 2010, respectively. He received the Ph.D. degree in computer engineering from NEU in 2016. He has been a tenure-track Assistant Professor in the College of Engineering at the University of Massachusetts Dartmouth since Fall 2016. His current research interests include sparse modeling, low-rank matrix analysis, deep learning, and applied machine learning for social media analytics. He was the recipient of the Presidential Fellowship of the State University of New York at Buffalo (2010-12) and the best paper award at the 2011 IEEE ICDM Workshop on Large Scale Visual Analytics. He has served as a reviewer for IEEE journals including the IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Image Processing, IEEE Transactions on Neural Networks and Learning Systems, and IEEE Transactions on Knowledge and Data Engineering.

Yun Fu received the B.Eng. degree in information engineering and the M.Eng. degree in pattern recognition and intelligence systems from Xi'an Jiaotong University, China, and the M.S. degree in statistics and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign. He has been an interdisciplinary faculty member affiliated with the College of Engineering and the College of Computer and Information Science at NEU since 2012. His research interests are machine learning, computational intelligence, big data mining, computer vision, pattern recognition, and cyber-physical systems. He has extensive publications in leading journals, books/book chapters, and international conferences/workshops. He serves as associate editor, chair, PC member, and reviewer for many top journals and international conferences/workshops. He has received seven prestigious Young Investigator Awards from NAE, ONR, ARO, IEEE, INNS, UIUC, and the Grainger Foundation; seven Best Paper Awards from IEEE, IAPR, SPIE, and SIAM; and three major Industrial Research Awards from Google, Samsung, and Adobe. He is currently an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems (TNNLS). He is a Fellow of SPIE and IAPR, a Lifetime Senior Member of ACM, a Lifetime Member of AAAI, OSA, and the Institute of Mathematical Statistics, a member of the Global Young Academy (GYA), AAAS, and INNS, and was a Beckman Graduate Fellow during 2007-08.