Multimedia Grand Challenges
The purpose of the Multimedia Grand Challenge is to engage the multimedia research community by establishing well-defined, objectively judged challenge problems that exercise state-of-the-art methods and inspire future research directions. The key criteria for Grand Challenges are that they should be useful and interesting, and that their solution should involve a series of research tasks over a long period of time, with pointers towards longer-term research.
The Multimedia Grand Challenge proposals accepted for the ACM Multimedia 2019 edition are the following:
AI Meets Beauty Perfect Corp. Half Million Beauty Product Image Recognition Challenge
The “AI Meets Beauty” Challenge 2019 provides a large-scale image dataset of over half a million images of beauty and personal care products, the Perfect-500K dataset, for participants to tackle a challenging task: beauty and personal care product recognition. Specifically, given a real-world image containing one beauty or personal care item, the task is to match the real-world example of this item to the same item in the Perfect-500K dataset. This is a practical but extremely challenging task, since Perfect-500K contains only a limited number of images from e-commerce sites and no real-world examples will be provided in advance. Mean Average Precision (MAP) will be used as the metric for evaluating recognition performance.
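As a point of reference, Mean Average Precision for this kind of retrieval task can be sketched as follows. This is an illustrative implementation, not the official evaluation code; the function names and the assumption that each query has a small set of relevant product IDs are ours.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision values at each relevant hit
    in the ranked result list (illustrative, not the official scorer)."""
    hits, precisions = 0, []
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(results, ground_truth):
    """MAP over all queries. `results` maps query -> ranked product IDs,
    `ground_truth` maps query -> set of relevant product IDs."""
    aps = [average_precision(results[q], ground_truth[q]) for q in ground_truth]
    return sum(aps) / len(aps)
```

For instance, if a query's single correct product appears at rank 2, its AP is 0.5, and MAP averages this over all query images.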
BioMedia: Multimedia in Medicine
The BioMedia Grand Challenge focuses on the emerging field of medical multimedia, where the task is to detect and classify both normal findings and anomalies in the gastrointestinal (GI) tract. For the classification, we will use the standard metrics recall, precision, specificity, accuracy, Matthews correlation coefficient and F1-score, as well as the time needed to perform the classification; the automatically generated report will be assessed manually by three of our medical partners based on usefulness, correctness and innovation. The proposed task is challenging in itself, and it gives the multimedia community an opportunity to develop solutions in a medical area longing for technological developments, potentially creating societal impact by saving both valuable medical resources and human lives.
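The listed classification metrics can all be derived from the binary confusion counts. The sketch below is illustrative only (the official evaluation likely handles multi-class findings and edge cases differently) and the function name is our own:

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion counts.
    Illustrative sketch; not the official BioMedia evaluation code."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient; 0.0 if any marginal count is zero.
    mcc_den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"recall": recall, "precision": precision,
            "specificity": specificity, "accuracy": accuracy,
            "f1": f1, "mcc": mcc}
```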
Dense Video Captioning Challenge
The challenge focuses on the dense video captioning task: participants are asked to develop a system that produces one sentence for each temporally localized event in a test video. Accuracy will be evaluated against human-generated paragraphs using several automatic metrics (e.g., BLEU@4, METEOR, and SPICE) during the evaluation stage.
This challenge goes a step beyond traditional video captioning and targets generating a sentence for each event occurring in a video, offering a valuable venue to foster research into dense video captioning.
iQIYI Celebrity Video Identification Challenge
The goal of the iQIYI-VID-2019 Challenge is to stimulate multi-modal celebrity identification research and promote wider applications in real videos. We use Mean Average Precision (MAP) to evaluate the final results, with the evaluation carried out online in a docker environment for all participants. We hope to address the universal multi-modal problem spanning face recognition, speaker diarization, person re-identification and so on, to continue driving technical innovation, and to enhance communication between academic research and industry internationally.
Live Video Streaming
Live video streaming over DASH is challenging: it requires low end-to-end latency, is more prone to stalls, and the receiver has to make online decisions about which representation to download at which bitrate and whether to adjust the playback to control the latency.
We invite the multimedia research community to jointly address these challenges by participating in this grand challenge, in which participants design and implement algorithms for bitrate control and latency control in a common, simulated, live video streaming player and benchmark them against each other. Submissions will be evaluated based on the QoE score under the latency constraint.
Relation Understanding in Videos
In this challenge, you are encouraged to participate in one or more of three pivotal tasks in relation understanding: video object detection, action detection, and visual relation detection. Based on a large-scale user-generated video dataset, we will evaluate your novel detection approaches using mean average precision metrics. The challenge will push the limits of video content analysis and further bridge the gap between vision and language. For more details, please visit http://lms.comp.nus.edu.sg/research/video-relation-challenge/mm19-gdc/
Social Media Prediction
The purpose of the SMP Challenge is to seek excellent research teams for prediction on social multimedia. This year, the open task of SMP Challenge 2019 is Temporal Popularity Prediction, which focuses on predicting the future clicks of new social media posts before they are posted in social feeds. Participating teams need to design new algorithms based on understanding and learning techniques, and automatically predict popularity (formulated by clicks, visits, etc.), evaluated on accuracy by mean squared error (MSE) and on correlation by Spearman ranking correlation (SRC). Making predictions from social multimedia (photos, videos, or news) not only helps us make better strategic decisions for the future, but also advances predictive learning and analytics methods for a variety of problems and scenarios in the multimedia area, such as multimedia recommendation, advertising systems, and fashion analysis.
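The two measures above can be sketched as follows, assuming "SRC" denotes the Spearman ranking correlation between predicted and actual popularity scores. This is a simplified illustration (the tie-free Spearman formula is used; the official scorer may differ):

```python
def mse(pred, true):
    """Mean squared error between predicted and actual popularity."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def spearman_src(pred, true):
    """Spearman ranking correlation, assuming no tied values
    (illustrative sketch, not the official SMP scorer)."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r
    rp, rt = ranks(pred), ranks(true)
    n = len(pred)
    d2 = sum((a - b) ** 2 for a, b in zip(rp, rt))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A perfect monotonic prediction yields SRC = 1 even when the absolute click counts (and hence MSE) are off, which is why the challenge reports both measures.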
Content-based video relevance prediction: Hulu
The main task of this challenge is to solve the “cold-start” problem. Based on viewer behavior history and video content metadata, participants need to predict viewer click-through behavior on new TV shows or new movies. AUC (Area Under Curve) will be used as the evaluation metric. Video relevance computation is one of the most important tasks for a personalized online streaming service. In particular, for “cold-start” problems, when a new video is added to the library, the recommendation system needs to bootstrap the video relevance score with very little historical viewer feedback; here, content-based video relevance prediction is a more promising approach than viewer-history-based methods.
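For reference, AUC can be read as the probability that a randomly chosen clicked item is scored above a randomly chosen non-clicked one. The pairwise sketch below is illustrative only (the official evaluation protocol, e.g. whether AUC is computed per viewer and averaged, is not specified here):

```python
def auc(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked
    correctly, counting ties as half. Illustrative sketch, not the
    official Hulu evaluation code."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a model that scores every clicked show above every non-clicked one.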