Kickstarting the Commons: The YFCC100M and the YLI Corpora

The publication of the Yahoo Flickr Creative Commons 100 Million dataset (YFCC100M), to date the largest open-access collection of photos and videos, has provided a unique opportunity to stimulate new research in multimedia analysis and retrieval. To make the YFCC100M even more valuable, we have started supplementing it with a comprehensive set of precomputed features and high-quality ground truth annotations. As part of our efforts, we are releasing the YLI feature corpus, as well as the YLI-GEO and YLI-MED annotation subsets. Under the Multimedia Commons Project (MMCP), we are currently laying the groundwork for a common platform and framework around the YFCC100M that (i) helps researchers contribute additional features and annotations, (ii) supports experimentation on the dataset, and (iii) enables sharing of obtained results. This paper describes the YLI features and annotations released thus far, and sketches our vision for the MMCP.


Work on the YLI corpus and the Multimedia Commons Project is supported by several funders, including: a collaborative Laboratory Directed Research and Development project led by Lawrence Livermore National Laboratory, under the auspices of the U.S. Dept. of Energy contract DE-AC52-07NA27344 (LLNL-CONF-676635); a grant from Cisco Systems, Inc. for Event Detection for Improved Speaker Diarization and Meeting Analysis; and a National Science Foundation grant for the SMASH project: Scalable Multimedia content AnalysiS in a High-level language (award IIS-1251276). Any opinions, findings, and conclusions expressed here are those of the individual researchers, and do not necessarily reflect the views of the funders.

Proceedings of the 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions (MMCommons '15), Brisbane, Australia, pp. 1-6

