
EgoSchema

Task Description

EgoSchema is a diagnostic benchmark for very long-form video language understanding. The task format for EgoSchema is multiple-choice question answering (MCQA).

  • Questions: For each MCQ in the dataset, we append a post prompt: \nAnswer with the option's letter from the given choices directly.

  • Answers: As required by the official website, we map the generated option letter to its index, i.e., A: 0, B: 1, C: 2, D: 3, E: 4. Many models, such as LLaVA, follow the instruction well and generate only the option letter. However, in case a model generates redundant information (e.g., the full option string or the option sentence), we also parse such outputs with pre-defined rule-based matching; see the sketch after this list.
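
The snippet below is a minimal sketch, not the exact lmms-eval implementation, of this letter-to-index mapping and rule-based fallback; the function name parse_prediction and its arguments are illustrative.

import re

LETTER_TO_INDEX = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}

def parse_prediction(output: str, options: list[str]) -> int:
    # Normalize whitespace before matching.
    text = output.strip()

    # Case 1: the model answered with just the option letter, possibly as "A.", "(A)", or "a".
    match = re.match(r"^\(?([A-E])\)?[).:]?\s*$", text, flags=re.IGNORECASE)
    if match:
        return LETTER_TO_INDEX[match.group(1).upper()]

    # Case 2: the output starts with the letter followed by the option text, e.g. "B. The person ...".
    match = re.match(r"^\(?([A-E])\)?[).:]\s+", text, flags=re.IGNORECASE)
    if match:
        return LETTER_TO_INDEX[match.group(1).upper()]

    # Case 3: fall back to matching a full option string inside the output.
    for idx, option in enumerate(options):
        if option.lower() in text.lower():
            return idx

    # Unmatched outputs are treated as incorrect.
    return -1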

Evaluation

Full set: Submission

EgoSchema is intended as a zero-shot evaluation benchmark; hence, the full correct-answer file is not made public.

lmms-eval will automatically generate a submission file inference_results_egoschema_{taskname}_{now_date_time}.json under logs/. To evaluate on the entire benchmark, submit the generated file with curl:

curl -X POST -H "Content-Type: application/json" -d @<path_to_json_file> https://validation-server.onrender.com/api/upload/
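
If you prefer to submit from Python, the following is an equivalent sketch using the requests package; the submission file name is a hypothetical example of the pattern generated above.

import json
import requests

# Hypothetical file name following the inference_results_egoschema_{taskname}_{now_date_time}.json pattern.
submission_path = "logs/inference_results_egoschema_egoschema_2024_06_01_12_00_00.json"

with open(submission_path, "r") as f:
    payload = json.load(f)

# requests serializes the payload and sets Content-Type: application/json,
# matching the curl command above.
response = requests.post(
    "https://validation-server.onrender.com/api/upload/",
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.text)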

Subset: Direct Scoring

EgoSchema also releases the correct answers to 500 of its questions in the subset_answers.json file, intended for offline experimentation and performance tracking. Hence, lmms-eval will automatically compute the score for this subset.
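
For reference, subset scoring amounts to comparing predicted option indices against the released answers. The following sketch assumes both files map a question uid to an option index (0-4); the field layout is illustrative, not the exact lmms-eval schema.

import json

def subset_accuracy(predictions_path: str, answers_path: str) -> float:
    with open(predictions_path) as f:
        predictions = json.load(f)  # e.g. {"<question_uid>": 2, ...}
    with open(answers_path) as f:
        answers = json.load(f)      # e.g. {"<question_uid>": 2, ...}

    # Missing predictions count as incorrect; the denominator is the full subset.
    correct = sum(
        uid in predictions and predictions[uid] == answers[uid]
        for uid in answers
    )
    return correct / len(answers) if answers else 0.0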

Tasks

  • egoschema: Standard MCQA for Full set.
  • egoschema_mc_ppl: MCQA Perplexity task format for Full set.
  • egoschema_subset: Standard MCQA for Subset.
  • egoschema_subset_mcppl: MCQA Perplexity task format for Subset.

Citation

@inproceedings{NEURIPS2023_90ce332a,
 author = {Mangalam, Karttikeya and Akshulakov, Raiymbek and Malik, Jitendra},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
 pages = {46212--46244},
 publisher = {Curran Associates, Inc.},
 title = {EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding},
 url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/90ce332aff156b910b002ce4e6880dec-Paper-Datasets_and_Benchmarks.pdf},
 volume = {36},
 year = {2023}
}