CharadesSTA

multimodal

About

CharadesSTA is a video understanding benchmark that evaluates how well AI models temporally localize human activities in video. Given a video and a natural-language description of an activity, a model must determine when that activity starts and ends, which requires reasoning across time rather than over single frames. The videos show complex human actions in realistic scenarios, which makes the benchmark relevant to video analysis and action recognition research.
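This page does not state how localization is scored. As a rough illustration, temporal localization tasks commonly compare a predicted time span with the annotated one using temporal intersection-over-union (tIoU). The sketch below assumes that metric; the function name `temporal_iou` and the example timestamps are hypothetical and are not taken from the benchmark.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) spans given in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # The overlap is zero when the two spans do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: predicted span 2.0-8.0 s, annotated span 4.0-10.0 s.
example_iou = temporal_iou((2.0, 8.0), (4.0, 10.0))
print(example_iou)
```

A prediction that overlaps the annotation only partially receives a fractional score, so a model is rewarded for getting the boundaries close, not only for finding the right region.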

Evaluation Stats

Total Models: 2

Organizations: 1

Verified Results: 0

Self-Reported: 2

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution: 2 models

Top Score: 54.2%

Average Score: 48.9%

High Performers (80%+): 0

Top Organizations

#1 Alibaba Cloud / Qwen Team: 2 models, 48.9% average score

Leaderboard

2 models ranked by performance on CharadesSTA

Rank	Model	Organization	Date	License	Score
#01	Qwen2.5 VL 32B Instruct	Alibaba Cloud / Qwen Team	Feb 28, 2025	Apache 2.0	54.2%
#02	Qwen2.5 VL 7B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	Apache 2.0	43.6%
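The summary figures on this page can be reproduced from the two leaderboard rows. The following is a minimal sketch, assuming the scores are the percentages listed in the leaderboard and that a maximum score of 1 means the same values expressed as fractions; the variable names are illustrative only.

```python
# Scores copied from the leaderboard, in percent.
scores = {
    "Qwen2.5 VL 32B Instruct": 54.2,
    "Qwen2.5 VL 7B Instruct": 43.6,
}

# Best-scoring model and its score.
top_model = max(scores, key=scores.get)
top_score = scores[top_model]

# Mean score, rounded to one decimal place as displayed on the page.
average_score = round(sum(scores.values()) / len(scores), 1)

# Models at or above the 80% "high performer" threshold.
high_performers = [name for name, score in scores.items() if score >= 80.0]

# Assumption: dividing by 100 maps the percentages onto the 0-1 scale.
normalized = {name: score / 100 for name, score in scores.items()}

print(top_model, top_score, average_score, len(high_performers))
```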
