CharadesSTA
multimodal
About
CharadesSTA is a video understanding benchmark that evaluates a model's ability to localize human activities in time. Given a natural-language description of an activity, the model must identify the start and end of the matching segment in the video, which requires temporal reasoning over realistic, everyday scenes. It is commonly used in research on video analysis and temporal grounding.
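Temporal localization of this kind is usually scored by how well a predicted segment overlaps the annotated one. The sketch below shows temporal intersection-over-union (IoU) and a recall-at-threshold summary. This page does not state which metric the leaderboard scores use, so the metric choice, the function names `temporal_iou` and `recall_at_iou`, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the intervals do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_iou(preds: Sequence[Segment], gts: Sequence[Segment],
                  threshold: float = 0.5) -> float:
    """Fraction of queries whose predicted segment reaches the IoU threshold.

    Assumes one prediction per query (hypothetical setup for illustration).
    """
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


# Made-up example segments, not benchmark data.
example_preds = [(2.0, 8.0), (0.0, 1.0)]
example_gts = [(4.0, 10.0), (2.0, 3.0)]
example_recall = recall_at_iou(example_preds, example_gts, threshold=0.5)
```

A score in the range 0 to 1, as implied by the Max Score field, is consistent with either a mean IoU or a recall value, but the page does not say which.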
Evaluation Stats
Total Models: 2
Organizations: 1
Verified Results: 0
Self-Reported: 2
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 2 models
Top Score: 54.2%
Average Score: 48.9%
High Performers (80%+): 0
Top Organizations
#1 Alibaba Cloud / Qwen Team (2 models, 48.9%)
Leaderboard
2 models ranked by performance on CharadesSTA
Release Date | License | Score
---|---|---
Feb 28, 2025 | Apache 2.0 | 54.2%
Jan 26, 2025 | Apache 2.0 | 43.6%
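The summary figures in the Performance Overview can be reproduced from the two leaderboard scores. The following is a minimal sketch of that arithmetic. The helper name `summarize` is hypothetical, and the 80% cutoff is taken from the "High Performers (80%+)" label.

```python
# Leaderboard scores in percent, as listed in the table above.
scores = [54.2, 43.6]


def summarize(scores, high_threshold=80.0):
    """Recompute the overview statistics from a list of percentage scores."""
    return {
        "models": len(scores),
        "top": max(scores),
        # Rounded to one decimal place to match the page's display precision.
        "average": round(sum(scores) / len(scores), 1),
        "high_performers": sum(s >= high_threshold for s in scores),
    }


summary = summarize(scores)
print(summary)
```

With the two listed scores, the mean is (54.2 + 43.6) / 2 = 48.9, which matches the reported Average Score, and neither model reaches the 80% threshold.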