STEM

multimodal

About

STEM is a benchmark that evaluates AI models across Science, Technology, Engineering, and Mathematics, testing knowledge integration and problem solving in technical fields. It assesses how well a model applies scientific principles, mathematical reasoning, engineering concepts, and technological understanding across varied STEM contexts.

Evaluation Stats

Total Models: 1

Organizations: 1

Verified Results: 0

Self-Reported: 1

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

1 model

Top Score

34.0%

Average Score

34.0%

High Performers (80%+)

Top Organizations

#1 Alibaba Cloud / Qwen Team

1 model

34.0%

Leaderboard

1 model ranked by performance on STEM

Rank	Model	Organization	Release Date	License	Score
#01	Qwen2.5-Coder 7B Instruct	Alibaba Cloud / Qwen Team	Sep 19, 2024	Apache 2.0	34.0%

Resources

Research Paper