OCRBench

multimodal

About

OCRBench is a comprehensive optical character recognition (OCR) benchmark for evaluating the text recognition capabilities of multimodal models. It tests how accurately a model can recognize, extract, and understand text in images across a range of formats, fonts, languages, and visual contexts, providing a systematic assessment of OCR performance and text-centric visual understanding.

Evaluation Stats

Total Models: 7

Organizations: 3

Verified Results: 0

Self-Reported: 7

Benchmark Details

Max Score: 1
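This page does not say how individual answers are scored. As a hypothetical illustration only, a score with a maximum of 1 can be produced as the fraction of correctly answered items. The sketch below assumes a simple containment rule (a response counts as correct if it includes the reference text, ignoring case); the function names and toy data are made up for illustration and are not OCRBench's official evaluation code.

```python
def contains_match(prediction: str, answer: str) -> bool:
    # Hypothetical rule: correct if the reference text appears in the
    # model output, ignoring case and surrounding whitespace.
    return answer.strip().lower() in prediction.lower()


def normalized_score(predictions: list[str], answers: list[str]) -> float:
    """Fraction of items answered correctly, so the best possible score is 1."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("need equal-length, non-empty lists")
    correct = sum(contains_match(p, a) for p, a in zip(predictions, answers))
    return correct / len(answers)


# Toy data for illustration, not taken from OCRBench.
preds = ["The sign reads STOP.", "Total: $12.50", "hello world"]
refs = ["stop", "$12.50", "goodbye"]
print(normalized_score(preds, refs))
```

Two of the three toy responses contain their reference text, so the score is two thirds; a perfect run would score exactly 1.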


Performance Overview

Score distribution and top performers

Score Distribution (7 models)

Top Score: 88.5%

Average Score: 84.6%

High Performers (80%+): 7
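The summary figures above can be reproduced from the seven leaderboard scores. A minimal Python sketch, assuming scores are stored as fractions of the maximum score of 1 and that "High Performers" means a score of at least 80%:

```python
from statistics import mean

# OCRBench scores (as fractions of the max score of 1), copied from the
# leaderboard on this page, highest first.
scores = [0.885, 0.877, 0.864, 0.844, 0.834, 0.811, 0.809]

top_score = max(scores)
average_score = mean(scores)  # unweighted mean over all listed models
# Assumption: a "high performer" scores at least 0.80 (80%).
high_performers = sum(1 for s in scores if s >= 0.80)

print(f"Top Score: {top_score:.1%}")
print(f"Average Score: {average_score:.1%}")
print(f"High Performers (80%+): {high_performers}")
```

The `.1%` format specifier multiplies by 100 and rounds to one decimal place, which matches how the page displays scores.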

Top Organizations

#1 Alibaba Cloud / Qwen Team: 3 models, 87.5% average

#2 Microsoft: 1 model, 84.4% average

#3 DeepSeek: 3 models, 81.8% average
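The organization figures are consistent with an unweighted mean of each organization's model scores, ranked from highest to lowest. A minimal Python sketch of that grouping, assuming this aggregation rule (the page does not state it explicitly); `organization_averages` is a hypothetical helper name:

```python
from collections import defaultdict
from statistics import mean

# (organization, score in %) pairs copied from the OCRBench leaderboard on this page.
results = [
    ("Alibaba Cloud / Qwen Team", 88.5),
    ("Alibaba Cloud / Qwen Team", 87.7),
    ("Alibaba Cloud / Qwen Team", 86.4),
    ("Microsoft", 84.4),
    ("DeepSeek", 83.4),
    ("DeepSeek", 81.1),
    ("DeepSeek", 80.9),
]


def organization_averages(rows):
    """Group scores by organization and return (org, model_count, mean_score)
    tuples sorted by mean score, highest first.

    Assumes each organization's score is the unweighted mean of its models."""
    by_org = defaultdict(list)
    for org, score in rows:
        by_org[org].append(score)
    summary = [(org, len(s), mean(s)) for org, s in by_org.items()]
    return sorted(summary, key=lambda row: row[2], reverse=True)


for rank, (org, count, avg) in enumerate(organization_averages(results), start=1):
    label = "model" if count == 1 else "models"
    print(f"#{rank} {org}: {count} {label}, {avg:.1f}% average")
```

Under this rule, Microsoft's single model determines its organization score, which is why its average equals the Phi-4-multimodal-instruct score.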

Leaderboard

7 models ranked by performance on OCRBench

Rank	Model	Organization	Released	License	Score
#01	Qwen2.5 VL 72B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	tongyi-qianwen	88.5%
#02	Qwen2-VL-72B-Instruct	Alibaba Cloud / Qwen Team	Aug 29, 2024	tongyi-qianwen	87.7%
#03	Qwen2.5 VL 7B Instruct	Alibaba Cloud / Qwen Team	Jan 26, 2025	Apache 2.0	86.4%
#04	Phi-4-multimodal-instruct	Microsoft	Feb 1, 2025	MIT	84.4%
#05	DeepSeek VL2 Small	DeepSeek	Dec 13, 2024	deepseek	83.4%
#06	DeepSeek VL2	DeepSeek	Dec 13, 2024	deepseek	81.1%
#07	DeepSeek VL2 Tiny	DeepSeek	Dec 13, 2024	deepseek	80.9%
