Llama 3.2 11B Instruct
by Meta

Multimodal · Zero-eval · #1 VQAv2 (test)

About

Llama 3.2 11B Instruct is a multimodal language model developed by Meta and released in September 2024. It achieves an average score of 63.6% across 11 benchmarks, with particularly strong results on AI2D (91.1%), DocVQA (88.4%), and ChartQA (83.4%). As a multimodal model, it accepts both text and image inputs, supports a 256K token context window for handling large documents, and is available through 6 API providers.
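
For illustration, the sketch below sends a combined text-and-image request to the model through an API provider. It assumes an OpenAI-compatible chat-completions endpoint, which many hosted providers expose; the base URL, API key, and exact model identifier are placeholders and differ between the 6 providers.

```python
# Minimal sketch of a multimodal request, assuming an OpenAI-compatible provider.
# The endpoint URL, API key, and model identifier below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical provider endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",  # naming varies by provider
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what this chart shows."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    max_tokens=256,
)

print(response.choices[0].message.content)
```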

Pricing Range
Input (per 1M tokens): $0.05 - $0.20
Output (per 1M tokens): $0.05 - $0.30
Providers: 6
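
As a quick illustration of these rates, the snippet below estimates the cost of a single request at the cheapest and most expensive listed prices; the token counts are made up for the example.

```python
# Rough per-request cost at the listed per-1M-token rates (token counts are illustrative).
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    return input_tokens / 1_000_000 * input_per_m + output_tokens / 1_000_000 * output_per_m

low = request_cost(3_000, 500, input_per_m=0.05, output_per_m=0.05)   # cheapest listed rates
high = request_cost(3_000, 500, input_per_m=0.20, output_per_m=0.30)  # most expensive listed rates
print(f"~${low:.6f} to ~${high:.6f} per request")  # ~$0.000175 to ~$0.000750
```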

Timeline
Announced: Sep 25, 2024
Released: Sep 25, 2024
Knowledge Cutoff: Dec 31, 2023

Specifications
Capabilities: Multimodal

License & Family
License: Llama 3.2 Community License

Performance Overview
Performance metrics and category breakdown

Overall Performance (11 benchmarks)
Average Score: 63.6%
Best Score: 91.1%
High Performers (80%+): 3

Performance Metrics
Max Context Window: 256.0K tokens
Avg Throughput: 116.8 tok/s
Avg Latency: 0 ms
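
To put the throughput figure in context, here is a back-of-the-envelope estimate of generation time at the reported average rate; actual speed varies by provider, load, and prompt length, and latency is ignored.

```python
# Back-of-the-envelope generation-time estimate at the reported average throughput.
# Real throughput varies by provider, load, and prompt length; latency is ignored.
AVG_THROUGHPUT_TOK_PER_S = 116.8

def estimated_seconds(output_tokens: int) -> float:
    return output_tokens / AVG_THROUGHPUT_TOK_PER_S

for n in (128, 512, 1024):
    print(f"{n:5d} output tokens -> ~{estimated_seconds(n):.1f} s")
# 128 -> ~1.1 s, 512 -> ~4.4 s, 1024 -> ~8.8 s
```
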
All Benchmark Results for Llama 3.2 11B Instruct
Complete list of benchmark scores with detailed information
Benchmark        Category     Score    Source
AI2D             multimodal   91.1%    Self-reported
DocVQA           multimodal   88.4%    Self-reported
ChartQA          multimodal   83.4%    Self-reported
VQAv2 (test)     multimodal   75.2%    Self-reported
MMLU             text         73.0%    Self-reported
MGSM             text         68.9%    Self-reported
MATH             text         51.9%    Self-reported
MathVista        multimodal   51.5%    Self-reported
MMMU             multimodal   50.7%    Self-reported
MMMU-Pro         multimodal   33.0%    Self-reported
Showing 10 of 11 benchmarks; one additional result is not listed on this page.
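
As a cross-check on the summary statistics above, the listed scores can be averaged directly. The result (~66.7%) differs from the reported 63.6% average because the eleventh benchmark score is not shown here.

```python
# Average of the ten benchmark scores listed above. The page reports a 63.6%
# average across all 11 benchmarks; the eleventh score is not shown here.
scores = {
    "AI2D": 91.1, "DocVQA": 88.4, "ChartQA": 83.4, "VQAv2 (test)": 75.2,
    "MMLU": 73.0, "MGSM": 68.9, "MATH": 51.9, "MathVista": 51.5,
    "MMMU": 50.7, "MMMU-Pro": 33.0,
}
print(f"Average of the 10 listed benchmarks: {sum(scores.values()) / len(scores):.1f}%")  # ~66.7%
```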