
Llama 3.2 11B Instruct
Multimodal
Zero-eval
#1 on VQAv2 (test)
by Meta
About
Llama 3.2 11B Instruct is a multimodal language model developed by Meta. It achieves an average score of 63.6% across 11 benchmarks, with particularly strong results on AI2D (91.1%), DocVQA (88.4%), and ChartQA (83.4%). It supports a 256K-token context window for handling large documents and is available through 6 API providers. As a multimodal model, it can process both text and image inputs. Released in September 2024, it is part of Meta's Llama 3.2 model family.
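
Because the model is served by multiple API providers, many of which expose OpenAI-compatible chat endpoints, a minimal sketch of a text-plus-image request might look like the following. The base URL, API key environment variable, and model identifier are illustrative assumptions and vary by provider.

# Minimal sketch of a multimodal request to Llama 3.2 11B Instruct via an
# OpenAI-compatible endpoint. base_url and model name are placeholders;
# substitute your provider's actual values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical provider endpoint
    api_key=os.environ["PROVIDER_API_KEY"],          # assumed environment variable
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",  # identifier varies by provider
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the chart in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)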
Pricing Range
Input (per 1M tokens): $0.05 - $0.20
Output (per 1M tokens): $0.05 - $0.30
Providers: 6
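
As a quick worked example of what those rates imply, the sketch below computes the cost of a single request at the low and high ends of the listed range; the token counts are arbitrary illustrative values.

# Worked cost example using the listed per-1M-token price range.
INPUT_PRICE_RANGE = (0.05, 0.20)   # USD per 1M input tokens
OUTPUT_PRICE_RANGE = (0.05, 0.30)  # USD per 1M output tokens

def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD for one request at the given per-1M-token prices."""
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price

# Example: 50K input tokens (a large document) and 1K generated tokens.
low = request_cost(50_000, 1_000, INPUT_PRICE_RANGE[0], OUTPUT_PRICE_RANGE[0])
high = request_cost(50_000, 1_000, INPUT_PRICE_RANGE[1], OUTPUT_PRICE_RANGE[1])
print(f"~${low:.4f} to ~${high:.4f} per request")  # ~$0.0026 to ~$0.0103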
Timeline
Announced: Sep 25, 2024
Released: Sep 25, 2024
Knowledge Cutoff: Dec 31, 2023
Specifications
Capabilities
Multimodal
License & Family
License: Llama 3.2 Community License
Performance Overview
Performance metrics and category breakdown
Overall Performance (11 benchmarks)
Average Score: 63.6%
Best Score: 91.1%
High Performers (80%+): 3

Performance Metrics
Max Context Window: 256.0K
Avg Throughput: 116.8 tok/s
Avg Latency: 0 ms
All Benchmark Results for Llama 3.2 11B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Type | Score | Percentage | Source
AI2D | multimodal | 0.91 | 91.1% | Self-reported
DocVQA | multimodal | 0.88 | 88.4% | Self-reported
ChartQA | multimodal | 0.83 | 83.4% | Self-reported
VQAv2 (test) | multimodal | 0.75 | 75.2% | Self-reported
MMLU | text | 0.73 | 73.0% | Self-reported
MGSM | text | 0.69 | 68.9% | Self-reported
MATH | text | 0.52 | 51.9% | Self-reported
MathVista | multimodal | 0.52 | 51.5% | Self-reported
MMMU | multimodal | 0.51 | 50.7% | Self-reported
MMMU-Pro | multimodal | 0.33 | 33.0% | Self-reported
Showing 1 to 10 of 11 benchmarks