
Granite 3.3 8B Instruct
Multimodal
Zero-eval
#2 AttaQ
#2 TruthfulQA
#2 AlpacaEval 2.0
by IBM
About
Granite 3.3 8B Instruct is a language model developed by IBM and listed here as multimodal. It averages 69.8% across 14 benchmarks, with its highest scores on HumanEval (89.7%), AttaQ (88.5%), and HumanEval+ (86.1%). All benchmark results shown on this page are self-reported and use text inputs. The model was released in 2025 under the Apache 2.0 license, which permits commercial use and makes it suitable for enterprise applications.
Timeline
Announced: Apr 16, 2025
Released: Apr 16, 2025
Knowledge Cutoff: Apr 1, 2024
Specifications
Capabilities
Multimodal
License & Family
License
Apache 2.0
Performance Overview
Performance metrics and category breakdown
Overall Performance
14 benchmarks
Average Score
69.8%
Best Score
89.7%
High Performers (80%+)
5+
All Benchmark Results for Granite 3.3 8B Instruct
Complete list of benchmark scores with detailed information
Benchmark | Modality | Score (0-1) | Score (%) | Source
--- | --- | --- | --- | ---
HumanEval | text | 0.90 | 89.7% | Self-reported
AttaQ | text | 0.89 | 88.5% | Self-reported
HumanEval+ | text | 0.86 | 86.1% | Self-reported
AIME 2024 | text | 0.81 | 81.2% | Self-reported
GSM8k | text | 0.81 | 80.9% | Self-reported
IFEval | text | 0.75 | 74.8% | Self-reported
BIG-Bench Hard | text | 0.69 | 69.1% | Self-reported
MATH-500 | text | 0.69 | 69.0% | Self-reported
TruthfulQA | text | 0.67 | 66.9% | Self-reported
MMLU | text | 0.66 | 65.5% | Self-reported
Showing 1 to 10 of 14 benchmarks
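
Because only 10 of the 14 benchmarks are listed, the reported 69.8% average cannot be reproduced from the visible rows alone. The sketch below is a rough consistency check. It assumes the reported average is a simple unweighted mean of percentage scores (the page does not say how it is computed), and the names `listed_scores` and `summarize` are illustrative, not part of any published tooling.

```python
# Scores (%) copied from the ten benchmark rows listed above.
listed_scores = {
    "HumanEval": 89.7,
    "AttaQ": 88.5,
    "HumanEval+": 86.1,
    "AIME 2024": 81.2,
    "GSM8k": 80.9,
    "IFEval": 74.8,
    "BIG-Bench Hard": 69.1,
    "MATH-500": 69.0,
    "TruthfulQA": 66.9,
    "MMLU": 65.5,
}

REPORTED_AVERAGE = 69.8  # average over all benchmarks, as reported on the page
TOTAL_BENCHMARKS = 14    # benchmark count, as reported on the page


def summarize(scores, reported_avg, total):
    """Return (mean of listed scores, names scoring >= 80%, implied mean of unlisted scores).

    Assumption: reported_avg is an unweighted mean over all `total` benchmarks.
    """
    listed_sum = sum(scores.values())
    listed_mean = listed_sum / len(scores)
    high = [name for name, score in scores.items() if score >= 80.0]
    unlisted = total - len(scores)
    # Mean the unlisted benchmarks would need for the overall average to hold.
    implied = (reported_avg * total - listed_sum) / unlisted if unlisted > 0 else None
    return listed_mean, high, implied


listed_mean, high_performers, implied_unlisted_mean = summarize(
    listed_scores, REPORTED_AVERAGE, TOTAL_BENCHMARKS
)

print(f"Mean of listed benchmarks: {listed_mean:.2f}%")
print(f"Benchmarks at 80% or above: {len(high_performers)}")
print(f"Implied mean of the {TOTAL_BENCHMARKS - len(listed_scores)} unlisted benchmarks: "
      f"{implied_unlisted_mean:.1f}%")
```

Under that assumption, the ten listed scores average about 77.2%, five of them are at or above 80%, and the four unlisted benchmarks would need to average roughly 51.4% for the overall figure to come out at 69.8%. Since the reported average is rounded, the implied value is approximate.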