
Gemma 3 4B
Multimodal
Zero-eval
#3 VQAv2 (val)
#3 MMMU (val)
by Google
About
Gemma 3 4B is a multimodal language model developed by Google. It has been evaluated on 26 benchmarks, with an average score of 53.0% and its strongest results on IFEval (90.2%), GSM8k (89.2%), and DocVQA (75.8%). It supports a 262K-token context window for handling large documents and is available through 1 API provider. As a multimodal model, it accepts both text and image inputs. The Gemma license permits commercial use, which makes the model an option for enterprise applications. It was released on Mar 12, 2025.
Pricing Range
Input (per 1M tokens): $0.02 - $0.02
Output (per 1M tokens): $0.04 - $0.04
Providers: 1
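As a rough illustration of what these per-million-token prices mean for a single request, the sketch below multiplies token counts by the listed rates. The function name `estimate_cost_usd` and the example workload are hypothetical, and the calculation assumes flat pricing with no minimum charge, caching discount, or per-image fee, none of which this page specifies.

```python
# Listed prices from the Pricing Range section (USD per 1M tokens).
INPUT_USD_PER_M = 0.02
OUTPUT_USD_PER_M = 0.04


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming flat per-token pricing."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 100,000-token document and a 2,000-token summary.
    cost = estimate_cost_usd(100_000, 2_000)
    print(f"Estimated cost: ${cost:.5f}")
```

Actual billing depends on the provider's tokenizer and terms, so treat the result as an order-of-magnitude estimate.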
Timeline
Announced: Mar 12, 2025
Released: Mar 12, 2025
Knowledge Cutoff: Aug 1, 2024
Specifications
Training Tokens: 4.0T
Capabilities
Multimodal
License & Family
License
Gemma
Performance Overview
Performance metrics and category breakdown
Overall Performance
26 benchmarks
Average Score
53.0%
Best Score
90.2%
High Performers (80%+)
2
Performance Metrics
Max Context Window
262.1K
Avg Throughput
33.0 tok/s
Avg Latency
0ms
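To put the throughput figure in practical terms, the sketch below converts an output length into an approximate generation time. The helper `estimate_generation_seconds` is hypothetical. It assumes the 33.0 tok/s average holds steady for the whole response and takes latency as a separate parameter, since real throughput varies with provider, load, and prompt length.

```python
# Average throughput reported on this page, in output tokens per second.
AVG_THROUGHPUT_TOK_PER_S = 33.0


def estimate_generation_seconds(output_tokens: int, latency_s: float = 0.0) -> float:
    """Approximate wall-clock time: initial latency plus steady-rate decoding."""
    return latency_s + output_tokens / AVG_THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    # Hypothetical response lengths, in tokens.
    for n in (100, 500, 2_000):
        print(f"{n} tokens: about {estimate_generation_seconds(n):.1f} s")
```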
All Benchmark Results for Gemma 3 4B
Complete list of benchmark scores with detailed information
Benchmark | Modality | Score (0-1) | Score (%) | Source
IFEval | text | 0.90 | 90.2% | Self-reported
GSM8k | text | 0.89 | 89.2% | Self-reported
DocVQA | multimodal | 0.76 | 75.8% | Self-reported
MATH | text | 0.76 | 75.6% | Self-reported
AI2D | multimodal | 0.75 | 74.8% | Self-reported
BIG-Bench Hard | text | 0.72 | 72.2% | Self-reported
HumanEval | text | 0.71 | 71.3% | Self-reported
Natural2Code | text | 0.70 | 70.3% | Self-reported
FACTS Grounding | text | 0.70 | 70.1% | Self-reported
ChartQA | multimodal | 0.69 | 68.8% | Self-reported
Showing 1 to 10 of 26 benchmarks
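The summary figures earlier on the page can be partly checked against the listed rows. The sketch below parses the ten pipe-delimited rows, picks out the benchmarks at or above 80%, and computes the mean of the listed scores. The helper `parse_rows` is hypothetical and reads columns by position. Because only 10 of the 26 benchmarks appear here, the mean it computes covers just these rows and is not expected to match the 53.0% overall average.

```python
# The ten benchmark rows shown on this page (name | modality | score | percent | source).
ROWS = """\
IFEval | text | 0.90 | 90.2% | Self-reported
GSM8k | text | 0.89 | 89.2% | Self-reported
DocVQA | multimodal | 0.76 | 75.8% | Self-reported
MATH | text | 0.76 | 75.6% | Self-reported
AI2D | multimodal | 0.75 | 74.8% | Self-reported
BIG-Bench Hard | text | 0.72 | 72.2% | Self-reported
HumanEval | text | 0.71 | 71.3% | Self-reported
Natural2Code | text | 0.70 | 70.3% | Self-reported
FACTS Grounding | text | 0.70 | 70.1% | Self-reported
ChartQA | multimodal | 0.69 | 68.8% | Self-reported
"""


def parse_rows(text: str) -> list[tuple[str, str, float]]:
    """Return (benchmark, modality, percent) for each pipe-delimited row."""
    results = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Column 0 is the name, column 1 the modality, column 3 the percent score.
        results.append((cells[0], cells[1], float(cells[3].rstrip("%"))))
    return results


results = parse_rows(ROWS)
high_performers = [name for name, _, pct in results if pct >= 80.0]
mean_listed = sum(pct for _, _, pct in results) / len(results)

if __name__ == "__main__":
    print("High performers (80%+):", high_performers)
    print(f"Mean of the {len(results)} listed rows: {mean_listed:.2f}%")
```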