Gemma 3 4B

Multimodal
Zero-eval
#3 VQAv2 (val)
#3 MMMU (val)

by Google

About

Gemma 3 4B is a multimodal language model developed by Google. It reports results on 26 benchmarks, with its highest scores on IFEval (90.2%), GSM8k (89.2%), and DocVQA (75.8%). It supports a 262K-token context window for handling large documents and is available through 1 API provider. As a multimodal model, it accepts images as well as text as input. It is released under the Gemma license, which permits commercial use. It was announced and released on March 12, 2025.

Pricing Range
Input (per 1M tokens): $0.02 - $0.02
Output (per 1M tokens): $0.04 - $0.04
Providers: 1
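As a rough illustration of what the listed rates mean for a single request, the sketch below multiplies token counts by the per-million prices. The function name `estimate_cost_usd` and the example workload are hypothetical; only the two prices come from this page, and actual provider billing may differ.

```python
# Listed prices in USD per 1M tokens, taken from the Pricing Range on this page.
INPUT_PRICE_PER_M = 0.02
OUTPUT_PRICE_PER_M = 0.04


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD at the listed rates.

    Hypothetical helper for illustration; it ignores any provider-specific
    minimums, rounding, or surcharges.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Assumed example workload: 100,000 prompt tokens and 2,000 generated tokens.
cost = estimate_cost_usd(100_000, 2_000)
print(f"Estimated cost: ${cost:.6f}")
```

At these rates, even a prompt of 100,000 tokens costs a fraction of a cent.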
Timeline
Announced: Mar 12, 2025
Released: Mar 12, 2025
Knowledge Cutoff: Aug 1, 2024
Specifications
Training Tokens: 4.0T
Capabilities: Multimodal
License & Family
License: Gemma
Performance Overview
Performance metrics and category breakdown

Overall Performance

26 benchmarks
Average Score: 53.0%
Best Score: 90.2%
High Performers (80%+): 2

Performance Metrics

Max Context Window: 262.1K tokens
Avg Throughput: 33.0 tok/s
Avg Latency: 0ms
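To put the throughput figure in practical terms, the sketch below estimates how long a response of a given length would take to generate. It assumes generation time is simply latency plus output tokens divided by throughput. The function name is hypothetical, and the 0ms latency is copied from this page as listed; it may reflect missing data, not a measured value.

```python
# Figures as listed on this page; the latency value may be a placeholder.
AVG_THROUGHPUT_TOK_PER_S = 33.0
AVG_LATENCY_S = 0.0  # listed as 0ms


def estimate_generation_seconds(output_tokens: int) -> float:
    """Estimate wall-clock seconds to generate `output_tokens` tokens.

    Hypothetical helper: assumes constant throughput and a fixed
    up-front latency, which real deployments only approximate.
    """
    return AVG_LATENCY_S + output_tokens / AVG_THROUGHPUT_TOK_PER_S


# Assumed example: a 1,000-token response.
print(f"{estimate_generation_seconds(1_000):.1f} s")
```

Under these assumptions, a 1,000-token response takes roughly half a minute.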
All Benchmark Results for Gemma 3 4B
Complete list of benchmark scores with detailed information
Benchmark         Type         Score   Score (%)   Source
IFEval            text         0.90    90.2%       Self-reported
GSM8k             text         0.89    89.2%       Self-reported
DocVQA            multimodal   0.76    75.8%       Self-reported
MATH              text         0.76    75.6%       Self-reported
AI2D              multimodal   0.75    74.8%       Self-reported
BIG-Bench Hard    text         0.72    72.2%       Self-reported
HumanEval         text         0.71    71.3%       Self-reported
Natural2Code      text         0.70    70.3%       Self-reported
FACTS Grounding   text         0.70    70.1%       Self-reported
ChartQA           multimodal   0.69    68.8%       Self-reported
Showing 1 to 10 of 26 benchmarks
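The summary figures above can be partly checked against the ten scores listed here. The sketch below counts the scores at or above 80% and averages the ten visible results. The variable names are illustrative. Because only 10 of the 26 benchmarks are shown, and these appear to be the highest-scoring ones, the mean of this subset is expected to sit well above the 53.0% overall average.

```python
from statistics import mean

# Percentage scores for the ten benchmarks listed on this page (all self-reported).
listed_scores = {
    "IFEval": 90.2,
    "GSM8k": 89.2,
    "DocVQA": 75.8,
    "MATH": 75.6,
    "AI2D": 74.8,
    "BIG-Bench Hard": 72.2,
    "HumanEval": 71.3,
    "Natural2Code": 70.3,
    "FACTS Grounding": 70.1,
    "ChartQA": 68.8,
}

# Benchmarks meeting the page's "High Performers (80%+)" threshold.
high_performers = [name for name, score in listed_scores.items() if score >= 80.0]

# Mean of the ten visible scores only, not of all 26 benchmarks.
listed_average = mean(listed_scores.values())

print(high_performers)
print(f"{listed_average:.2f}%")
```

The count of two high performers matches the overview, while the subset mean of about 75.8% covers only the listed rows.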