Mistral AI

Pixtral-12B

Multimodal
Zero-eval
#1 MM IF-Eval
#2 VQAv2
#2 MM-MT-Bench

by Mistral AI

About

Pixtral 12B is Mistral AI's multimodal vision-language model, designed to understand and reason over both images and text. With 12 billion parameters, it extends Mistral's model lineup to multimodal applications.

Pricing Range
Input (per 1M): $0.15 - $0.15
Output (per 1M): $0.15 - $0.15
Providers: 1
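
To make the listed rates concrete, here is a minimal sketch of a cost estimate. It assumes the $0.15 figures above are USD per 1M tokens and apply as flat rates; the helper name `estimate_cost_usd` and the token counts are hypothetical, chosen only for illustration.

```python
# Rates as listed above, in USD per 1M units.
# Assumption: the unit is tokens and the rate is flat across volume.
INPUT_USD_PER_M = 0.15
OUTPUT_USD_PER_M = 0.15


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate the cost of a workload in USD."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative workload: 2,000,000 input tokens and 500,000 output tokens.
    print(f"${estimate_cost_usd(2_000_000, 500_000):.4f}")
```

Because the input and output rates are identical here, the estimate depends only on the total token count; keeping the two rates as separate constants makes the sketch easy to adapt if a provider prices them differently.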
Timeline
Announced: Sep 17, 2024
Released: Sep 17, 2024
Specifications
Capabilities: Multimodal
License & Family
License: Apache 2.0
Performance Overview
Performance metrics and category breakdown

Overall Performance

Benchmarks: 12
Average Score: 66.9%
Best Score: 90.7%
High Performers (80%+): 2

Performance Metrics

Max Context Window: 136.2K
Avg Throughput: 0.1 tok/s
Avg Latency: 1 ms
All Benchmark Results for Pixtral-12B
Complete list of benchmark scores with detailed information
Benchmark     Category     Score   Percent   Source
DocVQA        multimodal   0.91    90.7%     Self-reported
ChartQA       multimodal   0.82    81.8%     Self-reported
VQAv2         multimodal   0.79    78.6%     Self-reported
MT-Bench      text         0.77    76.8%     Self-reported
HumanEval     text         0.72    72.0%     Self-reported
MMLU          text         0.69    69.2%     Self-reported
IFEval        text         0.61    61.3%     Self-reported
MM-MT-Bench   multimodal   0.60    60.5%     Self-reported
MathVista     multimodal   0.58    58.0%     Self-reported
MM IF-Eval    multimodal   0.53    52.7%     Self-reported
Showing 1 to 10 of 12 benchmarks