Pixtral-12B
Multimodal
Zero-eval
#1 MM IF-Eval
#2 VQAv2
#2 MM-MT-Bench
by Mistral AI
About
Pixtral 12B is Mistral AI's multimodal vision-language model, designed to understand and reason about both images and text. With 12 billion parameters and integrated visual and textual processing, it extends Mistral's model lineup into multimodal applications.
Pricing Range
Input (per 1M): $0.15 - $0.15
Output (per 1M): $0.15 - $0.15
Providers: 1
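As a rough illustration of how the listed prices translate into a per-request cost, here is a minimal sketch. It assumes "per 1M" means per one million tokens, which the page does not state explicitly. The function name `estimate_cost` and the token counts in the usage line are hypothetical.

```python
# Prices taken from the listing above; "per 1M" is ASSUMED to mean
# per one million tokens.
INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens (assumed unit)
OUTPUT_PRICE_PER_M = 0.15  # USD per 1M output tokens (assumed unit)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    return (
        input_tokens * INPUT_PRICE_PER_M
        + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Example: a request with 2,000 input tokens and 500 output tokens.
print(f"${estimate_cost(2_000, 500):.6f}")
```

Because the listed input and output prices are the same, the cost under this assumption depends only on the total token count.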
Timeline
Announced: Sep 17, 2024
Released: Sep 17, 2024
Specifications
Capabilities
Multimodal
License & Family
License
Apache 2.0
Performance Overview
Performance metrics and category breakdown
Overall Performance
12 benchmarks
Average Score
66.9%
Best Score
90.7%
High Performers (80%+)
2
Performance Metrics
Max Context Window
136.2K
Avg Throughput
0.1 tok/s
Avg Latency
1ms
All Benchmark Results for Pixtral-12B
Complete list of benchmark scores with detailed information
| Benchmark | Category | Score | Score (%) | Source |
| --- | --- | --- | --- | --- |
| DocVQA | multimodal | 0.91 | 90.7% | Self-reported |
| ChartQA | multimodal | 0.82 | 81.8% | Self-reported |
| VQAv2 | multimodal | 0.79 | 78.6% | Self-reported |
| MT-Bench | text | 0.77 | 76.8% | Self-reported |
| HumanEval | text | 0.72 | 72.0% | Self-reported |
| MMLU | text | 0.69 | 69.2% | Self-reported |
| IFEval | text | 0.61 | 61.3% | Self-reported |
| MM-MT-Bench | multimodal | 0.60 | 60.5% | Self-reported |
| MathVista | multimodal | 0.58 | 58.0% | Self-reported |
| MM IF-Eval | multimodal | 0.53 | 52.7% | Self-reported |
Showing 1 to 10 of 12 benchmarks
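The summary figures above can be cross-checked against the listed rows. The sketch below recomputes the best score and the count of high performers (80%+) from the ten visible benchmarks. The overall average of 66.9% covers all 12 benchmarks, two of which are not shown here, so the average over the visible rows is expected to differ. The function name `summarize` is illustrative only.

```python
# Percent scores copied from the ten benchmark rows shown on this page.
# Two of the 12 benchmarks are not listed, so they are absent here.
scores = {
    "DocVQA": 90.7,
    "ChartQA": 81.8,
    "VQAv2": 78.6,
    "MT-Bench": 76.8,
    "HumanEval": 72.0,
    "MMLU": 69.2,
    "IFEval": 61.3,
    "MM-MT-Bench": 60.5,
    "MathVista": 58.0,
    "MM IF-Eval": 52.7,
}


def summarize(scores: dict[str, float], threshold: float = 80.0) -> dict:
    """Compute summary statistics over the supplied benchmark scores."""
    values = list(scores.values())
    return {
        "count": len(values),
        "best": max(values),
        "average": round(sum(values) / len(values), 1),
        "high_performers": sorted(
            name for name, score in scores.items() if score >= threshold
        ),
    }


summary = summarize(scores)
print(summary)
```

The best score (90.7%) and the two high performers (DocVQA and ChartQA) match the overview. The ten visible rows average about 70.2%, higher than the reported 66.9%, which is consistent with the two unlisted benchmarks scoring below the visible ones.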