AttaQ
About
AttaQ (Adversarial Question Attack) is a semi-automatically curated dataset of adversarial question attack samples for evaluating AI systems' robustness against challenging and potentially misleading questions. The benchmark measures how well models handle adversarial inputs and maintain reliable performance on questions crafted to exploit weaknesses in reasoning or comprehension, making it a useful tool for assessing AI safety and robustness.
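
As a minimal sketch of how such an evaluation is typically driven, the snippet below loads the dataset and collects model responses to each adversarial question. The Hugging Face dataset id "ibm/AttaQ", the "train" split, and the "input" column name are assumptions, not details confirmed by this page.

```python
# Illustrative sketch only: dataset id, split, and column name are assumed.
from datasets import load_dataset

dataset = load_dataset("ibm/AttaQ", split="train")

def collect_responses(model_fn):
    """Run a callable model over every adversarial question and keep its replies."""
    responses = []
    for sample in dataset:
        question = sample["input"]  # assumed field holding the attack prompt
        responses.append(model_fn(question))
    return responses
```

Scoring the collected responses (e.g., with a harmlessness or robustness judge) would then produce the percentage scores reported on this leaderboard.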
Evaluation Stats
Total Models: 3
Organizations: 1
Verified Results: 0
Self-Reported: 3
Benchmark Details
Max Score: 1
Language: en
Performance Overview
Score distribution and top performers
Score Distribution: 3 models
Top Score: 88.5%
Average Score: 87.7%
High Performers (80%+): 3

Top Organizations
#1 IBM: 3 models, 87.7% average score
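
The summary figures above follow directly from the three self-reported scores in the leaderboard below; a short sketch of that arithmetic:

```python
# Reproduce the Performance Overview figures from the leaderboard scores.
scores = [88.5, 88.5, 86.1]  # the three AttaQ scores listed below

top_score = max(scores)                               # 88.5
average_score = round(sum(scores) / len(scores), 1)   # 263.1 / 3 = 87.7
high_performers = sum(1 for s in scores if s >= 80.0) # all 3 exceed 80%

print(top_score, average_score, high_performers)
```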
Leaderboard
3 models ranked by performance on AttaQ
Release Date | License | Score
---|---|---
Apr 16, 2025 | Apache 2.0 | 88.5%
Apr 16, 2025 | Apache 2.0 | 88.5%
May 2, 2025 | Apache 2.0 | 86.1%