RepoQA

Multilingual

text

About

RepoQA is the first benchmark specifically designed for long-context code understanding. It comprises 500 code search tasks drawn from 50 repositories across 5 programming languages. Its Searching Needle Function (SNF) task tests a model's ability to retrieve a specific function from a large codebase, given only a natural-language description of that function.
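To make the SNF setup concrete, here is a minimal sketch of how a single task could be scored. It is not the benchmark's official harness. The names `pick_best_match` and `is_correct` are hypothetical, the 0.8 threshold is an assumed value, and `difflib.SequenceMatcher` is a stand-in for whatever similarity metric the real evaluation uses.

```python
from difflib import SequenceMatcher


def pick_best_match(model_output: str, candidates: dict[str, str]) -> tuple[str, float]:
    """Return the name and similarity of the candidate function closest to the output.

    `candidates` maps function names to their source text (hypothetical layout).
    """
    best_name, best_score = "", 0.0
    for name, source in candidates.items():
        # Character-level similarity in [0, 1]; a stand-in metric, not the official one.
        score = SequenceMatcher(None, model_output, source).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def is_correct(model_output: str, needle_name: str,
               candidates: dict[str, str], threshold: float = 0.8) -> bool:
    """A retrieval counts as correct if the closest candidate is the needle
    and the similarity clears the (assumed) threshold."""
    name, score = pick_best_match(model_output, candidates)
    return name == needle_name and score >= threshold


# Toy "repository" with two functions; the needle is `add`.
repo_functions = {
    "add": "def add(a, b):\n    return a + b\n",
    "sub": "def sub(a, b):\n    return a - b\n",
}
```

In this sketch, a model that reproduces the `add` function verbatim passes, while a refusal or an unrelated answer falls below the threshold and fails.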

Evaluation Stats

Total Models: 2

Organizations: 1

Verified Results: 0

Self-Reported: 2

Benchmark Details

Max Score: 1

Language

Performance Overview

Score distribution and top performers

Score Distribution

2 models

Top Score: 85.0%

Average Score: 81.0%

High Performers (80%+)

Top Organizations

#1 Microsoft, 2 models, 81.0%

Leaderboard

2 models ranked by performance on RepoQA

Rank	Model	Organization	Date	License	Score
#01	Phi-3.5-MoE-instruct	Microsoft	Aug 23, 2024	MIT	85.0%
#02	Phi-3.5-mini-instruct	Microsoft	Aug 23, 2024	MIT	77.0%
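The summary figures on this page can be re-derived from the two leaderboard rows. The short sketch below does that arithmetic; the `summarize` helper and its `high_bar` parameter are hypothetical names introduced for illustration, and the scores are copied from the rows above.

```python
# Scores copied from the leaderboard rows, in percent.
scores = {
    "Phi-3.5-MoE-instruct": 85.0,
    "Phi-3.5-mini-instruct": 77.0,
}


def summarize(scores: dict[str, float], high_bar: float = 80.0) -> dict[str, float]:
    """Compute the top score, the mean score, and the count of models at or above high_bar."""
    values = list(scores.values())
    return {
        "top": max(values),
        "average": sum(values) / len(values),
        "high_performers": sum(v >= high_bar for v in values),
    }


summary = summarize(scores)
```

With these two rows, the mean is (85.0 + 77.0) / 2 = 81.0, which matches the reported average score, and one model clears the 80% bar.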

Resources

Research Paper