docs: clarify retrieval accuracy metrics

Johann Schopplich
2025-10-28 08:39:43 +01:00
parent cdd4a20c67
commit 52dc9c4b3f
4 changed files with 13 additions and 14 deletions

@@ -1,6 +1,6 @@
### Retrieval Accuracy
-Tested across **3 LLMs** with data retrieval tasks:
+Accuracy across **3 LLMs** on **159 data retrieval questions**:
```
gpt-5-nano
@@ -124,7 +124,7 @@ Four datasets designed to test different structural patterns:
#### Question Types
-~160 questions are generated dynamically across three categories:
+159 questions are generated dynamically across three categories:
- **Field retrieval (50%)**: Direct value lookups
- Example: "What is Alice's salary?" → `75000`

@@ -87,5 +87,5 @@
"yaml-analytics": 2938,
"yaml-github": 13129
},
-"timestamp": "2025-10-28T06:43:10.560Z"
+"timestamp": "2025-10-28T07:39:09.360Z"
}

@@ -177,10 +177,13 @@ ${tableRows}
`.trimStart()
}).join('\n')
+// Calculate total unique questions
+const totalQuestions = [...new Set(results.map(r => r.questionId))].length
return `
### Retrieval Accuracy
-Tested across **${modelCount} ${modelCount === 1 ? 'LLM' : 'LLMs'}** with data retrieval tasks:
+Accuracy across **${modelCount} ${modelCount === 1 ? 'LLM' : 'LLMs'}** on **${totalQuestions} data retrieval questions**:
\`\`\`
${modelBreakdown}
@@ -217,7 +220,7 @@ Four datasets designed to test different structural patterns:
#### Question Types
-~160 questions are generated dynamically across three categories:
+${totalQuestions} questions are generated dynamically across three categories:
- **Field retrieval (50%)**: Direct value lookups
- Example: "What is Alice's salary?" → \`75000\`
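
The report generator in this commit replaces the hard-coded `~160` with a count derived from the results. A minimal sketch of that counting step follows. It assumes each result carries a `questionId`, as the diff suggests. The `EvalResult` shape, the `countUniqueQuestions` helper, and the sample data are hypothetical and only for illustration. It uses `Set.prototype.size`, which should give the same number as the spread-and-`length` form in the commit.

```typescript
// Hypothetical result shape; only `questionId` appears in the diff.
interface EvalResult {
  questionId: string
  model: string
  isCorrect: boolean
}

// Count distinct questions. The same question can appear in several results
// (for example, once per model), so `results.length` would overcount.
function countUniqueQuestions(results: readonly EvalResult[]): number {
  return new Set(results.map(r => r.questionId)).size
}

// Illustrative data: two questions, one of them answered by two models.
const sample: EvalResult[] = [
  { questionId: 'q1', model: 'model-a', isCorrect: true },
  { questionId: 'q1', model: 'model-b', isCorrect: false },
  { questionId: 'q2', model: 'model-a', isCorrect: true },
]

const totalQuestions = countUniqueQuestions(sample)
console.log(`${totalQuestions} questions`) // prints "2 questions"
```

Deriving the number from the results keeps the generated README text in step with the benchmark data, so the prose does not need manual updates when questions are added or removed.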