Mirror of https://github.com/voson-wang/toon.git (synced 2026-01-29 15:24:10 +08:00)
docs: clarify retrieval accuracy metrics
@@ -1,6 +1,6 @@
 ### Retrieval Accuracy
 
-Tested across **3 LLMs** with data retrieval tasks:
+Accuracy across **3 LLMs** on **159 data retrieval questions**:
 
 ```
 gpt-5-nano
@@ -124,7 +124,7 @@ Four datasets designed to test different structural patterns:
 
 #### Question Types
 
-~160 questions are generated dynamically across three categories:
+159 questions are generated dynamically across three categories:
 
 - **Field retrieval (50%)**: Direct value lookups
 - Example: "What is Alice's salary?" → `75000`
@@ -87,5 +87,5 @@
 "yaml-analytics": 2938,
 "yaml-github": 13129
 },
-"timestamp": "2025-10-28T06:43:10.560Z"
+"timestamp": "2025-10-28T07:39:09.360Z"
 }
@@ -177,10 +177,13 @@ ${tableRows}
 `.trimStart()
 }).join('\n')
 
+// Calculate total unique questions
+const totalQuestions = [...new Set(results.map(r => r.questionId))].length
+
 return `
 ### Retrieval Accuracy
 
-Tested across **${modelCount} ${modelCount === 1 ? 'LLM' : 'LLMs'}** with data retrieval tasks:
+Accuracy across **${modelCount} ${modelCount === 1 ? 'LLM' : 'LLMs'}** on **${totalQuestions} data retrieval questions**:
 
 \`\`\`
 ${modelBreakdown}
@@ -217,7 +220,7 @@ Four datasets designed to test different structural patterns:
 
 #### Question Types
 
-~160 questions are generated dynamically across three categories:
+${totalQuestions} questions are generated dynamically across three categories:
 
 - **Field retrieval (50%)**: Direct value lookups
 - Example: "What is Alice's salary?" → \`75000\`
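
The `totalQuestions` change counts distinct question IDs rather than result rows. The following is a minimal sketch of why that distinction matters, assuming each question yields one result row per model and format. Only `questionId` appears in the diff; the `EvalResult` shape, the `model` and `format` fields, the `countUniqueQuestions` helper, and the sample values are hypothetical and exist only for illustration.

```typescript
// Hypothetical result shape: only `questionId` is taken from the diff;
// `model` and `format` are assumed fields for this sketch.
interface EvalResult {
  questionId: string
  model: string
  format: string
}

// Count distinct questions. Equivalent in result to
// `[...new Set(ids)].length`, but reads the Set's size directly.
function countUniqueQuestions(results: EvalResult[]): number {
  return new Set(results.map(r => r.questionId)).size
}

// Made-up sample: question q1 is evaluated three times, q2 once.
const sample: EvalResult[] = [
  { questionId: 'q1', model: 'model-a', format: 'toon' },
  { questionId: 'q1', model: 'model-a', format: 'json' },
  { questionId: 'q1', model: 'model-b', format: 'toon' },
  { questionId: 'q2', model: 'model-a', format: 'toon' },
]

const totalQuestions = countUniqueQuestions(sample)
const totalRows = sample.length
```

With this sample, `totalRows` counts every evaluation while `totalQuestions` counts each question once, which is the figure the generated README text reports.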