mirror of
https://github.com/voson-wang/toon.git
synced 2026-01-29 15:24:10 +08:00
chore: more work on benchmarks
@@ -34,7 +34,7 @@ Results are saved to `results/token-efficiency.md`.
Tests how well LLMs can answer questions about data in different formats (TOON, JSON, JSON compact, XML, YAML, CSV):
-1. Generate ~150-160 questions across 6 datasets (CSV only included for datasets with flat/tabular structure)
+1. Generate ~200 questions across 6 datasets (CSV only included for datasets with flat/tabular structure)
2. Convert each dataset to all supported formats
3. Query each LLM with formatted data + question
4. Validate answers using `gpt-5-nano` as judge
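
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's benchmark code: `is_flat`, `to_csv`, `render_formats`, and `run_benchmark` are hypothetical names, only JSON, compact JSON, and CSV are rendered (TOON, XML, and YAML are left out), and the model and judge are passed in as plain callables rather than real API clients.

```python
import csv
import io
import json
from typing import Callable

Row = dict[str, object]


def is_flat(rows: list[Row]) -> bool:
    """True when every value is a scalar, so the data fits a CSV table."""
    return all(
        not isinstance(value, (dict, list))
        for row in rows
        for value in row.values()
    )


def to_csv(rows: list[Row]) -> str:
    """Render flat rows as CSV, using the first row's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def render_formats(rows: list[Row]) -> dict[str, str]:
    """Step 2: convert one dataset to each supported format (subset shown)."""
    formats = {
        "json": json.dumps(rows, indent=2),
        "json-compact": json.dumps(rows, separators=(",", ":")),
    }
    # CSV is only included for datasets with a flat/tabular structure.
    if rows and is_flat(rows):
        formats["csv"] = to_csv(rows)
    return formats


def run_benchmark(
    rows: list[Row],
    questions: list[tuple[str, str]],
    ask: Callable[[str], str],
    judge: Callable[[str, str, str], bool],
) -> dict[str, float]:
    """Steps 3 and 4: query the model per format, then let the judge grade.

    `questions` holds (question, expected_answer) pairs from step 1.
    `ask` stands in for the LLM under test, `judge` for the judge model.
    Returns the fraction of correct answers per format.
    """
    accuracy: dict[str, float] = {}
    for name, text in render_formats(rows).items():
        correct = 0
        for question, expected in questions:
            answer = ask(f"{text}\n\nQuestion: {question}")
            if judge(question, expected, answer):
                correct += 1
        accuracy[name] = correct / len(questions) if questions else 0.0
    return accuracy
```

Passing the model and judge as callables keeps the sketch runnable offline; a real run would presumably wrap API calls to the models under test and to `gpt-5-nano` behind those two parameters.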