docs: add benchmarks for gemini-2.5-flash

2026-01-29 23:34:10 +08:00 · 2025-10-27 16:02:51 +01:00
parent 77696ce932
commit 7b76acde31
10 changed files with 15837 additions and 7011 deletions
--- a/benchmarks/src/report.ts
+++ b/benchmarks/src/report.ts
@@ -201,8 +201,8 @@ ${modelPerformance}

 - **Semantic validation**: LLM-as-judge validates responses semantically (not exact string matching).
 - **Token counting**: Using \`gpt-tokenizer\` with \`o200k_base\` encoding.
-- **Question types**: Field retrieval, aggregation, and filtering tasks.
-- **Real data**: Faker.js-generated datasets + GitHub repositories.
+- **Question types**: ~160 questions across field retrieval, aggregation, and filtering tasks.
+- **Datasets**: Faker.js-generated datasets (seeded) + GitHub repositories.

 </details>
 `.trimStart()