docs: add benchmarks for gemini-2.5-flash

Johann Schopplich
2025-10-27 16:02:51 +01:00
parent 77696ce932
commit 7b76acde31
10 changed files with 15837 additions and 7011 deletions

@@ -201,8 +201,8 @@ ${modelPerformance}
- **Semantic validation**: LLM-as-judge validates responses semantically (not exact string matching).
- **Token counting**: Using \`gpt-tokenizer\` with \`o200k_base\` encoding.
-- **Question types**: Field retrieval, aggregation, and filtering tasks.
-- **Real data**: Faker.js-generated datasets + GitHub repositories.
+- **Question types**: ~160 questions across field retrieval, aggregation, and filtering tasks.
+- **Datasets**: Faker.js-generated datasets (seeded) + GitHub repositories.
</details>
`.trimStart()
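
Note on "(seeded)": fixing the generator seed is what makes benchmark runs comparable, because every run sees the same records. The sketch below illustrates the idea only. It does not use Faker.js; `mulberry32`, `Employee`, and `generateEmployees` are hypothetical names chosen for this example and are not taken from the repository.

```typescript
// Hypothetical record shape for illustration; not the benchmark's real schema.
interface Employee {
  id: number
  department: string
  salary: number
}

// Small deterministic PRNG (mulberry32 style): the same seed always
// produces the same sequence of numbers in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Generate `count` records from a seed. Equal seeds give equal datasets,
// so token counts and answers stay stable across benchmark runs.
function generateEmployees(seed: number, count: number): Employee[] {
  const random = mulberry32(seed)
  const departments: string[] = ['Engineering', 'Sales', 'Support']
  const employees: Employee[] = []
  for (let i = 0; i < count; i++) {
    const department = departments[Math.floor(random() * departments.length)] as string
    const salary = 40000 + Math.floor(random() * 60000)
    employees.push({ id: i + 1, department, salary })
  }
  return employees
}

const firstRun = generateEmployees(42, 5)
const secondRun = generateEmployees(42, 5)
// Both runs serialize identically because they share a seed.
console.log(JSON.stringify(firstRun) === JSON.stringify(secondRun))
```

With an unseeded generator, each run would produce a different dataset, and accuracy or token totals could shift for reasons unrelated to the format or model under test.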