9 Commits

Author  SHA1  Message  Date
Johann Schopplich  acca69c64a  chore(benchmarks): replace LLM-as-judge, new structural validation  2025-11-07 21:28:21 +01:00
Johann Schopplich  89df613059  chore(benchmarks): add structure-awareness questions  2025-11-07 09:03:51 +01:00
Johann Schopplich  a9d52fc69b  chore: more work on benchmarks  2025-11-06 15:51:31 +01:00
Johann Schopplich  bc711ccecf  test(benchmark): overhaul generation  2025-11-06 14:45:44 +01:00
Johann Schopplich  af17efe128  docs: add accuracy per 1k tokens report (closes #72)  2025-11-05 08:21:57 +01:00
Johann Schopplich  7b76acde31  docs: add benchmarks for gemini-2.5-flash  2025-10-27 16:02:51 +01:00
Johann Schopplich  05b3d43023  test: refactor accuracy benchmark generation  2025-10-27 14:07:20 +01:00
Johann Schopplich  1a5e6199ac  test: update retrieval accuracy benchmarks  2025-10-27 13:45:48 +01:00
Johann Schopplich  3c840259fe  test: add LLM retrieval accuracy tests  2025-10-27 11:48:33 +01:00