Commit Graph

12 Commits

Author  SHA1  Message  Date
Johann Schopplich  acca69c64a  chore(benchmarks): replace LLM-as-judge, new structural validation  2025-11-07 21:28:21 +01:00
Johann Schopplich  c6ba6446f5  chore(benchmarks): finalize structure-awareness run  2025-11-07 10:33:46 +01:00
Johann Schopplich  89df613059  chore(benchmarks): add structure-awareness questions  2025-11-07 09:03:51 +01:00
Johann Schopplich  a9d52fc69b  chore: more work on benchmarks  2025-11-06 15:51:31 +01:00
Johann Schopplich  ecf578a7dc  text(accuracy): add Grok-4-fast, remove default temperature  2025-10-28 22:54:00 +01:00
Johann Schopplich  67c0df8cb0  docs: overhaul retrieval accuracy benchmark  2025-10-28 20:22:43 +01:00
Johann Schopplich  cdd4a20c67  refactor: benchmarks code style  2025-10-28 08:02:57 +01:00
Johann Schopplich  8b9924ff05  refactor: token efficiency benchmark code  2025-10-28 07:42:49 +01:00
Johann Schopplich  7b76acde31  docs: add benchmarks for gemini-2.5-flash  2025-10-27 16:02:51 +01:00
Johann Schopplich  05b3d43023  test: refactor accuracy benchmark generation  2025-10-27 14:07:20 +01:00
Johann Schopplich  1a5e6199ac  test: update retrieval accuracy benchmarks  2025-10-27 13:45:48 +01:00
Johann Schopplich  3c840259fe  test: add LLM retrieval accuracy tests  2025-10-27 11:48:33 +01:00