6 Commits

Author             SHA1        Message                                                             Date
Johann Schopplich  acca69c64a  chore(benchmarks): replace LLM-as-judge, new structural validation  2025-11-07 21:28:21 +01:00
Johann Schopplich  54433de930  chore: split token efficiency benchmark into mixed/flat tracks      2025-11-06 22:17:18 +01:00
Johann Schopplich  a9d52fc69b  chore: more work on benchmarks                                      2025-11-06 15:51:31 +01:00
Johann Schopplich  bc711ccecf  test(benchmark): overhaul generation                                2025-11-06 14:45:44 +01:00
Johann Schopplich  7317b869b1  docs: update benchmark README                                       2025-10-30 17:38:00 +01:00
Johann Schopplich  ecf578a7dc  text(accuracy): add Grok-4-fast, remove default temperature         2025-10-28 22:54:00 +01:00