Johann Schopplich | acca69c64a | chore(benchmarks): replace LLM-as-judge, new structural validation | 2025-11-07 21:28:21 +01:00
Johann Schopplich | c6ba6446f5 | chore(benchmarks): finalize structure-awareness run | 2025-11-07 10:33:46 +01:00
Johann Schopplich | 89df613059 | chore(benchmarks): add structure-awareness questions | 2025-11-07 09:03:51 +01:00
Johann Schopplich | a9d52fc69b | chore: more work on benchmarks | 2025-11-06 15:51:31 +01:00
Johann Schopplich | ecf578a7dc | text(accuracy): add Grok-4-fast, remove default temperature | 2025-10-28 22:54:00 +01:00
Johann Schopplich | 67c0df8cb0 | docs: overhaul retrieval accuracy benchmark | 2025-10-28 20:22:43 +01:00
Johann Schopplich | cdd4a20c67 | refactor: benchmarks code style | 2025-10-28 08:02:57 +01:00
Johann Schopplich | 8b9924ff05 | refactor: token efficiency benchmark code | 2025-10-28 07:42:49 +01:00
Johann Schopplich | 7b76acde31 | docs: add benchmarks for gemini-2.5-flash | 2025-10-27 16:02:51 +01:00
Johann Schopplich | 05b3d43023 | test: refactor accuracy benchmark generation | 2025-10-27 14:07:20 +01:00
Johann Schopplich | 1a5e6199ac | test: update retrieval accuracy benchmarks | 2025-10-27 13:45:48 +01:00
Johann Schopplich | 3c840259fe | test: add LLM retrieval accuracy tests | 2025-10-27 11:48:33 +01:00