Johann Schopplich | acca69c64a | chore(benchmarks): replace LLM-as-judge, new structural validation | 2025-11-07 21:28:21 +01:00
Johann Schopplich | 89df613059 | chore(benchmarks): add structure-awareness questions | 2025-11-07 09:03:51 +01:00
Johann Schopplich | 54433de930 | chore: split token efficiency benchmark into mixed/flat tracks | 2025-11-06 22:17:18 +01:00
Johann Schopplich | e22884308b | chore(benchmarks): fix undefined in GitHub question generation | 2025-11-06 16:06:31 +01:00
Johann Schopplich | a9d52fc69b | chore: more work on benchmarks | 2025-11-06 15:51:31 +01:00
Johann Schopplich | bc711ccecf | test(benchmark): overhaul generation | 2025-11-06 14:45:44 +01:00
Johann Schopplich | 67c0df8cb0 | docs: overhaul retrieval accuracy benchmark | 2025-10-28 20:22:43 +01:00
Johann Schopplich | 4ec7e84f5f | refactor: shared utils for benchmark scripts | 2025-10-27 17:37:27 +01:00
Johann Schopplich | 77696ce932 | docs: benchmarks for XML format | 2025-10-27 14:50:26 +01:00
Johann Schopplich | 05b3d43023 | test: refactor accuracy benchmark generation | 2025-10-27 14:07:20 +01:00
Johann Schopplich | 3c840259fe | test: add LLM retrieval accuracy tests | 2025-10-27 11:48:33 +01:00