Johann Schopplich | acca69c64a | chore(benchmarks): replace LLM-as-judge, new structural validation | 2025-11-07 21:28:21 +01:00
Johann Schopplich | c6ba6446f5 | chore(benchmarks): finalize structure-awareness run | 2025-11-07 10:33:46 +01:00
Johann Schopplich | 89df613059 | chore(benchmarks): add structure-awareness questions | 2025-11-07 09:03:51 +01:00
Johann Schopplich | 54433de930 | chore: split token efficiency benchmark into mixed/flat tracks | 2025-11-06 22:17:18 +01:00
Johann Schopplich | 2c4f3c4362 | test: add benchmarks for compact vs. pretty JSON | 2025-10-30 15:02:51 +01:00
Johann Schopplich | ecf578a7dc | text(accuracy): add Grok-4-fast, remove default temperature | 2025-10-28 22:54:00 +01:00
Johann Schopplich | 67c0df8cb0 | docs: overhaul retrieval accuracy benchmark | 2025-10-28 20:22:43 +01:00