Commit Graph

35 Commits

Author | SHA1 | Message | Date
Johann Schopplich | c6ba6446f5 | chore(benchmarks): finalize structure-awareness run | 2025-11-07 10:33:46 +01:00
Johann Schopplich | 89df613059 | chore(benchmarks): add structure-awareness questions | 2025-11-07 09:03:51 +01:00
Johann Schopplich | 54433de930 | chore: split token efficiency benchmark into mixed/flat tracks | 2025-11-06 22:17:18 +01:00
Johann Schopplich | e22884308b | chore(benchmarks): fix undefined in GitHub question generation | 2025-11-06 16:06:31 +01:00
Johann Schopplich | a9d52fc69b | chore: more work on benchmarks | 2025-11-06 15:51:31 +01:00
Johann Schopplich | bc711ccecf | test(benchmark): overhaul generation | 2025-11-06 14:45:44 +01:00
Johann Schopplich | af17efe128 | docs: add accuracy per 1k tokens report (closes #72) | 2025-11-05 08:21:57 +01:00
Johann Schopplich | 3472081b40 | docs: clarify CSV vs TOON use cases | 2025-11-04 18:12:19 +01:00
Johann Schopplich | c1527dcf80 | chore: fix type issue | 2025-11-02 18:34:00 +01:00
Johann Schopplich | 8977c8c7d6 | feat: use language-agnostic test suite | 2025-11-02 18:31:06 +01:00
Johann Schopplich | 5f09a14c61 | chore: fix type issues | 2025-11-01 17:15:37 +01:00
Johann Schopplich | 753ee2cefd | docs: add table of contents | 2025-10-31 08:56:42 +01:00
Johann Schopplich | 7317b869b1 | docs: update benchmark README | 2025-10-30 17:38:00 +01:00
Johann Schopplich | 983728e913 | refactor: progress bar configuration | 2025-10-30 15:24:22 +01:00
Johann Schopplich | fb43bdf527 | docs: adjust padding for benchmark comparison | 2025-10-30 15:19:16 +01:00
Johann Schopplich | 2c4f3c4362 | test: add benchmarks for compact vs. pretty JSON | 2025-10-30 15:02:51 +01:00
Johann Schopplich | 38ea864763 | docs: clarify TOON's advantages and optimal data structure | 2025-10-29 19:04:04 +01:00
Johann Schopplich | 45604b06e8 | feat: decode method (#10) | 2025-10-29 07:42:15 +01:00
Johann Schopplich | 7db91398fe | docs(benchmark): add YAML format support | 2025-10-29 06:42:40 +01:00
Johann Schopplich | e757746351 | docs(accuracy): highlight toon in perf table | 2025-10-28 23:08:47 +01:00
Johann Schopplich | ecf578a7dc | text(accuracy): add Grok-4-fast, remove default temperature | 2025-10-28 22:54:00 +01:00
Johann Schopplich | 67c0df8cb0 | docs: overhaul retrieval accuracy benchmark | 2025-10-28 20:22:43 +01:00
Johann Schopplich | 52dc9c4b3f | docs: clarify retrieval accuracy metrics | 2025-10-28 08:39:43 +01:00
Johann Schopplich | cdd4a20c67 | refactor: benchmarks code style | 2025-10-28 08:02:57 +01:00
Johann Schopplich | 352e936370 | docs: update notes & limitations guide | 2025-10-28 07:44:35 +01:00
Johann Schopplich | 8b9924ff05 | refactor: token efficiency benchmark code | 2025-10-28 07:42:49 +01:00
Johann Schopplich | b839d35ad0 | docs: how the benchmarks work section | 2025-10-27 20:35:43 +01:00
Johann Schopplich | 4ec7e84f5f | refactor: shared utils for benchmark scripts | 2025-10-27 17:37:27 +01:00
Johann Schopplich | 7b76acde31 | docs: add benchmarks for gemini-2.5-flash | 2025-10-27 16:02:51 +01:00
Johann Schopplich | 77696ce932 | docs: benchmarks for XML format | 2025-10-27 14:50:26 +01:00
Johann Schopplich | b9f54ba585 | docs: update benchmark reports' readability | 2025-10-27 14:18:37 +01:00
Johann Schopplich | 05b3d43023 | test: refactor accuracy benchmark generation | 2025-10-27 14:07:20 +01:00
Johann Schopplich | 1a5e6199ac | test: update retrieval accuracy benchmarks | 2025-10-27 13:45:48 +01:00
Johann Schopplich | b2c58d2b97 | chore: fix linting issues | 2025-10-27 11:49:40 +01:00
Johann Schopplich | 3c840259fe | test: add LLM retrieval accuracy tests | 2025-10-27 11:48:33 +01:00