docs: add dedicated docs website

2026-01-29 15:24:10 +08:00 · 2025-11-18 07:23:10 +01:00
parent 3e08f3b72b
commit 4b4f7c05f9
38 changed files with 4399 additions and 541 deletions
--- a/benchmarks/results/retrieval-accuracy.md
+++ b/benchmarks/results/retrieval-accuracy.md
@@ -138,11 +138,11 @@ grok-4-fast-non-reasoning

 | Format | Accuracy | Tokens | Correct/Total |
 | ------ | -------- | ------ | ------------- |
-| `toon` | 62.9% | 8,780 | 83/132 |
-| `csv` | 61.4% | 8,528 | 81/132 |
-| `yaml` | 59.8% | 13,142 | 79/132 |
-| `json-compact` | 55.3% | 11,465 | 73/132 |
-| `json-pretty` | 56.1% | 15,158 | 74/132 |
+| `toon` | 62.9% | 8,779 | 83/132 |
+| `csv` | 61.4% | 8,527 | 81/132 |
+| `yaml` | 59.8% | 13,141 | 79/132 |
+| `json-compact` | 55.3% | 11,464 | 73/132 |
+| `json-pretty` | 56.1% | 15,157 | 74/132 |
 | `xml` | 48.5% | 17,105 | 64/132 |

 ##### Semi-uniform event logs
@@ -273,7 +273,7 @@ grok-4-fast-non-reasoning

 #### What's Being Measured

-This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it (this does **not** test model's ability to generate TOON output).
+This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it. This does **not** test the model's ability to generate TOON output – only to read and understand it.

 #### Datasets Tested

--- a/benchmarks/src/report.ts
+++ b/benchmarks/src/report.ts
@@ -280,7 +280,7 @@ ${modelPerformance}

 #### What's Being Measured

-This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it (this does **not** test model's ability to generate TOON output).
+This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it. This does **not** test the model's ability to generate TOON output – only to read and understand it.

 #### Datasets Tested