docs: add dedicated docs website

Johann Schopplich
2025-11-18 07:23:10 +01:00
parent 3e08f3b72b
commit 4b4f7c05f9
38 changed files with 4399 additions and 541 deletions

@@ -138,11 +138,11 @@ grok-4-fast-non-reasoning
| Format | Accuracy | Tokens | Correct/Total |
| ------ | -------- | ------ | ------------- |
-| `toon` | 62.9% | 8,780 | 83/132 |
-| `csv` | 61.4% | 8,528 | 81/132 |
-| `yaml` | 59.8% | 13,142 | 79/132 |
-| `json-compact` | 55.3% | 11,465 | 73/132 |
-| `json-pretty` | 56.1% | 15,158 | 74/132 |
+| `toon` | 62.9% | 8,779 | 83/132 |
+| `csv` | 61.4% | 8,527 | 81/132 |
+| `yaml` | 59.8% | 13,141 | 79/132 |
+| `json-compact` | 55.3% | 11,464 | 73/132 |
+| `json-pretty` | 56.1% | 15,157 | 74/132 |
| `xml` | 48.5% | 17,105 | 64/132 |
##### Semi-uniform event logs
@@ -273,7 +273,7 @@ grok-4-fast-non-reasoning
#### What's Being Measured
-This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it (this does **not** test model's ability to generate TOON output).
+This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it. This does **not** test the model's ability to generate TOON output, only to read and understand it.
#### Datasets Tested

@@ -280,7 +280,7 @@ ${modelPerformance}
#### What's Being Measured
-This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it (this does **not** test model's ability to generate TOON output).
+This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it. This does **not** test the model's ability to generate TOON output, only to read and understand it.
#### Datasets Tested