docs: add dedicated docs website

2026-01-29 15:24:10 +08:00 · 2025-11-18 07:23:10 +01:00
parent 3e08f3b72b
commit 4b4f7c05f9
38 changed files with 4399 additions and 541 deletions
--- a/docs/guide/getting-started.md
+++ b/docs/guide/getting-started.md
@@ -0,0 +1,239 @@
+# Getting Started
+
+## What is TOON?
+
+**Token-Oriented Object Notation** is a compact, human-readable encoding of the JSON data model for LLM prompts. It provides a lossless serialization of the same objects, arrays, and primitives as JSON, but in a syntax that minimizes tokens and makes structure easy for models to follow.
+
+TOON combines YAML's indentation-based structure for nested objects with a CSV-style tabular layout for uniform arrays. TOON's sweet spot is uniform arrays of objects (multiple fields per row, same structure across items), achieving CSV-like compactness while adding explicit structure that helps LLMs parse and validate data reliably.
+
+Think of it as a translation layer: use JSON programmatically, and encode it as TOON for LLM input.
+
+### Why TOON?
+
+Standard JSON is verbose and token-expensive. For uniform arrays of objects, JSON repeats every field name for every record:
+
+```json
+{
+  "users": [
+    { "id": 1, "name": "Alice", "role": "admin" },
+    { "id": 2, "name": "Bob", "role": "user" }
+  ]
+}
+```
+
+YAML already reduces some redundancy with indentation instead of braces:
+
+```yaml
+users:
+  - id: 1
+    name: Alice
+    role: admin
+  - id: 2
+    name: Bob
+    role: user
+```
+
+TOON goes further by declaring fields once and streaming data as rows:
+
+```yaml
+users[2]{id,name,role}:
+  1,Alice,admin
+  2,Bob,user
+```
+
+The `[2]` declares the array length, enabling LLMs to answer dataset size questions and detect truncation. The `{id,name,role}` declares the field names. Each row is then a compact, comma-separated list of values. This is the core pattern: declare structure once, stream data compactly. The format approaches CSV's efficiency while adding explicit structure.
+
+For a more realistic example, here's how TOON handles a dataset with both nested objects and tabular arrays:
+
+::: code-group
+
+```json [JSON (235 tokens)]
+{
+  "context": {
+    "task": "Our favorite hikes together",
+    "location": "Boulder",
+    "season": "spring_2025"
+  },
+  "friends": ["ana", "luis", "sam"],
+  "hikes": [
+    {
+      "id": 1,
+      "name": "Blue Lake Trail",
+      "distanceKm": 7.5,
+      "elevationGain": 320,
+      "companion": "ana",
+      "wasSunny": true
+    },
+    {
+      "id": 2,
+      "name": "Ridge Overlook",
+      "distanceKm": 9.2,
+      "elevationGain": 540,
+      "companion": "luis",
+      "wasSunny": false
+    },
+    {
+      "id": 3,
+      "name": "Wildflower Loop",
+      "distanceKm": 5.1,
+      "elevationGain": 180,
+      "companion": "sam",
+      "wasSunny": true
+    }
+  ]
+}
+```
+
+```yaml [TOON (106 tokens)]
+context:
+  task: Our favorite hikes together
+  location: Boulder
+  season: spring_2025
+friends[3]: ana,luis,sam
+hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
+  1,Blue Lake Trail,7.5,320,ana,true
+  2,Ridge Overlook,9.2,540,luis,false
+  3,Wildflower Loop,5.1,180,sam,true
+```
+
+:::
+
+Notice how TOON combines YAML-style indentation for the `context` object with an inline format for the primitive `friends` array and a tabular format for the structured `hikes` array. The encoder chooses each format automatically based on the shape of the data.
+
+### Design Goals
+
+TOON is optimized for specific use cases. It aims to:
+
+- Make uniform arrays of objects as compact as possible by declaring structure once and streaming data.
+- Stay fully lossless and deterministic – round-trips preserve all data and structure.
+- Keep parsing simple and robust for both LLMs and humans through explicit structure markers.
+- Provide validation guardrails (array lengths, field counts) that help detect truncation and malformed output.
+
+## When to Use TOON
+
+TOON excels with uniform arrays of objects – data with the same structure across items. For LLM prompts, the format produces deterministic, minimally quoted text with built-in validation. Explicit array lengths (`[N]`) and field headers (`{fields}`) help detect truncation and malformed data, while the tabular structure declares fields once rather than repeating them in every row.
+
+::: tip Production Ready
+TOON is production-ready and actively maintained, with implementations in TypeScript, Python, Go, Rust, .NET, and more. The format is stable but still an idea in progress, so help shape where it goes by contributing to the [specification](https://github.com/toon-format/spec) or sharing feedback.
+:::
+
+## When Not to Use TOON
+
+TOON is not always the best choice. Consider alternatives when:
+
+- **Deeply nested or non-uniform structures** (tabular eligibility ≈ 0%): JSON-compact often uses fewer tokens. Example: complex configuration objects with many nested levels.
+- **Semi-uniform arrays** (~40–60% tabular eligibility): Token savings diminish. Prefer JSON if your pipelines already rely on it.
+- **Pure tabular data**: CSV is smaller than TOON for flat tables. TOON adds minimal overhead (~5–10%) to provide structure (array length declarations, field headers, delimiter scoping) that improves LLM reliability.
+- **Latency-critical applications**: Benchmark on your exact setup. Some deployments (especially local/quantized models) may process compact JSON faster despite TOON's lower token count.
+
+> [!NOTE]
+> For data-driven comparisons across different structures, see [benchmarks](/guide/benchmarks). When optimizing for latency, measure time to first token (TTFT), tokens per second, and total time for both TOON and JSON-compact, then use whichever performs better in your specific environment.
+
+## Installation
+
+### TypeScript Library
+
+Install the library via your preferred package manager:
+
+::: code-group
+
+```bash [npm]
+npm install @toon-format/toon
+```
+
+```bash [pnpm]
+pnpm add @toon-format/toon
+```
+
+```bash [yarn]
+yarn add @toon-format/toon
+```
+
+:::
+
+### CLI
+
+The CLI can be used without installation via `npx`, or installed globally:
+
+::: code-group
+
+```bash [npx (no install)]
+npx @toon-format/cli input.json -o output.toon
+```
+
+```bash [npm]
+npm install -g @toon-format/cli
+```
+
+```bash [pnpm]
+pnpm add -g @toon-format/cli
+```
+
+```bash [yarn]
+yarn global add @toon-format/cli
+```
+
+:::
+
+For full CLI documentation, see the [CLI reference](/cli/).
+
+## Your First Example
+
+The examples below use the TypeScript library for demonstration, but the same operations work in any language with a TOON implementation.
+
+Start by encoding a simple dataset:
+
+```ts
+import { encode } from '@toon-format/toon'
+
+const data = {
+  users: [
+    { id: 1, name: 'Alice', role: 'admin' },
+    { id: 2, name: 'Bob', role: 'user' }
+  ]
+}
+
+console.log(encode(data))
+```
+
+**Output:**
+
+```yaml
+users[2]{id,name,role}:
+  1,Alice,admin
+  2,Bob,user
+```
+
+### Decoding Back to JSON
+
+Decoding is just as simple:
+
+```ts
+import { decode } from '@toon-format/toon'
+
+const toon = `
+users[2]{id,name,role}:
+  1,Alice,admin
+  2,Bob,user
+`
+
+const data = decode(toon)
+console.log(JSON.stringify(data, null, 2))
+```
+
+**Output** (inner objects condensed onto one line for readability):
+
+```json
+{
+  "users": [
+    { "id": 1, "name": "Alice", "role": "admin" },
+    { "id": 2, "name": "Bob", "role": "user" }
+  ]
+}
+```
+
+Round-tripping is lossless: `decode(encode(x))` is always deeply equal to `x` (after normalization of non-JSON values such as `Date` and `NaN`).
+
+## Where to Go Next
+
+Now that you've seen your first TOON document, read the [Format Overview](/guide/format-overview) for complete syntax details (objects, arrays, quoting rules, key folding), then explore [Using TOON with LLMs](/guide/llm-prompts) to see how to use it effectively in prompts. For implementation details, check the [API reference](/reference/api) (TypeScript) or the [specification](/reference/spec) (language-agnostic normative rules).
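
Note on the guide above: its core pattern (declare the fields once, then stream rows) and its validation guardrails (declared array length, per-row field count) can be illustrated with a small TypeScript sketch. This is not the `@toon-format/toon` implementation. `encodeTable` and `checkTable` are hypothetical helpers, and the sketch assumes flat rows whose values contain no commas, quotes, or newlines, so it skips the real format's quoting and delimiter rules.

```typescript
// Illustrative sketch only: NOT the @toon-format/toon implementation.
// Assumes flat rows whose values contain no commas, quotes, or newlines.
type Primitive = string | number | boolean | null;
type Row = Record<string, Primitive>;

// Hypothetical helper: render a uniform array as `key[N]{f1,f2}:` plus one line per row.
function encodeTable(key: string, rows: Row[]): string {
  if (rows.length === 0) throw new Error("sketch requires at least one row");
  const fields = Object.keys(rows[0] ?? {});
  for (const row of rows) {
    const sameShape =
      Object.keys(row).length === fields.length && fields.every((f) => f in row);
    if (!sameShape) throw new Error("rows are not uniform");
  }
  const header = `${key}[${rows.length}]{${fields.join(",")}}:`;
  const lines = rows.map(
    (row) => "  " + fields.map((f) => String(row[f])).join(",")
  );
  return [header, ...lines].join("\n");
}

// Hypothetical helper: check the guardrails (declared length, values per row).
function checkTable(text: string): string[] {
  const lines = text.trim().split("\n");
  const header = lines[0] ?? "";
  const body = lines.slice(1);
  const match = /^[^\[\]{}]+\[(\d+)\]\{([^}]*)\}:$/.exec(header);
  if (!match) return ["unrecognized header"];
  const declared = Number(match[1]);
  const fieldCount = (match[2] ?? "").split(",").length;
  const problems: string[] = [];
  if (body.length !== declared) {
    problems.push(`declared ${declared} rows, found ${body.length}`);
  }
  body.forEach((line, i) => {
    const n = line.trim().split(",").length;
    if (n !== fieldCount) {
      problems.push(`row ${i + 1}: expected ${fieldCount} values, found ${n}`);
    }
  });
  return problems;
}

const table = encodeTable("users", [
  { id: 1, name: "Alice", role: "admin" },
  { id: 2, name: "Bob", role: "user" },
]);
console.log(table);

// A complete table reports no problems.
console.log(checkTable(table));

// Dropping the last row simulates truncated output; the length check flags it.
const truncated = table.split("\n").slice(0, 2).join("\n");
console.log(checkTable(truncated));
```

The encoded table matches the `users[2]{id,name,role}:` example in the guide. The second check shows why the declared length is useful: a copy cut off after the first row no longer agrees with `[2]`, so the mismatch is detectable without knowing the original data.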