docs: add dedicated docs website
docs/guide/getting-started.md
# Getting Started

## What is TOON?

**Token-Oriented Object Notation** is a compact, human-readable encoding of the JSON data model for LLM prompts. It provides a lossless serialization of the same objects, arrays, and primitives as JSON, but in a syntax that minimizes tokens and makes structure easy for models to follow.

TOON combines YAML's indentation-based structure for nested objects with a CSV-style tabular layout for uniform arrays. TOON's sweet spot is uniform arrays of objects (multiple fields per row, same structure across items), achieving CSV-like compactness while adding explicit structure that helps LLMs parse and validate data reliably.

Think of it as a translation layer: use JSON programmatically, and encode it as TOON for LLM input.
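
As a rough illustration of the translation-layer idea, the sketch below keeps data as a plain object and converts it to text only when the prompt is assembled. `buildPrompt` and the `Encoder` type are hypothetical names, and the prompt wording is an assumption; the encoder is passed in so the snippet runs on its own, and in real code you would pass `encode` from `@toon-format/toon` instead of the JSON stand-in.

```ts
// Hypothetical helper, not part of the TOON library.
// The encoder is injected so this sketch stays self-contained.
type Encoder = (value: unknown) => string

function buildPrompt(question: string, data: unknown, encodeFn: Encoder): string {
  // Data stays a JSON-compatible object until this point;
  // it becomes text only at the prompt boundary.
  return `Data:\n${encodeFn(data)}\n\nQuestion: ${question}`
}

// Stand-in encoder for the demo; swap in the library's `encode` to get TOON text.
const standInEncoder: Encoder = value => JSON.stringify(value)

const prompt = buildPrompt(
  'How many users are admins?',
  { users: [{ id: 1, role: 'admin' }] },
  standInEncoder,
)
console.log(prompt)
```

The rest of the application never sees the encoded text, so switching between JSON and TOON for a given model is a one-argument change.
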

### Why TOON?

Standard JSON is verbose and token-expensive. For uniform arrays of objects, JSON repeats every field name for every record:

```json
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
```

YAML already reduces some redundancy with indentation instead of braces:

```yaml
users:
  - id: 1
    name: Alice
    role: admin
  - id: 2
    name: Bob
    role: user
```

TOON goes further by declaring fields once and streaming data as rows:

```yaml
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
```

The `[2]` declares the array length, enabling LLMs to answer dataset size questions and detect truncation. The `{id,name,role}` declares the field names. Each row is then a compact, comma-separated list of values. This is the core pattern: declare structure once, stream data compactly. The format approaches CSV's efficiency while adding explicit structure.
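
To make the truncation check concrete, here is a minimal sketch that compares a tabular block against its own header. `checkTable` is a hypothetical helper, not a library function, and it assumes simple unquoted values with a comma delimiter; a real decoder handles quoting and other delimiters.

```ts
// Hypothetical helper, not part of any TOON library.
// Assumes unquoted values and a comma delimiter.
interface TableCheck {
  declaredRows: number
  actualRows: number
  fields: string[]
  ok: boolean
}

function checkTable(block: string): TableCheck {
  const lines = block.split('\n').filter(line => line.trim() !== '')
  const [header, ...rows] = lines
  // Matches headers of the form `key[N]{a,b,c}:`
  const match = /^\s*[^\[\]]+\[(\d+)\]\{([^}]*)\}:\s*$/.exec(header ?? '')
  if (!match) throw new Error('not a tabular header')
  const [, count = '0', fieldList = ''] = match
  const declaredRows = Number(count)
  const fields = fieldList.split(',')
  // Every row must carry exactly one value per declared field.
  const widthsOk = rows.every(row => row.trim().split(',').length === fields.length)
  return {
    declaredRows,
    actualRows: rows.length,
    fields,
    ok: widthsOk && rows.length === declaredRows,
  }
}

const complete = 'users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user'
const truncated = 'users[2]{id,name,role}:\n  1,Alice,admin'

console.log(checkTable(complete).ok) // true
console.log(checkTable(truncated).ok) // false
```

The second call fails because the header promises two rows and only one arrived, which is the signal a consumer can use to reject cut-off model output.
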

For a more realistic example, here's how TOON handles a dataset with both nested objects and tabular arrays:

::: code-group

```json [JSON (235 tokens)]
{
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    {
      "id": 1,
      "name": "Blue Lake Trail",
      "distanceKm": 7.5,
      "elevationGain": 320,
      "companion": "ana",
      "wasSunny": true
    },
    {
      "id": 2,
      "name": "Ridge Overlook",
      "distanceKm": 9.2,
      "elevationGain": 540,
      "companion": "luis",
      "wasSunny": false
    },
    {
      "id": 3,
      "name": "Wildflower Loop",
      "distanceKm": 5.1,
      "elevationGain": 180,
      "companion": "sam",
      "wasSunny": true
    }
  ]
}
```

```yaml [TOON (106 tokens)]
context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true
```

:::

Notice how TOON combines YAML's indentation for the `context` object with inline format for the primitive `friends` array and tabular format for the structured `hikes` array. Each format is chosen automatically based on the data structure.
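
The selection rule can be sketched as a small classifier. This is an illustration only: `chooseArrayLayout` and the `Layout` names are hypothetical, and the exact eligibility rules are defined by the TOON specification rather than by this code. The assumption here is that an array of primitives goes inline, an array of objects with identical keys and primitive values goes tabular, and everything else falls back to a list.

```ts
// Hypothetical classifier; the real rules live in the TOON specification.
type Layout = 'inline' | 'tabular' | 'list'

const isPrimitive = (v: unknown): boolean =>
  v === null || typeof v === 'string' || typeof v === 'number' || typeof v === 'boolean'

const isPlainObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === 'object' && v !== null && !Array.isArray(v)

function chooseArrayLayout(items: unknown[]): Layout {
  // Arrays of primitives fit on one line, like `friends[3]: ana,luis,sam`.
  if (items.every(isPrimitive)) return 'inline'
  const objects = items.filter(isPlainObject)
  if (objects.length !== items.length) return 'list'
  // Tabular rows need the same key set and only primitive values.
  const keySignature = (o: Record<string, unknown>) => Object.keys(o).sort().join(',')
  const expected = keySignature(objects[0] ?? {})
  const uniform = objects.every(
    o => keySignature(o) === expected && Object.keys(o).every(k => isPrimitive(o[k])),
  )
  return uniform ? 'tabular' : 'list'
}

const friends = ['ana', 'luis', 'sam']
const hikes = [
  { id: 1, name: 'Blue Lake Trail', wasSunny: true },
  { id: 2, name: 'Ridge Overlook', wasSunny: false },
]
const mixed = [{ id: 1 }, { id: 2, tags: ['steep'] }]

console.log(chooseArrayLayout(friends)) // inline
console.log(chooseArrayLayout(hikes)) // tabular
console.log(chooseArrayLayout(mixed)) // list
```
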

### Design Goals

TOON is optimized for specific use cases. It aims to:

- Make uniform arrays of objects as compact as possible by declaring structure once and streaming data.
- Stay fully lossless and deterministic – round-trips preserve all data and structure.
- Keep parsing simple and robust for both LLMs and humans through explicit structure markers.
- Provide validation guardrails (array lengths, field counts) that help detect truncation and malformed output.

## When to Use TOON

TOON excels with uniform arrays of objects – data with the same structure across items. For LLM prompts, the format produces deterministic, minimally quoted text with built-in validation. Explicit array lengths (`[N]`) and field headers (`{fields}`) help detect truncation and malformed data, while the tabular structure declares fields once rather than repeating them in every row.

::: tip Production Ready
TOON is production-ready and actively maintained, with implementations in TypeScript, Python, Go, Rust, .NET, and more. The format is stable, but also an idea in progress. Nothing's set in stone – help shape where it goes by contributing to the [specification](https://github.com/toon-format/spec) or sharing feedback.
:::

## When Not to Use TOON

TOON is not always the best choice. Consider alternatives when:

- **Deeply nested or non-uniform structures** (tabular eligibility ≈ 0%): JSON-compact often uses fewer tokens. Example: complex configuration objects with many nested levels.
- **Semi-uniform arrays** (~40–60% tabular eligibility): Token savings diminish. Prefer JSON if your pipelines already rely on it.
- **Pure tabular data**: CSV is smaller than TOON for flat tables. TOON adds minimal overhead (~5–10%) to provide structure (array length declarations, field headers, delimiter scoping) that improves LLM reliability.
- **Latency-critical applications**: Benchmark on your exact setup. Some deployments (especially local/quantized models) may process compact JSON faster despite TOON's lower token count.
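
To see where the overhead on pure tabular data comes from, the sketch below renders the same flat table as CSV, as a TOON table, and as compact JSON, then compares character counts. `toCsv` and `toToonTable` are hypothetical minimal encoders that assume unquoted values; they are not the library's implementation. Character counts are only a rough proxy for tokens, and a two-row sample overstates TOON's relative overhead because the header cost is spread over so few rows.

```ts
// Hypothetical minimal encoders for a flat table of simple, unquoted values.
type Row = Record<string, string | number | boolean>

function toCsv(rows: Row[]): string {
  const fields = Object.keys(rows[0] ?? {})
  const body = rows.map(row => fields.map(f => String(row[f])).join(','))
  return [fields.join(','), ...body].join('\n')
}

function toToonTable(key: string, rows: Row[]): string {
  const fields = Object.keys(rows[0] ?? {})
  // Same rows as CSV, plus a key, a declared length, and indentation.
  const body = rows.map(row => '  ' + fields.map(f => String(row[f])).join(','))
  return [`${key}[${rows.length}]{${fields.join(',')}}:`, ...body].join('\n')
}

const rows: Row[] = [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' },
]

const csv = toCsv(rows)
const toon = toToonTable('users', rows)
const json = JSON.stringify({ users: rows })

console.log({ csv: csv.length, toon: toon.length, json: json.length })
```

The ordering (CSV smallest, compact JSON largest) is what matters here; for real decisions, count tokens with the tokenizer of the model you target.
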

> [!NOTE]
> For data-driven comparisons across different structures, see [benchmarks](/guide/benchmarks). When optimizing for latency, measure TTFT, tokens/sec, and total time for both TOON and JSON-compact and use whichever performs better in your specific environment.

## Installation

### TypeScript Library

Install the library via your preferred package manager:

::: code-group

```bash [npm]
npm install @toon-format/toon
```

```bash [pnpm]
pnpm add @toon-format/toon
```

```bash [yarn]
yarn add @toon-format/toon
```

:::

### CLI

The CLI can be used without installation via `npx`, or installed globally:

::: code-group

```bash [npx (no install)]
npx @toon-format/cli input.json -o output.toon
```

```bash [npm]
npm install -g @toon-format/cli
```

```bash [pnpm]
pnpm add -g @toon-format/cli
```

```bash [yarn]
yarn global add @toon-format/cli
```

:::

For full CLI documentation, see the [CLI reference](/cli/).

## Your First Example

The examples below use the TypeScript library for demonstration, but the same operations work in any language with a TOON implementation.

Let's encode a simple dataset:

```ts
import { encode } from '@toon-format/toon'

const data = {
  users: [
    { id: 1, name: 'Alice', role: 'admin' },
    { id: 2, name: 'Bob', role: 'user' }
  ]
}

console.log(encode(data))
```

**Output:**

```yaml
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
```

### Decoding Back to JSON

Decoding is just as simple:

```ts
import { decode } from '@toon-format/toon'

const toon = `
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
`

const data = decode(toon)
console.log(JSON.stringify(data, null, 2))
```

**Output:**

```json
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
```

Round-tripping is lossless: `decode(encode(x))` always equals `x` (after normalization of non-JSON types like `Date`, `NaN`, etc.).
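
A round-trip check along these lines can be sketched as follows. `roundTrips` and the `Codec` type are hypothetical; the encoder and decoder are injected so the snippet runs without the library, and the demo uses a JSON stand-in. To test TOON itself, pass `encode` and `decode` from `@toon-format/toon`. The sketch assumes that `JSON.stringify` is an acceptable model of the normalization step (it turns a `Date` into an ISO string and `NaN` into `null`) and that key order is preserved by the codec.

```ts
// Hypothetical helper: the codec is injected so the sketch is self-contained.
type Codec = {
  encodeFn: (value: unknown) => string
  decodeFn: (text: string) => unknown
}

function roundTrips(value: unknown, { encodeFn, decodeFn }: Codec): boolean {
  // JSON.stringify stands in for normalization of non-JSON types
  // (Date -> ISO string, NaN -> null); this is an assumption of the sketch.
  const expected = JSON.stringify(value)
  const actual = JSON.stringify(decodeFn(encodeFn(value)))
  return expected === actual
}

// JSON stand-in codec for the demo; replace with the library's encode/decode.
const jsonCodec: Codec = {
  encodeFn: value => JSON.stringify(value),
  decodeFn: text => JSON.parse(text),
}

console.log(roundTrips({ users: [{ id: 1, name: 'Alice' }] }, jsonCodec))
console.log(roundTrips({ when: new Date(0), ratio: NaN }, jsonCodec))
```

Comparing serialized strings keeps the helper dependency-free, at the cost of being sensitive to key order; a deep-equality helper from your test framework is a reasonable substitute.
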

## Where to Go Next

Now that you've seen your first TOON document, read the [Format Overview](/guide/format-overview) for complete syntax details (objects, arrays, quoting rules, key folding), then explore [Using TOON with LLMs](/guide/llm-prompts) to see how to use it effectively in prompts. For implementation details, check the [API reference](/reference/api) (TypeScript) or the [specification](/reference/spec) (language-agnostic normative rules).