7.4 KiB
Getting Started
What is TOON?
Token-Oriented Object Notation is a compact, human-readable encoding of the JSON data model for LLM prompts. It provides a lossless serialization of the same objects, arrays, and primitives as JSON, but in a syntax that minimizes tokens and makes structure easy for models to follow.
TOON combines YAML's indentation-based structure for nested objects with a CSV-style tabular layout for uniform arrays. TOON's sweet spot is uniform arrays of objects (multiple fields per row, same structure across items), achieving CSV-like compactness while adding explicit structure that helps LLMs parse and validate data reliably.
Think of it as a translation layer: use JSON programmatically, and encode it as TOON for LLM input.
Why TOON?
Standard JSON is verbose and token-expensive. For uniform arrays of objects, JSON repeats every field name for every record:
{
"users": [
{ "id": 1, "name": "Alice", "role": "admin" },
{ "id": 2, "name": "Bob", "role": "user" }
]
}
YAML already reduces some redundancy with indentation instead of braces:
users:
- id: 1
name: Alice
role: admin
- id: 2
name: Bob
role: user
TOON goes further by declaring fields once and streaming data as rows:
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user
The [2] declares the array length, enabling LLMs to answer dataset size questions and detect truncation. The {id,name,role} declares the field names. Each row is then a compact, comma-separated list of values. This is the core pattern: declare structure once, stream data compactly. The format approaches CSV's efficiency while adding explicit structure.
For a more realistic example, here's how TOON handles a dataset with both nested objects and tabular arrays:
::: code-group
{
"context": {
"task": "Our favorite hikes together",
"location": "Boulder",
"season": "spring_2025"
},
"friends": ["ana", "luis", "sam"],
"hikes": [
{
"id": 1,
"name": "Blue Lake Trail",
"distanceKm": 7.5,
"elevationGain": 320,
"companion": "ana",
"wasSunny": true
},
{
"id": 2,
"name": "Ridge Overlook",
"distanceKm": 9.2,
"elevationGain": 540,
"companion": "luis",
"wasSunny": false
},
{
"id": 3,
"name": "Wildflower Loop",
"distanceKm": 5.1,
"elevationGain": 180,
"companion": "sam",
"wasSunny": true
}
]
}
context:
task: Our favorite hikes together
location: Boulder
season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
1,Blue Lake Trail,7.5,320,ana,true
2,Ridge Overlook,9.2,540,luis,false
3,Wildflower Loop,5.1,180,sam,true
:::
Notice how TOON combines YAML's indentation for the context object with inline format for the primitive friends array and tabular format for the structured hikes array. Each format is chosen automatically based on the data structure.
Design Goals
TOON is optimized for specific use cases. It aims to:
- Make uniform arrays of objects as compact as possible by declaring structure once and streaming data.
- Stay fully lossless and deterministic – round-trips preserve all data and structure.
- Keep parsing simple and robust for both LLMs and humans through explicit structure markers.
- Provide validation guardrails (array lengths, field counts) that help detect truncation and malformed output.
When to Use TOON
TOON excels with uniform arrays of objects – data with the same structure across items. For LLM prompts, the format produces deterministic, minimally quoted text with built-in validation. Explicit array lengths ([N]) and field headers ({fields}) help detect truncation and malformed data, while the tabular structure declares fields once rather than repeating them in every row.
::: tip Production Ready TOON is production-ready and actively maintained, with implementations in TypeScript, Python, Go, Rust, .NET, and more. The format is stable, but also an idea in progress. Nothing's set in stone – help shape where it goes by contributing to the specification or sharing feedback. :::
When Not to Use TOON
TOON is not always the best choice. Consider alternatives when:
- Deeply nested or non-uniform structures (tabular eligibility ≈ 0%): JSON-compact often uses fewer tokens. Example: complex configuration objects with many nested levels.
- Semi-uniform arrays (~40–60% tabular eligibility): Token savings diminish. Prefer JSON if your pipelines already rely on it.
- Pure tabular data: CSV is smaller than TOON for flat tables. TOON adds minimal overhead (~5-10%) to provide structure (array length declarations, field headers, delimiter scoping) that improves LLM reliability.
- Latency-critical applications: Benchmark on your exact setup. Some deployments (especially local/quantized models) may process compact JSON faster despite TOON's lower token count.
Note
For data-driven comparisons across different structures, see benchmarks. When optimizing for latency, measure TTFT, tokens/sec, and total time for both TOON and JSON-compact and use whichever performs better in your specific environment.
Installation
TypeScript Library
Install the library via your preferred package manager:
::: code-group
npm install @toon-format/toon
pnpm add @toon-format/toon
yarn add @toon-format/toon
:::
CLI
The CLI can be used without installation via npx, or installed globally:
::: code-group
npx @toon-format/cli input.json -o output.toon
npm install -g @toon-format/cli
pnpm add -g @toon-format/cli
yarn global add @toon-format/cli
:::
For full CLI documentation, see the CLI reference.
Your First Example
The examples below use the TypeScript library for demonstration, but the same operations work in any language with a TOON implementation.
Let's encode a simple dataset with the TypeScript library:
import { encode } from '@toon-format/toon'
const data = {
users: [
{ id: 1, name: 'Alice', role: 'admin' },
{ id: 2, name: 'Bob', role: 'user' }
]
}
console.log(encode(data))
Output:
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user
Decoding Back to JSON
Decoding is just as simple:
import { decode } from '@toon-format/toon'
const toon = `
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user
`
const data = decode(toon)
console.log(JSON.stringify(data, null, 2))
Output:
{
"users": [
{ "id": 1, "name": "Alice", "role": "admin" },
{ "id": 2, "name": "Bob", "role": "user" }
]
}
Round-tripping is lossless: decode(encode(x)) always equals x (after normalization of non-JSON types like Date, NaN, etc.).
Where to Go Next
Now that you've seen your first TOON document, read the Format Overview for complete syntax details (objects, arrays, quoting rules, key folding), then explore Using TOON with LLMs to see how to use it effectively in prompts. For implementation details, check the API reference (TypeScript) or the specification (language-agnostic normative rules).