docs: add dedicated docs website
39
.github/workflows/deploy.yml
vendored
Normal file
@@ -0,0 +1,39 @@
name: Deploy

on:
  push:
    branches:
      - main

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v6
        with:
          node-version: 24
      - name: Get pnpm store directory
        id: pnpm-cache
        run: echo "pnpm_cache_dir=$(pnpm store path)" >> $GITHUB_OUTPUT
      - uses: actions/cache@v4
        with:
          path: ${{ steps.pnpm-cache.outputs.pnpm_cache_dir }}
          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
          restore-keys: |
            ${{ runner.os }}-pnpm-store-

      - run: pnpm install --frozen-lockfile
      - run: pnpm run docs:build

      - name: Deploy to Cloudflare
        run: cd docs && npx wrangler deploy
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
2
.gitignore
vendored
@@ -2,5 +2,7 @@ dist
node_modules
.DS_Store
.env
docs/.vitepress/dist
docs/.vitepress/cache
packages/toon/test/fixtures/*.json
packages/toon/test/fixtures/*.toon
573
README.md
@@ -29,9 +29,8 @@ Think of it as a translation layer: use JSON programmatically, and encode it as
- [Playgrounds](#playgrounds)
- [CLI](#cli)
- [Format Overview](#format-overview)
- [API](#api)
- [Using TOON in LLM Prompts](#using-toon-in-llm-prompts)
- [Syntax Cheatsheet](#syntax-cheatsheet)
- [Using TOON with LLMs](#using-toon-with-llms)
- [Documentation](#documentation)
- [Other Implementations](#other-implementations)
- [📋 Full Specification](https://github.com/toon-format/spec/blob/main/SPEC.md)

@@ -115,14 +114,12 @@ hikes:

TOON conveys the same information with **even fewer tokens** – combining YAML-like indentation with CSV-style tabular arrays:

```toon
```yaml
context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025

friends[3]: ana,luis,sam

hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
@@ -131,14 +128,12 @@ hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:

## Key Features

- 💸 **Token-efficient:** typically 30-60% fewer tokens on large uniform arrays vs formatted JSON[^1]
- 🤿 **LLM-friendly guardrails:** explicit lengths and fields enable validation
- 🍱 **Minimal syntax:** removes redundant punctuation (braces, brackets, most quotes)
- 📐 **Indentation-based structure:** like YAML, uses whitespace instead of braces
- 🧺 **Tabular arrays:** declare keys once, stream data as rows
- 🔗 **Optional key folding:** collapses single-key wrapper chains into dotted paths (e.g., `data.metadata.items`) to reduce indentation and tokens

[^1]: For flat tabular data, CSV is more compact. TOON adds minimal overhead to provide explicit structure and validation that improves LLM reliability.
- 📊 **Token-Efficient & Accurate:** TOON reaches 74% accuracy (vs JSON's 70%) while using ~40% fewer tokens in mixed-structure benchmarks across 4 models.
- 🔁 **JSON Data Model:** Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips.
- 🛤️ **LLM-Friendly Guardrails:** Explicit [N] lengths and {fields} headers give models a clear schema to follow, improving parsing reliability.
- 📐 **Minimal Syntax:** Uses indentation instead of braces and minimizes quoting, giving YAML-like readability with CSV-style compactness.
- 🧺 **Tabular Arrays:** Uniform arrays of objects collapse into tables that declare fields once and stream row values line by line.
- 🌐 **Multi-Language Ecosystem:** Spec-driven implementations in TypeScript, Python, Go, Rust, .NET, and other languages.

## When Not to Use TOON

@@ -302,11 +297,11 @@ grok-4-fast-non-reasoning

| Format | Accuracy | Tokens | Correct/Total |
| ------ | -------- | ------ | ------------- |
| `toon` | 62.9% | 8,780 | 83/132 |
| `csv` | 61.4% | 8,528 | 81/132 |
| `yaml` | 59.8% | 13,142 | 79/132 |
| `json-compact` | 55.3% | 11,465 | 73/132 |
| `json-pretty` | 56.1% | 15,158 | 74/132 |
| `toon` | 62.9% | 8,779 | 83/132 |
| `csv` | 61.4% | 8,527 | 81/132 |
| `yaml` | 59.8% | 13,141 | 79/132 |
| `json-compact` | 55.3% | 11,464 | 73/132 |
| `json-pretty` | 56.1% | 15,157 | 74/132 |
| `xml` | 48.5% | 17,105 | 64/132 |

##### Semi-uniform event logs
@@ -437,7 +432,7 @@ grok-4-fast-non-reasoning

#### What's Being Measured

This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it (this does **not** test model's ability to generate TOON output).
This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it. This does **not** test the model's ability to generate TOON output – only to read and understand it.

#### Datasets Tested

@@ -737,7 +732,7 @@ npx @toon-format/cli input.json -o output.toon
echo '{"name": "Ada", "role": "dev"}' | npx @toon-format/cli
```

See [CLI section](#cli) for all options and examples.
See the [CLI section](#cli) for all options and examples.

### TypeScript Library

@@ -774,22 +769,12 @@ console.log(encode(data))

Experiment with TOON format interactively using these community-built tools for token comparison, format conversion, and validation:

- **[Format Tokenization Playground](https://www.curiouslychase.com/playground/format-tokenization-exploration)**
- **[TOON Tools](https://toontools.vercel.app/)**
- [Format Tokenization Playground](https://www.curiouslychase.com/playground/format-tokenization-exploration)
- [TOON Tools](https://toontools.vercel.app/)

## CLI

Command-line tool for converting between JSON and TOON formats.

### Usage

```bash
npx @toon-format/cli [options] [input]
```

**Standard input:** Omit the input argument or use `-` to read from stdin. This enables piping data directly from other commands.

**Auto-detection:** The CLI automatically detects the operation based on file extension (`.json` → encode, `.toon` → decode). When reading from stdin, use `--encode` or `--decode` flags to specify the operation (defaults to encode).
Command-line tool for quick JSON↔TOON conversions, token analysis, and pipeline integration. Auto-detects format from file extension, supports stdin/stdout workflows, and offers delimiter options for maximum efficiency.

```bash
# Encode JSON to TOON (auto-detected)
@@ -798,516 +783,52 @@ npx @toon-format/cli input.json -o output.toon
# Decode TOON to JSON (auto-detected)
npx @toon-format/cli data.toon -o output.json

# Output to stdout
npx @toon-format/cli input.json

# Pipe from stdin (no argument needed)
cat data.json | npx @toon-format/cli
echo '{"name": "Ada"}' | npx @toon-format/cli

# Explicit stdin with hyphen (equivalent to above)
cat data.json | npx @toon-format/cli -
# Output to stdout
npx @toon-format/cli input.json

# Decode from stdin
cat data.toon | npx @toon-format/cli --decode
# Show token savings
npx @toon-format/cli data.json --stats
```

### Options

| Option | Description |
| ------ | ----------- |
| `-o, --output <file>` | Output file path (prints to stdout if omitted) |
| `-e, --encode` | Force encode mode (overrides auto-detection) |
| `-d, --decode` | Force decode mode (overrides auto-detection) |
| `--delimiter <char>` | Array delimiter: `,` (comma), `\t` (tab), `\|` (pipe) |
| `--indent <number>` | Indentation size (default: `2`) |
| `--stats` | Show token count estimates and savings (encode only) |
| `--no-strict` | Disable strict validation when decoding |
| `--key-folding <mode>` | Key folding mode: `off`, `safe` (default: `off`) - collapses nested chains |
| `--flatten-depth <number>` | Maximum segments to fold (default: `Infinity`) - requires `--key-folding safe` |
| `--expand-paths <mode>` | Path expansion mode: `off`, `safe` (default: `off`) - reconstructs dotted keys |

### Examples

```bash
# Show token savings when encoding
npx @toon-format/cli data.json --stats -o output.toon

# Tab-separated output (often more token-efficient)
npx @toon-format/cli data.json --delimiter "\t" -o output.toon

# Pipe-separated output
npx @toon-format/cli data.json --delimiter "|" -o output.toon

# Lenient decoding (skip validation)
npx @toon-format/cli data.toon --no-strict -o output.json

# Key folding for nested data
npx @toon-format/cli data.json --key-folding safe -o output.toon

# Stdin workflows
echo '{"name": "Ada", "age": 30}' | npx @toon-format/cli --stats
cat large-dataset.json | npx @toon-format/cli --delimiter "\t" > output.toon
```
> [!TIP]
> See the full [CLI documentation](https://toonformat.dev/cli/) for all options, examples, and advanced usage.

## Format Overview

> [!NOTE]
> For precise formatting rules and implementation details, see the [full specification](https://github.com/toon-format/spec).
Detailed syntax references, implementation guides, and quick lookups for understanding and using the TOON format.

### Objects
- [Format Overview](https://toonformat.dev/guide/format-overview) – Complete syntax documentation
- [Syntax Cheatsheet](https://toonformat.dev/reference/syntax-cheatsheet) – Quick reference
- [API Reference](https://toonformat.dev/reference/api) – Encode/decode usage (TypeScript)

Simple objects with primitive values:
## Using TOON with LLMs

```ts
encode({
  id: 123,
  name: 'Ada',
  active: true
})
```
TOON works best when you show the format instead of describing it. The structure is self-documenting – models parse it naturally once they see the pattern. Wrap data in ` ```toon` code blocks for input, and show the expected header template when asking models to generate TOON. Use tab delimiters for even better token efficiency.

```
id: 123
name: Ada
active: true
```
Follow the detailed [LLM integration guide](https://toonformat.dev/guide/llm-prompts) for strategies, examples, and validation techniques.

Nested objects:
## Documentation

```ts
encode({
  user: {
    id: 123,
    name: 'Ada'
  }
})
```
Comprehensive guides, references, and resources to help you get the most out of the TOON format and tools.

```
user:
  id: 123
  name: Ada
```
**Getting Started**
- [Introduction & Installation](https://toonformat.dev/guide/getting-started) – What TOON is, when to use it, first steps
- [Format Overview](https://toonformat.dev/guide/format-overview) – Complete syntax with examples
- [Benchmarks](https://toonformat.dev/guide/benchmarks) – Accuracy & token efficiency results

### Key Folding (Optional)
**Tools & Integration**
- [CLI](https://toonformat.dev/cli/) – Command-line tool for JSON↔TOON conversions
- [Using TOON with LLMs](https://toonformat.dev/guide/llm-prompts) – Prompting strategies & validation
- [Playgrounds](https://toonformat.dev/ecosystem/tools-and-playgrounds) – Interactive tools

New in spec v1.5: Optionally collapse single-key wrapper chains into dotted paths to reduce tokens. Enable with `keyFolding: 'safe'`.

Standard nesting:

```
data:
  metadata:
    items[2]: a,b
```

With key folding:

```
data.metadata.items[2]: a,b
```

Round-trip with path expansion:

```ts
import { decode, encode } from '@toon-format/toon'

const original = { data: { metadata: { items: ['a', 'b'] } } }

const toon = encode(original, { keyFolding: 'safe' })
// → "data.metadata.items[2]: a,b"

const restored = decode(toon, { expandPaths: 'safe' })
// → Matches original structure
```

See §13.4 in the [specification](https://github.com/toon-format/spec/blob/main/SPEC.md#134-key-folding-and-path-expansion) for folding rules and safety guarantees.

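As a rough illustration of the folding step described above, the following TypeScript sketch collapses single-key wrapper chains into dotted keys. The names `foldKeys`, `Json`, and `SEGMENT` are hypothetical and exist only for this example; they are not part of `@toon-format/toon`. The sketch also assumes that a foldable segment is a plain identifier, and it leaves out the collision checks and `flattenDepth` handling that the specification defines.

```ts
// Illustrative sketch only: not the library's implementation.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }

// Assumption: a foldable segment is a plain identifier without dots.
const SEGMENT = /^[A-Z_]\w*$/i

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

function foldKeys(value: Json): Json {
  if (Array.isArray(value)) return value.map(item => foldKeys(item))
  if (!isPlainObject(value)) return value

  const out: { [key: string]: Json } = {}
  for (const [key, child] of Object.entries(value)) {
    const path = [key]
    let node: Json = child
    // Walk down while the current node is an object with exactly one key
    // and every segment on the path is identifier-like.
    while (SEGMENT.test(key) && isPlainObject(node) && Object.keys(node).length === 1) {
      const entry = Object.entries(node)[0]
      if (!entry || !SEGMENT.test(entry[0])) break
      path.push(entry[0])
      node = entry[1]
    }
    out[path.join('.')] = foldKeys(node)
  }
  return out
}

console.log(JSON.stringify(foldKeys({ data: { metadata: { items: ['a', 'b'] } } })))
```

Objects with more than one key stop the chain, so `{ a: { b: 1, c: 2 } }` is left nested in this sketch.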
### Arrays

> [!TIP]
> TOON includes the array length in brackets (e.g., `items[3]`). When using comma delimiters (default), the delimiter is implicit. When using tab or pipe delimiters, the delimiter is explicitly shown in the header (e.g., `tags[2|]` or `[2 ]`). This encoding helps LLMs identify the delimiter and track the number of elements, reducing errors when generating or validating structured output.

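To make the header layout concrete, here is a minimal TypeScript sketch that reads the length, delimiter, and optional field list from an array header line. `parseArrayHeader`, `ArrayHeader`, and the `HEADER` pattern are hypothetical names for this example, not the library's parser. The sketch assumes unquoted keys and field names, and it expects the caller to strip indentation and any leading `- ` first.

```ts
// Illustrative sketch only: handles unquoted keys and field names.
type Delimiter = ',' | '\t' | '|'

interface ArrayHeader {
  key: string
  length: number
  delimiter: Delimiter
  fields: string[] | null
  rest: string
}

// Shape: key[N<delimiter?>]{fields?}: rest
const HEADER = /^([^\[\]{}:"]*)\[(\d+)([\t|]?)\](?:\{([^}]*)\})?:(.*)$/

function parseArrayHeader(line: string): ArrayHeader | null {
  const match = HEADER.exec(line)
  if (!match) return null

  // A comma is implicit; tab and pipe are written after the length.
  const delimiter: Delimiter = match[3] === '\t' ? '\t' : match[3] === '|' ? '|' : ','
  const fieldList = match[4]

  return {
    key: match[1] ?? '',
    length: Number(match[2]),
    delimiter,
    fields: fieldList === undefined ? null : fieldList.split(delimiter),
    rest: (match[5] ?? '').replace(/^ /, ''),
  }
}

console.log(parseArrayHeader('items[2]{sku,qty,price}:'))
console.log(parseArrayHeader('tags[2|]: a|b'))
```

A line such as `name: Ada` has no bracketed length, so the function returns `null` and the caller can treat it as an ordinary key/value pair.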
#### Primitive Arrays (Inline)

```ts
encode({
  tags: ['admin', 'ops', 'dev']
})
```

```
tags[3]: admin,ops,dev
```

#### Arrays of Objects (Tabular)

When all objects share the same primitive fields, TOON uses an efficient **tabular format**:

```ts
encode({
  items: [
    { sku: 'A1', qty: 2, price: 9.99 },
    { sku: 'B2', qty: 1, price: 14.5 }
  ]
})
```

```
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
```

**Tabular formatting applies recursively:** nested arrays of objects (whether as object properties or inside list items) also use tabular format if they meet the same requirements.

```ts
encode({
  items: [
    {
      users: [
        { id: 1, name: 'Ada' },
        { id: 2, name: 'Bob' }
      ],
      status: 'active'
    }
  ]
})
```

```
items[1]:
  - users[2]{id,name}:
    1,Ada
    2,Bob
    status: active
```

> [!NOTE]
> Tabular format requires identical field sets across all objects (same keys, order doesn't matter) and primitive values only (strings, numbers, booleans, null).

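The tabular requirements in the note above can be expressed as a small check. This is a sketch under stated assumptions: `isTabular` and `isPrimitive` are hypothetical helpers, not functions exported by `@toon-format/toon`, and treating an empty array as non-tabular is a choice made for this example.

```ts
// Illustrative sketch only: mirrors the two requirements from the note.
type Primitive = string | number | boolean | null

function isPrimitive(value: unknown): value is Primitive {
  return value === null || ['string', 'number', 'boolean'].includes(typeof value)
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

function isTabular(rows: unknown[]): boolean {
  // Assumption: an empty array has no fields to declare, so it is not tabular.
  const first = rows[0]
  if (!isRecord(first)) return false

  // Sort the keys so that key order does not matter.
  const expected = Object.keys(first).sort()

  return rows.every((row) => {
    if (!isRecord(row)) return false
    const keys = Object.keys(row).sort()
    return keys.length === expected.length
      && keys.every((key, index) => key === expected[index])
      && Object.values(row).every(isPrimitive)
  })
}

console.log(isTabular([{ sku: 'A1', qty: 2 }, { qty: 1, sku: 'B2' }]))
console.log(isTabular([{ id: 1 }, { id: 2, extra: true }]))
```

Arrays that fail this check correspond to the list format shown in the next section.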
#### Mixed and Non-Uniform Arrays

Arrays that don't meet the tabular requirements use list format:

```
items[3]:
  - 1
  - a: 1
  - text
```

When objects appear in list format, the first field is placed on the hyphen line:

```
items[2]:
  - id: 1
    name: First
  - id: 2
    name: Second
    extra: true
```

> [!NOTE]
> **Nested array indentation:** When the first field of a list item is an array (primitive, tabular, or nested), its contents are indented two spaces under the header line, and subsequent fields of the same object appear at that same indentation level. This remains unambiguous because list items begin with `"- "`, tabular arrays declare a fixed row count in their header, and object fields contain `":"`.

#### Arrays of Arrays

When you have arrays containing primitive inner arrays:

```ts
encode({
  pairs: [
    [1, 2],
    [3, 4]
  ]
})
```

```
pairs[2]:
  - [2]: 1,2
  - [2]: 3,4
```

#### Empty Arrays and Objects

Empty containers have special representations:

```ts
encode({ items: [] }) // items[0]:
encode([]) // [0]:
encode({}) // (empty output)
encode({ config: {} }) // config:
```

### Quoting Rules

TOON quotes strings **only when necessary** to maximize token efficiency:

- Inner spaces are allowed; leading or trailing spaces force quotes.
- Unicode and emoji are safe unquoted.
- Quotes and control characters are escaped with backslash.

> [!NOTE]
> When using alternative delimiters (tab or pipe), the quoting rules adapt automatically. Strings containing the active delimiter will be quoted, while other delimiters remain safe.

#### Object Keys and Field Names

Keys are unquoted if they match the identifier pattern: start with a letter or underscore, followed by letters, digits, underscores, or dots (e.g., `id`, `userName`, `user_name`, `user.name`, `_private`). All other keys must be quoted (e.g., `"user name"`, `"order-id"`, `"123"`, `"order:id"`, `""`).

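The identifier rule for keys maps directly onto a regular expression. The sketch below is an illustration, not the library's code: `UNQUOTED_KEY`, `needsKeyQuotes`, and `formatKey` are hypothetical names, and the escaping covers only backslashes and double quotes.

```ts
// Illustrative sketch only: letter or underscore first,
// then letters, digits, underscores, or dots.
const UNQUOTED_KEY = /^[A-Z_][\w.]*$/i

function needsKeyQuotes(key: string): boolean {
  return !UNQUOTED_KEY.test(key)
}

function formatKey(key: string): string {
  if (!needsKeyQuotes(key)) return key
  // Escape backslashes first, then double quotes.
  const escaped = key.replace(/\\/g, '\\\\').replace(/"/g, '\\"')
  return `"${escaped}"`
}

for (const key of ['id', 'user.name', '_private', 'user name', 'order-id', '123', '']) {
  console.log(formatKey(key))
}
```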
#### String Values

String values are quoted when any of the following is true:

| Condition | Examples |
|---|---|
| Empty string | `""` |
| Leading or trailing spaces | `" padded "`, `" "` |
| Contains active delimiter, colon, quote, backslash, or control chars | `"a,b"` (comma), `"a\tb"` (tab), `"a\|b"` (pipe), `"a:b"`, `"say \"hi\""`, `"C:\\Users"`, `"line1\\nline2"` |
| Looks like boolean/number/null | `"true"`, `"false"`, `"null"`, `"42"`, `"-3.14"`, `"1e-6"`, `"05"` |
| Starts with `"- "` (list-like) | `"- item"` |
| Looks like structural token | `"[5]"`, `"{key}"`, `"[3]: x,y"` |

**Examples of unquoted strings:** Unicode and emoji are safe (`hello 👋 world`), as are strings with inner spaces (`hello world`).

> [!IMPORTANT]
> **Delimiter-aware quoting:** Unquoted strings never contain `:` or the active delimiter. This makes TOON reliably parseable with simple heuristics: split key/value on first `: `, and split array values on the delimiter declared in the array header. When using tab or pipe delimiters, commas don't need quoting – only the active delimiter triggers quoting for both array values and object values.

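Because only quoted strings may contain the active delimiter, a row can be split with a single pass that tracks whether the scanner is inside quotes. The following TypeScript sketch shows that idea; `splitRow` is a hypothetical helper rather than the library's decoder, and it returns cells with their quotes and escapes left intact.

```ts
// Illustrative sketch only: splits one row on the active delimiter,
// ignoring delimiters that appear inside double-quoted strings.
type Delimiter = ',' | '\t' | '|'

function splitRow(line: string, delimiter: Delimiter = ','): string[] {
  const cells: string[] = []
  let current = ''
  let inQuotes = false

  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i)
    if (inQuotes && ch === '\\' && i + 1 < line.length) {
      // Keep the escape sequence as-is and skip the escaped character.
      current += ch + line.charAt(i + 1)
      i++
    } else if (ch === '"') {
      inQuotes = !inQuotes
      current += ch
    } else if (ch === delimiter && !inQuotes) {
      cells.push(current)
      current = ''
    } else {
      current += ch
    }
  }

  cells.push(current)
  return cells
}

console.log(splitRow('1,"hello, world",true'))
console.log(splitRow('a,b|c', '|'))
```

With the pipe delimiter active, the comma in `a,b` is ordinary text, which matches the note that only the active delimiter triggers quoting.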
### Type Conversions

Some non-JSON types are automatically normalized for LLM-safe output:

| Input | Output |
|---|---|
| Number (finite) | Decimal form, no scientific notation (e.g., `-0` → `0`, `1e6` → `1000000`) |
| Number (`NaN`, `±Infinity`) | `null` |
| `BigInt` | If within safe integer range: converted to number. Otherwise: quoted decimal string (e.g., `"9007199254740993"`) |
| `Date` | ISO string in quotes (e.g., `"2025-01-01T00:00:00.000Z"`) |
| `undefined` | `null` |
| `function` | `null` |
| `symbol` | `null` |

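The table above can be read as a normalization function over leaf values. This sketch is an assumption-laden illustration: `normalizePrimitive` is a hypothetical name, it covers only the rows listed in the table, and it returns JavaScript values rather than rendered TOON text, so number formatting and quoting are out of scope here.

```ts
// Illustrative sketch only: follows the rows of the table above.
type JsonPrimitive = string | number | boolean | null

function normalizePrimitive(value: unknown): JsonPrimitive {
  if (value === null || value === undefined) return null
  if (typeof value === 'function' || typeof value === 'symbol') return null

  if (typeof value === 'number') {
    // NaN and ±Infinity become null; -0 becomes 0.
    if (!Number.isFinite(value)) return null
    return Object.is(value, -0) ? 0 : value
  }

  if (typeof value === 'bigint') {
    const isSafe = value >= BigInt(Number.MIN_SAFE_INTEGER)
      && value <= BigInt(Number.MAX_SAFE_INTEGER)
    // Outside the safe range the value is kept as a decimal string.
    return isSafe ? Number(value) : value.toString()
  }

  if (value instanceof Date) return value.toISOString()
  if (typeof value === 'string' || typeof value === 'boolean') return value

  throw new TypeError('normalizePrimitive expects a leaf value, not an object or array')
}

console.log(normalizePrimitive(Number.NaN))
console.log(normalizePrimitive(BigInt('9007199254740993')))
console.log(normalizePrimitive(new Date(0)))
```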
## API

### `encode(value: unknown, options?: EncodeOptions): string`

Converts any JSON-serializable value to TOON format.

**Parameters:**

- `value` – Any JSON-serializable value (object, array, primitive, or nested structure). Non-JSON-serializable values (functions, symbols, undefined, non-finite numbers) are converted to `null`. Dates are converted to ISO strings, and BigInts are emitted as decimal integers (no quotes).
- `options` – Optional encoding options:
  - `indent?: number` – Number of spaces per indentation level (default: `2`)
  - `delimiter?: ',' | '\t' | '|'` – Delimiter for array values and tabular rows (default: `','`)
  - `keyFolding?: 'off' | 'safe'` – Enable key folding to collapse single-key wrapper chains into dotted paths (default: `'off'`). When `'safe'`, only valid identifier segments are folded
  - `flattenDepth?: number` – Maximum number of segments to fold when `keyFolding` is enabled (default: `Infinity`). Values 0-1 have no practical effect

**Returns:**

A TOON-formatted string with no trailing newline or spaces.

**Example:**

```ts
import { encode } from '@toon-format/toon'

const items = [
  { sku: 'A1', qty: 2, price: 9.99 },
  { sku: 'B2', qty: 1, price: 14.5 }
]

encode({ items })
```

**Output:**

```
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
```

#### Delimiter Options

The `delimiter` option allows you to choose between comma (default), tab, or pipe delimiters for array values and tabular rows. Alternative delimiters can provide additional token savings in specific contexts.

##### Tab Delimiter (`\t`)

Using tab delimiters instead of commas can reduce token count further, especially for tabular data:

```ts
const data = {
  items: [
    { sku: 'A1', name: 'Widget', qty: 2, price: 9.99 },
    { sku: 'B2', name: 'Gadget', qty: 1, price: 14.5 }
  ]
}

encode(data, { delimiter: '\t' })
```

**Output:**

```
items[2 ]{sku name qty price}:
  A1 Widget 2 9.99
  B2 Gadget 1 14.5
```

**Benefits:**

- Tabs are single characters and often tokenize more efficiently than commas.
- Tabs rarely appear in natural text, reducing the need for quote-escaping.
- The delimiter is explicitly encoded in the array header, making it self-descriptive.

**Considerations:**

- Some terminals and editors may collapse or expand tabs visually.
- String values containing tabs will still require quoting.

##### Pipe Delimiter (`|`)

Pipe delimiters offer a middle ground between commas and tabs:

```ts
encode(data, { delimiter: '|' })
```

**Output:**

```
items[2|]{sku|name|qty|price}:
  A1|Widget|2|9.99
  B2|Gadget|1|14.5
```

### `decode(input: string, options?: DecodeOptions): JsonValue`

Converts a TOON-formatted string back to JavaScript values.

**Parameters:**

- `input` – A TOON-formatted string to parse
- `options` – Optional decoding options:
  - `indent?: number` – Expected number of spaces per indentation level (default: `2`)
  - `strict?: boolean` – Enable strict validation (default: `true`)
  - `expandPaths?: 'off' | 'safe'` – Enable path expansion to reconstruct dotted keys into nested objects (default: `'off'`). Pairs with `keyFolding: 'safe'` for lossless round-trips

**Returns:**

A JavaScript value (object, array, or primitive) representing the parsed TOON data.

**Example:**

```ts
import { decode } from '@toon-format/toon'

const toon = `
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
`

const data = decode(toon)
// {
//   items: [
//     { sku: 'A1', qty: 2, price: 9.99 },
//     { sku: 'B2', qty: 1, price: 14.5 }
//   ]
// }
```

**Strict Mode:**

By default, the decoder validates input strictly:

- **Invalid escape sequences**: Throws on `"\x"`, unterminated strings.
- **Syntax errors**: Throws on missing colons, malformed headers.
- **Array length mismatches**: Throws when declared length doesn't match actual count.
- **Delimiter mismatches**: Throws when row delimiters don't match header.

## Using TOON in LLM Prompts

TOON works best when you show the format instead of describing it. The structure is self-documenting – models parse it naturally once they see the pattern.

### Sending TOON to LLMs (Input)

Wrap your encoded data in a fenced code block (label it \`\`\`toon for clarity). The indentation and headers are usually enough – models treat it like familiar YAML or CSV. The explicit array lengths (`[N]`) and field headers (`{field1,field2}`) help the model track structure, especially for large tables.

### Generating TOON from LLMs (Output)

For output, be more explicit. When you want the model to **generate** TOON:

- **Show the expected header** (`users[N]{id,name,role}:`). The model fills rows instead of repeating keys, reducing generation errors.
- **State the rules:** 2-space indent, no trailing spaces, `[N]` matches row count.

Here's a prompt that works for both reading and generating:

````
Data is in TOON format (2-space indent, arrays show length and fields).

```toon
users[3]{id,name,role,lastLogin}:
  1,Alice,admin,2025-01-15T10:30:00Z
  2,Bob,user,2025-01-14T15:22:00Z
  3,Charlie,user,2025-01-13T09:45:00Z
```

Task: Return only users with role "user" as TOON. Use the same header. Set [N] to match the row count. Output only the code block.
````

> [!TIP]
> For large uniform tables, use `encode(data, { delimiter: '\t' })` and tell the model "fields are tab-separated." Tabs often tokenize better than commas and reduce the need for quote-escaping.

## Syntax Cheatsheet

<details>
<summary><strong>Show format examples</strong></summary>

```
// Object
{ id: 1, name: 'Ada' }          → id: 1
                                  name: Ada

// Nested object
{ user: { id: 1 } }             → user:
                                    id: 1

// Primitive array (inline)
{ tags: ['foo', 'bar'] }        → tags[2]: foo,bar

// Tabular array (uniform objects)
{ items: [                      → items[2]{id,qty}:
  { id: 1, qty: 5 },                1,5
  { id: 2, qty: 3 }                 2,3
]}

// Mixed / non-uniform (list)
{ items: [1, { a: 1 }, 'x'] }   → items[3]:
                                    - 1
                                    - a: 1
                                    - x

// Array of arrays
{ pairs: [[1, 2], [3, 4]] }     → pairs[2]:
                                    - [2]: 1,2
                                    - [2]: 3,4

// Root array
['x', 'y']                      → [2]: x,y

// Empty containers
{}                              → (empty output)
{ items: [] }                   → items[0]:

// Special quoting
{ note: 'hello, world' }        → note: "hello, world"
{ items: ['true', true] }       → items[2]: "true",true
```

</details>
**Reference**
- [API Reference](https://toonformat.dev/reference/api) – TypeScript/JavaScript encode/decode API
- [Syntax Cheatsheet](https://toonformat.dev/reference/syntax-cheatsheet) – Quick format lookup
- [Specification v2.0](https://github.com/toon-format/spec/blob/main/SPEC.md) – Normative rules for implementers

## Other Implementations

@@ -1321,7 +842,7 @@ Task: Return only users with role "user" as TOON. Use the same header. Set [N] t

- **.NET:** [toon_format](https://github.com/toon-format/toon-dotnet) *(in development)*
- **Dart:** [toon](https://github.com/toon-format/toon-dart) *(in development)*
- **Go:** [gotoon](https://github.com/toon-format/toon-go) *(in development)*
- **Go:** [toon-go](https://github.com/toon-format/toon-go) *(in development)*
- **Python:** [toon_format](https://github.com/toon-format/toon-python) *(in development)*
- **Rust:** [toon_format](https://github.com/toon-format/toon-rust) *(in development)*

7
automd.config.ts
Normal file
@@ -0,0 +1,7 @@
import type { Config } from 'automd'

const config: Config = {
  input: ['docs/guide/benchmarks.md'],
}

export default config
@@ -138,11 +138,11 @@ grok-4-fast-non-reasoning

| Format | Accuracy | Tokens | Correct/Total |
| ------ | -------- | ------ | ------------- |
| `toon` | 62.9% | 8,780 | 83/132 |
| `csv` | 61.4% | 8,528 | 81/132 |
| `yaml` | 59.8% | 13,142 | 79/132 |
| `json-compact` | 55.3% | 11,465 | 73/132 |
| `json-pretty` | 56.1% | 15,158 | 74/132 |
| `toon` | 62.9% | 8,779 | 83/132 |
| `csv` | 61.4% | 8,527 | 81/132 |
| `yaml` | 59.8% | 13,141 | 79/132 |
| `json-compact` | 55.3% | 11,464 | 73/132 |
| `json-pretty` | 56.1% | 15,157 | 74/132 |
| `xml` | 48.5% | 17,105 | 64/132 |

##### Semi-uniform event logs
@@ -273,7 +273,7 @@ grok-4-fast-non-reasoning

#### What's Being Measured

This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it (this does **not** test model's ability to generate TOON output).
This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it. This does **not** test the model's ability to generate TOON output – only to read and understand it.

#### Datasets Tested

@@ -280,7 +280,7 @@ ${modelPerformance}

#### What's Being Measured

This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it (this does **not** test model's ability to generate TOON output).
This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it. This does **not** test the model's ability to generate TOON output – only to read and understand it.

#### Datasets Tested

134
docs/.vitepress/config.ts
Normal file
@@ -0,0 +1,134 @@
import type { DefaultTheme } from 'vitepress'
import UnoCSS from 'unocss/vite'
import { defineConfig } from 'vitepress'
import { description, github, name, ogImage, ogUrl, releases, twitterImage, version } from './meta'

export default defineConfig({
  title: name,
  description,
  head: [
    ['link', { rel: 'icon', href: '/favicon.svg', type: 'image/svg+xml' }],
    ['meta', { name: 'author', content: 'Johann Schopplich' }],
    ['meta', { property: 'og:type', content: 'website' }],
    ['meta', { property: 'og:url', content: ogUrl }],
    ['meta', { property: 'og:title', content: name }],
    ['meta', { property: 'og:description', content: description }],
    ['meta', { property: 'og:image', content: ogImage }],
    ['meta', { name: 'twitter:title', content: name }],
    ['meta', { name: 'twitter:description', content: description }],
    ['meta', { name: 'twitter:image', content: twitterImage }],
    ['meta', { name: 'twitter:site', content: '@jschopplich' }],
    ['meta', { name: 'twitter:creator', content: '@jschopplich' }],
    ['meta', { name: 'twitter:card', content: 'summary_large_image' }],
  ],

  vite: {
    // @ts-expect-error – UnoCSS types are not compatible with Vite yet
    plugins: [UnoCSS()],
  },

  themeConfig: {
    logo: {
      dark: '/logo-nav-dark.svg',
      light: '/logo-nav-light.svg',
    },

    nav: [
      {
        text: 'Guide',
        activeMatch: '^/guide/',
        items: [
          { text: 'Getting Started', link: '/guide/getting-started' },
          { text: 'Format Overview', link: '/guide/format-overview' },
          { text: 'Using TOON with LLMs', link: '/guide/llm-prompts' },
          { text: 'Benchmarks', link: '/guide/benchmarks' },
        ],
      },
      {
        text: 'CLI',
        link: '/cli/',
      },
      {
        text: 'Reference',
        activeMatch: '^/reference/',
        items: [
          { text: 'API', link: '/reference/api' },
          { text: 'Syntax Cheatsheet', link: '/reference/syntax-cheatsheet' },
          { text: 'Specification', link: '/reference/spec' },
        ],
      },
      {
        text: 'Ecosystem',
        activeMatch: '^/ecosystem/',
        items: [
          { text: 'Tools & Playgrounds', link: '/ecosystem/tools-and-playgrounds' },
          { text: 'Implementations', link: '/ecosystem/implementations' },
        ],
      },
      {
        text: `v${version}`,
        items: [
          {
            text: 'Release Notes',
            link: releases,
          },
        ],
      },
    ],

    sidebar: {
      '/guide/': sidebarPrimary(),
      '/cli/': sidebarPrimary(),
      '/reference/': sidebarPrimary(),
      '/ecosystem/': sidebarPrimary(),
    },

    socialLinks: [
      { icon: 'github', link: github },
    ],

    footer: {
      message: 'Released under the <a href="https://opensource.org/licenses/MIT" target="_blank">MIT License</a>.',
      copyright: 'Copyright © 2025-PRESENT <a href="https://github.com/johannschopplich" target="_blank">Johann Schopplich</a>',
    },

    search: {
      provider: 'local',
    },
  },
})

function sidebarPrimary(): DefaultTheme.SidebarItem[] {
|
||||
return [
|
||||
{
|
||||
text: 'Guide',
|
||||
items: [
|
||||
{ text: 'Getting Started', link: '/guide/getting-started' },
|
||||
{ text: 'Format Overview', link: '/guide/format-overview' },
|
||||
{ text: 'Using TOON with LLMs', link: '/guide/llm-prompts' },
|
||||
{ text: 'Benchmarks', link: '/guide/benchmarks' },
|
||||
],
|
||||
},
|
||||
{
|
||||
text: 'Tooling',
|
||||
items: [
|
||||
{ text: 'CLI Reference', link: '/cli/' },
|
||||
{ text: 'Tools & Playgrounds', link: '/ecosystem/tools-and-playgrounds' },
|
||||
],
|
||||
},
|
||||
{
|
||||
text: 'Ecosystem',
|
||||
items: [
|
||||
{ text: 'Implementations', link: '/ecosystem/implementations' },
|
||||
],
|
||||
},
|
||||
{
|
||||
text: 'Reference',
|
||||
items: [
|
||||
{ text: 'API (TypeScript)', link: '/reference/api' },
|
||||
{ text: 'Syntax Cheatsheet', link: '/reference/syntax-cheatsheet' },
|
||||
{ text: 'Specification', link: '/reference/spec' },
|
||||
],
|
||||
},
|
||||
]
|
||||
}
|
||||
12
docs/.vitepress/meta.ts
Normal file
@@ -0,0 +1,12 @@
|
||||
export { description, version } from '../../packages/toon/package.json'
|
||||
|
||||
/* VitePress head */
|
||||
export const name = 'TOON'
|
||||
export const ogUrl = 'https://toonformat.dev/'
|
||||
export const ogImage = `${ogUrl}og.png`
|
||||
export const twitterImage = `${ogUrl}twitter.png`
|
||||
|
||||
/* GitHub and social links */
|
||||
export const github = 'https://github.com/toon-format/toon'
|
||||
export const releases = 'https://github.com/toon-format/toon/releases'
|
||||
export const twitter = 'https://twitter.com/jschopplich'
|
||||
9
docs/.vitepress/theme/index.ts
Normal file
@@ -0,0 +1,9 @@
|
||||
import DefaultTheme from 'vitepress/theme'
|
||||
|
||||
import './vars.css'
|
||||
import './overrides.css'
|
||||
import 'uno.css'
|
||||
|
||||
export default {
|
||||
...DefaultTheme,
|
||||
}
|
||||
16
docs/.vitepress/theme/overrides.css
Normal file
@@ -0,0 +1,16 @@
|
||||
.dark [img-light] {
|
||||
display: none;
|
||||
}
|
||||
|
||||
html:not(.dark) [img-dark] {
|
||||
display: none;
|
||||
}
|
||||
|
||||
details summary {
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.VPHomeHero .image-src {
|
||||
max-width: 180px !important;
|
||||
max-height: 180px !important;
|
||||
}
|
||||
41
docs/.vitepress/theme/vars.css
Normal file
@@ -0,0 +1,41 @@
|
||||
/**
|
||||
* Colors Theme
|
||||
* -------------------------------------------------------------------------- */
|
||||
|
||||
:root {
|
||||
--vp-c-brand-1: #d97c06;
|
||||
--vp-c-brand-2: #C57105;
|
||||
--vp-c-brand-3: #B16505;
|
||||
--vp-nav-logo-height: 20px;
|
||||
}
|
||||
|
||||
/**
|
||||
* Component: Home
|
||||
* -------------------------------------------------------------------------- */
|
||||
|
||||
:root {
|
||||
--vp-home-hero-name-color: transparent;
|
||||
--vp-home-hero-name-background: -webkit-linear-gradient(
|
||||
120deg,
|
||||
#fde98a 15%,
|
||||
#d97c06
|
||||
);
|
||||
--vp-home-hero-image-background-image: linear-gradient(
|
||||
-45deg,
|
||||
#d97c0660 30%,
|
||||
#fde98a60
|
||||
);
|
||||
--vp-home-hero-image-filter: blur(30px);
|
||||
}
|
||||
|
||||
@media (min-width: 640px) {
|
||||
:root {
|
||||
--vp-home-hero-image-filter: blur(56px);
|
||||
}
|
||||
}
|
||||
|
||||
@media (min-width: 960px) {
|
||||
:root {
|
||||
--vp-home-hero-image-filter: blur(72px);
|
||||
}
|
||||
}
|
||||
268
docs/cli/index.md
Normal file
@@ -0,0 +1,268 @@
|
||||
# Command Line Interface
|
||||
|
||||
The `@toon-format/cli` package provides a command-line interface for encoding JSON to TOON and decoding TOON back to JSON. Use it for quick conversions without writing code, estimating token savings before sending data to LLMs, or integrating TOON into shell pipelines with tools like curl and jq. It supports stdin/stdout workflows, multiple delimiter options, token statistics, and all encoding/decoding features available in the library.
|
||||
|
||||
The CLI is built on top of the `@toon-format/toon` TypeScript implementation and adheres to the [latest specification](/reference/spec).
|
||||
|
||||
## Usage
|
||||
|
||||
### Without Installation
|
||||
|
||||
Use `npx` to run the CLI without installing:
|
||||
|
||||
::: code-group
|
||||
|
||||
```bash [Encode]
|
||||
npx @toon-format/cli input.json -o output.toon
|
||||
```
|
||||
|
||||
```bash [Decode]
|
||||
npx @toon-format/cli data.toon -o output.json
|
||||
```
|
||||
|
||||
```bash [Stdin]
|
||||
echo '{"name": "Ada"}' | npx @toon-format/cli
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
### Global Installation
|
||||
|
||||
Install the CLI globally for repeated use:
|
||||
|
||||
::: code-group
|
||||
|
||||
```bash [npm]
|
||||
npm install -g @toon-format/cli
|
||||
```
|
||||
|
||||
```bash [pnpm]
|
||||
pnpm add -g @toon-format/cli
|
||||
```
|
||||
|
||||
```bash [yarn]
|
||||
yarn global add @toon-format/cli
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
After global installation, use the `toon` command:
|
||||
|
||||
```bash
|
||||
toon input.json -o output.toon
|
||||
```
|
||||
|
||||
## Basic Usage
|
||||
|
||||
### Auto-Detection
|
||||
|
||||
The CLI automatically detects the operation based on file extension:
|
||||
- `.json` files → encode (JSON to TOON)
|
||||
- `.toon` files → decode (TOON to JSON)
|
||||
|
||||
When reading from stdin, use `--encode` or `--decode` flags to specify the operation (defaults to encode).
|
||||
|
||||
::: code-group
|
||||
|
||||
```bash [Encode JSON to TOON]
|
||||
toon input.json -o output.toon
|
||||
```
|
||||
|
||||
```bash [Decode TOON to JSON]
|
||||
toon data.toon -o output.json
|
||||
```
|
||||
|
||||
```bash [Output to stdout]
|
||||
toon input.json
|
||||
```
|
||||
|
||||
```bash [Pipe from stdin]
|
||||
cat data.json | toon
|
||||
echo '{"name": "Ada"}' | toon
|
||||
```
|
||||
|
||||
```bash [Decode from stdin]
|
||||
cat data.toon | toon --decode
|
||||
```
|
||||
|
||||
:::
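
The auto-detection rules above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the CLI's actual source: `detectMode` and `ModeFlags` are hypothetical names, and the behaviour for unknown extensions and for both flags being set is an assumption.

```ts
// Hypothetical helper that mirrors the documented rules; not taken from the CLI source.
type Mode = 'encode' | 'decode'

interface ModeFlags {
  encode?: boolean
  decode?: boolean
}

function detectMode(input: string | undefined, flags: ModeFlags = {}): Mode {
  // Explicit flags override auto-detection.
  // Assumption: if both flags are set, encode wins.
  if (flags.encode) return 'encode'
  if (flags.decode) return 'decode'

  // No argument or "-" means stdin, which defaults to encode.
  if (input === undefined || input === '-') return 'encode'

  if (input.endsWith('.toon')) return 'decode'
  if (input.endsWith('.json')) return 'encode'

  // Assumption: any other extension falls back to encode.
  return 'encode'
}
```

With this sketch, `detectMode('data.toon')` selects decoding, while `detectMode('-', { decode: true })` shows a flag overriding the stdin default.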
|
||||
|
||||
### Standard Input
|
||||
|
||||
Omit the input argument or use `-` to read from stdin. This enables piping data directly from other commands:
|
||||
|
||||
```bash
|
||||
# No argument needed
|
||||
cat data.json | toon
|
||||
|
||||
# Explicit stdin with hyphen (equivalent)
|
||||
cat data.json | toon -
|
||||
|
||||
# Decode from stdin
|
||||
cat data.toon | toon --decode
|
||||
```
|
||||
|
||||
## Options
|
||||
|
||||
| Option | Description |
|
||||
| ------ | ----------- |
|
||||
| `-o, --output <file>` | Output file path (prints to stdout if omitted) |
|
||||
| `-e, --encode` | Force encode mode (overrides auto-detection) |
|
||||
| `-d, --decode` | Force decode mode (overrides auto-detection) |
|
||||
| `--delimiter <char>` | Array delimiter: `,` (comma), `\t` (tab), `\|` (pipe) |
|
||||
| `--indent <number>` | Indentation size (default: `2`) |
|
||||
| `--stats` | Show token count estimates and savings (encode only) |
|
||||
| `--no-strict` | Disable strict validation when decoding |
|
||||
| `--key-folding <mode>` | Key folding mode: `off`, `safe` (default: `off`) |
|
||||
| `--flatten-depth <number>` | Maximum segments to fold (default: `Infinity`) – requires `--key-folding safe` |
|
||||
| `--expand-paths <mode>` | Path expansion mode: `off`, `safe` (default: `off`) |
|
||||
|
||||
## Advanced Examples
|
||||
|
||||
### Token Statistics
|
||||
|
||||
Show token savings when encoding:
|
||||
|
||||
```bash
|
||||
toon data.json --stats -o output.toon
|
||||
```
|
||||
|
||||
This helps you estimate token cost savings before sending data to LLMs.
|
||||
|
||||
### Alternative Delimiters
|
||||
|
||||
TOON supports three delimiters: comma (default), tab, and pipe. Alternative delimiters can provide additional token savings in specific contexts.
|
||||
|
||||
::: code-group
|
||||
|
||||
```bash [Tab-separated]
|
||||
toon data.json --delimiter "\t" -o output.toon
|
||||
```
|
||||
|
||||
```bash [Pipe-separated]
|
||||
toon data.json --delimiter "|" -o output.toon
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
**Tab delimiter example:**
|
||||
|
||||
::: code-group
|
||||
|
||||
```yaml [Tab]
|
||||
items[2 ]{id name qty price}:
|
||||
A1 Widget 2 9.99
|
||||
B2 Gadget 1 14.5
|
||||
```
|
||||
|
||||
```yaml [Comma (default)]
|
||||
items[2]{id,name,qty,price}:
|
||||
A1,Widget,2,9.99
|
||||
B2,Gadget,1,14.5
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
> [!TIP]
|
||||
> Tab delimiters often tokenize more efficiently than commas and reduce the need for quote-escaping. Use `--delimiter "\t"` for maximum token savings on large tabular data.
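
To make the effect of the delimiter concrete, here is a minimal sketch of how a uniform array could be rendered as a tabular block. `renderTable` is a hypothetical helper written for this page, not part of `@toon-format/toon`. It assumes uniform rows with primitive values, a two-space indent, and it skips the quoting and escaping that a real encoder needs.

```ts
type Primitive = string | number | boolean | null
type Delimiter = ',' | '\t' | '|'

// Hypothetical helper: renders uniform rows as a TOON-style tabular block.
function renderTable(
  key: string,
  rows: Record<string, Primitive>[],
  delimiter: Delimiter = ',',
): string {
  const fields = Object.keys(rows[0] ?? {})
  // As in the examples above, only a non-comma delimiter appears inside the brackets.
  const marker = delimiter === ',' ? '' : delimiter
  const header = `${key}[${rows.length}${marker}]{${fields.join(delimiter)}}:`
  // Assumption: values never contain the delimiter, so no quoting is applied.
  const lines = rows.map(row => `  ${fields.map(field => String(row[field])).join(delimiter)}`)
  return [header, ...lines].join('\n')
}

const items = [
  { id: 'A1', name: 'Widget', qty: 2, price: 9.99 },
  { id: 'B2', name: 'Gadget', qty: 1, price: 14.5 },
]

console.log(renderTable('items', items))
console.log(renderTable('items', items, '\t'))
```

The only differences between the two outputs are the separator characters and the delimiter marker in the header, which is why switching delimiters is cheap to try on your own data.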
|
||||
|
||||
### Lenient Decoding
|
||||
|
||||
Skip validation for faster processing:
|
||||
|
||||
```bash
|
||||
toon data.toon --no-strict -o output.json
|
||||
```
|
||||
|
||||
Lenient mode (`--no-strict`) disables strict validation checks like array count matching, indentation multiples, and delimiter consistency. Use this when you trust the input and want faster decoding.
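
As an illustration of one such check, the sketch below compares the declared array length with the number of rows that follow. `countRows` is a hypothetical function for this page and only handles a single tabular block; the real decoder's checks and error messages may differ.

```ts
// Hypothetical sketch of the "array count matching" check.
function countRows(block: string, strict = true): number {
  const [header = '', ...rest] = block.split('\n')
  // Capture N from a header such as `items[2]{id,name}:` (optional tab or pipe marker).
  const match = /^[^[\]]+\[(\d+)[\t|]?\]/.exec(header)
  if (!match) throw new Error('Not a tabular array header')

  const declared = Number(match[1])
  const rows = rest.filter(line => line.trim() !== '')

  // Strict mode rejects a mismatch; lenient mode accepts whatever rows are present.
  if (strict && rows.length !== declared) {
    throw new Error(`Expected ${declared} rows, but got ${rows.length}`)
  }
  return rows.length
}

const truncated = 'items[2]{id,name}:\n  1,Ada'
console.log(countRows(truncated, false))
```

Calling `countRows(truncated)` with the default strict setting throws, which is the kind of error that `--no-strict` suppresses.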
|
||||
|
||||
### Stdin Workflows
|
||||
|
||||
The CLI integrates seamlessly with Unix pipes and other command-line tools:
|
||||
|
||||
```bash
|
||||
# Convert API response to TOON
|
||||
curl https://api.example.com/data | toon --stats
|
||||
|
||||
# Process large dataset
|
||||
cat large-dataset.json | toon --delimiter "\t" > output.toon
|
||||
|
||||
# Chain with jq
|
||||
jq '.results' data.json | toon > filtered.toon
|
||||
```
|
||||
|
||||
### Key Folding
|
||||
|
||||
Collapse nested wrapper chains to reduce tokens (since spec v1.5):
|
||||
|
||||
::: code-group
|
||||
|
||||
```bash [Basic key folding]
|
||||
toon input.json --key-folding safe -o output.toon
|
||||
```
|
||||
|
||||
```bash [Limit folding depth]
|
||||
toon input.json --key-folding safe --flatten-depth 2 -o output.toon
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
**Example:**
|
||||
|
||||
For data like:
|
||||
|
||||
```json
|
||||
{
|
||||
"data": {
|
||||
"metadata": {
|
||||
"items": ["a", "b"]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
With `--key-folding safe`, output becomes:
|
||||
|
||||
```yaml
|
||||
data.metadata.items[2]: a,b
|
||||
```
|
||||
|
||||
Instead of:
|
||||
|
||||
```yaml
|
||||
data:
|
||||
metadata:
|
||||
items[2]: a,b
|
||||
```
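
The idea behind folding can be sketched as follows. This is a simplified illustration, not the library's implementation: `foldKeys` is a hypothetical name, it ignores `--flatten-depth`, leaves arrays untouched, and skips the checks that make the `safe` mode safe (for example, avoiding key collisions).

```ts
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Hypothetical sketch: collapse chains of single-key objects into dotted keys.
function foldKeys(value: Json): Json {
  if (!isPlainObject(value)) return value

  const result: { [key: string]: Json } = {}
  for (const [key, child] of Object.entries(value)) {
    let path = key
    let current = child
    // Follow the chain while each level is an object with exactly one key.
    while (isPlainObject(current) && Object.keys(current).length === 1) {
      const [nextKey, nextValue] = Object.entries(current)[0]!
      path = `${path}.${nextKey}`
      current = nextValue
    }
    result[path] = foldKeys(current)
  }
  return result
}

console.log(foldKeys({ data: { metadata: { items: ['a', 'b'] } } }))
```

For the JSON input above, the sketch produces a single `data.metadata.items` key holding the array, matching the folded output shown.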
|
||||
|
||||
### Path Expansion
|
||||
|
||||
Reconstruct nested structure from folded keys when decoding:
|
||||
|
||||
```bash
|
||||
toon data.toon --expand-paths safe -o output.json
|
||||
```
|
||||
|
||||
This pairs with `--key-folding safe` for lossless round-trips.
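
Conceptually, expansion is the inverse operation: dotted keys are split and nested objects are rebuilt. The sketch below is a simplified, hypothetical `expandPaths` that only handles top-level keys and throws on conflicts; how the real decoder handles conflicts and quoted keys is not covered here.

```ts
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Hypothetical sketch: rebuild nested objects from dotted top-level keys.
function expandPaths(flat: { [key: string]: Json }): { [key: string]: Json } {
  const root: { [key: string]: Json } = {}
  for (const [path, value] of Object.entries(flat)) {
    const segments = path.split('.')
    const last = segments.pop()!
    let node = root
    for (const segment of segments) {
      const existing = node[segment]
      if (existing === undefined) {
        // Create the intermediate object on first use.
        const child: { [key: string]: Json } = {}
        node[segment] = child
        node = child
      }
      else if (isPlainObject(existing)) {
        node = existing
      }
      else {
        // Assumption: a non-object in the way is treated as an error.
        throw new Error(`Cannot expand "${path}": "${segment}" is not an object`)
      }
    }
    node[last] = value
  }
  return root
}

console.log(JSON.stringify(expandPaths({ 'data.metadata.items': ['a', 'b'] })))
```

Applied to the folded key from the previous section, this restores the original `data` → `metadata` → `items` nesting.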
|
||||
|
||||
### Round-Trip Workflow
|
||||
|
||||
```bash
|
||||
# Encode with folding
|
||||
toon input.json --key-folding safe -o compressed.toon
|
||||
|
||||
# Decode with expansion (restores original structure)
|
||||
toon compressed.toon --expand-paths safe -o output.json
|
||||
|
||||
# Verify round-trip
|
||||
diff input.json output.json
|
||||
```
|
||||
|
||||
### Combined Options
|
||||
|
||||
Combine multiple options for maximum efficiency:
|
||||
|
||||
```bash
|
||||
# Key folding + tab delimiter + stats
|
||||
toon data.json --key-folding safe --delimiter "\t" --stats -o output.toon
|
||||
```
|
||||
52
docs/ecosystem/implementations.md
Normal file
@@ -0,0 +1,52 @@
|
||||
# Implementations
|
||||
|
||||
TOON has official and community implementations across multiple programming languages. All implementations are intended to conform to the same [specification](https://github.com/toon-format/spec) to ensure compatibility and interoperability.
|
||||
|
||||
The code examples throughout this documentation site use the TypeScript implementation by default, but the format and concepts apply equally to all languages.
|
||||
|
||||
> [!NOTE]
|
||||
> When implementing TOON in other languages, please follow the [specification](https://github.com/toon-format/spec/blob/main/SPEC.md) to ensure compatibility across implementations. The [conformance tests](https://github.com/toon-format/spec/tree/main/tests) provide language-agnostic test fixtures that validate your implementation.
|
||||
|
||||
## Official Implementations
|
||||
|
||||
These implementations are actively being developed by dedicated teams. Contributions are welcome! Join the effort by opening issues, submitting PRs, or discussing implementation details in the respective repositories.
|
||||
|
||||
| Language | Repository | Status | Maintainer |
|
||||
|----------|------------|--------|------------|
|
||||
| **.NET** | [toon-dotnet](https://github.com/toon-format/toon-dotnet) | In Development | Official Team |
|
||||
| **Dart** | [toon-dart](https://github.com/toon-format/toon-dart) | In Development | Official Team |
|
||||
| **Go** | [toon-go](https://github.com/toon-format/toon-go) | In Development | Official Team |
|
||||
| **Python** | [toon-python](https://github.com/toon-format/toon-python) | In Development | Official Team |
|
||||
| **Rust** | [toon-rust](https://github.com/toon-format/toon-rust) | In Development | Official Team |
|
||||
| **TypeScript/JavaScript** | [toon](https://github.com/toon-format/toon/tree/main/packages/toon) | ✅ Stable | Official Team |
|
||||
|
||||
## Community Implementations
|
||||
|
||||
Community members have created implementations in additional languages:
|
||||
|
||||
| Language | Repository | Maintainer |
|
||||
|----------|------------|------------|
|
||||
| **C++** | [ctoon](https://github.com/mohammadraziei/ctoon) | [@mohammadraziei](https://github.com/mohammadraziei) |
|
||||
| **Clojure** | [toon](https://github.com/vadelabs/toon) | [@vadelabs](https://github.com/vadelabs) |
|
||||
| **Crystal** | [toon-crystal](https://github.com/mamantoha/toon-crystal) | [@mamantoha](https://github.com/mamantoha) |
|
||||
| **Elixir** | [toon_ex](https://github.com/kentaro/toon_ex) | [@kentaro](https://github.com/kentaro) |
|
||||
| **Gleam** | [toon_codec](https://github.com/axelbellec/toon_codec) | [@axelbellec](https://github.com/axelbellec) |
|
||||
| **Go** | [gotoon](https://github.com/alpkeskin/gotoon) | [@alpkeskin](https://github.com/alpkeskin) |
|
||||
| **Java** | [JToon](https://github.com/felipestanzani/JToon) | [@felipestanzani](https://github.com/felipestanzani) |
|
||||
| **Kotlin** | [kotlin-toon](https://github.com/vexpera-br/kotlin-toon) | [@vexpera-br](https://github.com/vexpera-br) |
|
||||
| **Lua/Neovim** | [toon.nvim](https://github.com/thalesgelinger/toon.nvim) | [@thalesgelinger](https://github.com/thalesgelinger) |
|
||||
| **OCaml** | [ocaml-toon](https://github.com/davesnx/ocaml-toon) | [@davesnx](https://github.com/davesnx) |
|
||||
| **PHP** | [toon-php](https://github.com/HelgeSverre/toon-php) | [@HelgeSverre](https://github.com/HelgeSverre) |
|
||||
| **R** | [toon](https://github.com/laresbernardo/toon) | [@laresbernardo](https://github.com/laresbernardo) |
|
||||
| **Ruby** | [toon-ruby](https://github.com/andrepcg/toon-ruby) | [@andrepcg](https://github.com/andrepcg) |
|
||||
| **Scala** | [toon4s](https://github.com/vim89/toon4s) | [@vim89](https://github.com/vim89) |
|
||||
| **Swift** | [TOONEncoder](https://github.com/mattt/TOONEncoder) | [@mattt](https://github.com/mattt) |
|
||||
|
||||
## Contributing an Implementation
|
||||
|
||||
Building a TOON implementation for a new language? Great! Here are some steps to get started:
|
||||
|
||||
1. **Follow the spec**: Implement the [latest specification](https://github.com/toon-format/spec/blob/main/SPEC.md).
|
||||
2. **Add tests**: Run the [reference test suite](https://github.com/toon-format/spec/tree/main/tests).
|
||||
3. **Document usage**: Provide a clear README with installation and usage examples.
|
||||
4. **Share it**: Open a PR to add your implementation to the README at [github.com/toon-format/toon](https://github.com/toon-format/toon).
|
||||
42
docs/ecosystem/tools-and-playgrounds.md
Normal file
@@ -0,0 +1,42 @@
|
||||
# Tools & Playgrounds
|
||||
|
||||
Experiment with TOON format interactively using these community-built tools for token comparison, format conversion, and validation.
|
||||
|
||||
## Playgrounds
|
||||
|
||||
The following playgrounds are maintained by the community:
|
||||
|
||||
- [Format Tokenization Playground](https://www.curiouslychase.com/playground/format-tokenization-exploration)
|
||||
- [TOON Tools](https://toontools.vercel.app/)
|
||||
|
||||
## CLI Tool
|
||||
|
||||
The official TOON CLI provides command-line conversion, token statistics, and all encoding/decoding features. See the [CLI reference](/cli/) for full documentation.
|
||||
|
||||
```bash
|
||||
npx @toon-format/cli input.json --stats -o output.toon
|
||||
```
|
||||
|
||||
## Editor Support
|
||||
|
||||
Editor support for TOON is still limited. The following options are currently available:
|
||||
|
||||
- **VS Code**: Use YAML syntax highlighting as a close approximation (`.toon` files can be associated with YAML language mode).
|
||||
- **Vim/Neovim**: [toon.nvim](https://github.com/thalesgelinger/toon.nvim)
|
||||
|
||||
> [!NOTE]
|
||||
> Native TOON syntax highlighting extensions are in development. Contributions welcome!
|
||||
|
||||
## Web APIs
|
||||
|
||||
If you're building web applications that work with TOON, you can use the TypeScript library in the browser:
|
||||
|
||||
```ts
|
||||
import { decode, encode } from '@toon-format/toon'
|
||||
|
||||
// Works in browsers, Node.js, Deno, and Bun
|
||||
const toon = encode({ name: 'Ada' })
|
||||
const data = decode(toon)
|
||||
```
|
||||
|
||||
See the [API reference](/reference/api) for details.
|
||||
579
docs/guide/benchmarks.md
Normal file
@@ -0,0 +1,579 @@
|
||||
# Benchmarks
|
||||
|
||||
The benchmarks on this page measure TOON's performance across two key dimensions:
|
||||
|
||||
- **Retrieval Accuracy**: How well LLMs understand and extract information from different input formats.
|
||||
- **Token Efficiency**: How many tokens each format requires to represent the same data.
|
||||
|
||||
Benchmarks are organized into two tracks to ensure fair comparisons:
|
||||
|
||||
- **Mixed-Structure Track**: Datasets with nested or semi-uniform structures (TOON vs JSON, YAML, XML). CSV is excluded because it cannot properly represent these structures.
|
||||
- **Flat-Only Track**: Datasets with flat tabular structures where CSV is applicable (CSV vs TOON vs JSON, YAML, XML).
|
||||
|
||||
## Retrieval Accuracy
|
||||
|
||||
<!-- automd:file src="../../benchmarks/results/retrieval-accuracy.md" -->
|
||||
|
||||
Benchmarks test LLM comprehension across different input formats using 209 data retrieval questions on 4 models.
|
||||
|
||||
<details>
|
||||
<summary><strong>Show Dataset Catalog</strong></summary>
|
||||
|
||||
#### Dataset Catalog
|
||||
|
||||
| Dataset | Rows | Structure | CSV Support | Eligibility |
|
||||
| ------- | ---- | --------- | ----------- | ----------- |
|
||||
| Uniform employee records | 100 | uniform | ✓ | 100% |
|
||||
| E-commerce orders with nested structures | 50 | nested | ✗ | 33% |
|
||||
| Time-series analytics data | 60 | uniform | ✓ | 100% |
|
||||
| Top 100 GitHub repositories | 100 | uniform | ✓ | 100% |
|
||||
| Semi-uniform event logs | 75 | semi-uniform | ✗ | 50% |
|
||||
| Deeply nested configuration | 11 | deep | ✗ | 0% |
|
||||
| Valid complete dataset (control) | 20 | uniform | ✓ | 100% |
|
||||
| Array truncated: 3 rows removed from end | 17 | uniform | ✓ | 100% |
|
||||
| Extra rows added beyond declared length | 23 | uniform | ✓ | 100% |
|
||||
| Inconsistent field count (missing salary in row 10) | 20 | uniform | ✓ | 100% |
|
||||
| Missing required fields (no email in multiple rows) | 20 | uniform | ✓ | 100% |
|
||||
|
||||
**Structure classes:**
|
||||
- **uniform**: All objects have identical fields with primitive values
|
||||
- **semi-uniform**: Mix of uniform and non-uniform structures
|
||||
- **nested**: Objects with nested structures (nested objects or arrays)
|
||||
- **deep**: Highly nested with minimal tabular eligibility
|
||||
|
||||
**CSV Support:** ✓ (supported), ✗ (not supported – would require lossy flattening)
|
||||
|
||||
**Eligibility:** Percentage of arrays that qualify for TOON's tabular format (uniform objects with primitive values)
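
The eligibility criterion can be expressed as a small predicate. This is a sketch under stated assumptions, not the benchmark's code: `isTabularEligible` is a hypothetical name, key order is ignored, and an empty array is treated as not eligible.

```ts
function isPrimitive(value: unknown): boolean {
  return value === null || ['string', 'number', 'boolean'].includes(typeof value)
}

// Hypothetical check: uniform objects (same keys) with primitive values only.
function isTabularEligible(items: unknown[]): boolean {
  // Assumption: an empty array does not count as tabular.
  if (items.length === 0) return false

  let expectedKeys: string | undefined
  for (const item of items) {
    if (typeof item !== 'object' || item === null || Array.isArray(item)) return false

    const record = item as Record<string, unknown>
    // Compare sorted key sets so that key order does not matter.
    const keys = JSON.stringify(Object.keys(record).sort())
    if (expectedKeys === undefined) expectedKeys = keys
    else if (keys !== expectedKeys) return false

    if (!Object.values(record).every(isPrimitive)) return false
  }
  return true
}

console.log(isTabularEligible([{ id: 1, name: 'Ada' }, { id: 2, name: 'Bob' }]))
console.log(isTabularEligible([{ id: 1, tags: ['a', 'b'] }]))
```

A dataset's eligibility percentage would then be the share of its arrays for which such a predicate holds.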
|
||||
|
||||
</details>
|
||||
|
||||
#### Efficiency Ranking (Accuracy per 1K Tokens)
|
||||
|
||||
Each format's overall performance, balancing accuracy against token cost:
|
||||
|
||||
```
|
||||
TOON ████████████████████ 26.9 │ 73.9% acc │ 2,744 tokens
|
||||
JSON compact █████████████████░░░ 22.9 │ 70.7% acc │ 3,081 tokens
|
||||
YAML ██████████████░░░░░░ 18.6 │ 69.0% acc │ 3,719 tokens
|
||||
JSON ███████████░░░░░░░░░ 15.3 │ 69.7% acc │ 4,545 tokens
|
||||
XML ██████████░░░░░░░░░░ 13.0 │ 67.1% acc │ 5,167 tokens
|
||||
```
|
||||
|
||||
TOON achieves **73.9%** accuracy (vs JSON's 69.7%) while using **39.6% fewer tokens**.
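
The ranking figures appear to be accuracy (in percent) divided by thousands of tokens, and the savings figure is the relative token reduction against JSON. The sketch below reproduces both from the numbers in the chart; the function names are illustrative and the formulas are inferred from the published values rather than taken from the benchmark code.

```ts
interface FormatResult {
  format: string
  accuracy: number // percent
  tokens: number
}

// Values copied from the efficiency ranking above.
const results: FormatResult[] = [
  { format: 'TOON', accuracy: 73.9, tokens: 2744 },
  { format: 'JSON compact', accuracy: 70.7, tokens: 3081 },
  { format: 'YAML', accuracy: 69.0, tokens: 3719 },
  { format: 'JSON', accuracy: 69.7, tokens: 4545 },
  { format: 'XML', accuracy: 67.1, tokens: 5167 },
]

// Inferred formula: accuracy points per 1,000 tokens.
function accuracyPer1kTokens(result: FormatResult): number {
  return result.accuracy / (result.tokens / 1000)
}

// Relative token reduction of `candidate` against `baseline`, in percent.
function tokenSavingsPercent(baseline: number, candidate: number): number {
  return (1 - candidate / baseline) * 100
}

for (const result of results) {
  console.log(`${result.format}: ${accuracyPer1kTokens(result).toFixed(1)}`)
}
console.log(`${tokenSavingsPercent(4545, 2744).toFixed(1)}% fewer tokens than JSON`)
```

Rounded to one decimal place, the scores match the chart (26.9, 22.9, 18.6, 15.3, 13.0), and the savings figure matches the stated 39.6%.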
|
||||
|
||||
**Note on CSV:** CSV is excluded from the ranking because it supports only 109 of the 209 questions (flat tabular data only). While CSV is highly token-efficient for simple tabular data, it cannot represent the nested structures that the other formats handle.
|
||||
|
||||
#### Per-Model Accuracy
|
||||
|
||||
Accuracy across 4 LLMs on 209 data retrieval questions:
|
||||
|
||||
```
|
||||
claude-haiku-4-5-20251001
|
||||
→ TOON ████████████░░░░░░░░ 59.8% (125/209)
|
||||
JSON ███████████░░░░░░░░░ 57.4% (120/209)
|
||||
YAML ███████████░░░░░░░░░ 56.0% (117/209)
|
||||
XML ███████████░░░░░░░░░ 55.5% (116/209)
|
||||
JSON compact ███████████░░░░░░░░░ 55.0% (115/209)
|
||||
CSV ██████████░░░░░░░░░░ 50.5% (55/109)
|
||||
|
||||
gemini-2.5-flash
|
||||
→ TOON ██████████████████░░ 87.6% (183/209)
|
||||
CSV █████████████████░░░ 86.2% (94/109)
|
||||
JSON compact ████████████████░░░░ 82.3% (172/209)
|
||||
YAML ████████████████░░░░ 79.4% (166/209)
|
||||
XML ████████████████░░░░ 79.4% (166/209)
|
||||
JSON ███████████████░░░░░ 77.0% (161/209)
|
||||
|
||||
gpt-5-nano
|
||||
→ TOON ██████████████████░░ 90.9% (190/209)
|
||||
JSON compact ██████████████████░░ 90.9% (190/209)
|
||||
JSON ██████████████████░░ 89.0% (186/209)
|
||||
CSV ██████████████████░░ 89.0% (97/109)
|
||||
YAML █████████████████░░░ 87.1% (182/209)
|
||||
XML ████████████████░░░░ 80.9% (169/209)
|
||||
|
||||
grok-4-fast-non-reasoning
|
||||
→ TOON ███████████░░░░░░░░░ 57.4% (120/209)
|
||||
JSON ███████████░░░░░░░░░ 55.5% (116/209)
|
||||
JSON compact ███████████░░░░░░░░░ 54.5% (114/209)
|
||||
YAML ███████████░░░░░░░░░ 53.6% (112/209)
|
||||
XML ███████████░░░░░░░░░ 52.6% (110/209)
|
||||
CSV ██████████░░░░░░░░░░ 52.3% (57/109)
|
||||
```
|
||||
|
||||
**Key tradeoff:** TOON achieves **73.9% accuracy** (vs JSON's 69.7%) while using **39.6% fewer tokens** on these datasets.
|
||||
|
||||
<details>
|
||||
<summary><strong>Performance by dataset, model, and question type</strong></summary>
|
||||
|
||||
#### Performance by Question Type
|
||||
|
||||
| Question Type | TOON | JSON compact | JSON | CSV | YAML | XML |
|
||||
| ------------- | ---- | ---- | ---- | ---- | ---- | ---- |
|
||||
| Field Retrieval | 99.6% | 99.3% | 99.3% | 100.0% | 98.2% | 98.9% |
|
||||
| Aggregation | 54.4% | 47.2% | 48.8% | 44.0% | 47.6% | 41.3% |
|
||||
| Filtering | 56.3% | 57.3% | 50.5% | 49.1% | 51.0% | 47.9% |
|
||||
| Structure Awareness | 88.0% | 83.0% | 83.0% | 85.9% | 80.0% | 80.0% |
|
||||
| Structural Validation | 70.0% | 45.0% | 50.0% | 80.0% | 60.0% | 80.0% |
|
||||
|
||||
#### Performance by Dataset
|
||||
|
||||
##### Uniform employee records
|
||||
|
||||
| Format | Accuracy | Tokens | Correct/Total |
|
||||
| ------ | -------- | ------ | ------------- |
|
||||
| `csv` | 72.0% | 2,352 | 118/164 |
|
||||
| `toon` | 73.8% | 2,518 | 121/164 |
|
||||
| `json-compact` | 69.5% | 3,953 | 114/164 |
|
||||
| `yaml` | 68.3% | 4,982 | 112/164 |
|
||||
| `json-pretty` | 68.3% | 6,360 | 112/164 |
|
||||
| `xml` | 69.5% | 7,324 | 114/164 |
|
||||
|
||||
##### E-commerce orders with nested structures
|
||||
|
||||
| Format | Accuracy | Tokens | Correct/Total |
|
||||
| ------ | -------- | ------ | ------------- |
|
||||
| `toon` | 81.1% | 7,232 | 133/164 |
|
||||
| `json-compact` | 76.8% | 6,794 | 126/164 |
|
||||
| `yaml` | 75.6% | 8,347 | 124/164 |
|
||||
| `json-pretty` | 76.2% | 10,713 | 125/164 |
|
||||
| `xml` | 74.4% | 12,023 | 122/164 |
|
||||
|
||||
##### Time-series analytics data
|
||||
|
||||
| Format | Accuracy | Tokens | Correct/Total |
|
||||
| ------ | -------- | ------ | ------------- |
|
||||
| `csv` | 73.3% | 1,406 | 88/120 |
|
||||
| `toon` | 72.5% | 1,548 | 87/120 |
|
||||
| `json-compact` | 71.7% | 2,349 | 86/120 |
|
||||
| `yaml` | 71.7% | 2,949 | 86/120 |
|
||||
| `json-pretty` | 68.3% | 3,676 | 82/120 |
|
||||
| `xml` | 68.3% | 4,384 | 82/120 |
|
||||
|
||||
##### Top 100 GitHub repositories
|
||||
|
||||
| Format | Accuracy | Tokens | Correct/Total |
|
||||
| ------ | -------- | ------ | ------------- |
|
||||
| `toon` | 62.9% | 8,779 | 83/132 |
|
||||
| `csv` | 61.4% | 8,527 | 81/132 |
|
||||
| `yaml` | 59.8% | 13,141 | 79/132 |
|
||||
| `json-compact` | 55.3% | 11,464 | 73/132 |
|
||||
| `json-pretty` | 56.1% | 15,157 | 74/132 |
|
||||
| `xml` | 48.5% | 17,105 | 64/132 |
|
||||
|
||||
##### Semi-uniform event logs
|
||||
|
||||
| Format | Accuracy | Tokens | Correct/Total |
|
||||
| ------ | -------- | ------ | ------------- |
|
||||
| `json-compact` | 63.3% | 4,819 | 76/120 |
|
||||
| `toon` | 57.5% | 5,799 | 69/120 |
|
||||
| `json-pretty` | 59.2% | 6,797 | 71/120 |
|
||||
| `yaml` | 48.3% | 5,827 | 58/120 |
|
||||
| `xml` | 46.7% | 7,709 | 56/120 |
|
||||
|
||||
##### Deeply nested configuration
|
||||
|
||||
| Format | Accuracy | Tokens | Correct/Total |
|
||||
| ------ | -------- | ------ | ------------- |
|
||||
| `json-compact` | 92.2% | 574 | 107/116 |
|
||||
| `toon` | 95.7% | 666 | 111/116 |
|
||||
| `yaml` | 91.4% | 686 | 106/116 |
|
||||
| `json-pretty` | 94.0% | 932 | 109/116 |
|
||||
| `xml` | 92.2% | 1,018 | 107/116 |
|
||||
|
||||
##### Valid complete dataset (control)
|
||||
|
||||
| Format | Accuracy | Tokens | Correct/Total |
|
||||
| ------ | -------- | ------ | ------------- |
|
||||
| `toon` | 100.0% | 544 | 4/4 |
|
||||
| `json-compact` | 100.0% | 795 | 4/4 |
|
||||
| `yaml` | 100.0% | 1,003 | 4/4 |
|
||||
| `json-pretty` | 100.0% | 1,282 | 4/4 |
|
||||
| `csv` | 25.0% | 492 | 1/4 |
|
||||
| `xml` | 0.0% | 1,467 | 0/4 |
|
||||
|
||||
##### Array truncated: 3 rows removed from end
|
||||
|
||||
| Format | Accuracy | Tokens | Correct/Total |
|
||||
| ------ | -------- | ------ | ------------- |
|
||||
| `csv` | 100.0% | 425 | 4/4 |
|
||||
| `xml` | 100.0% | 1,251 | 4/4 |
|
||||
| `toon` | 0.0% | 474 | 0/4 |
|
||||
| `json-compact` | 0.0% | 681 | 0/4 |
|
||||
| `json-pretty` | 0.0% | 1,096 | 0/4 |
|
||||
| `yaml` | 0.0% | 859 | 0/4 |
|
||||
|
||||
##### Extra rows added beyond declared length
|
||||
|
||||
| Format | Accuracy | Tokens | Correct/Total |
|
||||
| ------ | -------- | ------ | ------------- |
|
||||
| `csv` | 100.0% | 566 | 4/4 |
|
||||
| `toon` | 75.0% | 621 | 3/4 |
|
||||
| `xml` | 100.0% | 1,692 | 4/4 |
|
||||
| `yaml` | 75.0% | 1,157 | 3/4 |
|
||||
| `json-compact` | 50.0% | 917 | 2/4 |
|
||||
| `json-pretty` | 50.0% | 1,476 | 2/4 |
|
||||
|
||||
##### Inconsistent field count (missing salary in row 10)
|
||||
|
||||
| Format | Accuracy | Tokens | Correct/Total |
|
||||
| ------ | -------- | ------ | ------------- |
|
||||
| `csv` | 75.0% | 489 | 3/4 |
|
||||
| `yaml` | 100.0% | 996 | 4/4 |
|
||||
| `toon` | 100.0% | 1,019 | 4/4 |
|
||||
| `json-compact` | 75.0% | 790 | 3/4 |
|
||||
| `xml` | 100.0% | 1,458 | 4/4 |
|
||||
| `json-pretty` | 75.0% | 1,274 | 3/4 |
|
||||
|
||||
##### Missing required fields (no email in multiple rows)
|
||||
|
||||
| Format | Accuracy | Tokens | Correct/Total |
|
||||
| ------ | -------- | ------ | ------------- |
|
||||
| `csv` | 100.0% | 329 | 4/4 |
|
||||
| `xml` | 100.0% | 1,411 | 4/4 |
|
||||
| `toon` | 75.0% | 983 | 3/4 |
|
||||
| `yaml` | 25.0% | 960 | 1/4 |
|
||||
| `json-pretty` | 25.0% | 1,230 | 1/4 |
|
||||
| `json-compact` | 0.0% | 755 | 0/4 |
|
||||
|
||||
#### Performance by Model
|
||||
|
||||
##### claude-haiku-4-5-20251001
|
||||
|
||||
| Format | Accuracy | Correct/Total |
|
||||
| ------ | -------- | ------------- |
|
||||
| `toon` | 59.8% | 125/209 |
|
||||
| `json-pretty` | 57.4% | 120/209 |
|
||||
| `yaml` | 56.0% | 117/209 |
|
||||
| `xml` | 55.5% | 116/209 |
|
||||
| `json-compact` | 55.0% | 115/209 |
|
||||
| `csv` | 50.5% | 55/109 |
|
||||
|
||||
##### gemini-2.5-flash
|
||||
|
||||
| Format | Accuracy | Correct/Total |
|
||||
| ------ | -------- | ------------- |
|
||||
| `toon` | 87.6% | 183/209 |
|
||||
| `csv` | 86.2% | 94/109 |
|
||||
| `json-compact` | 82.3% | 172/209 |
|
||||
| `yaml` | 79.4% | 166/209 |
|
||||
| `xml` | 79.4% | 166/209 |
|
||||
| `json-pretty` | 77.0% | 161/209 |
|
||||
|
||||
##### gpt-5-nano
|
||||
|
||||
| Format | Accuracy | Correct/Total |
|
||||
| ------ | -------- | ------------- |
|
||||
| `toon` | 90.9% | 190/209 |
|
||||
| `json-compact` | 90.9% | 190/209 |
|
||||
| `json-pretty` | 89.0% | 186/209 |
|
||||
| `csv` | 89.0% | 97/109 |
|
||||
| `yaml` | 87.1% | 182/209 |
|
||||
| `xml` | 80.9% | 169/209 |
|
||||
|
||||
##### grok-4-fast-non-reasoning
|
||||
|
||||
| Format | Accuracy | Correct/Total |
|
||||
| ------ | -------- | ------------- |
|
||||
| `toon` | 57.4% | 120/209 |
|
||||
| `json-pretty` | 55.5% | 116/209 |
|
||||
| `json-compact` | 54.5% | 114/209 |
|
||||
| `yaml` | 53.6% | 112/209 |
|
||||
| `xml` | 52.6% | 110/209 |
|
||||
| `csv` | 52.3% | 57/109 |
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><strong>How the benchmark works</strong></summary>
|
||||
|
||||
#### What's Being Measured
|
||||
|
||||
This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it. This does **not** test the model's ability to generate TOON output – only to read and understand it.
|
||||
|
||||
#### Datasets Tested
|
||||
|
||||
Eleven datasets designed to test different structural patterns and validation capabilities:
|
||||
|
||||
**Primary datasets:**
|
||||
1. **Tabular** (100 employee records): Uniform objects with identical fields – optimal for TOON's tabular format.
|
||||
2. **Nested** (50 e-commerce orders): Complex structures with nested customer objects and item arrays.
|
||||
3. **Analytics** (60 days of metrics): Time-series data with dates and numeric values.
|
||||
4. **GitHub** (100 repositories): Real-world data from top GitHub repos by stars.
|
||||
5. **Event Logs** (75 logs): Semi-uniform data with ~50% flat logs and ~50% with nested error objects.
|
||||
6. **Nested Config** (1 configuration): Deeply nested configuration with minimal tabular eligibility.
|
||||
|
||||
**Structural validation datasets:**
|
||||
7. **Control**: Valid complete dataset (baseline for validation)
|
||||
8. **Truncated**: Array with 3 rows removed from end (tests [N] length detection)
|
||||
9. **Extra rows**: Array with 3 additional rows beyond declared length
|
||||
10. **Width mismatch**: Inconsistent field count (missing salary in row 10)
|
||||
11. **Missing fields**: Systematic field omissions (no email in multiple rows)
|
||||
|
||||
#### Question Types
|
||||
|
||||
209 questions are generated dynamically across five categories:
|
||||
|
||||
- **Field retrieval (33%)**: Direct value lookups or values that can be read straight off a record (including booleans and simple counts such as array lengths)
|
||||
- Example: "What is Alice's salary?" → `75000`
|
||||
- Example: "How many items are in order ORD-0042?" → `3`
|
||||
- Example: "What is the customer name for order ORD-0042?" → `John Doe`
|
||||
|
||||
- **Aggregation (30%)**: Dataset-level totals and averages plus single-condition filters (counts, sums, min/max comparisons)
|
||||
- Example: "How many employees work in Engineering?" → `17`
|
||||
- Example: "What is the total revenue across all orders?" → `45123.50`
|
||||
- Example: "How many employees have salary > 80000?" → `23`
|
||||
|
||||
- **Filtering (23%)**: Multi-condition queries requiring compound logic (AND constraints across fields)
|
||||
- Example: "How many employees in Sales have salary > 80000?" → `5`
|
||||
- Example: "How many active employees have more than 10 years of experience?" → `8`
|
||||
|
||||
- **Structure awareness (12%)**: Tests format-native structural affordances (TOON's [N] count and {fields}, CSV's header row)
|
||||
- Example: "How many employees are in the dataset?" → `100`
|
||||
- Example: "List the field names for employees" → `id, name, email, department, salary, yearsExperience, active`
|
||||
- Example: "What is the department of the last employee?" → `Sales`
|
||||
|
||||
- **Structural validation (2%)**: Tests ability to detect incomplete, truncated, or corrupted data using structural metadata
|
||||
- Example: "Is this data complete and valid?" → `YES` (control dataset) or `NO` (corrupted datasets)
|
||||
- Tests TOON's [N] length validation and {fields} consistency checking
|
||||
- Demonstrates CSV's lack of structural validation capabilities
|
||||
|
||||
#### Evaluation Process
|
||||
|
||||
1. **Format conversion**: Each dataset is converted to all 6 formats (TOON, compact JSON, pretty-printed JSON, CSV, YAML, XML).
|
||||
2. **Query LLM**: Each model receives formatted data + question in a prompt and extracts the answer.
|
||||
3. **Validate deterministically**: Answers are validated using type-aware comparison (e.g., `50000` = `$50,000`, `Engineering` = `engineering`, `2025-01-01` = `January 1, 2025`) without requiring an LLM judge.
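
The type-aware comparison in step 3 can be sketched roughly as follows. This is an illustration only, not the benchmark's actual validator: `toNumber` and `answersMatch` are hypothetical names, and date equivalence (such as `2025-01-01` versus `January 1, 2025`) is deliberately left out.

```ts
// Hypothetical helpers; not taken from the benchmark source.

// Strip currency symbols, thousands separators, and whitespace, then return
// a number if what remains is a plain decimal, otherwise null.
function toNumber(raw: string): number | null {
  const cleaned = raw.replace(/[$€£,\s]/g, '')
  if (!/^-?\d+(\.\d+)?$/.test(cleaned)) return null
  return Number(cleaned)
}

// Compare numerically when both sides parse as numbers,
// otherwise as trimmed, case-insensitive strings.
function answersMatch(expected: string, actual: string): boolean {
  const a = toNumber(expected)
  const b = toNumber(actual)
  if (a !== null && b !== null) return a === b
  return expected.trim().toLowerCase() === actual.trim().toLowerCase()
}

console.log(answersMatch('50000', '$50,000'))           // true
console.log(answersMatch('Engineering', 'engineering')) // true
console.log(answersMatch('17', '18'))                   // false
```

Because the comparison is deterministic, the same model answer always receives the same verdict, which is what removes the need for an LLM judge.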
|
||||
|
||||
#### Models & Configuration
|
||||
|
||||
- **Models tested**: `claude-haiku-4-5-20251001`, `gemini-2.5-flash`, `gpt-5-nano`, `grok-4-fast-non-reasoning`
|
||||
- **Token counting**: Using `gpt-tokenizer` with `o200k_base` encoding (GPT-5 tokenizer)
|
||||
- **Temperature**: Not set (models use their defaults)
|
||||
- **Total evaluations**: 209 questions × 6 formats × 4 models = 5,016 LLM calls
|
||||
|
||||
</details>
|
||||
|
||||
<!-- /automd -->
|
||||
|
||||
> [!NOTE]
|
||||
> **Key takeaway:** TOON achieves **73.9% accuracy** (vs JSON's 69.7%) while using **39.6% fewer tokens** on these datasets. The explicit structure (array lengths `[N]` and field lists `{fields}`) helps models track and validate data more reliably.
|
||||
|
||||
## Token Efficiency
|
||||
|
||||
Token counts are measured using the GPT-5 `o200k_base` tokenizer via [`gpt-tokenizer`](https://github.com/niieani/gpt-tokenizer). Savings are calculated against formatted JSON (2-space indentation) as the primary baseline, with additional comparisons to compact JSON (minified), YAML, and XML. Actual savings vary by model and tokenizer.
|
||||
|
||||
The benchmarks test datasets across different structural patterns (uniform, semi-uniform, nested, deeply nested) to show where TOON excels and where other formats may be better.
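
The percentages in the charts are relative differences between TOON's token count and each baseline. A minimal sketch of that arithmetic, using the e-commerce orders figures from the results (the helper names `relativeDiff` and `formatDiff` are illustrative, not part of the benchmark code):

```ts
// Signed relative difference of TOON against a baseline, in percent.
// Negative means TOON uses fewer tokens than the baseline.
function relativeDiff(toonTokens: number, baselineTokens: number): number {
  return ((toonTokens - baselineTokens) / baselineTokens) * 100
}

// Format with one decimal and an explicit sign (U+2212 minus, as in the charts).
function formatDiff(pct: number): string {
  const sign = pct < 0 ? '\u2212' : '+'
  return sign + Math.abs(pct).toFixed(1) + '%'
}

console.log(formatDiff(relativeDiff(72771, 108806))) // −33.1% (vs JSON)
console.log(formatDiff(relativeDiff(72771, 68975)))  // +5.5% (vs JSON compact)
```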
|
||||
|
||||
<!-- automd:file src="../../benchmarks/results/token-efficiency.md" -->
|
||||
|
||||
#### Mixed-Structure Track
|
||||
|
||||
Datasets with nested or semi-uniform structures. CSV is excluded because it cannot properly represent these structures.
|
||||
|
||||
```
|
||||
🛒 E-commerce orders with nested structures ┊ Tabular: 33%
|
||||
│
|
||||
TOON █████████████░░░░░░░ 72,771 tokens
|
||||
├─ vs JSON (−33.1%) 108,806 tokens
|
||||
├─ vs JSON compact (+5.5%) 68,975 tokens
|
||||
├─ vs YAML (−14.2%) 84,780 tokens
|
||||
└─ vs XML (−40.5%) 122,406 tokens
|
||||
|
||||
🧾 Semi-uniform event logs ┊ Tabular: 50%
|
||||
│
|
||||
TOON █████████████████░░░ 153,211 tokens
|
||||
├─ vs JSON (−15.0%) 180,176 tokens
|
||||
├─ vs JSON compact (+19.9%) 127,731 tokens
|
||||
├─ vs YAML (−0.8%) 154,505 tokens
|
||||
└─ vs XML (−25.2%) 204,777 tokens
|
||||
|
||||
🧩 Deeply nested configuration ┊ Tabular: 0%
|
||||
│
|
||||
TOON ██████████████░░░░░░ 631 tokens
|
||||
├─ vs JSON (−31.3%) 919 tokens
|
||||
├─ vs JSON compact (+11.9%) 564 tokens
|
||||
├─ vs YAML (−6.2%) 673 tokens
|
||||
└─ vs XML (−37.4%) 1,008 tokens
|
||||
|
||||
──────────────────────────────────── Total ────────────────────────────────────
|
||||
TOON ████████████████░░░░ 226,613 tokens
|
||||
├─ vs JSON (−21.8%) 289,901 tokens
|
||||
├─ vs JSON compact (+14.9%) 197,270 tokens
|
||||
├─ vs YAML (−5.6%) 239,958 tokens
|
||||
└─ vs XML (−31.0%) 328,191 tokens
|
||||
```
|
||||
|
||||
#### Flat-Only Track
|
||||
|
||||
Datasets with flat tabular structures where CSV is applicable.
|
||||
|
||||
```
|
||||
👥 Uniform employee records ┊ Tabular: 100%
|
||||
│
|
||||
CSV ███████████████████░ 46,954 tokens
|
||||
TOON ████████████████████ 49,831 tokens (+6.1% vs CSV)
|
||||
├─ vs JSON (−60.7%) 126,860 tokens
|
||||
├─ vs JSON compact (−36.8%) 78,856 tokens
|
||||
├─ vs YAML (−50.0%) 99,706 tokens
|
||||
└─ vs XML (−66.0%) 146,444 tokens
|
||||
|
||||
📈 Time-series analytics data ┊ Tabular: 100%
|
||||
│
|
||||
CSV ██████████████████░░ 8,388 tokens
|
||||
TOON ████████████████████ 9,120 tokens (+8.7% vs CSV)
|
||||
├─ vs JSON (−59.0%) 22,250 tokens
|
||||
├─ vs JSON compact (−35.8%) 14,216 tokens
|
||||
├─ vs YAML (−48.9%) 17,863 tokens
|
||||
└─ vs XML (−65.7%) 26,621 tokens
|
||||
|
||||
⭐ Top 100 GitHub repositories ┊ Tabular: 100%
|
||||
│
|
||||
CSV ███████████████████░ 8,513 tokens
|
||||
TOON ████████████████████ 8,745 tokens (+2.7% vs CSV)
|
||||
├─ vs JSON (−42.3%) 15,145 tokens
|
||||
├─ vs JSON compact (−23.7%) 11,455 tokens
|
||||
├─ vs YAML (−33.4%) 13,129 tokens
|
||||
└─ vs XML (−48.8%) 17,095 tokens
|
||||
|
||||
──────────────────────────────────── Total ────────────────────────────────────
|
||||
CSV ███████████████████░ 63,855 tokens
|
||||
TOON ████████████████████ 67,696 tokens (+6.0% vs CSV)
|
||||
├─ vs JSON (−58.8%) 164,255 tokens
|
||||
├─ vs JSON compact (−35.2%) 104,527 tokens
|
||||
├─ vs YAML (−48.2%) 130,698 tokens
|
||||
└─ vs XML (−64.4%) 190,160 tokens
|
||||
```
|
||||
|
||||
<details>
|
||||
<summary><strong>Show detailed examples</strong></summary>
|
||||
|
||||
#### 📈 Time-series analytics data
|
||||
|
||||
**Savings:** 13,130 tokens (59.0% reduction vs JSON)
|
||||
|
||||
**JSON** (22,250 tokens):
|
||||
|
||||
```json
|
||||
{
|
||||
"metrics": [
|
||||
{
|
||||
"date": "2025-01-01",
|
||||
"views": 5715,
|
||||
"clicks": 211,
|
||||
"conversions": 28,
|
||||
"revenue": 7976.46,
|
||||
"bounceRate": 0.47
|
||||
},
|
||||
{
|
||||
"date": "2025-01-02",
|
||||
"views": 7103,
|
||||
"clicks": 393,
|
||||
"conversions": 28,
|
||||
"revenue": 8360.53,
|
||||
"bounceRate": 0.32
|
||||
},
|
||||
{
|
||||
"date": "2025-01-03",
|
||||
"views": 7248,
|
||||
"clicks": 378,
|
||||
"conversions": 24,
|
||||
"revenue": 3212.57,
|
||||
"bounceRate": 0.5
|
||||
},
|
||||
{
|
||||
"date": "2025-01-04",
|
||||
"views": 2927,
|
||||
"clicks": 77,
|
||||
"conversions": 11,
|
||||
"revenue": 1211.69,
|
||||
"bounceRate": 0.62
|
||||
},
|
||||
{
|
||||
"date": "2025-01-05",
|
||||
"views": 3530,
|
||||
"clicks": 82,
|
||||
"conversions": 8,
|
||||
"revenue": 462.77,
|
||||
"bounceRate": 0.56
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**TOON** (9,120 tokens):
|
||||
|
||||
```
|
||||
metrics[5]{date,views,clicks,conversions,revenue,bounceRate}:
|
||||
2025-01-01,5715,211,28,7976.46,0.47
|
||||
2025-01-02,7103,393,28,8360.53,0.32
|
||||
2025-01-03,7248,378,24,3212.57,0.5
|
||||
2025-01-04,2927,77,11,1211.69,0.62
|
||||
2025-01-05,3530,82,8,462.77,0.56
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
#### ⭐ Top 100 GitHub repositories
|
||||
|
||||
**Savings:** 6,400 tokens (42.3% reduction vs JSON)
|
||||
|
||||
**JSON** (15,145 tokens):
|
||||
|
||||
```json
|
||||
{
|
||||
"repositories": [
|
||||
{
|
||||
"id": 28457823,
|
||||
"name": "freeCodeCamp",
|
||||
"repo": "freeCodeCamp/freeCodeCamp",
|
||||
"description": "freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…",
|
||||
"createdAt": "2014-12-24T17:49:19Z",
|
||||
"updatedAt": "2025-10-28T11:58:08Z",
|
||||
"pushedAt": "2025-10-28T10:17:16Z",
|
||||
"stars": 430886,
|
||||
"watchers": 8583,
|
||||
"forks": 42146,
|
||||
"defaultBranch": "main"
|
||||
},
|
||||
{
|
||||
"id": 132750724,
|
||||
"name": "build-your-own-x",
|
||||
"repo": "codecrafters-io/build-your-own-x",
|
||||
"description": "Master programming by recreating your favorite technologies from scratch.",
|
||||
"createdAt": "2018-05-09T12:03:18Z",
|
||||
"updatedAt": "2025-10-28T12:37:11Z",
|
||||
"pushedAt": "2025-10-10T18:45:01Z",
|
||||
"stars": 430877,
|
||||
"watchers": 6332,
|
||||
"forks": 40453,
|
||||
"defaultBranch": "master"
|
||||
},
|
||||
{
|
||||
"id": 21737465,
|
||||
"name": "awesome",
|
||||
"repo": "sindresorhus/awesome",
|
||||
"description": "😎 Awesome lists about all kinds of interesting topics",
|
||||
"createdAt": "2014-07-11T13:42:37Z",
|
||||
"updatedAt": "2025-10-28T12:40:21Z",
|
||||
"pushedAt": "2025-10-27T17:57:31Z",
|
||||
"stars": 410052,
|
||||
"watchers": 8017,
|
||||
"forks": 32029,
|
||||
"defaultBranch": "main"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**TOON** (8,745 tokens):
|
||||
|
||||
```
|
||||
repositories[3]{id,name,repo,description,createdAt,updatedAt,pushedAt,stars,watchers,forks,defaultBranch}:
|
||||
28457823,freeCodeCamp,freeCodeCamp/freeCodeCamp,"freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…","2014-12-24T17:49:19Z","2025-10-28T11:58:08Z","2025-10-28T10:17:16Z",430886,8583,42146,main
|
||||
132750724,build-your-own-x,codecrafters-io/build-your-own-x,Master programming by recreating your favorite technologies from scratch.,"2018-05-09T12:03:18Z","2025-10-28T12:37:11Z","2025-10-10T18:45:01Z",430877,6332,40453,master
|
||||
21737465,awesome,sindresorhus/awesome,😎 Awesome lists about all kinds of interesting topics,"2014-07-11T13:42:37Z","2025-10-28T12:40:21Z","2025-10-27T17:57:31Z",410052,8017,32029,main
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
<!-- /automd -->
|
||||
299
docs/guide/format-overview.md
Normal file
@@ -0,0 +1,299 @@
|
||||
# Format Overview
|
||||
|
||||
TOON syntax reference with concrete examples. See [Getting Started](/guide/getting-started) for an introduction.
|
||||
|
||||
## Data Model
|
||||
|
||||
TOON models data the same way as JSON:
|
||||
|
||||
- **Primitives**: strings, numbers, booleans, and `null`
|
||||
- **Objects**: mappings from string keys to values
|
||||
- **Arrays**: ordered sequences of values
|
||||
|
||||
### Root Forms
|
||||
|
||||
A TOON document can represent different root forms:
|
||||
|
||||
- **Root object** (most common): Fields appear at depth 0 with no parent key
|
||||
- **Root array**: Begins with `[N]:` or `[N]{fields}:` at depth 0
|
||||
- **Root primitive**: A single primitive value (string, number, boolean, or null)
|
||||
|
||||
Most examples in these docs use root objects, but the format supports all three forms equally ([spec §5](https://github.com/toon-format/spec/blob/main/SPEC.md#5-concrete-syntax-and-root-form)).
|
||||
|
||||
## Objects
|
||||
|
||||
### Simple Objects
|
||||
|
||||
Objects with primitive values use `key: value` syntax, with one field per line:
|
||||
|
||||
```yaml
|
||||
id: 123
|
||||
name: Ada
|
||||
active: true
|
||||
```
|
||||
|
||||
Indentation replaces braces. One space follows the colon.
|
||||
|
||||
### Nested Objects
|
||||
|
||||
Nested objects add one indentation level (default: 2 spaces):
|
||||
|
||||
```yaml
|
||||
user:
|
||||
id: 123
|
||||
name: Ada
|
||||
```
|
||||
|
||||
When a key ends with `:` and has no value on the same line, it opens a nested object. All lines at the next indentation level belong to that object.
|
||||
|
||||
### Empty Objects
|
||||
|
||||
An empty object at the root yields an empty document (no lines). A nested empty object is `key:` alone, with no children.
|
||||
|
||||
## Arrays
|
||||
|
||||
TOON detects array structure and chooses the most efficient representation. Arrays always declare their length in brackets: `[N]`.
|
||||
|
||||
### Primitive Arrays (Inline)
|
||||
|
||||
Arrays of primitives (strings, numbers, booleans, null) are rendered inline:
|
||||
|
||||
```yaml
|
||||
tags[3]: admin,ops,dev
|
||||
```
|
||||
|
||||
The delimiter (comma by default) separates values. Strings containing the active delimiter must be quoted.
|
||||
|
||||
### Arrays of Objects (Tabular)
|
||||
|
||||
When all objects in an array share the same set of primitive-valued keys, TOON uses tabular format:
|
||||
|
||||
::: code-group
|
||||
|
||||
```yaml [Basic Tabular]
|
||||
items[2]{sku,qty,price}:
|
||||
A1,2,9.99
|
||||
B2,1,14.5
|
||||
```
|
||||
|
||||
```yaml [With Spaces in Values]
|
||||
users[2]{id,name,role}:
|
||||
1,Alice Admin,admin
|
||||
2,"Bob Smith",user
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
The header `items[2]{sku,qty,price}:` declares:
|
||||
- **Array length**: `[2]` means 2 rows
|
||||
- **Field names**: `{sku,qty,price}` defines the columns
|
||||
- **Active delimiter**: comma (default)
|
||||
|
||||
Each row contains values in the same order as the field list. Values are encoded as primitives (strings, numbers, booleans, null) and separated by the delimiter.
|
||||
|
||||
> [!NOTE]
|
||||
> Tabular format requires identical field sets across all objects (same keys, order per object may vary) and primitive values only (no nested arrays/objects).
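
The eligibility rule in the note can be expressed as a small check. The sketch below is an assumption about how such a test might look, not the library's implementation; `tabularFields` is a hypothetical helper that returns the shared field list, or `null` when the array has to fall back to list format.

```ts
type Primitive = string | number | boolean | null
type JsonValue = Primitive | JsonValue[] | { [key: string]: JsonValue }

function isPrimitive(value: JsonValue): value is Primitive {
  return value === null || typeof value !== 'object'
}

// Returns the field names (in the first object's key order) if every element
// is an object with the same key set and only primitive values; else null.
function tabularFields(items: JsonValue[]): string[] | null {
  const first = items[0]
  if (first === undefined || isPrimitive(first) || Array.isArray(first)) return null
  const fields = Object.keys(first)
  for (const item of items) {
    if (isPrimitive(item) || Array.isArray(item)) return null
    if (Object.keys(item).length !== fields.length) return null
    for (const field of fields) {
      const value = item[field]
      // A missing key or a nested array/object disqualifies the whole array.
      if (value === undefined || !isPrimitive(value)) return null
    }
  }
  return fields
}

// Key order differs between the two objects, but the key sets are identical.
console.log(tabularFields([{ sku: 'A1', qty: 2 }, { qty: 1, sku: 'B2' }])) // [ 'sku', 'qty' ]
console.log(tabularFields([{ id: 1, tags: ['a'] }]))                       // null
```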
|
||||
|
||||
### Mixed and Non-Uniform Arrays
|
||||
|
||||
Arrays that don't meet the tabular requirements use list format with hyphen markers:
|
||||
|
||||
```yaml
|
||||
items[3]:
|
||||
- 1
|
||||
- a: 1
|
||||
- text
|
||||
```
|
||||
|
||||
Each element starts with `- ` at one indentation level deeper than the parent array header.
|
||||
|
||||
### Arrays of Arrays
|
||||
|
||||
Arrays whose elements are themselves primitive arrays use list format, with each inner array written inline:
|
||||
|
||||
```yaml
|
||||
pairs[2]:
|
||||
- [2]: 1,2
|
||||
- [2]: 3,4
|
||||
```
|
||||
|
||||
Each inner array gets its own header on the list-item line.
|
||||
|
||||
### Empty Arrays
|
||||
|
||||
Empty arrays keep their header and declare a length of zero:
|
||||
|
||||
```yaml
|
||||
items[0]:
|
||||
```
|
||||
|
||||
The header declares length zero, with no elements following.
|
||||
|
||||
## Array Headers
|
||||
|
||||
### Header Syntax
|
||||
|
||||
Array headers follow this pattern:
|
||||
|
||||
```
|
||||
key[N<delimiter?>]<{fields}>:
|
||||
```
|
||||
|
||||
Where:
|
||||
- **N** is the non-negative integer length
|
||||
- **delimiter** (optional) explicitly declares the active delimiter:
|
||||
- Absent → comma (`,`)
|
||||
- `\t` (tab character) → tab delimiter
|
||||
- `|` → pipe delimiter
|
||||
- **fields** (optional) for tabular arrays: `{field1,field2,field3}`
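
To make the header pattern concrete, here is a rough parsing sketch. It is simplified on purpose: it assumes unquoted keys and field names, and `parseArrayHeader` is a hypothetical helper rather than an export of the TOON library.

```ts
interface ArrayHeader {
  key: string
  length: number
  delimiter: ',' | '\t' | '|'
  fields: string[] | null
}

// key, [N with optional tab or pipe], optional {fields}, then a colon.
const HEADER = /^([^\[\]{}:"]*)\[(\d+)([\t|]?)\](?:\{([^}]*)\})?:/

function parseArrayHeader(line: string): ArrayHeader | null {
  const match = HEADER.exec(line.trim())
  if (match === null) return null
  // An absent delimiter marker means comma.
  const delimiter: ArrayHeader['delimiter'] =
    match[3] === '\t' ? '\t' : match[3] === '|' ? '|' : ','
  const fieldList = match[4]
  return {
    key: match[1] ?? '',
    length: Number(match[2]),
    delimiter,
    fields: fieldList === undefined ? null : fieldList.split(delimiter),
  }
}

console.log(parseArrayHeader('items[2]{sku,qty,price}:'))
// { key: 'items', length: 2, delimiter: ',', fields: [ 'sku', 'qty', 'price' ] }
console.log(parseArrayHeader('name: Ada')) // null
```

Note that the field list is split on the same delimiter the brackets declare, which is why a pipe-delimited header writes `{sku|name}` rather than `{sku,name}`.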
|
||||
|
||||
> [!TIP]
|
||||
> The array length `[N]` helps LLMs validate structure. If you ask a model to generate TOON output, explicit lengths let you detect truncation or malformed data.
|
||||
|
||||
### Delimiter Options
|
||||
|
||||
TOON supports three delimiters: comma (default), tab, and pipe. The delimiter is scoped to the array header that declares it.
|
||||
|
||||
::: code-group
|
||||
|
||||
```yaml [Comma (default)]
|
||||
items[2]{sku,name,qty,price}:
|
||||
A1,Widget,2,9.99
|
||||
B2,Gadget,1,14.5
|
||||
```
|
||||
|
||||
```yaml [Tab]
|
||||
items[2 ]{sku name qty price}:
|
||||
A1 Widget 2 9.99
|
||||
B2 Gadget 1 14.5
|
||||
```
|
||||
|
||||
```yaml [Pipe]
|
||||
items[2|]{sku|name|qty|price}:
|
||||
A1|Widget|2|9.99
|
||||
B2|Gadget|1|14.5
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
Tab and pipe delimiters are explicitly encoded in the header brackets and field braces. Commas don't require quoting when tab or pipe is active, and vice versa.
|
||||
|
||||
> [!TIP]
|
||||
> Tab delimiters often tokenize more efficiently than commas, especially for data with few quoted strings. Use `encode(data, { delimiter: '\t' })` for additional token savings.
|
||||
|
||||
## Key Folding (Optional)
|
||||
|
||||
Key folding is an optional encoder feature (since spec v1.5) that collapses chains of single-key objects into dotted paths, reducing tokens for deeply nested data.
|
||||
|
||||
### Basic Folding
|
||||
|
||||
Standard nesting:
|
||||
|
||||
```yaml
|
||||
data:
|
||||
metadata:
|
||||
items[2]: a,b
|
||||
```
|
||||
|
||||
With key folding (`keyFolding: 'safe'`):
|
||||
|
||||
```yaml
|
||||
data.metadata.items[2]: a,b
|
||||
```
|
||||
|
||||
The chain `data` → `metadata` → `items` collapses into a single dotted key, `data.metadata.items`.
|
||||
|
||||
### When Folding Applies
|
||||
|
||||
A chain of objects is foldable when:
|
||||
- Each object in the chain has exactly one key (leading to the next object or a leaf value)
|
||||
- The leaf value is a primitive, array, or empty object
|
||||
- All segments are valid identifier segments (letters, digits, underscores only; no dots within segments)
|
||||
- The resulting folded key doesn't collide with existing keys
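
The four conditions above can be read as a short algorithm. The following sketch is one possible reading, not the encoder's real code: `foldChain` is a hypothetical helper, it ignores the `flattenDepth` option, and it returns `null` whenever any condition fails.

```ts
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }

// Safe-mode segment pattern: letters, digits, underscores; no dots.
const SEGMENT = /^[A-Za-z_][A-Za-z0-9_]*$/

function isObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Walk down a chain of single-key objects starting at `key`.
function foldChain(
  key: string,
  value: Json,
  siblingKeys: string[] = [],
): { path: string; leaf: Json } | null {
  const segments = [key]
  let current: Json = value
  while (isObject(current)) {
    const keys = Object.keys(current)
    if (keys.length !== 1) break
    const next = keys[0] as string
    segments.push(next)
    current = current[next] as Json
  }
  // The leaf must be a primitive, an array, or an empty object.
  if (isObject(current) && Object.keys(current).length > 0) return null
  if (segments.length < 2) return null // nothing to fold
  if (!segments.every(segment => SEGMENT.test(segment))) return null
  const path = segments.join('.')
  if (siblingKeys.indexOf(path) !== -1) return null // would collide with a literal key
  return { path, leaf: current }
}

console.log(foldChain('data', { metadata: { items: ['a', 'b'] } }))
// { path: 'data.metadata.items', leaf: [ 'a', 'b' ] }
console.log(foldChain('user-info', { name: 'Ada' })) // null (hyphen in segment)
```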
|
||||
|
||||
::: details Advanced Folding Rules
|
||||
**Segment Requirements (safe mode):**
|
||||
- All folded segments must match `^[A-Za-z_][A-Za-z0-9_]*$` (no dots, hyphens, or other special characters)
|
||||
- No segment may require quoting per §7.3 of the spec
|
||||
- The resulting folded key must not equal any existing sibling literal key at the same depth (collision avoidance)
|
||||
|
||||
**Depth Limit:**
|
||||
- The `flattenDepth` option (default: `Infinity`) controls how many segments to fold
|
||||
- `flattenDepth: 2` folds only two-segment chains: `{a: {b: val}}` → `a.b: val`
|
||||
- Values less than 2 have no practical effect
|
||||
|
||||
**Round-Trip with Path Expansion:**
|
||||
To reconstruct the original structure when decoding, use `expandPaths: 'safe'`. This splits dotted keys back into nested objects using the same safety rules ([spec §13.4](https://github.com/toon-format/spec/blob/main/SPEC.md#134-key-folding-and-path-expansion)).
|
||||
:::
|
||||
|
||||
### Round-Trip with Path Expansion
|
||||
|
||||
When decoding TOON that used key folding, enable path expansion to restore the nested structure:
|
||||
|
||||
```ts
|
||||
import { decode, encode } from '@toon-format/toon'
|
||||
|
||||
const original = { data: { metadata: { items: ['a', 'b'] } } }
|
||||
|
||||
// Encode with folding
|
||||
const toon = encode(original, { keyFolding: 'safe' })
|
||||
// → "data.metadata.items[2]: a,b"
|
||||
|
||||
// Decode with expansion
|
||||
const restored = decode(toon, { expandPaths: 'safe' })
|
||||
// → { data: { metadata: { items: ['a', 'b'] } } }
|
||||
```
|
||||
|
||||
Path expansion is off by default, so dotted keys are treated as literal keys unless explicitly enabled.
|
||||
|
||||
## Quoting and Types
|
||||
|
||||
### When Strings Need Quotes
|
||||
|
||||
TOON quotes strings **only when necessary** to maximize token efficiency. A string must be quoted if:
|
||||
|
||||
- It's empty (`""`)
|
||||
- It has leading or trailing whitespace
|
||||
- It equals `true`, `false`, or `null` (case-sensitive)
|
||||
- It looks like a number (e.g., `"42"`, `"-3.14"`, `"1e-6"`, or `"05"` with leading zeros)
|
||||
- It contains special characters: colon (`:`), quote (`"`), backslash (`\`), brackets, braces, or control characters (newline, tab, carriage return)
|
||||
- It contains the relevant delimiter (the active delimiter inside an array scope, or the document delimiter elsewhere)
|
||||
- It equals `"-"` or starts with `"-"` followed by any character
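
As a sketch of how the rules above combine, the function below checks them in order. It is an approximation written for illustration (`needsQuotes` is a hypothetical name), and the specification remains the authority for edge cases.

```ts
// Returns true if `value` must be quoted under the rules listed above.
// `delimiter` is the delimiter that applies where the string appears.
function needsQuotes(value: string, delimiter: string = ','): boolean {
  if (value === '') return true                                      // empty
  if (value !== value.trim()) return true                            // leading/trailing whitespace
  if (value === 'true' || value === 'false' || value === 'null') return true
  if (/^-?\d+(\.\d+)?([eE][+-]?\d+)?$/.test(value)) return true      // number-like, incl. "05"
  if (/[:"\\\[\]{}\n\r\t]/.test(value)) return true                  // special characters
  if (value.indexOf(delimiter) !== -1) return true                   // relevant delimiter
  if (value.charAt(0) === '-') return true                           // "-" or "-..."
  return false
}

console.log(needsQuotes('Hello 世界 👋')) // false
console.log(needsQuotes('05'))           // true
console.log(needsQuotes('a,b', '|'))     // false (comma is not the active delimiter)
```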
|
||||
|
||||
Otherwise, strings can be unquoted. Unicode, emoji, and strings with internal (non-leading/trailing) spaces are safe unquoted:
|
||||
|
||||
```yaml
|
||||
message: Hello 世界 👋
|
||||
note: This has inner spaces
|
||||
```
|
||||
|
||||
### Escape Sequences
|
||||
|
||||
In quoted strings and keys, only five escape sequences are valid:
|
||||
|
||||
| Character | Escape |
|-----------|--------|
| Backslash (`\`) | `\\` |
| Double quote (`"`) | `\"` |
| Newline (U+000A) | `\n` |
| Carriage return (U+000D) | `\r` |
| Tab (U+0009) | `\t` |
|
||||
|
||||
All other escape sequences (e.g., `\x`, `\u`) are invalid and will cause an error in strict mode.
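
A decoder can enforce this with a lookup over the five permitted sequences. The sketch below shows the idea under the assumption that the surrounding quotes have already been removed; `unescapeQuoted` is an illustrative name, not a library function.

```ts
// The five escapes the format allows, keyed by the character after the backslash.
const ESCAPES: Record<string, string> = {
  '\\': '\\',
  '"': '"',
  n: '\n',
  r: '\r',
  t: '\t',
}

// Decode the body of a quoted string; throw on any other escape sequence.
function unescapeQuoted(body: string): string {
  let out = ''
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i)
    if (ch !== '\\') {
      out += ch
      continue
    }
    const next = body.charAt(i + 1)
    const decoded = ESCAPES[next]
    if (next === '' || decoded === undefined) {
      throw new SyntaxError(`Invalid escape sequence: \\${next}`)
    }
    out += decoded
    i++ // skip the escaped character
  }
  return out
}

console.log(unescapeQuoted('say \\"hi\\"')) // say "hi"
```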
|
||||
|
||||
### Type Conversions
|
||||
|
||||
Numbers are emitted in canonical decimal form (no exponent notation, no trailing zeros). Non-JSON types are normalized before encoding:
|
||||
|
||||
| Input | Output |
|-------|--------|
| Finite number | Canonical decimal (e.g., `1e6` → `1000000`, `1.5000` → `1.5`, `-0` → `0`) |
| `NaN`, `Infinity`, `-Infinity` | `null` |
| `BigInt` (within safe range) | Number |
| `BigInt` (out of range) | Quoted decimal string (e.g., `"9007199254740993"`) |
| `Date` | ISO string in quotes (e.g., `"2025-01-01T00:00:00.000Z"`) |
| `undefined`, `function`, `symbol` | `null` |
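
Part of this table can be sketched as a normalization step that runs before encoding. The version below is a partial illustration: `normalizePrimitive` is a hypothetical helper, it leaves out the two `BigInt` rows, and it does not perform canonical decimal formatting of the resulting numbers.

```ts
type Normalized = string | number | boolean | null

// Map a JavaScript value onto the JSON data model, following the table rows
// for non-finite numbers, negative zero, Date, and unsupported types.
function normalizePrimitive(value: unknown): Normalized {
  if (value === null || typeof value === 'string' || typeof value === 'boolean') {
    return value
  }
  if (typeof value === 'number') {
    if (!isFinite(value)) return null // NaN, Infinity, -Infinity
    return value === 0 ? 0 : value    // -0 === 0 is true, so -0 becomes 0
  }
  if (value instanceof Date) return value.toISOString()
  return null // undefined, functions, symbols (and anything else in this sketch)
}

console.log(normalizePrimitive(NaN))         // null
console.log(normalizePrimitive(new Date(0))) // 1970-01-01T00:00:00.000Z
console.log(normalizePrimitive(undefined))   // null
```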
|
||||
|
||||
Decoders accept both decimal and exponent forms on input (e.g., `42`, `-3.14`, `1e-6`), and treat tokens with forbidden leading zeros (e.g., `"05"`) as strings, not numbers.
|
||||
|
||||
For complete rules on quoting, escaping, type conversions, and strict-mode decoding, see [spec §2–4 (data model), §7 (strings and keys), and §14 (strict mode)](https://github.com/toon-format/spec/blob/main/SPEC.md).
|
||||
239
docs/guide/getting-started.md
Normal file
@@ -0,0 +1,239 @@
|
||||
# Getting Started
|
||||
|
||||
## What is TOON?
|
||||
|
||||
**Token-Oriented Object Notation** is a compact, human-readable encoding of the JSON data model for LLM prompts. It provides a lossless serialization of the same objects, arrays, and primitives as JSON, but in a syntax that minimizes tokens and makes structure easy for models to follow.
|
||||
|
||||
TOON combines YAML's indentation-based structure for nested objects with a CSV-style tabular layout for uniform arrays. TOON's sweet spot is uniform arrays of objects (multiple fields per row, same structure across items), achieving CSV-like compactness while adding explicit structure that helps LLMs parse and validate data reliably.
|
||||
|
||||
Think of it as a translation layer: use JSON programmatically, and encode it as TOON for LLM input.
|
||||
|
||||
### Why TOON?
|
||||
|
||||
Standard JSON is verbose and token-expensive. For uniform arrays of objects, JSON repeats every field name for every record:
|
||||
|
||||
```json
|
||||
{
|
||||
"users": [
|
||||
{ "id": 1, "name": "Alice", "role": "admin" },
|
||||
{ "id": 2, "name": "Bob", "role": "user" }
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
YAML already reduces some redundancy with indentation instead of braces:
|
||||
|
||||
```yaml
|
||||
users:
|
||||
- id: 1
|
||||
name: Alice
|
||||
role: admin
|
||||
- id: 2
|
||||
name: Bob
|
||||
role: user
|
||||
```
|
||||
|
||||
TOON goes further by declaring fields once and streaming data as rows:
|
||||
|
||||
```yaml
|
||||
users[2]{id,name,role}:
|
||||
1,Alice,admin
|
||||
2,Bob,user
|
||||
```
|
||||
|
||||
The `[2]` declares the array length, enabling LLMs to answer dataset size questions and detect truncation. The `{id,name,role}` declares the field names. Each row is then a compact, comma-separated list of values. This is the core pattern: declare structure once, stream data compactly. The format approaches CSV's efficiency while adding explicit structure.
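
The "declare structure once, stream data compactly" pattern is small enough to sketch by hand. The function below is a toy illustration, not the library's `encode`: `encodeTabular` is a hypothetical name, it assumes flat objects with identical keys, and it writes values as-is without any quoting or escaping.

```ts
type Cell = string | number | boolean | null
type Row = { [field: string]: Cell }

// Emit one header line (key, length, field names) followed by one line per row.
function encodeTabular(key: string, rows: Row[], indent: string = '  '): string {
  const first = rows[0]
  if (first === undefined) return `${key}[0]:`
  const fields = Object.keys(first)
  const header = `${key}[${rows.length}]{${fields.join(',')}}:`
  const lines = rows.map(row => indent + fields.map(field => String(row[field])).join(','))
  return [header, ...lines].join('\n')
}

const users: Row[] = [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' },
]

console.log(encodeTabular('users', users))
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user
```

Field names appear once in the header regardless of how many rows follow, which is where the savings over JSON and YAML come from.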
|
||||
|
||||
For a more realistic example, here's how TOON handles a dataset with both nested objects and tabular arrays:
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON (235 tokens)]
|
||||
{
|
||||
"context": {
|
||||
"task": "Our favorite hikes together",
|
||||
"location": "Boulder",
|
||||
"season": "spring_2025"
|
||||
},
|
||||
"friends": ["ana", "luis", "sam"],
|
||||
"hikes": [
|
||||
{
|
||||
"id": 1,
|
||||
"name": "Blue Lake Trail",
|
||||
"distanceKm": 7.5,
|
||||
"elevationGain": 320,
|
||||
"companion": "ana",
|
||||
"wasSunny": true
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"name": "Ridge Overlook",
|
||||
"distanceKm": 9.2,
|
||||
"elevationGain": 540,
|
||||
"companion": "luis",
|
||||
"wasSunny": false
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
"name": "Wildflower Loop",
|
||||
"distanceKm": 5.1,
|
||||
"elevationGain": 180,
|
||||
"companion": "sam",
|
||||
"wasSunny": true
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [TOON (106 tokens)]
|
||||
context:
|
||||
task: Our favorite hikes together
|
||||
location: Boulder
|
||||
season: spring_2025
|
||||
friends[3]: ana,luis,sam
|
||||
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
|
||||
1,Blue Lake Trail,7.5,320,ana,true
|
||||
2,Ridge Overlook,9.2,540,luis,false
|
||||
3,Wildflower Loop,5.1,180,sam,true
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
Notice how TOON combines YAML's indentation for the `context` object with inline format for the primitive `friends` array and tabular format for the structured `hikes` array. Each format is chosen automatically based on the data structure.
|
||||
|
||||
### Design Goals
|
||||
|
||||
TOON is optimized for specific use cases. It aims to:
|
||||
|
||||
- Make uniform arrays of objects as compact as possible by declaring structure once and streaming data.
|
||||
- Stay fully lossless and deterministic – round-trips preserve all data and structure.
|
||||
- Keep parsing simple and robust for both LLMs and humans through explicit structure markers.
|
||||
- Provide validation guardrails (array lengths, field counts) that help detect truncation and malformed output.
|
||||
|
||||
## When to Use TOON
|
||||
|
||||
TOON excels with uniform arrays of objects – data with the same structure across items. For LLM prompts, the format produces deterministic, minimally quoted text with built-in validation. Explicit array lengths (`[N]`) and field headers (`{fields}`) help detect truncation and malformed data, while the tabular structure declares fields once rather than repeating them in every row.
|
||||
|
||||
::: tip Production Ready
|
||||
TOON is production-ready and actively maintained, with implementations in TypeScript, Python, Go, Rust, .NET, and more. The format is stable, but also an idea in progress. Nothing's set in stone – help shape where it goes by contributing to the [specification](https://github.com/toon-format/spec) or sharing feedback.
|
||||
:::
|
||||
|
||||
## When Not to Use TOON
|
||||
|
||||
TOON is not always the best choice. Consider alternatives when:
|
||||
|
||||
- **Deeply nested or non-uniform structures** (tabular eligibility ≈ 0%): JSON-compact often uses fewer tokens. Example: complex configuration objects with many nested levels.
|
||||
- **Semi-uniform arrays** (~40–60% tabular eligibility): Token savings diminish. Prefer JSON if your pipelines already rely on it.
|
||||
- **Pure tabular data**: CSV is smaller than TOON for flat tables. TOON adds minimal overhead (~5–10%) to provide structure (array length declarations, field headers, delimiter scoping) that improves LLM reliability.
|
||||
- **Latency-critical applications**: Benchmark on your exact setup. Some deployments (especially local/quantized models) may process compact JSON faster despite TOON's lower token count.
|
||||
|
||||
> [!NOTE]
|
||||
> For data-driven comparisons across different structures, see [benchmarks](/guide/benchmarks). When optimizing for latency, measure TTFT, tokens/sec, and total time for both TOON and JSON-compact and use whichever performs better in your specific environment.
|
||||
|
||||
## Installation
|
||||
|
||||
### TypeScript Library
|
||||
|
||||
Install the library via your preferred package manager:
|
||||
|
||||
::: code-group
|
||||
|
||||
```bash [npm]
|
||||
npm install @toon-format/toon
|
||||
```
|
||||
|
||||
```bash [pnpm]
|
||||
pnpm add @toon-format/toon
|
||||
```
|
||||
|
||||
```bash [yarn]
|
||||
yarn add @toon-format/toon
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
### CLI
|
||||
|
||||
The CLI can be used without installation via `npx`, or installed globally:
|
||||
|
||||
::: code-group
|
||||
|
||||
```bash [npx (no install)]
|
||||
npx @toon-format/cli input.json -o output.toon
|
||||
```
|
||||
|
||||
```bash [npm]
|
||||
npm install -g @toon-format/cli
|
||||
```
|
||||
|
||||
```bash [pnpm]
|
||||
pnpm add -g @toon-format/cli
|
||||
```
|
||||
|
||||
```bash [yarn]
|
||||
yarn global add @toon-format/cli
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
For full CLI documentation, see the [CLI reference](/cli/).
|
||||
|
||||
## Your First Example
|
||||
|
||||
The examples below use the TypeScript library for demonstration, but the same operations work in any language with a TOON implementation.
|
||||
|
||||
Start by encoding a simple dataset:
|
||||
|
||||
```ts
|
||||
import { encode } from '@toon-format/toon'
|
||||
|
||||
const data = {
|
||||
users: [
|
||||
{ id: 1, name: 'Alice', role: 'admin' },
|
||||
{ id: 2, name: 'Bob', role: 'user' }
|
||||
]
|
||||
}
|
||||
|
||||
console.log(encode(data))
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```yaml
|
||||
users[2]{id,name,role}:
|
||||
1,Alice,admin
|
||||
2,Bob,user
|
||||
```
|
||||
|
||||
### Decoding Back to JSON
|
||||
|
||||
Decoding is just as simple:
|
||||
|
||||
```ts
|
||||
import { decode } from '@toon-format/toon'
|
||||
|
||||
const toon = `
|
||||
users[2]{id,name,role}:
|
||||
1,Alice,admin
|
||||
2,Bob,user
|
||||
`
|
||||
|
||||
const data = decode(toon)
|
||||
console.log(JSON.stringify(data, null, 2))
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```json
|
||||
{
|
||||
"users": [
|
||||
{ "id": 1, "name": "Alice", "role": "admin" },
|
||||
{ "id": 2, "name": "Bob", "role": "user" }
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Round-tripping is lossless: `decode(encode(x))` always equals `x` (after normalization of non-JSON types like `Date`, `NaN`, etc.).
|
||||
|
||||
## Where to Go Next
|
||||
|
||||
Now that you've seen your first TOON document, read the [Format Overview](/guide/format-overview) for complete syntax details (objects, arrays, quoting rules, key folding), then explore [Using TOON with LLMs](/guide/llm-prompts) to see how to use it effectively in prompts. For implementation details, check the [API reference](/reference/api) (TypeScript) or the [specification](/reference/spec) (language-agnostic normative rules).
|
||||
139
docs/guide/llm-prompts.md
Normal file
@@ -0,0 +1,139 @@
|
||||
# Using TOON with LLMs
|
||||
|
||||
TOON is designed for passing structured data to Large Language Models with reduced token costs and improved reliability. This guide shows how to use TOON effectively in prompts, both for input (sending data to models) and output (getting models to generate TOON).
|
||||
|
||||
This guide is about the TOON format itself. Code examples use the TypeScript library for demonstration, but the same patterns and techniques apply regardless of which programming language you're using.
|
||||
|
||||
## Why TOON for LLMs
|
||||
|
||||
LLM tokens cost money, and JSON is verbose – it repeats every field name for every record in an array. TOON minimizes tokens, especially for uniform arrays, by declaring fields once and streaming data as rows, typically saving 30-60% compared to formatted JSON.
|
||||
|
||||
TOON adds structure guardrails: explicit `[N]` lengths and `{fields}` headers make it easier for models to track rows and for you to validate output. Strict mode helps detect truncation and malformed TOON when decoding model responses.
|
||||
|
||||
## Sending TOON as Input
|
||||
|
||||
TOON works best when you show the format instead of describing it. The structure is self-documenting – models parse it naturally once they see the pattern.
|
||||
|
||||
Wrap your encoded data in a fenced code block (label it ` ```toon` for clarity):
|
||||
|
||||
````md
|
||||
Data is in TOON format (2-space indent, arrays show length and fields).
|
||||
|
||||
```toon
users[3]{id,name,role,lastLogin}:
  1,Alice,admin,2025-01-15T10:30:00Z
  2,Bob,user,2025-01-14T15:22:00Z
  3,Charlie,user,2025-01-13T09:45:00Z
```
|
||||
|
||||
Task: Summarize the user roles and their last activity.
|
||||
````
|
||||
|
||||
The indentation and headers are usually enough – models treat TOON like familiar YAML or CSV. The explicit array lengths (`[N]`) and field headers (`{fields}`) help the model track structure, especially for large tables.
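The wrapping step can be sketched as a small helper. This is only an illustration: `buildToonPrompt` is a hypothetical function, not part of `@toon-format/toon`, and it assumes you already hold an encoded TOON string.

```ts
// Hypothetical helper, not part of @toon-format/toon.
// Wraps an already-encoded TOON string in a labeled fenced block and appends the task.
function buildToonPrompt(toon: string, task: string): string {
  const fence = '`'.repeat(3) // built at runtime to avoid nesting literal fences
  const body = toon.replace(/^\n+|\n+$/g, '') // drop blank lines at the edges only
  return [
    'Data is in TOON format (2-space indent, arrays show length and fields).',
    '',
    `${fence}toon`,
    body,
    fence,
    '',
    `Task: ${task}`,
  ].join('\n')
}

const examplePrompt = buildToonPrompt(
  'users[1]{id,name}:\n  1,Alice',
  'List the user names.',
)
console.log(examplePrompt)
```

Keeping the one-line format hint and the fence label in one place makes it easier to keep prompts consistent across calls.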
|
||||
|
||||
> [!NOTE]
|
||||
> Most models don't have built-in TOON syntax highlighting, so ` ```toon` or ` ```yaml` both work fine. The structure is what matters.
|
||||
|
||||
## Generating TOON from LLMs
|
||||
|
||||
For output, be more explicit. When you want the model to **generate** TOON:
|
||||
|
||||
- **Show the expected header** (e.g., `users[N]{id,name,role}:`). The model fills rows instead of repeating keys, reducing generation errors.
|
||||
- **State the rules**: 2-space indent, no trailing spaces, `[N]` matches row count.
|
||||
|
||||
Here's a prompt that works for both reading and generating:
|
||||
|
||||
````md
|
||||
Data is in TOON format (2-space indent, arrays show length and fields).
|
||||
|
||||
```toon
users[3]{id,name,role,lastLogin}:
  1,Alice,admin,2025-01-15T10:30:00Z
  2,Bob,user,2025-01-14T15:22:00Z
  3,Charlie,user,2025-01-13T09:45:00Z
```
|
||||
|
||||
Task: Return only users with role "user" as TOON. Use the same header format. Set [N] to match the row count. Output only the code block.
|
||||
````
|
||||
|
||||
**Expected output:**
|
||||
|
||||
```toon
users[2]{id,name,role,lastLogin}:
  2,Bob,user,2025-01-14T15:22:00Z
  3,Charlie,user,2025-01-13T09:45:00Z
```
|
||||
|
||||
The model adjusts `[N]` to `2` and generates two rows.
|
||||
|
||||
### Validation with Strict Mode
|
||||
|
||||
When decoding model-generated TOON, use strict mode (default) to catch errors:
|
||||
|
||||
```ts
|
||||
import { decode } from '@toon-format/toon'
|
||||
|
||||
try {
  const data = decode(modelOutput, { strict: true })
  // Success – data is valid
}
catch (error) {
  // Model output was malformed (count mismatch, invalid escapes, etc.)
  // `error` is `unknown` under strict TypeScript settings, so narrow it before reading `.message`
  console.error('Validation failed:', error instanceof Error ? error.message : error)
}
|
||||
```
|
||||
|
||||
Strict mode checks counts, indentation, and escaping so you can detect truncation or malformed TOON. For complete details, see the [API reference](/reference/api#decode).
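Model responses usually arrive wrapped in a fenced code block, so the fence has to be removed before decoding. A minimal sketch, assuming the response contains at most one fenced block that matters; `extractFencedBlock` is a hypothetical helper, not a library function:

```ts
// Hypothetical helper: returns the body of the first fenced code block in a
// model response, or null if no non-empty fenced block is found.
function extractFencedBlock(response: string): string | null {
  const fence = '`'.repeat(3)
  // Opening fence with an optional language label, the body, then a closing fence.
  const pattern = new RegExp(`${fence}[\\w-]*[ \\t]*\\r?\\n([\\s\\S]*?)\\r?\\n${fence}`)
  const match = pattern.exec(response)
  return match?.[1] ?? null
}

const fenceMark = '`'.repeat(3)
const sampleResponse = `Here you go:\n${fenceMark}toon\nusers[1]{id,name}:\n  2,Bob\n${fenceMark}`
console.log(extractFencedBlock(sampleResponse))
```

The returned string can then be passed to `decode` with `strict: true`; a `null` result is itself a signal that the model did not follow the output instructions.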
|
||||
|
||||
## Delimiter Choices for Token Efficiency
|
||||
|
||||
Use `delimiter: '\t'` for tab-separated tables if you want even fewer tokens. Tabs are single characters, often tokenize more efficiently than commas, and rarely appear in natural text (reducing quote-escaping).
|
||||
|
||||
```ts
|
||||
const toon = encode(data, { delimiter: '\t' })
|
||||
```
|
||||
|
||||
Tell the model "fields are tab-separated" when using tabs. For more on delimiters, see the [Format Overview](/guide/format-overview#delimiter-options).
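To gauge how much quoting a delimiter would cause for your own data, you can count the string cells that contain it, since those cells must be quoted when that delimiter is active. A rough sketch; `countCellsNeedingQuotes` and the sample rows are illustrative only and ignore the other quoting rules:

```ts
type Delimiter = ',' | '\t' | '|'
type Row = Record<string, unknown>

// Counts string cells that contain the candidate delimiter and would therefore
// need quoting if that delimiter were active. Other quoting rules are ignored.
function countCellsNeedingQuotes(rows: Row[], delimiter: Delimiter): number {
  let count = 0
  for (const row of rows) {
    for (const value of Object.values(row)) {
      if (typeof value === 'string' && value.includes(delimiter))
        count++
    }
  }
  return count
}

const sampleRows: Row[] = [
  { id: 1, message: 'Connection timeout, retrying' },
  { id: 2, message: 'Slow query' },
]

console.log(countCellsNeedingQuotes(sampleRows, ',')) // 1
console.log(countCellsNeedingQuotes(sampleRows, '\t')) // 0
```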
|
||||
|
||||
## Tips and Pitfalls
|
||||
|
||||
**Show, don't describe.** Don't explain TOON syntax in detail – just show an example. Models learn the pattern from context. A simple code block with 2-5 rows is more effective than paragraphs of explanation.
|
||||
|
||||
**Keep examples small.** Use 2-5 rows in your examples, not hundreds. The model generalizes from the pattern. Large examples waste tokens without improving accuracy.
|
||||
|
||||
**Always validate output.** Decode generated TOON with `strict: true` (default) to catch errors early. Don't assume model output is valid TOON without checking.
|
||||
|
||||
## Real-World Example
|
||||
|
||||
Here's a complete workflow: send data to a model and validate its TOON response.
|
||||
|
||||
**Prompt with TOON input:**
|
||||
|
||||
````md
|
||||
System logs in TOON format (tab-separated):
|
||||
|
||||
```toon
events[4	]{id	level	message	timestamp}:
  1	error	Connection timeout	2025-01-15T10:00:00Z
  2	warn	Slow query	2025-01-15T10:05:00Z
  3	info	User login	2025-01-15T10:10:00Z
  4	error	Database error	2025-01-15T10:15:00Z
```
|
||||
|
||||
Task: Return only error-level events as TOON. Use the same format.
|
||||
````
|
||||
|
||||
**Validate the response:**
|
||||
|
||||
```ts
|
||||
import { decode } from '@toon-format/toon'
|
||||
|
||||
// Tab delimiters are written as \t escapes so they stay visible in the source
const modelResponse = `
events[2\t]{id\tlevel\tmessage\ttimestamp}:
  1\terror\tConnection timeout\t2025-01-15T10:00:00Z
  4\terror\tDatabase error\t2025-01-15T10:15:00Z
`
|
||||
|
||||
const filtered = decode(modelResponse, { strict: true })
|
||||
// ✓ Validated – model correctly filtered and adjusted [N] to 2
|
||||
```
|
||||
51
docs/index.md
Normal file
@@ -0,0 +1,51 @@
|
||||
---
layout: home

hero:
  name: TOON
  text: Token-Oriented Object Notation
  tagline: A compact, human-readable encoding of the JSON data model for LLM prompts.
  image:
    dark: /logo-nav-dark.svg
    light: /logo-nav-light.svg
    alt: TOON Logo
  actions:
    - theme: brand
      text: Get Started
      link: /guide/getting-started
    - theme: alt
      text: Format Overview
      link: /guide/format-overview
    - theme: alt
      text: CLI
      link: /cli/
    - theme: alt
      text: Spec v2.0
      link: /reference/spec

features:
  - title: Token-Efficient & Accurate
    icon: 📊
    details: TOON reaches 74% accuracy (vs JSON's 70%) while using ~40% fewer tokens in mixed-structure benchmarks across 4 models.
    link: /guide/benchmarks
  - title: JSON Data Model
    icon: 🔁
    details: Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips.
    link: /guide/format-overview
  - title: LLM-Friendly Guardrails
    icon: 🛤️
    details: Explicit [N] lengths and {fields} headers give models a clear schema to follow, improving parsing reliability.
    link: /guide/format-overview#arrays
  - title: Minimal Syntax
    icon: 📐
    details: Uses indentation instead of braces and minimizes quoting, giving YAML-like readability with CSV-style compactness.
    link: /guide/format-overview#arrays
  - title: Tabular Arrays
    icon: 🧺
    details: Uniform arrays of objects collapse into tables that declare fields once and stream row values line by line.
    link: /guide/format-overview#arrays
  - title: Multi-Language Ecosystem
    icon: 🌐
    details: Spec-driven implementations in TypeScript, Python, Go, Rust, .NET, and other languages.
    link: /ecosystem/implementations
---
|
||||
14
docs/package.json
Normal file
@@ -0,0 +1,14 @@
|
||||
{
  "name": "@toon-format/docs",
  "type": "module",
  "private": true,
  "scripts": {
    "dev": "vitepress dev",
    "build": "vitepress build",
    "preview": "vitepress preview"
  },
  "devDependencies": {
    "unocss": "^66.5.6",
    "vitepress": "^1.6.4"
  }
}
|
||||
BIN
docs/public/favicon.ico
Normal file
|
After Width: | Height: | Size: 5.3 KiB |
1
docs/public/favicon.svg
Normal file
@@ -0,0 +1 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 180 180"><g clip-path="url(#a)"><path fill="#fef3c0" d="M160 20h20v140h-20v20H20v-20H0V20h20V0h140z"/><path fill="#fff" d="M120 40h20v20h-20z"/><path fill="#1b1b1f" d="M160 80h-60V20h60zm-40-20h20V40h-20zM140 100v20h-20v40h-20v-60zm20 60h-20v-40h20z"/><path fill="#fff" d="M40 120h20v20H40z"/><path fill="#1b1b1f" d="M80 160H20v-60h60zm-40-20h20v-20H40zM60 80H40V40H20V20h60v20H60z"/></g><defs><clipPath id="a"><path fill="#fff" d="M0 0h180v180H0z"/></clipPath></defs></svg>
|
||||
|
After Width: | Height: | Size: 540 B |
1
docs/public/logo-nav-dark.svg
Normal file
@@ -0,0 +1 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 180 180"><g fill="#fff" clip-path="url(#a)"><path d="M180 77.143h-77.143V0H180zm-51.429-25.714h25.715V25.714h-25.715zM154.286 102.857v25.714h-25.715V180h-25.714v-77.143zM180 180h-25.714v-51.429H180zM77.143 180H0v-77.143h77.143zm-51.429-25.714h25.715v-25.715H25.714zM51.429 77.143H25.714V25.714H0V0h77.143v25.714H51.429z"/></g><defs><clipPath id="a"><path fill="#fff" d="M0 0h180v180H0z"/></clipPath></defs></svg>
|
||||
|
After Width: | Height: | Size: 478 B |
1
docs/public/logo-nav-light.svg
Normal file
@@ -0,0 +1 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 180 180"><g fill="#1b1b1f" clip-path="url(#a)"><path d="M180 77.143h-77.143V0H180zm-51.429-25.714h25.715V25.714h-25.715zM154.286 102.857v25.714h-25.715V180h-25.714v-77.143zM180 180h-25.714v-51.429H180zM77.143 180H0v-77.143h77.143zm-51.429-25.714h25.715v-25.715H25.714zM51.429 77.143H25.714V25.714H0V0h77.143v25.714H51.429z"/></g><defs><clipPath id="a"><path fill="#fff" d="M0 0h180v180H0z"/></clipPath></defs></svg>
|
||||
|
After Width: | Height: | Size: 481 B |
1
docs/public/logo.svg
Normal file
@@ -0,0 +1 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 180 180"><path fill="#fef3c0" d="M0 0h180v180H0z"/><path fill="#fff" d="M120 40h20v20h-20z"/><path fill="#1b1b1f" d="M160 80h-60V20h60zm-40-20h20V40h-20zM140 100v20h-20v40h-20v-60zm20 60h-20v-40h20z"/><path fill="#fff" d="M40 120h20v20H40z"/><path fill="#1b1b1f" d="M80 160H20v-60h60zm-40-20h20v-20H40zM60 80H40V40H20V20h60v20H60z"/></svg>
|
||||
|
After Width: | Height: | Size: 405 B |
BIN
docs/public/og.png
Normal file
|
After Width: | Height: | Size: 38 KiB |
BIN
docs/public/twitter.png
Normal file
|
After Width: | Height: | Size: 17 KiB |
296
docs/reference/api.md
Normal file
@@ -0,0 +1,296 @@
|
||||
# API Reference
|
||||
|
||||
TypeScript/JavaScript API documentation for the `@toon-format/toon` package. For format rules, see the [Format Overview](/guide/format-overview) or the [Specification](/reference/spec). For other languages, see [Implementations](/ecosystem/implementations).
|
||||
|
||||
## Installation
|
||||
|
||||
::: code-group
|
||||
|
||||
```bash [npm]
|
||||
npm install @toon-format/toon
|
||||
```
|
||||
|
||||
```bash [pnpm]
|
||||
pnpm add @toon-format/toon
|
||||
```
|
||||
|
||||
```bash [yarn]
|
||||
yarn add @toon-format/toon
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
## `encode(value, options?)`
|
||||
|
||||
Converts any JSON-serializable value to TOON format.
|
||||
|
||||
```ts
|
||||
import { encode } from '@toon-format/toon'
|
||||
|
||||
const toon = encode(data, {
|
||||
indent: 2,
|
||||
delimiter: ',',
|
||||
keyFolding: 'off',
|
||||
flattenDepth: Infinity
|
||||
})
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
| Parameter | Type | Description |
|
||||
|-----------|------|-------------|
|
||||
| `value` | `unknown` | Any JSON-serializable value (object, array, primitive, or nested structure) |
|
||||
| `options` | `EncodeOptions?` | Optional encoding options (see below) |
|
||||
|
||||
### Options
|
||||
|
||||
| Option | Type | Default | Description |
|
||||
|--------|------|---------|-------------|
|
||||
| `indent` | `number` | `2` | Number of spaces per indentation level |
|
||||
| `delimiter` | `','` \| `'\t'` \| `'\|'` | `','` | Delimiter for array values and tabular rows |
|
||||
| `keyFolding` | `'off'` \| `'safe'` | `'off'` | Enable key folding to collapse single-key wrapper chains into dotted paths |
|
||||
| `flattenDepth` | `number` | `Infinity` | Maximum number of segments to fold when `keyFolding` is enabled (values 0-1 have no practical effect) |
|
||||
|
||||
### Return Value
|
||||
|
||||
Returns a TOON-formatted string with no trailing newline or spaces.
|
||||
|
||||
### Type Normalization
|
||||
|
||||
Non-JSON-serializable values are normalized before encoding:
|
||||
|
||||
| Input | Output |
|
||||
|-------|--------|
|
||||
| Finite number | Canonical decimal (no exponent, no leading/trailing zeros: `1e6` → `1000000`, `-0` → `0`) |
|
||||
| `NaN`, `Infinity`, `-Infinity` | `null` |
|
||||
| `BigInt` (within safe range) | Number |
|
||||
| `BigInt` (out of range) | Quoted decimal string (e.g., `"9007199254740993"`) |
|
||||
| `Date` | ISO string in quotes (e.g., `"2025-01-01T00:00:00.000Z"`) |
|
||||
| `undefined`, `function`, `symbol` | `null` |
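
The table above can be read as a small function. The following is an illustrative sketch of those rules only, not the library's implementation: `normalizeValue` is a hypothetical name, it handles a single top-level value without recursing into objects or arrays, and it assumes the `Date` is valid (`toISOString` throws otherwise).

```ts
// Illustrative sketch of the normalization table; the library may differ in detail.
function normalizeValue(value: unknown): unknown {
  if (typeof value === 'number') {
    if (!Number.isFinite(value))
      return null // NaN, Infinity, -Infinity
    return Object.is(value, -0) ? 0 : value // -0 becomes 0
  }
  if (typeof value === 'bigint') {
    const inSafeRange
      = value >= BigInt(Number.MIN_SAFE_INTEGER) && value <= BigInt(Number.MAX_SAFE_INTEGER)
    // Out-of-range values become a decimal string, which the encoder then quotes
    return inSafeRange ? Number(value) : value.toString()
  }
  if (value instanceof Date)
    return value.toISOString()
  if (value === undefined || typeof value === 'function' || typeof value === 'symbol')
    return null
  return value
}

console.log(normalizeValue(Number.NaN)) // null
console.log(normalizeValue(BigInt('9007199254740993'))) // '9007199254740993'
console.log(normalizeValue(new Date(0))) // '1970-01-01T00:00:00.000Z'
```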
|
||||
|
||||
### Example
|
||||
|
||||
```ts
|
||||
import { encode } from '@toon-format/toon'
|
||||
|
||||
const items = [
|
||||
{ sku: 'A1', qty: 2, price: 9.99 },
|
||||
{ sku: 'B2', qty: 1, price: 14.5 }
|
||||
]
|
||||
|
||||
console.log(encode({ items }))
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```yaml
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
```
|
||||
|
||||
### Delimiter Options
|
||||
|
||||
::: code-group
|
||||
|
||||
```ts [Comma (default)]
|
||||
encode(data, { delimiter: ',' })
|
||||
```
|
||||
|
||||
```ts [Tab]
|
||||
encode(data, { delimiter: '\t' })
|
||||
```
|
||||
|
||||
```ts [Pipe]
|
||||
encode(data, { delimiter: '|' })
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
::: details Why Use Tab Delimiters?
|
||||
Tab delimiters (`\t`) often tokenize more efficiently than commas:
|
||||
- Tabs are single characters
|
||||
- Tabs rarely appear in natural text, reducing quote-escaping
|
||||
- The delimiter is explicitly encoded in the array header
|
||||
|
||||
Example:
|
||||
|
||||
```yaml
items[2	]{sku	name	qty	price}:
  A1	Widget	2	9.99
  B2	Gadget	1	14.5
```
|
||||
|
||||
For maximum token savings on large tabular data, combine tab delimiters with key folding:
|
||||
```ts
|
||||
encode(data, { delimiter: '\t', keyFolding: 'safe' })
|
||||
```
|
||||
:::
|
||||
|
||||
## `decode(input, options?)`
|
||||
|
||||
Converts a TOON-formatted string back to JavaScript values.
|
||||
|
||||
```ts
|
||||
import { decode } from '@toon-format/toon'
|
||||
|
||||
const data = decode(toon, {
|
||||
indent: 2,
|
||||
strict: true,
|
||||
expandPaths: 'off'
|
||||
})
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
| Parameter | Type | Description |
|
||||
|-----------|------|-------------|
|
||||
| `input` | `string` | A TOON-formatted string to parse |
|
||||
| `options` | `DecodeOptions?` | Optional decoding options (see below) |
|
||||
|
||||
### Options
|
||||
|
||||
| Option | Type | Default | Description |
|
||||
|--------|------|---------|-------------|
|
||||
| `indent` | `number` | `2` | Expected number of spaces per indentation level |
|
||||
| `strict` | `boolean` | `true` | Enable strict validation (array counts, indentation, delimiter consistency) |
|
||||
| `expandPaths` | `'off'` \| `'safe'` | `'off'` | Enable path expansion to reconstruct dotted keys into nested objects (pairs with `keyFolding: 'safe'`) |
|
||||
|
||||
### Return Value
|
||||
|
||||
Returns a JavaScript value (object, array, or primitive) representing the parsed TOON data.
|
||||
|
||||
### Strict Mode
|
||||
|
||||
By default (`strict: true`), the decoder validates input strictly:
|
||||
|
||||
- **Invalid escape sequences**: Throws on `\x`, unterminated strings
|
||||
- **Syntax errors**: Throws on missing colons, malformed headers
|
||||
- **Array length mismatches**: Throws when declared length doesn't match actual count
|
||||
- **Delimiter mismatches**: Throws when row delimiters don't match header
|
||||
- **Indentation errors**: Throws when leading spaces aren't exact multiples of the configured `indent`
|
||||
|
||||
Set `strict: false` to skip validation for lenient parsing.
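
To make the array length check concrete, here is a simplified sketch of that single rule. It is not how the decoder is implemented: `checkDeclaredLength` is a hypothetical helper that assumes one root-level tabular array with 2-space indentation and no nesting.

```ts
// Simplified sketch of one strict-mode check: compare the declared [N] in the
// header with the number of indented rows that follow it.
function checkDeclaredLength(toon: string): { declared: number, actual: number, ok: boolean } {
  const lines = toon.split('\n').filter(line => line.trim() !== '')
  const header = lines[0] ?? ''
  // [N], optionally followed by a tab or pipe delimiter marker
  const match = /\[(\d+)[\t|]?\]/.exec(header)
  if (!match)
    throw new Error('No array header with a declared length found')
  const declared = Number(match[1])
  const actual = lines.slice(1).filter(line => line.startsWith('  ')).length
  return { declared, actual, ok: declared === actual }
}

// Three rows declared, two present: `ok` is false, which suggests truncated output
console.log(checkDeclaredLength('users[3]{id,name}:\n  1,Alice\n  2,Bob'))
```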
|
||||
|
||||
### Example
|
||||
|
||||
```ts
|
||||
import { decode } from '@toon-format/toon'
|
||||
|
||||
const toon = `
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
`
|
||||
|
||||
const data = decode(toon)
|
||||
console.log(data)
|
||||
```
|
||||
|
||||
**Output:**
|
||||
|
||||
```json
|
||||
{
|
||||
"items": [
|
||||
{ "sku": "A1", "qty": 2, "price": 9.99 },
|
||||
{ "sku": "B2", "qty": 1, "price": 14.5 }
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Path Expansion
|
||||
|
||||
When `expandPaths: 'safe'` is enabled, dotted keys are split into nested objects:
|
||||
|
||||
```ts
|
||||
import { decode } from '@toon-format/toon'
|
||||
|
||||
const toon = 'data.metadata.items[2]: a,b'
|
||||
|
||||
const data = decode(toon, { expandPaths: 'safe' })
|
||||
console.log(data)
|
||||
// { data: { metadata: { items: ['a', 'b'] } } }
|
||||
```
|
||||
|
||||
This pairs with `keyFolding: 'safe'` for lossless round-trips.
|
||||
|
||||
::: details Expansion Conflict Resolution
|
||||
When multiple expanded keys construct overlapping paths, the decoder merges them recursively:
|
||||
- **Object + Object**: Deep merge recursively
- **Object + Non-object** (array or primitive): Conflict
  - With `strict: true` (default): Error
  - With `strict: false`: Last-write-wins (LWW)
|
||||
|
||||
Example conflict (strict mode):
|
||||
|
||||
```ts
|
||||
const toon = 'a.b: 1\na: 2'
|
||||
decode(toon, { expandPaths: 'safe', strict: true })
|
||||
// Error: "Expansion conflict at path 'a' (object vs primitive)"
|
||||
```
|
||||
|
||||
Example conflict (lenient mode):
|
||||
|
||||
```ts
|
||||
const toon = 'a.b: 1\na: 2'
|
||||
decode(toon, { expandPaths: 'safe', strict: false })
|
||||
// { a: 2 } (last write wins)
|
||||
```
|
||||
:::
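
The merge rules for expansion conflicts can be sketched as follows. This is an illustration under assumptions rather than the decoder's code: `expandDottedKeys` and `mergeInto` are hypothetical names, input is given as already-parsed key/value pairs, and the error message is a placeholder.

```ts
type PlainObject = { [key: string]: unknown }

function isPlainObject(value: unknown): value is PlainObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Object + Object merges recursively; Object + Non-object is a conflict that
// throws in strict mode and otherwise lets the last write win.
function mergeInto(target: PlainObject, key: string, value: unknown, strict: boolean): void {
  const existing = target[key]
  if (isPlainObject(existing) && isPlainObject(value)) {
    for (const [childKey, childValue] of Object.entries(value))
      mergeInto(existing, childKey, childValue, strict)
    return
  }
  const conflict = existing !== undefined && (isPlainObject(existing) || isPlainObject(value))
  if (conflict && strict)
    throw new Error(`Expansion conflict at path '${key}'`)
  target[key] = value
}

function expandDottedKeys(entries: Array<[string, unknown]>, strict = true): PlainObject {
  const result: PlainObject = {}
  for (const [path, value] of entries) {
    const [head = path, ...rest] = path.split('.')
    // Build the nested value from the innermost segment outward: a.b becomes { b: value }
    const nested = rest.reduceRight<unknown>((acc, segment) => ({ [segment]: acc }), value)
    mergeInto(result, head, nested, strict)
  }
  return result
}

console.log(expandDottedKeys([['a.b', 1], ['a.c', 2]])) // { a: { b: 1, c: 2 } }
console.log(expandDottedKeys([['a.b', 1], ['a', 2]], false)) // { a: 2 }
```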
|
||||
|
||||
## Round-Trip Compatibility
|
||||
|
||||
TOON provides lossless round-trips after normalization:
|
||||
|
||||
```ts
|
||||
import { decode, encode } from '@toon-format/toon'
|
||||
|
||||
const original = {
|
||||
users: [
|
||||
{ id: 1, name: 'Alice', role: 'admin' },
|
||||
{ id: 2, name: 'Bob', role: 'user' }
|
||||
]
|
||||
}
|
||||
|
||||
const toon = encode(original)
|
||||
const restored = decode(toon)
|
||||
|
||||
console.log(JSON.stringify(original) === JSON.stringify(restored))
|
||||
// true
|
||||
```
|
||||
|
||||
### With Key Folding
|
||||
|
||||
```ts
|
||||
import { decode, encode } from '@toon-format/toon'
|
||||
|
||||
const original = { data: { metadata: { items: ['a', 'b'] } } }
|
||||
|
||||
// Encode with folding
|
||||
const toon = encode(original, { keyFolding: 'safe' })
|
||||
// → "data.metadata.items[2]: a,b"
|
||||
|
||||
// Decode with expansion
|
||||
const restored = decode(toon, { expandPaths: 'safe' })
|
||||
// → { data: { metadata: { items: ['a', 'b'] } } }
|
||||
|
||||
console.log(JSON.stringify(original) === JSON.stringify(restored))
|
||||
// true
|
||||
```
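
For intuition about what folding does to keys, here is a rough sketch. It is not the encoder's algorithm: `foldKeys` is a hypothetical function, the "safe segment" test is an assumed identifier pattern, and collision checks against sibling keys are left out.

```ts
type Foldable = { [key: string]: unknown }

function isFoldable(value: unknown): value is Foldable {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Assumption: a segment is safe to fold when it is a plain identifier without dots.
const SAFE_SEGMENT = /^[A-Z_]\w*$/i

// Collapses chains of single-key objects into dotted keys, folding at most
// `flattenDepth` segments per key.
function foldKeys(obj: Foldable, flattenDepth = Infinity): Foldable {
  const result: Foldable = {}
  for (const [key, value] of Object.entries(obj)) {
    const path = [key]
    let current: unknown = value
    while (SAFE_SEGMENT.test(key) && path.length < flattenDepth && isFoldable(current)) {
      const entries = Object.entries(current)
      const only = entries[0]
      if (entries.length !== 1 || only === undefined || !SAFE_SEGMENT.test(only[0]))
        break
      path.push(only[0])
      current = only[1]
    }
    result[path.join('.')] = isFoldable(current) ? foldKeys(current, flattenDepth) : current
  }
  return result
}

console.log(foldKeys({ data: { metadata: { items: ['a', 'b'] } } }))
// { 'data.metadata.items': [ 'a', 'b' ] }
console.log(foldKeys({ a: { b: { c: 1 } } }, 2))
// { 'a.b': { c: 1 } }
```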
|
||||
|
||||
## Types
|
||||
|
||||
```ts
|
||||
interface EncodeOptions {
|
||||
indent?: number
|
||||
delimiter?: ',' | '\t' | '|'
|
||||
keyFolding?: 'off' | 'safe'
|
||||
flattenDepth?: number
|
||||
}
|
||||
|
||||
interface DecodeOptions {
|
||||
indent?: number
|
||||
strict?: boolean
|
||||
expandPaths?: 'off' | 'safe'
|
||||
}
|
||||
```
|
||||
137
docs/reference/spec.md
Normal file
@@ -0,0 +1,137 @@
|
||||
# Specification
|
||||
|
||||
The [TOON specification](https://github.com/toon-format/spec) is the authoritative reference for implementing encoders, decoders, and validators. It defines the concrete syntax, normative encoding/decoding behavior, and strict-mode validation rules.
|
||||
|
||||
You don't need this page to *use* TOON. It's mainly for implementers and contributors. If you're looking to learn how to use TOON, start with the [Getting Started](/guide/getting-started) guide instead.
|
||||
|
||||
> [!TIP]
|
||||
> TOON is production-ready, but also an idea in progress. Nothing's set in stone – help shape where it goes by contributing to the spec or sharing feedback.
|
||||
|
||||
## Current Version
|
||||
|
||||
**Spec v2.0** (2025-11-10) is the current stable version.
|
||||
|
||||
## Guided Tour of the Spec
|
||||
|
||||
### Core Concepts
|
||||
|
||||
**[§1 Terminology and Conventions](https://github.com/toon-format/spec/blob/main/SPEC.md#1-terminology-and-conventions)**
|
||||
Defines key terms like "indentation level", "active delimiter", "strict mode", and RFC2119 keywords (MUST, SHOULD, MAY).
|
||||
|
||||
**[§2 Data Model](https://github.com/toon-format/spec/blob/main/SPEC.md#2-data-model)**
|
||||
Specifies the JSON data model (objects, arrays, primitives), array/object ordering requirements, and canonical number formatting (no exponent notation, no leading/trailing zeros).
|
||||
|
||||
**[§3 Encoding Normalization](https://github.com/toon-format/spec/blob/main/SPEC.md#3-encoding-normalization-reference-encoder)**
|
||||
Defines how non-JSON types (Date, BigInt, NaN, Infinity, undefined, etc.) are normalized before encoding. Required reading for encoder implementers.
|
||||
|
||||
**[§4 Decoding Interpretation](https://github.com/toon-format/spec/blob/main/SPEC.md#4-decoding-interpretation-reference-decoder)**
|
||||
Specifies how decoders map text tokens to host values (quoted strings, unquoted primitives, numeric parsing with leading-zero handling). Decoders default to strict mode (`strict = true`) in the reference implementation; strict-mode errors are enumerated in §14.
|
||||
|
||||
### Syntax Rules
|
||||
|
||||
**[§5 Concrete Syntax and Root Form](https://github.com/toon-format/spec/blob/main/SPEC.md#5-concrete-syntax-and-root-form)**
|
||||
Defines TOON's line-oriented, indentation-based notation and how to determine whether the root is an object, array, or primitive.
|
||||
|
||||
**[§6 Header Syntax](https://github.com/toon-format/spec/blob/main/SPEC.md#6-header-syntax-normative)**
|
||||
Normative ABNF grammar for array headers: `key[N<delim?>]{fields}:`. Specifies bracket segments, delimiter symbols, and field lists.
|
||||
|
||||
**[§7 Strings and Keys](https://github.com/toon-format/spec/blob/main/SPEC.md#7-strings-and-keys)**
|
||||
Complete quoting rules (when strings MUST be quoted), escape sequences (only `\\`, `\"`, `\n`, `\r`, `\t` are valid), and key encoding requirements.
|
||||
|
||||
**[§8 Objects](https://github.com/toon-format/spec/blob/main/SPEC.md#8-objects)**
|
||||
Object field encoding (key: value), nesting rules, key order preservation, and empty object handling.
|
||||
|
||||
**[§9 Arrays](https://github.com/toon-format/spec/blob/main/SPEC.md#9-arrays)**
|
||||
Covers all array forms: primitive (inline), arrays of objects (tabular), mixed/non-uniform (list), and arrays of arrays. Includes tabular detection requirements.
|
||||
|
||||
**[§10 Objects as List Items](https://github.com/toon-format/spec/blob/main/SPEC.md#10-objects-as-list-items)**
|
||||
Indentation rules for objects appearing in list items (first field on hyphen line, nested object rules).
|
||||
|
||||
**[§11 Delimiters](https://github.com/toon-format/spec/blob/main/SPEC.md#11-delimiters)**
|
||||
Delimiter scoping (document vs active), delimiter-aware quoting, and parsing rules for comma/tab/pipe delimiters.
|
||||
|
||||
**[§12 Indentation and Whitespace](https://github.com/toon-format/spec/blob/main/SPEC.md#12-indentation-and-whitespace)**
|
||||
Encoding requirements (consistent spaces, no tabs in indentation, no trailing spaces/newlines) and decoding rules (strict vs non-strict indentation handling).
|
||||
|
||||
### Conformance and Validation
|
||||
|
||||
**[§13 Conformance and Options](https://github.com/toon-format/spec/blob/main/SPEC.md#13-conformance-and-options)**
|
||||
Defines conformance classes (encoder, decoder, validator), required options, and conformance checklists.
|
||||
|
||||
**[§13.4 Key Folding and Path Expansion](https://github.com/toon-format/spec/blob/main/SPEC.md#134-key-folding-and-path-expansion)**
|
||||
Optional encoder feature (key folding) and decoder feature (path expansion) for collapsing/expanding dotted paths. Specifies safety requirements and conflict resolution.
|
||||
|
||||
**[§14 Strict Mode Errors and Diagnostics](https://github.com/toon-format/spec/blob/main/SPEC.md#14-strict-mode-errors-and-diagnostics-authoritative-checklist)**
|
||||
**Authoritative checklist** of all strict-mode errors: array count mismatches, syntax errors, indentation errors, structural errors, and path expansion conflicts.
|
||||
|
||||
### Implementation Guidance
|
||||
|
||||
**[§19 TOON Core Profile](https://github.com/toon-format/spec/blob/main/SPEC.md#19-toon-core-profile-normative-subset)**
|
||||
Normative subset of the most common, memory-friendly rules. Useful for minimal implementations.
|
||||
|
||||
**[Appendix G: Host Type Normalization Examples](https://github.com/toon-format/spec/blob/main/SPEC.md#appendix-g-host-type-normalization-examples-informative)**
|
||||
Non-normative guidance for Go, JavaScript, Python, and Rust implementations on normalizing language-specific types.
|
||||
|
||||
**[Appendix C: Test Suite and Compliance](https://github.com/toon-format/spec/blob/main/SPEC.md#appendix-c-test-suite-and-compliance-informative)**
|
||||
Reference test suite at [github.com/toon-format/spec/tree/main/tests](https://github.com/toon-format/spec/tree/main/tests) for validating implementations.
|
||||
|
||||
## Spec Sections at a Glance
|
||||
|
||||
| Section | Topic | When to Read |
|
||||
|---------|-------|--------------|
|
||||
| §1-4 | Data model, normalization, decoding | Implementing encoders/decoders |
|
||||
| §5-6 | Syntax, headers, root form | Implementing parsers |
|
||||
| §7 | Strings, keys, quoting, escaping | Implementing string handling |
|
||||
| §8-10 | Objects, arrays, list items | Implementing structure encoding |
|
||||
| §11-12 | Delimiters, indentation, whitespace | Implementing formatting and validation |
|
||||
| §13 | Conformance, options, key folding | Implementing options and features |
|
||||
| §14 | Strict-mode errors | Implementing validators |
|
||||
| §19 | Core profile | Minimal implementations |
|
||||
|
||||
## Conformance Checklists
|
||||
|
||||
The spec includes three conformance checklists:
|
||||
|
||||
### [Encoder Checklist (§13.1)](https://github.com/toon-format/spec/blob/main/SPEC.md#131-encoder-conformance-checklist)
|
||||
|
||||
Key requirements:
|
||||
- Produce UTF-8 with LF line endings
|
||||
- Use consistent indentation (default 2 spaces, no tabs)
|
||||
- Escape only `\\`, `\"`, `\n`, `\r`, `\t` in quoted strings
|
||||
- Quote strings with active delimiter, colon, or structural characters
|
||||
- Emit array lengths `[N]` matching actual count
|
||||
- Preserve object key order
|
||||
- Normalize numbers to non-exponential decimal form
|
||||
- Convert `-0` to `0`, `NaN`/±Infinity to `null`
|
||||
- No trailing spaces or trailing newline
|
||||
|
||||
### [Decoder Checklist (§13.2)](https://github.com/toon-format/spec/blob/main/SPEC.md#132-decoder-conformance-checklist)
|
||||
|
||||
Key requirements:
|
||||
- Parse array headers per §6 (length, delimiter, fields)
|
||||
- Split inline arrays and tabular rows using active delimiter only
|
||||
- Unescape quoted strings with only valid escapes
|
||||
- Type unquoted primitives: true/false/null → booleans/null, numeric → number, else → string
|
||||
- Enforce strict-mode rules when `strict=true`
|
||||
- Preserve array order and object key order
|
||||
|
||||
### [Validator Checklist (§13.3)](https://github.com/toon-format/spec/blob/main/SPEC.md#133-validator-conformance-checklist)
|
||||
|
||||
Validators should verify:
|
||||
- Structural conformance (headers, indentation, list markers)
|
||||
- Whitespace invariants (no trailing spaces/newlines)
|
||||
- Delimiter consistency between headers and rows
|
||||
- Array length counts match declared `[N]`
|
||||
- All strict-mode requirements
|
||||
|
||||
## Versioning
|
||||
|
||||
The spec uses semantic versioning (major.minor):
|
||||
- **Major version** (e.g., v2.0): Breaking changes, incompatible with previous versions
|
||||
- **Minor version** (e.g., v1.5 → v1.6): Clarifications, additional requirements, or backward-compatible additions
|
||||
|
||||
See [Appendix D: Document Changelog](https://github.com/toon-format/spec/blob/main/SPEC.md#appendix-d-document-changelog-informative) for detailed version history.
|
||||
|
||||
## Contributing to the Spec
|
||||
|
||||
The spec is community-maintained at [github.com/toon-format/spec](https://github.com/toon-format/spec). We welcome contributions of all kinds: reporting ambiguities or errors, proposing clarifications and examples, adding test cases to the reference suite, or discussing edge cases and normative behavior. Your feedback helps shape the format.
|
||||
333
docs/reference/syntax-cheatsheet.md
Normal file
@@ -0,0 +1,333 @@
|
||||
# Syntax Cheatsheet
|
||||
|
||||
Quick reference for mapping JSON to TOON format. For rigorous, normative syntax rules and edge cases, see the [specification](/reference/spec).
|
||||
|
||||
## Objects
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON]
|
||||
{
|
||||
"id": 1,
|
||||
"name": "Ada"
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [TOON]
|
||||
id: 1
|
||||
name: Ada
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
## Nested Objects
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON]
|
||||
{
|
||||
"user": {
|
||||
"id": 1,
|
||||
"name": "Ada"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [TOON]
user:
  id: 1
  name: Ada
```
|
||||
|
||||
:::
|
||||
|
||||
## Primitive Arrays
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON]
|
||||
{
|
||||
"tags": ["foo", "bar", "baz"]
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [TOON]
|
||||
tags[3]: foo,bar,baz
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
## Tabular Arrays
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON]
|
||||
{
|
||||
"items": [
|
||||
{ "id": 1, "qty": 5 },
|
||||
{ "id": 2, "qty": 3 }
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [TOON]
items[2]{id,qty}:
  1,5
  2,3
```
|
||||
|
||||
:::
|
||||
|
||||
## Mixed / Non-Uniform Arrays
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON]
|
||||
{
|
||||
"items": [1, { "a": 1 }, "x"]
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [TOON]
items[3]:
  - 1
  - a: 1
  - x
```
|
||||
|
||||
:::
|
||||
|
||||
## Arrays of Arrays
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON]
|
||||
{
|
||||
"pairs": [[1, 2], [3, 4]]
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [TOON]
pairs[2]:
  - [2]: 1,2
  - [2]: 3,4
```
|
||||
|
||||
:::
|
||||
|
||||
## Root Arrays
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON]
|
||||
["x", "y", "z"]
|
||||
```
|
||||
|
||||
```yaml [TOON]
|
||||
[3]: x,y,z
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
## Empty Containers
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [Empty Object]
|
||||
{}
|
||||
```
|
||||
|
||||
```yaml [Empty Object]
|
||||
(empty output)
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [Empty Array]
|
||||
{
|
||||
"items": []
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [Empty Array]
|
||||
items[0]:
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
## Quoting Special Cases
|
||||
|
||||
### Strings That Look Like Literals
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON]
|
||||
{
|
||||
"version": "123",
|
||||
"enabled": "true"
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [TOON]
|
||||
version: "123"
|
||||
enabled: "true"
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
These strings must be quoted. Unquoted, they would be read as the number `123` and the boolean `true`.
|
||||
|
||||
### Strings with Active Delimiter
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON]
|
||||
{
|
||||
"note": "hello, world"
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [TOON]
|
||||
note: "hello, world"
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
Strings containing the active delimiter (comma by default) must be quoted.
|
||||
|
||||
### Strings with Leading/Trailing Spaces
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON]
|
||||
{
|
||||
"message": " padded "
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [TOON]
|
||||
message: " padded "
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
### Empty String
|
||||
|
||||
::: code-group
|
||||
|
||||
```json [JSON]
|
||||
{
|
||||
"name": ""
|
||||
}
|
||||
```
|
||||
|
||||
```yaml [TOON]
|
||||
name: ""
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
## Quoting Rules Summary
|
||||
|
||||
Strings **must** be quoted if they:
|
||||
|
||||
- Are empty (`""`)
|
||||
- Have leading or trailing whitespace
|
||||
- Equal `true`, `false`, or `null` (case-sensitive)
|
||||
- Look like numbers (e.g., `"42"`, `"-3.14"`, `"1e-6"`, `"05"`)
|
||||
- Contain special characters: `:`, `"`, `\`, `[`, `]`, `{`, `}`, newline, tab, carriage return
|
||||
- Contain the active delimiter (comma by default, or tab/pipe if declared in header)
|
||||
- Equal `"-"` or start with `"-"` followed by any character
|
||||
|
||||
Otherwise, strings can be unquoted. Unicode, emoji, and inner spaces are safe:
|
||||
|
||||
```yaml
|
||||
message: Hello 世界 👋
|
||||
note: This has inner spaces
|
||||
```
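
The rules in this summary can be sketched as a single predicate. `needsQuotes` is a hypothetical name chosen for illustration and is not the library's API. The number pattern is an assumption that covers the listed examples (`42`, `-3.14`, `1e-6`, `05`); the specification remains the normative source.

```ts
// Hypothetical helper for illustration only; not part of the TOON library API.
// Assumed number shape: optional sign, digits, optional fraction, optional exponent.
const NUMBER_LIKE = /^-?\d+(?:\.\d+)?(?:e[+-]?\d+)?$/i

// Colon, double quote, backslash, brackets, braces, newline, carriage return, tab.
const SPECIAL_CHARS = /[:"\\[\]{}\n\r\t]/

function needsQuotes(value: string, delimiter: string = ','): boolean {
  if (value === '')
    return true // empty string
  if (value !== value.trim())
    return true // leading or trailing whitespace
  if (value === 'true' || value === 'false' || value === 'null')
    return true // literal look-alikes (case-sensitive)
  if (NUMBER_LIKE.test(value))
    return true // number look-alikes
  if (SPECIAL_CHARS.test(value))
    return true // structural characters and control characters
  if (value.includes(delimiter))
    return true // active delimiter
  if (value.startsWith('-'))
    return true // "-" alone or "-" followed by anything
  return false
}
```

Passing the active delimiter as the second argument reflects the rule that only the delimiter in effect forces quoting: `needsQuotes('a|b')` is `false` under the default comma, while `needsQuotes('a|b', '|')` is `true`.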
|
||||
|
||||
## Escape Sequences
|
||||
|
||||
Only five escape sequences are valid in quoted strings:
|
||||
|
||||
| Character | Escape |
|
||||
|-----------|--------|
|
||||
| Backslash (`\`) | `\\` |
|
||||
| Double quote (`"`) | `\"` |
|
||||
| Newline | `\n` |
|
||||
| Carriage return | `\r` |
|
||||
| Tab | `\t` |
|
||||
|
||||
All other escapes (e.g., `\x`, `\u`) are invalid.
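
As a sketch of the table above, the following pair of functions escapes and unescapes the body of a quoted string using only those five sequences. `escapeString` and `unescapeString` are hypothetical names for illustration, not the library's API. Throwing an `Error` on an unknown escape is an assumption about how a decoder might surface the problem.

```ts
// Hypothetical helpers for illustration only; not part of the TOON library API.
const ESCAPES: Record<string, string> = {
  '\\': '\\\\',
  '"': '\\"',
  '\n': '\\n',
  '\r': '\\r',
  '\t': '\\t',
}

const UNESCAPES: Record<string, string> = {
  '\\': '\\',
  '"': '"',
  'n': '\n',
  'r': '\r',
  't': '\t',
}

// Escapes the five characters from the table; everything else passes through.
function escapeString(value: string): string {
  return value.replace(/[\\"\n\r\t]/g, ch => ESCAPES[ch] ?? ch)
}

// Reverses escapeString and rejects any escape not listed in the table.
function unescapeString(body: string): string {
  let out = ''
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i)
    if (ch !== '\\') {
      out += ch
      continue
    }
    const next = body.charAt(i + 1)
    const mapped = UNESCAPES[next]
    if (next === '' || mapped === undefined)
      throw new Error(`Invalid escape sequence: \\${next}`)
    out += mapped
    i++ // skip the character consumed by the escape
  }
  return out
}
```

Both functions operate on the text between the surrounding double quotes; adding and stripping the quotes themselves is left to the caller.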
|
||||
|
||||
## Array Headers
|
||||
|
||||
### Basic Header
|
||||
|
||||
```
|
||||
key[N]:
|
||||
```
|
||||
|
||||
- `N` = array length
|
||||
- Default delimiter: comma
|
||||
|
||||
### Tabular Header
|
||||
|
||||
```
|
||||
key[N]{field1,field2,field3}:
|
||||
```
|
||||
|
||||
- `N` = array length
|
||||
- `{fields}` = column names
|
||||
- Default delimiter: comma
|
||||
|
||||
### Alternative Delimiters
|
||||
|
||||
::: code-group
|
||||
|
||||
```yaml [Tab Delimiter]
|
||||
items[2 ]{id name}:
|
||||
1 Alice
|
||||
2 Bob
|
||||
```
|
||||
|
||||
```yaml [Pipe Delimiter]
|
||||
items[2|]{id|name}:
  1|Alice
  2|Bob
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
With a non-default delimiter, the delimiter symbol follows the length inside the brackets and also separates the field names inside the braces.
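
The header shapes in this section can be sketched as one formatting function. `formatHeader` is a hypothetical helper for illustration, not the library's API. It assumes keys and field names need no quoting, and it treats an empty key as a root array.

```ts
// Hypothetical helper for illustration only; not part of the TOON library API.
type Delimiter = ',' | '\t' | '|'

function formatHeader(
  key: string,
  length: number,
  fields: string[] = [],
  delimiter: Delimiter = ',',
): string {
  // The default comma is implicit; tab and pipe are declared after the length.
  const marker = delimiter === ',' ? '' : delimiter
  // Field names, when present, are joined with the same delimiter.
  const fieldList = fields.length > 0 ? `{${fields.join(delimiter)}}` : ''
  return `${key}[${length}${marker}]${fieldList}:`
}
```

For example, `formatHeader('tags', 3)` gives `tags[3]:`, and `formatHeader('items', 2, ['id', 'name'], '|')` gives `items[2|]{id|name}:`.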
|
||||
|
||||
## Key Folding (Optional)
|
||||
|
||||
Standard nesting:
|
||||
|
||||
```yaml
|
||||
data:
  metadata:
    items[2]: a,b
|
||||
```
|
||||
|
||||
With key folding (`keyFolding: 'safe'`):
|
||||
|
||||
```yaml
|
||||
data.metadata.items[2]: a,b
|
||||
```
|
||||
|
||||
See [Format Overview – Key Folding](/guide/format-overview#key-folding-optional) for details.
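
To illustrate the idea, here is a simplified sketch that collapses chains of single-key objects into dotted keys. `foldKeys` is a hypothetical function, not the library's implementation. The condition used for "safe" (every segment is a plain identifier, so the dots stay unambiguous) is an assumption made for this sketch; the linked guide and the specification define the actual rules.

```ts
// Hypothetical sketch for illustration only; not the library's implementation.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }
type JsonObject = { [key: string]: Json }

function isPlainObject(value: Json): value is JsonObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Assumed "safe" condition: each path segment is a plain identifier.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/

function foldKeys(obj: JsonObject): JsonObject {
  const out: JsonObject = {}
  for (const [key, value] of Object.entries(obj)) {
    // Walk down while the current value is an object with exactly one key.
    const segments = [key]
    let current: Json = value
    while (isPlainObject(current)) {
      const entries = Object.entries(current)
      const only = entries[0]
      if (entries.length !== 1 || only === undefined)
        break
      segments.push(only[0])
      current = only[1]
    }

    if (segments.length > 1 && segments.every(s => IDENTIFIER.test(s))) {
      // Fold the whole chain into one dotted key.
      out[segments.join('.')] = isPlainObject(current) ? foldKeys(current) : current
    }
    else {
      // Keep the key as-is and continue folding below it.
      out[key] = isPlainObject(value) ? foldKeys(value) : value
    }
  }
  return out
}
```

Applied to `{ data: { metadata: { items: ['a', 'b'] } } }`, the sketch yields a single key `data.metadata.items`, matching the folded example above. It does not descend into arrays or handle key collisions.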
|
||||
|
||||
## Type Conversions
|
||||
|
||||
| Input | Output |
|
||||
|-------|--------|
|
||||
| Finite number | Canonical decimal (no exponent, no trailing zeros) |
|
||||
| `NaN`, `Infinity`, `-Infinity` | `null` |
|
||||
| `BigInt` (safe range) | Number |
|
||||
| `BigInt` (out of range) | Quoted decimal string |
|
||||
| `Date` | ISO string (quoted) |
|
||||
| `undefined`, `function`, `symbol` | `null` |
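
The table can be read as a normalization step applied to each primitive before encoding. A minimal sketch follows; `normalizePrimitive` is a hypothetical name, not the library's API. It covers only the rows above: rendering a finite number as a canonical decimal string is left out, and objects and arrays are treated as out of scope.

```ts
// Hypothetical helper for illustration only; not part of the TOON library API.
type Normalized = number | string | boolean | null

function normalizePrimitive(value: unknown): Normalized {
  if (typeof value === 'number') {
    // NaN, Infinity, and -Infinity become null.
    return Number.isFinite(value) ? value : null
  }
  if (typeof value === 'bigint') {
    const inSafeRange
      = value >= BigInt(Number.MIN_SAFE_INTEGER)
        && value <= BigInt(Number.MAX_SAFE_INTEGER)
    // Safe-range BigInts become numbers; larger ones become decimal strings.
    return inSafeRange ? Number(value) : value.toString()
  }
  if (value instanceof Date) {
    // Dates become ISO strings (quoted when written out).
    return value.toISOString()
  }
  if (typeof value === 'string' || typeof value === 'boolean')
    return value
  if (
    value === null
    || value === undefined
    || typeof value === 'function'
    || typeof value === 'symbol'
  ) {
    return null
  }
  // Containers are outside the scope of this sketch.
  throw new TypeError('normalizePrimitive only handles primitive-like inputs')
}
```

Note that `toISOString()` throws a `RangeError` for an invalid `Date`; the sketch does not handle that case.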
|
||||
14
docs/uno.config.ts
Normal file
@@ -0,0 +1,14 @@
|
||||
import type { UserConfig } from 'unocss'
|
||||
import { defineConfig, presetIcons, presetWind4, transformerDirectives } from 'unocss'
|
||||
|
||||
const config: UserConfig = defineConfig({
|
||||
presets: [
|
||||
presetWind4(),
|
||||
presetIcons(),
|
||||
],
|
||||
transformers: [
|
||||
transformerDirectives(),
|
||||
],
|
||||
})
|
||||
|
||||
export default config
|
||||
7
docs/wrangler.toml
Normal file
@@ -0,0 +1,7 @@
|
||||
name = "toon-docs"
|
||||
compatibility_date = "2025-10-01"
|
||||
routes = [ { pattern = "toonformat.dev", custom_domain = true } ]
|
||||
|
||||
[assets]
|
||||
directory = "./.vitepress/dist/"
|
||||
not_found_handling = "404-page"
|
||||
@@ -2,8 +2,9 @@
|
||||
import antfu from '@antfu/eslint-config'
|
||||
|
||||
export default antfu().append({
|
||||
files: ['README.md', 'SPEC.md'],
|
||||
files: ['README.md', 'SPEC.md', '**/docs/**/*'],
|
||||
rules: {
|
||||
'yaml/quotes': 'off',
|
||||
'style/no-tabs': 'off',
|
||||
},
|
||||
})
|
||||
|
||||
@@ -7,6 +7,9 @@
|
||||
"scripts": {
|
||||
"build": "pnpm -r --filter=./packages/** run build",
|
||||
"automd": "automd",
|
||||
"docs:dev": "vitepress dev docs",
|
||||
"docs:build": "vitepress build docs",
|
||||
"docs:preview": "vitepress preview docs",
|
||||
"lint": "eslint .",
|
||||
"lint:fix": "eslint . --fix",
|
||||
"test": "pnpm -r test",
|
||||
|
||||
@@ -94,12 +94,6 @@ Example output:
|
||||
toon data.json --delimiter "\t" -o output.toon
|
||||
```
|
||||
|
||||
#### Pipe-separated with length markers
|
||||
|
||||
```bash
|
||||
toon data.json --delimiter "|" --length-marker -o output.toon
|
||||
```
|
||||
|
||||
### Lenient Decoding
|
||||
|
||||
Skip validation for faster processing:
|
||||
|
||||
@@ -3,7 +3,7 @@
|
||||
"type": "module",
|
||||
"version": "1.0.0",
|
||||
"packageManager": "pnpm@10.21.0",
|
||||
"description": "Token-Oriented Object Notation (TOON) – A compact, deterministic JSON format for LLM prompts",
|
||||
"description": "Token-Oriented Object Notation (TOON) – Compact, human-readable, schema-aware encoding of JSON for LLM prompts",
|
||||
"author": "Johann Schopplich <hello@johannschopplich.com>",
|
||||
"license": "MIT",
|
||||
"homepage": "https://toonformat.dev",
|
||||
|
||||
1604
pnpm-lock.yaml
generated
@@ -1,4 +1,5 @@
|
||||
packages:
|
||||
- docs
|
||||
- benchmarks
|
||||
- packages/*
|
||||
|
||||
|
||||