Token-Oriented Object Notation (TOON)
AI is becoming cheaper and more accessible, and larger context windows allow for larger data inputs. But LLM tokens still cost money, and this is where TOON comes in.
Token-Oriented Object Notation is a compact, human-readable format designed for passing structured data to Large Language Models. It reduces token usage compared to JSON by:
- Removing redundant punctuation (braces/brackets, most quotes)
- Using indentation for structure
- Tabularizing arrays of objects
- Writing inline primitive arrays without spaces after commas
Token Benchmarks
| Example | JSON | TOON | Saved | Reduction |
|---|---|---|---|---|
| 👤 Simple user object | 31 | 18 | 13 | 41.9% |
| 🏷️ User with tags | 48 | 28 | 20 | 41.7% |
| 📦 Small product catalog | 117 | 49 | 68 | 58.1% |
| 👥 API response with users | 123 | 53 | 70 | 56.9% |
| ⚙️ Nested configuration | 67 | 41 | 26 | 38.8% |
| 🛒 E-commerce order | 163 | 94 | 69 | 42.3% |
| 📊 Analytics data | 209 | 94 | 115 | 55.0% |
| 📈 Large dataset (50 records) | 2159 | 762 | 1397 | 64.7% |
| Total | 2917 | 1139 | 1778 | 61.0% |
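The Saved and Reduction columns follow directly from the two token counts. A minimal sketch of that arithmetic, assuming reduction is expressed relative to the JSON count (the helper name `tokenSavings` is hypothetical and not part of the library):

```typescript
// Hypothetical helper, not part of TOON: derives the "Saved" and
// "Reduction" columns from a pair of token counts.
function tokenSavings(
  jsonTokens: number,
  toonTokens: number
): { saved: number; reduction: string } {
  const saved = jsonTokens - toonTokens;
  // Reduction is the saved share of the JSON token count, to one decimal.
  const reduction = ((saved / jsonTokens) * 100).toFixed(1) + "%";
  return { saved, reduction };
}

// Totals row from the table above.
const total = tokenSavings(2917, 1139);
console.log(total.saved, total.reduction); // 1778 61.0%
```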
View detailed examples
📦 Small product catalog
Savings: 68 tokens (58.1% reduction)
JSON (117 tokens):
{
  "items": [
    {
      "sku": "A1",
      "name": "Widget",
      "qty": 2,
      "price": 9.99
    },
    {
      "sku": "B2",
      "name": "Gadget",
      "qty": 1,
      "price": 14.5
    },
    {
      "sku": "C3",
      "name": "Doohickey",
      "qty": 5,
      "price": 7.25
    }
  ]
}
TOON (49 tokens):
items[3]{sku,name,qty,price}:
  A1,Widget,2,9.99
  B2,Gadget,1,14.5
  C3,Doohickey,5,7.25
👥 API response with users
Savings: 70 tokens (56.9% reduction)
JSON (123 tokens):
{
  "users": [
    {
      "id": 1,
      "name": "Alice",
      "email": "alice@example.com",
      "active": true
    },
    {
      "id": 2,
      "name": "Bob",
      "email": "bob@example.com",
      "active": true
    },
    {
      "id": 3,
      "name": "Charlie",
      "email": "charlie@example.com",
      "active": false
    }
  ],
  "total": 3,
  "page": 1
}
TOON (53 tokens):
users[3]{id,name,email,active}:
  1,Alice,alice@example.com,true
  2,Bob,bob@example.com,true
  3,Charlie,charlie@example.com,false
total: 3
page: 1
📊 Analytics data
Savings: 115 tokens (55.0% reduction)
JSON (209 tokens):
{
  "metrics": [
    {
      "date": "2025-01-01",
      "views": 1234,
      "clicks": 89,
      "conversions": 12
    },
    {
      "date": "2025-01-02",
      "views": 2345,
      "clicks": 156,
      "conversions": 23
    },
    {
      "date": "2025-01-03",
      "views": 1890,
      "clicks": 123,
      "conversions": 18
    },
    {
      "date": "2025-01-04",
      "views": 3456,
      "clicks": 234,
      "conversions": 34
    },
    {
      "date": "2025-01-05",
      "views": 2789,
      "clicks": 178,
      "conversions": 27
    }
  ]
}
TOON (94 tokens):
metrics[5]{date,views,clicks,conversions}:
  2025-01-01,1234,89,12
  2025-01-02,2345,156,23
  2025-01-03,1890,123,18
  2025-01-04,3456,234,34
  2025-01-05,2789,178,27
Note
Measured with `gpt-tokenizer` using `o200k_base` encoding (used by GPT-5 and other modern models). Savings will vary across models and tokenizers.
Why TOON?
Standard JSON is verbose and token-expensive in LLM contexts:
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
TOON conveys the same information with fewer tokens:
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
Key Features
- 📉 Token-efficient: typically 30–60% fewer tokens vs JSON on GPT-style tokenizers
- 📊 Tabular arrays: write object keys once, list rows beneath
- ✂️ Minimal quoting: only when required (e.g., commas, colons, ambiguous primitives)
- 📐 Indentation-based structure: no braces/brackets for objects
- 🎯 Inline primitive arrays: written without spaces after commas
- 🎲 Deterministic: stable key order, no trailing spaces/newline
Installation
# npm
npm install toon
# pnpm
pnpm add toon
# yarn
yarn add toon
Quick Start
import { encode } from 'toon'
const data = {
  user: {
    id: 123,
    name: 'Ada',
    tags: ['admin', 'ops'],
    active: true
  }
}
console.log(encode(data))
Output:
user:
  id: 123
  name: Ada
  tags[2]: admin,ops
  active: true
Canonical Formatting Rules
TOON formatting is deterministic and minimal:
- Indentation: 2 spaces per nesting level.
- Lines:
  - `key: value` for primitives (single space after colon).
  - `key:` for nested/empty objects (no trailing space on that line).
- Arrays:
  - Primitive arrays inline: `key[N]: v1,v2` (no spaces after commas).
  - List items: two spaces, hyphen, space (`"  - …"`).
- Whitespace invariants:
- No trailing spaces at end of any line.
- No trailing newline at end of output.
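The whitespace rules above are easy to check mechanically, for example on TOON text returned by a model. The following is a minimal sketch under simplifying assumptions: `checkWhitespace` is a hypothetical helper (not part of the `toon` package), and it only verifies that indentation is a multiple of two spaces, not that each nesting level is correct.

```typescript
// Hypothetical checker, not part of the toon package: returns a list of
// violations of the whitespace rules described above.
function checkWhitespace(toon: string): string[] {
  const problems: string[] = [];
  if (toon.endsWith("\n")) {
    problems.push("trailing newline at end of output");
  }
  toon.split("\n").forEach((line, i) => {
    // Rule: no trailing spaces (tabs are flagged as well).
    if (/[ \t]$/.test(line)) {
      problems.push(`line ${i + 1}: trailing whitespace`);
    }
    // Rule: 2 spaces per nesting level, so leading spaces must be even.
    const indent = line.length - line.replace(/^ +/, "").length;
    if (indent % 2 !== 0) {
      problems.push(`line ${i + 1}: indentation is not a multiple of 2`);
    }
  });
  return problems;
}

console.log(checkWhitespace("user:\n  id: 123\n  name: Ada")); // []
```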
Format Overview
Objects
Simple objects with primitive values:
encode({
  id: 123,
  name: 'Ada',
  active: true
})
id: 123
name: Ada
active: true
Nested objects:
encode({
  user: {
    id: 123,
    name: 'Ada'
  }
})
user:
  id: 123
  name: Ada
Arrays
Primitive Arrays (Inline)
encode({
  tags: ['admin', 'ops', 'dev']
})
tags[3]: admin,ops,dev
Arrays of Objects (Tabular)
When all objects share the same primitive fields, TOON uses an efficient tabular format:
encode({
  items: [
    { sku: 'A1', qty: 2, price: 9.99 },
    { sku: 'B2', qty: 1, price: 14.5 }
  ]
})
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
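To make the tabular layout concrete, here is a simplified sketch of how such output could be produced. It is not the library's implementation: `encodeTable` is a hypothetical name, the rows are assumed to be uniform already, and quoting is omitted, so values must not contain commas, colons, or other special characters.

```typescript
type Primitive = string | number | boolean | null;

// Simplified sketch: emits the header (key, row count, field names) once,
// then one comma-separated row per object, indented by two spaces.
function encodeTable(key: string, rows: Record<string, Primitive>[]): string {
  const first = rows[0];
  if (first === undefined) {
    return `${key}[0]:`; // empty arrays have no field list
  }
  // Header order is taken from the first object.
  const fields = Object.keys(first);
  const header = `${key}[${rows.length}]{${fields.join(",")}}:`;
  const lines = rows.map(
    (row) => "  " + fields.map((f) => String(row[f])).join(",")
  );
  return [header, ...lines].join("\n");
}

console.log(
  encodeTable("items", [
    { sku: "A1", qty: 2, price: 9.99 },
    { sku: "B2", qty: 1, price: 14.5 },
  ])
);
```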
Mixed and Non-Uniform Arrays
Arrays that don't meet the tabular requirements use list format:
items[3]:
  - 1
  - a: 1
  - text
When objects appear in list format, the first field is placed on the hyphen line:
items[2]:
  - id: 1
    name: First
  - id: 2
    name: Second
    extra: true
Arrays of Arrays
When you have arrays containing primitive inner arrays:
encode({
  pairs: [
    [1, 2],
    [3, 4]
  ]
})
pairs[2]:
  - [2]: 1,2
  - [2]: 3,4
Empty Arrays and Objects
Empty containers have special representations:
encode({ items: [] }) // items[0]:
encode([]) // [0]:
encode({}) // (empty output)
encode({ config: {} }) // config:
Quoting Rules
TOON quotes strings only when necessary to maximize token efficiency. Inner spaces are allowed; leading or trailing spaces force quotes. Unicode and emoji are safe unquoted.
Keys
Keys are quoted when any of the following is true:
| Condition | Examples |
|---|---|
| Contains spaces, commas, colons, quotes, control chars | "full name", "a,b", "order:id", "tab\there" |
| Contains brackets or braces | "[index]", "{key}" |
| Leading hyphen | "-lead" |
| Numeric-only key | "123" |
| Empty key | "" |
Notes:
- Quotes and control characters in keys are escaped (e.g., "he said \"hi\"", "line\nbreak").
String Values
String values are quoted when any of the following is true:
| Condition | Examples |
|---|---|
| Empty string | "" |
| Contains comma, colon, quote, backslash, or control chars | "a,b", "a:b", "say \"hi\"", "C:\\Users", "line1\\nline2" |
| Leading or trailing spaces | " padded ", " " |
| Looks like boolean/number/null | "true", "false", "null", "42", "-3.14", "1e-6", "05" |
| Starts with "- " (list-like) | "- item" |
| Looks like structural token | "[5]", "{key}", "[3]: x,y" |
Examples
note: "hello, world"
items[3]: x,"true","- item"
hello 👋 world // unquoted
" padded " // quoted
value: null // null value
name: "" // empty string (quoted)
text: "line1\nline2" // multi-line string (escaped)
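The value-quoting conditions in the table above can be approximated in code. This is a rough sketch, not the library's implementation: `needsQuotes` and `quoteIfNeeded` are hypothetical names, the "looks like structural token" check is simplified to "starts with `[` or `{`", and only the escapes that appear in this document (backslash, double quote, newline, tab) are handled.

```typescript
// Hypothetical approximation of the string-value quoting rules.
function needsQuotes(value: string): boolean {
  if (value === "") return true;                       // empty string
  if (value !== value.trim()) return true;             // leading/trailing whitespace
  if (/[,:"\\]/.test(value)) return true;              // comma, colon, quote, backslash
  if (/[\u0000-\u001f]/.test(value)) return true;      // control characters
  if (/^(true|false|null)$/.test(value)) return true;  // looks like boolean/null
  if (/^-?\d+(\.\d+)?(e[+-]?\d+)?$/i.test(value)) return true; // looks like a number
  if (value.startsWith("- ")) return true;             // list-like
  if (/^[\[{]/.test(value)) return true;               // simplified structural check
  return false;
}

function quoteIfNeeded(value: string): string {
  if (!needsQuotes(value)) return value;
  const escaped = value
    .replace(/\\/g, "\\\\") // backslashes first, so later escapes are not doubled
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\t/g, "\\t");
  return `"${escaped}"`;
}

console.log(quoteIfNeeded("hello, world")); // "hello, world"
console.log(quoteIfNeeded("hello 👋 world")); // hello 👋 world
```

Note that a date such as 2025-01-01 does not match the number pattern, so it stays unquoted, consistent with the analytics example earlier.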
Tabular Format Requirements
For arrays of objects to use the efficient tabular format, all of the following must be true:
| Requirement | Detail |
|---|---|
| All elements are objects | No primitives in the array |
| Identical key sets | No missing or extra keys across rows |
| Primitive values only | No nested arrays or objects |
| Header key order | Taken from the first object |
| Header key quoting | Same rules as object keys |
| Row value quoting | Same rules as string values |
If any condition fails, TOON falls back to list format.
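The first three requirements amount to a simple predicate over the array. The sketch below is illustrative only: `isTabular` is a hypothetical name, values are assumed to be normalized already (for example, Dates converted to strings), and treating an empty array as non-tabular is an assumption based on the `items[0]:` form shown earlier.

```typescript
function isPrimitive(v: unknown): boolean {
  const t = typeof v;
  return v === null || t === "string" || t === "number" || t === "boolean";
}

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Hypothetical check: true when every element is an object with the same
// key set as the first element and only primitive values.
function isTabular(items: unknown[]): boolean {
  const first = items[0];
  if (!isPlainObject(first)) return false; // also covers the empty array
  const keys = Object.keys(first);
  return items.every((item) => {
    if (!isPlainObject(item)) return false;
    if (Object.keys(item).length !== keys.length) return false;
    return keys.every(
      (k) =>
        Object.prototype.hasOwnProperty.call(item, k) && isPrimitive(item[k])
    );
  });
}

console.log(isTabular([{ sku: "A1", qty: 2 }, { sku: "B2", qty: 1 }])); // true
console.log(isTabular([{ id: 1 }, { id: 2, extra: true }])); // false
```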
Type Conversions
Some non-JSON types are automatically normalized for LLM-safe output:
| Input | Output |
|---|---|
| Number (finite) | Decimal form, no scientific notation; -0 → 0 |
| Number (NaN, ±Infinity) | null |
| BigInt | Decimal digits (no quotes) |
| Date | ISO string in quotes (e.g., "2025-01-01T00:00:00.000Z") |
| undefined | null |
| function | null |
| symbol | null |
Number normalization examples:
-0 → 0
1e6 → 1000000
1e-6 → 0.000001
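JavaScript's default number-to-string conversion switches to exponent notation for very small and very large magnitudes (for example, `String(1e-7)` yields `1e-7`), so a plain decimal form needs an extra step. The following is one possible sketch, not the library's implementation; `toDecimalString` is a hypothetical name.

```typescript
// Hypothetical helper: renders a number in plain decimal form, mapping
// non-finite values to null and -0 to 0 as in the table above.
function toDecimalString(n: number): string {
  if (!Number.isFinite(n)) return "null";
  if (n === 0) return "0"; // -0 === 0 in JavaScript, so this covers -0
  const s = String(n);
  // Matches exponent notation such as "1.5e-7" or "1e+21".
  const m = /^(-?)(\d+)(?:\.(\d+))?e([+-]\d+)$/.exec(s);
  if (m === null) return s; // already plain decimal
  const sign = m[1] ?? "";
  const intPart = m[2] ?? "";
  const frac = m[3] ?? "";
  const exp = Number(m[4]);
  const digits = intPart + frac;
  // Position of the decimal point within `digits` after applying the exponent.
  const pointPos = intPart.length + exp;
  if (pointPos <= 0) {
    return `${sign}0.${"0".repeat(-pointPos)}${digits}`;
  }
  if (pointPos >= digits.length) {
    return sign + digits + "0".repeat(pointPos - digits.length);
  }
  return `${sign}${digits.slice(0, pointPos)}.${digits.slice(pointPos)}`;
}

console.log(toDecimalString(1e6)); // 1000000
console.log(toDecimalString(1e-7)); // 0.0000001
```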
API
encode(value: unknown): string
Converts any JSON-serializable value to TOON format.
Parameters:
`value` – Any JSON-serializable value (object, array, primitive, or nested structure). Non-JSON-serializable values (functions, symbols, undefined, non-finite numbers) are converted to `null`. Dates are converted to ISO strings, and BigInts are emitted as decimal integers (no quotes).
Returns:
A TOON-formatted string with no trailing newline or spaces.
Example:
import { encode } from 'toon'
const items = [
  { sku: 'A1', qty: 2, price: 9.99 },
  { sku: 'B2', qty: 1, price: 14.5 }
]
console.log(encode({ items }))
Output:
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
Using TOON in LLM Prompts
When incorporating TOON into your LLM workflows:
- Wrap TOON data in a fenced code block in your prompt.
- Tell the model: "Do not add extra punctuation or spaces; follow the exact TOON format."
- When asking the model to generate TOON, specify the same rules (2-space indentation, no trailing spaces, quoting rules).
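As an illustration of the first two points, a prompt could be assembled along these lines. This is only a sketch: `buildPrompt` is a hypothetical helper, the surrounding wording is an assumption, and the TOON string is expected to have been produced by `encode` beforehand.

```typescript
// Hypothetical helper: wraps an already-encoded TOON string in a fenced
// code block and adds the formatting instruction quoted above.
function buildPrompt(toon: string, question: string): string {
  // The fence is built at runtime so this sample can itself sit inside a fence.
  const fence = "`".repeat(3);
  return [
    "Data in TOON format:",
    fence,
    toon,
    fence,
    "Do not add extra punctuation or spaces; follow the exact TOON format.",
    question,
  ].join("\n");
}

console.log(buildPrompt("users[1]{id,name}:\n  1,Alice", "How many users are listed?"));
```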
Token Savings Example
Here's a realistic API response to illustrate the token savings:
JSON:
{
  "users": [
    { "id": 1, "name": "Alice", "email": "alice@example.com", "active": true },
    { "id": 2, "name": "Bob", "email": "bob@example.com", "active": true },
    { "id": 3, "name": "Charlie", "email": "charlie@example.com", "active": false }
  ]
}
TOON:
users[3]{id,name,email,active}:
  1,Alice,alice@example.com,true
  2,Bob,bob@example.com,true
  3,Charlie,charlie@example.com,false
Typical savings vs JSON are in the 30–60% range on GPT-style tokenizers, driven by:
- Tabular arrays of objects (keys written once)
- No structural braces/brackets
- Minimal quoting
- No spaces after commas
Notes and Limitations
- Token counts vary by tokenizer and model. Benchmarks use a GPT-style tokenizer (cl100k/o200k); actual savings will differ with other models (e.g., SentencePiece).
- TOON is designed for LLM contexts where human readability and token efficiency matter. It's not a drop-in replacement for JSON in APIs or storage.
- Tabular arrays require all objects to have exactly the same keys with primitive values only. Arrays with mixed types (primitives + objects/arrays), non-uniform objects, or nested structures will use a more verbose list format.
- Object key order is preserved from the input. In tabular arrays, header order follows the first object's keys.
- Arrays mixing primitives and objects/arrays always use list form:
  items[2]:
    - a: 1
    - [2]: 1,2
- Deterministic formatting: 2-space indentation, stable key order, no trailing spaces/newline.
Quick Reference
// Object
{ id: 1, name: 'Ada' }          → id: 1
                                  name: Ada
// Nested object
{ user: { id: 1 } }             → user:
                                    id: 1
// Primitive array (inline)
{ tags: ['a', 'b'] }            → tags[2]: a,b
// Tabular array (uniform objects)
{ items: [                      → items[2]{id,qty}:
  { id: 1, qty: 5 },                1,5
  { id: 2, qty: 3 }                 2,3
]}
// Mixed / non-uniform (list)
{ items: [1, { a: 1 }, 'x'] }   → items[3]:
                                    - 1
                                    - a: 1
                                    - x
// Array of arrays
{ pairs: [[1, 2], [3, 4]] }     → pairs[2]:
                                    - [2]: 1,2
                                    - [2]: 3,4
// Root array
['x', 'y']                      → [2]: x,y
// Empty containers
{}                              → (empty output)
{ items: [] }                   → items[0]:
// Special quoting
{ note: 'hello, world' }        → note: "hello, world"
{ items: ['true', true] }       → items[2]: "true",true
License
MIT License © 2025-PRESENT Johann Schopplich