LLMs are becoming cheaper and more accessible, and larger context windows invite larger data inputs. Tokens still cost money, though, and this is where TOON comes in.

Token-Oriented Object Notation (TOON) is a compact, human-readable format designed for passing structured data to Large Language Models. It reduces token usage compared to JSON by:

- Removing redundant punctuation (braces/brackets, most quotes)
- Using indentation for structure
- Tabularizing arrays of objects
- Writing inline primitive arrays without spaces

Token Benchmarks

Example	JSON	TOON	Saved	Reduction
👤 Simple user object	31	18	13	41.9%
🏷️ User with tags	48	28	20	41.7%
📦 Small product catalog	117	49	68	58.1%
👥 API response with users	123	53	70	56.9%
⚙️ Nested configuration	67	41	26	38.8%
🛒 E-commerce order	163	94	69	42.3%
📊 Analytics data	209	94	115	55.0%
📈 Large dataset (50 records)	2159	762	1397	64.7%
Total	2917	1139	1778	61.0%
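
The Saved and Reduction columns follow directly from the two token counts. As a minimal sketch (the function name `tokenSavings` is hypothetical and not part of the library), the arithmetic looks like this:

```typescript
// Hypothetical helper: derives the "Saved" and "Reduction" columns
// from a JSON token count and a TOON token count.
function tokenSavings(
  jsonTokens: number,
  toonTokens: number
): { saved: number; reductionPct: number } {
  const saved = jsonTokens - toonTokens
  // Percentage of the JSON count, rounded to one decimal place as in the table.
  const reductionPct = Math.round((saved / jsonTokens) * 1000) / 10
  return { saved, reductionPct }
}

// Small product catalog row: 117 JSON tokens, 49 TOON tokens.
const catalog = tokenSavings(117, 49)
console.log(catalog.saved, catalog.reductionPct) // 68 58.1
```

The reduction is always relative to the JSON count, so the Total row is computed from the summed counts rather than by averaging the per-row percentages.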

Detailed examples

📦 Small product catalog

Savings: 68 tokens (58.1% reduction)

JSON (117 tokens):

{
  "items": [
    {
      "sku": "A1",
      "name": "Widget",
      "qty": 2,
      "price": 9.99
    },
    {
      "sku": "B2",
      "name": "Gadget",
      "qty": 1,
      "price": 14.5
    },
    {
      "sku": "C3",
      "name": "Doohickey",
      "qty": 5,
      "price": 7.25
    }
  ]
}

TOON (49 tokens):

items[3]{sku,name,qty,price}:
  A1,Widget,2,9.99
  B2,Gadget,1,14.5
  C3,Doohickey,5,7.25

👥 API response with users

Savings: 70 tokens (56.9% reduction)

JSON (123 tokens):

{
  "users": [
    {
      "id": 1,
      "name": "Alice",
      "email": "alice@example.com",
      "active": true
    },
    {
      "id": 2,
      "name": "Bob",
      "email": "bob@example.com",
      "active": true
    },
    {
      "id": 3,
      "name": "Charlie",
      "email": "charlie@example.com",
      "active": false
    }
  ],
  "total": 3,
  "page": 1
}

TOON (53 tokens):

users[3]{id,name,email,active}:
  1,Alice,alice@example.com,true
  2,Bob,bob@example.com,true
  3,Charlie,charlie@example.com,false
total: 3
page: 1

📊 Analytics data

Savings: 115 tokens (55.0% reduction)

JSON (209 tokens):

{
  "metrics": [
    {
      "date": "2025-01-01",
      "views": 1234,
      "clicks": 89,
      "conversions": 12
    },
    {
      "date": "2025-01-02",
      "views": 2345,
      "clicks": 156,
      "conversions": 23
    },
    {
      "date": "2025-01-03",
      "views": 1890,
      "clicks": 123,
      "conversions": 18
    },
    {
      "date": "2025-01-04",
      "views": 3456,
      "clicks": 234,
      "conversions": 34
    },
    {
      "date": "2025-01-05",
      "views": 2789,
      "clicks": 178,
      "conversions": 27
    }
  ]
}

TOON (94 tokens):

metrics[5]{date,views,clicks,conversions}:
  2025-01-01,1234,89,12
  2025-01-02,2345,156,23
  2025-01-03,1890,123,18
  2025-01-04,3456,234,34
  2025-01-05,2789,178,27

Note

Measured with gpt-tokenizer using o200k_base encoding (used by GPT-5 and other modern models). Savings will vary across models and tokenizers.

Why TOON?

Standard JSON is verbose and token-expensive in LLM contexts:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

TOON conveys the same information with fewer tokens:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

Key Features

📉 Token-efficient: typically 30–60% fewer tokens vs JSON on GPT-style tokenizers
📊 Tabular arrays: write object keys once, list rows beneath
✂️ Minimal quoting: only when required (e.g., commas, colons, ambiguous primitives)
📐 Indentation-based structure: no braces/brackets for objects
🎯 Inline primitive arrays: written without spaces after commas
🎲 Deterministic: stable key order, no trailing spaces/newline

Installation

# npm
npm install toon

# pnpm
pnpm add toon

# yarn
yarn add toon

Quick Start

import { encode } from 'toon'

const data = {
  user: {
    id: 123,
    name: 'Ada',
    tags: ['admin', 'ops'],
    active: true
  }
}

console.log(encode(data))

Output:

user:
  id: 123
  name: Ada
  tags[2]: admin,ops
  active: true

Canonical Formatting Rules

TOON formatting is deterministic and minimal:

Indentation: 2 spaces per nesting level.
Lines:
- key: value for primitives (single space after colon).
- key: for nested/empty objects (no trailing space on that line).
Arrays:
- Primitive arrays inline: key[N]: v1,v2 (no spaces after commas).
- List items: two spaces, hyphen, space ("  - …").
Whitespace invariants:
- No trailing spaces at end of any line.
- No trailing newline at end of output.
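
If you accept TOON from a model or another tool, the whitespace rules above are easy to check mechanically. The following is a rough sketch only: `checkWhitespace` is a hypothetical helper, not a library function, and it checks only trailing whitespace, the trailing newline, and 2-space indentation.

```typescript
// Hypothetical validator for the whitespace invariants listed above.
// Returns a list of human-readable problems; an empty list means none were found.
function checkWhitespace(toon: string): string[] {
  const problems: string[] = []
  if (toon.endsWith('\n')) {
    problems.push('trailing newline at end of output')
  }
  toon.split('\n').forEach((line, i) => {
    if (/ +$/.test(line)) {
      problems.push(`line ${i + 1}: trailing spaces`)
    }
    // Count leading spaces; every nesting level should add exactly 2.
    const indent = line.length - line.replace(/^ +/, '').length
    if (indent % 2 !== 0) {
      problems.push(`line ${i + 1}: indentation is not a multiple of 2 spaces`)
    }
  })
  return problems
}

console.log(checkWhitespace('user:\n  id: 123\n  name: Ada')) // []
```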

Format Overview

Objects

Simple objects with primitive values:

encode({
  id: 123,
  name: 'Ada',
  active: true
})

id: 123
name: Ada
active: true

Nested objects:

encode({
  user: {
    id: 123,
    name: 'Ada'
  }
})

user:
  id: 123
  name: Ada

Arrays

Primitive Arrays (Inline)

encode({
  tags: ['admin', 'ops', 'dev']
})

tags[3]: admin,ops,dev

Arrays of Objects (Tabular)

When all objects share the same primitive fields, TOON uses an efficient tabular format:

encode({
  items: [
    { sku: 'A1', qty: 2, price: 9.99 },
    { sku: 'B2', qty: 1, price: 14.5 }
  ]
})

items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5
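
To make the layout concrete, here is a simplified sketch of how a tabular block could be rendered. It is not the library's implementation: `renderTabular` is a hypothetical name, and the sketch assumes the rows are already uniform and that no key or value needs quoting.

```typescript
type Primitive = string | number | boolean | null
type Row = Record<string, Primitive>

// Hypothetical renderer: header line with length and field names,
// then one comma-separated row per object, indented by 2 spaces.
// Assumes uniform rows and values that need no quoting.
function renderTabular(key: string, rows: Row[]): string {
  if (rows.length === 0) return `${key}[0]:`
  const fields = Object.keys(rows[0]) // header order comes from the first object
  const header = `${key}[${rows.length}]{${fields.join(',')}}:`
  const lines = rows.map(
    row => '  ' + fields.map(field => String(row[field])).join(',')
  )
  return [header, ...lines].join('\n')
}

console.log(
  renderTabular('items', [
    { sku: 'A1', qty: 2, price: 9.99 },
    { sku: 'B2', qty: 1, price: 14.5 }
  ])
)
```

Each key appears once in the header instead of once per row, which is where most of the savings on arrays of objects come from.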

Mixed and Non-Uniform Arrays

Arrays that don't meet the tabular requirements use list format:

items[3]:
  - 1
  - a: 1
  - text

When objects appear in list format, the first field is placed on the hyphen line:

items[2]:
  - id: 1
    name: First
  - id: 2
    name: Second
    extra: true

Arrays of Arrays

When you have arrays containing primitive inner arrays:

encode({
  pairs: [
    [1, 2],
    [3, 4]
  ]
})

pairs[2]:
  - [2]: 1,2
  - [2]: 3,4

Empty Arrays and Objects

Empty containers have special representations:

encode({ items: [] }) // items[0]:
encode([]) // [0]:
encode({}) // (empty output)
encode({ config: {} }) // config:

Quoting Rules

TOON quotes strings only when necessary to maximize token efficiency. Inner spaces are allowed; leading or trailing spaces force quotes. Unicode and emoji are safe unquoted.

Keys

Keys are quoted when any of the following is true:

Condition	Examples
Contains spaces, commas, colons, quotes, control chars	`"full name"`, `"a,b"`, `"order:id"`, `"tab\there"`
Contains brackets or braces	`"[index]"`, `"{key}"`
Leading hyphen	`"-lead"`
Numeric-only key	`"123"`
Empty key	`""`

Notes:

Quotes and control characters in keys are escaped (e.g., "he said \"hi\"", "line\nbreak").
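
The table above can be read as a single predicate. The sketch below is an approximation written from the table alone; `keyNeedsQuotes` is a hypothetical name, and the library's actual check may be stricter.

```typescript
// Hypothetical predicate mirroring the key-quoting table above.
function keyNeedsQuotes(key: string): boolean {
  if (key === '') return true            // empty key
  if (/^\d+$/.test(key)) return true     // numeric-only key
  if (key.startsWith('-')) return true   // leading hyphen
  // Spaces, commas, colons, quotes, brackets, braces, or control characters.
  return /[ ,:"\[\]{}\u0000-\u001f]/.test(key)
}

console.log(keyNeedsQuotes('full name')) // true
console.log(keyNeedsQuotes('name'))      // false
```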

String Values

String values are quoted when any of the following is true:

Condition	Examples
Empty string	`""`
Contains comma, colon, quote, backslash, or control chars	`"a,b"`, `"a:b"`, `"say \"hi\""`, `"C:\\Users"`, `"line1\nline2"`
Leading or trailing spaces	`" padded "`, `" "`
Looks like boolean/number/null	`"true"`, `"false"`, `"null"`, `"42"`, `"-3.14"`, `"1e-6"`, `"05"`
Starts with `"- "` (list-like)	`"- item"`
Looks like structural token	`"[5]"`, `"{key}"`, `"[3]: x,y"`

Examples

note: "hello, world"
items[3]: x,"true","- item"
hello 👋 world         // unquoted
" padded "             // quoted
value: null            // null value
name: ""               // empty string (quoted)
text: "line1\nline2"   // multi-line string (escaped)
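
The string-value conditions can be sketched the same way. This is an illustration of the table, not the library's code: `valueNeedsQuotes` is a hypothetical name, and treating any string that starts with `[` or `{` as a structural token is an assumption made for this sketch.

```typescript
// Hypothetical predicate mirroring the string-value quoting table.
function valueNeedsQuotes(s: string): boolean {
  if (s === '') return true                         // empty string
  if (s !== s.trim()) return true                   // leading or trailing whitespace
  if (/[,:"\\\u0000-\u001f]/.test(s)) return true   // comma, colon, quote, backslash, control chars
  if (/^(true|false|null)$/.test(s)) return true    // looks like a boolean or null
  // Looks like a number, including forms such as "05" and "1e-6".
  if (/^-?\d+(\.\d+)?(e[+-]?\d+)?$/i.test(s)) return true
  if (s.startsWith('- ')) return true               // list-like
  if (/^[\[{]/.test(s)) return true                 // assumption: structural-looking
  return false
}

console.log(valueNeedsQuotes('hello 👋 world')) // false
console.log(valueNeedsQuotes('hello, world'))   // true
```

Strings such as `2025-01-01` or `alice@example.com` pass through unquoted under this sketch, which matches the benchmark examples earlier in this README.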

Tabular Format Requirements

For arrays of objects to use the efficient tabular format, all of the following must be true:

Requirement	Detail
All elements are objects	No primitives in the array
Identical key sets	No missing or extra keys across rows
Primitive values only	No nested arrays or objects
Header key order	Taken from the first object
Header key quoting	Same rules as object keys
Row value quoting	Same rules as string values

If any condition fails, TOON falls back to list format.
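
As a sketch of the first three requirements, the check below decides whether an array qualifies for the tabular format. `isTabular` is a hypothetical name, the input is assumed to be plain JSON-like data, and returning `false` for an empty array is a choice made for this sketch.

```typescript
// Hypothetical eligibility check for the tabular format.
function isPrimitive(v: unknown): boolean {
  return (
    v === null ||
    typeof v === 'string' ||
    typeof v === 'number' ||
    typeof v === 'boolean'
  )
}

function isTabular(items: unknown[]): boolean {
  if (items.length === 0) return false
  const isPlainObject = (v: unknown): v is Record<string, unknown> =>
    typeof v === 'object' && v !== null && !Array.isArray(v)
  // Requirement 1: all elements are objects.
  if (!items.every(isPlainObject)) return false
  const rows = items as Record<string, unknown>[]
  const header = Object.keys(rows[0])
  return rows.every(row => {
    const keys = Object.keys(row)
    return (
      // Requirement 2: identical key sets (same count, every header key present).
      keys.length === header.length &&
      header.every(
        k =>
          Object.prototype.hasOwnProperty.call(row, k) &&
          // Requirement 3: primitive values only.
          isPrimitive(row[k])
      )
    )
  })
}

console.log(isTabular([{ id: 1, name: 'First' }, { id: 2, name: 'Second' }]))              // true
console.log(isTabular([{ id: 1, name: 'First' }, { id: 2, name: 'Second', extra: true }])) // false
```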

Type Conversions

Some non-JSON types are automatically normalized for LLM-safe output:

Input	Output
Number (finite)	Decimal form, no scientific notation; `-0` → `0`
Number (`NaN`, `±Infinity`)	`null`
`BigInt`	Decimal digits (no quotes)
`Date`	ISO string in quotes (e.g., `"2025-01-01T00:00:00.000Z"`)
`undefined`	`null`
`function`	`null`
`symbol`	`null`

Number normalization examples:

-0    → 0
1e6   → 1000000
1e-6  → 0.000001
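
JavaScript's default number-to-string conversion switches to exponent notation for very large and very small magnitudes, so producing a plain decimal form takes an extra step. The following sketch shows one way to do it; `toDecimalString` is a hypothetical helper and not necessarily how the library does it.

```typescript
// Hypothetical helper: renders a number in plain decimal form,
// or returns null for NaN and ±Infinity.
function toDecimalString(n: number): string | null {
  if (!Number.isFinite(n)) return null
  if (n === 0) return '0' // also covers -0
  const s = String(n)
  // String(n) only uses exponent notation for very large or very small values.
  const match = /^(-?)(\d+)(?:\.(\d+))?e([+-]\d+)$/.exec(s)
  if (!match) return s
  const [, sign, intPart, fracPart = '', expStr] = match
  const digits = intPart + fracPart
  // Where the decimal point lands within `digits` after applying the exponent.
  const pointIndex = intPart.length + Number(expStr)
  if (pointIndex <= 0) {
    return `${sign}0.${'0'.repeat(-pointIndex)}${digits}`
  }
  if (pointIndex >= digits.length) {
    return `${sign}${digits}${'0'.repeat(pointIndex - digits.length)}`
  }
  return `${sign}${digits.slice(0, pointIndex)}.${digits.slice(pointIndex)}`
}

console.log(toDecimalString(-0))   // 0
console.log(toDecimalString(1e6))  // 1000000
console.log(toDecimalString(1e-6)) // 0.000001
```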

API

`encode(value: unknown): string`

Converts any JSON-serializable value to TOON format.

Parameters:

value – Any JSON-serializable value (object, array, primitive, or nested structure). Non-JSON-serializable values (functions, symbols, undefined, non-finite numbers) are converted to null. Dates are converted to ISO strings, and BigInts are emitted as decimal integers (no quotes).

Returns:

A TOON-formatted string with no trailing newline or spaces.

Example:

import { encode } from 'toon'

const items = [
  { sku: 'A1', qty: 2, price: 9.99 },
  { sku: 'B2', qty: 1, price: 14.5 }
]

console.log(encode({ items }))

Output:

items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5

Using TOON in LLM Prompts

When incorporating TOON into your LLM workflows:

- Wrap TOON data in a fenced code block in your prompt.
- Tell the model: "Do not add extra punctuation or spaces; follow the exact TOON format."
- When asking the model to generate TOON, specify the same rules (2-space indentation, no trailing spaces, quoting rules).
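
These guidelines can be folded into a small prompt builder. This is only a sketch: `buildPrompt` is a hypothetical helper, the first instruction line is illustrative wording, and the TOON string is assumed to have been produced already (for example by `encode`).

```typescript
// Hypothetical helper: wraps already-encoded TOON in a fenced block
// and adds the formatting instruction suggested above.
function buildPrompt(toon: string, question: string): string {
  const fence = '`'.repeat(3) // three backticks, built here to avoid nesting fences
  return [
    'The data below is in TOON format (2-space indentation, no trailing spaces).',
    'Do not add extra punctuation or spaces; follow the exact TOON format.',
    '',
    fence,
    toon,
    fence,
    '',
    question
  ].join('\n')
}

console.log(buildPrompt('tags[3]: admin,ops,dev', 'How many tags are there?'))
```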

Token Savings Example

Here's a realistic API response to illustrate the token savings:

JSON:

{
  "users": [
    { "id": 1, "name": "Alice", "email": "alice@example.com", "active": true },
    { "id": 2, "name": "Bob", "email": "bob@example.com", "active": true },
    { "id": 3, "name": "Charlie", "email": "charlie@example.com", "active": false }
  ]
}

TOON:

users[3]{id,name,email,active}:
  1,Alice,alice@example.com,true
  2,Bob,bob@example.com,true
  3,Charlie,charlie@example.com,false

Typical savings vs JSON are in the 30–60% range on GPT-style tokenizers, driven by:

- Tabular arrays of objects (keys written once)
- No structural braces/brackets
- Minimal quoting
- No spaces after commas

Notes and Limitations

Token counts vary by tokenizer and model. The benchmarks above use a GPT-style tokenizer (o200k_base); actual savings will differ with other tokenizers (e.g., SentencePiece-based ones).
TOON is designed for LLM contexts where human readability and token efficiency matter. It's not a drop-in replacement for JSON in APIs or storage.
Tabular arrays require all objects to have exactly the same keys with primitive values only. Arrays with mixed types (primitives + objects/arrays), non-uniform objects, or nested structures will use a more verbose list format.
Object key order is preserved from the input. In tabular arrays, header order follows the first object's keys.
Arrays mixing primitives and objects/arrays always use list form:
```
items[2]:
  - a: 1
  - [2]: 1,2
```
Deterministic formatting: 2-space indentation, stable key order, no trailing spaces/newline.

Quick Reference

// Object
{ id: 1, name: 'Ada' }          → id: 1
                                   name: Ada

// Nested object
{ user: { id: 1 } }             → user:
                                     id: 1

// Primitive array (inline)
{ tags: ['a', 'b'] }            → tags[2]: a,b

// Tabular array (uniform objects)
{ items: [                      → items[2]{id,qty}:
  { id: 1, qty: 5 },                1,5
  { id: 2, qty: 3 }                 2,3
]}

// Mixed / non-uniform (list)
{ items: [1, { a: 1 }, 'x'] }   → items[3]:
                                     - 1
                                     - a: 1
                                     - x

// Array of arrays
{ pairs: [[1, 2], [3, 4]] }     → pairs[2]:
                                     - [2]: 1,2
                                     - [2]: 3,4

// Root array
['x', 'y']                      → [2]: x,y

// Empty containers
{}                              → (empty output)
{ items: [] }                   → items[0]:

// Special quoting
{ note: 'hello, world' }        → note: "hello, world"
{ items: ['true', true] }       → items[2]: "true",true


Token-Oriented Object Notation (TOON)