docs: add format comparison

Johann Schopplich
2025-10-27 20:02:05 +01:00
parent eeb5991c0c
commit c2b0e3f404


@@ -4,11 +4,7 @@
**Token-Oriented Object Notation** is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage.
In other words, if YAML and CSV had a baby, optimized for LLM contexts. TOON excels at **uniform complex objects**: multiple fields per row, same structure across items. It borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts.
TOON borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts.
> [!TIP]
> Wrap your JSON in `encode()` before sending it to LLMs and save ~1/2 of the token cost for structured data!
## Why TOON?
@@ -31,8 +27,6 @@ users[2]{id,name,role}:
2,Bob,user 2,Bob,user
``` ```
I built TOON to save tokens when sending large datasets to LLMs at work, where I tend to have uniform arrays of objects that benefit from the tabular format.
<details>
<summary>Another reason</summary>
@@ -40,6 +34,16 @@ I built TOON to save tokens when sending large datasets to LLMs at work, where I
</details>
## Format Comparison
Format familiarity matters as much as token count.
- **CSV:** best for uniform tables.
- **JSON:** best for non-uniform data.
- **TOON:** best for uniform complex (but not deeply nested) objects.
TOON switches to list format for non-uniform arrays. In those cases, JSON can be cheaper at scale.
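The comparison above depends on whether an array is uniform. As a rough sketch of that check, the helper below tests whether every item is a plain object with the same set of keys and only primitive values. `isUniformArray` is a hypothetical name for illustration, not part of the TOON library, and the library's actual rules for choosing the tabular format may differ.

```typescript
// Hypothetical helper for illustration only; not part of the TOON API.
type Primitive = string | number | boolean | null;

function isPrimitive(value: unknown): value is Primitive {
  return (
    value === null ||
    typeof value === "string" ||
    typeof value === "number" ||
    typeof value === "boolean"
  );
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns true when all items are objects sharing the same keys and holding
// only primitive values. Assumption: an empty array is treated as non-uniform.
function isUniformArray(items: unknown[]): boolean {
  if (items.length === 0) return false;
  const first = items[0];
  if (!isPlainObject(first)) return false;
  // Sort the keys so that property order does not affect the comparison.
  const expectedKeys = JSON.stringify(Object.keys(first).sort());
  return items.every(
    (item) =>
      isPlainObject(item) &&
      JSON.stringify(Object.keys(item).sort()) === expectedKeys &&
      Object.values(item).every(isPrimitive),
  );
}
```

Under these assumptions, an array of records like `{ id: 2, name: "Bob", role: "user" }` passes the check, while an array whose items carry different keys or nested values does not, which is the situation where JSON may be the cheaper choice.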
## Key Features
- 💸 **Token-efficient:** typically 30-60% fewer tokens than JSON