mirror of
https://github.com/voson-wang/toon.git
synced 2026-01-29 15:24:10 +08:00
docs: clarify TOON's advantages and optimal data structure
14
README.md
@@ -4,7 +4,7 @@
 
 **Token-Oriented Object Notation** is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage. It's intended for LLM input, not output.
 
-TOON's sweet spot is **uniform complex objects** – multiple fields per row, same structure across items. It borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts.
+TOON's sweet spot is **uniform arrays of objects** – multiple fields per row, same structure across items. It borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts. For deeply nested or non-uniform data, JSON may be more efficient.
 
 ## Why TOON?
 
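A rough illustration of the "CSV's tabular format for uniform data rows" idea in the paragraph above, modeled on the `users[2]{id,name,role}:` header visible in this diff. This is only a sketch: `encodeTabularSketch` is a hypothetical helper, not the TOON library's encoder, and the two-space row indentation and comma delimiter are assumptions. It does no quoting or escaping.

```typescript
type Primitive = string | number | boolean | null;
type Row = Record<string, Primitive>;

// Hypothetical helper for illustration only. It assumes every row has the
// same keys as the first row and that no value contains a comma or newline.
function encodeTabularSketch(name: string, rows: Row[]): string {
  if (rows.length === 0) {
    throw new Error("sketch only handles non-empty arrays");
  }
  // Field names are declared once in the header instead of once per row.
  const fields = Object.keys(rows[0]);
  const header = `${name}[${rows.length}]{${fields.join(",")}}:`;
  // Each object becomes one indented, comma-separated line.
  const lines = rows.map(
    (row) => "  " + fields.map((field) => String(row[field])).join(","),
  );
  return [header, ...lines].join("\n");
}

const users: Row[] = [
  { id: 1, name: "Alice", role: "admin" },
  { id: 2, name: "Bob", role: "user" },
];
console.log(encodeTabularSketch("users", users));
```

The saving comes from stating the keys once in the header. With non-uniform rows that header no longer describes every item, which is the case where the README now points readers back to JSON.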
@@ -44,6 +44,8 @@ users[2]{id,name,role}:
 
 ## Benchmarks
 
+The benchmarks test datasets that favor TOON's strengths (uniform tabular data). Real-world performance depends heavily on your data structure.
+
 <!-- automd:file src="./benchmarks/results/token-efficiency.md" -->
 
 ### Token Efficiency
@@ -248,7 +250,7 @@ grok-4-fast-non-reasoning
 csv █████████░░░░░░░░░░░ 45.5% (70/154)
 ```
 
-**Advantage:** TOON achieves **69.2% accuracy** (vs JSON's 65.4%) while using **46.3% fewer tokens**.
+**Key tradeoff:** TOON achieves **69.2% accuracy** (vs JSON's 65.4%) while using **46.3% fewer tokens** on these datasets.
 
 <details>
 <summary><strong>Performance by dataset and model</strong></summary>
@@ -348,7 +350,7 @@ This benchmark tests **LLM comprehension and data retrieval accuracy** across di
 
 #### Datasets Tested
 
-Four datasets designed to test different structural patterns:
+Four datasets designed to test different structural patterns (all contain arrays of uniform objects, TOON's optimal format):
 
 1. **Tabular** (100 employee records): Uniform objects with identical fields – optimal for TOON's tabular format.
 2. **Nested** (50 e-commerce orders): Complex structures with nested customer objects and item arrays.
@@ -812,9 +814,9 @@ By default, the decoder validates input strictly:
 
 ## Notes and Limitations
 
-- Format familiarity matters as much as token count. TOON's tabular format requires arrays of objects with identical keys and primitive values only – when this doesn't hold (due to mixed types, non-uniform objects, or nested structures), TOON switches to list format where JSON can be cheaper at scale.
-- **TOON** is best for uniform complex (but not deeply nested) objects, especially large arrays of such objects.
-- **JSON** is best for non-uniform data and deeply nested structures.
+- Format familiarity and structure matter as much as token count. TOON's tabular format requires arrays of objects with identical keys and primitive values only. When this doesn't hold (due to mixed types, non-uniform objects, or nested structures), TOON switches to list format where JSON can be more efficient at scale.
+- **TOON excels at:** Uniform arrays of objects (same fields, primitive values), especially large datasets with consistent structure.
+- **JSON is better for:** Non-uniform data, deeply nested structures, and objects with varying field sets.
 - **Token counts vary by tokenizer and model.** Benchmarks use a GPT-style tokenizer (cl100k/o200k); actual savings will differ with other models (e.g., [SentencePiece](https://github.com/google/sentencepiece)).
 - **TOON is designed for LLM input** where human readability and token efficiency matter. It's **not** a drop-in replacement for JSON in APIs or storage.
 
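The eligibility rule stated in the first bullet of the hunk above ("arrays of objects with identical keys and primitive values only") could be checked before choosing a format. The following is a minimal sketch under stated assumptions: `isTabularEligible` is a hypothetical name, not part of the TOON API, it compares key sets without regard to key order, and it treats an empty array as ineligible. The real encoder may decide differently on both points.

```typescript
// Hypothetical pre-check, not the TOON library's own logic.
function isPrimitive(value: unknown): boolean {
  return (
    value === null ||
    typeof value === "string" ||
    typeof value === "number" ||
    typeof value === "boolean"
  );
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isTabularEligible(items: unknown[]): boolean {
  // Assumption of this sketch: nothing to tabulate means "not eligible".
  if (items.length === 0) return false;

  const first = items[0];
  if (!isPlainObject(first)) return false;
  const expectedKeys = Object.keys(first).sort();

  return items.every((item) => {
    if (!isPlainObject(item)) return false;
    const keys = Object.keys(item).sort();
    // Same key set (order-insensitive here, which is an assumption).
    const sameKeys =
      keys.length === expectedKeys.length &&
      keys.every((key, i) => key === expectedKeys[i]);
    // No nested objects or arrays among the values.
    return sameKeys && Object.values(item).every(isPrimitive);
  });
}

console.log(isTabularEligible([{ id: 1, name: "a" }, { id: 2, name: "b" }]));
console.log(isTabularEligible([{ id: 1 }, { id: 2, extra: true }]));
console.log(isTabularEligible([{ id: 1, tags: ["x"] }]));
```

A caller could use a check like this to fall back to JSON for data that would otherwise end up in TOON's list format.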
@@ -32,7 +32,7 @@ grok-4-fast-non-reasoning
 csv █████████░░░░░░░░░░░ 45.5% (70/154)
 ```
 
-**Advantage:** TOON achieves **69.2% accuracy** (vs JSON's 65.4%) while using **46.3% fewer tokens**.
+**Key tradeoff:** TOON achieves **69.2% accuracy** (vs JSON's 65.4%) while using **46.3% fewer tokens** on these datasets.
 
 <details>
 <summary><strong>Performance by dataset and model</strong></summary>
@@ -132,7 +132,7 @@ This benchmark tests **LLM comprehension and data retrieval accuracy** across di
 
 #### Datasets Tested
 
-Four datasets designed to test different structural patterns:
+Four datasets designed to test different structural patterns (all contain arrays of uniform objects, TOON's optimal format):
 
 1. **Tabular** (100 employee records): Uniform objects with identical fields – optimal for TOON's tabular format.
 2. **Nested** (50 e-commerce orders): Complex structures with nested customer objects and item arrays.

@@ -83,7 +83,7 @@ export function generateMarkdownReport(
 
 // Build summary comparison
 const summaryComparison = toon && json
-? `**Advantage:** TOON achieves **${(toon.accuracy * 100).toFixed(1)}% accuracy** (vs JSON's ${(json.accuracy * 100).toFixed(1)}%) while using **${((1 - toon.totalTokens / json.totalTokens) * 100).toFixed(1)}% fewer tokens**.`
+? `**Key tradeoff:** TOON achieves **${(toon.accuracy * 100).toFixed(1)}% accuracy** (vs JSON's ${(json.accuracy * 100).toFixed(1)}%) while using **${((1 - toon.totalTokens / json.totalTokens) * 100).toFixed(1)}% fewer tokens** on these datasets.`
 : ''
 
 // Build performance by dataset
@@ -221,7 +221,7 @@ This benchmark tests **LLM comprehension and data retrieval accuracy** across di
 
 #### Datasets Tested
 
-Four datasets designed to test different structural patterns:
+Four datasets designed to test different structural patterns (all contain arrays of uniform objects, TOON's optimal format):
 
 1. **Tabular** (${tabularSize} employee records): Uniform objects with identical fields – optimal for TOON's tabular format.
 2. **Nested** (${nestedSize} e-commerce orders): Complex structures with nested customer objects and item arrays.
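For reference, the arithmetic behind the generated "Key tradeoff" line in `generateMarkdownReport` can be isolated as below. This is a sketch, not the repository's code: the `FormatResult` interface and the `summarize` function are hypothetical names, only the `accuracy` and `totalTokens` fields appear in the diff, and the sample numbers are made up for illustration.

```typescript
// Hypothetical shape; only `accuracy` and `totalTokens` come from the diff.
interface FormatResult {
  accuracy: number; // fraction in [0, 1]
  totalTokens: number; // tokens used across all questions
}

function summarize(toon: FormatResult, json: FormatResult): string {
  // Render a fraction as a percentage with one decimal place.
  const pct = (fraction: number): string => (fraction * 100).toFixed(1);
  // Relative token saving of TOON measured against JSON as the baseline.
  const saved = 1 - toon.totalTokens / json.totalTokens;
  return (
    `TOON achieves ${pct(toon.accuracy)}% accuracy ` +
    `(vs JSON's ${pct(json.accuracy)}%) while using ${pct(saved)}% fewer tokens`
  );
}

// Made-up inputs, chosen so the result is easy to verify by hand.
console.log(
  summarize(
    { accuracy: 0.75, totalTokens: 500 },
    { accuracy: 0.5, totalTokens: 1000 },
  ),
);
```

Because the saving is relative to JSON's total, it reflects whichever datasets were run. That is what the "on these datasets" qualifier added by this commit makes explicit.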