mirror of https://github.com/voson-wang/toon.git
synced 2026-01-29 15:24:10 +08:00
docs: switch benchmark order
450
README.md
@@ -93,226 +93,6 @@ Benchmarks are organized into two tracks to ensure fair comparisons:
 - **Mixed-Structure Track**: Datasets with nested or semi-uniform structures (TOON vs JSON, YAML, XML). CSV excluded as it cannot properly represent these structures.
 - **Flat-Only Track**: Datasets with flat tabular structures where CSV is applicable (CSV vs TOON vs JSON, YAML, XML).
 
-### Token Efficiency
-
-Token counts are measured using the GPT-5 `o200k_base` tokenizer via [`gpt-tokenizer`](https://github.com/niieani/gpt-tokenizer). Savings are calculated against formatted JSON (2-space indentation) as the primary baseline, with additional comparisons to compact JSON (minified), YAML, and XML. Actual savings vary by model and tokenizer.
-
-The benchmarks test datasets across different structural patterns (uniform, semi-uniform, nested, deeply nested) to show where TOON excels and where other formats may be better.
-
-<!-- automd:file src="./benchmarks/results/token-efficiency.md" -->
-
-#### Mixed-Structure Track
-
-Datasets with nested or semi-uniform structures. CSV excluded as it cannot properly represent these structures.
-
-```
-🛒 E-commerce orders with nested structures ┊ Tabular: 33%
-│
-TOON █████████████░░░░░░░ 72,771 tokens
-├─ vs JSON (−33.1%) 108,806 tokens
-├─ vs JSON compact (+5.5%) 68,975 tokens
-├─ vs YAML (−14.2%) 84,780 tokens
-└─ vs XML (−40.5%) 122,406 tokens
-
-🧾 Semi-uniform event logs ┊ Tabular: 50%
-│
-TOON █████████████████░░░ 153,211 tokens
-├─ vs JSON (−15.0%) 180,176 tokens
-├─ vs JSON compact (+19.9%) 127,731 tokens
-├─ vs YAML (−0.8%) 154,505 tokens
-└─ vs XML (−25.2%) 204,777 tokens
-
-🧩 Deeply nested configuration ┊ Tabular: 0%
-│
-TOON ██████████████░░░░░░ 631 tokens
-├─ vs JSON (−31.3%) 919 tokens
-├─ vs JSON compact (+11.9%) 564 tokens
-├─ vs YAML (−6.2%) 673 tokens
-└─ vs XML (−37.4%) 1,008 tokens
-
-──────────────────────────────────── Total ────────────────────────────────────
-TOON ████████████████░░░░ 226,613 tokens
-├─ vs JSON (−21.8%) 289,901 tokens
-├─ vs JSON compact (+14.9%) 197,270 tokens
-├─ vs YAML (−5.6%) 239,958 tokens
-└─ vs XML (−31.0%) 328,191 tokens
-```
-
-#### Flat-Only Track
-
-Datasets with flat tabular structures where CSV is applicable.
-
-```
-👥 Uniform employee records ┊ Tabular: 100%
-│
-CSV ███████████████████░ 46,954 tokens
-TOON ████████████████████ 49,831 tokens (+6.1% vs CSV)
-├─ vs JSON (−60.7%) 126,860 tokens
-├─ vs JSON compact (−36.8%) 78,856 tokens
-├─ vs YAML (−50.0%) 99,706 tokens
-└─ vs XML (−66.0%) 146,444 tokens
-
-📈 Time-series analytics data ┊ Tabular: 100%
-│
-CSV ██████████████████░░ 8,388 tokens
-TOON ████████████████████ 9,120 tokens (+8.7% vs CSV)
-├─ vs JSON (−59.0%) 22,250 tokens
-├─ vs JSON compact (−35.8%) 14,216 tokens
-├─ vs YAML (−48.9%) 17,863 tokens
-└─ vs XML (−65.7%) 26,621 tokens
-
-⭐ Top 100 GitHub repositories ┊ Tabular: 100%
-│
-CSV ███████████████████░ 8,513 tokens
-TOON ████████████████████ 8,745 tokens (+2.7% vs CSV)
-├─ vs JSON (−42.3%) 15,145 tokens
-├─ vs JSON compact (−23.7%) 11,455 tokens
-├─ vs YAML (−33.4%) 13,129 tokens
-└─ vs XML (−48.8%) 17,095 tokens
-
-──────────────────────────────────── Total ────────────────────────────────────
-CSV ███████████████████░ 63,855 tokens
-TOON ████████████████████ 67,696 tokens (+6.0% vs CSV)
-├─ vs JSON (−58.8%) 164,255 tokens
-├─ vs JSON compact (−35.2%) 104,527 tokens
-├─ vs YAML (−48.2%) 130,698 tokens
-└─ vs XML (−64.4%) 190,160 tokens
-```
-
-<details>
-<summary><strong>Show detailed examples</strong></summary>
-
-#### 📈 Time-series analytics data
-
-**Savings:** 13,130 tokens (59.0% reduction vs JSON)
-
-**JSON** (22,250 tokens):
-
-```json
-{
-  "metrics": [
-    {
-      "date": "2025-01-01",
-      "views": 5715,
-      "clicks": 211,
-      "conversions": 28,
-      "revenue": 7976.46,
-      "bounceRate": 0.47
-    },
-    {
-      "date": "2025-01-02",
-      "views": 7103,
-      "clicks": 393,
-      "conversions": 28,
-      "revenue": 8360.53,
-      "bounceRate": 0.32
-    },
-    {
-      "date": "2025-01-03",
-      "views": 7248,
-      "clicks": 378,
-      "conversions": 24,
-      "revenue": 3212.57,
-      "bounceRate": 0.5
-    },
-    {
-      "date": "2025-01-04",
-      "views": 2927,
-      "clicks": 77,
-      "conversions": 11,
-      "revenue": 1211.69,
-      "bounceRate": 0.62
-    },
-    {
-      "date": "2025-01-05",
-      "views": 3530,
-      "clicks": 82,
-      "conversions": 8,
-      "revenue": 462.77,
-      "bounceRate": 0.56
-    }
-  ]
-}
-```
-
-**TOON** (9,120 tokens):
-
-```
-metrics[5]{date,views,clicks,conversions,revenue,bounceRate}:
-  2025-01-01,5715,211,28,7976.46,0.47
-  2025-01-02,7103,393,28,8360.53,0.32
-  2025-01-03,7248,378,24,3212.57,0.5
-  2025-01-04,2927,77,11,1211.69,0.62
-  2025-01-05,3530,82,8,462.77,0.56
-```
-
----
-
-#### ⭐ Top 100 GitHub repositories
-
-**Savings:** 6,400 tokens (42.3% reduction vs JSON)
-
-**JSON** (15,145 tokens):
-
-```json
-{
-  "repositories": [
-    {
-      "id": 28457823,
-      "name": "freeCodeCamp",
-      "repo": "freeCodeCamp/freeCodeCamp",
-      "description": "freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…",
-      "createdAt": "2014-12-24T17:49:19Z",
-      "updatedAt": "2025-10-28T11:58:08Z",
-      "pushedAt": "2025-10-28T10:17:16Z",
-      "stars": 430886,
-      "watchers": 8583,
-      "forks": 42146,
-      "defaultBranch": "main"
-    },
-    {
-      "id": 132750724,
-      "name": "build-your-own-x",
-      "repo": "codecrafters-io/build-your-own-x",
-      "description": "Master programming by recreating your favorite technologies from scratch.",
-      "createdAt": "2018-05-09T12:03:18Z",
-      "updatedAt": "2025-10-28T12:37:11Z",
-      "pushedAt": "2025-10-10T18:45:01Z",
-      "stars": 430877,
-      "watchers": 6332,
-      "forks": 40453,
-      "defaultBranch": "master"
-    },
-    {
-      "id": 21737465,
-      "name": "awesome",
-      "repo": "sindresorhus/awesome",
-      "description": "😎 Awesome lists about all kinds of interesting topics",
-      "createdAt": "2014-07-11T13:42:37Z",
-      "updatedAt": "2025-10-28T12:40:21Z",
-      "pushedAt": "2025-10-27T17:57:31Z",
-      "stars": 410052,
-      "watchers": 8017,
-      "forks": 32029,
-      "defaultBranch": "main"
-    }
-  ]
-}
-```
-
-**TOON** (8,745 tokens):
-
-```
-repositories[3]{id,name,repo,description,createdAt,updatedAt,pushedAt,stars,watchers,forks,defaultBranch}:
-  28457823,freeCodeCamp,freeCodeCamp/freeCodeCamp,"freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…","2014-12-24T17:49:19Z","2025-10-28T11:58:08Z","2025-10-28T10:17:16Z",430886,8583,42146,main
-  132750724,build-your-own-x,codecrafters-io/build-your-own-x,Master programming by recreating your favorite technologies from scratch.,"2018-05-09T12:03:18Z","2025-10-28T12:37:11Z","2025-10-10T18:45:01Z",430877,6332,40453,master
-  21737465,awesome,sindresorhus/awesome,😎 Awesome lists about all kinds of interesting topics,"2014-07-11T13:42:37Z","2025-10-28T12:40:21Z","2025-10-27T17:57:31Z",410052,8017,32029,main
-```
-
-</details>
-
-<!-- /automd -->
-
 ### Retrieval Accuracy
 
 <!-- automd:file src="./benchmarks/results/retrieval-accuracy.md" -->
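The signed percentages in the Token Efficiency charts can be reproduced from the raw token counts. The sketch below assumes each chart figure is (TOON − other) / other and each "Savings" line is (JSON − TOON) / JSON; both readings match the numbers in this section (for example, 72,771 vs 108,806 tokens gives −33.1%, and 22,250 − 9,120 gives 13,130 tokens or 59.0%). The function names are illustrative and are not taken from the benchmark scripts.

```typescript
// Illustrative helpers (hypothetical names) for the arithmetic behind the charts.

// Signed difference of TOON against another format, in percent.
// Negative: TOON needs fewer tokens. Positive: TOON needs more.
function relativeDiffPercent(toonTokens: number, otherTokens: number): number {
  return ((toonTokens - otherTokens) / otherTokens) * 100;
}

// Render a difference the way the charts do, e.g. "(−33.1%)" or "(+5.5%)".
// Uses U+2212 MINUS SIGN to match the chart text.
function formatDiff(percent: number): string {
  const sign = percent < 0 ? "\u2212" : "+";
  return `(${sign}${Math.abs(percent).toFixed(1)}%)`;
}

// Absolute and relative savings against a baseline (as in the "Savings:" lines).
function savings(baselineTokens: number, toonTokens: number): { tokens: number; percent: number } {
  const tokens = baselineTokens - toonTokens;
  return { tokens, percent: (tokens / baselineTokens) * 100 };
}

// E-commerce dataset, Mixed-Structure Track.
console.log(formatDiff(relativeDiffPercent(72771, 108806))); // prints (−33.1%)
console.log(formatDiff(relativeDiffPercent(72771, 68975))); // prints (+5.5%)

// Time-series dataset, detailed example.
const ts = savings(22250, 9120);
console.log(`${ts.tokens} tokens (${ts.percent.toFixed(1)}% reduction vs JSON)`);
// prints 13130 tokens (59.0% reduction vs JSON)
```

The same helper covers the Flat-Only Track's CSV comparison, since "+6.1% vs CSV" is also (TOON − CSV) / CSV.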
@@ -355,11 +135,11 @@ Benchmarks test LLM comprehension across different input formats using 209 data
 Each format's overall performance, balancing accuracy against token cost:
 
 ```
-TOON ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 26.9 │ 73.9% acc │ 2,744 tokens
-JSON compact ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░ 22.9 │ 70.7% acc │ 3,081 tokens
-YAML ▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░ 18.6 │ 69.0% acc │ 3,719 tokens
-JSON ▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░ 15.3 │ 69.7% acc │ 4,545 tokens
-XML ▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░ 13.0 │ 67.1% acc │ 5,167 tokens
+TOON ████████████████████ 26.9 │ 73.9% acc │ 2,744 tokens
+JSON compact █████████████████░░░ 22.9 │ 70.7% acc │ 3,081 tokens
+YAML ██████████████░░░░░░ 18.6 │ 69.0% acc │ 3,719 tokens
+JSON ███████████░░░░░░░░░ 15.3 │ 69.7% acc │ 4,545 tokens
+XML ██████████░░░░░░░░░░ 13.0 │ 67.1% acc │ 5,167 tokens
 ```
 
 TOON achieves **73.9%** accuracy (vs JSON's 69.7%) while using **39.6% fewer tokens**.
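The score column in the retrieval chart is not defined in this hunk. All five published scores are consistent with accuracy (in percent) divided by the token count in thousands, and the "39.6% fewer tokens" claim follows from the same chart's token counts. The sketch below checks both readings; treat the score formula as an inference from the numbers, not as the benchmark's documented definition.

```typescript
// Hypothetical check: the chart's score appears to be accuracy (%) per 1,000 tokens.
// This formula is inferred from the published numbers, not stated in the source.
interface FormatResult {
  name: string;
  accuracyPercent: number;
  tokens: number;
}

function efficiencyScore(r: FormatResult): number {
  return r.accuracyPercent / (r.tokens / 1000);
}

// Values copied from the retrieval chart.
const results: FormatResult[] = [
  { name: "TOON", accuracyPercent: 73.9, tokens: 2744 },
  { name: "JSON compact", accuracyPercent: 70.7, tokens: 3081 },
  { name: "YAML", accuracyPercent: 69.0, tokens: 3719 },
  { name: "JSON", accuracyPercent: 69.7, tokens: 4545 },
  { name: "XML", accuracyPercent: 67.1, tokens: 5167 },
];

for (const r of results) {
  // Rounded to one decimal place, as in the chart.
  console.log(`${r.name}: ${efficiencyScore(r).toFixed(1)}`);
}

// "39.6% fewer tokens": TOON's token count relative to formatted JSON.
const fewerTokensPercent = ((4545 - 2744) / 4545) * 100;
console.log(`TOON uses ${fewerTokensPercent.toFixed(1)}% fewer tokens than JSON`);
```

Under this reading, the ranking rewards formats that keep accuracy while shrinking the prompt, which is why compact JSON places second despite its lower accuracy than TOON.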
@@ -658,6 +438,226 @@ Eleven datasets designed to test different structural patterns and validation ca
 
 <!-- /automd -->
 
+### Token Efficiency
+
+Token counts are measured using the GPT-5 `o200k_base` tokenizer via [`gpt-tokenizer`](https://github.com/niieani/gpt-tokenizer). Savings are calculated against formatted JSON (2-space indentation) as the primary baseline, with additional comparisons to compact JSON (minified), YAML, and XML. Actual savings vary by model and tokenizer.
+
+The benchmarks test datasets across different structural patterns (uniform, semi-uniform, nested, deeply nested) to show where TOON excels and where other formats may be better.
+
+<!-- automd:file src="./benchmarks/results/token-efficiency.md" -->
+
+#### Mixed-Structure Track
+
+Datasets with nested or semi-uniform structures. CSV excluded as it cannot properly represent these structures.
+
+```
+🛒 E-commerce orders with nested structures ┊ Tabular: 33%
+│
+TOON █████████████░░░░░░░ 72,771 tokens
+├─ vs JSON (−33.1%) 108,806 tokens
+├─ vs JSON compact (+5.5%) 68,975 tokens
+├─ vs YAML (−14.2%) 84,780 tokens
+└─ vs XML (−40.5%) 122,406 tokens
+
+🧾 Semi-uniform event logs ┊ Tabular: 50%
+│
+TOON █████████████████░░░ 153,211 tokens
+├─ vs JSON (−15.0%) 180,176 tokens
+├─ vs JSON compact (+19.9%) 127,731 tokens
+├─ vs YAML (−0.8%) 154,505 tokens
+└─ vs XML (−25.2%) 204,777 tokens
+
+🧩 Deeply nested configuration ┊ Tabular: 0%
+│
+TOON ██████████████░░░░░░ 631 tokens
+├─ vs JSON (−31.3%) 919 tokens
+├─ vs JSON compact (+11.9%) 564 tokens
+├─ vs YAML (−6.2%) 673 tokens
+└─ vs XML (−37.4%) 1,008 tokens
+
+──────────────────────────────────── Total ────────────────────────────────────
+TOON ████████████████░░░░ 226,613 tokens
+├─ vs JSON (−21.8%) 289,901 tokens
+├─ vs JSON compact (+14.9%) 197,270 tokens
+├─ vs YAML (−5.6%) 239,958 tokens
+└─ vs XML (−31.0%) 328,191 tokens
+```
+
+#### Flat-Only Track
+
+Datasets with flat tabular structures where CSV is applicable.
+
+```
+👥 Uniform employee records ┊ Tabular: 100%
+│
+CSV ███████████████████░ 46,954 tokens
+TOON ████████████████████ 49,831 tokens (+6.1% vs CSV)
+├─ vs JSON (−60.7%) 126,860 tokens
+├─ vs JSON compact (−36.8%) 78,856 tokens
+├─ vs YAML (−50.0%) 99,706 tokens
+└─ vs XML (−66.0%) 146,444 tokens
+
+📈 Time-series analytics data ┊ Tabular: 100%
+│
+CSV ██████████████████░░ 8,388 tokens
+TOON ████████████████████ 9,120 tokens (+8.7% vs CSV)
+├─ vs JSON (−59.0%) 22,250 tokens
+├─ vs JSON compact (−35.8%) 14,216 tokens
+├─ vs YAML (−48.9%) 17,863 tokens
+└─ vs XML (−65.7%) 26,621 tokens
+
+⭐ Top 100 GitHub repositories ┊ Tabular: 100%
+│
+CSV ███████████████████░ 8,513 tokens
+TOON ████████████████████ 8,745 tokens (+2.7% vs CSV)
+├─ vs JSON (−42.3%) 15,145 tokens
+├─ vs JSON compact (−23.7%) 11,455 tokens
+├─ vs YAML (−33.4%) 13,129 tokens
+└─ vs XML (−48.8%) 17,095 tokens
+
+──────────────────────────────────── Total ────────────────────────────────────
+CSV ███████████████████░ 63,855 tokens
+TOON ████████████████████ 67,696 tokens (+6.0% vs CSV)
+├─ vs JSON (−58.8%) 164,255 tokens
+├─ vs JSON compact (−35.2%) 104,527 tokens
+├─ vs YAML (−48.2%) 130,698 tokens
+└─ vs XML (−64.4%) 190,160 tokens
+```
+
+<details>
+<summary><strong>Show detailed examples</strong></summary>
+
+#### 📈 Time-series analytics data
+
+**Savings:** 13,130 tokens (59.0% reduction vs JSON)
+
+**JSON** (22,250 tokens):
+
+```json
+{
+  "metrics": [
+    {
+      "date": "2025-01-01",
+      "views": 5715,
+      "clicks": 211,
+      "conversions": 28,
+      "revenue": 7976.46,
+      "bounceRate": 0.47
+    },
+    {
+      "date": "2025-01-02",
+      "views": 7103,
+      "clicks": 393,
+      "conversions": 28,
+      "revenue": 8360.53,
+      "bounceRate": 0.32
+    },
+    {
+      "date": "2025-01-03",
+      "views": 7248,
+      "clicks": 378,
+      "conversions": 24,
+      "revenue": 3212.57,
+      "bounceRate": 0.5
+    },
+    {
+      "date": "2025-01-04",
+      "views": 2927,
+      "clicks": 77,
+      "conversions": 11,
+      "revenue": 1211.69,
+      "bounceRate": 0.62
+    },
+    {
+      "date": "2025-01-05",
+      "views": 3530,
+      "clicks": 82,
+      "conversions": 8,
+      "revenue": 462.77,
+      "bounceRate": 0.56
+    }
+  ]
+}
+```
+
+**TOON** (9,120 tokens):
+
+```
+metrics[5]{date,views,clicks,conversions,revenue,bounceRate}:
+  2025-01-01,5715,211,28,7976.46,0.47
+  2025-01-02,7103,393,28,8360.53,0.32
+  2025-01-03,7248,378,24,3212.57,0.5
+  2025-01-04,2927,77,11,1211.69,0.62
+  2025-01-05,3530,82,8,462.77,0.56
+```
+
+---
+
+#### ⭐ Top 100 GitHub repositories
+
+**Savings:** 6,400 tokens (42.3% reduction vs JSON)
+
+**JSON** (15,145 tokens):
+
+```json
+{
+  "repositories": [
+    {
+      "id": 28457823,
+      "name": "freeCodeCamp",
+      "repo": "freeCodeCamp/freeCodeCamp",
+      "description": "freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…",
+      "createdAt": "2014-12-24T17:49:19Z",
+      "updatedAt": "2025-10-28T11:58:08Z",
+      "pushedAt": "2025-10-28T10:17:16Z",
+      "stars": 430886,
+      "watchers": 8583,
+      "forks": 42146,
+      "defaultBranch": "main"
+    },
+    {
+      "id": 132750724,
+      "name": "build-your-own-x",
+      "repo": "codecrafters-io/build-your-own-x",
+      "description": "Master programming by recreating your favorite technologies from scratch.",
+      "createdAt": "2018-05-09T12:03:18Z",
+      "updatedAt": "2025-10-28T12:37:11Z",
+      "pushedAt": "2025-10-10T18:45:01Z",
+      "stars": 430877,
+      "watchers": 6332,
+      "forks": 40453,
+      "defaultBranch": "master"
+    },
+    {
+      "id": 21737465,
+      "name": "awesome",
+      "repo": "sindresorhus/awesome",
+      "description": "😎 Awesome lists about all kinds of interesting topics",
+      "createdAt": "2014-07-11T13:42:37Z",
+      "updatedAt": "2025-10-28T12:40:21Z",
+      "pushedAt": "2025-10-27T17:57:31Z",
+      "stars": 410052,
+      "watchers": 8017,
+      "forks": 32029,
+      "defaultBranch": "main"
+    }
+  ]
+}
+```
+
+**TOON** (8,745 tokens):
+
+```
+repositories[3]{id,name,repo,description,createdAt,updatedAt,pushedAt,stars,watchers,forks,defaultBranch}:
+  28457823,freeCodeCamp,freeCodeCamp/freeCodeCamp,"freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…","2014-12-24T17:49:19Z","2025-10-28T11:58:08Z","2025-10-28T10:17:16Z",430886,8583,42146,main
+  132750724,build-your-own-x,codecrafters-io/build-your-own-x,Master programming by recreating your favorite technologies from scratch.,"2018-05-09T12:03:18Z","2025-10-28T12:37:11Z","2025-10-10T18:45:01Z",430877,6332,40453,master
+  21737465,awesome,sindresorhus/awesome,😎 Awesome lists about all kinds of interesting topics,"2014-07-11T13:42:37Z","2025-10-28T12:40:21Z","2025-10-27T17:57:31Z",410052,8017,32029,main
+```
+
+</details>
+
+<!-- /automd -->
+
 ## Installation & Quick Start
 
 ```bash
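The detailed examples show TOON's tabular form for a uniform array: a header with the array name, its length in brackets, and the field list in braces, followed by one comma-separated row per object. The sketch below reproduces that shape for flat objects that all share the same keys. `encodeTabular` and `encodeValue` are hypothetical helpers, not the TOON library's API. The quoting rule (quote strings that are empty, contain a comma, colon, double quote or backslash, or have surrounding whitespace) and the two-space row indent are simplifications consistent with the two examples, not the full specification.

```typescript
// Hypothetical sketch of TOON-style tabular encoding; not the TOON library's API.
type Primitive = string | number;
type Row = Record<string, Primitive>;

// Simplified quoting rule (assumption): quote strings that are empty,
// contain , : " or \, or have leading/trailing whitespace.
function encodeValue(value: Primitive): string {
  if (typeof value === "number") return String(value);
  const needsQuotes = value === "" || /[,:"\\]/.test(value) || value.trim() !== value;
  if (!needsQuotes) return value;
  // Escape backslashes first, then double quotes.
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

// Encode a non-empty array of flat objects sharing the same keys as
// key[N]{field1,field2,...}: followed by one indented row per object.
function encodeTabular(key: string, rows: Row[], indent = "  "): string {
  if (rows.length === 0) throw new Error("expected at least one row");
  const fields = Object.keys(rows[0]);
  for (const row of rows) {
    const keys = Object.keys(row);
    if (keys.length !== fields.length || !fields.every((f) => f in row)) {
      throw new Error("rows are not uniform; tabular form does not apply");
    }
  }
  const header = `${key}[${rows.length}]{${fields.join(",")}}:`;
  const lines = rows.map((row) => indent + fields.map((f) => encodeValue(row[f])).join(","));
  return [header, ...lines].join("\n");
}

// The time-series rows from the detailed example.
const metrics: Row[] = [
  { date: "2025-01-01", views: 5715, clicks: 211, conversions: 28, revenue: 7976.46, bounceRate: 0.47 },
  { date: "2025-01-02", views: 7103, clicks: 393, conversions: 28, revenue: 8360.53, bounceRate: 0.32 },
  { date: "2025-01-03", views: 7248, clicks: 378, conversions: 24, revenue: 3212.57, bounceRate: 0.5 },
  { date: "2025-01-04", views: 2927, clicks: 77, conversions: 11, revenue: 1211.69, bounceRate: 0.62 },
  { date: "2025-01-05", views: 3530, clicks: 82, conversions: 8, revenue: 462.77, bounceRate: 0.56 },
];
console.log(encodeTabular("metrics", metrics));
```

Field names are written once in the header instead of once per object, which is where most of the savings over formatted JSON come from on fully tabular data. The uniformity check matters: semi-uniform or nested data cannot use this form, consistent with the smaller savings in the Mixed-Structure Track.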