diff --git a/README.md b/README.md
index 496c2d6..ad893b9 100644
--- a/README.md
+++ b/README.md
@@ -93,226 +93,6 @@ Benchmarks are organized into two tracks to ensure fair comparisons:
 - **Mixed-Structure Track**: Datasets with nested or semi-uniform structures (TOON vs JSON, YAML, XML). CSV excluded as it cannot properly represent these structures.
 - **Flat-Only Track**: Datasets with flat tabular structures where CSV is applicable (CSV vs TOON vs JSON, YAML, XML).
 
-### Token Efficiency
-
-Token counts are measured using the GPT-5 `o200k_base` tokenizer via [`gpt-tokenizer`](https://github.com/niieani/gpt-tokenizer). Savings are calculated against formatted JSON (2-space indentation) as the primary baseline, with additional comparisons to compact JSON (minified), YAML, and XML. Actual savings vary by model and tokenizer.
-
-The benchmarks test datasets across different structural patterns (uniform, semi-uniform, nested, deeply nested) to show where TOON excels and where other formats may be better.
-
-
-
-#### Mixed-Structure Track
-
-Datasets with nested or semi-uniform structures. CSV excluded as it cannot properly represent these structures.
-
-```
-🛒 E-commerce orders with nested structures  ┊  Tabular: 33%
-   │
-   TOON                █████████████░░░░░░░   72,771 tokens
-   ├─ vs JSON          (−33.1%)              108,806 tokens
-   ├─ vs JSON compact  (+5.5%)                68,975 tokens
-   ├─ vs YAML          (−14.2%)               84,780 tokens
-   └─ vs XML           (−40.5%)              122,406 tokens
-
-🧾 Semi-uniform event logs                   ┊  Tabular: 50%
-   │
-   TOON                █████████████████░░░  153,211 tokens
-   ├─ vs JSON          (−15.0%)              180,176 tokens
-   ├─ vs JSON compact  (+19.9%)              127,731 tokens
-   ├─ vs YAML          (−0.8%)               154,505 tokens
-   └─ vs XML           (−25.2%)              204,777 tokens
-
-🧩 Deeply nested configuration               ┊  Tabular: 0%
-   │
-   TOON                ██████████████░░░░░░      631 tokens
-   ├─ vs JSON          (−31.3%)                  919 tokens
-   ├─ vs JSON compact  (+11.9%)                  564 tokens
-   ├─ vs YAML          (−6.2%)                   673 tokens
-   └─ vs XML           (−37.4%)                1,008 tokens
-
-──────────────────────────────────── Total ────────────────────────────────────
-   TOON                ████████████████░░░░  226,613 tokens
-   ├─ vs JSON          (−21.8%)              289,901 tokens
-   ├─ vs JSON compact  (+14.9%)              197,270 tokens
-   ├─ vs YAML          (−5.6%)               239,958 tokens
-   └─ vs XML           (−31.0%)              328,191 tokens
-```
-
-#### Flat-Only Track
-
-Datasets with flat tabular structures where CSV is applicable.
-
-```
-👥 Uniform employee records     ┊  Tabular: 100%
-   │
-   CSV                 ███████████████████░   46,954 tokens
-   TOON                ████████████████████   49,831 tokens (+6.1% vs CSV)
-   ├─ vs JSON          (−60.7%)              126,860 tokens
-   ├─ vs JSON compact  (−36.8%)               78,856 tokens
-   ├─ vs YAML          (−50.0%)               99,706 tokens
-   └─ vs XML           (−66.0%)              146,444 tokens
-
-📈 Time-series analytics data   ┊  Tabular: 100%
-   │
-   CSV                 ██████████████████░░    8,388 tokens
-   TOON                ████████████████████    9,120 tokens (+8.7% vs CSV)
-   ├─ vs JSON          (−59.0%)               22,250 tokens
-   ├─ vs JSON compact  (−35.8%)               14,216 tokens
-   ├─ vs YAML          (−48.9%)               17,863 tokens
-   └─ vs XML           (−65.7%)               26,621 tokens
-
-⭐ Top 100 GitHub repositories  ┊  Tabular: 100%
-   │
-   CSV                 ███████████████████░    8,513 tokens
-   TOON                ████████████████████    8,745 tokens (+2.7% vs CSV)
-   ├─ vs JSON          (−42.3%)               15,145 tokens
-   ├─ vs JSON compact  (−23.7%)               11,455 tokens
-   ├─ vs YAML          (−33.4%)               13,129 tokens
-   └─ vs XML           (−48.8%)               17,095 tokens
-
-──────────────────────────────────── Total ────────────────────────────────────
-   CSV                 ███████████████████░   63,855 tokens
-   TOON                ████████████████████   67,696 tokens (+6.0% vs CSV)
-   ├─ vs JSON          (−58.8%)              164,255 tokens
-   ├─ vs JSON compact  (−35.2%)              104,527 tokens
-   ├─ vs YAML          (−48.2%)              130,698 tokens
-   └─ vs XML           (−64.4%)              190,160 tokens
-```
-
-Show detailed examples
-
-#### 📈 Time-series analytics data
-
-**Savings:** 13,130 tokens (59.0% reduction vs JSON)
-
-**JSON** (22,250 tokens):
-
-```json
-{
-  "metrics": [
-    {
-      "date": "2025-01-01",
-      "views": 5715,
-      "clicks": 211,
-      "conversions": 28,
-      "revenue": 7976.46,
-      "bounceRate": 0.47
-    },
-    {
-      "date": "2025-01-02",
-      "views": 7103,
-      "clicks": 393,
-      "conversions": 28,
-      "revenue": 8360.53,
-      "bounceRate": 0.32
-    },
-    {
-      "date": "2025-01-03",
-      "views": 7248,
-      "clicks": 378,
-      "conversions": 24,
-      "revenue": 3212.57,
-      "bounceRate": 0.5
-    },
-    {
-      "date": "2025-01-04",
-      "views": 2927,
-      "clicks": 77,
-      "conversions": 11,
-      "revenue": 1211.69,
-      "bounceRate": 0.62
-    },
-    {
-      "date": "2025-01-05",
-      "views": 3530,
-      "clicks": 82,
-      "conversions": 8,
-      "revenue": 462.77,
-      "bounceRate": 0.56
-    }
-  ]
-}
-```
-
-**TOON** (9,120 tokens):
-
-```
-metrics[5]{date,views,clicks,conversions,revenue,bounceRate}:
-  2025-01-01,5715,211,28,7976.46,0.47
-  2025-01-02,7103,393,28,8360.53,0.32
-  2025-01-03,7248,378,24,3212.57,0.5
-  2025-01-04,2927,77,11,1211.69,0.62
-  2025-01-05,3530,82,8,462.77,0.56
-```
-
----- 
-
-#### ⭐ Top 100 GitHub repositories
-
-**Savings:** 6,400 tokens (42.3% reduction vs JSON)
-
-**JSON** (15,145 tokens):
-
-```json
-{
-  "repositories": [
-    {
-      "id": 28457823,
-      "name": "freeCodeCamp",
-      "repo": "freeCodeCamp/freeCodeCamp",
-      "description": "freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…",
-      "createdAt": "2014-12-24T17:49:19Z",
-      "updatedAt": "2025-10-28T11:58:08Z",
-      "pushedAt": "2025-10-28T10:17:16Z",
-      "stars": 430886,
-      "watchers": 8583,
-      "forks": 42146,
-      "defaultBranch": "main"
-    },
-    {
-      "id": 132750724,
-      "name": "build-your-own-x",
-      "repo": "codecrafters-io/build-your-own-x",
-      "description": "Master programming by recreating your favorite technologies from scratch.",
-      "createdAt": "2018-05-09T12:03:18Z",
-      "updatedAt": "2025-10-28T12:37:11Z",
-      "pushedAt": "2025-10-10T18:45:01Z",
-      "stars": 430877,
-      "watchers": 6332,
-      "forks": 40453,
-      "defaultBranch": "master"
-    },
-    {
-      "id": 21737465,
-      "name": "awesome",
-      "repo": "sindresorhus/awesome",
-      "description": "😎 Awesome lists about all kinds of interesting topics",
-      "createdAt": "2014-07-11T13:42:37Z",
-      "updatedAt": "2025-10-28T12:40:21Z",
-      "pushedAt": "2025-10-27T17:57:31Z",
-      "stars": 410052,
-      "watchers": 8017,
-      "forks": 32029,
-      "defaultBranch": "main"
-    }
-  ]
-}
-```
-
-**TOON** (8,745 tokens):
-
-```
-repositories[3]{id,name,repo,description,createdAt,updatedAt,pushedAt,stars,watchers,forks,defaultBranch}:
-  28457823,freeCodeCamp,freeCodeCamp/freeCodeCamp,"freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…","2014-12-24T17:49:19Z","2025-10-28T11:58:08Z","2025-10-28T10:17:16Z",430886,8583,42146,main
-  132750724,build-your-own-x,codecrafters-io/build-your-own-x,Master programming by recreating your favorite technologies from scratch.,"2018-05-09T12:03:18Z","2025-10-28T12:37:11Z","2025-10-10T18:45:01Z",430877,6332,40453,master
-  21737465,awesome,sindresorhus/awesome,😎 Awesome lists about all kinds of interesting topics,"2014-07-11T13:42:37Z","2025-10-28T12:40:21Z","2025-10-27T17:57:31Z",410052,8017,32029,main
-```
-
-
-
-
 
 ### Retrieval Accuracy
 
@@ -355,11 +135,11 @@ Benchmarks test LLM comprehension across different input formats using 209 data
 Each format's overall performance, balancing accuracy against token cost:
 
 ```
-TOON           ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓   26.9  │  73.9% acc  │  2,744 tokens
-JSON compact   ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░   22.9  │  70.7% acc  │  3,081 tokens
-YAML           ▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░   18.6  │  69.0% acc  │  3,719 tokens
-JSON           ▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░   15.3  │  69.7% acc  │  4,545 tokens
-XML            ▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░   13.0  │  67.1% acc  │  5,167 tokens
+TOON           ████████████████████   26.9  │  73.9% acc  │  2,744 tokens
+JSON compact   █████████████████░░░   22.9  │  70.7% acc  │  3,081 tokens
+YAML           ██████████████░░░░░░   18.6  │  69.0% acc  │  3,719 tokens
+JSON           ███████████░░░░░░░░░   15.3  │  69.7% acc  │  4,545 tokens
+XML            ██████████░░░░░░░░░░   13.0  │  67.1% acc  │  5,167 tokens
 ```
 
 TOON achieves **73.9%** accuracy (vs JSON's 69.7%) while using **39.6% fewer tokens**.
@@ -658,6 +438,226 @@ Eleven datasets designed to test different structural patterns and validation ca
+### Token Efficiency
+
+Token counts are measured using the GPT-5 `o200k_base` tokenizer via [`gpt-tokenizer`](https://github.com/niieani/gpt-tokenizer). Savings are calculated against formatted JSON (2-space indentation) as the primary baseline, with additional comparisons to compact JSON (minified), YAML, and XML. Actual savings vary by model and tokenizer.
+
+The benchmarks test datasets across different structural patterns (uniform, semi-uniform, nested, deeply nested) to show where TOON excels and where other formats may be better.
+
+
+
+#### Mixed-Structure Track
+
+Datasets with nested or semi-uniform structures. CSV excluded as it cannot properly represent these structures.
+
+```
+🛒 E-commerce orders with nested structures  ┊  Tabular: 33%
+   │
+   TOON                █████████████░░░░░░░   72,771 tokens
+   ├─ vs JSON          (−33.1%)              108,806 tokens
+   ├─ vs JSON compact  (+5.5%)                68,975 tokens
+   ├─ vs YAML          (−14.2%)               84,780 tokens
+   └─ vs XML           (−40.5%)              122,406 tokens
+
+🧾 Semi-uniform event logs                   ┊  Tabular: 50%
+   │
+   TOON                █████████████████░░░  153,211 tokens
+   ├─ vs JSON          (−15.0%)              180,176 tokens
+   ├─ vs JSON compact  (+19.9%)              127,731 tokens
+   ├─ vs YAML          (−0.8%)               154,505 tokens
+   └─ vs XML           (−25.2%)              204,777 tokens
+
+🧩 Deeply nested configuration               ┊  Tabular: 0%
+   │
+   TOON                ██████████████░░░░░░      631 tokens
+   ├─ vs JSON          (−31.3%)                  919 tokens
+   ├─ vs JSON compact  (+11.9%)                  564 tokens
+   ├─ vs YAML          (−6.2%)                   673 tokens
+   └─ vs XML           (−37.4%)                1,008 tokens
+
+──────────────────────────────────── Total ────────────────────────────────────
+   TOON                ████████████████░░░░  226,613 tokens
+   ├─ vs JSON          (−21.8%)              289,901 tokens
+   ├─ vs JSON compact  (+14.9%)              197,270 tokens
+   ├─ vs YAML          (−5.6%)               239,958 tokens
+   └─ vs XML           (−31.0%)              328,191 tokens
+```
+
+#### Flat-Only Track
+
+Datasets with flat tabular structures where CSV is applicable.
+
+```
+👥 Uniform employee records     ┊  Tabular: 100%
+   │
+   CSV                 ███████████████████░   46,954 tokens
+   TOON                ████████████████████   49,831 tokens (+6.1% vs CSV)
+   ├─ vs JSON          (−60.7%)              126,860 tokens
+   ├─ vs JSON compact  (−36.8%)               78,856 tokens
+   ├─ vs YAML          (−50.0%)               99,706 tokens
+   └─ vs XML           (−66.0%)              146,444 tokens
+
+📈 Time-series analytics data   ┊  Tabular: 100%
+   │
+   CSV                 ██████████████████░░    8,388 tokens
+   TOON                ████████████████████    9,120 tokens (+8.7% vs CSV)
+   ├─ vs JSON          (−59.0%)               22,250 tokens
+   ├─ vs JSON compact  (−35.8%)               14,216 tokens
+   ├─ vs YAML          (−48.9%)               17,863 tokens
+   └─ vs XML           (−65.7%)               26,621 tokens
+
+⭐ Top 100 GitHub repositories  ┊  Tabular: 100%
+   │
+   CSV                 ███████████████████░    8,513 tokens
+   TOON                ████████████████████    8,745 tokens (+2.7% vs CSV)
+   ├─ vs JSON          (−42.3%)               15,145 tokens
+   ├─ vs JSON compact  (−23.7%)               11,455 tokens
+   ├─ vs YAML          (−33.4%)               13,129 tokens
+   └─ vs XML           (−48.8%)               17,095 tokens
+
+──────────────────────────────────── Total ────────────────────────────────────
+   CSV                 ███████████████████░   63,855 tokens
+   TOON                ████████████████████   67,696 tokens (+6.0% vs CSV)
+   ├─ vs JSON          (−58.8%)              164,255 tokens
+   ├─ vs JSON compact  (−35.2%)              104,527 tokens
+   ├─ vs YAML          (−48.2%)              130,698 tokens
+   └─ vs XML           (−64.4%)              190,160 tokens
+```
+
+Show detailed examples
+
+#### 📈 Time-series analytics data
+
+**Savings:** 13,130 tokens (59.0% reduction vs JSON)
+
+**JSON** (22,250 tokens):
+
+```json
+{
+  "metrics": [
+    {
+      "date": "2025-01-01",
+      "views": 5715,
+      "clicks": 211,
+      "conversions": 28,
+      "revenue": 7976.46,
+      "bounceRate": 0.47
+    },
+    {
+      "date": "2025-01-02",
+      "views": 7103,
+      "clicks": 393,
+      "conversions": 28,
+      "revenue": 8360.53,
+      "bounceRate": 0.32
+    },
+    {
+      "date": "2025-01-03",
+      "views": 7248,
+      "clicks": 378,
+      "conversions": 24,
+      "revenue": 3212.57,
+      "bounceRate": 0.5
+    },
+    {
+      "date": "2025-01-04",
+      "views": 2927,
+      "clicks": 77,
+      "conversions": 11,
+      "revenue": 1211.69,
+      "bounceRate": 0.62
+    },
+    {
+      "date": "2025-01-05",
+      "views": 3530,
+      "clicks": 82,
+      "conversions": 8,
+      "revenue": 462.77,
+      "bounceRate": 0.56
+    }
+  ]
+}
+```
+
+**TOON** (9,120 tokens):
+
+```
+metrics[5]{date,views,clicks,conversions,revenue,bounceRate}:
+  2025-01-01,5715,211,28,7976.46,0.47
+  2025-01-02,7103,393,28,8360.53,0.32
+  2025-01-03,7248,378,24,3212.57,0.5
+  2025-01-04,2927,77,11,1211.69,0.62
+  2025-01-05,3530,82,8,462.77,0.56
+```
+
+---
+
+#### ⭐ Top 100 GitHub repositories
+
+**Savings:** 6,400 tokens (42.3% reduction vs JSON)
+
+**JSON** (15,145 tokens):
+
+```json
+{
+  "repositories": [
+    {
+      "id": 28457823,
+      "name": "freeCodeCamp",
+      "repo": "freeCodeCamp/freeCodeCamp",
+      "description": "freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…",
+      "createdAt": "2014-12-24T17:49:19Z",
+      "updatedAt": "2025-10-28T11:58:08Z",
+      "pushedAt": "2025-10-28T10:17:16Z",
+      "stars": 430886,
+      "watchers": 8583,
+      "forks": 42146,
+      "defaultBranch": "main"
+    },
+    {
+      "id": 132750724,
+      "name": "build-your-own-x",
+      "repo": "codecrafters-io/build-your-own-x",
+      "description": "Master programming by recreating your favorite technologies from scratch.",
+      "createdAt": "2018-05-09T12:03:18Z",
+      "updatedAt": "2025-10-28T12:37:11Z",
+      "pushedAt": "2025-10-10T18:45:01Z",
+      "stars": 430877,
+      "watchers": 6332,
+      "forks": 40453,
+      "defaultBranch": "master"
+    },
+    {
+      "id": 21737465,
+      "name": "awesome",
+      "repo": "sindresorhus/awesome",
+      "description": "😎 Awesome lists about all kinds of interesting topics",
+      "createdAt": "2014-07-11T13:42:37Z",
+      "updatedAt": "2025-10-28T12:40:21Z",
+      "pushedAt": "2025-10-27T17:57:31Z",
+      "stars": 410052,
+      "watchers": 8017,
+      "forks": 32029,
+      "defaultBranch": "main"
+    }
+  ]
+}
+```
+
+**TOON** (8,745 tokens):
+
+```
+repositories[3]{id,name,repo,description,createdAt,updatedAt,pushedAt,stars,watchers,forks,defaultBranch}:
+  28457823,freeCodeCamp,freeCodeCamp/freeCodeCamp,"freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…","2014-12-24T17:49:19Z","2025-10-28T11:58:08Z","2025-10-28T10:17:16Z",430886,8583,42146,main
+  132750724,build-your-own-x,codecrafters-io/build-your-own-x,Master programming by recreating your favorite technologies from scratch.,"2018-05-09T12:03:18Z","2025-10-28T12:37:11Z","2025-10-10T18:45:01Z",430877,6332,40453,master
+  21737465,awesome,sindresorhus/awesome,😎 Awesome lists about all kinds of interesting topics,"2014-07-11T13:42:37Z","2025-10-28T12:40:21Z","2025-10-27T17:57:31Z",410052,8017,32029,main
+```
+
+
+
+
 ## Installation & Quick Start
 
 ```bash
diff --git a/benchmarks/results/retrieval-accuracy.md b/benchmarks/results/retrieval-accuracy.md
index 7b9b287..5e1a23e 100644
--- a/benchmarks/results/retrieval-accuracy.md
+++ b/benchmarks/results/retrieval-accuracy.md
@@ -36,11 +36,11 @@ Benchmarks test LLM comprehension across different input formats using 209 data
 Each format's overall performance, balancing accuracy against token cost:
 
 ```
-TOON           ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓   26.9  │  73.9% acc  │  2,744 tokens
-JSON compact   ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░   22.9  │  70.7% acc  │  3,081 tokens
-YAML           ▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░   18.6  │  69.0% acc  │  3,719 tokens
-JSON           ▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░   15.3  │  69.7% acc  │  4,545 tokens
-XML            ▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░   13.0  │  67.1% acc  │  5,167 tokens
+TOON           ████████████████████   26.9  │  73.9% acc  │  2,744 tokens
+JSON compact   █████████████████░░░   22.9  │  70.7% acc  │  3,081 tokens
+YAML           ██████████████░░░░░░   18.6  │  69.0% acc  │  3,719 tokens
+JSON           ███████████░░░░░░░░░   15.3  │  69.7% acc  │  4,545 tokens
+XML            ██████████░░░░░░░░░░   13.0  │  67.1% acc  │  5,167 tokens
 ```
 
 TOON achieves **73.9%** accuracy (vs JSON's 69.7%) while using **39.6% fewer tokens**.
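The percentages in these charts are consistent with a plain relative difference against each baseline (for example, (72,771 − 108,806) / 108,806 ≈ −33.1%), and the retrieval efficiency score appears to be accuracy per 1,000 tokens (73.9 / 2.744 ≈ 26.9). A minimal sketch, assuming those two formulas; `percentDiff` and `efficiency` are hypothetical helpers inferred from the published numbers, not code taken from `benchmarks/src/report.ts`:

```typescript
// Hypothetical helpers: the formulas are inferred from the published
// numbers and are not copied from benchmarks/src/report.ts.

/** Size of TOON relative to a baseline format, in percent (negative = fewer tokens). */
function percentDiff(toonTokens: number, baselineTokens: number): number {
  return ((toonTokens - baselineTokens) / baselineTokens) * 100
}

/** Accuracy (in percent) earned per 1,000 input tokens. */
function efficiency(accuracyPercent: number, tokens: number): number {
  return accuracyPercent / (tokens / 1000)
}

// E-commerce orders: TOON 72,771 tokens vs. JSON 108,806 and JSON compact 68,975
console.log(percentDiff(72771, 108806).toFixed(1)) // -33.1
console.log(percentDiff(72771, 68975).toFixed(1)) // 5.5

// Retrieval accuracy: TOON 73.9% at 2,744 tokens, JSON 69.7% at 4,545 tokens
console.log(percentDiff(2744, 4545).toFixed(1)) // -39.6 (the "39.6% fewer tokens")
console.log(efficiency(73.9, 2744).toFixed(1)) // 26.9
console.log(efficiency(69.7, 4545).toFixed(1)) // 15.3
```

`toFixed` prints an ASCII hyphen-minus, whereas the charts use the Unicode minus sign (−), so a report generator would need to swap the sign character if it wants identical output.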
diff --git a/benchmarks/src/report.ts b/benchmarks/src/report.ts
index 94b53e9..7fff455 100644
--- a/benchmarks/src/report.ts
+++ b/benchmarks/src/report.ts
@@ -559,7 +559,7 @@ function generateHorizontalEfficiencyChart(
   return ranking
     .map((r) => {
       const normalizedValue = r.efficiency / maxEfficiency
-      const bar = createProgressBar(normalizedValue, 1, barWidth, { filled: '▓', empty: '░' })
+      const bar = createProgressBar(normalizedValue, 1, barWidth)
       const displayName = FORMATTER_DISPLAY_NAMES[r.format] || r.format
       const formatName = displayName.padEnd(maxFormatWidth)
       const efficiency = r.efficiency.toFixed(1).padStart(4)
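With the explicit `{ filled: '▓', empty: '░' }` argument removed, the efficiency chart falls back to the default glyphs of `createProgressBar`, which, judging by the regenerated charts, render filled cells as `█` and empty cells as `░`. The sketch below is a hypothetical re-implementation for illustration only: the `(value, max, width, chars?)` signature, the `BarChars` shape, the 20-cell width, and round-to-nearest-cell behavior are assumptions inferred from the call site and the published bars, not the actual helper in `benchmarks/src`.

```typescript
// Hypothetical re-implementation for illustration; the real createProgressBar
// in benchmarks/src may differ in signature and rounding.
interface BarChars {
  filled: string
  empty: string
}

// Defaults inferred from the regenerated charts (█ for filled, ░ for empty).
const DEFAULT_BAR_CHARS: BarChars = { filled: '█', empty: '░' }

function createProgressBar(
  value: number,
  max: number,
  width: number,
  chars: BarChars = DEFAULT_BAR_CHARS,
): string {
  // Clamp to [0, 1] so out-of-range values still produce exactly `width` cells.
  const ratio = Math.min(Math.max(value / max, 0), 1)
  const filledCells = Math.round(ratio * width)
  return chars.filled.repeat(filledCells) + chars.empty.repeat(width - filledCells)
}

// Mirrors the call site: efficiency normalized against the best score (TOON, 26.9).
const maxEfficiency = 26.9
console.log(createProgressBar(22.9 / maxEfficiency, 1, 20)) // █████████████████░░░ (JSON compact)
console.log(createProgressBar(22.9 / maxEfficiency, 1, 20, { filled: '▓', empty: '░' })) // previous glyphs
```

Rounding to the nearest cell reproduces every bar in the retrieval chart (17, 14, 11 and 10 filled cells for JSON compact, YAML, JSON and XML), which is why it was chosen for this sketch.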