docs: refine TOON explanation and key features

This commit is contained in:
Johann Schopplich
2025-10-26 22:12:21 +01:00
parent f030691579
commit 53b4870809


@@ -2,18 +2,40 @@
# Token-Oriented Object Notation (TOON)
**Token-Oriented Object Notation** is a compact, human-readable format designed for passing structured data to Large Language Models. It reduces token usage compared to JSON by:
- Removing redundant punctuation (braces/brackets, most quotes)
- Using indentation for structure
- Tabularizing arrays of objects
- Writing inline primitive arrays without spaces
> [!TIP]
> Wrap your JSON in `encode()` before sending it to LLMs and save ~1/2 of the token cost for structured data!
## Why TOON?
AI is becoming cheaper and more accessible, and larger context windows invite larger data inputs. **LLM tokens still cost money**, and standard JSON is verbose and token-expensive:
```json
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
```
TOON conveys the same information with **fewer tokens**:
```
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
```
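For intuition, the tabular encoding above can be sketched in a few lines. This is an illustrative toy (`encodeTable` is a hypothetical name), not the library's actual `encode()`, which also handles nesting, quoting, and non-uniform data:

```javascript
// Toy sketch: encode a uniform array of flat objects into TOON's
// tabular form. Illustration only -- the real encode() covers far more.
function encodeTable(key, rows) {
  // Declare the field list once, taken from the first row
  const fields = Object.keys(rows[0]);
  const header = `${key}[${rows.length}]{${fields.join(',')}}:`;
  // Stream each row as a comma-separated line, indented under the header
  const lines = rows.map(
    (row) => '  ' + fields.map((f) => String(row[f])).join(',')
  );
  return [header, ...lines].join('\n');
}

const users = [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' },
];
console.log(encodeTable('users', users));
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user
```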
## Key Features
- 💸 **Token-efficient:** typically 30–60% fewer tokens than JSON
- 🤿 **LLM-friendly guardrails:** explicit lengths and field lists help models validate output
- 🍱 **Minimal syntax:** removes redundant punctuation (braces, brackets, most quotes)
- 📐 **Indentation-based structure:** replaces braces with whitespace for better readability
- 🧺 **Tabular arrays:** declare keys once, then stream rows without repetition
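The guardrail point is concrete: because the header declares the row count and field list up front, a consumer can cheaply check an LLM's tabular output before trusting it. A hypothetical validator (not part of the package API) might look like:

```javascript
// Sketch of a guardrail check: verify that a TOON tabular block's rows
// match the declared count and field list. Hypothetical helper, not the
// toon package API.
function validateTable(toon) {
  const [header, ...rows] = toon.trim().split('\n');
  // Header shape: key[count]{field1,field2,...}:
  const match = header.match(/^(\w+)\[(\d+)\]\{([^}]*)\}:$/);
  if (!match) throw new Error('malformed header');
  const [, key, count, fieldList] = match;
  const fields = fieldList.split(',');
  if (rows.length !== Number(count)) throw new Error('row count mismatch');
  for (const row of rows) {
    if (row.trim().split(',').length !== fields.length)
      throw new Error('field count mismatch');
  }
  return { key, count: Number(count), fields };
}

const sample = `users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user`;
console.log(validateTable(sample));
// { key: 'users', count: 2, fields: [ 'id', 'name', 'role' ] }
```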
## Token Benchmarks
<!-- automd:file src="./docs/benchmarks.md" -->
@@ -182,35 +204,6 @@ metrics[5]{date,views,clicks,conversions}:
> [!NOTE]
> Measured with [`gpt-tokenizer`](https://github.com/niieani/gpt-tokenizer) using `o200k_base` encoding (used by GPT-5 and other modern models). Savings will vary across models and tokenizers.
## Installation
```bash