docs: update notes & limitations guide

2026-01-29 15:24:10 +08:00 · 2025-10-28 07:44:35 +01:00
parent 8ad083cf8b
commit 352e936370
4 changed files with 18 additions and 48 deletions
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@

 # Token-Oriented Object Notation (TOON)

-**Token-Oriented Object Notation** is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage.
+**Token-Oriented Object Notation** is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage. It's intended for LLM input, not output.

 TOON excels at **uniform complex objects** – multiple fields per row, same structure across items. It borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts.

@@ -34,16 +34,6 @@ users[2]{id,name,role}:

 </details>

-## Format Comparison
-
-Format familiarity matters as much as token count.
-
- **CSV:** best for uniform tables.
- **JSON:** best for non-uniform data.
- **TOON:** best for uniform complex (but not deeply nested) objects.
-
-TOON switches to list format for non-uniform arrays. In those cases, JSON can be cheaper at scale.
-
 ## Key Features

 - 💸 **Token-efficient:** typically 30–60% fewer tokens than JSON
@@ -363,17 +353,12 @@ Four datasets designed to test different structural patterns:
 #### Evaluation Process

 1. **Format conversion**: Each dataset is converted to all 5 formats (TOON, JSON, YAML, CSV, XML).
-2. **Query LLM**: Each model receives formatted data + question in a prompt.
-3. **LLM responds**: Model extracts the answer from the data.
-4. **Validate with LLM-as-judge**: GPT-5-nano validates if the answer is semantically correct.
+2. **Query LLM**: Each model receives formatted data + question in a prompt and extracts the answer.
+3. **Validate with LLM-as-judge**: GPT-5-nano validates if the answer is semantically correct.

 #### Semantic Validation

-Answers are validated by an LLM judge (`gpt-5-nano`) using semantic equivalence, not exact string matching:
-
- **Numeric formats**: `50000` = `$50,000` = `50000 dollars` ✓
- **Case insensitive**: `Engineering` = `engineering` = `ENGINEERING` ✓
- **Minor formatting**: `2025-01-01` = `January 1, 2025` ✓
+Answers are validated by an LLM judge (`gpt-5-nano`) using semantic equivalence, not exact string matching (e.g., `50000` = `$50,000`, `Engineering` = `engineering`, `2025-01-01` = `January 1, 2025`).

 #### Models & Configuration

@@ -810,6 +795,14 @@ console.log(encode(data, { lengthMarker: '#', delimiter: '|' }))
 //   B2|1|14.5
 ```

+## Notes and Limitations
+
+- Format familiarity matters as much as token count. TOON's tabular format requires arrays of objects with identical keys and primitive values only. When this doesn't hold (due to mixed types, non-uniform objects, or nested structures), TOON switches to list format, where JSON can be cheaper at scale.
+  - **TOON** is best for uniform complex (but not deeply nested) objects, especially large arrays of such objects.
+  - **JSON** is best for non-uniform data and deeply nested structures.
+- **Token counts vary by tokenizer and model.** Benchmarks use a GPT-style tokenizer (cl100k/o200k); actual savings will differ with other models (e.g., [SentencePiece](https://github.com/google/sentencepiece)).
+- **TOON is designed for LLM input** where human readability and token efficiency matter. It's **not** a drop-in replacement for JSON in APIs or storage.
+
 ## Using TOON in LLM Prompts

 TOON works best when you show the format instead of describing it. The structure is self-documenting – models parse it naturally once they see the pattern.
@@ -843,20 +836,6 @@ Task: Return only users with role "user" as TOON. Use the same header. Set [N] t
 > [!TIP]
 > For large uniform tables, use `encode(data, { delimiter: '\t' })` and tell the model "fields are tab-separated." Tabs often tokenize better than commas and reduce the need for quote-escaping.

-## Notes and Limitations
-
- **Token counts vary by tokenizer and model.** Benchmarks use a GPT-style tokenizer (cl100k/o200k); actual savings will differ with other models (e.g., SentencePiece).
- **TOON is designed for LLM contexts** where human readability and token efficiency matter. It's **not** a drop-in replacement for JSON in APIs or storage.
- **Tabular arrays** require all objects to have exactly the same keys with primitive values only. Arrays with mixed types (primitives + objects/arrays), non-uniform objects, or nested structures will use a more verbose list format.
- **Object key order** is preserved from the input. In tabular arrays, header order follows the first object's keys.
- **Arrays mixing primitives and objects/arrays** always use list form:
-  ```
-  items[2]:
-    - a: 1
-    - [2]: 1,2
-  ```
- **Deterministic formatting:** 2-space indentation, stable key order, no trailing spaces/newline.
-
 ## Quick Reference

 ```
--- a/benchmarks/results/accuracy/report.md
+++ b/benchmarks/results/accuracy/report.md
@@ -111,7 +111,7 @@ gemini-2.5-flash

 #### What's Being Measured

-This benchmark tests **LLM comprehension and data retrieval accuracy** when data is presented in different formats. Each LLM receives formatted data and must answer questions about it (this does NOT test LLM's ability to generate TOON output).
+This benchmark tests **LLM comprehension and data retrieval accuracy** across different input formats. Each LLM receives formatted data and must answer questions about it (this does **not** test the model's ability to generate TOON output).

 #### Datasets Tested

@@ -140,18 +140,9 @@ Four datasets designed to test different structural patterns:

 #### Evaluation Process

-1. **Format conversion**: Each dataset is converted to all 5 formats (TOON, JSON, YAML, CSV, XML).
-2. **Query LLM**: Each model receives formatted data + question in a prompt.
-3. **LLM responds**: Model extracts the answer from the data.
-4. **Validate with LLM-as-judge**: GPT-5-nano validates if the answer is semantically correct.
-
-#### Semantic Validation
-
-Answers are validated by an LLM judge (`gpt-5-nano`) using semantic equivalence, not exact string matching:
-
- **Numeric formats**: `50000` = `$50,000` = `50000 dollars` ✓
- **Case insensitive**: `Engineering` = `engineering` = `ENGINEERING` ✓
- **Minor formatting**: `2025-01-01` = `January 1, 2025` ✓
+1. **Format conversion**: Each dataset is converted to all 5 formats (TOON, JSON, YAML, CSV, XML).
+2. **Query LLM**: Each model receives formatted data + question in a prompt and extracts the answer.
+3. **Validate with LLM-as-judge**: `gpt-5-nano` validates whether the answer is semantically correct (e.g., `50000` = `$50,000`, `Engineering` = `engineering`, `2025-01-01` = `January 1, 2025`).

 #### Models & Configuration

--- a/benchmarks/results/accuracy/summary.json
+++ b/benchmarks/results/accuracy/summary.json
@@ -87,5 +87,5 @@
    "yaml-analytics": 2938,
    "yaml-github": 13129
  },
-  "timestamp": "2025-10-27T19:35:05.310Z"
+  "timestamp": "2025-10-28T06:43:10.560Z"
 }
--- a/benchmarks/scripts/token-efficiency-benchmark.ts
+++ b/benchmarks/scripts/token-efficiency-benchmark.ts
@@ -204,7 +204,7 @@ ${detailedExamples}
 </details>
 `.trimStart()

-console.log(markdown)
+console.log(`${barChartSection}\n`)

 await ensureDir(path.join(BENCHMARKS_DIR, 'results'))
 await fsp.writeFile(outputFilePath, markdown, 'utf-8')
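
The tabular-format condition stated in the relocated "Notes and Limitations" section (arrays of objects with identical keys and primitive values only) can be sketched as a small check. This is a minimal illustration under stated assumptions, not the library's implementation: `isPrimitive` and `isTabularEligible` are hypothetical names, and treating an empty array as ineligible and ignoring key order when comparing keys are both assumptions.

```typescript
// Hypothetical helpers, not part of the TOON library. They only illustrate
// the "identical keys, primitive values only" rule described in the README.
type Primitive = string | number | boolean | null

function isPrimitive(value: unknown): value is Primitive {
  const t = typeof value
  return value === null || t === 'string' || t === 'number' || t === 'boolean'
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return value !== null && typeof value === 'object' && !Array.isArray(value)
}

function isTabularEligible(items: unknown[]): boolean {
  // Assumption: an empty array has no header to derive, so it is not tabular.
  if (items.length === 0) return false

  const first = items[0]
  if (!isPlainObject(first)) return false

  // Assumption: keys are compared as a set, so key order is ignored.
  const headerKeys = Object.keys(first).sort()

  return items.every((item) => {
    if (!isPlainObject(item)) return false
    const keys = Object.keys(item).sort()
    return (
      keys.length === headerKeys.length
      && keys.every((key, i) => key === headerKeys[i])
      && keys.every(key => isPrimitive(item[key]))
    )
  })
}
```

Under these assumptions, `isTabularEligible([{ id: 1, name: 'a' }, { id: 2, name: 'b' }])` returns `true`, while an array containing a nested value such as `[{ id: 1, tags: ['x'] }]` returns `false` and would fall back to the list format mentioned in the notes.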