mirror of
https://github.com/voson-wang/toon.git
synced 2026-01-29 15:24:10 +08:00
feat(cli): memory-efficient streaming for encoding
@@ -1,8 +1,8 @@
 # @toon-format/cli

-Command-line tool for converting between JSON and TOON formats.
+Command-line tool for converting JSON to TOON and back, with token analysis and streaming support.

-[TOON (Token-Oriented Object Notation)](https://toonformat.dev) is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.
+[TOON (Token-Oriented Object Notation)](https://toonformat.dev) is a compact, human-readable encoding of the JSON data model that minimizes tokens for LLM input. The CLI lets you test conversions, analyze token savings, and integrate TOON into shell pipelines with stdin/stdout support—no code required.

 ## Installation

@@ -79,11 +79,12 @@ toon data.json --stats -o output.toon
 ```

 Example output:

 ```
-✓ Encoded to TOON
-Input: 15,145 tokens (JSON)
-Output: 8,745 tokens (TOON)
-Saved: 6,400 tokens (42.3% reduction)
+✔ Encoded data.json → output.toon
+
+ℹ Token estimates: ~15,145 (JSON) → ~8,745 (TOON)
+✔ Saved ~6,400 tokens (-42.3%)
 ```

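The savings figure in the example output follows directly from the two token counts. A quick sanity check in plain Python (illustrative only, not part of the CLI):

```python
# Token counts from the example output above
json_tokens = 15_145
toon_tokens = 8_745

saved = json_tokens - toon_tokens
reduction = saved / json_tokens * 100

print(f"Saved ~{saved:,} tokens (-{reduction:.1f}%)")  # prints "Saved ~6,400 tokens (-42.3%)"
```

Note that these are tokenizer estimates, so actual savings depend on the model's tokenizer.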
 ### Alternative Delimiters
@@ -115,6 +116,21 @@ cat large-dataset.json | toon --delimiter "\t" > output.toon
 jq '.results' data.json | toon > filtered.toon
 ```

+### Large Dataset Processing
+
+The CLI streams output line by line without building the full string in memory, making it suitable for processing large datasets:
+
+```bash
+# Encode a large JSON file with minimal memory usage
+toon huge-dataset.json -o output.toon
+
+# Process millions of records efficiently
+cat million-records.json | toon > output.toon
+```
+
+> [!NOTE]
+> When using `--stats`, the full output string is kept in memory for token counting. Omit `--stats` for maximum memory efficiency with very large datasets.
+
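The streaming behavior described above can be pictured as an encoder that yields output lines one at a time and writes each as soon as it is produced. A minimal Python sketch of the idea — `encode_lines` and `stream_encode` are hypothetical names, and the line syntax is simplified rather than actual TOON:

```python
import json
from typing import Any, Iterator

def encode_lines(value: Any, indent: int = 0) -> Iterator[str]:
    """Hypothetical line-by-line encoder: yields one output line at a time
    instead of building the whole document as a string. The syntax here is
    a simplified illustration, not TOON's actual format."""
    pad = "  " * indent
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                yield f"{pad}{key}:"
                yield from encode_lines(item, indent + 1)
            else:
                yield f"{pad}{key}: {item}"
    elif isinstance(value, list):
        for item in value:
            yield f"{pad}- {item}"
    else:
        yield f"{pad}{value}"

def stream_encode(in_file, out_file) -> None:
    # Write each line as soon as it is produced: peak memory is bounded by
    # the parsed input plus one output line, never the full output string.
    for line in encode_lines(json.load(in_file)):
        out_file.write(line + "\n")
```

This is why omitting `--stats` matters for very large inputs: token counting forces the full output back into memory, defeating the generator-style write path.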
 ### Key Folding (Since v1.5)

 Collapse nested wrapper chains to reduce tokens:
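The folding example itself is elided in this hunk. As a hedged illustration of the idea (the exact TOON output may differ from this sketch), a chain of single-key wrapper objects such as:

```
a:
  b:
    c: 1
```

can collapse into a single dotted path when encoded with `--key-folding safe`:

```
a.b.c: 1
```

Each folded wrapper saves a line and its indentation, which is where the additional token reduction comes from.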
@@ -190,6 +206,7 @@ toon data.json --key-folding safe --delimiter "\t" --stats -o output.toon
 - **Pipeline integration** with existing JSON-based workflows
 - **Flexible formatting** with delimiter and indentation options
 - **Key folding** to collapse nested wrappers for additional token savings
+- **Memory-efficient streaming** for processing large datasets without loading everything into memory

 ## Related