@toon-format/cli

Command-line tool for converting JSON to TOON and back, with token analysis and streaming support.

TOON (Token-Oriented Object Notation) is a compact, human-readable encoding of the JSON data model that minimizes tokens for LLM input. The CLI lets you test conversions, analyze token savings, and integrate TOON into shell pipelines with stdin/stdout support—no code required.
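
For a quick feel for the format, here is a minimal illustration (a sketch; the exact output can vary slightly between versions, and the array syntax with an explicit length matches the examples later in this README):

# Encode a small object from stdin
echo '{"name": "Ada", "tags": ["x", "y"]}' | toon

Approximate output:

name: Ada
tags[2]: x,y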

Installation

# npm
npm install -g @toon-format/cli

# pnpm
pnpm add -g @toon-format/cli

# yarn
yarn global add @toon-format/cli

Or use directly with npx:

npx @toon-format/cli [options] [input]

Usage

toon [options] [input]

Standard input: Omit the input argument or use - to read from stdin. This enables piping data directly from other commands.

Auto-detection: The CLI automatically detects the operation based on file extension (.json → encode, .toon → decode). When reading from stdin, use --encode or --decode flags to specify the operation (defaults to encode).

Basic Examples

# Encode JSON to TOON (auto-detected)
toon input.json -o output.toon

# Decode TOON to JSON (auto-detected)
toon data.toon -o output.json

# Output to stdout
toon input.json

# Pipe from stdin
cat data.json | toon
echo '{"name": "Ada"}' | toon

# Decode from stdin
cat data.toon | toon --decode

Options

Option                      Description
-o, --output <file>         Output file path (prints to stdout if omitted)
-e, --encode                Force encode mode (overrides auto-detection)
-d, --decode                Force decode mode (overrides auto-detection)
--delimiter <char>          Array delimiter: , (comma), \t (tab), | (pipe)
--indent <number>           Indentation size (default: 2)
--stats                     Show token count estimates and savings (encode only)
--no-strict                 Disable strict validation when decoding
--key-folding <mode>        Enable key folding: off, safe (default: off)
--flatten-depth <number>    Maximum folded segment count when key folding is enabled (default: Infinity)
--expand-paths <mode>       Enable path expansion: off, safe (default: off)

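For example, to widen nesting indentation from the default of 2 spaces while writing to a file (all flags as documented above):

# Encode with 4-space indentation
toon input.json --indent 4 -o output.toon
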
Advanced Examples

Token Statistics

Show token savings when encoding:

toon data.json --stats -o output.toon

Example output:

✔ Encoded data.json → output.toon

 Token estimates: ~15,145 (JSON) → ~8,745 (TOON)
✔ Saved ~6,400 tokens (-42.3%)

Alternative Delimiters

Use a tab delimiter, which is often more token-efficient:

toon data.json --delimiter "\t" -o output.toon
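
A pipe delimiter is also accepted and can be easier to scan by eye:

# Pipe-separated output
toon data.json --delimiter "|" -o output.toon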

Lenient Decoding

Decode without strict validation:

toon data.toon --no-strict -o output.json

Stdin Workflows

# Convert API response to TOON
curl https://api.example.com/data | toon --stats

# Process large dataset
cat large-dataset.json | toon --delimiter "\t" > output.toon

# Chain with other tools
jq '.results' data.json | toon > filtered.toon
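
A full round trip can also run as a single pipeline, which makes for a quick sanity check (jq is used here only for pretty-printing and is optional):

# Encode and immediately decode again
toon input.json | toon --decode | jq .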

Large Dataset Processing

The CLI uses streaming output for both encoding and decoding, writing incrementally without building the full output string in memory:

# Encode large JSON file with minimal memory usage
toon huge-dataset.json -o output.toon

# Decode large TOON file with streaming JSON output
toon huge-dataset.toon -o output.json

# Process millions of records efficiently via stdin
cat million-records.json | toon > output.toon
cat million-records.toon | toon --decode > output.json

Memory efficiency:

  • Encode (JSON → TOON): Streams TOON lines to output without full string in memory
  • Decode (TOON → JSON): Streams JSON tokens to output without full string in memory
  • Peak memory usage scales with data depth, not total size

Note

When using --stats with encode, the full output string is kept in memory for token counting. Omit --stats for maximum memory efficiency with very large datasets.
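
To observe this behavior yourself, a general-purpose tool such as GNU time can report peak memory; the -v flag and the "Maximum resident set size" line below are GNU time features (typically available on Linux), not part of this CLI:

# Compare peak memory with and without --stats
/usr/bin/time -v toon huge-dataset.json -o output.toon 2>&1 | grep "Maximum resident set size"
/usr/bin/time -v toon huge-dataset.json --stats -o output.toon 2>&1 | grep "Maximum resident set size"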

Key Folding (Since v1.5)

Collapse nested wrapper chains to reduce tokens:

Basic key folding

# Encode with key folding
toon input.json --key-folding safe -o output.toon

For data like:

{
  "data": {
    "metadata": {
      "items": ["a", "b"]
    }
  }
}

Output becomes:

data.metadata.items[2]: a,b

Instead of:

data:
  metadata:
    items[2]: a,b

Limit folding depth

# Fold maximum 2 levels deep
toon input.json --key-folding safe --flatten-depth 2 -o output.toon

Path expansion on decode

# Reconstruct nested structure from folded keys
toon data.toon --expand-paths safe -o output.json

Round-trip workflow

# Encode with folding
toon input.json --key-folding safe -o compressed.toon

# Decode with expansion (restores original structure)
toon compressed.toon --expand-paths safe -o output.json

# Verify round-trip
diff input.json output.json
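
Keep in mind that diff compares bytes, so indentation or key-order differences can show up even when the data is identical. If jq is available, a structural comparison side-steps this (uses bash process substitution):

# Compare normalized JSON instead of raw bytes
diff <(jq -S . input.json) <(jq -S . output.json)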

Combined with other options

# Key folding + tab delimiter + stats
toon data.json --key-folding safe --delimiter "\t" --stats -o output.toon

Why Use the CLI?

  • Quick conversions between formats without writing code
  • Token analysis to see potential savings before sending to LLMs
  • Pipeline integration with existing JSON-based workflows
  • Flexible formatting with delimiter and indentation options
  • Key folding to collapse nested wrappers for additional token savings
  • Memory-efficient streaming for both encode and decode operations - process large datasets without loading entire outputs into memory

License

MIT License © 2025-PRESENT Johann Schopplich