feat(cli): memory-efficient streaming for encoding

This commit is contained in:
Johann Schopplich
2025-11-21 14:02:22 +01:00
parent be8bcfe9b2
commit 1c003c6118
7 changed files with 308 additions and 50 deletions

View File

@@ -1,6 +1,6 @@
# Command Line Interface
The `@toon-format/cli` package provides a command-line interface for encoding JSON to TOON and decoding TOON back to JSON. Use it for quick conversions without writing code, estimating token savings before sending data to LLMs, or integrating TOON into shell pipelines with tools like curl and jq. It supports stdin/stdout workflows, multiple delimiter options, token statistics, and all encoding/decoding features available in the library.
The `@toon-format/cli` package provides a command-line interface for encoding JSON to TOON and decoding TOON back to JSON. Use it to analyze token savings before integrating TOON into your application, or to process JSON data through TOON in shell pipelines using stdin/stdout with tools like curl and jq. The CLI supports token statistics, streaming for large datasets, and all encoding options available in the library.
The CLI is built on top of the `@toon-format/toon` TypeScript implementation and adheres to the [latest specification](/reference/spec).
@@ -108,6 +108,14 @@ cat data.toon | toon --decode
JSON→TOON conversions use line-by-line encoding internally, which avoids holding the entire TOON document in memory. This makes the CLI efficient for large datasets without requiring additional configuration.
```bash
# Encode large JSON file with minimal memory usage
toon huge-dataset.json -o output.toon
# Process millions of records efficiently via stdin
cat million-records.json | toon > output.toon
```
::: info Token Statistics
When using the `--stats` flag, the CLI builds the full TOON string once to compute accurate token counts. For maximum memory efficiency on very large files, omit `--stats`.
:::
@@ -139,6 +147,15 @@ toon data.json --stats -o output.toon
This helps you estimate token cost savings before sending data to LLMs.
Example output:
```
✔ Encoded data.json → output.toon
Token estimates: ~15,145 (JSON) → ~8,745 (TOON)
✔ Saved ~6,400 tokens (-42.3%)
```
### Alternative Delimiters
TOON supports three delimiters: comma (default), tab, and pipe. Alternative delimiters can provide additional token savings in specific contexts.

View File

@@ -95,6 +95,29 @@ const toon = encode(data, { delimiter: '\t' })
Tell the model "fields are tab-separated" when using tabs. For more on delimiters, see the [Format Overview](/guide/format-overview#delimiter-options).
## Streaming Large Outputs
When working with large datasets (thousands of records or deeply nested structures), use `encodeLines()` to stream TOON output line-by-line instead of building the full string in memory.
```ts
import { encodeLines } from '@toon-format/toon'
const largeData = await fetchThousandsOfRecords()
// Stream large dataset without loading full string in memory
for (const line of encodeLines(largeData, { delimiter: '\t' })) {
process.stdout.write(`${line}\n`)
}
```
The CLI also supports streaming for memory-efficient JSON-to-TOON conversion:
```bash
toon large-dataset.json --output output.toon
```
This streaming approach prevents out-of-memory errors when preparing large context windows for LLMs. For complete details on `encodeLines()`, see the [API reference](/reference/api#encodelines).
## Tips and Pitfalls
**Show, don't describe.** Don't explain TOON syntax in detail just show an example. Models learn the pattern from context. A simple code block with 2-5 rows is more effective than paragraphs of explanation.

View File

@@ -129,14 +129,14 @@ encode(data, { delimiter: '\t', keyFolding: 'safe' })
## `encodeLines(value, options?)`
Converts any JSON-serializable value to TOON format as a sequence of lines, without building the full string in memory. Suitable for streaming large outputs to files, HTTP responses, or process stdout.
**Preferred method for streaming TOON output.** Converts any JSON-serializable value to TOON format as a sequence of lines, without building the full string in memory. Suitable for streaming large outputs to files, HTTP responses, or process stdout.
```ts
import { encodeLines } from '@toon-format/toon'
// Stream to stdout
// Stream to stdout (Node.js)
for (const line of encodeLines(data)) {
console.log(line)
process.stdout.write(`${line}\n`)
}
// Write to file line-by-line
@@ -158,7 +158,7 @@ const lineArray = Array.from(encodeLines(data))
### Return Value
Returns an `Iterable<string>` that yields TOON lines one at a time. Each yielded string is a single line without a trailing newline character.
Returns an `Iterable<string>` that yields TOON lines one at a time. **Each yielded string is a single line without a trailing newline character** — you must add `\n` when writing to streams or stdout.
::: info Relationship to `encode()`
`encode(value, options)` is equivalent to: