feat: streaming decode functionality with event-based parsing (closes #131)

Johann Schopplich
2025-11-21 22:29:57 +01:00
parent 9ebad53ea3
commit 6c57a14009
19 changed files with 2220 additions and 431 deletions


@@ -108,19 +108,25 @@ cat data.toon | toon --decode
Both encoding and decoding operations use streaming output, writing incrementally without building the full output string in memory. This makes the CLI efficient for large datasets without requiring additional configuration.
**JSON → TOON (Encode)**:
- Streams TOON lines to output.
- No full TOON string in memory.
**TOON → JSON (Decode)**:
- Uses the same event-based streaming decoder as the `decodeStream` API in `@toon-format/toon`.
- Streams JSON tokens to output.
- No full JSON string in memory.
- When `--expand-paths safe` is enabled, falls back to non-streaming decode internally to apply deep-merge expansion before writing JSON.
Process large files with minimal memory usage:
```bash
# Encode large JSON file
toon huge-dataset.json -o output.toon
# Decode large TOON file
toon huge-dataset.toon -o output.json
# Process millions of records efficiently via stdin


@@ -237,3 +237,5 @@ Round-tripping is lossless: `decode(encode(x))` always equals `x` (after normali
## Where to Go Next
Now that you've seen your first TOON document, read the [Format Overview](/guide/format-overview) for complete syntax details (objects, arrays, quoting rules, key folding), then explore [Using TOON with LLMs](/guide/llm-prompts) to see how to use it effectively in prompts. For implementation details, check the [API reference](/reference/api) (TypeScript) or the [specification](/reference/spec) (language-agnostic normative rules).
For large datasets or streaming use cases, see `encodeLines`, `decodeFromLines`, and `decodeStream` in the [API reference](/reference/api).


@@ -118,6 +118,31 @@ toon large-dataset.json --output output.toon
This streaming approach prevents out-of-memory errors when preparing large context windows for LLMs. For complete details on `encodeLines()`, see the [API reference](/reference/api#encodelines).
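As a sketch of what this looks like in application code (assuming `encodeLines()` synchronously yields TOON lines one at a time, as its name suggests; check the linked reference for the exact signature and options):
```ts
import { createWriteStream } from 'node:fs'
import { encodeLines } from '@toon-format/toon'
// Example payload standing in for a large dataset
const largeDataset = { users: [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }] }
// Sketch: write each emitted TOON line straight to disk instead of
// building the entire document as a single string in memory
const out = createWriteStream('context.toon')
for (const line of encodeLines(largeDataset)) {
  out.write(`${line}\n`)
}
out.end()
```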
**Consuming streaming LLM outputs:** If your LLM client exposes streaming text and you buffer by lines, you can decode TOON incrementally:
```ts
import { decodeFromLines } from '@toon-format/toon'
// Buffer the streaming response into complete lines
const lines: string[] = []
let buffer = ''
for await (const chunk of modelStream) {
  buffer += chunk
  let index: number
  while ((index = buffer.indexOf('\n')) !== -1) {
    lines.push(buffer.slice(0, index))
    buffer = buffer.slice(index + 1)
  }
}
// Flush a trailing line that wasn't newline-terminated
if (buffer.length > 0) {
  lines.push(buffer)
}
// Decode the buffered lines
const data = decodeFromLines(lines)
```
For streaming decode APIs, see [`decodeFromLines()`](/reference/api#decodeFromLines-lines-options) and [`decodeStream()`](/reference/api#decodeStream-source-options).
## Tips and Pitfalls
**Show, don't describe.** Don't explain TOON syntax in detail; just show an example. Models learn the pattern from context. A simple code block with 2-5 rows is more effective than paragraphs of explanation.


@@ -300,6 +300,227 @@ decode(toon, { expandPaths: 'safe', strict: false })
```
:::
## `decodeFromLines(lines, options?)`
Decodes TOON format from pre-split lines into a JavaScript value. This is a streaming-friendly wrapper around the event-based decoder: it consumes the event stream and builds the full value in memory.
Useful when you already have lines as an array or iterable (e.g., from file streams, readline interfaces, or network responses) and want the standard decode behavior with path expansion support.
### Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `lines` | `Iterable<string>` | Iterable of TOON lines (without trailing newlines) |
| `options` | `DecodeOptions?` | Optional decoding configuration (see below) |
### Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `indent` | `number` | `2` | Expected number of spaces per indentation level |
| `strict` | `boolean` | `true` | Enable strict validation (array counts, indentation, delimiter consistency) |
| `expandPaths` | `'off'` \| `'safe'` | `'off'` | Enable path expansion to reconstruct dotted keys into nested objects |
### Return Value
Returns a `JsonValue` (the parsed JavaScript value: object, array, or primitive).
### Example
**Basic usage with arrays:**
```ts
import { decodeFromLines } from '@toon-format/toon'
const lines = ['name: Alice', 'age: 30']
const value = decodeFromLines(lines)
// { name: 'Alice', age: 30 }
```
**Streaming from Node.js readline:**
```ts
import { createReadStream } from 'node:fs'
import { createInterface } from 'node:readline'
import { decodeFromLines } from '@toon-format/toon'
const rl = createInterface({
  input: createReadStream('data.toon'),
  crlfDelay: Infinity,
})
// readline yields lines asynchronously, while decodeFromLines expects a
// synchronous iterable of strings, so collect the lines before decoding
const lines: string[] = []
for await (const line of rl) {
  lines.push(line)
}
const value = decodeFromLines(lines)
console.log(value)
```
**With path expansion:**
```ts
const lines = ['user.name: Alice', 'user.age: 30']
const value = decodeFromLines(lines, { expandPaths: 'safe' })
// { user: { name: 'Alice', age: 30 } }
```
## `decodeStreamSync(lines, options?)`
Synchronously decodes TOON lines into a stream of JSON events. This function yields structured events that represent the JSON data model without building the full value tree.
Useful for streaming processing, custom transformations, or memory-efficient parsing of large datasets where you don't need the full value in memory.
::: info Event Streaming
This is a low-level API that returns individual parse events. For most use cases, [`decodeFromLines()`](#decodeFromLines-lines-options) or [`decode()`](#decode-input-options) are more convenient.
Path expansion (`expandPaths: 'safe'`) is **not supported** in streaming mode since it requires the full value tree.
:::
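If you do need path expansion, decode the same lines with the non-streaming wrapper instead, mirroring the `decodeFromLines()` example above:
```ts
import { decodeFromLines } from '@toon-format/toon'
// Dotted keys are only expanded by the non-streaming decoder
const value = decodeFromLines(['user.name: Alice'], { expandPaths: 'safe' })
// { user: { name: 'Alice' } }
```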
### Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `lines` | `Iterable<string>` | Iterable of TOON lines (without trailing newlines) |
| `options` | `DecodeStreamOptions?` | Optional streaming decoding configuration (see below) |
### Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `indent` | `number` | `2` | Expected number of spaces per indentation level |
| `strict` | `boolean` | `true` | Enable strict validation (array counts, indentation, delimiter consistency) |
### Return Value
Returns an `Iterable<JsonStreamEvent>` that yields structured events.
### Event Types
Events represent the structure of the JSON data model:
```ts
type JsonStreamEvent =
  | { type: 'startObject' }
  | { type: 'endObject' }
  | { type: 'startArray' }
  | { type: 'endArray' }
  | { type: 'key', key: string }
  | { type: 'primitive', value: JsonPrimitive }
type JsonPrimitive = string | number | boolean | null
```
### Example
**Basic event streaming:**
```ts
import { decodeStreamSync } from '@toon-format/toon'
const lines = ['name: Alice', 'age: 30']
for (const event of decodeStreamSync(lines)) {
  console.log(event)
}
// Output:
// { type: 'startObject' }
// { type: 'key', key: 'name' }
// { type: 'primitive', value: 'Alice' }
// { type: 'key', key: 'age' }
// { type: 'primitive', value: 30 }
// { type: 'endObject' }
```
**Custom processing:**
```ts
import { decodeStreamSync } from '@toon-format/toon'
const lines = ['users[2]{id,name}:', '  1,Alice', '  2,Bob']
let userCount = 0
for (const event of decodeStreamSync(lines)) {
  if (event.type === 'endObject' && userCount < 2) {
    userCount++
    console.log(`Processed user ${userCount}`)
  }
}
```
## `decodeStream(source, options?)`
Asynchronously decodes TOON lines into a stream of JSON events. This is the async version of [`decodeStreamSync()`](#decodeStreamSync-lines-options), supporting both synchronous and asynchronous iterables.
Useful for processing file streams, network responses, or other async sources where you want to handle data incrementally as it arrives.
### Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `source` | `AsyncIterable<string>` \| `Iterable<string>` | Async or sync iterable of TOON lines (without trailing newlines) |
| `options` | `DecodeStreamOptions?` | Optional streaming decoding configuration (see below) |
### Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `indent` | `number` | `2` | Expected number of spaces per indentation level |
| `strict` | `boolean` | `true` | Enable strict validation (array counts, indentation, delimiter consistency) |
### Return Value
Returns an `AsyncIterable<JsonStreamEvent>` that yields structured events asynchronously.
### Example
**Streaming from file:**
```ts
import { createReadStream } from 'node:fs'
import { createInterface } from 'node:readline'
import { decodeStream } from '@toon-format/toon'
const fileStream = createReadStream('data.toon', 'utf-8')
const rl = createInterface({ input: fileStream, crlfDelay: Infinity })
for await (const event of decodeStream(rl)) {
  console.log(event)
  // Process events as they arrive
}
```
**Processing events incrementally:**
```ts
import { decodeStream } from '@toon-format/toon'
const lines = getAsyncLineSource() // AsyncIterable<string>
// Remember when the previous event was the `id` key so that the
// following primitive event can be read as its value
let expectingId = false
for await (const event of decodeStream(lines, { strict: true })) {
  if (event.type === 'key' && event.key === 'id') {
    expectingId = true
  }
  else if (expectingId) {
    if (event.type === 'primitive') {
      console.log('Found ID:', event.value)
    }
    expectingId = false
  }
}
```
**Auto-detection of sync/async sources:**
```ts
// Works with sync iterables
const syncLines = ['name: Alice', 'age: 30']
for await (const event of decodeStream(syncLines)) {
  console.log(event)
}
// Works with async iterables
const asyncLines = readLinesFromNetwork()
for await (const event of decodeStream(asyncLines)) {
  console.log(event)
}
```
## Round-Trip Compatibility
TOON provides lossless round-trips after normalization: