mirror of
https://github.com/voson-wang/toon.git
synced 2026-01-29 23:34:10 +08:00
docs: refine quoting rules and example
62
README.md
@@ -580,26 +580,18 @@ encode({ config: {} }) // config:
 
 ### Quoting Rules
 
-TOON quotes strings **only when necessary** to maximize token efficiency. Inner spaces are allowed; leading or trailing spaces force quotes. Unicode and emoji are safe unquoted.
+TOON quotes strings **only when necessary** to maximize token efficiency:
+
+- Inner spaces are allowed; leading or trailing spaces force quotes.
+- Unicode and emoji are safe unquoted.
+- Quotes and control characters are escaped with backslash.
 
 > [!NOTE]
 > When using alternative delimiters (tab or pipe), the quoting rules adapt automatically. Strings containing the active delimiter will be quoted, while other delimiters remain safe.
 
-#### Keys
+#### Object Keys and Field Names
 
-Keys are quoted when any of the following is true:
+Keys are unquoted if they match the identifier pattern: start with a letter or underscore, followed by letters, digits, underscores, or dots (e.g., `id`, `userName`, `user_name`, `user.name`, `_private`). All other keys must be quoted (e.g., `"user name"`, `"order-id"`, `"123"`, `"order:id"`, `""`).
 
-| Condition | Examples |
-|---|---|
-| Contains spaces, commas, colons, quotes, control chars | `"full name"`, `"a,b"`, `"order:id"`, `"tab\there"` |
-| Contains brackets or braces | `"[index]"`, `"{key}"` |
-| Leading hyphen | `"-lead"` |
-| Numeric-only key | `"123"` |
-| Empty key | `""` |
-
-**Notes:**
-
-- Quotes and control characters in keys are escaped (e.g., `"he said \"hi\""`, `"line\nbreak"`).
-
 #### String Values
 
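The identifier rule introduced in this hunk can be sketched in TypeScript as follows. This is only an illustration, not the library's implementation: `UNQUOTED_KEY` and `formatKey` are hypothetical names, "letters" is assumed to mean ASCII letters, and the escape set (backslash, double quote, newline, carriage return, tab) is an assumption based on the escaping bullets and examples in the diff.

```typescript
// Hypothetical helper: decides whether a key can be emitted bare.
// Pattern taken from the new README text: a letter or underscore first,
// then letters, digits, underscores, or dots (ASCII assumed).
const UNQUOTED_KEY = /^[A-Za-z_][A-Za-z0-9_.]*$/

function formatKey(key: string): string {
  if (UNQUOTED_KEY.test(key)) {
    return key
  }
  // Assumed escape set: backslash first, so later escapes are not doubled.
  const escaped = key
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r')
    .replace(/\t/g, '\\t')
  return `"${escaped}"`
}

// Keys from the README examples
for (const key of ['id', 'user.name', '_private', 'user name', 'order-id', '123', '']) {
  console.log(formatKey(key))
}
```

Under these assumptions the first three keys are printed bare and the remaining four are printed in double quotes, which matches the examples listed in the new paragraph.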
@@ -608,27 +600,17 @@ String values are quoted when any of the following is true:
 | Condition | Examples |
 |---|---|
 | Empty string | `""` |
-| Contains active delimiter, colon, quote, backslash, or control chars | `"a,b"` (comma), `"a\tb"` (tab), `"a\|b"` (pipe), `"a:b"`, `"say \"hi\""`, `"C:\\Users"`, `"line1\\nline2"` |
 | Leading or trailing spaces | `" padded "`, `" "` |
+| Contains active delimiter, colon, quote, backslash, or control chars | `"a,b"` (comma), `"a\tb"` (tab), `"a\|b"` (pipe), `"a:b"`, `"say \"hi\""`, `"C:\\Users"`, `"line1\\nline2"` |
 | Looks like boolean/number/null | `"true"`, `"false"`, `"null"`, `"42"`, `"-3.14"`, `"1e-6"`, `"05"` |
 | Starts with `"- "` (list-like) | `"- item"` |
 | Looks like structural token | `"[5]"`, `"{key}"`, `"[3]: x,y"` |
 
+**Examples of unquoted strings:** Unicode and emoji are safe (`hello 👋 world`), as are strings with inner spaces (`hello world`).
+
 > [!IMPORTANT]
 > **Delimiter-aware quoting:** Unquoted strings never contain `:` or the active delimiter. This makes TOON reliably parseable with simple heuristics: split key/value on first `: `, and split array values on the delimiter declared in the array header. When using tab or pipe delimiters, commas don't need quoting – only the active delimiter triggers quoting for both array values and object values.
 
-#### Examples
-
-```
-note: "hello, world"
-items[3]: foo,"true","- item"
-hello 👋 world // unquoted
-" padded " // quoted
-value: null // null value
-name: "" // empty string (quoted)
-text: "line1\nline2" // multi-line string (escaped)
-```
-
 ### Tabular Format Requirements
 
 For arrays of objects to use the efficient tabular format, all of the following must be true:
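The string-value conditions in the table above could be approximated with a check like the one below. This is a rough sketch under stated assumptions, not the library's code: `needsQuotes` and `Delimiter` are hypothetical names, the number pattern is a guess at what "looks like a number" means, and "looks like structural token" is simplified to "starts with `[` or `{`".

```typescript
// Hypothetical type and helper, for illustration only.
type Delimiter = ',' | '\t' | '|'

function needsQuotes(value: string, delimiter: Delimiter = ','): boolean {
  if (value === '') return true                       // empty string
  if (value !== value.trim()) return true             // leading or trailing whitespace
  if (value.includes(delimiter)) return true          // only the ACTIVE delimiter counts
  if (/[:"\\]/.test(value)) return true               // colon, quote, backslash
  if (/[\u0000-\u001f]/.test(value)) return true      // control characters
  if (/^(?:true|false|null)$/.test(value)) return true
  // Assumed number shape: optional sign, digits, optional fraction and exponent
  if (/^-?\d+(?:\.\d+)?(?:e[+-]?\d+)?$/i.test(value)) return true
  if (value.startsWith('- ')) return true             // list-like
  if (/^[\[{]/.test(value)) return true               // assumed structural check
  return false
}

console.log(needsQuotes('hello 👋 world')) // false
console.log(needsQuotes('a,b'))            // true: comma is the active delimiter
console.log(needsQuotes('a,b', '|'))       // false: comma is safe under pipe
console.log(needsQuotes('a|b', '|'))       // true
```

Passing the delimiter as a parameter mirrors the delimiter-aware behaviour described in the note: a comma only forces quotes when comma is the active delimiter.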
@@ -651,7 +633,7 @@ Some non-JSON types are automatically normalized for LLM-safe output:
 
 | Input | Output |
 |---|---|
-| Number (finite) | Decimal form, no scientific notation; `-0` → `0` |
+| Number (finite) | Decimal form, no scientific notation (e.g., `-0` → `0`, `1e6` → `1000000`) |
 | Number (`NaN`, `±Infinity`) | `null` |
 | `BigInt` | Decimal digits (no quotes) |
 | `Date` | ISO string in quotes (e.g., `"2025-01-01T00:00:00.000Z"`) |
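One way the number rows in this table might be implemented is sketched below. It is an assumption-laden illustration rather than the library's code: `normalizeNumber` is a hypothetical name, and the approach (expanding the exponent form that JavaScript's `String(n)` produces for very large or very small values) is just one possible technique.

```typescript
// Hypothetical helper: renders a JS number in plain decimal form.
function normalizeNumber(n: number): string {
  if (!Number.isFinite(n)) return 'null' // NaN and ±Infinity
  if (Object.is(n, -0)) return '0'       // -0 → 0

  const s = String(n)
  // String(n) only uses exponent form for very large or very small magnitudes.
  const match = /^(-?)(\d+)(?:\.(\d+))?e([+-]\d+)$/.exec(s)
  if (!match) return s

  const [, sign, intPart, fracPart = '', expStr] = match
  const digits = intPart + fracPart
  // Position of the decimal point within `digits` after applying the exponent
  const pointIndex = intPart.length + Number(expStr)

  if (pointIndex <= 0) {
    return `${sign}0.${'0'.repeat(-pointIndex)}${digits}`
  }
  if (pointIndex >= digits.length) {
    return sign + digits + '0'.repeat(pointIndex - digits.length)
  }
  return `${sign}${digits.slice(0, pointIndex)}.${digits.slice(pointIndex)}`
}

console.log(normalizeNumber(-0))   // 0
console.log(normalizeNumber(1e6))  // 1000000
console.log(normalizeNumber(1e-6)) // 0.000001
console.log(normalizeNumber(1e-7)) // 0.0000001
console.log(normalizeNumber(NaN))  // null
```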
@@ -659,14 +641,6 @@ Some non-JSON types are automatically normalized for LLM-safe output:
 | `function` | `null` |
 | `symbol` | `null` |
 
-Number normalization examples:
-
-```
--0 → 0
-1e6 → 1000000
-1e-6 → 0.000001
-```
-
 ## API
 
 ### `encode(value: unknown, options?: EncodeOptions): string`
@@ -695,7 +669,7 @@ const items = [
   { sku: 'B2', qty: 1, price: 14.5 }
 ]
 
-console.log(encode({ items }))
+encode({ items })
 ```
 
 **Output:**
@@ -715,8 +689,6 @@ The `delimiter` option allows you to choose between comma (default), tab, or pip
 Using tab delimiters instead of commas can reduce token count further, especially for tabular data:
 
 ```ts
-import { encode } from '@byjohann/toon'
-
 const data = {
   items: [
     { sku: 'A1', name: 'Widget', qty: 2, price: 9.99 },
@@ -724,7 +696,7 @@ const data = {
   ]
 }
 
-console.log(encode(data, { delimiter: '\t' }))
+encode(data, { delimiter: '\t' })
 ```
 
 **Output:**
@@ -751,7 +723,7 @@ items[2 ]{sku name qty price}:
 Pipe delimiters offer a middle ground between commas and tabs:
 
 ```ts
-console.log(encode(data, { delimiter: '|' }))
+encode(data, { delimiter: '|' })
 ```
 
 **Output:**
@@ -767,8 +739,6 @@ items[2|]{sku|name|qty|price}:
 The `lengthMarker` option adds an optional hash (`#`) prefix to array lengths to emphasize that the bracketed value represents a count, not an index:
 
 ```ts
-import { encode } from '@byjohann/toon'
-
 const data = {
   tags: ['reading', 'gaming', 'coding'],
   items: [
@@ -777,14 +747,14 @@ const data = {
   ],
 }
 
-console.log(encode(data, { lengthMarker: '#' }))
+encode(data, { lengthMarker: '#' })
 // tags[#3]: reading,gaming,coding
 // items[#2]{sku,qty,price}:
 //   A1,2,9.99
 //   B2,1,14.5
 
 // Works with custom delimiters
-console.log(encode(data, { lengthMarker: '#', delimiter: '|' }))
+encode(data, { lengthMarker: '#', delimiter: '|' })
 // tags[#3|]: reading|gaming|coding
 // items[#2|]{sku|qty|price}:
 //   A1|2|9.99