docs: refine quoting rules and example

2026-01-29 23:34:10 +08:00 · 2025-10-28 09:41:23 +01:00
parent c4c65dd72f
commit 5867734881
1 changed file with 16 additions and 46 deletions
--- a/README.md
+++ b/README.md
@@ -580,26 +580,18 @@ encode({ config: {} }) // config:

 ### Quoting Rules

-TOON quotes strings **only when necessary** to maximize token efficiency. Inner spaces are allowed; leading or trailing spaces force quotes. Unicode and emoji are safe unquoted.
+TOON quotes strings **only when necessary** to maximize token efficiency:
+
+- Inner spaces are allowed; leading or trailing spaces force quotes.
+- Unicode and emoji are safe unquoted.
+- Quotes and control characters are escaped with backslash.

 > [!NOTE]
 > When using alternative delimiters (tab or pipe), the quoting rules adapt automatically. Strings containing the active delimiter will be quoted, while other delimiters remain safe.

-#### Keys
+#### Object Keys and Field Names

-Keys are quoted when any of the following is true:
-
-| Condition | Examples |
-|---|---|
-| Contains spaces, commas, colons, quotes, control chars | `"full name"`, `"a,b"`, `"order:id"`, `"tab\there"` |
-| Contains brackets or braces | `"[index]"`, `"{key}"` |
-| Leading hyphen | `"-lead"` |
-| Numeric-only key | `"123"` |
-| Empty key | `""` |
-
-**Notes:**
-
-- Quotes and control characters in keys are escaped (e.g., `"he said \"hi\""`, `"line\nbreak"`).
+Keys are unquoted if they match the identifier pattern: start with a letter or underscore, followed by letters, digits, underscores, or dots (e.g., `id`, `userName`, `user_name`, `user.name`, `_private`). All other keys must be quoted (e.g., `"user name"`, `"order-id"`, `"123"`, `"order:id"`, `""`).
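
The identifier rule above can be read as a single regular expression. The following is a minimal sketch, not the library's implementation: `UNQUOTED_KEY` and `needsKeyQuotes` are hypothetical names, and the pattern assumes that "letters" means ASCII letters, which the README text does not state explicitly.

```ts
// Hypothetical pattern transcribed from the prose rule:
// a letter or underscore first, then letters, digits, underscores, or dots.
// Assumption: "letters" is limited to ASCII A-Z and a-z.
const UNQUOTED_KEY = /^[A-Za-z_][A-Za-z0-9_.]*$/

// Returns true when a key falls outside the identifier pattern
// and would therefore have to be quoted.
function needsKeyQuotes(key: string): boolean {
  return !UNQUOTED_KEY.test(key)
}

// Keys from the README examples
const unquoted = ['id', 'userName', 'user_name', 'user.name', '_private']
const quoted = ['user name', 'order-id', '123', 'order:id', '']

console.log(unquoted.map(needsKeyQuotes))
console.log(quoted.map(needsKeyQuotes))
```

Under these assumptions, every key in the first list passes the pattern and every key in the second list is flagged for quoting, including the empty key, because the pattern requires at least one character.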

 #### String Values

@@ -608,27 +600,17 @@ String values are quoted when any of the following is true:
 | Condition | Examples |
 |---|---|
 | Empty string | `""` |
-| Contains active delimiter, colon, quote, backslash, or control chars | `"a,b"` (comma), `"a\tb"` (tab), `"a\|b"` (pipe), `"a:b"`, `"say \"hi\""`, `"C:\\Users"`, `"line1\\nline2"` |
 | Leading or trailing spaces | `" padded "`, `"  "` |
+| Contains active delimiter, colon, quote, backslash, or control chars | `"a,b"` (comma), `"a\tb"` (tab), `"a\|b"` (pipe), `"a:b"`, `"say \"hi\""`, `"C:\\Users"`, `"line1\\nline2"` |
 | Looks like boolean/number/null | `"true"`, `"false"`, `"null"`, `"42"`, `"-3.14"`, `"1e-6"`, `"05"` |
 | Starts with `"- "` (list-like) | `"- item"` |
 | Looks like structural token | `"[5]"`, `"{key}"`, `"[3]: x,y"` |

+**Examples of unquoted strings:** Unicode and emoji are safe (`hello 👋 world`), as are strings with inner spaces (`hello world`).
+
 > [!IMPORTANT]
 > **Delimiter-aware quoting:** Unquoted strings never contain `:` or the active delimiter. This makes TOON reliably parseable with simple heuristics: split key/value on first `: `, and split array values on the delimiter declared in the array header. When using tab or pipe delimiters, commas don't need quoting – only the active delimiter triggers quoting for both array values and object values.
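
The parsing heuristics in the note above can be sketched as two small helpers. This is an illustration under stated assumptions, not the library's decoder: `splitKeyValue` and `splitRow` are hypothetical names, `splitKeyValue` assumes an unquoted key, and `splitRow` tracks double quotes so that a delimiter inside a quoted value is not treated as a separator.

```ts
// Hypothetical helper: split a `key: value` line on the first ": ".
// Assumption: the key is unquoted, so it cannot itself contain ": ".
function splitKeyValue(line: string): [string, string] {
  const i = line.indexOf(': ')
  if (i === -1) throw new Error(`no key/value separator in: ${line}`)
  return [line.slice(0, i), line.slice(i + 2)]
}

// Hypothetical helper: split a row on the active delimiter,
// ignoring delimiters that appear inside double-quoted values.
function splitRow(row: string, delimiter: string = ','): string[] {
  const cells: string[] = []
  let current = ''
  let inQuotes = false
  for (let i = 0; i < row.length; i++) {
    const ch = row.charAt(i)
    if (inQuotes && ch === '\\' && i + 1 < row.length) {
      // Keep the backslash escape (e.g. \" or \\) together as one unit
      current += row.slice(i, i + 2)
      i++
    } else if (ch === '"') {
      inQuotes = !inQuotes
      current += ch
    } else if (ch === delimiter && !inQuotes) {
      cells.push(current)
      current = ''
    } else {
      current += ch
    }
  }
  cells.push(current)
  return cells
}

console.log(splitKeyValue('note: "hello, world"'))
console.log(splitRow('foo,"true","- item"'))
console.log(splitRow('a|"x|y"|b', '|'))
```

The cells are returned with their quotes intact; unescaping quoted values and converting `true`, `null`, and numbers to their types would be a separate step.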

-#### Examples
-
-```
-note: "hello, world"
-items[3]: foo,"true","- item"
-hello 👋 world         // unquoted
-" padded "             // quoted
-value: null            // null value
-name: ""               // empty string (quoted)
-text: "line1\nline2"   // multi-line string (escaped)
-```
-
 ### Tabular Format Requirements

 For arrays of objects to use the efficient tabular format, all of the following must be true:
@@ -651,7 +633,7 @@ Some non-JSON types are automatically normalized for LLM-safe output:

 | Input | Output |
 |---|---|
-| Number (finite) | Decimal form, no scientific notation; `-0` → `0` |
+| Number (finite) | Decimal form, no scientific notation (e.g., `1e6` → `1000000`); `-0` → `0` |
 | Number (`NaN`, `±Infinity`) | `null` |
 | `BigInt` | Decimal digits (no quotes) |
 | `Date` | ISO string in quotes (e.g., `"2025-01-01T00:00:00.000Z"`) |
@@ -659,14 +641,6 @@ Some non-JSON types are automatically normalized for LLM-safe output:
 | `function` | `null` |
 | `symbol` | `null` |
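
The number rows of the table can be illustrated with a small helper. This is a sketch of the described behavior, not the library's code: `normalizeNumber` is a hypothetical name, and it expands the exponent form that JavaScript's `String()` produces for very large or very small values.

```ts
// Hypothetical helper mirroring the table: finite numbers become plain
// decimal strings, -0 becomes 0, and NaN / ±Infinity become null.
function normalizeNumber(n: number): string {
  if (!Number.isFinite(n)) return 'null'
  if (Object.is(n, -0)) return '0'
  const s = String(n)
  // String() switches to exponent form outside roughly 1e-7..1e21
  const match = /^(-?)(\d+)(?:\.(\d+))?e([+-]\d+)$/.exec(s)
  if (!match) return s
  const [, sign, intPart, fracPart = '', expStr] = match
  const digits = intPart + fracPart
  // Position of the decimal point within `digits` after applying the exponent
  const pointIndex = intPart.length + Number(expStr)
  if (pointIndex <= 0) {
    return `${sign}0.${'0'.repeat(-pointIndex)}${digits}`
  }
  if (pointIndex >= digits.length) {
    return sign + digits + '0'.repeat(pointIndex - digits.length)
  }
  return `${sign}${digits.slice(0, pointIndex)}.${digits.slice(pointIndex)}`
}

console.log(normalizeNumber(-0))   // 0
console.log(normalizeNumber(1e6))  // 1000000
console.log(normalizeNumber(1e-6)) // 0.000001
console.log(normalizeNumber(1e-7)) // 0.0000001
console.log(normalizeNumber(NaN))  // null
```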

-Number normalization examples:
-
-```
--0    → 0
-1e6   → 1000000
-1e-6  → 0.000001
-```
-
 ## API

 ### `encode(value: unknown, options?: EncodeOptions): string`
@@ -695,7 +669,7 @@ const items = [
  { sku: 'B2', qty: 1, price: 14.5 }
 ]

-console.log(encode({ items }))
+encode({ items })
 ```

 **Output:**
@@ -715,8 +689,6 @@ The `delimiter` option allows you to choose between comma (default), tab, or pip
 Using tab delimiters instead of commas can reduce token count further, especially for tabular data:

 ```ts
-import { encode } from '@byjohann/toon'
-
 const data = {
  items: [
    { sku: 'A1', name: 'Widget', qty: 2, price: 9.99 },
@@ -724,7 +696,7 @@ const data = {
  ]
 }

-console.log(encode(data, { delimiter: '\t' }))
+encode(data, { delimiter: '\t' })
 ```

 **Output:**
@@ -751,7 +723,7 @@ items[2	]{sku	name	qty	price}:
 Pipe delimiters offer a middle ground between commas and tabs:

 ```ts
-console.log(encode(data, { delimiter: '|' }))
+encode(data, { delimiter: '|' })
 ```

 **Output:**
@@ -767,8 +739,6 @@ items[2|]{sku|name|qty|price}:
 The `lengthMarker` option adds an optional hash (`#`) prefix to array lengths to emphasize that the bracketed value represents a count, not an index:

 ```ts
-import { encode } from '@byjohann/toon'
-
 const data = {
  tags: ['reading', 'gaming', 'coding'],
  items: [
@@ -777,14 +747,14 @@ const data = {
  ],
 }

-console.log(encode(data, { lengthMarker: '#' }))
+encode(data, { lengthMarker: '#' })
 // tags[#3]: reading,gaming,coding
 // items[#2]{sku,qty,price}:
 //   A1,2,9.99
 //   B2,1,14.5

 // Works with custom delimiters
-console.log(encode(data, { lengthMarker: '#', delimiter: '|' }))
+encode(data, { lengthMarker: '#', delimiter: '|' })
 // tags[#3|]: reading|gaming|coding
 // items[#2|]{sku|qty|price}:
 //   A1|2|9.99