docs: update TOON specs to v1.1 with decoding behavior

2026-01-29 23:34:10 +08:00 · 2025-10-29 08:27:44 +01:00
parent 92b51db825
commit 61fb751540
1 changed files with 271 additions and 94 deletions
--- a/SPEC.md
+++ b/SPEC.md
@@ -1,31 +1,34 @@
-# TOON Specification (v1)
+# TOON Specification (v1.1)

-Status: Draft, normative where indicated. This version specifies encoding (producer behavior). A formal decoding spec is out of scope for v1.
+Status: Draft, normative where indicated. This version specifies both encoding (producer behavior) and decoding (parser behavior).

 - Normative statements use RFC 2119/8174 keywords: MUST, MUST NOT, SHOULD, SHOULD NOT, MAY.
- This spec targets implementers of encoders/validators, tool authors, and practitioners embedding TOON in LLM prompts.
+- This spec targets implementers of encoders/decoders/validators, tool authors, and practitioners embedding TOON in LLM prompts.

 Changelog:
+- v1.1: Made decoding behavior normative; added decoding semantics, strict-mode validation rules, delimiter-aware parsing, and reference decoding algorithms. Added decoder options (indent, strict).
 - v1: Initial encoding + normalization + conformance rules based on reference encoder and test suite.

 Scope:
- This document defines the data model, normalization (for the reference JavaScript/TypeScript encoder), concrete syntax, and conformance requirements for producing TOON. Decoding is informative only and not standardized in v1.
+- This document defines the data model, encoding normalization (for the reference JavaScript/TypeScript encoder), concrete syntax, decoding semantics, and conformance requirements for producing and consuming TOON.

 ## 1. Terminology and Conventions

 - TOON document: A sequence of UTF-8 text lines formatted according to this spec.
- Line: A sequence of non-newline characters terminated by LF (U+000A) in serialized form. TOON output MUST use LF line endings.
+- Line: A sequence of non-newline characters terminated by LF (U+000A) in serialized form. Encoders MUST use LF line endings.
 - Indentation level (depth): The number of indentation units (spaces) applied to a line. Depth 0 lines have no leading indentation.
 - Indentation unit: A fixed number of spaces per level (default 2). Tabs MUST NOT be used for indentation.
 - Header: The bracketed declaration for arrays, optionally followed by a field list, and terminating with a colon: e.g., key[3]: or items[2]{a,b}:.
 - Field list: The brace-enclosed, delimiter-separated list of field names for tabular arrays: {f1<delim>f2}.
- List item: A line beginning with a hyphen and a space at a given depth (“- ”), representing an element in an expanded array form.
+- List item: A line beginning with a hyphen and a space at a given depth ("- "), representing an element in an expanded array form.
 - Delimiter: The character used to separate array/tabular values: comma (default), tab, or pipe.
- Length marker: An optional “#” prefix for array lengths in headers, e.g., [#3].
+- Active delimiter: The delimiter declared by the closest array header in scope. Used to split inline primitive arrays and tabular rows under that header.
+- Length marker: An optional "#" prefix for array lengths in headers, e.g., [#3]. Decoders MUST accept and ignore the marker semantically.
 - Primitive: string, number, boolean, or null.
 - Object: Mapping from string keys to JsonValue.
 - Array: Ordered sequence of JsonValue.
 - JsonValue: Primitive | Object | Array.
+- Strict mode: Decoder mode that enforces array lengths, tabular row counts, and delimiter consistency; also rejects invalid escapes and missing colons (default: true).

 Notation:
 - Regular expressions appear in slash-delimited form.
@@ -40,7 +43,7 @@ Notation:
 - Ordering:
  - Array order is preserved.
  - Object key order is preserved as encountered by the encoder.
- Numeric canonicalization:
+- Numeric canonicalization (encoding):
  - -0 MUST be normalized to 0.
  - Finite numbers MUST be rendered without scientific notation (e.g., 1e6 → 1000000, 1e-6 → 0.000001), as per host-language number-to-string rules that avoid exponent notation in these cases.
 - Null semantics: null is represented as the literal null.
@@ -63,6 +66,31 @@ The reference encoder normalizes non-JSON values to the data model as follows:

 Note: Other language ports SHOULD apply analogous normalization strategies consistent with this spec’s data model and encoding rules.

+## 3A. Host-Language Interpretation (Reference Decoder)
+
+Decoders map text tokens to host values as follows:
+
+- Quoted tokens (strings and keys):
+  - MUST be unescaped using only these escape sequences:
+    - "\\" → backslash
+    - "\"" → double quote
+    - "\n" → newline
+    - "\r" → carriage return
+    - "\t" → tab
+  - Any other escape (e.g., "\x", trailing backslash) MUST be rejected.
+  - Unterminated quotes MUST be rejected.
+  - Quoted primitives remain strings even if they lexically resemble numbers, booleans, or null (e.g., "true" → "true").
+- Unquoted value tokens:
+  - The exact tokens true, false, null map to booleans/null.
+  - Numeric parsing:
+    - MUST accept standard decimal and exponent forms (e.g., 42, -3.14, 1e-6).
+    - MUST reject leading-zero decimals (e.g., "05", "0001"); such tokens MUST be treated as strings.
+    - Only finite numbers are represented in TOON text; non-finite are not expected from conforming encoders.
+  - Otherwise, the token is a string.
+- Keys:
+  - Decoded as strings. Quoted keys MUST be unescaped as above.
+  - Missing colon after a (quoted or unquoted) key MUST be treated as an error.
+
 ## 4. Concrete Syntax Overview

 TOON is a deterministic, line-oriented, indentation-based notation:
@@ -72,14 +100,18 @@ TOON is a deterministic, line-oriented, indentation-based notation:
  - key: alone for nested or empty objects, with nested fields indented one level.
 - Arrays:
  - Primitive arrays are inline: key[N<delim?>]: v1<delim>v2.
-  - Arrays of arrays (primitives): expanded list under a header: key[N<delim?>]: then “- [M<delim?>]: …” lines.
+  - Arrays of arrays (primitives): expanded list under a header: key[N<delim?>]: then "- [M<delim?>]: …" lines.
  - Arrays of objects:
    - Tabular form when uniform and primitive-only: key[N<delim?>]{f1<delim>f2}: then one row per line.
-    - Otherwise expanded list: key[N<delim?>]: with “- …” items, following object-as-list-item rules.
- Whitespace invariants:
+    - Otherwise expanded list: key[N<delim?>]: with "- …" items, following object-as-list-item rules.
+- Whitespace invariants (encoding):
  - No trailing spaces at the end of any line.
  - No trailing newline at the end of the document.
-  - One space after “: ” in key: value lines and after array headers when followed by inline values (non-empty primitive arrays).
+  - One space after ": " in key: value lines and after array headers when followed by inline values (non-empty primitive arrays).
+- Decoder discovery:
+  - If the first non-empty depth-0 line is a valid root array header ("[ … ]:"), decode a root array.
+  - If the document has a single line that is neither a valid array header nor a key-value line, decode it as a single primitive.
+  - Otherwise, decode an object.

 ## 5. Tokens and Lexical Elements

@@ -88,13 +120,16 @@ TOON is a deterministic, line-oriented, indentation-based notation:
  - Comma (,) is the default.
  - Tab (\t) and pipe (|) are supported alternatives.
  - The active delimiter MAY appear inside array headers (see Section 7).
- Indentation unit: default 2 spaces per level; configurable at encode-time.
- List item markers: “- ” (hyphen + single space) at the appropriate indentation level. An empty object as a list item is represented as a lone hyphen (“-”).
+- Indentation unit: default 2 spaces per level; configurable at encode-time and decode-time. Tabs MUST NOT be used for indentation.
+- List item markers: "- " (hyphen + single space) at the appropriate indentation level. An empty object as a list item is represented as a lone hyphen ("-").
 - Character set: UTF-8. Tabs MUST NOT appear as indentation but MAY appear as the chosen delimiter or inside quoted strings via escapes.
+- Decoding constraints:
+  - Quoted strings and keys MUST use only the five escapes listed in Section 3A; others MUST error.
+  - Decoders MUST locate the colon that follows the header (after any [..] and optional {..}) for arrays; missing colon MUST error.

-## 6. String and Key Encoding
+## 6. Strings and Keys (Encoding and Decoding)

-6.1 Escaping
+6.1 Escaping (Encoding and Decoding)

 The following characters in quoted strings and keys MUST be escaped:
 - Backslash: "\\" → "\\\\"
@@ -103,12 +138,14 @@ The following characters in quoted strings and keys MUST be escaped:
 - Carriage return: U+000D → "\\r"
 - Tab: U+0009 → "\\t"

-6.2 Quoting Rules for String Values
+Decoders MUST reject any other escape sequence and unterminated strings.
+
+6.2 Quoting Rules for String Values (Encoding)

 A string value MUST be quoted (with escaping as above) if any of the following is true:
 - It is empty ("").
 - It has leading or trailing whitespace.
- It equals true, false, or null (case-sensitive matches of these literals).
+- It equals true, false, or null (case-sensitive).
 - It is numeric-like:
  - Matches /^-?\d+(?:\.\d+)?(?:e[+-]?\d+)?$/i (e.g., "42", "-3.14", "1e-6").
  - Or matches /^0\d+$/ (leading-zero decimals such as "05").
@@ -120,7 +157,7 @@ A string value MUST be quoted (with escaping as above) if any of the following i

 If none of the conditions above apply, the string MAY be emitted without quotes. Unicode, emoji, and strings with internal (non-leading/trailing) spaces are safe unquoted provided they do not violate the conditions.

-6.3 Key Encoding
+6.3 Key Encoding (Encoding)

 Object keys and tabular field names:
 - MAY be unquoted only if they match the pattern: ^[A-Za-z_][\w.]*$.
@@ -128,6 +165,15 @@ Object keys and tabular field names:

 Note: Keys containing spaces, punctuation (e.g., colon, pipe, hyphen), or starting with a digit MUST be quoted.

+6.4 Decoding Rules for Strings and Keys (Decoding)
+
+- Quoted strings and keys MUST be unescaped using only the five escapes in 6.1. Any other escape MUST error. Quoted primitives remain strings.
+- Unquoted values:
+  - true/false/null → boolean/null
+  - Numeric tokens → numbers (with the leading-zero rule from 3A)
+  - Otherwise → strings
+- Keys (quoted or unquoted) MUST be followed by ":"; missing colon MUST error.
+
 ## 7. Array Headers

 General header syntax:
@@ -138,7 +184,7 @@ General header syntax:

 Where:
 - N is the array length (non-negative integer).
- <marker?> is optional “#” when the length marker option is enabled (Section 10).
+- <marker?> is optional "#" when the length marker option is enabled (Section 13).
 - <delim?> is:
  - Absent when the delimiter is comma.
  - Present and equal to the active delimiter when the delimiter is tab or pipe.
@@ -149,6 +195,13 @@ Spacing:
 - When an inline list of values follows a header on the same line (non-empty primitive arrays), there MUST be exactly one space after the colon before the first value.
 - Otherwise, no trailing space follows the colon on the header line.

+Decoding requirements:
+- The bracket segment "[ … ]" MUST parse as a non-negative integer length. If present, a trailing tab or pipe inside the brackets selects the active delimiter for the header; otherwise comma is the active delimiter.
+- An optional "#" MAY precede the length and MUST be ignored semantically.
+- If a brace-enclosed fields segment "{ … }" is present, field names MUST be parsed using the active delimiter, and quoted field names MUST be unescaped per Section 6.1.
+- A colon MUST follow the bracket (and fields) segment; missing colon MUST error.
+- Inline values, if present on the same line, are split using the header’s active delimiter.
+
 ## 8. Primitive Encoding

 - null: literal null.
@@ -158,106 +211,147 @@ Spacing:
  - Non-finite (NaN, ±Infinity): treated as null via normalization (Section 3).
 - string: encoded per Section 6 with delimiter-aware quoting.

+Decoding note:
+- Primitive tokens are interpreted per Section 3A (quoted → string; unquoted → boolean/null/number/string with leading-zero rule).
+
 ## 9. Object Syntax

- Primitive fields: key: value (single space after colon).
- Nested or empty objects: key: on its own line; if non-empty, nested fields appear at one more indentation level.
- Key order: Implementations MUST preserve the encounter order when emitting fields.
- An empty object at the root results in an empty document (no lines).
+- Encoding:
+  - Primitive fields: key: value (single space after colon).
+  - Nested or empty objects: key: on its own line; if non-empty, nested fields appear at one more indentation level.
+  - Key order: Implementations MUST preserve the encounter order when emitting fields.
+  - An empty object at the root results in an empty document (no lines).
+- Decoding:
+  - A line "key:" with nothing after the colon at depth d opens an object; subsequent lines at depth > d belong to that object until the depth decreases to ≤ d.
+  - Lines with "key: value" at the same depth are sibling fields.
+  - Missing colon after a key (quoted or unquoted) MUST error.
+  - Quoted keys MUST be followed immediately by ":"; missing colon MUST error.

 ## 10. Arrays

 10.1 Primitive Arrays (Inline)

- Non-empty arrays: key[N<delim?>]: v1<delim>v2<delim>… where each vi is encoded as a primitive (Section 8) with delimiter-aware quoting (Section 6).
- Empty arrays: key[0<delim?>]: (no values following).
- Root arrays use the same rules without a key: [N<delim?>]: v1<delim>…
+- Encoding:
+  - Non-empty arrays: key[N<delim?>]: v1<delim>v2<delim>… where each vi is encoded as a primitive (Section 8) with delimiter-aware quoting (Section 6).
+  - Empty arrays: key[0<delim?>]: (no values following).
+  - Root arrays use the same rules without a key: [N<delim?>]: v1<delim>…
+- Decoding:
+  - Inline arrays are split using the active delimiter declared by the header; non-active delimiters MUST NOT split values.
+  - In strict mode, the number of decoded values MUST equal N; otherwise error.

 10.2 Arrays of Arrays (Primitives Only) — Expanded List

- Parent header: key[N<delim?>]: on its own line.
- Each inner primitive array is a list item:
+- Encoding:
+  - Parent header: key[N<delim?>]: on its own line.
+  - Each inner primitive array is a list item:
    - - [M<delim?>]: v1<delim>v2<delim>…
    - Empty inner arrays: - [0<delim?>]:
- Root arrays of arrays use [N<delim?>]: as the parent header with the same rules.
+- Decoding:
+  - Items appear at one deeper depth, each starting with "- " and an inner array header "[M<delim?>]: …".
+  - Inner arrays are split using their own active delimiter; in strict mode, counts MUST match M.
+  - In strict mode, the number of list items MUST equal outer N.

 10.3 Arrays of Objects — Tabular Form

-Tabular detection (MUST hold for all rows):
+Tabular detection (encoding; MUST hold for all rows):
 - Every element is an object.
 - All objects have the same set of keys (order per object MAY vary).
 - All values across these keys are primitives (no nested arrays/objects).

-When satisfied:
+When satisfied (encoding):
 - Header: key[N<delim?>]{f1<delim>f2<delim>…}: where the field order is the encounter order of the first object’s keys.
 - Field names encoded as keys (Section 6.3), delimiter-aware.
 - Rows: one line per object at one indentation level under the header, values joined by the active delimiter. Each value encoded as a primitive (Section 8) with delimiter-aware quoting (Section 6).
 - Root tabular arrays omit the key: [N<delim?>]{…}: then rows.

+Decoding:
+- A tabular header declares the active delimiter and the ordered field list.
+- Rows appear at one deeper depth as value lines separated by the active delimiter.
+- Each row’s value count MUST equal the field count in strict mode; otherwise error.
+- The number of rows MUST equal N in strict mode; otherwise error.
+- Disambiguation at row depth:
+  - If a line has no colon → it is a data row.
+  - If a line has both a colon and the active delimiter, compare first occurrences:
+    - Delimiter before colon → row.
+    - Colon before delimiter → key-value line (end of rows).
+  - If a line has a colon but no active delimiter → key-value line (end of rows).
+
 10.4 Mixed / Non-Uniform Arrays — Expanded List

-When tabular requirements are not met:
+When tabular requirements are not met (encoding):
 - Header: key[N<delim?>]:
 - Each element is rendered as a list item at one indentation level under the header:
  - Primitive: - <primitive>
  - Primitive array: - [M<delim?>]: v1<delim>…
-  - Object: formatted using “objects as list items” (Section 11).
+  - Object: formatted using "objects as list items" (Section 11).
  - Complex arrays (e.g., arrays of arrays with mixed shapes): - key'[M<delim?>]: followed by nested items as appropriate.

+Decoding:
+- Header declares the list length N and active delimiter for nested inline arrays.
+- Each list item starts with "- " at one deeper depth and is parsed as:
+  - Primitive (no colon or array header),
+  - Inline primitive array (- [M<delim?>]: …),
+  - First-field-on-hyphen object (- key: … or - key[N…]{…}: …),
+  - Or complex nested arrays (e.g., arrays of arrays) using nested headers.
+- In strict mode, the number of list items MUST equal N; otherwise error.
+
 ## 11. Objects as List Items

 For an object appearing as a list item:

- If the object is empty, render a single “-” at the list item indentation level.
-
- Otherwise, place the first field on the hyphen line using the following rules:
-  - If the first field’s value is a primitive: - key: value
-  - If the first field’s value is a primitive array: - key[M<delim?>]: v1<delim>…
-  - If the first field’s value is an array of objects that qualifies for tabular form:
-    - - key[N<delim?>]{fields}:
-    - Followed by tabular rows at one more indentation level.
-  - If the first field’s value is a non-uniform array of objects:
-    - - key[N<delim?>]:
-    - Followed by list items at one more indentation level (apply these same rules recursively).
-  - If the first field’s value is a complex array (e.g., arrays of arrays, nested mixed arrays):
-    - - key[N<delim?>]:
-    - Followed by nested encodings (e.g., “- [M<delim?>]: …”) at one more indentation level.
-  - If the first field’s value is an object:
-    - - key:
+- Empty object list item: a single "-" at the list item indentation level.
+- First field on the hyphen line:
+  - Primitive: - key: value
+  - Primitive array: - key[M<delim?>]: v1<delim>…
+  - Tabular array: - key[N<delim?>]{fields}:
+    - Followed by tabular rows at one more indentation level (relative to the hyphen line).
+  - Non-uniform array of objects: - key[N<delim?>]:
+    - Followed by list items at one more indentation level.
+  - Object: - key:
    - Nested object fields appear at two more indentation levels (i.e., one deeper than subsequent sibling fields of the same list item).
-
 - Remaining fields of the same object appear at one indentation level under the hyphen line, in encounter order, using normal object field rules.

+Decoding:
+- The first field is parsed from the hyphen line. If it is a nested object (- key:), nested fields are at +2 depth relative to the hyphen line; subsequent fields of the same list item are at +1 depth.
+- If the first field is a tabular header on the hyphen line, its rows are at +1 depth and then subsequent sibling fields continue at +1 depth after the rows.
+
 ## 12. Delimiters

 - Supported delimiters:
  - Comma (default): header omits the delimiter symbol.
  - Tab: header includes the tab character inside brackets and braces (e.g., [N<TAB>], {a<TAB>b}); rows/inline arrays use tabs to separate values.
-  - Pipe: header includes “|” inside brackets and braces; rows/inline arrays use “|”.
- Delimiter-aware quoting:
+  - Pipe: header includes "|" inside brackets and braces; rows/inline arrays use "|".
+- Delimiter-aware quoting (encoding):
  - Strings containing the active delimiter MUST be quoted across object values, array values, and tabular rows.
  - Strings containing non-active delimiters (e.g., commas when using tab) do not require quoting unless another quoting condition applies.
- Changing the delimiter does not relax other quoting rules (colon, brackets/braces, leading hyphen, numeric-like, boolean/null-like).
+- Delimiter-aware parsing (decoding):
+  - Inline arrays and tabular rows MUST be split only on the active delimiter declared by the nearest array header.
+  - Strings containing the active delimiter MUST be quoted to avoid splitting; non-active delimiters MUST NOT cause splits.
+  - Nested headers may change the active delimiter; decoding MUST use the delimiter declared by the nearest header.

 ## 13. Length Marker

- When enabled, the length marker “#” MUST appear immediately before the length in every array header, including nested arrays and tabular headers:
+- When enabled by an encoder, the length marker "#" MUST appear immediately before the length in every array header, including nested arrays and tabular headers:
  - key[#N<delim?>]: …
  - key[#N<delim?>]{…}:
  - - [#M<delim?>]: …
- Semantics: purely informational to emphasize counts; no change to other parsing or formatting rules.
+- Decoding:
+  - The marker MUST be accepted and ignored semantically.
+  - In strict mode, declared lengths MUST match actual counts (rows/items/inline values); mismatches MUST error.

 ## 14. Indentation and Whitespace Invariants

- Indentation:
+- Encoding:
  - The encoder MUST use a consistent number of spaces per level (default 2; configurable).
  - Tabs MUST NOT be used for indentation.
- Spacing:
-  - Exactly one space after “: ” in key: value lines.
+  - Exactly one space after ": " in key: value lines.
  - Exactly one space after array headers when followed by inline values (non-empty primitive arrays).
- End-of-line:
  - No trailing spaces at the end of any line.
  - No trailing newline at the end of the document.
+- Decoding:
+  - Depth is derived from the number of leading spaces and the configured indent size. Implementations SHOULD accept inputs where depth is computed as floor(indentSpaces / indentSize).
+  - Decoders SHOULD be resilient to surrounding whitespace around tokens; internal token semantics follow quoting rules.
+  - Tabs used as indentation are non-conforming; behavior is undefined (validators MAY flag this).

 ## 15. Conformance

@@ -270,37 +364,61 @@ Conformance classes:
    - Tabular detection (either uniformly tabular or not, given the input).
    - Quoting decisions for given values and active delimiter.

+- Decoder:
+  - MUST implement tokenization, escaping, and type interpretation per Sections 3A and 6.4.
+  - MUST parse array headers per Section 7 and apply the declared active delimiter to inline arrays and tabular rows.
+  - MUST implement structures and depth rules per Sections 9–12, including objects-as-list-items placement.
+  - In strict mode (default true), MUST enforce:
+    - Inline primitive array value count equals the declared length.
+    - Tabular row count equals the declared length.
+    - Tabular row value count equals the field count.
+    - Invalid escapes and unterminated strings error.
+    - Missing colon in key-value context errors.
+    - Delimiter mismatches (e.g., rows not split by the active delimiter) provoke errors via count checks.
+
 - Validator:
  - SHOULD verify structural conformance (headers, indentation, list markers).
  - SHOULD verify whitespace invariants.
  - SHOULD verify delimiter consistency between headers and rows.
-
- Parser/Decoder:
-  - Out of scope for v1; MAY be implemented. Implementers SHOULD follow the invariants in this spec for robust parsing (e.g., delimiter discovery from headers, length counts as consistency checks).
+  - SHOULD verify length counts vs. declared [N].

 Options:
- indent (default: 2 spaces)
- delimiter (default: comma; alternatives: tab, pipe)
- lengthMarker (default: disabled)
+- Encoder options:
+  - indent (default: 2 spaces)
+  - delimiter (default: comma; alternatives: tab, pipe)
+  - lengthMarker (default: disabled)
+- Decoder options:
+  - indent (default: 2 spaces)
+  - strict (default: true)

 ## 16. Error Handling and Diagnostics

- Inputs that cannot be represented in the data model (Section 2) are normalized (Section 3) before encoding (e.g., NaN → null).
- Tabular fallback:
+- Encoding normalization:
+  - Inputs that cannot be represented in the data model (Section 2) are normalized (Section 3) before encoding (e.g., NaN → null).
+- Tabular fallback (encoding):
  - If any tabular condition fails (Section 10.3), encoders MUST use expanded list format (Section 10.4).
+- Decoding errors (strict mode):
+  - Array length mismatch (inline arrays and list/tabular forms) MUST error.
+  - Tabular row value count mismatch vs. field count MUST error.
+  - Tabular row count mismatch vs. declared length MUST error.
+  - Invalid escape sequences or unterminated strings MUST error.
+  - Missing colon in key-value context MUST error.
+  - Delimiter mismatch (e.g., rows joined by a different delimiter) MUST error via count checks.
+  - Empty input is invalid and SHOULD error.
 - Validators SHOULD report:
-  - Trailing spaces, trailing newlines.
-  - Headers missing delimiters when non-comma is active.
+  - Trailing spaces, trailing newlines (encoder invariants).
+  - Headers missing delimiter marks when non-comma delimiter is in use.
  - Mismatched row counts vs. declared [N].
  - Values violating delimiter-aware quoting rules.

 ## 17. Security Considerations

 - Injection and ambiguity are mitigated by quoting rules:
-  - Strings with colon, active delimiter, leading hyphen, control characters, brackets/braces MUST be quoted.
+  - Strings with colon, the active delimiter, leading hyphen, control characters, brackets/braces MUST be quoted.
+- Decoders in strict mode reject malformed strings/escapes and structural inconsistencies (length/row counts), helping detect truncation or injected rows.
 - Encoders SHOULD avoid excessive memory use on large inputs; implement streaming/tabular row emission where feasible.
 - Unicode inputs:
-  - Encoders SHOULD avoid altering Unicode content beyond required escaping; validators SHOULD accept all valid Unicode in quoted strings and keys (with escapes as required).
+  - Encoders SHOULD avoid altering Unicode content beyond required escaping; decoders SHOULD accept all valid Unicode in quoted strings and keys (with escapes as required).

 ## 18. Internationalization

@@ -408,7 +526,7 @@ pairs[#2]:

 ## 22. Reference Algorithms (Informative)

-22.1 Tabular Detection
+22.1 Tabular Detection (Encoding)

 Given an array rows:
 - If rows is empty → not tabular (fall back to expanded format).
@@ -420,7 +538,7 @@ Given an array rows:
    - If row[key] is not a primitive → not tabular.
 - Otherwise tabular with header from the first row.

-22.2 Safe-Unquoted String Decision
+22.2 Safe-Unquoted String Decision (Encoding)

 Given a string s and active delimiter d:
 - If s is empty or s !== s.trim() → quote.
@@ -433,16 +551,73 @@ Given a string s and active delimiter d:
 - If s starts with "-" → quote.
 - Else unquoted.

-22.3 Header Formatting
+22.3 Header Formatting (Encoding)

 - Start with optional key (encoded as per key rules).
- Append “[<marker?>N<delim?>]”, where:
-  - <marker?> is “#” if enabled.
+- Append "[<marker?>N<delim?>]", where:
+  - <marker?> is "#" if enabled.
  - <delim?> is absent for comma, or is the delimiter symbol for tab/pipe.
- If tabular, append “{field1<delim>field2}” where field names are key-encoded and joined by the active delimiter.
- Append “:”.
+- If tabular, append "{field1<delim>field2}" where field names are key-encoded and joined by the active delimiter.
+- Append ":".
 - For non-empty primitive arrays on a single line, append a space and the joined values (each primitive-encoded with delimiter-aware quoting), joined by the active delimiter.

+22.4 Decoding Overview
+
+- Split input into lines; compute depth from leading spaces and indent size (default 2). Depth computation MAY be floor(indentSpaces / indentSize).
+- Decide root form:
+  - If first non-empty depth-0 line is a valid root array header: decode a root array.
+  - Else if exactly one line and it is not a key-value line: decode a single primitive.
+  - Else: decode an object.
+- For objects at depth d: process lines at depth d; for arrays at depth d: read rows/list items at depth d+1.
+
+22.5 Array Header Parsing (Decoding)
+
+- Locate the first "[ … ]" segment on the line; parse:
+  - Optional leading "#" marker (ignored semantically).
+  - Length N as decimal integer.
+  - Optional delimiter marker at the end: tab or pipe (comma otherwise).
+- If a "{ … }" fields segment occurs between the "]" and the ":", parse field names using the active delimiter; for each name, if quoted, unescape it (Section 6.1).
+- A colon MUST appear after the bracket/fields segment; otherwise error.
+- Return the header (key, length, delimiter, fields?, hasLengthMarker) and any inline values after the colon.
+
+22.6 parseDelimitedValues (Decoding)
+
+- Iterate characters left-to-right keeping:
+  - current token, inQuotes flag.
+- If encountering a double quote, toggle inQuotes.
+- While inQuotes, treat backslash + next char as a literal pair (to be validated later by the string parser).
+- Only split on the active delimiter when not in quotes.
+- Trim surrounding spaces around each token.
+
+22.7 Primitive Token Parsing (Decoding)
+
+- If token starts with a quote, it MUST be a properly quoted string (no trailing characters after the closing quote). Unescape it using only the five escapes; otherwise error.
+- Else if token is true/false/null → boolean/null.
+- Else if token is numeric without forbidden leading zeros and finite → number.
+- Else → string.
+- Empty tokens decode to empty string.
+
+22.8 Object and List Item Parsing (Decoding)
+
+- Key-value line: parse a (quoted or unquoted) key up to the first colon; missing colon → error. Rest of the line is the primitive value (if present).
+- Nested object: "key:" with nothing after colon opens a nested object. If this is:
+  - A field inside a regular object: nested fields at +1 depth relative to that line.
+  - The first field on a list-item hyphen line: nested fields at +2 depth relative to the hyphen line; subsequent sibling fields at +1 depth.
+- List items:
+  - Lines start with "- " at one deeper depth than the parent array header.
+  - After "- ":
+    - If "[ … ]:" appears → an inline array item; decode with its own header and active delimiter.
+    - Else if a colon appears → object with first field on hyphen line; parse first field and then subsequent fields as above.
+    - Else → primitive token.
+
+22.9 Strict Mode Count Checks (Decoding)
+
+- After decoding:
+  - Inline arrays: item count MUST equal N.
+  - List arrays: number of items MUST equal N.
+  - Tabular arrays: number of rows MUST equal N; each row’s value count MUST equal field count.
+- For tabular arrays, at row depth after N rows, if another same-depth line looks like a row (per disambiguation in 10.3), it MUST error in strict mode.
+
 ## 23. ABNF Sketch (Informative)

 This sketch omits full Unicode and escaping details; it illustrates structure only.
@@ -484,20 +659,22 @@ string         = quoted / safe-unquoted-string
 ```

 Notes:
- Safe-unquoted-string constraints are defined in Section 6.2.
- Actual tokenization relies on the declared header delimiter and quoting rules.
+- Safe-unquoted-string constraints are defined in Section 6.2 (encoding).
+- Quoted strings/keys accept only the five escapes in Section 6.1; others MUST error in decoding.
+- Row/key-value disambiguation at tabular row depth is defined in 10.3.

 ## 24. Test Suite and Compliance (Informative)

 - Implementations are encouraged to validate against a comprehensive test suite covering:
-  - Primitive encoding, quoting, control-character escaping.
-  - Object key encoding and order preservation.
+  - Primitive encoding/decoding, quoting, control-character escaping.
+  - Object key encoding/decoding and order preservation.
  - Primitive arrays (inline), empty arrays.
  - Arrays of arrays (expanded), mixed-length and empty inner arrays.
-  - Tabular detection and encoding, including delimiter variations.
+  - Tabular detection and formatting, including delimiter variations.
  - Mixed arrays and objects-as-list-items behavior, including nested arrays and objects.
  - Whitespace invariants (no trailing spaces/newline).
  - Normalization (BigInt, Date, undefined, NaN/Infinity, functions, symbols).
+  - Decoder strict-mode errors: count mismatches, invalid escapes, missing colon, delimiter mismatches.

 The provided reference tests in the repository mirror these conditions and SHOULD be used to ensure conformance.

@@ -512,21 +689,21 @@ The provided reference tests in the repository mirror these conditions and SHOUL

 - Backward-compatible evolutions SHOULD preserve current headers, quoting rules, and indentation semantics.
 - Reserved/structural characters (colon, brackets, braces, hyphen) MUST retain current meanings.
- Future work (non-normative): decoding/parsing spec, schemas, comments/annotations, additional delimiter profiles.
+- Future work (non-normative): schemas, comments/annotations, additional delimiter profiles.

 ## 27. Acknowledgments and License

- Credits: Author and contributors; ports in other languages (Elixir, PHP, Python, Ruby, Java, .NET, Swift).
+- Credits: Author and contributors; ports in other languages (Elixir, PHP, Python, Ruby, Java, .NET, Swift, Go).
 - License: MIT (see repository for details).

 ---

 Appendix: Cross-check With Reference Behavior (Informative)

- All normative behaviors specified herein are implemented and validated by the reference encoder and its test suite, including:
+- All normative behaviors specified herein are implemented and validated by the reference encoder and decoder test suites, including:
  - Safe-unquoted string rules and delimiter-aware quoting.
-  - Object and tabular header formation using the active delimiter (comma implicit; tab/pipe explicit).
-  - Length marker propagation to nested arrays.
-  - Tabular detection requiring uniform keys and primitive-only values.
-  - Objects-as-list-items formatting (first field on hyphen line, subsequent fields at +1 indent; nested object content at +2).
-  - Whitespace invariants and no trailing newline.
+  - Object and tabular header formation using the active delimiter (comma implicit; tab/pipe explicit), and delimiter-aware parsing.
+  - Length marker propagation (encoding) and acceptance (decoding).
+  - Tabular detection requiring uniform keys and primitive-only values (encoding).
+  - Objects-as-list-items formatting and decoding (first field on hyphen line, nested object content at +2; subsequent fields at +1).
+  - Whitespace invariants for encoding and depth-based parsing for decoding.