diff --git a/README.md b/README.md
index 81aa5b4..c3597c1 100644
--- a/README.md
+++ b/README.md
@@ -437,6 +437,7 @@ This benchmark tests **LLM comprehension and data retrieval accuracy** across di
 Eleven datasets designed to test different structural patterns and validation capabilities:
 
 **Primary datasets:**
+
 1. **Tabular** (100 employee records): Uniform objects with identical fields – optimal for TOON's tabular format.
 2. **Nested** (50 e-commerce orders): Complex structures with nested customer objects and item arrays.
 3. **Analytics** (60 days of metrics): Time-series data with dates and numeric values.
@@ -445,6 +446,7 @@ Eleven datasets designed to test different structural patterns and validation ca
 6. **Nested Config** (1 configuration): Deeply nested configuration with minimal tabular eligibility.
 
 **Structural validation datasets:**
+
 7. **Control**: Valid complete dataset (baseline for validation)
 8. **Truncated**: Array with 3 rows removed from end (tests `[N]` length detection)
 9. **Extra rows**: Array with 3 additional rows beyond declared length
diff --git a/benchmarks/results/retrieval-accuracy.md b/benchmarks/results/retrieval-accuracy.md
index 5a9d02d..868103b 100644
--- a/benchmarks/results/retrieval-accuracy.md
+++ b/benchmarks/results/retrieval-accuracy.md
@@ -278,6 +278,7 @@ This benchmark tests **LLM comprehension and data retrieval accuracy** across di
 Eleven datasets designed to test different structural patterns and validation capabilities:
 
 **Primary datasets:**
+
 1. **Tabular** (100 employee records): Uniform objects with identical fields – optimal for TOON's tabular format.
 2. **Nested** (50 e-commerce orders): Complex structures with nested customer objects and item arrays.
 3. **Analytics** (60 days of metrics): Time-series data with dates and numeric values.
@@ -286,6 +287,7 @@ Eleven datasets designed to test different structural patterns and validation ca
 6. **Nested Config** (1 configuration): Deeply nested configuration with minimal tabular eligibility.
 
 **Structural validation datasets:**
+
 7. **Control**: Valid complete dataset (baseline for validation)
 8. **Truncated**: Array with 3 rows removed from end (tests `[N]` length detection)
 9. **Extra rows**: Array with 3 additional rows beyond declared length
diff --git a/benchmarks/src/report.ts b/benchmarks/src/report.ts
index 0adafa6..4803115 100644
--- a/benchmarks/src/report.ts
+++ b/benchmarks/src/report.ts
@@ -284,6 +284,7 @@ This benchmark tests **LLM comprehension and data retrieval accuracy** across di
 Eleven datasets designed to test different structural patterns and validation capabilities:
 
 **Primary datasets:**
+
 1. **Tabular** (${tabularSize} employee records): Uniform objects with identical fields – optimal for TOON's tabular format.
 2. **Nested** (${nestedSize} e-commerce orders): Complex structures with nested customer objects and item arrays.
 3. **Analytics** (${analyticsSize} days of metrics): Time-series data with dates and numeric values.
@@ -292,6 +293,7 @@ Eleven datasets designed to test different structural patterns and validation ca
 6. **Nested Config** (${nestedConfigSize} configuration): Deeply nested configuration with minimal tabular eligibility.
 
 **Structural validation datasets:**
+
 7. **Control**: Valid complete dataset (baseline for validation)
 8. **Truncated**: Array with 3 rows removed from end (tests \`[N]\` length detection)
 9. **Extra rows**: Array with 3 additional rows beyond declared length
diff --git a/docs/guide/benchmarks.md b/docs/guide/benchmarks.md
index c6c6460..e0d2a37 100644
--- a/docs/guide/benchmarks.md
+++ b/docs/guide/benchmarks.md
@@ -294,6 +294,7 @@ This benchmark tests **LLM comprehension and data retrieval accuracy** across di
 Eleven datasets designed to test different structural patterns and validation capabilities:
 
 **Primary datasets:**
+
 1. **Tabular** (100 employee records): Uniform objects with identical fields – optimal for TOON's tabular format.
 2. **Nested** (50 e-commerce orders): Complex structures with nested customer objects and item arrays.
 3. **Analytics** (60 days of metrics): Time-series data with dates and numeric values.
@@ -302,6 +303,7 @@ Eleven datasets designed to test different structural patterns and validation ca
 6. **Nested Config** (1 configuration): Deeply nested configuration with minimal tabular eligibility.
 
 **Structural validation datasets:**
+
 7. **Control**: Valid complete dataset (baseline for validation)
 8. **Truncated**: Array with 3 rows removed from end (tests `[N]` length detection)
 9. **Extra rows**: Array with 3 additional rows beyond declared length