Mirror of https://github.com/voson-wang/toon.git (synced 2026-01-29 23:34:10 +08:00)
docs(benchmarks): improve clarity of efficiency ranking metrics
19
README.md
@@ -195,17 +195,20 @@ Benchmarks test LLM comprehension across different input formats using 209 data
 
 #### Efficiency Ranking (Accuracy per 1K Tokens)
 
-Each format's overall performance, balancing accuracy against token cost:
+Each format ranked by efficiency (accuracy percentage per 1,000 tokens):
 
 ```
-TOON ████████████████████ 26.9 │ 73.9% acc │ 2,744 tokens
-JSON compact █████████████████░░░ 22.9 │ 70.7% acc │ 3,081 tokens
-YAML ██████████████░░░░░░ 18.6 │ 69.0% acc │ 3,719 tokens
-JSON ███████████░░░░░░░░░ 15.3 │ 69.7% acc │ 4,545 tokens
-XML ██████████░░░░░░░░░░ 13.0 │ 67.1% acc │ 5,167 tokens
+TOON ████████████████████ 26.9 acc%/1K tok │ 73.9% acc │ 2,744 tokens
+JSON compact █████████████████░░░ 22.9 acc%/1K tok │ 70.7% acc │ 3,081 tokens
+YAML ██████████████░░░░░░ 18.6 acc%/1K tok │ 69.0% acc │ 3,719 tokens
+JSON ███████████░░░░░░░░░ 15.3 acc%/1K tok │ 69.7% acc │ 4,545 tokens
+XML ██████████░░░░░░░░░░ 13.0 acc%/1K tok │ 67.1% acc │ 5,167 tokens
 ```
 
-TOON achieves **73.9%** accuracy (vs JSON's 69.7%) while using **39.6% fewer tokens**.
+*Efficiency score = (Accuracy % ÷ Tokens) × 1,000. Higher is better.*
+
+> [!TIP]
+> TOON achieves **73.9%** accuracy (vs JSON's 69.7%) while using **39.6% fewer tokens**.
 
 **Note on CSV:** Excluded from ranking as it only supports 109 of 209 questions (flat tabular data only). While CSV is highly token-efficient for simple tabular data, it cannot represent nested structures that other formats handle.
 
@@ -247,7 +250,7 @@ grok-4-fast-non-reasoning
 CSV ██████████░░░░░░░░░░ 52.3% (57/109)
 ```
 
-> [!TIP] Results Summary
+> [!TIP]
 > TOON achieves **73.9% accuracy** (vs JSON's 69.7%) while using **39.6% fewer tokens** on these datasets.
 
 <details>
|
|||||||
@@ -33,17 +33,20 @@ Benchmarks test LLM comprehension across different input formats using 209 data
 
 #### Efficiency Ranking (Accuracy per 1K Tokens)
 
-Each format's overall performance, balancing accuracy against token cost:
+Each format ranked by efficiency (accuracy percentage per 1,000 tokens):
 
 ```
-TOON ████████████████████ 26.9 │ 73.9% acc │ 2,744 tokens
-JSON compact █████████████████░░░ 22.9 │ 70.7% acc │ 3,081 tokens
-YAML ██████████████░░░░░░ 18.6 │ 69.0% acc │ 3,719 tokens
-JSON ███████████░░░░░░░░░ 15.3 │ 69.7% acc │ 4,545 tokens
-XML ██████████░░░░░░░░░░ 13.0 │ 67.1% acc │ 5,167 tokens
+TOON ████████████████████ 26.9 acc%/1K tok │ 73.9% acc │ 2,744 tokens
+JSON compact █████████████████░░░ 22.9 acc%/1K tok │ 70.7% acc │ 3,081 tokens
+YAML ██████████████░░░░░░ 18.6 acc%/1K tok │ 69.0% acc │ 3,719 tokens
+JSON ███████████░░░░░░░░░ 15.3 acc%/1K tok │ 69.7% acc │ 4,545 tokens
+XML ██████████░░░░░░░░░░ 13.0 acc%/1K tok │ 67.1% acc │ 5,167 tokens
 ```
 
-TOON achieves **73.9%** accuracy (vs JSON's 69.7%) while using **39.6% fewer tokens**.
+*Efficiency score = (Accuracy % ÷ Tokens) × 1,000. Higher is better.*
+
+> [!TIP]
+> TOON achieves **73.9%** accuracy (vs JSON's 69.7%) while using **39.6% fewer tokens**.
 
 **Note on CSV:** Excluded from ranking as it only supports 109 of 209 questions (flat tabular data only). While CSV is highly token-efficient for simple tabular data, it cannot represent nested structures that other formats handle.
 
@@ -85,7 +88,7 @@ grok-4-fast-non-reasoning
 CSV ██████████░░░░░░░░░░ 52.3% (57/109)
 ```
 
-> [!TIP] Results Summary
+> [!TIP]
 > TOON achieves **73.9% accuracy** (vs JSON's 69.7%) while using **39.6% fewer tokens** on these datasets.
 
 <details>
|
|||||||
@@ -179,17 +179,22 @@ function generateEfficiencyRankingReport(
   if (csv) {
     // CSV totalCount is evaluations (questions × models), so divide by number of models to get question count
     const csvQuestionCount = csv.totalCount / modelCount
-    csvNote = `\n\n**Note on CSV:** Excluded from ranking as it only supports ${csvQuestionCount} of ${totalQuestions} questions (flat tabular data only). While CSV is highly token-efficient for simple tabular data, it cannot represent nested structures that other formats handle.`
+    csvNote = `**Note on CSV:** Excluded from ranking as it only supports ${csvQuestionCount} of ${totalQuestions} questions (flat tabular data only). While CSV is highly token-efficient for simple tabular data, it cannot represent nested structures that other formats handle.`
   }
 
   return `
-Each format's overall performance, balancing accuracy against token cost:
+Each format ranked by efficiency (accuracy percentage per 1,000 tokens):
 
 \`\`\`
 ${efficiencyChart}
 \`\`\`
 
-${summary}${csvNote}
+*Efficiency score = (Accuracy % ÷ Tokens) × 1,000. Higher is better.*
+
+> [!TIP]
+> ${summary}
+
+${csvNote}
 `.trim()
 }
 
@@ -396,7 +401,7 @@ function generateSummaryComparison(
     return ''
 
   return `
-> [!TIP] Results Summary
+> [!TIP]
 > TOON achieves **${(toon.accuracy * 100).toFixed(1)}% accuracy** (vs JSON's ${(json.accuracy * 100).toFixed(1)}%) while using **${((1 - toon.totalTokens / json.totalTokens) * 100).toFixed(1)}% fewer tokens** on these datasets.
 `.trim()
 }
@@ -566,7 +571,7 @@ function generateHorizontalEfficiencyChart(
       const accuracy = `${(r.accuracy * 100).toFixed(1)}%`.padStart(5)
       const tokens = r.tokens.toLocaleString('en-US').padStart(5)
 
-      return `${formatName} ${bar} ${efficiency} │ ${accuracy} acc │ ${tokens} tokens`
+      return `${formatName} ${bar} ${efficiency} acc%/1K tok │ ${accuracy} acc │ ${tokens} tokens`
     })
     .join('\n')
 }
|
|||||||
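For reference, the efficiency score that this commit labels as `acc%/1K tok` can be recomputed from the accuracy and token figures in the chart. The sketch below is not code from the commit: `FormatResult`, `efficiencyScore`, and `renderBar` are hypothetical names, and the 20-cell bar scaled against the best score is an assumption inferred from the rendered chart rather than from the script itself.

```typescript
// Hypothetical result shape for illustration; the benchmark script's real type is not shown in this diff.
interface FormatResult {
  name: string
  accuracy: number // fraction between 0 and 1, as in `r.accuracy * 100` above
  tokens: number
}

// Efficiency score = (Accuracy % ÷ Tokens) × 1,000
function efficiencyScore(r: FormatResult): number {
  return ((r.accuracy * 100) / r.tokens) * 1000
}

// Assumption: the bar is 20 cells wide and scaled relative to the best score.
function renderBar(score: number, maxScore: number, width: number = 20): string {
  const filled = Math.round((score / maxScore) * width)
  return '█'.repeat(filled) + '░'.repeat(width - filled)
}

// Accuracy and token figures copied from the chart in this commit.
const results: FormatResult[] = [
  { name: 'JSON', accuracy: 0.697, tokens: 4545 },
  { name: 'XML', accuracy: 0.671, tokens: 5167 },
  { name: 'TOON', accuracy: 0.739, tokens: 2744 },
  { name: 'YAML', accuracy: 0.69, tokens: 3719 },
  { name: 'JSON compact', accuracy: 0.707, tokens: 3081 },
]

// Rank by efficiency, highest first.
const ranked = [...results].sort((a, b) => efficiencyScore(b) - efficiencyScore(a))
const best = Math.max(...results.map(efficiencyScore))

const lines = ranked.map((r) => {
  const score = efficiencyScore(r)
  return `${r.name} ${renderBar(score, best)} ${score.toFixed(1)} acc%/1K tok`
})

console.log(lines.join('\n'))
```

Because the score divides by token count, a format with slightly lower accuracy can still outrank one that spends far more tokens, which is why YAML (69.0%) sits above JSON (69.7%) in the ranking.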
@@ -49,17 +49,20 @@ Benchmarks test LLM comprehension across different input formats using 209 data
 
 #### Efficiency Ranking (Accuracy per 1K Tokens)
 
-Each format's overall performance, balancing accuracy against token cost:
+Each format ranked by efficiency (accuracy percentage per 1,000 tokens):
 
 ```
-TOON ████████████████████ 26.9 │ 73.9% acc │ 2,744 tokens
-JSON compact █████████████████░░░ 22.9 │ 70.7% acc │ 3,081 tokens
-YAML ██████████████░░░░░░ 18.6 │ 69.0% acc │ 3,719 tokens
-JSON ███████████░░░░░░░░░ 15.3 │ 69.7% acc │ 4,545 tokens
-XML ██████████░░░░░░░░░░ 13.0 │ 67.1% acc │ 5,167 tokens
+TOON ████████████████████ 26.9 acc%/1K tok │ 73.9% acc │ 2,744 tokens
+JSON compact █████████████████░░░ 22.9 acc%/1K tok │ 70.7% acc │ 3,081 tokens
+YAML ██████████████░░░░░░ 18.6 acc%/1K tok │ 69.0% acc │ 3,719 tokens
+JSON ███████████░░░░░░░░░ 15.3 acc%/1K tok │ 69.7% acc │ 4,545 tokens
+XML ██████████░░░░░░░░░░ 13.0 acc%/1K tok │ 67.1% acc │ 5,167 tokens
 ```
 
-TOON achieves **73.9%** accuracy (vs JSON's 69.7%) while using **39.6% fewer tokens**.
+*Efficiency score = (Accuracy % ÷ Tokens) × 1,000. Higher is better.*
+
+> [!TIP]
+> TOON achieves **73.9%** accuracy (vs JSON's 69.7%) while using **39.6% fewer tokens**.
 
 **Note on CSV:** Excluded from ranking as it only supports 109 of 209 questions (flat tabular data only). While CSV is highly token-efficient for simple tabular data, it cannot represent nested structures that other formats handle.
 
@@ -101,7 +104,7 @@ grok-4-fast-non-reasoning
 CSV ██████████░░░░░░░░░░ 52.3% (57/109)
 ```
 
-> [!TIP] Results Summary
+> [!TIP]
 > TOON achieves **73.9% accuracy** (vs JSON's 69.7%) while using **39.6% fewer tokens** on these datasets.
 
 <details>