diff --git a/README.md b/README.md
index d48d69b..8c3da61 100644
--- a/README.md
+++ b/README.md
@@ -55,13 +55,23 @@ I built TOON to save tokens when sending large datasets to LLMs at work, where I
### Token Efficiency
```
-⭐ GitHub Repositories ██████████████░░░░░░░░░░░ 8,745 tokens (JSON: 15,145) 💰 42.3% saved
-📈 Daily Analytics ██████████░░░░░░░░░░░░░░░ 3,630 tokens (JSON: 9,023) 💰 59.8% saved
-👥 API Response ██████████████░░░░░░░░░░░ 2,597 tokens (JSON: 4,589) 💰 43.4% saved
-🛒 E-Commerce Order ████████████████░░░░░░░░░ 164 tokens (JSON: 256) 💰 35.9% saved
-```
+⭐ GitHub Repositories ██████████████░░░░░░░░░░░ 8,745 tokens
+ vs JSON: 15,145 💰 42.3% saved
+ vs XML: 17,095 💰 48.8% saved
-**Total:** 15,136 tokens (TOON) vs 29,013 tokens (JSON) → 47.8% savings
+📈 Daily Analytics ██████████░░░░░░░░░░░░░░░ 4,507 tokens
+ vs JSON: 10,977 💰 58.9% saved
+ vs XML: 13,128 💰 65.7% saved
+
+🛒 E-Commerce Order ████████████████░░░░░░░░░ 166 tokens
+ vs JSON: 257 💰 35.4% saved
+ vs XML: 271 💰 38.7% saved
+
+─────────────────────────────────────────────────────────────────────
+Total ████████████░░░░░░░░░░░░░ 13,418 tokens
+ vs JSON: 26,379 💰 49.1% saved
+ vs XML: 30,494 💰 56.0% saved
+```
View detailed examples
@@ -70,7 +80,7 @@ I built TOON to save tokens when sending large datasets to LLMs at work, where I
**Configuration:** Top 100 GitHub repositories with stars, forks, and metadata
-**Savings:** 6,400 tokens (42.3% reduction)
+**Savings:** 6,400 tokens (42.3% reduction vs JSON)
**JSON** (15,145 tokens):
@@ -81,7 +91,7 @@ I built TOON to save tokens when sending large datasets to LLMs at work, where I
"id": 28457823,
"name": "freeCodeCamp",
"repo": "freeCodeCamp/freeCodeCamp",
- "description": "freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,...",
+ "description": "freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…",
"createdAt": "2014-12-24T17:49:19Z",
"updatedAt": "2025-10-27T07:40:58Z",
"pushedAt": "2025-10-26T11:31:08Z",
@@ -124,7 +134,7 @@ I built TOON to save tokens when sending large datasets to LLMs at work, where I
```
repositories[3]{id,name,repo,description,createdAt,updatedAt,pushedAt,stars,watchers,forks,defaultBranch}:
- 28457823,freeCodeCamp,freeCodeCamp/freeCodeCamp,"freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,...","2014-12-24T17:49:19Z","2025-10-27T07:40:58Z","2025-10-26T11:31:08Z",430828,8582,42136,main
+ 28457823,freeCodeCamp,freeCodeCamp/freeCodeCamp,"freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…","2014-12-24T17:49:19Z","2025-10-27T07:40:58Z","2025-10-26T11:31:08Z",430828,8582,42136,main
132750724,build-your-own-x,codecrafters-io/build-your-own-x,Master programming by recreating your favorite technologies from scratch.,"2018-05-09T12:03:18Z","2025-10-27T07:43:25Z","2025-10-10T18:45:01Z",430102,6322,40388,master
21737465,awesome,sindresorhus/awesome,😎 Awesome lists about all kinds of interesting topics,"2014-07-11T13:42:37Z","2025-10-27T07:44:27Z","2025-10-23T17:26:53Z",409760,8016,32015,main
```
@@ -135,61 +145,66 @@ repositories[3]{id,name,repo,description,createdAt,updatedAt,pushedAt,stars,watc
**Configuration:** 180 days of web metrics (views, clicks, conversions, revenue)
-**Savings:** 5,393 tokens (59.8% reduction)
+**Savings:** 6,470 tokens (58.9% reduction vs JSON)
-**JSON** (9,023 tokens):
+**JSON** (10,977 tokens):
```json
{
"metrics": [
- {
- "date": "2024-12-31",
- "views": 1953,
- "clicks": 224,
- "conversions": 60,
- "revenue": 409.79
- },
{
"date": "2025-01-01",
- "views": 2981,
- "clicks": 242,
- "conversions": 109,
- "revenue": 467.73
+ "views": 6890,
+ "clicks": 401,
+ "conversions": 23,
+ "revenue": 6015.59,
+ "bounceRate": 0.63
},
{
"date": "2025-01-02",
- "views": 3842,
- "clicks": 100,
- "conversions": 15,
- "revenue": 569.44
+ "views": 6940,
+ "clicks": 323,
+ "conversions": 37,
+ "revenue": 9086.44,
+ "bounceRate": 0.36
},
{
"date": "2025-01-03",
- "views": 4083,
- "clicks": 161,
- "conversions": 73,
- "revenue": 444.75
+ "views": 4390,
+ "clicks": 346,
+ "conversions": 26,
+ "revenue": 6360.75,
+ "bounceRate": 0.48
},
{
"date": "2025-01-04",
- "views": 5382,
- "clicks": 257,
- "conversions": 63,
- "revenue": 457.28
+ "views": 3429,
+ "clicks": 231,
+ "conversions": 13,
+ "revenue": 2360.96,
+ "bounceRate": 0.65
+ },
+ {
+ "date": "2025-01-05",
+ "views": 5804,
+ "clicks": 186,
+ "conversions": 22,
+ "revenue": 2535.96,
+ "bounceRate": 0.37
}
]
}
```
-**TOON** (3,630 tokens):
+**TOON** (4,507 tokens):
```
-metrics[5]{date,views,clicks,conversions,revenue}:
- 2024-12-31,1953,224,60,409.79
- 2025-01-01,2981,242,109,467.73
- 2025-01-02,3842,100,15,569.44
- 2025-01-03,4083,161,73,444.75
- 2025-01-04,5382,257,63,457.28
+metrics[5]{date,views,clicks,conversions,revenue,bounceRate}:
+ 2025-01-01,6890,401,23,6015.59,0.63
+ 2025-01-02,6940,323,37,9086.44,0.36
+ 2025-01-03,4390,346,26,6360.75,0.48
+ 2025-01-04,3429,231,13,2360.96,0.65
+ 2025-01-05,5804,186,22,2535.96,0.37
```
@@ -203,25 +218,32 @@ metrics[5]{date,views,clicks,conversions,revenue}:
### Retrieval Accuracy
-Tested across **2 LLMs** with data retrieval tasks:
+Tested across **3 LLMs** with data retrieval tasks:
```
gpt-5-nano
- toon ███████████████████░ 97.5% (155/159)
- markdown-kv ███████████████████░ 95.6% (152/159)
- yaml ███████████████████░ 94.3% (150/159)
- json ███████████████████░ 93.7% (149/159)
- csv ███████████████████░ 93.7% (149/159)
+ toon ████████████████████ 99.4% (158/159)
+ yaml ███████████████████░ 95.0% (151/159)
+ csv ██████████████████░░ 92.5% (147/159)
+ json ██████████████████░░ 92.5% (147/159)
+ xml ██████████████████░░ 91.2% (145/159)
claude-haiku-4-5
- markdown-kv ███████████████░░░░░ 76.7% (122/159)
toon ███████████████░░░░░ 75.5% (120/159)
- json ███████████████░░░░░ 75.5% (120/159)
+ xml ███████████████░░░░░ 75.5% (120/159)
csv ███████████████░░░░░ 75.5% (120/159)
- yaml ███████████████░░░░░ 74.8% (119/159)
+ json ███████████████░░░░░ 75.5% (120/159)
+ yaml ███████████████░░░░░ 74.2% (118/159)
+
+gemini-2.5-flash
+ xml ██████████████████░░ 91.8% (146/159)
+ csv █████████████████░░░ 86.2% (137/159)
+ toon █████████████████░░░ 84.9% (135/159)
+ json ████████████████░░░░ 81.8% (130/159)
+ yaml ████████████████░░░░ 78.6% (125/159)
```
-**Advantage:** TOON achieves **86.5% accuracy** (vs JSON's 84.6%) while using **46.3% fewer tokens**.
+**Advantage:** TOON achieves **86.6% accuracy** (vs JSON's 83.2%) while using **46.3% fewer tokens**.
View detailed breakdown by dataset and model
@@ -232,41 +254,41 @@ claude-haiku-4-5
| Format | Accuracy | Tokens | Correct/Total |
| ------ | -------- | ------ | ------------- |
-| `toon` | 86.2% | 2.483 | 100/116 |
-| `csv` | 80.2% | 2.337 | 93/116 |
-| `yaml` | 82.8% | 4.969 | 96/116 |
-| `markdown-kv` | 84.5% | 6.270 | 98/116 |
-| `json` | 84.5% | 6.347 | 98/116 |
+| `toon` | 87.4% | 2,483 | 152/174 |
+| `csv` | 82.8% | 2,337 | 144/174 |
+| `yaml` | 83.9% | 4,969 | 146/174 |
+| `json` | 83.9% | 6,347 | 146/174 |
+| `xml` | 88.5% | 7,314 | 154/174 |
##### E-commerce orders with nested structures
| Format | Accuracy | Tokens | Correct/Total |
| ------ | -------- | ------ | ------------- |
-| `toon` | 90.9% | 5.967 | 80/88 |
-| `csv` | 90.9% | 6.735 | 80/88 |
-| `yaml` | 89.8% | 7.328 | 79/88 |
-| `markdown-kv` | 90.9% | 9.110 | 80/88 |
-| `json` | 89.8% | 9.694 | 79/88 |
+| `toon` | 90.9% | 5,967 | 120/132 |
+| `csv` | 93.9% | 6,735 | 124/132 |
+| `yaml` | 87.1% | 7,328 | 115/132 |
+| `json` | 87.9% | 9,694 | 116/132 |
+| `xml` | 93.2% | 10,992 | 123/132 |
##### Time-series analytics data
| Format | Accuracy | Tokens | Correct/Total |
| ------ | -------- | ------ | ------------- |
-| `csv` | 87.9% | 1.393 | 51/58 |
-| `toon` | 86.2% | 1.515 | 50/58 |
-| `yaml` | 86.2% | 2.938 | 50/58 |
-| `json` | 87.9% | 3.665 | 51/58 |
-| `markdown-kv` | 86.2% | 3.779 | 50/58 |
+| `csv` | 89.7% | 1,393 | 78/87 |
+| `toon` | 88.5% | 1,515 | 77/87 |
+| `yaml` | 83.9% | 2,938 | 73/87 |
+| `json` | 88.5% | 3,665 | 77/87 |
+| `xml` | 85.1% | 4,376 | 74/87 |
##### Top 100 GitHub repositories
| Format | Accuracy | Tokens | Correct/Total |
| ------ | -------- | ------ | ------------- |
-| `csv` | 80.4% | 8.513 | 45/56 |
-| `toon` | 80.4% | 8.745 | 45/56 |
-| `yaml` | 78.6% | 13.129 | 44/56 |
-| `markdown-kv` | 82.1% | 15.436 | 46/56 |
-| `json` | 73.2% | 15.145 | 41/56 |
+| `toon` | 76.2% | 8,745 | 64/84 |
+| `csv` | 69.0% | 8,513 | 58/84 |
+| `yaml` | 71.4% | 13,129 | 60/84 |
+| `json` | 69.0% | 15,145 | 58/84 |
+| `xml` | 71.4% | 17,095 | 60/84 |
#### Performance by Model
@@ -274,28 +296,38 @@ claude-haiku-4-5
| Format | Accuracy | Correct/Total |
| ------ | -------- | ------------- |
-| `toon` | 97.5% | 155/159 |
-| `markdown-kv` | 95.6% | 152/159 |
-| `yaml` | 94.3% | 150/159 |
-| `json` | 93.7% | 149/159 |
-| `csv` | 93.7% | 149/159 |
+| `toon` | 99.4% | 158/159 |
+| `yaml` | 95.0% | 151/159 |
+| `csv` | 92.5% | 147/159 |
+| `json` | 92.5% | 147/159 |
+| `xml` | 91.2% | 145/159 |
##### claude-haiku-4-5
| Format | Accuracy | Correct/Total |
| ------ | -------- | ------------- |
-| `markdown-kv` | 76.7% | 122/159 |
| `toon` | 75.5% | 120/159 |
-| `json` | 75.5% | 120/159 |
+| `xml` | 75.5% | 120/159 |
| `csv` | 75.5% | 120/159 |
-| `yaml` | 74.8% | 119/159 |
+| `json` | 75.5% | 120/159 |
+| `yaml` | 74.2% | 118/159 |
+
+##### gemini-2.5-flash
+
+| Format | Accuracy | Correct/Total |
+| ------ | -------- | ------------- |
+| `xml` | 91.8% | 146/159 |
+| `csv` | 86.2% | 137/159 |
+| `toon` | 84.9% | 135/159 |
+| `json` | 81.8% | 130/159 |
+| `yaml` | 78.6% | 125/159 |
#### Methodology
- **Semantic validation**: LLM-as-judge validates responses semantically (not exact string matching).
- **Token counting**: Using `gpt-tokenizer` with `o200k_base` encoding.
-- **Question types**: Field retrieval, aggregation, and filtering tasks.
-- **Real data**: Faker.js-generated datasets + GitHub repositories.
+- **Question types**: 159 questions across field retrieval, aggregation, and filtering tasks.
+- **Datasets**: Faker.js-generated datasets (seeded) + GitHub repositories.
diff --git a/benchmarks/results/accuracy/raw-results.json b/benchmarks/results/accuracy/raw-results.json
index f52e84f..45e5806 100644
--- a/benchmarks/results/accuracy/raw-results.json
+++ b/benchmarks/results/accuracy/raw-results.json
@@ -7,8 +7,8 @@
"actual": "56176",
"isCorrect": true,
"inputTokens": 6390,
- "outputTokens": 72,
- "latencyMs": 2221.390167
+ "outputTokens": 136,
+ "latencyMs": 1973.9505419999998
},
{
"questionId": "q1",
@@ -19,7 +19,18 @@
"isCorrect": true,
"inputTokens": 7870,
"outputTokens": 6,
- "latencyMs": 1276.715333
+ "latencyMs": 1337.454
+ },
+ {
+ "questionId": "q1",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "56176",
+ "actual": "56176",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 5,
+ "latencyMs": 2219.8078330000003
},
{
"questionId": "q1",
@@ -30,7 +41,7 @@
"isCorrect": true,
"inputTokens": 2527,
"outputTokens": 72,
- "latencyMs": 3718.250833
+ "latencyMs": 2159.820958
},
{
"questionId": "q1",
@@ -41,7 +52,18 @@
"isCorrect": true,
"inputTokens": 2982,
"outputTokens": 6,
- "latencyMs": 1215.944708
+ "latencyMs": 1456.8202079999999
+ },
+ {
+ "questionId": "q1",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "56176",
+ "actual": "56176",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 5,
+ "latencyMs": 2502.1313750000004
},
{
"questionId": "q1",
@@ -52,7 +74,7 @@
"isCorrect": true,
"inputTokens": 2381,
"outputTokens": 72,
- "latencyMs": 2417.306625
+ "latencyMs": 2189.1171249999998
},
{
"questionId": "q1",
@@ -63,29 +85,51 @@
"isCorrect": true,
"inputTokens": 2856,
"outputTokens": 6,
- "latencyMs": 1152.5258749999998
+ "latencyMs": 1251.8321250000001
},
{
"questionId": "q1",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "56176",
+ "actual": "56176",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 5,
+ "latencyMs": 2795.7488749999998
+ },
+ {
+ "questionId": "q1",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "56176",
"actual": "56176",
"isCorrect": true,
- "inputTokens": 6316,
- "outputTokens": 72,
- "latencyMs": 4603.444417
+ "inputTokens": 7357,
+ "outputTokens": 136,
+ "latencyMs": 13798.979167
},
{
"questionId": "q1",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "56176",
"actual": "56176",
"isCorrect": true,
- "inputTokens": 6365,
+ "inputTokens": 9360,
"outputTokens": 6,
- "latencyMs": 1390.011125
+ "latencyMs": 1484.293458
+ },
+ {
+ "questionId": "q1",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "56176",
+ "actual": "56176",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 5,
+ "latencyMs": 2323.462083
},
{
"questionId": "q1",
@@ -96,7 +140,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 8,
- "latencyMs": 4339.294459
+ "latencyMs": 2319.068875
},
{
"questionId": "q1",
@@ -107,7 +151,18 @@
"isCorrect": true,
"inputTokens": 5760,
"outputTokens": 6,
- "latencyMs": 1374.47325
+ "latencyMs": 1252.173292
+ },
+ {
+ "questionId": "q1",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "56176",
+ "actual": "56176",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 5,
+ "latencyMs": 1856.926
},
{
"questionId": "q2",
@@ -117,8 +172,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 6390,
- "outputTokens": 135,
- "latencyMs": 2550.589042
+ "outputTokens": 71,
+ "latencyMs": 2500.574542
},
{
"questionId": "q2",
@@ -129,7 +184,18 @@
"isCorrect": true,
"inputTokens": 7869,
"outputTokens": 4,
- "latencyMs": 1139.559917
+ "latencyMs": 1249.101917
+ },
+ {
+ "questionId": "q2",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 1,
+ "latencyMs": 1744.0090420000001
},
{
"questionId": "q2",
@@ -139,8 +205,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2527,
- "outputTokens": 135,
- "latencyMs": 2422.8178749999997
+ "outputTokens": 71,
+ "latencyMs": 2319.50975
},
{
"questionId": "q2",
@@ -151,7 +217,18 @@
"isCorrect": true,
"inputTokens": 2981,
"outputTokens": 4,
- "latencyMs": 1135.579459
+ "latencyMs": 1258.086833
+ },
+ {
+ "questionId": "q2",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 1,
+ "latencyMs": 1847.8221249999997
},
{
"questionId": "q2",
@@ -162,7 +239,7 @@
"isCorrect": true,
"inputTokens": 2381,
"outputTokens": 71,
- "latencyMs": 4198.553583999999
+ "latencyMs": 4817.745874999999
},
{
"questionId": "q2",
@@ -173,29 +250,51 @@
"isCorrect": true,
"inputTokens": 2855,
"outputTokens": 4,
- "latencyMs": 1147.9685829999999
+ "latencyMs": 1024.5234999999998
},
{
"questionId": "q2",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 1,
+ "latencyMs": 1336.0151660000001
+ },
+ {
+ "questionId": "q2",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6316,
- "outputTokens": 71,
- "latencyMs": 2594.702667
+ "inputTokens": 7357,
+ "outputTokens": 135,
+ "latencyMs": 4109.140791
},
{
"questionId": "q2",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6364,
+ "inputTokens": 9359,
"outputTokens": 4,
- "latencyMs": 1568.4054999999998
+ "latencyMs": 1267.7541249999995
+ },
+ {
+ "questionId": "q2",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 1,
+ "latencyMs": 1808.7597920000007
},
{
"questionId": "q2",
@@ -206,7 +305,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 71,
- "latencyMs": 2516.345875
+ "latencyMs": 4865.839082999999
},
{
"questionId": "q2",
@@ -217,7 +316,18 @@
"isCorrect": true,
"inputTokens": 5759,
"outputTokens": 4,
- "latencyMs": 1633.5375000000001
+ "latencyMs": 1018.2179999999998
+ },
+ {
+ "questionId": "q2",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 1,
+ "latencyMs": 2534.4780839999994
},
{
"questionId": "q3",
@@ -227,8 +337,8 @@
"actual": "lorenza.kunze@yahoo.com",
"isCorrect": true,
"inputTokens": 6392,
- "outputTokens": 76,
- "latencyMs": 2079.8442499999996
+ "outputTokens": 204,
+ "latencyMs": 3778.0985
},
{
"questionId": "q3",
@@ -239,7 +349,18 @@
"isCorrect": true,
"inputTokens": 7874,
"outputTokens": 12,
- "latencyMs": 1201.556458
+ "latencyMs": 1190.655541
+ },
+ {
+ "questionId": "q3",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "lorenza.kunze@yahoo.com",
+ "actual": "lorenza.kunze@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 7911,
+ "outputTokens": 10,
+ "latencyMs": 1595.469916
},
{
"questionId": "q3",
@@ -249,8 +370,8 @@
"actual": "lorenza.kunze@yahoo.com",
"isCorrect": true,
"inputTokens": 2529,
- "outputTokens": 140,
- "latencyMs": 2356.408
+ "outputTokens": 76,
+ "latencyMs": 4163.945208000001
},
{
"questionId": "q3",
@@ -261,7 +382,18 @@
"isCorrect": true,
"inputTokens": 2986,
"outputTokens": 12,
- "latencyMs": 1113.255166
+ "latencyMs": 892.92875
+ },
+ {
+ "questionId": "q3",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "lorenza.kunze@yahoo.com",
+ "actual": "lorenza.kunze@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3320,
+ "outputTokens": 10,
+ "latencyMs": 1780.4322919999995
},
{
"questionId": "q3",
@@ -271,8 +403,8 @@
"actual": "lorenza.kunze@yahoo.com",
"isCorrect": true,
"inputTokens": 2383,
- "outputTokens": 140,
- "latencyMs": 2188.5425419999997
+ "outputTokens": 76,
+ "latencyMs": 3440.4715000000006
},
{
"questionId": "q3",
@@ -283,29 +415,51 @@
"isCorrect": true,
"inputTokens": 2860,
"outputTokens": 12,
- "latencyMs": 1029.9496669999999
+ "latencyMs": 1312.3002079999997
},
{
"questionId": "q3",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "lorenza.kunze@yahoo.com",
+ "actual": "lorenza.kunze@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3194,
+ "outputTokens": 10,
+ "latencyMs": 1560.3538330000001
+ },
+ {
+ "questionId": "q3",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "lorenza.kunze@yahoo.com",
"actual": "lorenza.kunze@yahoo.com",
"isCorrect": true,
- "inputTokens": 6318,
- "outputTokens": 140,
- "latencyMs": 2605.8857080000002
+ "inputTokens": 7359,
+ "outputTokens": 76,
+ "latencyMs": 3440.5599999999995
},
{
"questionId": "q3",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "lorenza.kunze@yahoo.com",
"actual": "lorenza.kunze@yahoo.com",
"isCorrect": true,
- "inputTokens": 6369,
+ "inputTokens": 9364,
"outputTokens": 12,
- "latencyMs": 1273.5997920000004
+ "latencyMs": 1354.2122089999993
+ },
+ {
+ "questionId": "q3",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "lorenza.kunze@yahoo.com",
+ "actual": "lorenza.kunze@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 9100,
+ "outputTokens": 10,
+ "latencyMs": 1389.2405829999998
},
{
"questionId": "q3",
@@ -315,8 +469,8 @@
"actual": "lorenza.kunze@yahoo.com",
"isCorrect": true,
"inputTokens": 5014,
- "outputTokens": 140,
- "latencyMs": 2530.4294580000005
+ "outputTokens": 76,
+ "latencyMs": 2048.7699159999993
},
{
"questionId": "q3",
@@ -327,7 +481,18 @@
"isCorrect": true,
"inputTokens": 5764,
"outputTokens": 12,
- "latencyMs": 1404.4837089999996
+ "latencyMs": 1123.4172500000004
+ },
+ {
+ "questionId": "q3",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "lorenza.kunze@yahoo.com",
+ "actual": "lorenza.kunze@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 5746,
+ "outputTokens": 10,
+ "latencyMs": 1638.1436670000003
},
{
"questionId": "q4",
@@ -338,7 +503,7 @@
"isCorrect": true,
"inputTokens": 6390,
"outputTokens": 72,
- "latencyMs": 2302.062125
+ "latencyMs": 2966.8363329999993
},
{
"questionId": "q4",
@@ -349,7 +514,18 @@
"isCorrect": true,
"inputTokens": 7870,
"outputTokens": 6,
- "latencyMs": 1114.0778329999998
+ "latencyMs": 1323.5372910000006
+ },
+ {
+ "questionId": "q4",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "117381",
+ "actual": "117381",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 6,
+ "latencyMs": 1860.8958750000002
},
{
"questionId": "q4",
@@ -359,8 +535,8 @@
"actual": "117381",
"isCorrect": true,
"inputTokens": 2527,
- "outputTokens": 72,
- "latencyMs": 2006.7020830000001
+ "outputTokens": 136,
+ "latencyMs": 6895.250208000001
},
{
"questionId": "q4",
@@ -371,7 +547,18 @@
"isCorrect": true,
"inputTokens": 2982,
"outputTokens": 6,
- "latencyMs": 1641.5518749999997
+ "latencyMs": 1020.296542
+ },
+ {
+ "questionId": "q4",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "117381",
+ "actual": "117381",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 6,
+ "latencyMs": 2481.260875
},
{
"questionId": "q4",
@@ -381,8 +568,8 @@
"actual": "117381",
"isCorrect": true,
"inputTokens": 2381,
- "outputTokens": 136,
- "latencyMs": 2850.351709
+ "outputTokens": 200,
+ "latencyMs": 2689.2119999999995
},
{
"questionId": "q4",
@@ -393,29 +580,51 @@
"isCorrect": true,
"inputTokens": 2856,
"outputTokens": 6,
- "latencyMs": 1367.7319589999997
+ "latencyMs": 1194.3670409999995
},
{
"questionId": "q4",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "117381",
+ "actual": "117381",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 6,
+ "latencyMs": 1743.3429579999993
+ },
+ {
+ "questionId": "q4",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "117381",
"actual": "117381",
"isCorrect": true,
- "inputTokens": 6316,
+ "inputTokens": 7357,
"outputTokens": 72,
- "latencyMs": 2477.8365839999997
+ "latencyMs": 5788.955082999999
},
{
"questionId": "q4",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "117381",
"actual": "117381",
"isCorrect": true,
- "inputTokens": 6365,
+ "inputTokens": 9360,
"outputTokens": 6,
- "latencyMs": 1309.567083
+ "latencyMs": 1222.5617920000004
+ },
+ {
+ "questionId": "q4",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "117381",
+ "actual": "117381",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 6,
+ "latencyMs": 1692.9171670000014
},
{
"questionId": "q4",
@@ -426,7 +635,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 72,
- "latencyMs": 1794.2651250000008
+ "latencyMs": 6426.231709
},
{
"questionId": "q4",
@@ -437,7 +646,18 @@
"isCorrect": true,
"inputTokens": 5760,
"outputTokens": 6,
- "latencyMs": 1177.5377079999998
+ "latencyMs": 1159.4893339999999
+ },
+ {
+ "questionId": "q4",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "117381",
+ "actual": "117381",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 6,
+ "latencyMs": 2415.9878329999992
},
{
"questionId": "q5",
@@ -448,7 +668,7 @@
"isCorrect": true,
"inputTokens": 6389,
"outputTokens": 71,
- "latencyMs": 1963.9477500000003
+ "latencyMs": 2950.774625
},
{
"questionId": "q5",
@@ -459,7 +679,18 @@
"isCorrect": true,
"inputTokens": 7868,
"outputTokens": 4,
- "latencyMs": 1024.5166669999999
+ "latencyMs": 1003.6548750000002
+ },
+ {
+ "questionId": "q5",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7907,
+ "outputTokens": 1,
+ "latencyMs": 1209.7468329999992
},
{
"questionId": "q5",
@@ -469,8 +700,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2526,
- "outputTokens": 135,
- "latencyMs": 2291.4288749999996
+ "outputTokens": 71,
+ "latencyMs": 3026.993291999999
},
{
"questionId": "q5",
@@ -481,7 +712,18 @@
"isCorrect": true,
"inputTokens": 2980,
"outputTokens": 4,
- "latencyMs": 1312.7111250000007
+ "latencyMs": 981.8320000000003
+ },
+ {
+ "questionId": "q5",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3316,
+ "outputTokens": 1,
+ "latencyMs": 2011.3852089999982
},
{
"questionId": "q5",
@@ -492,7 +734,7 @@
"isCorrect": true,
"inputTokens": 2380,
"outputTokens": 135,
- "latencyMs": 1727.6371660000004
+ "latencyMs": 4215.294709
},
{
"questionId": "q5",
@@ -503,29 +745,51 @@
"isCorrect": true,
"inputTokens": 2854,
"outputTokens": 4,
- "latencyMs": 1097.0443749999995
+ "latencyMs": 906.2993340000012
},
{
"questionId": "q5",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3190,
+ "outputTokens": 1,
+ "latencyMs": 1666.1483749999989
+ },
+ {
+ "questionId": "q5",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6315,
+ "inputTokens": 7356,
"outputTokens": 135,
- "latencyMs": 2671.2276250000004
+ "latencyMs": 4311.166333000001
},
{
"questionId": "q5",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6363,
+ "inputTokens": 9358,
"outputTokens": 4,
- "latencyMs": 1174.8639999999996
+ "latencyMs": 1072.923917
+ },
+ {
+ "questionId": "q5",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9096,
+ "outputTokens": 1,
+ "latencyMs": 2526.938041999998
},
{
"questionId": "q5",
@@ -535,8 +799,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 5011,
- "outputTokens": 71,
- "latencyMs": 2306.2642499999993
+ "outputTokens": 135,
+ "latencyMs": 3970.2666659999995
},
{
"questionId": "q5",
@@ -547,7 +811,18 @@
"isCorrect": true,
"inputTokens": 5758,
"outputTokens": 4,
- "latencyMs": 2822.8963750000003
+ "latencyMs": 1364.8737079999992
+ },
+ {
+ "questionId": "q5",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5742,
+ "outputTokens": 1,
+ "latencyMs": 3125.6591660000013
},
{
"questionId": "q6",
@@ -558,7 +833,7 @@
"isCorrect": true,
"inputTokens": 6390,
"outputTokens": 139,
- "latencyMs": 2827.0400409999993
+ "latencyMs": 3116.8453340000015
},
{
"questionId": "q6",
@@ -569,7 +844,18 @@
"isCorrect": true,
"inputTokens": 7871,
"outputTokens": 11,
- "latencyMs": 1151.7215829999996
+ "latencyMs": 1065.8984999999993
+ },
+ {
+ "questionId": "q6",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "jayda60@hotmail.com",
+ "actual": "jayda60@hotmail.com",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 8,
+ "latencyMs": 2190.0096250000024
},
{
"questionId": "q6",
@@ -580,7 +866,7 @@
"isCorrect": true,
"inputTokens": 2527,
"outputTokens": 75,
- "latencyMs": 1714.2902919999997
+ "latencyMs": 2661.1630829999995
},
{
"questionId": "q6",
@@ -591,7 +877,18 @@
"isCorrect": true,
"inputTokens": 2983,
"outputTokens": 11,
- "latencyMs": 1810.6344170000011
+ "latencyMs": 990.5193749999999
+ },
+ {
+ "questionId": "q6",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "jayda60@hotmail.com",
+ "actual": "jayda60@hotmail.com",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 8,
+ "latencyMs": 1937.4020420000015
},
{
"questionId": "q6",
@@ -601,8 +898,8 @@
"actual": "jayda60@hotmail.com",
"isCorrect": true,
"inputTokens": 2381,
- "outputTokens": 75,
- "latencyMs": 2548.0390000000007
+ "outputTokens": 139,
+ "latencyMs": 3740.6538750000036
},
{
"questionId": "q6",
@@ -613,29 +910,51 @@
"isCorrect": true,
"inputTokens": 2857,
"outputTokens": 11,
- "latencyMs": 1046.7650829999993
+ "latencyMs": 1033.1626250000008
},
{
"questionId": "q6",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "jayda60@hotmail.com",
+ "actual": "jayda60@hotmail.com",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 8,
+ "latencyMs": 1733.0828340000007
+ },
+ {
+ "questionId": "q6",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "jayda60@hotmail.com",
"actual": "jayda60@hotmail.com",
"isCorrect": true,
- "inputTokens": 6316,
+ "inputTokens": 7357,
"outputTokens": 139,
- "latencyMs": 2408.879916000001
+ "latencyMs": 3042.367707999998
},
{
"questionId": "q6",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "jayda60@hotmail.com",
"actual": "jayda60@hotmail.com",
"isCorrect": true,
- "inputTokens": 6366,
+ "inputTokens": 9361,
"outputTokens": 11,
- "latencyMs": 1186.5773750000008
+ "latencyMs": 1472.3534580000014
+ },
+ {
+ "questionId": "q6",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "jayda60@hotmail.com",
+ "actual": "jayda60@hotmail.com",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 8,
+ "latencyMs": 1953.7035419999993
},
{
"questionId": "q6",
@@ -645,8 +964,8 @@
"actual": "jayda60@hotmail.com",
"isCorrect": true,
"inputTokens": 5012,
- "outputTokens": 139,
- "latencyMs": 3157.9398329999995
+ "outputTokens": 75,
+ "latencyMs": 2179.8505829999995
},
{
"questionId": "q6",
@@ -657,7 +976,18 @@
"isCorrect": true,
"inputTokens": 5761,
"outputTokens": 11,
- "latencyMs": 1129.6754170000004
+ "latencyMs": 1714.971625000002
+ },
+ {
+ "questionId": "q6",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "jayda60@hotmail.com",
+ "actual": "jayda60@hotmail.com",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 8,
+ "latencyMs": 2170.373334
},
{
"questionId": "q7",
@@ -668,7 +998,7 @@
"isCorrect": true,
"inputTokens": 6390,
"outputTokens": 72,
- "latencyMs": 2893.3476250000003
+ "latencyMs": 3005.6769590000004
},
{
"questionId": "q7",
@@ -679,7 +1009,18 @@
"isCorrect": true,
"inputTokens": 7870,
"outputTokens": 6,
- "latencyMs": 1288.7682919999988
+ "latencyMs": 2070.191666999999
+ },
+ {
+ "questionId": "q7",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "92971",
+ "actual": "92971",
+ "isCorrect": true,
+ "inputTokens": 7907,
+ "outputTokens": 5,
+ "latencyMs": 1338.8482500000027
},
{
"questionId": "q7",
@@ -689,8 +1030,8 @@
"actual": "92971",
"isCorrect": true,
"inputTokens": 2527,
- "outputTokens": 72,
- "latencyMs": 2324.6738330000007
+ "outputTokens": 136,
+ "latencyMs": 2615.7999579999996
},
{
"questionId": "q7",
@@ -701,7 +1042,18 @@
"isCorrect": true,
"inputTokens": 2982,
"outputTokens": 6,
- "latencyMs": 1095.704291
+ "latencyMs": 1124.058917000002
+ },
+ {
+ "questionId": "q7",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "92971",
+ "actual": "92971",
+ "isCorrect": true,
+ "inputTokens": 3316,
+ "outputTokens": 5,
+ "latencyMs": 2317.5837079999983
},
{
"questionId": "q7",
@@ -711,8 +1063,8 @@
"actual": "92971",
"isCorrect": true,
"inputTokens": 2381,
- "outputTokens": 136,
- "latencyMs": 3980.3727500000005
+ "outputTokens": 72,
+ "latencyMs": 9505.310291999998
},
{
"questionId": "q7",
@@ -723,29 +1075,51 @@
"isCorrect": true,
"inputTokens": 2856,
"outputTokens": 6,
- "latencyMs": 1122.8730419999993
+ "latencyMs": 895.9319159999977
},
{
"questionId": "q7",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "92971",
+ "actual": "92971",
+ "isCorrect": true,
+ "inputTokens": 3190,
+ "outputTokens": 5,
+ "latencyMs": 1462.6939160000002
+ },
+ {
+ "questionId": "q7",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "92971",
"actual": "92971",
"isCorrect": true,
- "inputTokens": 6316,
- "outputTokens": 72,
- "latencyMs": 2030.0818330000002
+ "inputTokens": 7357,
+ "outputTokens": 136,
+ "latencyMs": 2529.6767499999987
},
{
"questionId": "q7",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "92971",
"actual": "92971",
"isCorrect": true,
- "inputTokens": 6365,
+ "inputTokens": 9360,
"outputTokens": 6,
- "latencyMs": 1705.6364999999987
+ "latencyMs": 1144.4980419999993
+ },
+ {
+ "questionId": "q7",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "92971",
+ "actual": "92971",
+ "isCorrect": true,
+ "inputTokens": 9096,
+ "outputTokens": 5,
+ "latencyMs": 3182.1694160000006
},
{
"questionId": "q7",
@@ -756,7 +1130,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 72,
- "latencyMs": 1611.3567500000008
+ "latencyMs": 2789.477584
},
{
"questionId": "q7",
@@ -767,7 +1141,18 @@
"isCorrect": true,
"inputTokens": 5760,
"outputTokens": 6,
- "latencyMs": 1109.0094590000008
+ "latencyMs": 1023.4829170000012
+ },
+ {
+ "questionId": "q7",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "92971",
+ "actual": "92971",
+ "isCorrect": true,
+ "inputTokens": 5742,
+ "outputTokens": 5,
+ "latencyMs": 3741.309666000001
},
{
"questionId": "q8",
@@ -778,7 +1163,7 @@
"isCorrect": true,
"inputTokens": 6390,
"outputTokens": 199,
- "latencyMs": 3099.078125
+ "latencyMs": 2646.0443330000016
},
{
"questionId": "q8",
@@ -789,7 +1174,18 @@
"isCorrect": true,
"inputTokens": 7871,
"outputTokens": 4,
- "latencyMs": 1115.9911250000005
+ "latencyMs": 1147.7947499999973
+ },
+ {
+ "questionId": "q8",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 1,
+ "latencyMs": 2658.0985
},
{
"questionId": "q8",
@@ -799,8 +1195,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2527,
- "outputTokens": 135,
- "latencyMs": 2833.193875000001
+ "outputTokens": 71,
+ "latencyMs": 3748.428749999999
},
{
"questionId": "q8",
@@ -811,7 +1207,18 @@
"isCorrect": true,
"inputTokens": 2983,
"outputTokens": 4,
- "latencyMs": 933.1444169999995
+ "latencyMs": 876.6897919999974
+ },
+ {
+ "questionId": "q8",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 1,
+ "latencyMs": 3812.920249999999
},
{
"questionId": "q8",
@@ -821,8 +1228,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2381,
- "outputTokens": 199,
- "latencyMs": 2315.536
+ "outputTokens": 71,
+ "latencyMs": 6820.9698750000025
},
{
"questionId": "q8",
@@ -833,29 +1240,51 @@
"isCorrect": true,
"inputTokens": 2857,
"outputTokens": 4,
- "latencyMs": 1300.336792
+ "latencyMs": 997.5997500000012
},
{
"questionId": "q8",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 1,
+ "latencyMs": 1829.7533750000002
+ },
+ {
+ "questionId": "q8",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6316,
+ "inputTokens": 7357,
"outputTokens": 135,
- "latencyMs": 7016.997917000002
+ "latencyMs": 6256.235125000003
},
{
"questionId": "q8",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6366,
+ "inputTokens": 9361,
"outputTokens": 4,
- "latencyMs": 1288.107333
+ "latencyMs": 1280.0348330000015
+ },
+ {
+ "questionId": "q8",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 1,
+ "latencyMs": 3024.0259170000027
},
{
"questionId": "q8",
@@ -865,8 +1294,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 5012,
- "outputTokens": 135,
- "latencyMs": 2474.8247499999998
+ "outputTokens": 71,
+ "latencyMs": 3522.8339579999993
},
{
"questionId": "q8",
@@ -877,7 +1306,18 @@
"isCorrect": true,
"inputTokens": 5761,
"outputTokens": 4,
- "latencyMs": 1027.9775420000005
+ "latencyMs": 1134.9532080000026
+ },
+ {
+ "questionId": "q8",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 1,
+ "latencyMs": 3095.1540000000023
},
{
"questionId": "q9",
@@ -887,8 +1327,8 @@
"actual": "terrance.hansen@yahoo.com",
"isCorrect": true,
"inputTokens": 6392,
- "outputTokens": 652,
- "latencyMs": 8322.172416
+ "outputTokens": 140,
+ "latencyMs": 2087.950582999998
},
{
"questionId": "q9",
@@ -899,7 +1339,18 @@
"isCorrect": true,
"inputTokens": 7871,
"outputTokens": 11,
- "latencyMs": 1066.3422090000004
+ "latencyMs": 1115.425166000001
+ },
+ {
+ "questionId": "q9",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "terrance.hansen@yahoo.com",
+ "actual": "terrance.hansen@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 7910,
+ "outputTokens": 9,
+ "latencyMs": 1841.3965420000022
},
{
"questionId": "q9",
@@ -909,8 +1360,8 @@
"actual": "terrance.hansen@yahoo.com",
"isCorrect": true,
"inputTokens": 2529,
- "outputTokens": 76,
- "latencyMs": 2245.5604999999996
+ "outputTokens": 204,
+ "latencyMs": 4039.2035830000023
},
{
"questionId": "q9",
@@ -921,7 +1372,18 @@
"isCorrect": true,
"inputTokens": 2983,
"outputTokens": 11,
- "latencyMs": 1179.7512079999997
+ "latencyMs": 1254.9832079999978
+ },
+ {
+ "questionId": "q9",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "terrance.hansen@yahoo.com",
+ "actual": "terrance.hansen@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3319,
+ "outputTokens": 9,
+ "latencyMs": 2190.8811249999962
},
{
"questionId": "q9",
@@ -931,8 +1393,8 @@
"actual": "terrance.hansen@yahoo.com",
"isCorrect": true,
"inputTokens": 2383,
- "outputTokens": 204,
- "latencyMs": 2584.0723340000004
+ "outputTokens": 140,
+ "latencyMs": 3403.9012079999957
},
{
"questionId": "q9",
@@ -943,29 +1405,51 @@
"isCorrect": true,
"inputTokens": 2857,
"outputTokens": 11,
- "latencyMs": 1204.6979589999992
+ "latencyMs": 1323.0636660000018
},
{
"questionId": "q9",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "terrance.hansen@yahoo.com",
+ "actual": "terrance.hansen@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3193,
+ "outputTokens": 9,
+ "latencyMs": 1047.0718749999942
+ },
+ {
+ "questionId": "q9",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "terrance.hansen@yahoo.com",
"actual": "terrance.hansen@yahoo.com",
"isCorrect": true,
- "inputTokens": 6318,
- "outputTokens": 396,
- "latencyMs": 3824.918375000001
+ "inputTokens": 7359,
+ "outputTokens": 140,
+ "latencyMs": 3498.7119999999995
},
{
"questionId": "q9",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "terrance.hansen@yahoo.com",
"actual": "terrance.hansen@yahoo.com",
"isCorrect": true,
- "inputTokens": 6366,
+ "inputTokens": 9361,
"outputTokens": 11,
- "latencyMs": 1492.6765830000004
+ "latencyMs": 1830.5542919999934
+ },
+ {
+ "questionId": "q9",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "terrance.hansen@yahoo.com",
+ "actual": "terrance.hansen@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 9099,
+ "outputTokens": 9,
+ "latencyMs": 2052.039208999995
},
{
"questionId": "q9",
@@ -975,8 +1459,8 @@
"actual": "terrance.hansen@yahoo.com",
"isCorrect": true,
"inputTokens": 5014,
- "outputTokens": 76,
- "latencyMs": 1834.562
+ "outputTokens": 140,
+ "latencyMs": 2254.0641659999965
},
{
"questionId": "q9",
@@ -987,7 +1471,18 @@
"isCorrect": true,
"inputTokens": 5761,
"outputTokens": 11,
- "latencyMs": 1245.0000419999997
+ "latencyMs": 1279.8175830000037
+ },
+ {
+ "questionId": "q9",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "terrance.hansen@yahoo.com",
+ "actual": "terrance.hansen@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 5745,
+ "outputTokens": 9,
+ "latencyMs": 2624.0571249999994
},
{
"questionId": "q10",
@@ -997,8 +1492,8 @@
"actual": "107744",
"isCorrect": true,
"inputTokens": 6391,
- "outputTokens": 136,
- "latencyMs": 2337.0652499999997
+ "outputTokens": 72,
+ "latencyMs": 3316.716124999999
},
{
"questionId": "q10",
@@ -1009,7 +1504,18 @@
"isCorrect": true,
"inputTokens": 7870,
"outputTokens": 6,
- "latencyMs": 1148.1971250000006
+ "latencyMs": 1078.8857919999937
+ },
+ {
+ "questionId": "q10",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "107744",
+ "actual": "107744",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 6,
+ "latencyMs": 1426.163416000003
},
{
"questionId": "q10",
@@ -1019,8 +1525,8 @@
"actual": "107744",
"isCorrect": true,
"inputTokens": 2528,
- "outputTokens": 72,
- "latencyMs": 2736.2375420000008
+ "outputTokens": 136,
+ "latencyMs": 3091.0714579999985
},
{
"questionId": "q10",
@@ -1031,7 +1537,18 @@
"isCorrect": true,
"inputTokens": 2982,
"outputTokens": 6,
- "latencyMs": 1164.4291250000006
+ "latencyMs": 1171.1557079999984
+ },
+ {
+ "questionId": "q10",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "107744",
+ "actual": "107744",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 6,
+ "latencyMs": 2722.0316250000033
},
{
"questionId": "q10",
@@ -1042,7 +1559,7 @@
"isCorrect": true,
"inputTokens": 2382,
"outputTokens": 72,
- "latencyMs": 2479.8535840000004
+ "latencyMs": 3280.0853329999954
},
{
"questionId": "q10",
@@ -1053,29 +1570,51 @@
"isCorrect": true,
"inputTokens": 2856,
"outputTokens": 6,
- "latencyMs": 1032.3198329999996
+ "latencyMs": 937.3515409999964
},
{
"questionId": "q10",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "107744",
+ "actual": "107744",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 6,
+ "latencyMs": 1638.423999999999
+ },
+ {
+ "questionId": "q10",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "107744",
"actual": "107744",
"isCorrect": true,
- "inputTokens": 6317,
+ "inputTokens": 7358,
"outputTokens": 136,
- "latencyMs": 2237.465583000001
+ "latencyMs": 15425.220833
},
{
"questionId": "q10",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "107744",
"actual": "107744",
"isCorrect": true,
- "inputTokens": 6365,
+ "inputTokens": 9360,
"outputTokens": 6,
- "latencyMs": 1254.3189160000002
+ "latencyMs": 1195.8543749999953
+ },
+ {
+ "questionId": "q10",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "107744",
+ "actual": "107744",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 6,
+ "latencyMs": 2432.2206250000017
},
{
"questionId": "q10",
@@ -1086,7 +1625,7 @@
"isCorrect": true,
"inputTokens": 5013,
"outputTokens": 72,
- "latencyMs": 3753.917125
+ "latencyMs": 2047.1201250000013
},
{
"questionId": "q10",
@@ -1097,7 +1636,18 @@
"isCorrect": true,
"inputTokens": 5760,
"outputTokens": 6,
- "latencyMs": 1154.7003750000003
+ "latencyMs": 1617.048625000003
+ },
+ {
+ "questionId": "q10",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "107744",
+ "actual": "107744",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 6,
+ "latencyMs": 1548.9360000000015
},
{
"questionId": "q11",
@@ -1107,8 +1657,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 6390,
- "outputTokens": 135,
- "latencyMs": 2621.2275420000005
+ "outputTokens": 71,
+ "latencyMs": 3741.5673339999994
},
{
"questionId": "q11",
@@ -1119,7 +1669,18 @@
"isCorrect": true,
"inputTokens": 7869,
"outputTokens": 4,
- "latencyMs": 1222.843499999999
+ "latencyMs": 1189.5477079999982
+ },
+ {
+ "questionId": "q11",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 1,
+ "latencyMs": 1194.6662920000017
},
{
"questionId": "q11",
@@ -1129,8 +1690,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2527,
- "outputTokens": 71,
- "latencyMs": 1762.1339159999989
+ "outputTokens": 135,
+ "latencyMs": 2947.4346250000017
},
{
"questionId": "q11",
@@ -1141,7 +1702,18 @@
"isCorrect": true,
"inputTokens": 2981,
"outputTokens": 4,
- "latencyMs": 1630.7307079999991
+ "latencyMs": 944.1087090000001
+ },
+ {
+ "questionId": "q11",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 1,
+ "latencyMs": 2017.044041999994
},
{
"questionId": "q11",
@@ -1152,7 +1724,7 @@
"isCorrect": true,
"inputTokens": 2381,
"outputTokens": 71,
- "latencyMs": 1848.9775829999999
+ "latencyMs": 4068.897624999998
},
{
"questionId": "q11",
@@ -1163,29 +1735,51 @@
"isCorrect": true,
"inputTokens": 2855,
"outputTokens": 4,
- "latencyMs": 1080.8682500000014
+ "latencyMs": 1092.8982499999984
},
{
"questionId": "q11",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 1,
+ "latencyMs": 2148.519874999998
+ },
+ {
+ "questionId": "q11",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6316,
+ "inputTokens": 7357,
"outputTokens": 135,
- "latencyMs": 26303.357959
+ "latencyMs": 3025.696167000002
},
{
"questionId": "q11",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6364,
+ "inputTokens": 9359,
"outputTokens": 4,
- "latencyMs": 1354.007999999998
+ "latencyMs": 1069.479542000001
+ },
+ {
+ "questionId": "q11",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 1,
+ "latencyMs": 2595.035582999997
},
{
"questionId": "q11",
@@ -1196,7 +1790,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 71,
- "latencyMs": 1924.4625829999986
+ "latencyMs": 2200.230208000001
},
{
"questionId": "q11",
@@ -1207,7 +1801,18 @@
"isCorrect": true,
"inputTokens": 5759,
"outputTokens": 4,
- "latencyMs": 1279.5235830000001
+ "latencyMs": 1226.070749999999
+ },
+ {
+ "questionId": "q11",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 1,
+ "latencyMs": 2045.9056249999994
},
{
"questionId": "q12",
@@ -1217,8 +1822,8 @@
"actual": "allan21@gmail.com",
"isCorrect": true,
"inputTokens": 6389,
- "outputTokens": 330,
- "latencyMs": 3997.3972079999985
+ "outputTokens": 266,
+ "latencyMs": 5672.897708000004
},
{
"questionId": "q12",
@@ -1229,7 +1834,18 @@
"isCorrect": true,
"inputTokens": 7867,
"outputTokens": 9,
- "latencyMs": 1153.9412079999984
+ "latencyMs": 1745.323000000004
+ },
+ {
+ "questionId": "q12",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "allan21@gmail.com",
+ "actual": "allan21@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 8,
+ "latencyMs": 1877.5404999999955
},
{
"questionId": "q12",
@@ -1239,8 +1855,8 @@
"actual": "allan21@gmail.com",
"isCorrect": true,
"inputTokens": 2526,
- "outputTokens": 138,
- "latencyMs": 2494.580582999999
+ "outputTokens": 74,
+ "latencyMs": 5317.909041999999
},
{
"questionId": "q12",
@@ -1251,7 +1867,18 @@
"isCorrect": true,
"inputTokens": 2979,
"outputTokens": 9,
- "latencyMs": 1350.1353750000017
+ "latencyMs": 916.7109169999967
+ },
+ {
+ "questionId": "q12",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "allan21@gmail.com",
+ "actual": "allan21@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 8,
+ "latencyMs": 2401.305290999997
},
{
"questionId": "q12",
@@ -1261,8 +1888,8 @@
"actual": "allan21@gmail.com",
"isCorrect": true,
"inputTokens": 2380,
- "outputTokens": 138,
- "latencyMs": 3024.4009160000023
+ "outputTokens": 74,
+ "latencyMs": 3016.4596669999955
},
{
"questionId": "q12",
@@ -1273,29 +1900,51 @@
"isCorrect": true,
"inputTokens": 2853,
"outputTokens": 9,
- "latencyMs": 1199.3955830000014
+ "latencyMs": 1233.9625830000004
},
{
"questionId": "q12",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "allan21@gmail.com",
+ "actual": "allan21@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 8,
+ "latencyMs": 2000.6465000000026
+ },
+ {
+ "questionId": "q12",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "allan21@gmail.com",
"actual": "allan21@gmail.com",
"isCorrect": true,
- "inputTokens": 6315,
+ "inputTokens": 7356,
"outputTokens": 138,
- "latencyMs": 5168.116582999999
+ "latencyMs": 6270.167416999997
},
{
"questionId": "q12",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "allan21@gmail.com",
"actual": "allan21@gmail.com",
"isCorrect": true,
- "inputTokens": 6362,
+ "inputTokens": 9357,
"outputTokens": 9,
- "latencyMs": 1198.3554160000022
+ "latencyMs": 2332.7022089999955
+ },
+ {
+ "questionId": "q12",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "allan21@gmail.com",
+ "actual": "allan21@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 8,
+ "latencyMs": 1986.9040000000023
},
{
"questionId": "q12",
@@ -1306,7 +1955,7 @@
"isCorrect": true,
"inputTokens": 5011,
"outputTokens": 74,
- "latencyMs": 2632.998958999997
+ "latencyMs": 3294.769625000001
},
{
"questionId": "q12",
@@ -1317,7 +1966,18 @@
"isCorrect": true,
"inputTokens": 5757,
"outputTokens": 9,
- "latencyMs": 1124.5625419999997
+ "latencyMs": 1028.5119580000028
+ },
+ {
+ "questionId": "q12",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "allan21@gmail.com",
+ "actual": "allan21@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 8,
+ "latencyMs": 1788.622083000002
},
{
"questionId": "q13",
@@ -1328,7 +1988,7 @@
"isCorrect": true,
"inputTokens": 6388,
"outputTokens": 72,
- "latencyMs": 2357.2276249999995
+ "latencyMs": 2426.662333
},
{
"questionId": "q13",
@@ -1339,7 +1999,18 @@
"isCorrect": true,
"inputTokens": 7868,
"outputTokens": 6,
- "latencyMs": 1267.960791999998
+ "latencyMs": 1199.7499580000003
+ },
+ {
+ "questionId": "q13",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "145843",
+ "actual": "145843",
+ "isCorrect": true,
+ "inputTokens": 7907,
+ "outputTokens": 6,
+ "latencyMs": 2230.200499999999
},
{
"questionId": "q13",
@@ -1349,8 +2020,8 @@
"actual": "145843",
"isCorrect": true,
"inputTokens": 2525,
- "outputTokens": 136,
- "latencyMs": 2397.798125000001
+ "outputTokens": 72,
+ "latencyMs": 2973.9408330000006
},
{
"questionId": "q13",
@@ -1361,7 +2032,18 @@
"isCorrect": true,
"inputTokens": 2980,
"outputTokens": 6,
- "latencyMs": 1170.6429580000004
+ "latencyMs": 1759.8231249999953
+ },
+ {
+ "questionId": "q13",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "145843",
+ "actual": "145843",
+ "isCorrect": true,
+ "inputTokens": 3316,
+ "outputTokens": 6,
+ "latencyMs": 3236.040165999999
},
{
"questionId": "q13",
@@ -1371,8 +2053,8 @@
"actual": "145843",
"isCorrect": true,
"inputTokens": 2379,
- "outputTokens": 136,
- "latencyMs": 3227.198124999999
+ "outputTokens": 72,
+ "latencyMs": 2829.9307920000065
},
{
"questionId": "q13",
@@ -1383,29 +2065,51 @@
"isCorrect": true,
"inputTokens": 2854,
"outputTokens": 6,
- "latencyMs": 1112.6066250000003
+ "latencyMs": 905.942667000003
},
{
"questionId": "q13",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "145843",
+ "actual": "145843",
+ "isCorrect": true,
+ "inputTokens": 3190,
+ "outputTokens": 6,
+ "latencyMs": 1492.0838749999966
+ },
+ {
+ "questionId": "q13",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "145843",
"actual": "145843",
"isCorrect": true,
- "inputTokens": 6314,
- "outputTokens": 72,
- "latencyMs": 2036.251791999999
+ "inputTokens": 7355,
+ "outputTokens": 136,
+ "latencyMs": 3018.9516250000015
},
{
"questionId": "q13",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "145843",
"actual": "145843",
"isCorrect": true,
- "inputTokens": 6363,
+ "inputTokens": 9358,
"outputTokens": 6,
- "latencyMs": 1290.7641250000015
+ "latencyMs": 1010.1432910000003
+ },
+ {
+ "questionId": "q13",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "145843",
+ "actual": "145843",
+ "isCorrect": true,
+ "inputTokens": 9096,
+ "outputTokens": 6,
+ "latencyMs": 2475.971083000004
},
{
"questionId": "q13",
@@ -1416,7 +2120,7 @@
"isCorrect": true,
"inputTokens": 5010,
"outputTokens": 72,
- "latencyMs": 2262.8405840000014
+ "latencyMs": 2322.1169999999984
},
{
"questionId": "q13",
@@ -1427,7 +2131,18 @@
"isCorrect": true,
"inputTokens": 5758,
"outputTokens": 6,
- "latencyMs": 1193.2695419999982
+ "latencyMs": 993.6942500000005
+ },
+ {
+ "questionId": "q13",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "145843",
+ "actual": "145843",
+ "isCorrect": true,
+ "inputTokens": 5742,
+ "outputTokens": 6,
+ "latencyMs": 2137.871124999998
},
{
"questionId": "q14",
@@ -1438,7 +2153,7 @@
"isCorrect": true,
"inputTokens": 6389,
"outputTokens": 71,
- "latencyMs": 3198.2654159999984
+ "latencyMs": 2223.1494999999995
},
{
"questionId": "q14",
@@ -1449,7 +2164,18 @@
"isCorrect": true,
"inputTokens": 7868,
"outputTokens": 4,
- "latencyMs": 1229.8644999999997
+ "latencyMs": 1101.960708999999
+ },
+ {
+ "questionId": "q14",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 1,
+ "latencyMs": 1264.4358330000032
},
{
"questionId": "q14",
@@ -1460,7 +2186,7 @@
"isCorrect": true,
"inputTokens": 2526,
"outputTokens": 71,
- "latencyMs": 3293.710084000002
+ "latencyMs": 3117.289082999996
},
{
"questionId": "q14",
@@ -1471,7 +2197,18 @@
"isCorrect": true,
"inputTokens": 2980,
"outputTokens": 4,
- "latencyMs": 1121.200334000001
+ "latencyMs": 975.8156250000029
+ },
+ {
+ "questionId": "q14",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 1,
+ "latencyMs": 2076.140041999999
},
{
"questionId": "q14",
@@ -1482,7 +2219,7 @@
"isCorrect": true,
"inputTokens": 2380,
"outputTokens": 71,
- "latencyMs": 2497.4451249999984
+ "latencyMs": 3522.6094999999987
},
{
"questionId": "q14",
@@ -1493,29 +2230,51 @@
"isCorrect": true,
"inputTokens": 2854,
"outputTokens": 4,
- "latencyMs": 1152.0107500000013
+ "latencyMs": 749.1067079999993
},
{
"questionId": "q14",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 1,
+ "latencyMs": 2162.154208
+ },
+ {
+ "questionId": "q14",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6315,
- "outputTokens": 71,
- "latencyMs": 3547.6399999999994
+ "inputTokens": 7356,
+ "outputTokens": 135,
+ "latencyMs": 15105.717249999994
},
{
"questionId": "q14",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6363,
+ "inputTokens": 9358,
"outputTokens": 4,
- "latencyMs": 2007.6731249999975
+ "latencyMs": 1518.0794160000005
+ },
+ {
+ "questionId": "q14",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 1,
+ "latencyMs": 2634.745458999998
},
{
"questionId": "q14",
@@ -1526,7 +2285,7 @@
"isCorrect": true,
"inputTokens": 5011,
"outputTokens": 71,
- "latencyMs": 7054.295208
+ "latencyMs": 2809.990375000001
},
{
"questionId": "q14",
@@ -1537,7 +2296,18 @@
"isCorrect": true,
"inputTokens": 5758,
"outputTokens": 4,
- "latencyMs": 1230.5032920000012
+ "latencyMs": 2328.9382079999996
+ },
+ {
+ "questionId": "q14",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 1,
+ "latencyMs": 2122.7864169999957
},
{
"questionId": "q15",
@@ -1547,8 +2317,8 @@
"actual": "alexandria61@gmail.com",
"isCorrect": true,
"inputTokens": 6390,
- "outputTokens": 76,
- "latencyMs": 2049.933416
+ "outputTokens": 140,
+ "latencyMs": 2744.6706660000054
},
{
"questionId": "q15",
@@ -1559,7 +2329,18 @@
"isCorrect": true,
"inputTokens": 7869,
"outputTokens": 9,
- "latencyMs": 1217.1906249999993
+ "latencyMs": 1389.9784999999974
+ },
+ {
+ "questionId": "q15",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "alexandria61@gmail.com",
+ "actual": "alexandria61@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 8,
+ "latencyMs": 1310.762625000003
},
{
"questionId": "q15",
@@ -1570,7 +2351,7 @@
"isCorrect": true,
"inputTokens": 2527,
"outputTokens": 204,
- "latencyMs": 2844.136208
+ "latencyMs": 5402.840416999999
},
{
"questionId": "q15",
@@ -1581,7 +2362,18 @@
"isCorrect": true,
"inputTokens": 2981,
"outputTokens": 9,
- "latencyMs": 2166.8829589999987
+ "latencyMs": 1480.7467909999978
+ },
+ {
+ "questionId": "q15",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "alexandria61@gmail.com",
+ "actual": "alexandria61@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 8,
+ "latencyMs": 1741.1184169999979
},
{
"questionId": "q15",
@@ -1591,8 +2383,8 @@
"actual": "alexandria61@gmail.com",
"isCorrect": true,
"inputTokens": 2381,
- "outputTokens": 204,
- "latencyMs": 2726.5934579999994
+ "outputTokens": 140,
+ "latencyMs": 2192.0577909999993
},
{
"questionId": "q15",
@@ -1603,29 +2395,51 @@
"isCorrect": true,
"inputTokens": 2855,
"outputTokens": 9,
- "latencyMs": 1107.4675410000018
+ "latencyMs": 1052.5672919999997
},
{
"questionId": "q15",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "alexandria61@gmail.com",
+ "actual": "alexandria61@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 8,
+ "latencyMs": 2969.6880840000013
+ },
+ {
+ "questionId": "q15",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "alexandria61@gmail.com",
"actual": "alexandria61@gmail.com",
"isCorrect": true,
- "inputTokens": 6316,
- "outputTokens": 76,
- "latencyMs": 2260.4548749999994
+ "inputTokens": 7357,
+ "outputTokens": 140,
+ "latencyMs": 4902.5039590000015
},
{
"questionId": "q15",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "alexandria61@gmail.com",
"actual": "alexandria61@gmail.com",
"isCorrect": true,
- "inputTokens": 6364,
+ "inputTokens": 9359,
"outputTokens": 9,
- "latencyMs": 1257.2797080000018
+ "latencyMs": 1337.9500409999964
+ },
+ {
+ "questionId": "q15",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "alexandria61@gmail.com",
+ "actual": "alexandria61@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 8,
+ "latencyMs": 988.1449579999971
},
{
"questionId": "q15",
@@ -1636,7 +2450,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 140,
- "latencyMs": 2565.571791999999
+ "latencyMs": 5435.804457999999
},
{
"questionId": "q15",
@@ -1647,7 +2461,18 @@
"isCorrect": true,
"inputTokens": 5759,
"outputTokens": 9,
- "latencyMs": 1255.2880829999995
+ "latencyMs": 1164.0297080000018
+ },
+ {
+ "questionId": "q15",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "alexandria61@gmail.com",
+ "actual": "alexandria61@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 8,
+ "latencyMs": 1684.5642079999961
},
{
"questionId": "q16",
@@ -1657,8 +2482,8 @@
"actual": "89436",
"isCorrect": true,
"inputTokens": 6389,
- "outputTokens": 136,
- "latencyMs": 2595.422042000002
+ "outputTokens": 72,
+ "latencyMs": 2137.3070000000007
},
{
"questionId": "q16",
@@ -1669,7 +2494,18 @@
"isCorrect": true,
"inputTokens": 7870,
"outputTokens": 6,
- "latencyMs": 1090.4299170000013
+ "latencyMs": 1353.1784169999955
+ },
+ {
+ "questionId": "q16",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "89436",
+ "actual": "89436",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 5,
+ "latencyMs": 2152.076667000001
},
{
"questionId": "q16",
@@ -1680,7 +2516,7 @@
"isCorrect": true,
"inputTokens": 2526,
"outputTokens": 72,
- "latencyMs": 2985.3881250000013
+ "latencyMs": 9838.444999999992
},
{
"questionId": "q16",
@@ -1691,7 +2527,18 @@
"isCorrect": true,
"inputTokens": 2982,
"outputTokens": 6,
- "latencyMs": 1521.227415999998
+ "latencyMs": 1011.8612080000021
+ },
+ {
+ "questionId": "q16",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "89436",
+ "actual": "89436",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 5,
+ "latencyMs": 2380.466207999998
},
{
"questionId": "q16",
@@ -1702,7 +2549,7 @@
"isCorrect": true,
"inputTokens": 2380,
"outputTokens": 72,
- "latencyMs": 2918.142082999999
+ "latencyMs": 2358.7515829999975
},
{
"questionId": "q16",
@@ -1713,29 +2560,51 @@
"isCorrect": true,
"inputTokens": 2856,
"outputTokens": 6,
- "latencyMs": 1049.085916
+ "latencyMs": 1073.5187089999963
},
{
"questionId": "q16",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "89436",
+ "actual": "89436",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 5,
+ "latencyMs": 1808.9837499999994
+ },
+ {
+ "questionId": "q16",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "89436",
"actual": "89436",
"isCorrect": true,
- "inputTokens": 6315,
- "outputTokens": 136,
- "latencyMs": 2414.9711669999997
+ "inputTokens": 7356,
+ "outputTokens": 200,
+ "latencyMs": 3657.137167000008
},
{
"questionId": "q16",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "89436",
"actual": "89436",
"isCorrect": true,
- "inputTokens": 6365,
+ "inputTokens": 9360,
"outputTokens": 6,
- "latencyMs": 1178.0064170000005
+ "latencyMs": 1216.3329169999997
+ },
+ {
+ "questionId": "q16",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "89436",
+ "actual": "89436",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 5,
+ "latencyMs": 2347.6749580000032
},
{
"questionId": "q16",
@@ -1745,8 +2614,8 @@
"actual": "89436",
"isCorrect": true,
"inputTokens": 5011,
- "outputTokens": 72,
- "latencyMs": 1772.788625000001
+ "outputTokens": 136,
+ "latencyMs": 2985.761999999995
},
{
"questionId": "q16",
@@ -1757,7 +2626,18 @@
"isCorrect": true,
"inputTokens": 5760,
"outputTokens": 6,
- "latencyMs": 1134.7022499999985
+ "latencyMs": 1062.5013749999998
+ },
+ {
+ "questionId": "q16",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "89436",
+ "actual": "89436",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 5,
+ "latencyMs": 2942.199041999993
},
{
"questionId": "q17",
@@ -1767,8 +2647,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 6392,
- "outputTokens": 135,
- "latencyMs": 2528.6098330000023
+ "outputTokens": 71,
+ "latencyMs": 2072.9703750000044
},
{
"questionId": "q17",
@@ -1779,7 +2659,18 @@
"isCorrect": true,
"inputTokens": 7872,
"outputTokens": 4,
- "latencyMs": 1353.3026250000003
+ "latencyMs": 1143.0027499999997
+ },
+ {
+ "questionId": "q17",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7911,
+ "outputTokens": 1,
+ "latencyMs": 2339.718792000007
},
{
"questionId": "q17",
@@ -1789,8 +2680,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2529,
- "outputTokens": 71,
- "latencyMs": 2286.120999999999
+ "outputTokens": 135,
+ "latencyMs": 2721.8648749999993
},
{
"questionId": "q17",
@@ -1801,7 +2692,18 @@
"isCorrect": true,
"inputTokens": 2984,
"outputTokens": 4,
- "latencyMs": 961.078292000002
+ "latencyMs": 1106.3964160000032
+ },
+ {
+ "questionId": "q17",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3320,
+ "outputTokens": 1,
+ "latencyMs": 2453.6342910000094
},
{
"questionId": "q17",
@@ -1811,8 +2713,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2383,
- "outputTokens": 71,
- "latencyMs": 3445.204249999999
+ "outputTokens": 135,
+ "latencyMs": 2526.1070829999953
},
{
"questionId": "q17",
@@ -1823,29 +2725,51 @@
"isCorrect": true,
"inputTokens": 2858,
"outputTokens": 4,
- "latencyMs": 1003.445125000002
+ "latencyMs": 963.8103339999943
},
{
"questionId": "q17",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3194,
+ "outputTokens": 1,
+ "latencyMs": 1213.7454580000049
+ },
+ {
+ "questionId": "q17",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6318,
- "outputTokens": 135,
- "latencyMs": 2696.166874999999
+ "inputTokens": 7359,
+ "outputTokens": 199,
+ "latencyMs": 3451.3691249999974
},
{
"questionId": "q17",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6367,
+ "inputTokens": 9362,
"outputTokens": 4,
- "latencyMs": 1063.340791999999
+ "latencyMs": 1054.2650409999915
+ },
+ {
+ "questionId": "q17",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9100,
+ "outputTokens": 1,
+ "latencyMs": 1712.7362089999951
},
{
"questionId": "q17",
@@ -1855,8 +2779,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 5014,
- "outputTokens": 135,
- "latencyMs": 3367.6109579999975
+ "outputTokens": 199,
+ "latencyMs": 4517.758332999991
},
{
"questionId": "q17",
@@ -1867,7 +2791,18 @@
"isCorrect": true,
"inputTokens": 5762,
"outputTokens": 4,
- "latencyMs": 1322.4013339999983
+ "latencyMs": 1036.0673749999987
+ },
+ {
+ "questionId": "q17",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5746,
+ "outputTokens": 1,
+ "latencyMs": 2099.134084000005
},
{
"questionId": "q18",
@@ -1878,7 +2813,7 @@
"isCorrect": true,
"inputTokens": 6390,
"outputTokens": 139,
- "latencyMs": 2745.6627499999995
+ "latencyMs": 3450.1222080000007
},
{
"questionId": "q18",
@@ -1889,7 +2824,18 @@
"isCorrect": true,
"inputTokens": 7871,
"outputTokens": 10,
- "latencyMs": 1312.9286670000001
+ "latencyMs": 2320.022790999996
+ },
+ {
+ "questionId": "q18",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "kelvin54@yahoo.com",
+ "actual": "kelvin54@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 8,
+ "latencyMs": 1058.7114589999983
},
{
"questionId": "q18",
@@ -1899,8 +2845,8 @@
"actual": "kelvin54@yahoo.com",
"isCorrect": true,
"inputTokens": 2527,
- "outputTokens": 1483,
- "latencyMs": 13678.859999999997
+ "outputTokens": 75,
+ "latencyMs": 3345.744040999998
},
{
"questionId": "q18",
@@ -1911,7 +2857,18 @@
"isCorrect": true,
"inputTokens": 2983,
"outputTokens": 10,
- "latencyMs": 1030.3843339999985
+ "latencyMs": 1209.7132500000007
+ },
+ {
+ "questionId": "q18",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "kelvin54@yahoo.com",
+ "actual": "kelvin54@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 8,
+ "latencyMs": 1716.227457999994
},
{
"questionId": "q18",
@@ -1922,7 +2879,7 @@
"isCorrect": true,
"inputTokens": 2381,
"outputTokens": 139,
- "latencyMs": 2223.2737909999996
+ "latencyMs": 3093.9495000000024
},
{
"questionId": "q18",
@@ -1933,29 +2890,51 @@
"isCorrect": true,
"inputTokens": 2857,
"outputTokens": 10,
- "latencyMs": 1224.2647080000024
+ "latencyMs": 1311.3692500000034
},
{
"questionId": "q18",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "kelvin54@yahoo.com",
+ "actual": "kelvin54@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 8,
+ "latencyMs": 794.0660829999979
+ },
+ {
+ "questionId": "q18",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "kelvin54@yahoo.com",
"actual": "kelvin54@yahoo.com",
"isCorrect": true,
- "inputTokens": 6316,
- "outputTokens": 139,
- "latencyMs": 3198.8672499999993
+ "inputTokens": 7357,
+ "outputTokens": 459,
+ "latencyMs": 5397.067582999996
},
{
"questionId": "q18",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "kelvin54@yahoo.com",
"actual": "kelvin54@yahoo.com",
"isCorrect": true,
- "inputTokens": 6366,
+ "inputTokens": 9361,
"outputTokens": 10,
- "latencyMs": 1234.557084
+ "latencyMs": 1179.005124999996
+ },
+ {
+ "questionId": "q18",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "kelvin54@yahoo.com",
+ "actual": "kelvin54@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 8,
+ "latencyMs": 3390.3811669999996
},
{
"questionId": "q18",
@@ -1965,8 +2944,8 @@
"actual": "kelvin54@yahoo.com",
"isCorrect": true,
"inputTokens": 5012,
- "outputTokens": 139,
- "latencyMs": 2861.692708999999
+ "outputTokens": 75,
+ "latencyMs": 3942.734500000006
},
{
"questionId": "q18",
@@ -1977,7 +2956,18 @@
"isCorrect": true,
"inputTokens": 5761,
"outputTokens": 10,
- "latencyMs": 1284.2591250000005
+ "latencyMs": 1198.2199580000015
+ },
+ {
+ "questionId": "q18",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "kelvin54@yahoo.com",
+ "actual": "kelvin54@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 8,
+ "latencyMs": 1988.9680829999998
},
{
"questionId": "q19",
@@ -1987,8 +2977,8 @@
"actual": "143365",
"isCorrect": true,
"inputTokens": 6390,
- "outputTokens": 136,
- "latencyMs": 2741.803499999998
+ "outputTokens": 200,
+ "latencyMs": 2964.017540999994
},
{
"questionId": "q19",
@@ -1999,7 +2989,18 @@
"isCorrect": true,
"inputTokens": 7872,
"outputTokens": 6,
- "latencyMs": 1096.6906249999993
+ "latencyMs": 1171.257249999995
+ },
+ {
+ "questionId": "q19",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "143365",
+ "actual": "143365",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 6,
+ "latencyMs": 1304.4575840000034
},
{
"questionId": "q19",
@@ -2009,8 +3010,8 @@
"actual": "143365",
"isCorrect": true,
"inputTokens": 2527,
- "outputTokens": 136,
- "latencyMs": 3692.904416999998
+ "outputTokens": 72,
+ "latencyMs": 3056.008249999999
},
{
"questionId": "q19",
@@ -2021,7 +3022,18 @@
"isCorrect": true,
"inputTokens": 2984,
"outputTokens": 6,
- "latencyMs": 1516.7794159999976
+ "latencyMs": 873.7801659999968
+ },
+ {
+ "questionId": "q19",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "143365",
+ "actual": "143365",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 6,
+ "latencyMs": 1536.4943750000093
},
{
"questionId": "q19",
@@ -2031,8 +3043,8 @@
"actual": "143365",
"isCorrect": true,
"inputTokens": 2381,
- "outputTokens": 392,
- "latencyMs": 5068.4152909999975
+ "outputTokens": 328,
+ "latencyMs": 3966.832792000001
},
{
"questionId": "q19",
@@ -2043,29 +3055,51 @@
"isCorrect": true,
"inputTokens": 2858,
"outputTokens": 6,
- "latencyMs": 1356.2728330000027
+ "latencyMs": 1072.791458000007
},
{
"questionId": "q19",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "143365",
+ "actual": "143365",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 6,
+ "latencyMs": 1334.2349169999943
+ },
+ {
+ "questionId": "q19",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "143365",
"actual": "143365",
"isCorrect": true,
- "inputTokens": 6316,
+ "inputTokens": 7357,
"outputTokens": 136,
- "latencyMs": 2866.8642500000024
+ "latencyMs": 2824.245167000001
},
{
"questionId": "q19",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "143365",
"actual": "143365",
"isCorrect": true,
- "inputTokens": 6367,
+ "inputTokens": 9362,
"outputTokens": 6,
- "latencyMs": 1462.041624999998
+ "latencyMs": 1156.3476669999945
+ },
+ {
+ "questionId": "q19",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "143365",
+ "actual": "143365",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 6,
+ "latencyMs": 2503.603999999992
},
{
"questionId": "q19",
@@ -2076,7 +3110,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 72,
- "latencyMs": 2320.320083999999
+ "latencyMs": 1988.6155419999996
},
{
"questionId": "q19",
@@ -2087,7 +3121,18 @@
"isCorrect": true,
"inputTokens": 5762,
"outputTokens": 6,
- "latencyMs": 1082.976666999999
+ "latencyMs": 2019.264417000013
+ },
+ {
+ "questionId": "q19",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "143365",
+ "actual": "143365",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 6,
+ "latencyMs": 2120.657042000006
},
{
"questionId": "q20",
@@ -2097,8 +3142,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 6389,
- "outputTokens": 7,
- "latencyMs": 2427.6330409999973
+ "outputTokens": 71,
+ "latencyMs": 2674.240417000008
},
{
"questionId": "q20",
@@ -2109,7 +3154,18 @@
"isCorrect": true,
"inputTokens": 7868,
"outputTokens": 4,
- "latencyMs": 1108.7309170000008
+ "latencyMs": 985.5821250000008
+ },
+ {
+ "questionId": "q20",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 1,
+ "latencyMs": 1005.9853749999893
},
{
"questionId": "q20",
@@ -2120,7 +3176,7 @@
"isCorrect": true,
"inputTokens": 2526,
"outputTokens": 71,
- "latencyMs": 4405.948458000003
+ "latencyMs": 2337.429165999987
},
{
"questionId": "q20",
@@ -2131,7 +3187,18 @@
"isCorrect": true,
"inputTokens": 2980,
"outputTokens": 4,
- "latencyMs": 1235.6647919999996
+ "latencyMs": 1671.3083750000078
+ },
+ {
+ "questionId": "q20",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 1,
+ "latencyMs": 1858.936124999993
},
{
"questionId": "q20",
@@ -2142,7 +3209,7 @@
"isCorrect": true,
"inputTokens": 2380,
"outputTokens": 71,
- "latencyMs": 2528.553082999999
+ "latencyMs": 1797.8257500000036
},
{
"questionId": "q20",
@@ -2153,29 +3220,51 @@
"isCorrect": true,
"inputTokens": 2854,
"outputTokens": 4,
- "latencyMs": 974.1328329999997
+ "latencyMs": 1014.9593339999992
},
{
"questionId": "q20",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 1,
+ "latencyMs": 1534.200667000012
+ },
+ {
+ "questionId": "q20",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6315,
+ "inputTokens": 7356,
"outputTokens": 135,
- "latencyMs": 2243.1775420000013
+ "latencyMs": 3340.923125000001
},
{
"questionId": "q20",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6363,
+ "inputTokens": 9358,
"outputTokens": 4,
- "latencyMs": 2416.867124999997
+ "latencyMs": 1555.2516250000044
+ },
+ {
+ "questionId": "q20",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 1,
+ "latencyMs": 2945.7507919999916
},
{
"questionId": "q20",
@@ -2185,8 +3274,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 5011,
- "outputTokens": 135,
- "latencyMs": 2429.5548750000016
+ "outputTokens": 71,
+ "latencyMs": 3605.196708999996
},
{
"questionId": "q20",
@@ -2197,7 +3286,18 @@
"isCorrect": true,
"inputTokens": 5758,
"outputTokens": 4,
- "latencyMs": 1257.326083
+ "latencyMs": 1068.8147920000047
+ },
+ {
+ "questionId": "q20",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 1,
+ "latencyMs": 2330.3333749999874
},
{
"questionId": "q21",
@@ -2207,8 +3307,8 @@
"actual": "dean19@gmail.com",
"isCorrect": true,
"inputTokens": 6393,
- "outputTokens": 203,
- "latencyMs": 4366.677041999996
+ "outputTokens": 75,
+ "latencyMs": 2723.754000000001
},
{
"questionId": "q21",
@@ -2219,7 +3319,18 @@
"isCorrect": true,
"inputTokens": 7876,
"outputTokens": 9,
- "latencyMs": 1410.3295419999995
+ "latencyMs": 1170.7758329999924
+ },
+ {
+ "questionId": "q21",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "dean19@gmail.com",
+ "actual": "dean19@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 7912,
+ "outputTokens": 7,
+ "latencyMs": 2132.3265829999873
},
{
"questionId": "q21",
@@ -2229,8 +3340,8 @@
"actual": "dean19@gmail.com",
"isCorrect": true,
"inputTokens": 2530,
- "outputTokens": 75,
- "latencyMs": 2834.2883330000004
+ "outputTokens": 139,
+ "latencyMs": 3074.613540999999
},
{
"questionId": "q21",
@@ -2241,7 +3352,18 @@
"isCorrect": true,
"inputTokens": 2988,
"outputTokens": 9,
- "latencyMs": 1023.437750000001
+ "latencyMs": 887.1294170000037
+ },
+ {
+ "questionId": "q21",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "dean19@gmail.com",
+ "actual": "dean19@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 3321,
+ "outputTokens": 7,
+ "latencyMs": 1689.1039579999924
},
{
"questionId": "q21",
@@ -2251,8 +3373,8 @@
"actual": "dean19@gmail.com",
"isCorrect": true,
"inputTokens": 2384,
- "outputTokens": 139,
- "latencyMs": 3091.7722909999975
+ "outputTokens": 75,
+ "latencyMs": 2337.622915999993
},
{
"questionId": "q21",
@@ -2263,29 +3385,51 @@
"isCorrect": true,
"inputTokens": 2862,
"outputTokens": 9,
- "latencyMs": 1910.5562920000011
+ "latencyMs": 951.0157920000056
},
{
"questionId": "q21",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "dean19@gmail.com",
+ "actual": "dean19@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 3195,
+ "outputTokens": 7,
+ "latencyMs": 2195.647125000003
+ },
+ {
+ "questionId": "q21",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "dean19@gmail.com",
"actual": "dean19@gmail.com",
"isCorrect": true,
- "inputTokens": 6319,
+ "inputTokens": 7360,
"outputTokens": 75,
- "latencyMs": 2335.239207999999
+ "latencyMs": 2328.1204169999983
},
{
"questionId": "q21",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "dean19@gmail.com",
"actual": "dean19@gmail.com",
"isCorrect": true,
- "inputTokens": 6371,
+ "inputTokens": 9366,
"outputTokens": 9,
- "latencyMs": 1145.7144169999992
+ "latencyMs": 1225.2067499999976
+ },
+ {
+ "questionId": "q21",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "dean19@gmail.com",
+ "actual": "dean19@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 9101,
+ "outputTokens": 7,
+ "latencyMs": 1613.4727500000008
},
{
"questionId": "q21",
@@ -2296,7 +3440,7 @@
"isCorrect": true,
"inputTokens": 5015,
"outputTokens": 75,
- "latencyMs": 2204.0944169999966
+ "latencyMs": 2482.4477909999987
},
{
"questionId": "q21",
@@ -2307,7 +3451,18 @@
"isCorrect": true,
"inputTokens": 5766,
"outputTokens": 9,
- "latencyMs": 1102.2122499999969
+ "latencyMs": 1235.0746250000084
+ },
+ {
+ "questionId": "q21",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "dean19@gmail.com",
+ "actual": "dean19@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 5747,
+ "outputTokens": 7,
+ "latencyMs": 4278.624791999988
},
{
"questionId": "q22",
@@ -2317,8 +3472,8 @@
"actual": "111314",
"isCorrect": true,
"inputTokens": 6391,
- "outputTokens": 200,
- "latencyMs": 3785.0480830000015
+ "outputTokens": 136,
+ "latencyMs": 2741.065750000009
},
{
"questionId": "q22",
@@ -2329,7 +3484,18 @@
"isCorrect": true,
"inputTokens": 7871,
"outputTokens": 6,
- "latencyMs": 1147.6056669999962
+ "latencyMs": 1172.1854580000072
+ },
+ {
+ "questionId": "q22",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "111314",
+ "actual": "111314",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 6,
+ "latencyMs": 1184.0355000000127
},
{
"questionId": "q22",
@@ -2339,8 +3505,8 @@
"actual": "111314",
"isCorrect": true,
"inputTokens": 2528,
- "outputTokens": 72,
- "latencyMs": 3996.1190410000054
+ "outputTokens": 136,
+ "latencyMs": 6348.677542000005
},
{
"questionId": "q22",
@@ -2351,7 +3517,18 @@
"isCorrect": true,
"inputTokens": 2983,
"outputTokens": 6,
- "latencyMs": 1101.5621670000037
+ "latencyMs": 964.3882920000033
+ },
+ {
+ "questionId": "q22",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "111314",
+ "actual": "111314",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 6,
+ "latencyMs": 1484.964082999999
},
{
"questionId": "q22",
@@ -2361,8 +3538,8 @@
"actual": "111314",
"isCorrect": true,
"inputTokens": 2382,
- "outputTokens": 136,
- "latencyMs": 2563.2732499999984
+ "outputTokens": 72,
+ "latencyMs": 23689.366624999995
},
{
"questionId": "q22",
@@ -2373,29 +3550,51 @@
"isCorrect": true,
"inputTokens": 2857,
"outputTokens": 6,
- "latencyMs": 1224.5424589999966
+ "latencyMs": 1258.0295830000105
},
{
"questionId": "q22",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "111314",
+ "actual": "111314",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 6,
+ "latencyMs": 18510.087583
+ },
+ {
+ "questionId": "q22",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "111314",
"actual": "111314",
"isCorrect": true,
- "inputTokens": 6317,
+ "inputTokens": 7358,
"outputTokens": 136,
- "latencyMs": 2436.8848329999964
+ "latencyMs": 2856.495458000005
},
{
"questionId": "q22",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "111314",
"actual": "111314",
"isCorrect": true,
- "inputTokens": 6366,
+ "inputTokens": 9361,
"outputTokens": 6,
- "latencyMs": 1500.1066250000003
+ "latencyMs": 1031.8081669999956
+ },
+ {
+ "questionId": "q22",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "111314",
+ "actual": "111314",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 6,
+ "latencyMs": 2408.5496249999997
},
{
"questionId": "q22",
@@ -2406,7 +3605,7 @@
"isCorrect": true,
"inputTokens": 5013,
"outputTokens": 72,
- "latencyMs": 2529.925833000001
+ "latencyMs": 2405.9946670000063
},
{
"questionId": "q22",
@@ -2417,7 +3616,18 @@
"isCorrect": true,
"inputTokens": 5761,
"outputTokens": 6,
- "latencyMs": 1701.0276660000018
+ "latencyMs": 1855.128291999994
+ },
+ {
+ "questionId": "q22",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "111314",
+ "actual": "111314",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 6,
+ "latencyMs": 14026.715166000009
},
{
"questionId": "q23",
@@ -2427,8 +3637,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 6388,
- "outputTokens": 135,
- "latencyMs": 3078.5496249999997
+ "outputTokens": 71,
+ "latencyMs": 2613.9667920000065
},
{
"questionId": "q23",
@@ -2439,7 +3649,18 @@
"isCorrect": true,
"inputTokens": 7868,
"outputTokens": 4,
- "latencyMs": 1224.1848329999993
+ "latencyMs": 914.9832499999902
+ },
+ {
+ "questionId": "q23",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7907,
+ "outputTokens": 1,
+ "latencyMs": 17605.488457999993
},
{
"questionId": "q23",
@@ -2449,8 +3670,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2525,
- "outputTokens": 71,
- "latencyMs": 2287.0156669999997
+ "outputTokens": 455,
+ "latencyMs": 5491.203125
},
{
"questionId": "q23",
@@ -2461,7 +3682,18 @@
"isCorrect": true,
"inputTokens": 2980,
"outputTokens": 4,
- "latencyMs": 1209.1454999999987
+ "latencyMs": 1559.9341249999998
+ },
+ {
+ "questionId": "q23",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3316,
+ "outputTokens": 1,
+ "latencyMs": 12204.927791999988
},
{
"questionId": "q23",
@@ -2472,7 +3704,7 @@
"isCorrect": true,
"inputTokens": 2379,
"outputTokens": 71,
- "latencyMs": 2059.012499999997
+ "latencyMs": 4993.148166999992
},
{
"questionId": "q23",
@@ -2483,29 +3715,51 @@
"isCorrect": true,
"inputTokens": 2854,
"outputTokens": 4,
- "latencyMs": 1393.596375000001
+ "latencyMs": 1479.5367499999993
},
{
"questionId": "q23",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3190,
+ "outputTokens": 1,
+ "latencyMs": 2016.5271659999999
+ },
+ {
+ "questionId": "q23",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6314,
- "outputTokens": 71,
- "latencyMs": 1858.8989159999983
+ "inputTokens": 7355,
+ "outputTokens": 135,
+ "latencyMs": 3785.880541999999
},
{
"questionId": "q23",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6363,
+ "inputTokens": 9358,
"outputTokens": 4,
- "latencyMs": 1193.9375419999997
+ "latencyMs": 1170.9521249999962
+ },
+ {
+ "questionId": "q23",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9096,
+ "outputTokens": 1,
+ "latencyMs": 2376.3025000000052
},
{
"questionId": "q23",
@@ -2515,8 +3769,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 5010,
- "outputTokens": 135,
- "latencyMs": 2755.0157499999987
+ "outputTokens": 71,
+ "latencyMs": 12974.991708999994
},
{
"questionId": "q23",
@@ -2527,7 +3781,18 @@
"isCorrect": true,
"inputTokens": 5758,
"outputTokens": 4,
- "latencyMs": 1366.030666999999
+ "latencyMs": 1062.6410830000095
+ },
+ {
+ "questionId": "q23",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5742,
+ "outputTokens": 1,
+ "latencyMs": 2375.1459170000016
},
{
"questionId": "q24",
@@ -2537,8 +3802,8 @@
"actual": "laurel54@yahoo.com",
"isCorrect": true,
"inputTokens": 6390,
- "outputTokens": 395,
- "latencyMs": 4352.137999999999
+ "outputTokens": 331,
+ "latencyMs": 7831.431874999995
},
{
"questionId": "q24",
@@ -2549,7 +3814,18 @@
"isCorrect": true,
"inputTokens": 7869,
"outputTokens": 10,
- "latencyMs": 1093.9707500000004
+ "latencyMs": 1169.4948749999894
+ },
+ {
+ "questionId": "q24",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "laurel54@yahoo.com",
+ "actual": "laurel54@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 8,
+ "latencyMs": 6873.670041000005
},
{
"questionId": "q24",
@@ -2560,7 +3836,7 @@
"isCorrect": true,
"inputTokens": 2527,
"outputTokens": 139,
- "latencyMs": 2481.934500000003
+ "latencyMs": 2733.310750000004
},
{
"questionId": "q24",
@@ -2571,7 +3847,18 @@
"isCorrect": true,
"inputTokens": 2981,
"outputTokens": 10,
- "latencyMs": 1262.3894579999978
+ "latencyMs": 1465.5957500000077
+ },
+ {
+ "questionId": "q24",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "laurel54@yahoo.com",
+ "actual": "laurel54@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 8,
+ "latencyMs": 12162.723041999998
},
{
"questionId": "q24",
@@ -2581,8 +3868,8 @@
"actual": "laurel54@yahoo.com",
"isCorrect": true,
"inputTokens": 2381,
- "outputTokens": 75,
- "latencyMs": 2360.7159170000014
+ "outputTokens": 203,
+ "latencyMs": 2401.237958999991
},
{
"questionId": "q24",
@@ -2593,29 +3880,51 @@
"isCorrect": true,
"inputTokens": 2855,
"outputTokens": 10,
- "latencyMs": 1462.5894999999946
+ "latencyMs": 976.5733749999927
},
{
"questionId": "q24",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "laurel54@yahoo.com",
+ "actual": "laurel54@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 8,
+ "latencyMs": 1773.305250000005
+ },
+ {
+ "questionId": "q24",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "laurel54@yahoo.com",
"actual": "laurel54@yahoo.com",
"isCorrect": true,
- "inputTokens": 6316,
- "outputTokens": 75,
- "latencyMs": 3247.478041000002
+ "inputTokens": 7357,
+ "outputTokens": 395,
+ "latencyMs": 6293.676041999992
},
{
"questionId": "q24",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "laurel54@yahoo.com",
"actual": "laurel54@yahoo.com",
"isCorrect": true,
- "inputTokens": 6364,
+ "inputTokens": 9359,
"outputTokens": 10,
- "latencyMs": 1693.1597089999996
+ "latencyMs": 1263.188875000007
+ },
+ {
+ "questionId": "q24",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "laurel54@yahoo.com",
+ "actual": "laurel54@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 8,
+ "latencyMs": 1866.224624999988
},
{
"questionId": "q24",
@@ -2626,7 +3935,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 75,
- "latencyMs": 1726.2765839999993
+ "latencyMs": 1734.0090409999975
},
{
"questionId": "q24",
@@ -2637,7 +3946,18 @@
"isCorrect": true,
"inputTokens": 5759,
"outputTokens": 10,
- "latencyMs": 1605.044458000004
+ "latencyMs": 1076.4865419999987
+ },
+ {
+ "questionId": "q24",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "laurel54@yahoo.com",
+ "actual": "laurel54@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 8,
+ "latencyMs": 1799.7341250000027
},
{
"questionId": "q25",
@@ -2648,7 +3968,7 @@
"isCorrect": true,
"inputTokens": 6391,
"outputTokens": 136,
- "latencyMs": 2263.1207090000025
+ "latencyMs": 4268.888999999996
},
{
"questionId": "q25",
@@ -2659,7 +3979,18 @@
"isCorrect": true,
"inputTokens": 7873,
"outputTokens": 6,
- "latencyMs": 3789.016875000001
+ "latencyMs": 1100.426707999999
+ },
+ {
+ "questionId": "q25",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "89553",
+ "actual": "89553",
+ "isCorrect": true,
+ "inputTokens": 7910,
+ "outputTokens": 5,
+ "latencyMs": 905.148000000001
},
{
"questionId": "q25",
@@ -2670,7 +4001,7 @@
"isCorrect": true,
"inputTokens": 2528,
"outputTokens": 72,
- "latencyMs": 1829.9641669999983
+ "latencyMs": 3470.1760000000068
},
{
"questionId": "q25",
@@ -2681,7 +4012,18 @@
"isCorrect": true,
"inputTokens": 2985,
"outputTokens": 6,
- "latencyMs": 989.6153750000012
+ "latencyMs": 1239.0414170000004
+ },
+ {
+ "questionId": "q25",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "89553",
+ "actual": "89553",
+ "isCorrect": true,
+ "inputTokens": 3319,
+ "outputTokens": 5,
+ "latencyMs": 3012.1026249999995
},
{
"questionId": "q25",
@@ -2692,7 +4034,7 @@
"isCorrect": true,
"inputTokens": 2382,
"outputTokens": 72,
- "latencyMs": 2717.4773339999956
+ "latencyMs": 4932.565208
},
{
"questionId": "q25",
@@ -2703,29 +4045,51 @@
"isCorrect": true,
"inputTokens": 2859,
"outputTokens": 6,
- "latencyMs": 1717.8889999999956
+ "latencyMs": 923.8483330000017
},
{
"questionId": "q25",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "89553",
+ "actual": "89553",
+ "isCorrect": true,
+ "inputTokens": 3193,
+ "outputTokens": 5,
+ "latencyMs": 1677.830792000008
+ },
+ {
+ "questionId": "q25",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "89553",
- "actual": "46730",
- "isCorrect": false,
- "inputTokens": 6317,
- "outputTokens": 72,
- "latencyMs": 5490.572667
+ "actual": "89553",
+ "isCorrect": true,
+ "inputTokens": 7358,
+ "outputTokens": 200,
+ "latencyMs": 4701.415708
},
{
"questionId": "q25",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "89553",
"actual": "89553",
"isCorrect": true,
- "inputTokens": 6368,
+ "inputTokens": 9363,
"outputTokens": 6,
- "latencyMs": 1427.4055000000008
+ "latencyMs": 1366.9058340000047
+ },
+ {
+ "questionId": "q25",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "89553",
+ "actual": "89553",
+ "isCorrect": true,
+ "inputTokens": 9099,
+ "outputTokens": 5,
+ "latencyMs": 1693.0314170000056
},
{
"questionId": "q25",
@@ -2735,8 +4099,8 @@
"actual": "89553",
"isCorrect": true,
"inputTokens": 5013,
- "outputTokens": 264,
- "latencyMs": 4052.875957999997
+ "outputTokens": 136,
+ "latencyMs": 5666.829292000009
},
{
"questionId": "q25",
@@ -2747,7 +4111,18 @@
"isCorrect": true,
"inputTokens": 5763,
"outputTokens": 6,
- "latencyMs": 1586.255124999996
+ "latencyMs": 1181.8469999999943
+ },
+ {
+ "questionId": "q25",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "89553",
+ "actual": "89553",
+ "isCorrect": true,
+ "inputTokens": 5745,
+ "outputTokens": 5,
+ "latencyMs": 2083.4975829999894
},
{
"questionId": "q26",
@@ -2757,8 +4132,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 6388,
- "outputTokens": 135,
- "latencyMs": 3787.343541000002
+ "outputTokens": 71,
+ "latencyMs": 2986.76112499999
},
{
"questionId": "q26",
@@ -2769,7 +4144,18 @@
"isCorrect": true,
"inputTokens": 7866,
"outputTokens": 4,
- "latencyMs": 1196.934000000001
+ "latencyMs": 1736.9273340000072
+ },
+ {
+ "questionId": "q26",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7907,
+ "outputTokens": 1,
+ "latencyMs": 1777.5319579999923
},
{
"questionId": "q26",
@@ -2780,7 +4166,7 @@
"isCorrect": true,
"inputTokens": 2525,
"outputTokens": 71,
- "latencyMs": 2172.2377080000006
+ "latencyMs": 2717.0237919999927
},
{
"questionId": "q26",
@@ -2791,7 +4177,18 @@
"isCorrect": true,
"inputTokens": 2978,
"outputTokens": 4,
- "latencyMs": 1112.6987080000035
+ "latencyMs": 874.0303339999955
+ },
+ {
+ "questionId": "q26",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3316,
+ "outputTokens": 1,
+ "latencyMs": 5675.357959000001
},
{
"questionId": "q26",
@@ -2802,7 +4199,7 @@
"isCorrect": true,
"inputTokens": 2379,
"outputTokens": 71,
- "latencyMs": 2074.6067919999987
+ "latencyMs": 3198.773958000005
},
{
"questionId": "q26",
@@ -2813,29 +4210,51 @@
"isCorrect": true,
"inputTokens": 2852,
"outputTokens": 4,
- "latencyMs": 1202.2165000000023
+ "latencyMs": 1085.409707999992
},
{
"questionId": "q26",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3190,
+ "outputTokens": 1,
+ "latencyMs": 1932.898749999993
+ },
+ {
+ "questionId": "q26",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6314,
+ "inputTokens": 7355,
"outputTokens": 135,
- "latencyMs": 3257.5967080000046
+ "latencyMs": 4096.534249999997
},
{
"questionId": "q26",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6361,
+ "inputTokens": 9356,
"outputTokens": 4,
- "latencyMs": 1316.7435000000041
+ "latencyMs": 1258.4983749999956
+ },
+ {
+ "questionId": "q26",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9096,
+ "outputTokens": 1,
+ "latencyMs": 2413.0945409999986
},
{
"questionId": "q26",
@@ -2846,7 +4265,7 @@
"isCorrect": true,
"inputTokens": 5010,
"outputTokens": 71,
- "latencyMs": 2391.9063749999987
+ "latencyMs": 3148.736499999999
},
{
"questionId": "q26",
@@ -2857,7 +4276,18 @@
"isCorrect": true,
"inputTokens": 5756,
"outputTokens": 4,
- "latencyMs": 1208.8820829999968
+ "latencyMs": 1131.4892499999987
+ },
+ {
+ "questionId": "q26",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5742,
+ "outputTokens": 1,
+ "latencyMs": 1526.3339579999883
},
{
"questionId": "q27",
@@ -2868,7 +4298,7 @@
"isCorrect": true,
"inputTokens": 6391,
"outputTokens": 142,
- "latencyMs": 2735.679790999995
+ "latencyMs": 2969.5719580000004
},
{
"questionId": "q27",
@@ -2879,7 +4309,18 @@
"isCorrect": true,
"inputTokens": 7871,
"outputTokens": 14,
- "latencyMs": 1253.706624999999
+ "latencyMs": 2196.764500000005
+ },
+ {
+ "questionId": "q27",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "jayme.kertzmann77@gmail.com",
+ "actual": "jayme.kertzmann77@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 7910,
+ "outputTokens": 12,
+ "latencyMs": 1040.4618750000081
},
{
"questionId": "q27",
@@ -2889,8 +4330,8 @@
"actual": "jayme.kertzmann77@gmail.com",
"isCorrect": true,
"inputTokens": 2528,
- "outputTokens": 142,
- "latencyMs": 2471.819457999998
+ "outputTokens": 78,
+ "latencyMs": 3091.4898329999996
},
{
"questionId": "q27",
@@ -2901,7 +4342,18 @@
"isCorrect": true,
"inputTokens": 2983,
"outputTokens": 14,
- "latencyMs": 1063.2195409999986
+ "latencyMs": 1001.9885000000068
+ },
+ {
+ "questionId": "q27",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "jayme.kertzmann77@gmail.com",
+ "actual": "jayme.kertzmann77@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 3319,
+ "outputTokens": 12,
+ "latencyMs": 3467.2665410000045
},
{
"questionId": "q27",
@@ -2911,8 +4363,8 @@
"actual": "jayme.kertzmann77@gmail.com",
"isCorrect": true,
"inputTokens": 2382,
- "outputTokens": 142,
- "latencyMs": 2061.6382500000036
+ "outputTokens": 78,
+ "latencyMs": 5917.028874999989
},
{
"questionId": "q27",
@@ -2923,29 +4375,51 @@
"isCorrect": true,
"inputTokens": 2857,
"outputTokens": 14,
- "latencyMs": 1877.579082999997
+ "latencyMs": 1305.7503750000033
},
{
"questionId": "q27",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "jayme.kertzmann77@gmail.com",
+ "actual": "jayme.kertzmann77@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 3193,
+ "outputTokens": 12,
+ "latencyMs": 2613.1883329999982
+ },
+ {
+ "questionId": "q27",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "jayme.kertzmann77@gmail.com",
"actual": "jayme.kertzmann77@gmail.com",
"isCorrect": true,
- "inputTokens": 6317,
+ "inputTokens": 7358,
"outputTokens": 142,
- "latencyMs": 3448.810375000001
+ "latencyMs": 2786.5942090000026
},
{
"questionId": "q27",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "jayme.kertzmann77@gmail.com",
"actual": "jayme.kertzmann77@gmail.com",
"isCorrect": true,
- "inputTokens": 6366,
+ "inputTokens": 9361,
"outputTokens": 14,
- "latencyMs": 1265.9410419999986
+ "latencyMs": 2270.722458999997
+ },
+ {
+ "questionId": "q27",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "jayme.kertzmann77@gmail.com",
+ "actual": "jayme.kertzmann77@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 9099,
+ "outputTokens": 12,
+ "latencyMs": 1157.144708000007
},
{
"questionId": "q27",
@@ -2955,8 +4429,8 @@
"actual": "jayme.kertzmann77@gmail.com",
"isCorrect": true,
"inputTokens": 5013,
- "outputTokens": 78,
- "latencyMs": 2152.5591669999994
+ "outputTokens": 142,
+ "latencyMs": 3469.4895829999878
},
{
"questionId": "q27",
@@ -2967,7 +4441,18 @@
"isCorrect": true,
"inputTokens": 5761,
"outputTokens": 14,
- "latencyMs": 1432.513583
+ "latencyMs": 1359.8917079999956
+ },
+ {
+ "questionId": "q27",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "jayme.kertzmann77@gmail.com",
+ "actual": "jayme.kertzmann77@gmail.com",
+ "isCorrect": true,
+ "inputTokens": 5745,
+ "outputTokens": 12,
+ "latencyMs": 2318.6192080000037
},
{
"questionId": "q28",
@@ -2978,7 +4463,7 @@
"isCorrect": true,
"inputTokens": 6390,
"outputTokens": 136,
- "latencyMs": 2707.4454169999954
+ "latencyMs": 4774.099707999994
},
{
"questionId": "q28",
@@ -2989,7 +4474,18 @@
"isCorrect": true,
"inputTokens": 7871,
"outputTokens": 6,
- "latencyMs": 1568.5869169999933
+ "latencyMs": 1098.6865830000024
+ },
+ {
+ "questionId": "q28",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "104053",
+ "actual": "104053",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 6,
+ "latencyMs": 1239.2771659999999
},
{
"questionId": "q28",
@@ -3000,7 +4496,7 @@
"isCorrect": true,
"inputTokens": 2527,
"outputTokens": 136,
- "latencyMs": 2373.4566669999986
+ "latencyMs": 5861.847667000009
},
{
"questionId": "q28",
@@ -3011,7 +4507,18 @@
"isCorrect": true,
"inputTokens": 2983,
"outputTokens": 6,
- "latencyMs": 1525.172749999998
+ "latencyMs": 1297.473874999996
+ },
+ {
+ "questionId": "q28",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "104053",
+ "actual": "104053",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 6,
+ "latencyMs": 1698.9040830000013
},
{
"questionId": "q28",
@@ -3021,8 +4528,8 @@
"actual": "104053",
"isCorrect": true,
"inputTokens": 2381,
- "outputTokens": 136,
- "latencyMs": 9347.989583000002
+ "outputTokens": 72,
+ "latencyMs": 7521.450750000004
},
{
"questionId": "q28",
@@ -3033,29 +4540,51 @@
"isCorrect": true,
"inputTokens": 2857,
"outputTokens": 6,
- "latencyMs": 1748.783334000007
+ "latencyMs": 989.1705420000071
},
{
"questionId": "q28",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "104053",
+ "actual": "104053",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 6,
+ "latencyMs": 1598.6000829999975
+ },
+ {
+ "questionId": "q28",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "104053",
"actual": "104053",
"isCorrect": true,
- "inputTokens": 6316,
- "outputTokens": 72,
- "latencyMs": 1929.517458000002
+ "inputTokens": 7357,
+ "outputTokens": 136,
+ "latencyMs": 4121.990666000012
},
{
"questionId": "q28",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "104053",
"actual": "104053",
"isCorrect": true,
- "inputTokens": 6366,
+ "inputTokens": 9361,
"outputTokens": 6,
- "latencyMs": 1022.1345000000001
+ "latencyMs": 1153.3577499999956
+ },
+ {
+ "questionId": "q28",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "104053",
+ "actual": "104053",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 6,
+ "latencyMs": 5119.164292000001
},
{
"questionId": "q28",
@@ -3066,7 +4595,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 136,
- "latencyMs": 2102.925624999996
+ "latencyMs": 5101.831541000007
},
{
"questionId": "q28",
@@ -3077,7 +4606,18 @@
"isCorrect": true,
"inputTokens": 5761,
"outputTokens": 6,
- "latencyMs": 1471.7255839999998
+ "latencyMs": 1048.2691250000062
+ },
+ {
+ "questionId": "q28",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "104053",
+ "actual": "104053",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 6,
+ "latencyMs": 2109.3487500000047
},
{
"questionId": "q29",
@@ -3087,8 +4627,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 6391,
- "outputTokens": 71,
- "latencyMs": 1983.693041999999
+ "outputTokens": 135,
+ "latencyMs": 3792.2222499999916
},
{
"questionId": "q29",
@@ -3099,7 +4639,18 @@
"isCorrect": true,
"inputTokens": 7872,
"outputTokens": 4,
- "latencyMs": 1077.2119579999999
+ "latencyMs": 1203.301084000006
+ },
+ {
+ "questionId": "q29",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7910,
+ "outputTokens": 1,
+ "latencyMs": 1963.9974580000126
},
{
"questionId": "q29",
@@ -3109,8 +4660,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2528,
- "outputTokens": 71,
- "latencyMs": 2549.1221250000017
+ "outputTokens": 135,
+ "latencyMs": 3127.7867909999914
},
{
"questionId": "q29",
@@ -3121,7 +4672,18 @@
"isCorrect": true,
"inputTokens": 2984,
"outputTokens": 4,
- "latencyMs": 921.1110840000038
+ "latencyMs": 1192.564333000002
+ },
+ {
+ "questionId": "q29",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3319,
+ "outputTokens": 1,
+ "latencyMs": 2034.2360419999895
},
{
"questionId": "q29",
@@ -3131,8 +4693,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2382,
- "outputTokens": 135,
- "latencyMs": 4070.615666999998
+ "outputTokens": 71,
+ "latencyMs": 2648.283917000008
},
{
"questionId": "q29",
@@ -3143,29 +4705,51 @@
"isCorrect": true,
"inputTokens": 2858,
"outputTokens": 4,
- "latencyMs": 974.754832999999
+ "latencyMs": 902.732290999993
},
{
"questionId": "q29",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3193,
+ "outputTokens": 1,
+ "latencyMs": 2174.387124999994
+ },
+ {
+ "questionId": "q29",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6317,
- "outputTokens": 135,
- "latencyMs": 2665.842083000003
+ "inputTokens": 7358,
+ "outputTokens": 71,
+ "latencyMs": 2300.0212080000056
},
{
"questionId": "q29",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6367,
+ "inputTokens": 9362,
"outputTokens": 4,
- "latencyMs": 1081.2904160000035
+ "latencyMs": 963.8994999999995
+ },
+ {
+ "questionId": "q29",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9099,
+ "outputTokens": 1,
+ "latencyMs": 4195.405083000005
},
{
"questionId": "q29",
@@ -3176,7 +4760,7 @@
"isCorrect": true,
"inputTokens": 5013,
"outputTokens": 135,
- "latencyMs": 2897.919332999998
+ "latencyMs": 3398.262333999999
},
{
"questionId": "q29",
@@ -3187,7 +4771,18 @@
"isCorrect": true,
"inputTokens": 5762,
"outputTokens": 4,
- "latencyMs": 1341.0955420000028
+ "latencyMs": 1032.8332079999964
+ },
+ {
+ "questionId": "q29",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5745,
+ "outputTokens": 1,
+ "latencyMs": 2265.614916999999
},
{
"questionId": "q30",
@@ -3197,8 +4792,8 @@
"actual": "carley.bauch@yahoo.com",
"isCorrect": true,
"inputTokens": 6390,
- "outputTokens": 204,
- "latencyMs": 3231.9646249999932
+ "outputTokens": 76,
+ "latencyMs": 2575.189624999999
},
{
"questionId": "q30",
@@ -3209,7 +4804,18 @@
"isCorrect": true,
"inputTokens": 7869,
"outputTokens": 12,
- "latencyMs": 1288.5363330000037
+ "latencyMs": 1003.463208000001
+ },
+ {
+ "questionId": "q30",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "carley.bauch@yahoo.com",
+ "actual": "carley.bauch@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 9,
+ "latencyMs": 1218.547916999989
},
{
"questionId": "q30",
@@ -3220,7 +4826,7 @@
"isCorrect": true,
"inputTokens": 2527,
"outputTokens": 76,
- "latencyMs": 2581.508915999999
+ "latencyMs": 17850.385834000015
},
{
"questionId": "q30",
@@ -3231,7 +4837,18 @@
"isCorrect": true,
"inputTokens": 2981,
"outputTokens": 12,
- "latencyMs": 1183.8337079999983
+ "latencyMs": 1060.4747919999936
+ },
+ {
+ "questionId": "q30",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "carley.bauch@yahoo.com",
+ "actual": "carley.bauch@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 9,
+ "latencyMs": 2927.220583000002
},
{
"questionId": "q30",
@@ -3242,7 +4859,7 @@
"isCorrect": true,
"inputTokens": 2381,
"outputTokens": 140,
- "latencyMs": 2073.944792000002
+ "latencyMs": 2492.920542000007
},
{
"questionId": "q30",
@@ -3253,29 +4870,51 @@
"isCorrect": true,
"inputTokens": 2855,
"outputTokens": 12,
- "latencyMs": 1302.5857499999984
+ "latencyMs": 1167.4384590000118
},
{
"questionId": "q30",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "carley.bauch@yahoo.com",
+ "actual": "carley.bauch@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 9,
+ "latencyMs": 1760.1724159999867
+ },
+ {
+ "questionId": "q30",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "carley.bauch@yahoo.com",
"actual": "carley.bauch@yahoo.com",
"isCorrect": true,
- "inputTokens": 6316,
- "outputTokens": 204,
- "latencyMs": 3076.5304590000014
+ "inputTokens": 7357,
+ "outputTokens": 76,
+ "latencyMs": 2586.2806249999994
},
{
"questionId": "q30",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "carley.bauch@yahoo.com",
"actual": "carley.bauch@yahoo.com",
"isCorrect": true,
- "inputTokens": 6364,
+ "inputTokens": 9359,
"outputTokens": 12,
- "latencyMs": 1110.9787920000017
+ "latencyMs": 1827.6337499999936
+ },
+ {
+ "questionId": "q30",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "carley.bauch@yahoo.com",
+ "actual": "carley.bauch@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 9,
+ "latencyMs": 1985.0590000000084
},
{
"questionId": "q30",
@@ -3286,7 +4925,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 76,
- "latencyMs": 3381.732917000001
+ "latencyMs": 2150.4795000000013
},
{
"questionId": "q30",
@@ -3297,7 +4936,18 @@
"isCorrect": true,
"inputTokens": 5759,
"outputTokens": 12,
- "latencyMs": 1198.1488329999993
+ "latencyMs": 1151.3658339999965
+ },
+ {
+ "questionId": "q30",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "carley.bauch@yahoo.com",
+ "actual": "carley.bauch@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 9,
+ "latencyMs": 2104.947874999998
},
{
"questionId": "q31",
@@ -3308,7 +4958,7 @@
"isCorrect": true,
"inputTokens": 6393,
"outputTokens": 136,
- "latencyMs": 2687.965959000001
+ "latencyMs": 2204.857333000007
},
{
"questionId": "q31",
@@ -3319,7 +4969,18 @@
"isCorrect": true,
"inputTokens": 7874,
"outputTokens": 6,
- "latencyMs": 2615.956250000003
+ "latencyMs": 1366.9736249999987
+ },
+ {
+ "questionId": "q31",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "142029",
+ "actual": "142029",
+ "isCorrect": true,
+ "inputTokens": 7911,
+ "outputTokens": 6,
+ "latencyMs": 1108.5303330000024
},
{
"questionId": "q31",
@@ -3330,7 +4991,7 @@
"isCorrect": true,
"inputTokens": 2530,
"outputTokens": 136,
- "latencyMs": 2132.413249999998
+ "latencyMs": 2809.3447089999972
},
{
"questionId": "q31",
@@ -3341,7 +5002,18 @@
"isCorrect": true,
"inputTokens": 2986,
"outputTokens": 6,
- "latencyMs": 1091.060666999998
+ "latencyMs": 985.2792080000072
+ },
+ {
+ "questionId": "q31",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "142029",
+ "actual": "142029",
+ "isCorrect": true,
+ "inputTokens": 3320,
+ "outputTokens": 6,
+ "latencyMs": 1869.5062499999913
},
{
"questionId": "q31",
@@ -3351,8 +5023,8 @@
"actual": "142029",
"isCorrect": true,
"inputTokens": 2384,
- "outputTokens": 72,
- "latencyMs": 2074.8201670000053
+ "outputTokens": 136,
+ "latencyMs": 2816.2447910000046
},
{
"questionId": "q31",
@@ -3363,29 +5035,51 @@
"isCorrect": true,
"inputTokens": 2860,
"outputTokens": 6,
- "latencyMs": 1622.2757499999934
+ "latencyMs": 1038.263666999992
},
{
"questionId": "q31",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "142029",
+ "actual": "142029",
+ "isCorrect": true,
+ "inputTokens": 3194,
+ "outputTokens": 6,
+ "latencyMs": 1011.8830000000016
+ },
+ {
+ "questionId": "q31",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "142029",
"actual": "142029",
"isCorrect": true,
- "inputTokens": 6319,
+ "inputTokens": 7360,
"outputTokens": 200,
- "latencyMs": 3122.3756670000002
+ "latencyMs": 2650.324915999983
},
{
"questionId": "q31",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "142029",
"actual": "142029",
"isCorrect": true,
- "inputTokens": 6369,
+ "inputTokens": 9364,
"outputTokens": 6,
- "latencyMs": 1175.7301249999946
+ "latencyMs": 1139.189167000004
+ },
+ {
+ "questionId": "q31",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "142029",
+ "actual": "142029",
+ "isCorrect": true,
+ "inputTokens": 9100,
+ "outputTokens": 6,
+ "latencyMs": 1773.4112920000043
},
{
"questionId": "q31",
@@ -3396,7 +5090,7 @@
"isCorrect": true,
"inputTokens": 5015,
"outputTokens": 136,
- "latencyMs": 2601.074916999998
+ "latencyMs": 2481.3391249999986
},
{
"questionId": "q31",
@@ -3407,7 +5101,18 @@
"isCorrect": true,
"inputTokens": 5764,
"outputTokens": 6,
- "latencyMs": 1089.4757079999981
+ "latencyMs": 1290.1707079999906
+ },
+ {
+ "questionId": "q31",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "142029",
+ "actual": "142029",
+ "isCorrect": true,
+ "inputTokens": 5746,
+ "outputTokens": 6,
+ "latencyMs": 2289.944292
},
{
"questionId": "q32",
@@ -3418,7 +5123,7 @@
"isCorrect": true,
"inputTokens": 6389,
"outputTokens": 135,
- "latencyMs": 6939.617750000005
+ "latencyMs": 4142.8067919999885
},
{
"questionId": "q32",
@@ -3429,7 +5134,18 @@
"isCorrect": true,
"inputTokens": 7869,
"outputTokens": 4,
- "latencyMs": 1207.9619999999995
+ "latencyMs": 1067.801999999996
+ },
+ {
+ "questionId": "q32",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 1,
+ "latencyMs": 1057.6598330000124
},
{
"questionId": "q32",
@@ -3440,7 +5156,7 @@
"isCorrect": true,
"inputTokens": 2526,
"outputTokens": 135,
- "latencyMs": 2784.063166
+ "latencyMs": 2198.369875000004
},
{
"questionId": "q32",
@@ -3451,7 +5167,18 @@
"isCorrect": true,
"inputTokens": 2981,
"outputTokens": 4,
- "latencyMs": 1011.0956670000014
+ "latencyMs": 1228.235249999998
+ },
+ {
+ "questionId": "q32",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 1,
+ "latencyMs": 2113.6464160000032
},
{
"questionId": "q32",
@@ -3462,7 +5189,7 @@
"isCorrect": true,
"inputTokens": 2380,
"outputTokens": 135,
- "latencyMs": 3098.7147909999985
+ "latencyMs": 2331.9615420000046
},
{
"questionId": "q32",
@@ -3473,29 +5200,51 @@
"isCorrect": true,
"inputTokens": 2855,
"outputTokens": 4,
- "latencyMs": 983.9449170000007
+ "latencyMs": 1010.4068330000155
},
{
"questionId": "q32",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 1,
+ "latencyMs": 1529.0002080000122
+ },
+ {
+ "questionId": "q32",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6315,
- "outputTokens": 135,
- "latencyMs": 3889.572291999997
+ "inputTokens": 7356,
+ "outputTokens": 199,
+ "latencyMs": 4986.682375000004
},
{
"questionId": "q32",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6364,
+ "inputTokens": 9359,
"outputTokens": 4,
- "latencyMs": 1096.1613339999967
+ "latencyMs": 1295.2261669999862
+ },
+ {
+ "questionId": "q32",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 1,
+ "latencyMs": 2608.518458000006
},
{
"questionId": "q32",
@@ -3506,7 +5255,7 @@
"isCorrect": true,
"inputTokens": 5011,
"outputTokens": 71,
- "latencyMs": 2484.078917000006
+ "latencyMs": 1683.7294159999874
},
{
"questionId": "q32",
@@ -3517,7 +5266,18 @@
"isCorrect": true,
"inputTokens": 5759,
"outputTokens": 4,
- "latencyMs": 1150.418792000004
+ "latencyMs": 1466.112374999997
+ },
+ {
+ "questionId": "q32",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 1,
+ "latencyMs": 2186.13829100001
},
{
"questionId": "q33",
@@ -3527,8 +5287,8 @@
"actual": "cheyenne_skiles@hotmail.com",
"isCorrect": true,
"inputTokens": 6393,
- "outputTokens": 140,
- "latencyMs": 2221.4447079999954
+ "outputTokens": 204,
+ "latencyMs": 4101.640291000018
},
{
"questionId": "q33",
@@ -3539,7 +5299,18 @@
"isCorrect": true,
"inputTokens": 7872,
"outputTokens": 14,
- "latencyMs": 1193.9583749999947
+ "latencyMs": 1355.6347499999974
+ },
+ {
+ "questionId": "q33",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "cheyenne_skiles@hotmail.com",
+ "actual": "cheyenne_skiles@hotmail.com",
+ "isCorrect": true,
+ "inputTokens": 7911,
+ "outputTokens": 9,
+ "latencyMs": 1218.3612080000166
},
{
"questionId": "q33",
@@ -3549,8 +5320,8 @@
"actual": "cheyenne_skiles@hotmail.com",
"isCorrect": true,
"inputTokens": 2530,
- "outputTokens": 76,
- "latencyMs": 2170.8865829999995
+ "outputTokens": 140,
+ "latencyMs": 2800.1185839999816
},
{
"questionId": "q33",
@@ -3561,7 +5332,18 @@
"isCorrect": true,
"inputTokens": 2984,
"outputTokens": 14,
- "latencyMs": 1247.6116660000043
+ "latencyMs": 1477.837124999991
+ },
+ {
+ "questionId": "q33",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "cheyenne_skiles@hotmail.com",
+ "actual": "cheyenne_skiles@hotmail.com",
+ "isCorrect": true,
+ "inputTokens": 3320,
+ "outputTokens": 9,
+ "latencyMs": 1545.5144169999985
},
{
"questionId": "q33",
@@ -3572,7 +5354,7 @@
"isCorrect": true,
"inputTokens": 2384,
"outputTokens": 76,
- "latencyMs": 3827.705667000002
+ "latencyMs": 3839.476958000014
},
{
"questionId": "q33",
@@ -3583,29 +5365,51 @@
"isCorrect": true,
"inputTokens": 2858,
"outputTokens": 14,
- "latencyMs": 1084.8218339999949
+ "latencyMs": 1138.701000000001
},
{
"questionId": "q33",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "cheyenne_skiles@hotmail.com",
+ "actual": "cheyenne_skiles@hotmail.com",
+ "isCorrect": true,
+ "inputTokens": 3194,
+ "outputTokens": 9,
+ "latencyMs": 928.7706250000047
+ },
+ {
+ "questionId": "q33",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "cheyenne_skiles@hotmail.com",
"actual": "cheyenne_skiles@hotmail.com",
"isCorrect": true,
- "inputTokens": 6319,
+ "inputTokens": 7360,
"outputTokens": 140,
- "latencyMs": 3311.8220839999994
+ "latencyMs": 2666.2794580000045
},
{
"questionId": "q33",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "cheyenne_skiles@hotmail.com",
"actual": "cheyenne_skiles@hotmail.com",
"isCorrect": true,
- "inputTokens": 6367,
+ "inputTokens": 9362,
"outputTokens": 14,
- "latencyMs": 1269.2092920000068
+ "latencyMs": 2169.680166999984
+ },
+ {
+ "questionId": "q33",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "cheyenne_skiles@hotmail.com",
+ "actual": "cheyenne_skiles@hotmail.com",
+ "isCorrect": true,
+ "inputTokens": 9100,
+ "outputTokens": 9,
+ "latencyMs": 1705.846458999993
},
{
"questionId": "q33",
@@ -3615,8 +5419,8 @@
"actual": "cheyenne_skiles@hotmail.com",
"isCorrect": true,
"inputTokens": 5015,
- "outputTokens": 140,
- "latencyMs": 2648.3102500000023
+ "outputTokens": 76,
+ "latencyMs": 2263.530958999996
},
{
"questionId": "q33",
@@ -3627,7 +5431,18 @@
"isCorrect": true,
"inputTokens": 5762,
"outputTokens": 14,
- "latencyMs": 1278.0403750000041
+ "latencyMs": 1402.7602079999924
+ },
+ {
+ "questionId": "q33",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "cheyenne_skiles@hotmail.com",
+ "actual": "cheyenne_skiles@hotmail.com",
+ "isCorrect": true,
+ "inputTokens": 5746,
+ "outputTokens": 9,
+ "latencyMs": 2376.068292000011
},
{
"questionId": "q34",
@@ -3637,8 +5452,8 @@
"actual": "84650",
"isCorrect": true,
"inputTokens": 6391,
- "outputTokens": 136,
- "latencyMs": 3555.1511670000036
+ "outputTokens": 72,
+ "latencyMs": 2438.071291
},
{
"questionId": "q34",
@@ -3649,7 +5464,18 @@
"isCorrect": true,
"inputTokens": 7871,
"outputTokens": 6,
- "latencyMs": 1317.5797499999971
+ "latencyMs": 1119.892125000013
+ },
+ {
+ "questionId": "q34",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "84650",
+ "actual": "84650",
+ "isCorrect": true,
+ "inputTokens": 7910,
+ "outputTokens": 5,
+ "latencyMs": 1219.9752500000177
},
{
"questionId": "q34",
@@ -3660,7 +5486,7 @@
"isCorrect": true,
"inputTokens": 2528,
"outputTokens": 136,
- "latencyMs": 2291.943041999999
+ "latencyMs": 3074.212375000003
},
{
"questionId": "q34",
@@ -3671,7 +5497,18 @@
"isCorrect": true,
"inputTokens": 2983,
"outputTokens": 6,
- "latencyMs": 2081.3947499999995
+ "latencyMs": 1182.489499999996
+ },
+ {
+ "questionId": "q34",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "84650",
+ "actual": "84650",
+ "isCorrect": true,
+ "inputTokens": 3319,
+ "outputTokens": 5,
+ "latencyMs": 2366.0734999999986
},
{
"questionId": "q34",
@@ -3682,7 +5519,7 @@
"isCorrect": true,
"inputTokens": 2382,
"outputTokens": 72,
- "latencyMs": 2067.9348329999993
+ "latencyMs": 3682.4087500000023
},
{
"questionId": "q34",
@@ -3693,29 +5530,51 @@
"isCorrect": true,
"inputTokens": 2857,
"outputTokens": 6,
- "latencyMs": 1192.6603340000001
+ "latencyMs": 865.8139159999846
},
{
"questionId": "q34",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "84650",
+ "actual": "84650",
+ "isCorrect": true,
+ "inputTokens": 3193,
+ "outputTokens": 5,
+ "latencyMs": 1594.2567079999717
+ },
+ {
+ "questionId": "q34",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "84650",
"actual": "84650",
"isCorrect": true,
- "inputTokens": 6317,
+ "inputTokens": 7358,
"outputTokens": 200,
- "latencyMs": 3044.592457999999
+ "latencyMs": 9620.968290999997
},
{
"questionId": "q34",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "84650",
"actual": "84650",
"isCorrect": true,
- "inputTokens": 6366,
+ "inputTokens": 9361,
"outputTokens": 6,
- "latencyMs": 1106.2235409999994
+ "latencyMs": 1066.5026659999858
+ },
+ {
+ "questionId": "q34",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "84650",
+ "actual": "84650",
+ "isCorrect": true,
+ "inputTokens": 9099,
+ "outputTokens": 5,
+ "latencyMs": 2701.866624999995
},
{
"questionId": "q34",
@@ -3726,7 +5585,7 @@
"isCorrect": true,
"inputTokens": 5013,
"outputTokens": 136,
- "latencyMs": 2627.8240000000005
+ "latencyMs": 3559.778957999981
},
{
"questionId": "q34",
@@ -3737,7 +5596,18 @@
"isCorrect": true,
"inputTokens": 5761,
"outputTokens": 6,
- "latencyMs": 1379.9015
+ "latencyMs": 1008.4788750000007
+ },
+ {
+ "questionId": "q34",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "84650",
+ "actual": "84650",
+ "isCorrect": true,
+ "inputTokens": 5745,
+ "outputTokens": 5,
+ "latencyMs": 1889.822375000018
},
{
"questionId": "q35",
@@ -3747,8 +5617,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 6390,
- "outputTokens": 263,
- "latencyMs": 3705.3900829999984
+ "outputTokens": 71,
+ "latencyMs": 3083.3981669999775
},
{
"questionId": "q35",
@@ -3759,7 +5629,18 @@
"isCorrect": true,
"inputTokens": 7871,
"outputTokens": 4,
- "latencyMs": 1909.4442500000005
+ "latencyMs": 1060.2027909999888
+ },
+ {
+ "questionId": "q35",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 1,
+ "latencyMs": 1432.9026670000167
},
{
"questionId": "q35",
@@ -3769,8 +5650,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2527,
- "outputTokens": 135,
- "latencyMs": 2173.6019589999996
+ "outputTokens": 71,
+ "latencyMs": 2827.286916000012
},
{
"questionId": "q35",
@@ -3781,7 +5662,18 @@
"isCorrect": true,
"inputTokens": 2983,
"outputTokens": 4,
- "latencyMs": 1063.8584580000024
+ "latencyMs": 1606.289208000002
+ },
+ {
+ "questionId": "q35",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 1,
+ "latencyMs": 1781.2257079999836
},
{
"questionId": "q35",
@@ -3791,8 +5683,8 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 2381,
- "outputTokens": 71,
- "latencyMs": 1800.4930420000019
+ "outputTokens": 135,
+ "latencyMs": 2855.722792000015
},
{
"questionId": "q35",
@@ -3803,29 +5695,51 @@
"isCorrect": true,
"inputTokens": 2857,
"outputTokens": 4,
- "latencyMs": 1011.3969579999975
+ "latencyMs": 1140.299874999997
},
{
"questionId": "q35",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 1,
+ "latencyMs": 2195.365832999989
+ },
+ {
+ "questionId": "q35",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6316,
+ "inputTokens": 7357,
"outputTokens": 135,
- "latencyMs": 2562.2492500000008
+ "latencyMs": 2904.48324999999
},
{
"questionId": "q35",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6366,
+ "inputTokens": 9361,
"outputTokens": 4,
- "latencyMs": 1349.1809170000051
+ "latencyMs": 1264.2794160000049
+ },
+ {
+ "questionId": "q35",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 1,
+ "latencyMs": 3598.464708000014
},
{
"questionId": "q35",
@@ -3836,7 +5750,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 71,
- "latencyMs": 1883.7523750000037
+ "latencyMs": 2646.219666000019
},
{
"questionId": "q35",
@@ -3847,7 +5761,18 @@
"isCorrect": true,
"inputTokens": 5761,
"outputTokens": 4,
- "latencyMs": 1135.412292000001
+ "latencyMs": 1090.8027500000026
+ },
+ {
+ "questionId": "q35",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 1,
+ "latencyMs": 2322.022082999989
},
{
"questionId": "q36",
@@ -3857,52 +5782,74 @@
"actual": "macey.gottlieb5@yahoo.com",
"isCorrect": true,
"inputTokens": 6389,
- "outputTokens": 334,
- "latencyMs": 4067.161957999997
- },
- {
- "questionId": "q36",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "macey.gottlieb5@yahoo.com",
- "actual": "macey.gottlieb5@yahoo.com",
- "isCorrect": true,
- "inputTokens": 7869,
- "outputTokens": 14,
- "latencyMs": 1333.0713749999995
- },
- {
- "questionId": "q36",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "macey.gottlieb5@yahoo.com",
- "actual": "macey.gottlieb5@yahoo.com",
- "isCorrect": true,
- "inputTokens": 2526,
- "outputTokens": 142,
- "latencyMs": 2081.8315000000002
- },
- {
- "questionId": "q36",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "macey.gottlieb5@yahoo.com",
- "actual": "macey.gottlieb5@yahoo.com",
- "isCorrect": true,
- "inputTokens": 2981,
- "outputTokens": 14,
- "latencyMs": 1231.0224579999995
- },
- {
- "questionId": "q36",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "macey.gottlieb5@yahoo.com",
- "actual": "macey.gottlieb5@yahoo.com",
- "isCorrect": true,
- "inputTokens": 2380,
"outputTokens": 78,
- "latencyMs": 2333.0360409999994
+ "latencyMs": 2498.7566669999796
+ },
+ {
+ "questionId": "q36",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "macey.gottlieb5@yahoo.com",
+ "actual": "macey.gottlieb5@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 7869,
+ "outputTokens": 14,
+ "latencyMs": 1563.026332999987
+ },
+ {
+ "questionId": "q36",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "macey.gottlieb5@yahoo.com",
+ "actual": "macey.gottlieb5@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 11,
+ "latencyMs": 1062.8037919999915
+ },
+ {
+ "questionId": "q36",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "macey.gottlieb5@yahoo.com",
+ "actual": "macey.gottlieb5@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 2526,
+ "outputTokens": 590,
+ "latencyMs": 9420.16175
+ },
+ {
+ "questionId": "q36",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "macey.gottlieb5@yahoo.com",
+ "actual": "macey.gottlieb5@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 2981,
+ "outputTokens": 14,
+ "latencyMs": 1038.3448750000098
+ },
+ {
+ "questionId": "q36",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "macey.gottlieb5@yahoo.com",
+ "actual": "macey.gottlieb5@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 11,
+ "latencyMs": 3468.648833000014
+ },
+ {
+ "questionId": "q36",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "macey.gottlieb5@yahoo.com",
+ "actual": "macey.gottlieb5@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 2380,
+ "outputTokens": 142,
+ "latencyMs": 3061.706208000018
},
{
"questionId": "q36",
@@ -3913,29 +5860,51 @@
"isCorrect": true,
"inputTokens": 2855,
"outputTokens": 14,
- "latencyMs": 1175.1937500000058
+ "latencyMs": 1053.0741669999843
},
{
"questionId": "q36",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "macey.gottlieb5@yahoo.com",
+ "actual": "macey.gottlieb5@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 11,
+ "latencyMs": 1576.9219160000212
+ },
+ {
+ "questionId": "q36",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "macey.gottlieb5@yahoo.com",
"actual": "macey.gottlieb5@yahoo.com",
"isCorrect": true,
- "inputTokens": 6315,
- "outputTokens": 206,
- "latencyMs": 7391.094749999997
+ "inputTokens": 7356,
+ "outputTokens": 78,
+ "latencyMs": 1889.579624999984
},
{
"questionId": "q36",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "macey.gottlieb5@yahoo.com",
"actual": "macey.gottlieb5@yahoo.com",
"isCorrect": true,
- "inputTokens": 6364,
+ "inputTokens": 9359,
"outputTokens": 14,
- "latencyMs": 1843.981458000002
+ "latencyMs": 1520.9462920000078
+ },
+ {
+ "questionId": "q36",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "macey.gottlieb5@yahoo.com",
+ "actual": "macey.gottlieb5@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 11,
+ "latencyMs": 1917.4184999999998
},
{
"questionId": "q36",
@@ -3946,7 +5915,7 @@
"isCorrect": true,
"inputTokens": 5011,
"outputTokens": 142,
- "latencyMs": 2386.8134589999972
+ "latencyMs": 4630.122166999994
},
{
"questionId": "q36",
@@ -3957,7 +5926,18 @@
"isCorrect": true,
"inputTokens": 5759,
"outputTokens": 14,
- "latencyMs": 1449.751750000003
+ "latencyMs": 1646.354083000013
+ },
+ {
+ "questionId": "q36",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "macey.gottlieb5@yahoo.com",
+ "actual": "macey.gottlieb5@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 11,
+ "latencyMs": 2197.673375000013
},
{
"questionId": "q37",
@@ -3967,8 +5947,8 @@
"actual": "89773",
"isCorrect": true,
"inputTokens": 6389,
- "outputTokens": 136,
- "latencyMs": 4075.600666999999
+ "outputTokens": 72,
+ "latencyMs": 3646.0600829999894
},
{
"questionId": "q37",
@@ -3979,7 +5959,18 @@
"isCorrect": true,
"inputTokens": 7868,
"outputTokens": 6,
- "latencyMs": 985.1729999999952
+ "latencyMs": 1356.2343330000003
+ },
+ {
+ "questionId": "q37",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "89773",
+ "actual": "89773",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 5,
+ "latencyMs": 735.1860419999866
},
{
"questionId": "q37",
@@ -3990,7 +5981,7 @@
"isCorrect": true,
"inputTokens": 2526,
"outputTokens": 136,
- "latencyMs": 2891.2602079999997
+ "latencyMs": 2701.791499999992
},
{
"questionId": "q37",
@@ -4001,7 +5992,18 @@
"isCorrect": true,
"inputTokens": 2980,
"outputTokens": 6,
- "latencyMs": 2073.129000000001
+ "latencyMs": 1259.3909169999824
+ },
+ {
+ "questionId": "q37",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "89773",
+ "actual": "89773",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 5,
+ "latencyMs": 1960.7033339999907
},
{
"questionId": "q37",
@@ -4012,7 +6014,7 @@
"isCorrect": true,
"inputTokens": 2380,
"outputTokens": 72,
- "latencyMs": 1894.3316669999986
+ "latencyMs": 5573.357083999988
},
{
"questionId": "q37",
@@ -4023,29 +6025,51 @@
"isCorrect": true,
"inputTokens": 2854,
"outputTokens": 6,
- "latencyMs": 1172.3735000000015
+ "latencyMs": 1284.3673750000016
},
{
"questionId": "q37",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "89773",
+ "actual": "89773",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 5,
+ "latencyMs": 2050.5506659999955
+ },
+ {
+ "questionId": "q37",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "89773",
"actual": "89773",
"isCorrect": true,
- "inputTokens": 6315,
- "outputTokens": 72,
- "latencyMs": 2456.6511249999967
+ "inputTokens": 7356,
+ "outputTokens": 136,
+ "latencyMs": 3253.602791000012
},
{
"questionId": "q37",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "89773",
"actual": "89773",
"isCorrect": true,
- "inputTokens": 6363,
+ "inputTokens": 9358,
"outputTokens": 6,
- "latencyMs": 1298.1367079999982
+ "latencyMs": 1146.329166999989
+ },
+ {
+ "questionId": "q37",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "89773",
+ "actual": "89773",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 5,
+ "latencyMs": 2395.673125000001
},
{
"questionId": "q37",
@@ -4055,8 +6079,8 @@
"actual": "89773",
"isCorrect": true,
"inputTokens": 5011,
- "outputTokens": 136,
- "latencyMs": 6018.304375
+ "outputTokens": 72,
+ "latencyMs": 2913.434957999998
},
{
"questionId": "q37",
@@ -4067,7 +6091,18 @@
"isCorrect": true,
"inputTokens": 5758,
"outputTokens": 6,
- "latencyMs": 1103.9152499999982
+ "latencyMs": 2243.595874999999
+ },
+ {
+ "questionId": "q37",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "89773",
+ "actual": "89773",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 5,
+ "latencyMs": 1839.661374999996
},
{
"questionId": "q38",
@@ -4077,30 +6112,41 @@
"actual": "Marketing",
"isCorrect": true,
"inputTokens": 6389,
- "outputTokens": 71,
- "latencyMs": 3867.303832999998
- },
- {
- "questionId": "q38",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "Marketing",
- "actual": "Marketing",
- "isCorrect": true,
- "inputTokens": 7868,
- "outputTokens": 4,
- "latencyMs": 1287.7528749999983
- },
- {
- "questionId": "q38",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "Marketing",
- "actual": "Marketing",
- "isCorrect": true,
- "inputTokens": 2526,
"outputTokens": 135,
- "latencyMs": 2355.0305829999998
+ "latencyMs": 2779.79579199999
+ },
+ {
+ "questionId": "q38",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7868,
+ "outputTokens": 4,
+ "latencyMs": 1133.7338750000054
+ },
+ {
+ "questionId": "q38",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 1,
+ "latencyMs": 774.6977079999924
+ },
+ {
+ "questionId": "q38",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 2526,
+ "outputTokens": 71,
+ "latencyMs": 4311.999750000017
},
{
"questionId": "q38",
@@ -4111,7 +6157,18 @@
"isCorrect": true,
"inputTokens": 2980,
"outputTokens": 4,
- "latencyMs": 1086.8424579999992
+ "latencyMs": 2223.9427499999874
+ },
+ {
+ "questionId": "q38",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 1,
+ "latencyMs": 2975.953125
},
{
"questionId": "q38",
@@ -4122,7 +6179,7 @@
"isCorrect": true,
"inputTokens": 2380,
"outputTokens": 71,
- "latencyMs": 3472.6323339999944
+ "latencyMs": 4617.852291999996
},
{
"questionId": "q38",
@@ -4133,29 +6190,51 @@
"isCorrect": true,
"inputTokens": 2854,
"outputTokens": 4,
- "latencyMs": 948.3086249999978
+ "latencyMs": 1096.2197500000184
},
{
"questionId": "q38",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 1,
+ "latencyMs": 2754.3287919999857
+ },
+ {
+ "questionId": "q38",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6315,
- "outputTokens": 71,
- "latencyMs": 3343.3446659999972
+ "inputTokens": 7356,
+ "outputTokens": 135,
+ "latencyMs": 3539.3821250000037
},
{
"questionId": "q38",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Marketing",
"actual": "Marketing",
"isCorrect": true,
- "inputTokens": 6363,
+ "inputTokens": 9358,
"outputTokens": 4,
- "latencyMs": 1048.567959
+ "latencyMs": 1369.516082999995
+ },
+ {
+ "questionId": "q38",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 1,
+ "latencyMs": 2677.958791000012
},
{
"questionId": "q38",
@@ -4166,7 +6245,7 @@
"isCorrect": true,
"inputTokens": 5011,
"outputTokens": 71,
- "latencyMs": 3761.141875000001
+ "latencyMs": 2209.974041999987
},
{
"questionId": "q38",
@@ -4177,7 +6256,18 @@
"isCorrect": true,
"inputTokens": 5758,
"outputTokens": 4,
- "latencyMs": 1130.9393339999951
+ "latencyMs": 1352.3056670000078
+ },
+ {
+ "questionId": "q38",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Marketing",
+ "actual": "Marketing",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 1,
+ "latencyMs": 2126.258208000014
},
{
"questionId": "q39",
@@ -4187,8 +6277,8 @@
"actual": "georgianna_renner@yahoo.com",
"isCorrect": true,
"inputTokens": 6389,
- "outputTokens": 79,
- "latencyMs": 4200.215792000003
+ "outputTokens": 207,
+ "latencyMs": 3999.7677079999994
},
{
"questionId": "q39",
@@ -4199,7 +6289,18 @@
"isCorrect": true,
"inputTokens": 7869,
"outputTokens": 13,
- "latencyMs": 1351.981166999998
+ "latencyMs": 1170.8554579999764
+ },
+ {
+ "questionId": "q39",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "georgianna_renner@yahoo.com",
+ "actual": "georgianna_renner@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 7908,
+ "outputTokens": 10,
+ "latencyMs": 1278.5721670000057
},
{
"questionId": "q39",
@@ -4210,7 +6311,7 @@
"isCorrect": true,
"inputTokens": 2526,
"outputTokens": 143,
- "latencyMs": 2465.4245840000003
+ "latencyMs": 3334.013791000005
},
{
"questionId": "q39",
@@ -4221,7 +6322,18 @@
"isCorrect": true,
"inputTokens": 2981,
"outputTokens": 13,
- "latencyMs": 885.4770840000056
+ "latencyMs": 1115.4245419999934
+ },
+ {
+ "questionId": "q39",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "georgianna_renner@yahoo.com",
+ "actual": "georgianna_renner@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3317,
+ "outputTokens": 10,
+ "latencyMs": 2555.918707999983
},
{
"questionId": "q39",
@@ -4232,7 +6344,7 @@
"isCorrect": true,
"inputTokens": 2380,
"outputTokens": 143,
- "latencyMs": 2903.201958000005
+ "latencyMs": 2100.1043329999957
},
{
"questionId": "q39",
@@ -4243,29 +6355,51 @@
"isCorrect": true,
"inputTokens": 2855,
"outputTokens": 13,
- "latencyMs": 1006.1219579999961
+ "latencyMs": 1298.810999999987
},
{
"questionId": "q39",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "georgianna_renner@yahoo.com",
+ "actual": "georgianna_renner@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 3191,
+ "outputTokens": 10,
+ "latencyMs": 1940.2669170000008
+ },
+ {
+ "questionId": "q39",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "georgianna_renner@yahoo.com",
"actual": "georgianna_renner@yahoo.com",
"isCorrect": true,
- "inputTokens": 6315,
- "outputTokens": 207,
- "latencyMs": 3253.900333999998
+ "inputTokens": 7356,
+ "outputTokens": 143,
+ "latencyMs": 2666.5189580000006
},
{
"questionId": "q39",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "georgianna_renner@yahoo.com",
"actual": "georgianna_renner@yahoo.com",
"isCorrect": true,
- "inputTokens": 6364,
+ "inputTokens": 9359,
"outputTokens": 13,
- "latencyMs": 1219.713582999997
+ "latencyMs": 1611.7814170000202
+ },
+ {
+ "questionId": "q39",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "georgianna_renner@yahoo.com",
+ "actual": "georgianna_renner@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 9097,
+ "outputTokens": 10,
+ "latencyMs": 1709.3350419999915
},
{
"questionId": "q39",
@@ -4276,7 +6410,7 @@
"isCorrect": true,
"inputTokens": 5011,
"outputTokens": 143,
- "latencyMs": 2335.6635000000024
+ "latencyMs": 4774.929042000003
},
{
"questionId": "q39",
@@ -4287,7 +6421,18 @@
"isCorrect": true,
"inputTokens": 5759,
"outputTokens": 13,
- "latencyMs": 1334.1358330000003
+ "latencyMs": 1369.8504160000011
+ },
+ {
+ "questionId": "q39",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "georgianna_renner@yahoo.com",
+ "actual": "georgianna_renner@yahoo.com",
+ "isCorrect": true,
+ "inputTokens": 5743,
+ "outputTokens": 10,
+ "latencyMs": 3123.9857920000213
},
{
"questionId": "q40",
@@ -4297,8 +6442,8 @@
"actual": "49741",
"isCorrect": true,
"inputTokens": 6390,
- "outputTokens": 136,
- "latencyMs": 1912.2536669999972
+ "outputTokens": 72,
+ "latencyMs": 2700.2800830000197
},
{
"questionId": "q40",
@@ -4309,7 +6454,18 @@
"isCorrect": true,
"inputTokens": 7871,
"outputTokens": 6,
- "latencyMs": 1104.4684160000033
+ "latencyMs": 1145.983292000019
+ },
+ {
+ "questionId": "q40",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "49741",
+ "actual": "49741",
+ "isCorrect": true,
+ "inputTokens": 7909,
+ "outputTokens": 5,
+ "latencyMs": 952.1742089999898
},
{
"questionId": "q40",
@@ -4320,7 +6476,7 @@
"isCorrect": true,
"inputTokens": 2527,
"outputTokens": 72,
- "latencyMs": 2648.919750000001
+ "latencyMs": 2220.3111250000075
},
{
"questionId": "q40",
@@ -4331,7 +6487,18 @@
"isCorrect": true,
"inputTokens": 2983,
"outputTokens": 6,
- "latencyMs": 1525.6309170000022
+ "latencyMs": 981.9718339999963
+ },
+ {
+ "questionId": "q40",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "49741",
+ "actual": "49741",
+ "isCorrect": true,
+ "inputTokens": 3318,
+ "outputTokens": 5,
+ "latencyMs": 2079.9035830000066
},
{
"questionId": "q40",
@@ -4342,7 +6509,7 @@
"isCorrect": true,
"inputTokens": 2381,
"outputTokens": 136,
- "latencyMs": 2736.3283749999973
+ "latencyMs": 2519.2579590000096
},
{
"questionId": "q40",
@@ -4353,29 +6520,51 @@
"isCorrect": false,
"inputTokens": 2857,
"outputTokens": 6,
- "latencyMs": 1077.766334
+ "latencyMs": 942.0043329999899
},
{
"questionId": "q40",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "49741",
+ "actual": "49741",
+ "isCorrect": true,
+ "inputTokens": 3192,
+ "outputTokens": 5,
+ "latencyMs": 1683.0637080000015
+ },
+ {
+ "questionId": "q40",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "49741",
"actual": "49741",
"isCorrect": true,
- "inputTokens": 6316,
+ "inputTokens": 7357,
"outputTokens": 72,
- "latencyMs": 2116.5284170000014
+ "latencyMs": 2190.1603750000068
},
{
"questionId": "q40",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "49741",
"actual": "49741",
"isCorrect": true,
- "inputTokens": 6366,
+ "inputTokens": 9361,
"outputTokens": 6,
- "latencyMs": 1159.7744170000005
+ "latencyMs": 1771.8361250000016
+ },
+ {
+ "questionId": "q40",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "49741",
+ "actual": "49741",
+ "isCorrect": true,
+ "inputTokens": 9098,
+ "outputTokens": 5,
+ "latencyMs": 2376.372875000001
},
{
"questionId": "q40",
@@ -4386,7 +6575,7 @@
"isCorrect": true,
"inputTokens": 5012,
"outputTokens": 72,
- "latencyMs": 2529.7074160000047
+ "latencyMs": 2355.175791000016
},
{
"questionId": "q40",
@@ -4397,7 +6586,18 @@
"isCorrect": true,
"inputTokens": 5761,
"outputTokens": 6,
- "latencyMs": 1604.601791999994
+ "latencyMs": 1192.191541999986
+ },
+ {
+ "questionId": "q40",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "49741",
+ "actual": "49741",
+ "isCorrect": true,
+ "inputTokens": 5744,
+ "outputTokens": 5,
+ "latencyMs": 2328.137166999979
},
{
"questionId": "q41",
@@ -4407,121 +6607,11 @@
"actual": "17",
"isCorrect": true,
"inputTokens": 6387,
- "outputTokens": 967,
- "latencyMs": 8300.216583000001
- },
- {
- "questionId": "q41",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "17",
- "actual": "15",
- "isCorrect": false,
- "inputTokens": 7865,
- "outputTokens": 5,
- "latencyMs": 1204.089749999992
- },
- {
- "questionId": "q41",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "17",
- "actual": "17",
- "isCorrect": true,
- "inputTokens": 2524,
- "outputTokens": 455,
- "latencyMs": 5231.604541000001
- },
- {
- "questionId": "q41",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "17",
- "actual": "15",
- "isCorrect": false,
- "inputTokens": 2977,
- "outputTokens": 5,
- "latencyMs": 1168.508707999994
- },
- {
- "questionId": "q41",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "17",
- "actual": "17",
- "isCorrect": true,
- "inputTokens": 2378,
- "outputTokens": 967,
- "latencyMs": 8396.912500000006
- },
- {
- "questionId": "q41",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "17",
- "actual": "15",
- "isCorrect": false,
- "inputTokens": 2851,
- "outputTokens": 5,
- "latencyMs": 1060.6276250000083
- },
- {
- "questionId": "q41",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "17",
- "actual": "17",
- "isCorrect": true,
- "inputTokens": 6313,
"outputTokens": 775,
- "latencyMs": 9340.763791999998
+ "latencyMs": 11132.566209000011
},
{
"questionId": "q41",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "17",
- "actual": "15",
- "isCorrect": false,
- "inputTokens": 6360,
- "outputTokens": 5,
- "latencyMs": 1020.8827080000046
- },
- {
- "questionId": "q41",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "17",
- "actual": "17",
- "isCorrect": true,
- "inputTokens": 5009,
- "outputTokens": 903,
- "latencyMs": 8792.062000000005
- },
- {
- "questionId": "q41",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "17",
- "actual": "15",
- "isCorrect": false,
- "inputTokens": 5755,
- "outputTokens": 5,
- "latencyMs": 1459.8301659999997
- },
- {
- "questionId": "q42",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "17",
- "actual": "17",
- "isCorrect": true,
- "inputTokens": 6387,
- "outputTokens": 519,
- "latencyMs": 6439.622583000004
- },
- {
- "questionId": "q42",
"format": "json",
"model": "claude-haiku-4-5",
"expected": "17",
@@ -4529,43 +6619,65 @@
"isCorrect": false,
"inputTokens": 7865,
"outputTokens": 5,
- "latencyMs": 1416.1659170000057
+ "latencyMs": 1048.9463749999995
},
{
- "questionId": "q42",
+ "questionId": "q41",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "13",
+ "isCorrect": false,
+ "inputTokens": 7906,
+ "outputTokens": 2,
+ "latencyMs": 954.9381670000148
+ },
+ {
+ "questionId": "q41",
"format": "toon",
"model": "gpt-5-nano",
"expected": "17",
"actual": "17",
"isCorrect": true,
"inputTokens": 2524,
- "outputTokens": 903,
- "latencyMs": 8064.398499999996
+ "outputTokens": 583,
+ "latencyMs": 5343.168333000009
},
{
- "questionId": "q42",
+ "questionId": "q41",
"format": "toon",
"model": "claude-haiku-4-5",
"expected": "17",
- "actual": "14",
+ "actual": "15",
"isCorrect": false,
"inputTokens": 2977,
"outputTokens": 5,
- "latencyMs": 998.3781250000029
+ "latencyMs": 929.4576249999809
},
{
- "questionId": "q42",
+ "questionId": "q41",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 3315,
+ "outputTokens": 2,
+ "latencyMs": 1230.1574160000018
+ },
+ {
+ "questionId": "q41",
"format": "csv",
"model": "gpt-5-nano",
"expected": "17",
"actual": "17",
"isCorrect": true,
"inputTokens": 2378,
- "outputTokens": 647,
- "latencyMs": 5498.786500000002
+ "outputTokens": 1415,
+ "latencyMs": 16158.150375000027
},
{
- "questionId": "q42",
+ "questionId": "q41",
"format": "csv",
"model": "claude-haiku-4-5",
"expected": "17",
@@ -4573,153 +6685,65 @@
"isCorrect": false,
"inputTokens": 2851,
"outputTokens": 5,
- "latencyMs": 1343.9632910000073
+ "latencyMs": 932.4995000000054
},
{
- "questionId": "q42",
- "format": "markdown-kv",
+ "questionId": "q41",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "14",
+ "isCorrect": false,
+ "inputTokens": 3189,
+ "outputTokens": 2,
+ "latencyMs": 1859.355958
+ },
+ {
+ "questionId": "q41",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "17",
"actual": "17",
"isCorrect": true,
- "inputTokens": 6313,
- "outputTokens": 647,
- "latencyMs": 7565.158291
+ "inputTokens": 7354,
+ "outputTokens": 903,
+ "latencyMs": 11415.376208000001
},
{
- "questionId": "q42",
- "format": "markdown-kv",
+ "questionId": "q41",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "17",
- "actual": "14",
+ "actual": "15",
"isCorrect": false,
- "inputTokens": 6360,
+ "inputTokens": 9355,
"outputTokens": 5,
- "latencyMs": 1320.9714169999934
+ "latencyMs": 1198.3916249999893
},
{
- "questionId": "q42",
+ "questionId": "q41",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 9095,
+ "outputTokens": 2,
+ "latencyMs": 3497.0485409999965
+ },
+ {
+ "questionId": "q41",
"format": "yaml",
"model": "gpt-5-nano",
"expected": "17",
"actual": "17",
"isCorrect": true,
"inputTokens": 5009,
- "outputTokens": 839,
- "latencyMs": 10626.395499999999
- },
- {
- "questionId": "q42",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "17",
- "actual": "15",
- "isCorrect": false,
- "inputTokens": 5755,
- "outputTokens": 5,
- "latencyMs": 3227.584917
- },
- {
- "questionId": "q43",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "17",
- "actual": "17",
- "isCorrect": true,
- "inputTokens": 6387,
- "outputTokens": 583,
- "latencyMs": 6690.373416000002
- },
- {
- "questionId": "q43",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "17",
- "actual": "15",
- "isCorrect": false,
- "inputTokens": 7865,
- "outputTokens": 5,
- "latencyMs": 1187.1296250000014
- },
- {
- "questionId": "q43",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "17",
- "actual": "17",
- "isCorrect": true,
- "inputTokens": 2524,
- "outputTokens": 519,
- "latencyMs": 5081.884875000003
- },
- {
- "questionId": "q43",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "17",
- "actual": "15",
- "isCorrect": false,
- "inputTokens": 2977,
- "outputTokens": 5,
- "latencyMs": 1576.2339999999967
- },
- {
- "questionId": "q43",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "17",
- "actual": "17",
- "isCorrect": true,
- "inputTokens": 2378,
"outputTokens": 1031,
- "latencyMs": 9927.5775
+ "latencyMs": 10859.450207999995
},
{
- "questionId": "q43",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "17",
- "actual": "15",
- "isCorrect": false,
- "inputTokens": 2851,
- "outputTokens": 5,
- "latencyMs": 1169.6451669999951
- },
- {
- "questionId": "q43",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "17",
- "actual": "17",
- "isCorrect": true,
- "inputTokens": 6313,
- "outputTokens": 519,
- "latencyMs": 6772.954291999995
- },
- {
- "questionId": "q43",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "17",
- "actual": "15",
- "isCorrect": false,
- "inputTokens": 6360,
- "outputTokens": 5,
- "latencyMs": 1905.9189590000024
- },
- {
- "questionId": "q43",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "17",
- "actual": "17",
- "isCorrect": true,
- "inputTokens": 5009,
- "outputTokens": 455,
- "latencyMs": 6827.424666999999
- },
- {
- "questionId": "q43",
+ "questionId": "q41",
"format": "yaml",
"model": "claude-haiku-4-5",
"expected": "17",
@@ -4727,21 +6751,32 @@
"isCorrect": false,
"inputTokens": 5755,
"outputTokens": 5,
- "latencyMs": 2121.3979160000017
+ "latencyMs": 2038.0866250000254
},
{
- "questionId": "q44",
+ "questionId": "q41",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "13",
+ "isCorrect": false,
+ "inputTokens": 5741,
+ "outputTokens": 2,
+ "latencyMs": 1642.4759159999958
+ },
+ {
+ "questionId": "q42",
"format": "json",
"model": "gpt-5-nano",
"expected": "17",
"actual": "17",
"isCorrect": true,
"inputTokens": 6387,
- "outputTokens": 519,
- "latencyMs": 15235.099042000002
+ "outputTokens": 1031,
+ "latencyMs": 11081.197666000022
},
{
- "questionId": "q44",
+ "questionId": "q42",
"format": "json",
"model": "claude-haiku-4-5",
"expected": "17",
@@ -4749,43 +6784,65 @@
"isCorrect": false,
"inputTokens": 7865,
"outputTokens": 5,
- "latencyMs": 1182.0669170000037
+ "latencyMs": 1095.9497919999994
},
{
- "questionId": "q44",
+ "questionId": "q42",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 7906,
+ "outputTokens": 2,
+ "latencyMs": 1309.7017500000075
+ },
+ {
+ "questionId": "q42",
"format": "toon",
"model": "gpt-5-nano",
"expected": "17",
"actual": "17",
"isCorrect": true,
"inputTokens": 2524,
- "outputTokens": 583,
- "latencyMs": 6872.47600000001
+ "outputTokens": 711,
+ "latencyMs": 9064.612916999991
},
{
- "questionId": "q44",
+ "questionId": "q42",
"format": "toon",
"model": "claude-haiku-4-5",
"expected": "17",
- "actual": "15",
+ "actual": "14",
"isCorrect": false,
"inputTokens": 2977,
"outputTokens": 5,
- "latencyMs": 931.0203749999928
+ "latencyMs": 1045.4045000000042
},
{
- "questionId": "q44",
+ "questionId": "q42",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 3315,
+ "outputTokens": 2,
+ "latencyMs": 2056.116624999995
+ },
+ {
+ "questionId": "q42",
"format": "csv",
"model": "gpt-5-nano",
"expected": "17",
"actual": "17",
"isCorrect": true,
"inputTokens": 2378,
- "outputTokens": 2311,
- "latencyMs": 17952.683875000002
+ "outputTokens": 967,
+ "latencyMs": 8423.070084000006
},
{
- "questionId": "q44",
+ "questionId": "q42",
"format": "csv",
"model": "claude-haiku-4-5",
"expected": "17",
@@ -4793,40 +6850,392 @@
"isCorrect": false,
"inputTokens": 2851,
"outputTokens": 5,
- "latencyMs": 1167.8899999999994
+ "latencyMs": 901.4683749999967
},
{
- "questionId": "q44",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
+ "questionId": "q42",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
"expected": "17",
- "actual": "17",
- "isCorrect": true,
- "inputTokens": 6313,
- "outputTokens": 455,
- "latencyMs": 6896.831916999989
- },
- {
- "questionId": "q44",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "17",
- "actual": "10",
+ "actual": "14",
"isCorrect": false,
- "inputTokens": 6360,
- "outputTokens": 5,
- "latencyMs": 1401.859083000003
+ "inputTokens": 3189,
+ "outputTokens": 2,
+ "latencyMs": 2192.902625000017
},
{
- "questionId": "q44",
- "format": "yaml",
+ "questionId": "q42",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "17",
"actual": "17",
"isCorrect": true,
- "inputTokens": 5009,
+ "inputTokens": 7354,
"outputTokens": 647,
- "latencyMs": 5266.956917000003
+ "latencyMs": 9821.846875000017
+ },
+ {
+ "questionId": "q42",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 9355,
+ "outputTokens": 5,
+ "latencyMs": 1586.0259169999918
+ },
+ {
+ "questionId": "q42",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 9095,
+ "outputTokens": 2,
+ "latencyMs": 9515.369042000006
+ },
+ {
+ "questionId": "q42",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 5009,
+ "outputTokens": 455,
+ "latencyMs": 5076.419125000015
+ },
+ {
+ "questionId": "q42",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 5755,
+ "outputTokens": 5,
+ "latencyMs": 1472.8408340000024
+ },
+ {
+ "questionId": "q42",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 5741,
+ "outputTokens": 2,
+ "latencyMs": 865.6228749999718
+ },
+ {
+ "questionId": "q43",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 6387,
+ "outputTokens": 775,
+ "latencyMs": 8729.67633300001
+ },
+ {
+ "questionId": "q43",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 7865,
+ "outputTokens": 5,
+ "latencyMs": 1217.0473749999946
+ },
+ {
+ "questionId": "q43",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 7906,
+ "outputTokens": 2,
+ "latencyMs": 1158.2075419999892
+ },
+ {
+ "questionId": "q43",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 2524,
+ "outputTokens": 775,
+ "latencyMs": 6998.693750000006
+ },
+ {
+ "questionId": "q43",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 2977,
+ "outputTokens": 5,
+ "latencyMs": 1640.0182080000232
+ },
+ {
+ "questionId": "q43",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "14",
+ "isCorrect": false,
+ "inputTokens": 3315,
+ "outputTokens": 2,
+ "latencyMs": 947.1101670000062
+ },
+ {
+ "questionId": "q43",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 2378,
+ "outputTokens": 583,
+ "latencyMs": 13248.978291000007
+ },
+ {
+ "questionId": "q43",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 2851,
+ "outputTokens": 5,
+ "latencyMs": 836.4533340000198
+ },
+ {
+ "questionId": "q43",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 3189,
+ "outputTokens": 2,
+ "latencyMs": 818.1433329999854
+ },
+ {
+ "questionId": "q43",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 7354,
+ "outputTokens": 1095,
+ "latencyMs": 9890.235916000005
+ },
+ {
+ "questionId": "q43",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 9355,
+ "outputTokens": 5,
+ "latencyMs": 1320.4134170000034
+ },
+ {
+ "questionId": "q43",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 9095,
+ "outputTokens": 2,
+ "latencyMs": 4225.577166000003
+ },
+ {
+ "questionId": "q43",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 5009,
+ "outputTokens": 1031,
+ "latencyMs": 13344.171333000006
+ },
+ {
+ "questionId": "q43",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 5755,
+ "outputTokens": 5,
+ "latencyMs": 863.8359160000109
+ },
+ {
+ "questionId": "q43",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 5741,
+ "outputTokens": 2,
+ "latencyMs": 1194.4381250000151
+ },
+ {
+ "questionId": "q44",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 6387,
+ "outputTokens": 455,
+ "latencyMs": 5239.934833000007
+ },
+ {
+ "questionId": "q44",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 7865,
+ "outputTokens": 5,
+ "latencyMs": 1124.6063330000034
+ },
+ {
+ "questionId": "q44",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "14",
+ "isCorrect": false,
+ "inputTokens": 7906,
+ "outputTokens": 2,
+ "latencyMs": 1525.701040999993
+ },
+ {
+ "questionId": "q44",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 2524,
+ "outputTokens": 519,
+ "latencyMs": 6195.039833999996
+ },
+ {
+ "questionId": "q44",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 2977,
+ "outputTokens": 5,
+ "latencyMs": 891.0962500000023
+ },
+ {
+ "questionId": "q44",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "13",
+ "isCorrect": false,
+ "inputTokens": 3315,
+ "outputTokens": 2,
+ "latencyMs": 1322.2949580000131
+ },
+ {
+ "questionId": "q44",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 2378,
+ "outputTokens": 1543,
+ "latencyMs": 16353.942624999996
+ },
+ {
+ "questionId": "q44",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 2851,
+ "outputTokens": 5,
+ "latencyMs": 861.9590829999943
+ },
+ {
+ "questionId": "q44",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "13",
+ "isCorrect": false,
+ "inputTokens": 3189,
+ "outputTokens": 2,
+ "latencyMs": 912.1500829999859
+ },
+ {
+ "questionId": "q44",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 7354,
+ "outputTokens": 519,
+ "latencyMs": 6838.317749999987
+ },
+ {
+ "questionId": "q44",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "17",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 9355,
+ "outputTokens": 5,
+ "latencyMs": 1875.6236249999783
+ },
+ {
+ "questionId": "q44",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "13",
+ "isCorrect": false,
+ "inputTokens": 9095,
+ "outputTokens": 2,
+ "latencyMs": 1482.7477500000095
+ },
+ {
+ "questionId": "q44",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 5009,
+ "outputTokens": 1223,
+ "latencyMs": 13887.709959
},
{
"questionId": "q44",
@@ -4837,7 +7246,18 @@
"isCorrect": false,
"inputTokens": 5755,
"outputTokens": 5,
- "latencyMs": 1100.9057919999905
+ "latencyMs": 1135.573457999999
+ },
+ {
+ "questionId": "q44",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "17",
+ "actual": "17",
+ "isCorrect": true,
+ "inputTokens": 5741,
+ "outputTokens": 2,
+ "latencyMs": 1063.958209000004
},
{
"questionId": "q45",
@@ -4847,8 +7267,8 @@
"actual": "16",
"isCorrect": true,
"inputTokens": 6387,
- "outputTokens": 1095,
- "latencyMs": 15621.264291999993
+ "outputTokens": 903,
+ "latencyMs": 11372.731792000006
},
{
"questionId": "q45",
@@ -4859,7 +7279,18 @@
"isCorrect": false,
"inputTokens": 7865,
"outputTokens": 5,
- "latencyMs": 1063.5868750000081
+ "latencyMs": 1085.2727500000037
+ },
+ {
+ "questionId": "q45",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "16",
+ "actual": "14",
+ "isCorrect": false,
+ "inputTokens": 7906,
+ "outputTokens": 2,
+ "latencyMs": 788.761582999985
},
{
"questionId": "q45",
@@ -4869,8 +7300,8 @@
"actual": "16",
"isCorrect": true,
"inputTokens": 2524,
- "outputTokens": 455,
- "latencyMs": 5703.061916000006
+ "outputTokens": 775,
+ "latencyMs": 9670.953584000003
},
{
"questionId": "q45",
@@ -4881,7 +7312,18 @@
"isCorrect": false,
"inputTokens": 2977,
"outputTokens": 5,
- "latencyMs": 1113.9432499999966
+ "latencyMs": 1307.5495419999934
+ },
+ {
+ "questionId": "q45",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "16",
+ "actual": "17",
+ "isCorrect": false,
+ "inputTokens": 3315,
+ "outputTokens": 2,
+ "latencyMs": 1034.7324580000131
},
{
"questionId": "q45",
@@ -4891,8 +7333,8 @@
"actual": "16",
"isCorrect": true,
"inputTokens": 2378,
- "outputTokens": 3015,
- "latencyMs": 22321.357124999995
+ "outputTokens": 647,
+ "latencyMs": 7079.23558399998
},
{
"questionId": "q45",
@@ -4903,29 +7345,51 @@
"isCorrect": false,
"inputTokens": 2851,
"outputTokens": 5,
- "latencyMs": 968.0936249999941
+ "latencyMs": 1123.2897499999963
},
{
"questionId": "q45",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "16",
+ "actual": "13",
+ "isCorrect": false,
+ "inputTokens": 3189,
+ "outputTokens": 2,
+ "latencyMs": 1318.0012920000008
+ },
+ {
+ "questionId": "q45",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "16",
"actual": "16",
"isCorrect": true,
- "inputTokens": 6313,
- "outputTokens": 1287,
- "latencyMs": 14521.080749999994
+ "inputTokens": 7354,
+ "outputTokens": 583,
+ "latencyMs": 5795.2639590000035
},
{
"questionId": "q45",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "16",
"actual": "12",
"isCorrect": false,
- "inputTokens": 6360,
+ "inputTokens": 9355,
"outputTokens": 5,
- "latencyMs": 1228.1847500000003
+ "latencyMs": 1125.9925829999847
+ },
+ {
+ "questionId": "q45",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "16",
+ "actual": "16",
+ "isCorrect": true,
+ "inputTokens": 9095,
+ "outputTokens": 2,
+ "latencyMs": 8305.401042000012
},
{
"questionId": "q45",
@@ -4935,8 +7399,8 @@
"actual": "16",
"isCorrect": true,
"inputTokens": 5009,
- "outputTokens": 455,
- "latencyMs": 5216.268042000011
+ "outputTokens": 839,
+ "latencyMs": 10189.432124999992
},
{
"questionId": "q45",
@@ -4947,7 +7411,18 @@
"isCorrect": false,
"inputTokens": 5755,
"outputTokens": 5,
- "latencyMs": 1026.5127079999947
+ "latencyMs": 1615.4580000000133
+ },
+ {
+ "questionId": "q45",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "16",
+ "actual": "10",
+ "isCorrect": false,
+ "inputTokens": 5741,
+ "outputTokens": 2,
+ "latencyMs": 1533.5138750000042
},
{
"questionId": "q46",
@@ -4957,8 +7432,8 @@
"actual": "16",
"isCorrect": true,
"inputTokens": 6387,
- "outputTokens": 391,
- "latencyMs": 4335.125541000001
+ "outputTokens": 519,
+ "latencyMs": 7169.378540999984
},
{
"questionId": "q46",
@@ -4969,7 +7444,18 @@
"isCorrect": false,
"inputTokens": 7865,
"outputTokens": 5,
- "latencyMs": 1116.4177909999999
+ "latencyMs": 1133.9953749999986
+ },
+ {
+ "questionId": "q46",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "16",
+ "actual": "15",
+ "isCorrect": false,
+ "inputTokens": 7906,
+ "outputTokens": 2,
+ "latencyMs": 1018.8396669999929
},
{
"questionId": "q46",
@@ -4979,8 +7465,8 @@
"actual": "16",
"isCorrect": true,
"inputTokens": 2524,
- "outputTokens": 583,
- "latencyMs": 4128.823499999999
+ "outputTokens": 647,
+ "latencyMs": 6637.351416999998
},
{
"questionId": "q46",
@@ -4991,7 +7477,18 @@
"isCorrect": false,
"inputTokens": 2977,
"outputTokens": 5,
- "latencyMs": 1105.622457999998
+ "latencyMs": 864.9015839999774
+ },
+ {
+ "questionId": "q46",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "16",
+ "actual": "17",
+ "isCorrect": false,
+ "inputTokens": 3315,
+ "outputTokens": 2,
+ "latencyMs": 992.5710419999959
},
{
"questionId": "q46",
@@ -5002,7 +7499,7 @@
"isCorrect": true,
"inputTokens": 2378,
"outputTokens": 839,
- "latencyMs": 6542.58583299999
+ "latencyMs": 7426.826874999999
},
{
"questionId": "q46",
@@ -5013,29 +7510,51 @@
"isCorrect": false,
"inputTokens": 2851,
"outputTokens": 5,
- "latencyMs": 1084.2237909999967
+ "latencyMs": 893.4481660000165
},
{
"questionId": "q46",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "16",
+ "actual": "13",
+ "isCorrect": false,
+ "inputTokens": 3189,
+ "outputTokens": 2,
+ "latencyMs": 1200.8498329999857
+ },
+ {
+ "questionId": "q46",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "16",
"actual": "16",
"isCorrect": true,
- "inputTokens": 6313,
- "outputTokens": 455,
- "latencyMs": 5050.133375000005
+ "inputTokens": 7354,
+ "outputTokens": 775,
+ "latencyMs": 8865.971332999994
},
{
"questionId": "q46",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "16",
"actual": "10",
"isCorrect": false,
- "inputTokens": 6360,
+ "inputTokens": 9355,
"outputTokens": 5,
- "latencyMs": 1075.023709000001
+ "latencyMs": 1491.2856249999895
+ },
+ {
+ "questionId": "q46",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "16",
+ "actual": "17",
+ "isCorrect": false,
+ "inputTokens": 9095,
+ "outputTokens": 2,
+ "latencyMs": 1216.2892920000013
},
{
"questionId": "q46",
@@ -5045,8 +7564,8 @@
"actual": "16",
"isCorrect": true,
"inputTokens": 5009,
- "outputTokens": 711,
- "latencyMs": 9237.985791
+ "outputTokens": 839,
+ "latencyMs": 9403.812124999997
},
{
"questionId": "q46",
@@ -5057,7 +7576,18 @@
"isCorrect": false,
"inputTokens": 5755,
"outputTokens": 5,
- "latencyMs": 1346.3510000000097
+ "latencyMs": 1126.5797500000044
+ },
+ {
+ "questionId": "q46",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "16",
+ "actual": "17",
+ "isCorrect": false,
+ "inputTokens": 5741,
+ "outputTokens": 2,
+ "latencyMs": 1671.0382089999912
},
{
"questionId": "q47",
@@ -5067,8 +7597,8 @@
"actual": "91",
"isCorrect": true,
"inputTokens": 6392,
- "outputTokens": 2375,
- "latencyMs": 27655.89520900001
+ "outputTokens": 1671,
+ "latencyMs": 15363.507083999983
},
{
"questionId": "q47",
@@ -5079,7 +7609,18 @@
"isCorrect": false,
"inputTokens": 7870,
"outputTokens": 5,
- "latencyMs": 1315.7111659999937
+ "latencyMs": 1189.3042910000077
+ },
+ {
+ "questionId": "q47",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "91",
+ "actual": "90",
+ "isCorrect": false,
+ "inputTokens": 7914,
+ "outputTokens": 2,
+ "latencyMs": 1651.3950829999812
},
{
"questionId": "q47",
@@ -5089,8 +7630,8 @@
"actual": "91",
"isCorrect": true,
"inputTokens": 2529,
- "outputTokens": 2695,
- "latencyMs": 26482.504707999993
+ "outputTokens": 2311,
+ "latencyMs": 21706.56012499999
},
{
"questionId": "q47",
@@ -5101,7 +7642,18 @@
"isCorrect": false,
"inputTokens": 2982,
"outputTokens": 5,
- "latencyMs": 1368.221916999988
+ "latencyMs": 1338.67408300002
+ },
+ {
+ "questionId": "q47",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "91",
+ "actual": "91",
+ "isCorrect": true,
+ "inputTokens": 3323,
+ "outputTokens": 2,
+ "latencyMs": 12844.911791999999
},
{
"questionId": "q47",
@@ -5111,8 +7663,8 @@
"actual": "91",
"isCorrect": true,
"inputTokens": 2383,
- "outputTokens": 1671,
- "latencyMs": 18249.434333000012
+ "outputTokens": 2823,
+ "latencyMs": 16151.116582999995
},
{
"questionId": "q47",
@@ -5123,29 +7675,51 @@
"isCorrect": false,
"inputTokens": 2856,
"outputTokens": 5,
- "latencyMs": 1051.9521660000028
+ "latencyMs": 3041.4831669999985
},
{
"questionId": "q47",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "91",
+ "actual": "91",
+ "isCorrect": true,
+ "inputTokens": 3197,
+ "outputTokens": 2,
+ "latencyMs": 12006.398833000014
+ },
+ {
+ "questionId": "q47",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "91",
"actual": "91",
"isCorrect": true,
- "inputTokens": 6318,
- "outputTokens": 1799,
- "latencyMs": 15867.284083999999
+ "inputTokens": 7359,
+ "outputTokens": 2695,
+ "latencyMs": 26044.306083000003
},
{
"questionId": "q47",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "91",
"actual": "89",
"isCorrect": false,
- "inputTokens": 6365,
+ "inputTokens": 9360,
"outputTokens": 5,
- "latencyMs": 1831.3835839999956
+ "latencyMs": 1573.8229160000046
+ },
+ {
+ "questionId": "q47",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "91",
+ "actual": "91",
+ "isCorrect": true,
+ "inputTokens": 9103,
+ "outputTokens": 2,
+ "latencyMs": 27838.932499999995
},
{
"questionId": "q47",
@@ -5155,8 +7729,8 @@
"actual": "91",
"isCorrect": true,
"inputTokens": 5014,
- "outputTokens": 2247,
- "latencyMs": 19254.821666999997
+ "outputTokens": 2823,
+ "latencyMs": 22628.083542000008
},
{
"questionId": "q47",
@@ -5167,7 +7741,18 @@
"isCorrect": false,
"inputTokens": 5760,
"outputTokens": 5,
- "latencyMs": 1762.2908329999918
+ "latencyMs": 1787.638666999992
+ },
+ {
+ "questionId": "q47",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "91",
+ "actual": "90",
+ "isCorrect": false,
+ "inputTokens": 5749,
+ "outputTokens": 2,
+ "latencyMs": 1343.8462499999732
},
{
"questionId": "q48",
@@ -5178,7 +7763,7 @@
"isCorrect": true,
"inputTokens": 6392,
"outputTokens": 1479,
- "latencyMs": 13444.104542000001
+ "latencyMs": 14420.83845900002
},
{
"questionId": "q48",
@@ -5189,7 +7774,18 @@
"isCorrect": false,
"inputTokens": 7870,
"outputTokens": 5,
- "latencyMs": 1182.2523340000043
+ "latencyMs": 1271.2462919999962
+ },
+ {
+ "questionId": "q48",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "67",
+ "actual": "70",
+ "isCorrect": false,
+ "inputTokens": 7914,
+ "outputTokens": 2,
+ "latencyMs": 1108.4178750000137
},
{
"questionId": "q48",
@@ -5199,8 +7795,8 @@
"actual": "67",
"isCorrect": true,
"inputTokens": 2529,
- "outputTokens": 2183,
- "latencyMs": 19257.86050000001
+ "outputTokens": 2247,
+ "latencyMs": 18434.695834000013
},
{
"questionId": "q48",
@@ -5211,7 +7807,18 @@
"isCorrect": false,
"inputTokens": 2982,
"outputTokens": 5,
- "latencyMs": 1081.3142080000107
+ "latencyMs": 1125.2875420000055
+ },
+ {
+ "questionId": "q48",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "67",
+ "actual": "60",
+ "isCorrect": false,
+ "inputTokens": 3323,
+ "outputTokens": 2,
+ "latencyMs": 13027.224332999991
},
{
"questionId": "q48",
@@ -5221,8 +7828,8 @@
"actual": "67",
"isCorrect": true,
"inputTokens": 2383,
- "outputTokens": 3463,
- "latencyMs": 21384.707542000004
+ "outputTokens": 2503,
+ "latencyMs": 23294.861958000023
},
{
"questionId": "q48",
@@ -5233,40 +7840,62 @@
"isCorrect": false,
"inputTokens": 2856,
"outputTokens": 5,
- "latencyMs": 1051.6647080000112
+ "latencyMs": 1208.8763340000005
},
{
"questionId": "q48",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "67",
+ "actual": "67",
+ "isCorrect": true,
+ "inputTokens": 3197,
+ "outputTokens": 2,
+ "latencyMs": 11604.352749999991
+ },
+ {
+ "questionId": "q48",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "67",
"actual": "67",
"isCorrect": true,
- "inputTokens": 6318,
- "outputTokens": 2439,
- "latencyMs": 19519.416207999995
+ "inputTokens": 7359,
+ "outputTokens": 1479,
+ "latencyMs": 18504.804959
},
{
"questionId": "q48",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "67",
- "actual": "47",
+ "actual": "57",
"isCorrect": false,
- "inputTokens": 6365,
+ "inputTokens": 9360,
"outputTokens": 5,
- "latencyMs": 1060.1008749999892
+ "latencyMs": 1127.928917000012
+ },
+ {
+ "questionId": "q48",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "67",
+ "actual": "67",
+ "isCorrect": true,
+ "inputTokens": 9103,
+ "outputTokens": 2,
+ "latencyMs": 22629.69987500002
},
{
"questionId": "q48",
"format": "yaml",
"model": "gpt-5-nano",
"expected": "67",
- "actual": "66",
- "isCorrect": false,
+ "actual": "67",
+ "isCorrect": true,
"inputTokens": 5014,
- "outputTokens": 1991,
- "latencyMs": 15234.403459000008
+ "outputTokens": 2631,
+ "latencyMs": 93677.45470900001
},
{
"questionId": "q48",
@@ -5277,7 +7906,18 @@
"isCorrect": false,
"inputTokens": 5760,
"outputTokens": 5,
- "latencyMs": 1208.8559589999932
+ "latencyMs": 1083.3742910000146
+ },
+ {
+ "questionId": "q48",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "67",
+ "actual": "70",
+ "isCorrect": false,
+ "inputTokens": 5749,
+ "outputTokens": 2,
+ "latencyMs": 1435.5812079999887
},
{
"questionId": "q49",
@@ -5287,8 +7927,8 @@
"actual": "41",
"isCorrect": true,
"inputTokens": 6392,
- "outputTokens": 1415,
- "latencyMs": 14119.885540999996
+ "outputTokens": 1543,
+ "latencyMs": 14267.44858299999
},
{
"questionId": "q49",
@@ -5299,7 +7939,18 @@
"isCorrect": false,
"inputTokens": 7870,
"outputTokens": 5,
- "latencyMs": 1428.8373750000028
+ "latencyMs": 1483.0176250000077
+ },
+ {
+ "questionId": "q49",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "41",
+ "actual": "40",
+ "isCorrect": false,
+ "inputTokens": 7915,
+ "outputTokens": 2,
+ "latencyMs": 1598.6212089999754
},
{
"questionId": "q49",
@@ -5309,8 +7960,8 @@
"actual": "41",
"isCorrect": true,
"inputTokens": 2529,
- "outputTokens": 1607,
- "latencyMs": 13997.297709000006
+ "outputTokens": 1671,
+ "latencyMs": 15241.04254200001
},
{
"questionId": "q49",
@@ -5321,7 +7972,18 @@
"isCorrect": false,
"inputTokens": 2982,
"outputTokens": 5,
- "latencyMs": 1270.4412920000032
+ "latencyMs": 1011.390458000009
+ },
+ {
+ "questionId": "q49",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 3324,
+ "outputTokens": 2,
+ "latencyMs": 17035.035957999993
},
{
"questionId": "q49",
@@ -5331,8 +7993,8 @@
"actual": "41",
"isCorrect": true,
"inputTokens": 2383,
- "outputTokens": 1415,
- "latencyMs": 13861.177167000002
+ "outputTokens": 1799,
+ "latencyMs": 15270.303583
},
{
"questionId": "q49",
@@ -5343,29 +8005,51 @@
"isCorrect": false,
"inputTokens": 2856,
"outputTokens": 5,
- "latencyMs": 916.5238340000069
+ "latencyMs": 919.8500000000058
},
{
"questionId": "q49",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 3198,
+ "outputTokens": 2,
+ "latencyMs": 9191.171333000006
+ },
+ {
+ "questionId": "q49",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "41",
"actual": "42",
"isCorrect": false,
- "inputTokens": 6318,
- "outputTokens": 1799,
- "latencyMs": 16007.06925
+ "inputTokens": 7359,
+ "outputTokens": 1479,
+ "latencyMs": 14804.62512500002
},
{
"questionId": "q49",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "41",
- "actual": "27",
+ "actual": "31",
"isCorrect": false,
- "inputTokens": 6365,
+ "inputTokens": 9360,
"outputTokens": 5,
- "latencyMs": 1426.0594579999888
+ "latencyMs": 1236.6115409999911
+ },
+ {
+ "questionId": "q49",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 9104,
+ "outputTokens": 2,
+ "latencyMs": 19284.10699999999
},
{
"questionId": "q49",
@@ -5375,8 +8059,8 @@
"actual": "41",
"isCorrect": true,
"inputTokens": 5014,
- "outputTokens": 2055,
- "latencyMs": 22966.680624999994
+ "outputTokens": 1863,
+ "latencyMs": 17259.288042
},
{
"questionId": "q49",
@@ -5387,7 +8071,18 @@
"isCorrect": false,
"inputTokens": 5760,
"outputTokens": 5,
- "latencyMs": 1044.6609999999928
+ "latencyMs": 1715.9734999999928
+ },
+ {
+ "questionId": "q49",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "41",
+ "actual": "44",
+ "isCorrect": false,
+ "inputTokens": 5750,
+ "outputTokens": 2,
+ "latencyMs": 1872.7845830000006
},
{
"questionId": "q50",
@@ -5397,8 +8092,8 @@
"actual": "26",
"isCorrect": true,
"inputTokens": 6392,
- "outputTokens": 1159,
- "latencyMs": 10799.117333000002
+ "outputTokens": 1543,
+ "latencyMs": 15919.779666999995
},
{
"questionId": "q50",
@@ -5409,7 +8104,18 @@
"isCorrect": false,
"inputTokens": 7870,
"outputTokens": 5,
- "latencyMs": 1359.5568330000096
+ "latencyMs": 1291.8912500000151
+ },
+ {
+ "questionId": "q50",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "26",
+ "actual": "24",
+ "isCorrect": false,
+ "inputTokens": 7915,
+ "outputTokens": 2,
+ "latencyMs": 1005.6952080000192
},
{
"questionId": "q50",
@@ -5419,8 +8125,8 @@
"actual": "26",
"isCorrect": true,
"inputTokens": 2529,
- "outputTokens": 1543,
- "latencyMs": 13702.052542000005
+ "outputTokens": 1287,
+ "latencyMs": 30941.076040999993
},
{
"questionId": "q50",
@@ -5431,7 +8137,18 @@
"isCorrect": false,
"inputTokens": 2982,
"outputTokens": 5,
- "latencyMs": 967.0454159999936
+ "latencyMs": 1114.022666999983
+ },
+ {
+ "questionId": "q50",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "26",
+ "actual": "26",
+ "isCorrect": true,
+ "inputTokens": 3324,
+ "outputTokens": 2,
+ "latencyMs": 17484.997459000006
},
{
"questionId": "q50",
@@ -5441,8 +8158,8 @@
"actual": "26",
"isCorrect": true,
"inputTokens": 2383,
- "outputTokens": 1671,
- "latencyMs": 13116.871958000003
+ "outputTokens": 1735,
+ "latencyMs": 16410.497957999993
},
{
"questionId": "q50",
@@ -5453,29 +8170,51 @@
"isCorrect": false,
"inputTokens": 2856,
"outputTokens": 5,
- "latencyMs": 1088.8372910000035
+ "latencyMs": 1096.8193330000213
},
{
"questionId": "q50",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "26",
+ "actual": "26",
+ "isCorrect": true,
+ "inputTokens": 3198,
+ "outputTokens": 2,
+ "latencyMs": 14324.279708000016
+ },
+ {
+ "questionId": "q50",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "26",
"actual": "26",
"isCorrect": true,
- "inputTokens": 6318,
+ "inputTokens": 7359,
"outputTokens": 1543,
- "latencyMs": 14387.148624999987
+ "latencyMs": 15139.200333999994
},
{
"questionId": "q50",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "26",
- "actual": "16",
+ "actual": "21",
"isCorrect": false,
- "inputTokens": 6365,
+ "inputTokens": 9360,
"outputTokens": 5,
- "latencyMs": 1273.9564170000085
+ "latencyMs": 1152.736042000004
+ },
+ {
+ "questionId": "q50",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "26",
+ "actual": "26",
+ "isCorrect": true,
+ "inputTokens": 9104,
+ "outputTokens": 2,
+ "latencyMs": 19624.726874999993
},
{
"questionId": "q50",
@@ -5485,8 +8224,8 @@
"actual": "26",
"isCorrect": true,
"inputTokens": 5014,
- "outputTokens": 1223,
- "latencyMs": 12143.083792000005
+ "outputTokens": 1031,
+ "latencyMs": 7884.299167000019
},
{
"questionId": "q50",
@@ -5497,7 +8236,18 @@
"isCorrect": false,
"inputTokens": 5760,
"outputTokens": 5,
- "latencyMs": 1032.9807079999882
+ "latencyMs": 984.3461250000109
+ },
+ {
+ "questionId": "q50",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "26",
+ "actual": "30",
+ "isCorrect": false,
+ "inputTokens": 5750,
+ "outputTokens": 2,
+ "latencyMs": 1294.497417000006
},
{
"questionId": "q51",
@@ -5507,8 +8257,8 @@
"actual": "78",
"isCorrect": true,
"inputTokens": 6386,
- "outputTokens": 2631,
- "latencyMs": 23077.678417000003
+ "outputTokens": 2695,
+ "latencyMs": 25757.74325
},
{
"questionId": "q51",
@@ -5519,7 +8269,18 @@
"isCorrect": false,
"inputTokens": 7864,
"outputTokens": 5,
- "latencyMs": 1281.171417000005
+ "latencyMs": 1330.1275409999944
+ },
+ {
+ "questionId": "q51",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "78",
+ "actual": "78",
+ "isCorrect": true,
+ "inputTokens": 7905,
+ "outputTokens": 2,
+ "latencyMs": 11349.042874999985
},
{
"questionId": "q51",
@@ -5529,8 +8290,8 @@
"actual": "78",
"isCorrect": true,
"inputTokens": 2523,
- "outputTokens": 2759,
- "latencyMs": 20331.962667
+ "outputTokens": 2119,
+ "latencyMs": 31391.252624999994
},
{
"questionId": "q51",
@@ -5541,18 +8302,29 @@
"isCorrect": true,
"inputTokens": 2976,
"outputTokens": 5,
- "latencyMs": 1014.3847079999978
+ "latencyMs": 1051.2665419999976
+ },
+ {
+ "questionId": "q51",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "78",
+ "actual": "78",
+ "isCorrect": true,
+ "inputTokens": 3314,
+ "outputTokens": 2,
+ "latencyMs": 9630.083915999974
},
{
"questionId": "q51",
"format": "csv",
"model": "gpt-5-nano",
"expected": "78",
- "actual": "81",
+ "actual": "84",
"isCorrect": false,
"inputTokens": 2377,
- "outputTokens": 3335,
- "latencyMs": 18037.630208000002
+ "outputTokens": 1863,
+ "latencyMs": 15133.794208000007
},
{
"questionId": "q51",
@@ -5563,29 +8335,51 @@
"isCorrect": false,
"inputTokens": 2850,
"outputTokens": 5,
- "latencyMs": 918.3078749999986
+ "latencyMs": 952.5605000000214
},
{
"questionId": "q51",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "78",
+ "actual": "78",
+ "isCorrect": true,
+ "inputTokens": 3188,
+ "outputTokens": 2,
+ "latencyMs": 11450.481040999992
+ },
+ {
+ "questionId": "q51",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "78",
"actual": "78",
"isCorrect": true,
- "inputTokens": 6312,
- "outputTokens": 1991,
- "latencyMs": 15660.232958000008
+ "inputTokens": 7353,
+ "outputTokens": 903,
+ "latencyMs": 32111.97775000002
},
{
"questionId": "q51",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "78",
+ "actual": "77",
+ "isCorrect": false,
+ "inputTokens": 9354,
+ "outputTokens": 5,
+ "latencyMs": 2015.6932080000115
+ },
+ {
+ "questionId": "q51",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "78",
"actual": "78",
"isCorrect": true,
- "inputTokens": 6359,
- "outputTokens": 5,
- "latencyMs": 1033.7647080000024
+ "inputTokens": 9094,
+ "outputTokens": 2,
+ "latencyMs": 11316.587916999997
},
{
"questionId": "q51",
@@ -5595,8 +8389,8 @@
"actual": "78",
"isCorrect": true,
"inputTokens": 5008,
- "outputTokens": 4295,
- "latencyMs": 26817.97
+ "outputTokens": 1607,
+ "latencyMs": 17228.22670900001
},
{
"questionId": "q51",
@@ -5607,18 +8401,29 @@
"isCorrect": false,
"inputTokens": 5754,
"outputTokens": 5,
- "latencyMs": 1348.084750000009
+ "latencyMs": 1434.8912919999857
+ },
+ {
+ "questionId": "q51",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "78",
+ "actual": "78",
+ "isCorrect": true,
+ "inputTokens": 5740,
+ "outputTokens": 2,
+ "latencyMs": 15144.007791000011
},
{
"questionId": "q52",
"format": "json",
"model": "gpt-5-nano",
"expected": "22",
- "actual": "22",
- "isCorrect": true,
+ "actual": "21",
+ "isCorrect": false,
"inputTokens": 6386,
- "outputTokens": 1223,
- "latencyMs": 10273.866540999996
+ "outputTokens": 839,
+ "latencyMs": 8969.827833999996
},
{
"questionId": "q52",
@@ -5629,7 +8434,18 @@
"isCorrect": false,
"inputTokens": 7864,
"outputTokens": 5,
- "latencyMs": 1081.604707999999
+ "latencyMs": 1038.1520420000015
+ },
+ {
+ "questionId": "q52",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "22",
+ "actual": "22",
+ "isCorrect": true,
+ "inputTokens": 7905,
+ "outputTokens": 2,
+ "latencyMs": 8416.65183399999
},
{
"questionId": "q52",
@@ -5639,8 +8455,8 @@
"actual": "22",
"isCorrect": true,
"inputTokens": 2523,
- "outputTokens": 903,
- "latencyMs": 13862.020499999999
+ "outputTokens": 967,
+ "latencyMs": 9633.799374999973
},
{
"questionId": "q52",
@@ -5651,18 +8467,29 @@
"isCorrect": false,
"inputTokens": 2976,
"outputTokens": 5,
- "latencyMs": 965.817916
+ "latencyMs": 1134.1007079999836
+ },
+ {
+ "questionId": "q52",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "22",
+ "actual": "22",
+ "isCorrect": true,
+ "inputTokens": 3314,
+ "outputTokens": 2,
+ "latencyMs": 11542.581249999988
},
{
"questionId": "q52",
"format": "csv",
"model": "gpt-5-nano",
"expected": "22",
- "actual": "21",
+ "actual": "24",
"isCorrect": false,
"inputTokens": 2377,
- "outputTokens": 2631,
- "latencyMs": 24254.82570799999
+ "outputTokens": 2695,
+ "latencyMs": 41106.853249999986
},
{
"questionId": "q52",
@@ -5673,29 +8500,51 @@
"isCorrect": false,
"inputTokens": 2850,
"outputTokens": 5,
- "latencyMs": 998.7978339999972
+ "latencyMs": 918.981958999997
},
{
"questionId": "q52",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "22",
+ "actual": "22",
+ "isCorrect": true,
+ "inputTokens": 3188,
+ "outputTokens": 2,
+ "latencyMs": 2052.5287920000264
+ },
+ {
+ "questionId": "q52",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "22",
"actual": "22",
"isCorrect": true,
- "inputTokens": 6312,
- "outputTokens": 1095,
- "latencyMs": 10401.351500000004
+ "inputTokens": 7353,
+ "outputTokens": 839,
+ "latencyMs": 8334.775790999993
},
{
"questionId": "q52",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "22",
"actual": "15",
"isCorrect": false,
- "inputTokens": 6359,
+ "inputTokens": 9354,
"outputTokens": 5,
- "latencyMs": 1479.388791999998
+ "latencyMs": 949.7613340000098
+ },
+ {
+ "questionId": "q52",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "22",
+ "actual": "22",
+ "isCorrect": true,
+ "inputTokens": 9094,
+ "outputTokens": 2,
+ "latencyMs": 10658.192250000022
},
{
"questionId": "q52",
@@ -5705,8 +8554,8 @@
"actual": "22",
"isCorrect": true,
"inputTokens": 5008,
- "outputTokens": 839,
- "latencyMs": 8160.454833999989
+ "outputTokens": 1991,
+ "latencyMs": 14355.515540999972
},
{
"questionId": "q52",
@@ -5717,7 +8566,18 @@
"isCorrect": false,
"inputTokens": 5754,
"outputTokens": 5,
- "latencyMs": 1763.230291999993
+ "latencyMs": 1039.7822079999896
+ },
+ {
+ "questionId": "q52",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "22",
+ "actual": "22",
+ "isCorrect": true,
+ "inputTokens": 5740,
+ "outputTokens": 2,
+ "latencyMs": 12535.245041999995
},
{
"questionId": "q53",
@@ -5727,8 +8587,8 @@
"actual": "12",
"isCorrect": true,
"inputTokens": 6394,
- "outputTokens": 1671,
- "latencyMs": 14807.253333
+ "outputTokens": 1223,
+ "latencyMs": 11632.450709000026
},
{
"questionId": "q53",
@@ -5739,7 +8599,18 @@
"isCorrect": false,
"inputTokens": 7872,
"outputTokens": 5,
- "latencyMs": 1185.018333
+ "latencyMs": 1179.524166999996
+ },
+ {
+ "questionId": "q53",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "12",
+ "actual": "12",
+ "isCorrect": true,
+ "inputTokens": 7916,
+ "outputTokens": 2,
+ "latencyMs": 4426.7412919999915
},
{
"questionId": "q53",
@@ -5749,143 +8620,374 @@
"actual": "12",
"isCorrect": true,
"inputTokens": 2531,
+ "outputTokens": 1799,
+ "latencyMs": 21729.542084000015
+ },
+ {
+ "questionId": "q53",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "12",
+ "actual": "9",
+ "isCorrect": false,
+ "inputTokens": 2984,
+ "outputTokens": 5,
+ "latencyMs": 3320.943874999997
+ },
+ {
+ "questionId": "q53",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "12",
+ "actual": "12",
+ "isCorrect": true,
+ "inputTokens": 3325,
+ "outputTokens": 2,
+ "latencyMs": 5572.28795800003
+ },
+ {
+ "questionId": "q53",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "12",
+ "actual": "12",
+ "isCorrect": true,
+ "inputTokens": 2385,
+ "outputTokens": 1479,
+ "latencyMs": 23517.660458
+ },
+ {
+ "questionId": "q53",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "12",
+ "actual": "10",
+ "isCorrect": false,
+ "inputTokens": 2858,
+ "outputTokens": 5,
+ "latencyMs": 1028.1668340000033
+ },
+ {
+ "questionId": "q53",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "12",
+ "actual": "13",
+ "isCorrect": false,
+ "inputTokens": 3199,
+ "outputTokens": 2,
+ "latencyMs": 21513.301958999975
+ },
+ {
+ "questionId": "q53",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "12",
+ "actual": "12",
+ "isCorrect": true,
+ "inputTokens": 7361,
+ "outputTokens": 1415,
+ "latencyMs": 25169.729082999984
+ },
+ {
+ "questionId": "q53",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "12",
+ "actual": "11",
+ "isCorrect": false,
+ "inputTokens": 9362,
+ "outputTokens": 5,
+ "latencyMs": 1306.0004590000026
+ },
+ {
+ "questionId": "q53",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "12",
+ "actual": "12",
+ "isCorrect": true,
+ "inputTokens": 9105,
+ "outputTokens": 2,
+ "latencyMs": 22791.16737499999
+ },
+ {
+ "questionId": "q53",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "12",
+ "actual": "12",
+ "isCorrect": true,
+ "inputTokens": 5016,
+ "outputTokens": 1415,
+ "latencyMs": 18191.111124999996
+ },
+ {
+ "questionId": "q53",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "12",
+ "actual": "10",
+ "isCorrect": false,
+ "inputTokens": 5762,
+ "outputTokens": 5,
+ "latencyMs": 927.1151660000323
+ },
+ {
+ "questionId": "q53",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "12",
+ "actual": "13",
+ "isCorrect": false,
+ "inputTokens": 5751,
+ "outputTokens": 2,
+ "latencyMs": 5849.65625
+ },
+ {
+ "questionId": "q54",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 6394,
+ "outputTokens": 1543,
+ "latencyMs": 17624.57283399999
+ },
+ {
+ "questionId": "q54",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "11",
+ "actual": "7",
+ "isCorrect": false,
+ "inputTokens": 7872,
+ "outputTokens": 5,
+ "latencyMs": 1445.3690829999978
+ },
+ {
+ "questionId": "q54",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 7916,
+ "outputTokens": 2,
+ "latencyMs": 4641.89829099999
+ },
+ {
+ "questionId": "q54",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 2531,
+ "outputTokens": 1095,
+ "latencyMs": 16408.578749999986
+ },
+ {
+ "questionId": "q54",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "11",
+ "actual": "6",
+ "isCorrect": false,
+ "inputTokens": 2984,
+ "outputTokens": 5,
+ "latencyMs": 1336.712916999997
+ },
+ {
+ "questionId": "q54",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 3325,
+ "outputTokens": 2,
+ "latencyMs": 5775.600584
+ },
+ {
+ "questionId": "q54",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 2385,
+ "outputTokens": 1479,
+ "latencyMs": 15717.845583999995
+ },
+ {
+ "questionId": "q54",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "11",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 2858,
+ "outputTokens": 5,
+ "latencyMs": 2198.0668749999604
+ },
+ {
+ "questionId": "q54",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 3199,
+ "outputTokens": 2,
+ "latencyMs": 37479.52691700001
+ },
+ {
+ "questionId": "q54",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 7361,
+ "outputTokens": 1095,
+ "latencyMs": 10663.58587499999
+ },
+ {
+ "questionId": "q54",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "11",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 9362,
+ "outputTokens": 5,
+ "latencyMs": 1077.469374999986
+ },
+ {
+ "questionId": "q54",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 9105,
+ "outputTokens": 2,
+ "latencyMs": 16569.429416999978
+ },
+ {
+ "questionId": "q54",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 5016,
+ "outputTokens": 1415,
+ "latencyMs": 15212.04125000001
+ },
+ {
+ "questionId": "q54",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "11",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 5762,
+ "outputTokens": 5,
+ "latencyMs": 935.8371249999618
+ },
+ {
+ "questionId": "q54",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "10",
+ "isCorrect": false,
+ "inputTokens": 5751,
+ "outputTokens": 2,
+ "latencyMs": 5121.037708000047
+ },
+ {
+ "questionId": "q55",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 6394,
+ "outputTokens": 1095,
+ "latencyMs": 34446.65704199998
+ },
+ {
+ "questionId": "q55",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "11",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 7872,
+ "outputTokens": 5,
+ "latencyMs": 2282.8374170000316
+ },
+ {
+ "questionId": "q55",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 7916,
+ "outputTokens": 2,
+ "latencyMs": 5432.8123749999795
+ },
+ {
+ "questionId": "q55",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 2531,
+ "outputTokens": 1479,
+ "latencyMs": 42719.131124999956
+ },
+ {
+ "questionId": "q55",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "11",
+ "actual": "7",
+ "isCorrect": false,
+ "inputTokens": 2984,
+ "outputTokens": 5,
+ "latencyMs": 1832.9572909999988
+ },
+ {
+ "questionId": "q55",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 3325,
+ "outputTokens": 2,
+ "latencyMs": 7711.211624999996
+ },
+ {
+ "questionId": "q55",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 2385,
"outputTokens": 1607,
- "latencyMs": 13592.477832999997
+ "latencyMs": 57515.48358300002
},
{
- "questionId": "q53",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "12",
- "actual": "9",
- "isCorrect": false,
- "inputTokens": 2984,
- "outputTokens": 5,
- "latencyMs": 947.2789590000029
- },
- {
- "questionId": "q53",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "12",
- "actual": "12",
- "isCorrect": true,
- "inputTokens": 2385,
- "outputTokens": 2759,
- "latencyMs": 22718.536041999992
- },
- {
- "questionId": "q53",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "12",
- "actual": "10",
- "isCorrect": false,
- "inputTokens": 2858,
- "outputTokens": 5,
- "latencyMs": 973.4814580000093
- },
- {
- "questionId": "q53",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "12",
- "actual": "12",
- "isCorrect": true,
- "inputTokens": 6320,
- "outputTokens": 1031,
- "latencyMs": 10025.186000000002
- },
- {
- "questionId": "q53",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "12",
- "actual": "8",
- "isCorrect": false,
- "inputTokens": 6367,
- "outputTokens": 5,
- "latencyMs": 1038.4732499999955
- },
- {
- "questionId": "q53",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "12",
- "actual": "12",
- "isCorrect": true,
- "inputTokens": 5016,
- "outputTokens": 903,
- "latencyMs": 12459.619915999996
- },
- {
- "questionId": "q53",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "12",
- "actual": "10",
- "isCorrect": false,
- "inputTokens": 5762,
- "outputTokens": 5,
- "latencyMs": 1448.7940839999937
- },
- {
- "questionId": "q54",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "11",
- "actual": "11",
- "isCorrect": true,
- "inputTokens": 6394,
- "outputTokens": 1415,
- "latencyMs": 13094.547666999992
- },
- {
- "questionId": "q54",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "11",
- "actual": "7",
- "isCorrect": false,
- "inputTokens": 7872,
- "outputTokens": 5,
- "latencyMs": 1241.7239169999957
- },
- {
- "questionId": "q54",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "11",
- "actual": "11",
- "isCorrect": true,
- "inputTokens": 2531,
- "outputTokens": 1031,
- "latencyMs": 10610.864084
- },
- {
- "questionId": "q54",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "11",
- "actual": "6",
- "isCorrect": false,
- "inputTokens": 2984,
- "outputTokens": 5,
- "latencyMs": 1100.7670829999988
- },
- {
- "questionId": "q54",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "11",
- "actual": "11",
- "isCorrect": true,
- "inputTokens": 2385,
- "outputTokens": 1095,
- "latencyMs": 11523.293417000008
- },
- {
- "questionId": "q54",
+ "questionId": "q55",
"format": "csv",
"model": "claude-haiku-4-5",
"expected": "11",
@@ -5893,150 +8995,62 @@
"isCorrect": false,
"inputTokens": 2858,
"outputTokens": 5,
- "latencyMs": 980.1522499999992
- },
- {
- "questionId": "q54",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "11",
- "actual": "11",
- "isCorrect": true,
- "inputTokens": 6320,
- "outputTokens": 1095,
- "latencyMs": 8184.143375
- },
- {
- "questionId": "q54",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "11",
- "actual": "6",
- "isCorrect": false,
- "inputTokens": 6367,
- "outputTokens": 5,
- "latencyMs": 1175.0723330000037
- },
- {
- "questionId": "q54",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "11",
- "actual": "11",
- "isCorrect": true,
- "inputTokens": 5016,
- "outputTokens": 1159,
- "latencyMs": 13082.53912500001
- },
- {
- "questionId": "q54",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "11",
- "actual": "8",
- "isCorrect": false,
- "inputTokens": 5762,
- "outputTokens": 5,
- "latencyMs": 1020.4026659999945
- },
- {
- "questionId": "q55",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "11",
- "actual": "11",
- "isCorrect": true,
- "inputTokens": 6394,
- "outputTokens": 1223,
- "latencyMs": 13166.679334
- },
- {
- "questionId": "q55",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "11",
- "actual": "8",
- "isCorrect": false,
- "inputTokens": 7872,
- "outputTokens": 5,
- "latencyMs": 1090.0060839999933
- },
- {
- "questionId": "q55",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "11",
- "actual": "11",
- "isCorrect": true,
- "inputTokens": 2531,
- "outputTokens": 1287,
- "latencyMs": 11181.234958000001
- },
- {
- "questionId": "q55",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "11",
- "actual": "7",
- "isCorrect": false,
- "inputTokens": 2984,
- "outputTokens": 5,
- "latencyMs": 1365.1262080000015
+ "latencyMs": 3238.0369170000195
},
{
"questionId": "q55",
"format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 3199,
+ "outputTokens": 2,
+ "latencyMs": 9271.402125000022
+ },
+ {
+ "questionId": "q55",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "11",
"actual": "11",
"isCorrect": true,
- "inputTokens": 2385,
+ "inputTokens": 7361,
"outputTokens": 967,
- "latencyMs": 9549.427916999994
+ "latencyMs": 12946.014833999972
},
{
"questionId": "q55",
- "format": "csv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "11",
- "actual": "8",
+ "actual": "9",
"isCorrect": false,
- "inputTokens": 2858,
+ "inputTokens": 9362,
"outputTokens": 5,
- "latencyMs": 981.8662500000064
+ "latencyMs": 1523.2371250000433
},
{
"questionId": "q55",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
"expected": "11",
"actual": "11",
"isCorrect": true,
- "inputTokens": 6320,
- "outputTokens": 1223,
- "latencyMs": 11591.030333000002
- },
- {
- "questionId": "q55",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "11",
- "actual": "7",
- "isCorrect": false,
- "inputTokens": 6367,
- "outputTokens": 5,
- "latencyMs": 1430.038750000007
+ "inputTokens": 9105,
+ "outputTokens": 2,
+ "latencyMs": 11301.93191600003
},
{
"questionId": "q55",
"format": "yaml",
"model": "gpt-5-nano",
"expected": "11",
- "actual": "10",
- "isCorrect": false,
+ "actual": "11",
+ "isCorrect": true,
"inputTokens": 5016,
- "outputTokens": 1735,
- "latencyMs": 11458.303500000009
+ "outputTokens": 1351,
+ "latencyMs": 18129.383040999994
},
{
"questionId": "q55",
@@ -6047,7 +9061,18 @@
"isCorrect": false,
"inputTokens": 5762,
"outputTokens": 5,
- "latencyMs": 1103.2402909999946
+ "latencyMs": 1117.6802920000046
+ },
+ {
+ "questionId": "q55",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 5751,
+ "outputTokens": 2,
+ "latencyMs": 4743.260083000001
},
{
"questionId": "q56",
@@ -6057,8 +9082,8 @@
"actual": "11",
"isCorrect": false,
"inputTokens": 6394,
- "outputTokens": 2631,
- "latencyMs": 16900.63120799999
+ "outputTokens": 1479,
+ "latencyMs": 12632.222667000024
},
{
"questionId": "q56",
@@ -6069,7 +9094,18 @@
"isCorrect": false,
"inputTokens": 7872,
"outputTokens": 5,
- "latencyMs": 1043.442332999999
+ "latencyMs": 1567.1472920000087
+ },
+ {
+ "questionId": "q56",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "12",
+ "actual": "12",
+ "isCorrect": true,
+ "inputTokens": 7916,
+ "outputTokens": 2,
+ "latencyMs": 5749.258750000037
},
{
"questionId": "q56",
@@ -6079,8 +9115,8 @@
"actual": "12",
"isCorrect": true,
"inputTokens": 2531,
- "outputTokens": 839,
- "latencyMs": 7278.612083
+ "outputTokens": 1479,
+ "latencyMs": 17473.24116700003
},
{
"questionId": "q56",
@@ -6091,7 +9127,18 @@
"isCorrect": false,
"inputTokens": 2984,
"outputTokens": 5,
- "latencyMs": 1705.2114999999903
+ "latencyMs": 922.2049170000246
+ },
+ {
+ "questionId": "q56",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "12",
+ "actual": "12",
+ "isCorrect": true,
+ "inputTokens": 3325,
+ "outputTokens": 2,
+ "latencyMs": 5561.690833000001
},
{
"questionId": "q56",
@@ -6101,8 +9148,8 @@
"actual": "11",
"isCorrect": false,
"inputTokens": 2385,
- "outputTokens": 1415,
- "latencyMs": 10625.603375000006
+ "outputTokens": 2183,
+ "latencyMs": 23539.67433399998
},
{
"questionId": "q56",
@@ -6113,40 +9160,62 @@
"isCorrect": false,
"inputTokens": 2858,
"outputTokens": 5,
- "latencyMs": 1081.0501670000085
+ "latencyMs": 1159.2557500000112
},
{
"questionId": "q56",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "12",
+ "actual": "12",
+ "isCorrect": true,
+ "inputTokens": 3199,
+ "outputTokens": 2,
+ "latencyMs": 9863.856417000003
+ },
+ {
+ "questionId": "q56",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "12",
"actual": "12",
"isCorrect": true,
- "inputTokens": 6320,
- "outputTokens": 2055,
- "latencyMs": 17548.71483299999
+ "inputTokens": 7361,
+ "outputTokens": 1927,
+ "latencyMs": 106756.24308399996
},
{
"questionId": "q56",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "12",
- "actual": "7",
+ "actual": "8",
"isCorrect": false,
- "inputTokens": 6367,
+ "inputTokens": 9362,
"outputTokens": 5,
- "latencyMs": 2302.2003750000003
+ "latencyMs": 1064.2161659999983
+ },
+ {
+ "questionId": "q56",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "12",
+ "actual": "12",
+ "isCorrect": true,
+ "inputTokens": 9105,
+ "outputTokens": 2,
+ "latencyMs": 7033.105833999987
},
{
"questionId": "q56",
"format": "yaml",
"model": "gpt-5-nano",
"expected": "12",
- "actual": "11",
- "isCorrect": false,
+ "actual": "12",
+ "isCorrect": true,
"inputTokens": 5016,
- "outputTokens": 1287,
- "latencyMs": 13187.201000000015
+ "outputTokens": 1095,
+ "latencyMs": 14048.506916999992
},
{
"questionId": "q56",
@@ -6157,7 +9226,18 @@
"isCorrect": false,
"inputTokens": 5762,
"outputTokens": 5,
- "latencyMs": 2621.4970829999947
+ "latencyMs": 1192.642125000013
+ },
+ {
+ "questionId": "q56",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "12",
+ "actual": "12",
+ "isCorrect": true,
+ "inputTokens": 5751,
+ "outputTokens": 2,
+ "latencyMs": 5957.613042000041
},
{
"questionId": "q57",
@@ -6167,8 +9247,8 @@
"actual": "62",
"isCorrect": true,
"inputTokens": 6393,
- "outputTokens": 3783,
- "latencyMs": 29393.69395799999
+ "outputTokens": 3719,
+ "latencyMs": 332341.88812499994
},
{
"questionId": "q57",
@@ -6179,7 +9259,18 @@
"isCorrect": true,
"inputTokens": 7872,
"outputTokens": 5,
- "latencyMs": 1402.049291999996
+ "latencyMs": 1168.1113340000156
+ },
+ {
+ "questionId": "q57",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "62",
+ "actual": "62",
+ "isCorrect": true,
+ "inputTokens": 7912,
+ "outputTokens": 2,
+ "latencyMs": 20747.95541699999
},
{
"questionId": "q57",
@@ -6189,8 +9280,8 @@
"actual": "62",
"isCorrect": true,
"inputTokens": 2530,
- "outputTokens": 2823,
- "latencyMs": 23696.75
+ "outputTokens": 3079,
+ "latencyMs": 24893.890125000034
},
{
"questionId": "q57",
@@ -6201,7 +9292,18 @@
"isCorrect": true,
"inputTokens": 2984,
"outputTokens": 5,
- "latencyMs": 1064.7778749999998
+ "latencyMs": 1446.5637920000008
+ },
+ {
+ "questionId": "q57",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "62",
+ "actual": "62",
+ "isCorrect": true,
+ "inputTokens": 3321,
+ "outputTokens": 2,
+ "latencyMs": 18187.491625000024
},
{
"questionId": "q57",
@@ -6211,8 +9313,8 @@
"actual": "64",
"isCorrect": false,
"inputTokens": 2384,
- "outputTokens": 3143,
- "latencyMs": 28384.533249999993
+ "outputTokens": 4551,
+ "latencyMs": 61990.75604200002
},
{
"questionId": "q57",
@@ -6223,29 +9325,51 @@
"isCorrect": true,
"inputTokens": 2858,
"outputTokens": 5,
- "latencyMs": 889.2725839999912
+ "latencyMs": 2368.5950840000296
},
{
"questionId": "q57",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "62",
+ "actual": "62",
+ "isCorrect": true,
+ "inputTokens": 3195,
+ "outputTokens": 2,
+ "latencyMs": 19295.422582999978
+ },
+ {
+ "questionId": "q57",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "62",
"actual": "62",
"isCorrect": true,
- "inputTokens": 6319,
- "outputTokens": 6663,
- "latencyMs": 50113.09675
+ "inputTokens": 7360,
+ "outputTokens": 3015,
+ "latencyMs": 27433.851124999986
},
{
"questionId": "q57",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "62",
"actual": "62",
"isCorrect": true,
- "inputTokens": 6367,
+ "inputTokens": 9362,
"outputTokens": 5,
- "latencyMs": 1074.8158330000006
+ "latencyMs": 1239.7937919999822
+ },
+ {
+ "questionId": "q57",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "62",
+ "actual": "62",
+ "isCorrect": true,
+ "inputTokens": 9101,
+ "outputTokens": 2,
+ "latencyMs": 21703.45670800004
},
{
"questionId": "q57",
@@ -6255,8 +9379,8 @@
"actual": "62",
"isCorrect": true,
"inputTokens": 5015,
- "outputTokens": 2631,
- "latencyMs": 23841.036083999992
+ "outputTokens": 4615,
+ "latencyMs": 38416.754041999986
},
{
"questionId": "q57",
@@ -6267,7 +9391,18 @@
"isCorrect": true,
"inputTokens": 5762,
"outputTokens": 5,
- "latencyMs": 1010.4629169999971
+ "latencyMs": 974.5636659999727
+ },
+ {
+ "questionId": "q57",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "62",
+ "actual": "62",
+ "isCorrect": true,
+ "inputTokens": 5747,
+ "outputTokens": 2,
+ "latencyMs": 20388.102249999996
},
{
"questionId": "q58",
@@ -6277,8 +9412,8 @@
"actual": "45",
"isCorrect": true,
"inputTokens": 6393,
- "outputTokens": 2247,
- "latencyMs": 18818.030874999997
+ "outputTokens": 2567,
+ "latencyMs": 23536.014041999995
},
{
"questionId": "q58",
@@ -6289,7 +9424,18 @@
"isCorrect": false,
"inputTokens": 7872,
"outputTokens": 5,
- "latencyMs": 1203.152833
+ "latencyMs": 1002.8562090000487
+ },
+ {
+ "questionId": "q58",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "45",
+ "actual": "45",
+ "isCorrect": true,
+ "inputTokens": 7913,
+ "outputTokens": 2,
+ "latencyMs": 35012.274959
},
{
"questionId": "q58",
@@ -6299,8 +9445,8 @@
"actual": "45",
"isCorrect": true,
"inputTokens": 2530,
- "outputTokens": 2631,
- "latencyMs": 21987.539915999994
+ "outputTokens": 3143,
+ "latencyMs": 27182.416041999997
},
{
"questionId": "q58",
@@ -6311,7 +9457,18 @@
"isCorrect": false,
"inputTokens": 2984,
"outputTokens": 5,
- "latencyMs": 1000.0181669999874
+ "latencyMs": 935.4336250000051
+ },
+ {
+ "questionId": "q58",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "45",
+ "actual": "45",
+ "isCorrect": true,
+ "inputTokens": 3322,
+ "outputTokens": 2,
+ "latencyMs": 19937.21420799999
},
{
"questionId": "q58",
@@ -6321,8 +9478,8 @@
"actual": "46",
"isCorrect": false,
"inputTokens": 2384,
- "outputTokens": 3079,
- "latencyMs": 24534.847250000006
+ "outputTokens": 3271,
+ "latencyMs": 26153.538457999995
},
{
"questionId": "q58",
@@ -6333,29 +9490,51 @@
"isCorrect": false,
"inputTokens": 2858,
"outputTokens": 5,
- "latencyMs": 1125.7029999999795
+ "latencyMs": 1029.4126660000184
},
{
"questionId": "q58",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "45",
+ "actual": "45",
+ "isCorrect": true,
+ "inputTokens": 3196,
+ "outputTokens": 2,
+ "latencyMs": 36182.66629199998
+ },
+ {
+ "questionId": "q58",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "45",
"actual": "45",
"isCorrect": true,
- "inputTokens": 6319,
+ "inputTokens": 7360,
"outputTokens": 2823,
- "latencyMs": 27053.90824999998
+ "latencyMs": 27939.341790999984
},
{
"questionId": "q58",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "45",
- "actual": "42",
+ "actual": "47",
"isCorrect": false,
- "inputTokens": 6367,
+ "inputTokens": 9362,
"outputTokens": 5,
- "latencyMs": 1474.1193330000096
+ "latencyMs": 1699.4091669999762
+ },
+ {
+ "questionId": "q58",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "45",
+ "actual": "45",
+ "isCorrect": true,
+ "inputTokens": 9102,
+ "outputTokens": 2,
+ "latencyMs": 20119.059750000015
},
{
"questionId": "q58",
@@ -6365,8 +9544,8 @@
"actual": "45",
"isCorrect": true,
"inputTokens": 5015,
- "outputTokens": 2567,
- "latencyMs": 21642.824207999976
+ "outputTokens": 2631,
+ "latencyMs": 25962.383333999955
},
{
"questionId": "q58",
@@ -6377,7 +9556,18 @@
"isCorrect": false,
"inputTokens": 5762,
"outputTokens": 5,
- "latencyMs": 1170.1535830000066
+ "latencyMs": 1063.877124999999
+ },
+ {
+ "questionId": "q58",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "45",
+ "actual": "45",
+ "isCorrect": true,
+ "inputTokens": 5748,
+ "outputTokens": 2,
+ "latencyMs": 37951.156874999986
},
{
"questionId": "q59",
@@ -6387,30 +9577,41 @@
"actual": "96.17",
"isCorrect": true,
"inputTokens": 9739,
+ "outputTokens": 137,
+ "latencyMs": 2635.883374999976
+ },
+ {
+ "questionId": "q59",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "96.17",
+ "actual": "96.17",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 7,
+ "latencyMs": 1164.0292079999927
+ },
+ {
+ "questionId": "q59",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "96.17",
+ "actual": "96.17",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 5,
+ "latencyMs": 1510.9628750000265
+ },
+ {
+ "questionId": "q59",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "96.17",
+ "actual": "96.17",
+ "isCorrect": true,
+ "inputTokens": 6013,
"outputTokens": 73,
- "latencyMs": 2340.6126670000085
- },
- {
- "questionId": "q59",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "96.17",
- "actual": "96.17",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 7,
- "latencyMs": 1337.4746670000022
- },
- {
- "questionId": "q59",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "96.17",
- "actual": "96.17",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 137,
- "latencyMs": 2275.1715830000176
+ "latencyMs": 3338.3452919999836
},
{
"questionId": "q59",
@@ -6421,7 +9622,18 @@
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 7,
- "latencyMs": 1086.9557499999937
+ "latencyMs": 1290.2898750000168
+ },
+ {
+ "questionId": "q59",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "96.17",
+ "actual": "96.17",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 5,
+ "latencyMs": 1073.7947919999715
},
{
"questionId": "q59",
@@ -6431,8 +9643,8 @@
"actual": "96.17",
"isCorrect": true,
"inputTokens": 6781,
- "outputTokens": 137,
- "latencyMs": 2881.4037499999977
+ "outputTokens": 201,
+ "latencyMs": 3254.3114590000478
},
{
"questionId": "q59",
@@ -6443,29 +9655,51 @@
"isCorrect": true,
"inputTokens": 8414,
"outputTokens": 7,
- "latencyMs": 1172.774000000005
+ "latencyMs": 1300.0598330000066
},
{
"questionId": "q59",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "96.17",
+ "actual": "96.17",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 5,
+ "latencyMs": 2603.532125000027
+ },
+ {
+ "questionId": "q59",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "96.17",
"actual": "96.17",
"isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 201,
- "latencyMs": 7706.478582999989
+ "inputTokens": 11037,
+ "outputTokens": 137,
+ "latencyMs": 2712.822291999997
},
{
"questionId": "q59",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "96.17",
"actual": "96.17",
"isCorrect": true,
- "inputTokens": 9289,
+ "inputTokens": 13380,
"outputTokens": 7,
- "latencyMs": 1106.0717920000025
+ "latencyMs": 1369.1374160000123
+ },
+ {
+ "questionId": "q59",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "96.17",
+ "actual": "96.17",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 5,
+ "latencyMs": 1339.450165999995
},
{
"questionId": "q59",
@@ -6476,7 +9710,7 @@
"isCorrect": true,
"inputTokens": 7373,
"outputTokens": 137,
- "latencyMs": 6185.161250000005
+ "latencyMs": 2561.059583000024
},
{
"questionId": "q59",
@@ -6487,7 +9721,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 7,
- "latencyMs": 1388.4410000000207
+ "latencyMs": 1122.8535000000265
+ },
+ {
+ "questionId": "q59",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "96.17",
+ "actual": "96.17",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 5,
+ "latencyMs": 1243.387041000009
},
{
"questionId": "q60",
@@ -6497,8 +9742,8 @@
"actual": "shipped",
"isCorrect": true,
"inputTokens": 9738,
- "outputTokens": 136,
- "latencyMs": 6699.9394589999865
+ "outputTokens": 200,
+ "latencyMs": 4276.413916999998
},
{
"questionId": "q60",
@@ -6509,7 +9754,18 @@
"isCorrect": true,
"inputTokens": 11906,
"outputTokens": 4,
- "latencyMs": 1152.8117919999931
+ "latencyMs": 1337.8417079999927
+ },
+ {
+ "questionId": "q60",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 12112,
+ "outputTokens": 2,
+ "latencyMs": 1526.3712500000256
},
{
"questionId": "q60",
@@ -6520,7 +9776,7 @@
"isCorrect": true,
"inputTokens": 6012,
"outputTokens": 136,
- "latencyMs": 2446.019666999986
+ "latencyMs": 2210.3001669999794
},
{
"questionId": "q60",
@@ -6531,7 +9787,18 @@
"isCorrect": true,
"inputTokens": 6992,
"outputTokens": 4,
- "latencyMs": 1046.3494580000115
+ "latencyMs": 1227.2460840000422
+ },
+ {
+ "questionId": "q60",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 2,
+ "latencyMs": 1149.5532499999972
},
{
"questionId": "q60",
@@ -6542,7 +9809,7 @@
"isCorrect": true,
"inputTokens": 6780,
"outputTokens": 200,
- "latencyMs": 6084.429165999987
+ "latencyMs": 2463.5065419999883
},
{
"questionId": "q60",
@@ -6553,29 +9820,51 @@
"isCorrect": true,
"inputTokens": 8413,
"outputTokens": 4,
- "latencyMs": 1787.2428749999963
+ "latencyMs": 1474.229833999998
},
{
"questionId": "q60",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 2,
+ "latencyMs": 3119.7202080000425
+ },
+ {
+ "questionId": "q60",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "shipped",
"actual": "shipped",
"isCorrect": true,
- "inputTokens": 9157,
- "outputTokens": 264,
- "latencyMs": 5364.3007919999945
+ "inputTokens": 11036,
+ "outputTokens": 136,
+ "latencyMs": 2996.8577500000247
},
{
"questionId": "q60",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "shipped",
"actual": "shipped",
"isCorrect": true,
- "inputTokens": 9288,
+ "inputTokens": 13379,
"outputTokens": 4,
- "latencyMs": 1269.2162499999977
+ "latencyMs": 1374.8893749999697
+ },
+ {
+ "questionId": "q60",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 2,
+ "latencyMs": 1361.1552500000107
},
{
"questionId": "q60",
@@ -6585,8 +9874,8 @@
"actual": "shipped",
"isCorrect": true,
"inputTokens": 7372,
- "outputTokens": 72,
- "latencyMs": 2381.514374999999
+ "outputTokens": 136,
+ "latencyMs": 2356.033334000036
},
{
"questionId": "q60",
@@ -6597,7 +9886,18 @@
"isCorrect": true,
"inputTokens": 8384,
"outputTokens": 4,
- "latencyMs": 1222.1361669999897
+ "latencyMs": 1128.8600410000072
+ },
+ {
+ "questionId": "q60",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 8426,
+ "outputTokens": 2,
+ "latencyMs": 1012.1753329999628
},
{
"questionId": "q61",
@@ -6608,7 +9908,7 @@
"isCorrect": true,
"inputTokens": 9739,
"outputTokens": 201,
- "latencyMs": 3641.536167000013
+ "latencyMs": 2894.6042920000036
},
{
"questionId": "q61",
@@ -6619,7 +9919,18 @@
"isCorrect": true,
"inputTokens": 11907,
"outputTokens": 7,
- "latencyMs": 2457.5752079999947
+ "latencyMs": 1140.3883749999804
+ },
+ {
+ "questionId": "q61",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "599.39",
+ "actual": "599.39",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 6,
+ "latencyMs": 1286.3832499999553
},
{
"questionId": "q61",
@@ -6630,7 +9941,7 @@
"isCorrect": true,
"inputTokens": 6013,
"outputTokens": 201,
- "latencyMs": 3384.6115839999984
+ "latencyMs": 5983.418707999983
},
{
"questionId": "q61",
@@ -6641,7 +9952,18 @@
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 7,
- "latencyMs": 1372.8756669999857
+ "latencyMs": 1257.5179999999818
+ },
+ {
+ "questionId": "q61",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "599.39",
+ "actual": "599.39",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 6,
+ "latencyMs": 1470.9667500000214
},
{
"questionId": "q61",
@@ -6652,7 +9974,7 @@
"isCorrect": true,
"inputTokens": 6781,
"outputTokens": 265,
- "latencyMs": 5826.962750000006
+ "latencyMs": 3804.386666000006
},
{
"questionId": "q61",
@@ -6663,29 +9985,51 @@
"isCorrect": true,
"inputTokens": 8414,
"outputTokens": 7,
- "latencyMs": 1303.1691670000146
+ "latencyMs": 1181.0549580000225
},
{
"questionId": "q61",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "599.39",
+ "actual": "599.39",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 6,
+ "latencyMs": 2825.75008300005
+ },
+ {
+ "questionId": "q61",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "599.39",
"actual": "599.39",
"isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 265,
- "latencyMs": 3602.1091250000172
+ "inputTokens": 11037,
+ "outputTokens": 201,
+ "latencyMs": 4155.127124999999
},
{
"questionId": "q61",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "599.39",
"actual": "599.39",
"isCorrect": true,
- "inputTokens": 9289,
+ "inputTokens": 13380,
"outputTokens": 7,
- "latencyMs": 1451.1585410000116
+ "latencyMs": 1243.845667000045
+ },
+ {
+ "questionId": "q61",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "599.39",
+ "actual": "599.39",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 6,
+ "latencyMs": 1183.5630419999943
},
{
"questionId": "q61",
@@ -6696,7 +10040,7 @@
"isCorrect": true,
"inputTokens": 7373,
"outputTokens": 137,
- "latencyMs": 2453.183083000011
+ "latencyMs": 3305.4360420000157
},
{
"questionId": "q61",
@@ -6707,7 +10051,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 7,
- "latencyMs": 1152.136541999993
+ "latencyMs": 1122.905792000005
+ },
+ {
+ "questionId": "q61",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "599.39",
+ "actual": "599.39",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 6,
+ "latencyMs": 1289.1040829999838
},
{
"questionId": "q62",
@@ -6718,7 +10073,7 @@
"isCorrect": true,
"inputTokens": 9738,
"outputTokens": 199,
- "latencyMs": 5025.56916699998
+ "latencyMs": 4459.190540999989
},
{
"questionId": "q62",
@@ -6729,7 +10084,18 @@
"isCorrect": true,
"inputTokens": 11906,
"outputTokens": 4,
- "latencyMs": 1111.5014169999922
+ "latencyMs": 1385.2943749999977
+ },
+ {
+ "questionId": "q62",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 12112,
+ "outputTokens": 1,
+ "latencyMs": 1281.1537499999977
},
{
"questionId": "q62",
@@ -6739,228 +10105,338 @@
"actual": "processing",
"isCorrect": true,
"inputTokens": 6012,
+ "outputTokens": 135,
+ "latencyMs": 2211.059750000015
+ },
+ {
+ "questionId": "q62",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 6992,
+ "outputTokens": 4,
+ "latencyMs": 1282.652208000014
+ },
+ {
+ "questionId": "q62",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 1,
+ "latencyMs": 1296.6791250000242
+ },
+ {
+ "questionId": "q62",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 6780,
+ "outputTokens": 135,
+ "latencyMs": 4460.896583999973
+ },
+ {
+ "questionId": "q62",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 8413,
+ "outputTokens": 4,
+ "latencyMs": 1311.2437919999938
+ },
+ {
+ "questionId": "q62",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 1,
+ "latencyMs": 2321.0788329999777
+ },
+ {
+ "questionId": "q62",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 11036,
+ "outputTokens": 135,
+ "latencyMs": 2574.011124999961
+ },
+ {
+ "questionId": "q62",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 13379,
+ "outputTokens": 4,
+ "latencyMs": 1331.6849169999477
+ },
+ {
+ "questionId": "q62",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 1,
+ "latencyMs": 1876.967500000028
+ },
+ {
+ "questionId": "q62",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 7372,
+ "outputTokens": 71,
+ "latencyMs": 4585.356583999994
+ },
+ {
+ "questionId": "q62",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 8384,
+ "outputTokens": 4,
+ "latencyMs": 1472.130541999999
+ },
+ {
+ "questionId": "q62",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 8426,
+ "outputTokens": 1,
+ "latencyMs": 3066.8415830000304
+ },
+ {
+ "questionId": "q63",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 9739,
+ "outputTokens": 265,
+ "latencyMs": 4022.9598750000005
+ },
+ {
+ "questionId": "q63",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 7,
+ "latencyMs": 1480.8643750000047
+ },
+ {
+ "questionId": "q63",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 6,
+ "latencyMs": 1615.6131670000032
+ },
+ {
+ "questionId": "q63",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 6013,
+ "outputTokens": 265,
+ "latencyMs": 3674.1392500000075
+ },
+ {
+ "questionId": "q63",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 6993,
+ "outputTokens": 7,
+ "latencyMs": 1060.8583750000107
+ },
+ {
+ "questionId": "q63",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 6,
+ "latencyMs": 1496.0798749999958
+ },
+ {
+ "questionId": "q63",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 6781,
+ "outputTokens": 329,
+ "latencyMs": 3936.86050000001
+ },
+ {
+ "questionId": "q63",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 7,
+ "latencyMs": 1451.5014170000213
+ },
+ {
+ "questionId": "q63",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 6,
+ "latencyMs": 3275.3027920000022
+ },
+ {
+ "questionId": "q63",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 521,
+ "latencyMs": 7834.65945799998
+ },
+ {
+ "questionId": "q63",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 7,
+ "latencyMs": 1066.7734170000185
+ },
+ {
+ "questionId": "q63",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 6,
+ "latencyMs": 1091.2406670000055
+ },
+ {
+ "questionId": "q63",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 265,
+ "latencyMs": 7133.230082999973
+ },
+ {
+ "questionId": "q63",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 8385,
+ "outputTokens": 7,
+ "latencyMs": 1334.3640829999931
+ },
+ {
+ "questionId": "q63",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "528.71",
+ "actual": "528.71",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 6,
+ "latencyMs": 1548.7799590000068
+ },
+ {
+ "questionId": "q64",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 9738,
+ "outputTokens": 199,
+ "latencyMs": 3084.847666000016
+ },
+ {
+ "questionId": "q64",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 11906,
+ "outputTokens": 4,
+ "latencyMs": 1400.1154589999933
+ },
+ {
+ "questionId": "q64",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 12112,
+ "outputTokens": 1,
+ "latencyMs": 2145.6674999999814
+ },
+ {
+ "questionId": "q64",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 6012,
"outputTokens": 199,
- "latencyMs": 3548.9061660000007
- },
- {
- "questionId": "q62",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "processing",
- "actual": "processing",
- "isCorrect": true,
- "inputTokens": 6992,
- "outputTokens": 4,
- "latencyMs": 1404.0692500000005
- },
- {
- "questionId": "q62",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "processing",
- "actual": "processing",
- "isCorrect": true,
- "inputTokens": 6780,
- "outputTokens": 135,
- "latencyMs": 2879.9619169999787
- },
- {
- "questionId": "q62",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "processing",
- "actual": "processing",
- "isCorrect": true,
- "inputTokens": 8413,
- "outputTokens": 4,
- "latencyMs": 1258.860249999998
- },
- {
- "questionId": "q62",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "processing",
- "actual": "processing",
- "isCorrect": true,
- "inputTokens": 9157,
- "outputTokens": 263,
- "latencyMs": 7819.738958000002
- },
- {
- "questionId": "q62",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "processing",
- "actual": "processing",
- "isCorrect": true,
- "inputTokens": 9288,
- "outputTokens": 4,
- "latencyMs": 1495.973915999988
- },
- {
- "questionId": "q62",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "processing",
- "actual": "processing",
- "isCorrect": true,
- "inputTokens": 7372,
- "outputTokens": 135,
- "latencyMs": 3092.4329169999983
- },
- {
- "questionId": "q62",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "processing",
- "actual": "processing",
- "isCorrect": true,
- "inputTokens": 8384,
- "outputTokens": 4,
- "latencyMs": 1268.1641250000102
- },
- {
- "questionId": "q63",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "528.71",
- "actual": "528.71",
- "isCorrect": true,
- "inputTokens": 9739,
- "outputTokens": 265,
- "latencyMs": 4409.96212500002
- },
- {
- "questionId": "q63",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "528.71",
- "actual": "528.71",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 7,
- "latencyMs": 1422.6079999999783
- },
- {
- "questionId": "q63",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "528.71",
- "actual": "528.71",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 329,
- "latencyMs": 3593.100334000017
- },
- {
- "questionId": "q63",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "528.71",
- "actual": "528.71",
- "isCorrect": true,
- "inputTokens": 6993,
- "outputTokens": 7,
- "latencyMs": 1474.3911249999946
- },
- {
- "questionId": "q63",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "528.71",
- "actual": "528.71",
- "isCorrect": true,
- "inputTokens": 6781,
- "outputTokens": 265,
- "latencyMs": 5419.795374999987
- },
- {
- "questionId": "q63",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "528.71",
- "actual": "528.71",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 7,
- "latencyMs": 1059.3489999999874
- },
- {
- "questionId": "q63",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "528.71",
- "actual": "528.71",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 265,
- "latencyMs": 4783.504167000006
- },
- {
- "questionId": "q63",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "528.71",
- "actual": "528.71",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 7,
- "latencyMs": 1340.6675410000025
- },
- {
- "questionId": "q63",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "528.71",
- "actual": "528.71",
- "isCorrect": true,
- "inputTokens": 7373,
- "outputTokens": 329,
- "latencyMs": 4222.140958000004
- },
- {
- "questionId": "q63",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "528.71",
- "actual": "528.71",
- "isCorrect": true,
- "inputTokens": 8385,
- "outputTokens": 7,
- "latencyMs": 1169.892125000013
- },
- {
- "questionId": "q64",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "pending",
- "actual": "pending",
- "isCorrect": true,
- "inputTokens": 9738,
- "outputTokens": 135,
- "latencyMs": 2854.8382500000007
- },
- {
- "questionId": "q64",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "pending",
- "actual": "pending",
- "isCorrect": true,
- "inputTokens": 11906,
- "outputTokens": 4,
- "latencyMs": 1077.335374999995
- },
- {
- "questionId": "q64",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "pending",
- "actual": "pending",
- "isCorrect": true,
- "inputTokens": 6012,
- "outputTokens": 135,
- "latencyMs": 2525.2092499999853
+ "latencyMs": 2951.514334000007
},
{
"questionId": "q64",
@@ -6971,7 +10447,18 @@
"isCorrect": true,
"inputTokens": 6992,
"outputTokens": 4,
- "latencyMs": 2100.2050000000163
+ "latencyMs": 1178.9784170000348
+ },
+ {
+ "questionId": "q64",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 1,
+ "latencyMs": 1061.4745419999817
},
{
"questionId": "q64",
@@ -6982,7 +10469,7 @@
"isCorrect": true,
"inputTokens": 6780,
"outputTokens": 263,
- "latencyMs": 5882.592499999999
+ "latencyMs": 3550.5126670000027
},
{
"questionId": "q64",
@@ -6993,29 +10480,51 @@
"isCorrect": true,
"inputTokens": 8413,
"outputTokens": 4,
- "latencyMs": 1168.5295410000253
+ "latencyMs": 1128.6832500000019
},
{
"questionId": "q64",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 1,
+ "latencyMs": 2419.836874999979
+ },
+ {
+ "questionId": "q64",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "pending",
"actual": "pending",
"isCorrect": true,
- "inputTokens": 9157,
+ "inputTokens": 11036,
"outputTokens": 263,
- "latencyMs": 3944.433083000011
+ "latencyMs": 18500.49987499998
},
{
"questionId": "q64",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "pending",
"actual": "pending",
"isCorrect": true,
- "inputTokens": 9288,
+ "inputTokens": 13379,
"outputTokens": 4,
- "latencyMs": 1882.1263749999925
+ "latencyMs": 1697.067417000013
+ },
+ {
+ "questionId": "q64",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 1,
+ "latencyMs": 1665.4901669999817
},
{
"questionId": "q64",
@@ -7026,7 +10535,7 @@
"isCorrect": true,
"inputTokens": 7372,
"outputTokens": 135,
- "latencyMs": 1657.7255829999922
+ "latencyMs": 3648.2167090000003
},
{
"questionId": "q64",
@@ -7037,7 +10546,18 @@
"isCorrect": true,
"inputTokens": 8384,
"outputTokens": 4,
- "latencyMs": 1056.5719169999938
+ "latencyMs": 1223.7409169999883
+ },
+ {
+ "questionId": "q64",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 8426,
+ "outputTokens": 1,
+ "latencyMs": 2938.2844999999506
},
{
"questionId": "q65",
@@ -7047,8 +10567,8 @@
"actual": "1687.82",
"isCorrect": true,
"inputTokens": 9739,
- "outputTokens": 266,
- "latencyMs": 5764.2531250000175
+ "outputTokens": 202,
+ "latencyMs": 3459.946917000052
},
{
"questionId": "q65",
@@ -7059,7 +10579,18 @@
"isCorrect": true,
"inputTokens": 11907,
"outputTokens": 8,
- "latencyMs": 1241.8239590000012
+ "latencyMs": 1173.402208000014
+ },
+ {
+ "questionId": "q65",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "1687.82",
+ "actual": "1687.82",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 7,
+ "latencyMs": 3167.1566250000033
},
{
"questionId": "q65",
@@ -7069,8 +10600,8 @@
"actual": "1687.82",
"isCorrect": true,
"inputTokens": 6013,
- "outputTokens": 266,
- "latencyMs": 3203.148416000011
+ "outputTokens": 202,
+ "latencyMs": 3737.224749999994
},
{
"questionId": "q65",
@@ -7081,7 +10612,18 @@
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 8,
- "latencyMs": 1395.2265419999894
+ "latencyMs": 926.1720830000122
+ },
+ {
+ "questionId": "q65",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "1687.82",
+ "actual": "1687.82",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 7,
+ "latencyMs": 1469.4704999999958
},
{
"questionId": "q65",
@@ -7091,8 +10633,8 @@
"actual": "1687.82",
"isCorrect": true,
"inputTokens": 6781,
- "outputTokens": 330,
- "latencyMs": 3854.1738750000077
+ "outputTokens": 266,
+ "latencyMs": 4014.4818339999765
},
{
"questionId": "q65",
@@ -7103,29 +10645,51 @@
"isCorrect": true,
"inputTokens": 8414,
"outputTokens": 8,
- "latencyMs": 1868.680457999988
+ "latencyMs": 1132.7197079999605
},
{
"questionId": "q65",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "1687.82",
+ "actual": "1687.82",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 7,
+ "latencyMs": 3670.1206250000396
+ },
+ {
+ "questionId": "q65",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "1687.82",
"actual": "1687.82",
"isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 330,
- "latencyMs": 4486.571708000003
+ "inputTokens": 11037,
+ "outputTokens": 202,
+ "latencyMs": 4318.927583000041
},
{
"questionId": "q65",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "1687.82",
"actual": "1687.82",
"isCorrect": true,
- "inputTokens": 9289,
+ "inputTokens": 13380,
"outputTokens": 8,
- "latencyMs": 1336.9320829999924
+ "latencyMs": 1835.1892919999664
+ },
+ {
+ "questionId": "q65",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "1687.82",
+ "actual": "1687.82",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 7,
+ "latencyMs": 1211.4787500000093
},
{
"questionId": "q65",
@@ -7135,8 +10699,8 @@
"actual": "1687.82",
"isCorrect": true,
"inputTokens": 7373,
- "outputTokens": 266,
- "latencyMs": 3571.6664579999924
+ "outputTokens": 202,
+ "latencyMs": 3591.6950419999775
},
{
"questionId": "q65",
@@ -7147,7 +10711,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 8,
- "latencyMs": 1179.5032920000085
+ "latencyMs": 1278.8472920000204
+ },
+ {
+ "questionId": "q65",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "1687.82",
+ "actual": "1687.82",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 7,
+ "latencyMs": 2102.123208999983
},
{
"questionId": "q66",
@@ -7157,525 +10732,789 @@
"actual": "cancelled",
"isCorrect": true,
"inputTokens": 9738,
- "outputTokens": 200,
- "latencyMs": 3395.709499999997
- },
- {
- "questionId": "q66",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 11906,
- "outputTokens": 4,
- "latencyMs": 1374.4573329999985
- },
- {
- "questionId": "q66",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 6012,
- "outputTokens": 200,
- "latencyMs": 3162.779542000004
- },
- {
- "questionId": "q66",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 6992,
- "outputTokens": 4,
- "latencyMs": 1010.6076670000039
- },
- {
- "questionId": "q66",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 6780,
- "outputTokens": 328,
- "latencyMs": 3606.7964999999967
- },
- {
- "questionId": "q66",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 8413,
- "outputTokens": 4,
- "latencyMs": 1432.5227920000034
- },
- {
- "questionId": "q66",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 9157,
- "outputTokens": 328,
- "latencyMs": 2916.351958000014
- },
- {
- "questionId": "q66",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 9288,
- "outputTokens": 4,
- "latencyMs": 1207.7237920000043
- },
- {
- "questionId": "q66",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 7372,
"outputTokens": 136,
- "latencyMs": 2741.256458000018
+ "latencyMs": 2793.1591250000056
},
{
"questionId": "q66",
- "format": "yaml",
+ "format": "json",
"model": "claude-haiku-4-5",
"expected": "cancelled",
"actual": "cancelled",
"isCorrect": true,
- "inputTokens": 8384,
- "outputTokens": 4,
- "latencyMs": 1385.7817920000234
- },
- {
- "questionId": "q67",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "423.6",
- "actual": "423.6",
- "isCorrect": true,
- "inputTokens": 9739,
- "outputTokens": 201,
- "latencyMs": 4731.81024999998
- },
- {
- "questionId": "q67",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "423.6",
- "actual": "423.6",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 7,
- "latencyMs": 1572.4971659999865
- },
- {
- "questionId": "q67",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "423.6",
- "actual": "423.6",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 137,
- "latencyMs": 2684.556333000015
- },
- {
- "questionId": "q67",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "423.6",
- "actual": "423.6",
- "isCorrect": true,
- "inputTokens": 6993,
- "outputTokens": 7,
- "latencyMs": 1314.9989999999816
- },
- {
- "questionId": "q67",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "423.6",
- "actual": "423.6",
- "isCorrect": true,
- "inputTokens": 6781,
- "outputTokens": 137,
- "latencyMs": 2746.457541999989
- },
- {
- "questionId": "q67",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "423.6",
- "actual": "423.6",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 7,
- "latencyMs": 1254.8903329999885
- },
- {
- "questionId": "q67",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "423.6",
- "actual": "423.6",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 137,
- "latencyMs": 4298.293416
- },
- {
- "questionId": "q67",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "423.6",
- "actual": "423.6",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 7,
- "latencyMs": 1346.4980839999916
- },
- {
- "questionId": "q67",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "423.6",
- "actual": "423.6",
- "isCorrect": true,
- "inputTokens": 7373,
- "outputTokens": 265,
- "latencyMs": 3634.2565419999883
- },
- {
- "questionId": "q67",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "423.6",
- "actual": "423.6",
- "isCorrect": true,
- "inputTokens": 8385,
- "outputTokens": 7,
- "latencyMs": 1363.8280410000007
- },
- {
- "questionId": "q68",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "delivered",
- "actual": "delivered",
- "isCorrect": true,
- "inputTokens": 9738,
- "outputTokens": 392,
- "latencyMs": 3933.217000000004
- },
- {
- "questionId": "q68",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "delivered",
- "actual": "delivered",
- "isCorrect": true,
"inputTokens": 11906,
"outputTokens": 4,
- "latencyMs": 1229.9339579999796
+ "latencyMs": 1319.3459579999908
},
{
- "questionId": "q68",
+ "questionId": "q66",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 12112,
+ "outputTokens": 1,
+ "latencyMs": 1572.3595830000122
+ },
+ {
+ "questionId": "q66",
"format": "toon",
"model": "gpt-5-nano",
- "expected": "delivered",
- "actual": "delivered",
+ "expected": "cancelled",
+ "actual": "cancelled",
"isCorrect": true,
"inputTokens": 6012,
- "outputTokens": 136,
- "latencyMs": 2728.4598340000084
- },
- {
- "questionId": "q68",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "delivered",
- "actual": "delivered",
- "isCorrect": true,
- "inputTokens": 6992,
- "outputTokens": 4,
- "latencyMs": 1427.2494170000136
- },
- {
- "questionId": "q68",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "delivered",
- "actual": "delivered",
- "isCorrect": true,
- "inputTokens": 6780,
- "outputTokens": 200,
- "latencyMs": 3187.385666999995
- },
- {
- "questionId": "q68",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "delivered",
- "actual": "delivered",
- "isCorrect": true,
- "inputTokens": 8413,
- "outputTokens": 4,
- "latencyMs": 1482.2487079999992
- },
- {
- "questionId": "q68",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "delivered",
- "actual": "delivered",
- "isCorrect": true,
- "inputTokens": 9157,
"outputTokens": 264,
- "latencyMs": 3429.744458000001
+ "latencyMs": 4642.070207999961
},
{
- "questionId": "q68",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "delivered",
- "actual": "delivered",
- "isCorrect": true,
- "inputTokens": 9288,
- "outputTokens": 4,
- "latencyMs": 1100.8814589999965
- },
- {
- "questionId": "q68",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "delivered",
- "actual": "delivered",
- "isCorrect": true,
- "inputTokens": 7372,
- "outputTokens": 72,
- "latencyMs": 1993.443707999977
- },
- {
- "questionId": "q68",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "delivered",
- "actual": "delivered",
- "isCorrect": true,
- "inputTokens": 8384,
- "outputTokens": 4,
- "latencyMs": 1105.5260419999831
- },
- {
- "questionId": "q69",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "784.03",
- "actual": "784.03",
- "isCorrect": true,
- "inputTokens": 9739,
- "outputTokens": 137,
- "latencyMs": 3255.3775840000017
- },
- {
- "questionId": "q69",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "784.03",
- "actual": "784.03",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 7,
- "latencyMs": 1274.000417000003
- },
- {
- "questionId": "q69",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "784.03",
- "actual": "784.03",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 265,
- "latencyMs": 3098.326624999987
- },
- {
- "questionId": "q69",
+ "questionId": "q66",
"format": "toon",
"model": "claude-haiku-4-5",
- "expected": "784.03",
- "actual": "784.03",
- "isCorrect": true,
- "inputTokens": 6993,
- "outputTokens": 7,
- "latencyMs": 1057.8637079999899
- },
- {
- "questionId": "q69",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "784.03",
- "actual": "784.03",
- "isCorrect": true,
- "inputTokens": 6781,
- "outputTokens": 201,
- "latencyMs": 3651.3826249999984
- },
- {
- "questionId": "q69",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "784.03",
- "actual": "784.03",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 7,
- "latencyMs": 1404.9795829999784
- },
- {
- "questionId": "q69",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "784.03",
- "actual": "784.03",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 201,
- "latencyMs": 4157.148833000014
- },
- {
- "questionId": "q69",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "784.03",
- "actual": "784.03",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 7,
- "latencyMs": 1607.9431249999907
- },
- {
- "questionId": "q69",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "784.03",
- "actual": "784.03",
- "isCorrect": true,
- "inputTokens": 7373,
- "outputTokens": 329,
- "latencyMs": 4582.246665999992
- },
- {
- "questionId": "q69",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "784.03",
- "actual": "784.03",
- "isCorrect": true,
- "inputTokens": 8385,
- "outputTokens": 7,
- "latencyMs": 1458.8513329999987
- },
- {
- "questionId": "q70",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "shipped",
- "actual": "shipped",
- "isCorrect": true,
- "inputTokens": 9738,
- "outputTokens": 200,
- "latencyMs": 3341.994207999989
- },
- {
- "questionId": "q70",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "shipped",
- "actual": "shipped",
- "isCorrect": true,
- "inputTokens": 11906,
- "outputTokens": 4,
- "latencyMs": 1144.3136670000094
- },
- {
- "questionId": "q70",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "shipped",
- "actual": "shipped",
- "isCorrect": true,
- "inputTokens": 6012,
- "outputTokens": 392,
- "latencyMs": 6067.672458999994
- },
- {
- "questionId": "q70",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "shipped",
- "actual": "shipped",
+ "expected": "cancelled",
+ "actual": "cancelled",
"isCorrect": true,
"inputTokens": 6992,
"outputTokens": 4,
- "latencyMs": 1325.0467500000086
+ "latencyMs": 1161.8217919999734
},
{
- "questionId": "q70",
+ "questionId": "q66",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 1,
+ "latencyMs": 1045.6249589999788
+ },
+ {
+ "questionId": "q66",
"format": "csv",
"model": "gpt-5-nano",
- "expected": "shipped",
- "actual": "shipped",
+ "expected": "cancelled",
+ "actual": "cancelled",
"isCorrect": true,
"inputTokens": 6780,
"outputTokens": 200,
- "latencyMs": 2847.485000000015
+ "latencyMs": 3501.1775419999612
},
{
- "questionId": "q70",
+ "questionId": "q66",
"format": "csv",
"model": "claude-haiku-4-5",
- "expected": "shipped",
- "actual": "shipped",
+ "expected": "cancelled",
+ "actual": "cancelled",
"isCorrect": true,
"inputTokens": 8413,
"outputTokens": 4,
- "latencyMs": 1212.1944169999915
+ "latencyMs": 1463.0212910000118
},
{
- "questionId": "q70",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "shipped",
- "actual": "shipped",
+ "questionId": "q66",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "cancelled",
+ "actual": "cancelled",
"isCorrect": true,
- "inputTokens": 9157,
+ "inputTokens": 7837,
+ "outputTokens": 1,
+ "latencyMs": 1782.100999999966
+ },
+ {
+ "questionId": "q66",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 11036,
+ "outputTokens": 584,
+ "latencyMs": 7168.528500000015
+ },
+ {
+ "questionId": "q66",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 13379,
+ "outputTokens": 4,
+ "latencyMs": 1339.9878749999916
+ },
+ {
+ "questionId": "q66",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 1,
+ "latencyMs": 1196.7808749999967
+ },
+ {
+ "questionId": "q66",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 7372,
+ "outputTokens": 328,
+ "latencyMs": 4938.96991699998
+ },
+ {
+ "questionId": "q66",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 8384,
+ "outputTokens": 4,
+ "latencyMs": 1121.6232500000042
+ },
+ {
+ "questionId": "q66",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 8426,
+ "outputTokens": 1,
+ "latencyMs": 1062.6134160000365
+ },
+ {
+ "questionId": "q67",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 9739,
+ "outputTokens": 137,
+ "latencyMs": 2332.1545840000035
+ },
+ {
+ "questionId": "q67",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 7,
+ "latencyMs": 1210.105333000014
+ },
+ {
+ "questionId": "q67",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 5,
+ "latencyMs": 2248.713915999979
+ },
+ {
+ "questionId": "q67",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 6013,
+ "outputTokens": 201,
+ "latencyMs": 5095.391790999973
+ },
+ {
+ "questionId": "q67",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 6993,
+ "outputTokens": 7,
+ "latencyMs": 2002.2553749999497
+ },
+ {
+ "questionId": "q67",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 5,
+ "latencyMs": 1447.1179159999592
+ },
+ {
+ "questionId": "q67",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 6781,
+ "outputTokens": 201,
+ "latencyMs": 7838.877333000011
+ },
+ {
+ "questionId": "q67",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 7,
+ "latencyMs": 1108.0410839999677
+ },
+ {
+ "questionId": "q67",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 5,
+ "latencyMs": 2419.8735420000157
+ },
+ {
+ "questionId": "q67",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 201,
+ "latencyMs": 4098.654000000039
+ },
+ {
+ "questionId": "q67",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 7,
+ "latencyMs": 1200.5831250000047
+ },
+ {
+ "questionId": "q67",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 5,
+ "latencyMs": 1685.785542000027
+ },
+ {
+ "questionId": "q67",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 201,
+ "latencyMs": 4059.9044170000125
+ },
+ {
+ "questionId": "q67",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 8385,
+ "outputTokens": 7,
+ "latencyMs": 1264.0358329999726
+ },
+ {
+ "questionId": "q67",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "423.6",
+ "actual": "423.6",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 5,
+ "latencyMs": 1237.0989580000169
+ },
+ {
+ "questionId": "q68",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 9738,
+ "outputTokens": 200,
+ "latencyMs": 3303.1327499999898
+ },
+ {
+ "questionId": "q68",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 11906,
+ "outputTokens": 4,
+ "latencyMs": 1808.5881250000093
+ },
+ {
+ "questionId": "q68",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 12112,
+ "outputTokens": 1,
+ "latencyMs": 1355.4241669999901
+ },
+ {
+ "questionId": "q68",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 6012,
+ "outputTokens": 200,
+ "latencyMs": 3711.711249999993
+ },
+ {
+ "questionId": "q68",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 6992,
+ "outputTokens": 4,
+ "latencyMs": 1294.2883750000037
+ },
+ {
+ "questionId": "q68",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 1,
+ "latencyMs": 1162.5020840000361
+ },
+ {
+ "questionId": "q68",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 6780,
+ "outputTokens": 264,
+ "latencyMs": 3022.083249999967
+ },
+ {
+ "questionId": "q68",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 8413,
+ "outputTokens": 4,
+ "latencyMs": 944.2437079999945
+ },
+ {
+ "questionId": "q68",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 1,
+ "latencyMs": 3629.1201669999864
+ },
+ {
+ "questionId": "q68",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 11036,
"outputTokens": 456,
- "latencyMs": 5099.853499999997
+ "latencyMs": 4701.368916000007
+ },
+ {
+ "questionId": "q68",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 13379,
+ "outputTokens": 4,
+ "latencyMs": 1121.0914999999804
+ },
+ {
+ "questionId": "q68",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 1,
+ "latencyMs": 2000.4341669999994
+ },
+ {
+ "questionId": "q68",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 7372,
+ "outputTokens": 200,
+ "latencyMs": 6000.394582999987
+ },
+ {
+ "questionId": "q68",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 8384,
+ "outputTokens": 4,
+ "latencyMs": 1584.1092090000166
+ },
+ {
+ "questionId": "q68",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 8426,
+ "outputTokens": 1,
+ "latencyMs": 2002.2350420000148
+ },
+ {
+ "questionId": "q69",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 9739,
+ "outputTokens": 265,
+ "latencyMs": 7792.974290999991
+ },
+ {
+ "questionId": "q69",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 7,
+ "latencyMs": 2028.2800829999615
+ },
+ {
+ "questionId": "q69",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 6,
+ "latencyMs": 1505.0516669999924
+ },
+ {
+ "questionId": "q69",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 6013,
+ "outputTokens": 201,
+ "latencyMs": 7270.891041999974
+ },
+ {
+ "questionId": "q69",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 6993,
+ "outputTokens": 7,
+ "latencyMs": 2478.4481660000165
+ },
+ {
+ "questionId": "q69",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 6,
+ "latencyMs": 1305.2497500000172
+ },
+ {
+ "questionId": "q69",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 6781,
+ "outputTokens": 393,
+ "latencyMs": 6261.073583999998
+ },
+ {
+ "questionId": "q69",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 7,
+ "latencyMs": 1863.528500000015
+ },
+ {
+ "questionId": "q69",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 6,
+ "latencyMs": 3306.4452499999898
+ },
+ {
+ "questionId": "q69",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 265,
+ "latencyMs": 3464.767792000028
+ },
+ {
+ "questionId": "q69",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 7,
+ "latencyMs": 1144.0890420000069
+ },
+ {
+ "questionId": "q69",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 6,
+ "latencyMs": 1458.4538750000065
+ },
+ {
+ "questionId": "q69",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 201,
+ "latencyMs": 3276.8598340000026
+ },
+ {
+ "questionId": "q69",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 8385,
+ "outputTokens": 7,
+ "latencyMs": 1434.8686669999734
+ },
+ {
+ "questionId": "q69",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "784.03",
+ "actual": "784.03",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 6,
+ "latencyMs": 1570.2152500000084
},
{
"questionId": "q70",
- "format": "markdown-kv",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 9738,
+ "outputTokens": 200,
+ "latencyMs": 3532.8103330000304
+ },
+ {
+ "questionId": "q70",
+ "format": "json",
"model": "claude-haiku-4-5",
"expected": "shipped",
"actual": "shipped",
"isCorrect": true,
- "inputTokens": 9288,
+ "inputTokens": 11906,
"outputTokens": 4,
- "latencyMs": 1284.708416999987
+ "latencyMs": 1212.3070409999928
+ },
+ {
+ "questionId": "q70",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 12112,
+ "outputTokens": 2,
+ "latencyMs": 1246.4002080000355
+ },
+ {
+ "questionId": "q70",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 6012,
+ "outputTokens": 136,
+ "latencyMs": 6942.459582999989
+ },
+ {
+ "questionId": "q70",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 6992,
+ "outputTokens": 4,
+ "latencyMs": 1144.068333000003
+ },
+ {
+ "questionId": "q70",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 2,
+ "latencyMs": 2209.296417000005
+ },
+ {
+ "questionId": "q70",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 6780,
+ "outputTokens": 136,
+ "latencyMs": 4940.5221670000465
+ },
+ {
+ "questionId": "q70",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 8413,
+ "outputTokens": 4,
+ "latencyMs": 1493.192041000002
+ },
+ {
+ "questionId": "q70",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 2,
+ "latencyMs": 1817.8049579999642
+ },
+ {
+ "questionId": "q70",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 11036,
+ "outputTokens": 136,
+ "latencyMs": 3458.8650829999824
+ },
+ {
+ "questionId": "q70",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 13379,
+ "outputTokens": 4,
+ "latencyMs": 1401.621165999968
+ },
+ {
+ "questionId": "q70",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 2,
+ "latencyMs": 3644.271166999999
},
{
"questionId": "q70",
@@ -7686,7 +11525,7 @@
"isCorrect": true,
"inputTokens": 7372,
"outputTokens": 200,
- "latencyMs": 2745.7869170000195
+ "latencyMs": 2859.7807909999974
},
{
"questionId": "q70",
@@ -7697,7 +11536,18 @@
"isCorrect": true,
"inputTokens": 8384,
"outputTokens": 4,
- "latencyMs": 1114.6338329999999
+ "latencyMs": 1170.455874999985
+ },
+ {
+ "questionId": "q70",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "shipped",
+ "actual": "shipped",
+ "isCorrect": true,
+ "inputTokens": 8426,
+ "outputTokens": 2,
+ "latencyMs": 2668.4208750000107
},
{
"questionId": "q71",
@@ -7708,7 +11558,7 @@
"isCorrect": true,
"inputTokens": 9739,
"outputTokens": 265,
- "latencyMs": 3482.8154170000053
+ "latencyMs": 3387.9897919999785
},
{
"questionId": "q71",
@@ -7719,7 +11569,18 @@
"isCorrect": true,
"inputTokens": 11907,
"outputTokens": 7,
- "latencyMs": 1156.5491669999901
+ "latencyMs": 1210.6735000000335
+ },
+ {
+ "questionId": "q71",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "645.88",
+ "actual": "645.88",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 6,
+ "latencyMs": 2313.2734579999815
},
{
"questionId": "q71",
@@ -7730,7 +11591,7 @@
"isCorrect": true,
"inputTokens": 6013,
"outputTokens": 201,
- "latencyMs": 2970.104541000008
+ "latencyMs": 2948.030916000018
},
{
"questionId": "q71",
@@ -7741,7 +11602,18 @@
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 7,
- "latencyMs": 1297.768374999985
+ "latencyMs": 1499.2446670000209
+ },
+ {
+ "questionId": "q71",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "645.88",
+ "actual": "645.88",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 6,
+ "latencyMs": 1259.240832999989
},
{
"questionId": "q71",
@@ -7752,7 +11624,7 @@
"isCorrect": true,
"inputTokens": 6781,
"outputTokens": 201,
- "latencyMs": 3475.6895419999782
+ "latencyMs": 8963.050458999991
},
{
"questionId": "q71",
@@ -7763,29 +11635,51 @@
"isCorrect": true,
"inputTokens": 8414,
"outputTokens": 7,
- "latencyMs": 1469.7436250000028
+ "latencyMs": 1168.6370839999872
},
{
"questionId": "q71",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "645.88",
+ "actual": "645.88",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 6,
+ "latencyMs": 2633.771375000011
+ },
+ {
+ "questionId": "q71",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "645.88",
"actual": "645.88",
"isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 265,
- "latencyMs": 4107.424582999985
+ "inputTokens": 11037,
+ "outputTokens": 329,
+ "latencyMs": 7189.561790999956
},
{
"questionId": "q71",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "645.88",
"actual": "645.88",
"isCorrect": true,
- "inputTokens": 9289,
+ "inputTokens": 13380,
"outputTokens": 7,
- "latencyMs": 1070.4507500000182
+ "latencyMs": 1225.8507080000127
+ },
+ {
+ "questionId": "q71",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "645.88",
+ "actual": "645.88",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 6,
+ "latencyMs": 1124.1396250000107
},
{
"questionId": "q71",
@@ -7795,8 +11689,8 @@
"actual": "645.88",
"isCorrect": true,
"inputTokens": 7373,
- "outputTokens": 265,
- "latencyMs": 3768.3023749999993
+ "outputTokens": 201,
+ "latencyMs": 3990.592707999982
},
{
"questionId": "q71",
@@ -7807,7 +11701,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 7,
- "latencyMs": 1111.744915999996
+ "latencyMs": 1128.0700419999775
+ },
+ {
+ "questionId": "q71",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "645.88",
+ "actual": "645.88",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 6,
+ "latencyMs": 1804.0158330000122
},
{
"questionId": "q72",
@@ -7818,7 +11723,7 @@
"isCorrect": true,
"inputTokens": 9738,
"outputTokens": 263,
- "latencyMs": 3199.3634999999776
+ "latencyMs": 3661.423624999996
},
{
"questionId": "q72",
@@ -7829,7 +11734,18 @@
"isCorrect": true,
"inputTokens": 11906,
"outputTokens": 4,
- "latencyMs": 1232.4811659999832
+ "latencyMs": 1125.6147919999785
+ },
+ {
+ "questionId": "q72",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 12112,
+ "outputTokens": 1,
+ "latencyMs": 1711.6630829999922
},
{
"questionId": "q72",
@@ -7839,8 +11755,8 @@
"actual": "processing",
"isCorrect": true,
"inputTokens": 6012,
- "outputTokens": 263,
- "latencyMs": 5616.989999999991
+ "outputTokens": 199,
+ "latencyMs": 3128.0557079999708
},
{
"questionId": "q72",
@@ -7851,7 +11767,18 @@
"isCorrect": true,
"inputTokens": 6992,
"outputTokens": 4,
- "latencyMs": 1697.3162920000032
+ "latencyMs": 1669.1822079999838
+ },
+ {
+ "questionId": "q72",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 1,
+ "latencyMs": 1274.667958999984
},
{
"questionId": "q72",
@@ -7861,8 +11788,8 @@
"actual": "processing",
"isCorrect": true,
"inputTokens": 6780,
- "outputTokens": 199,
- "latencyMs": 2781.3399999999965
+ "outputTokens": 263,
+ "latencyMs": 3663.237792
},
{
"questionId": "q72",
@@ -7873,29 +11800,51 @@
"isCorrect": true,
"inputTokens": 8413,
"outputTokens": 4,
- "latencyMs": 1162.0402089999989
+ "latencyMs": 1122.126249999972
},
{
"questionId": "q72",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 1,
+ "latencyMs": 1549.8010420000064
+ },
+ {
+ "questionId": "q72",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "processing",
"actual": "processing",
"isCorrect": true,
- "inputTokens": 9157,
- "outputTokens": 199,
- "latencyMs": 3651.1349579999805
+ "inputTokens": 11036,
+ "outputTokens": 327,
+ "latencyMs": 6674.916083000018
},
{
"questionId": "q72",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "processing",
"actual": "processing",
"isCorrect": true,
- "inputTokens": 9288,
+ "inputTokens": 13379,
"outputTokens": 4,
- "latencyMs": 1132.3132920000062
+ "latencyMs": 1230.8339169999817
+ },
+ {
+ "questionId": "q72",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 1,
+ "latencyMs": 992.4760409999872
},
{
"questionId": "q72",
@@ -7905,8 +11854,8 @@
"actual": "processing",
"isCorrect": true,
"inputTokens": 7372,
- "outputTokens": 135,
- "latencyMs": 3017.5073749999865
+ "outputTokens": 199,
+ "latencyMs": 3755.6932919999817
},
{
"questionId": "q72",
@@ -7917,7 +11866,18 @@
"isCorrect": true,
"inputTokens": 8384,
"outputTokens": 4,
- "latencyMs": 1294.688374999998
+ "latencyMs": 1540.152833
+ },
+ {
+ "questionId": "q72",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "processing",
+ "actual": "processing",
+ "isCorrect": true,
+ "inputTokens": 8426,
+ "outputTokens": 1,
+ "latencyMs": 2185.4502910000156
},
{
"questionId": "q73",
@@ -7927,8 +11887,8 @@
"actual": "371.91",
"isCorrect": true,
"inputTokens": 9739,
- "outputTokens": 201,
- "latencyMs": 3591.221499999985
+ "outputTokens": 265,
+ "latencyMs": 3809.869667000021
},
{
"questionId": "q73",
@@ -7939,7 +11899,18 @@
"isCorrect": true,
"inputTokens": 11907,
"outputTokens": 7,
- "latencyMs": 1329.419332999998
+ "latencyMs": 1150.84375
+ },
+ {
+ "questionId": "q73",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "371.91",
+ "actual": "371.91",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 6,
+ "latencyMs": 1217.3986659999937
},
{
"questionId": "q73",
@@ -7950,7 +11921,7 @@
"isCorrect": true,
"inputTokens": 6013,
"outputTokens": 137,
- "latencyMs": 2655.557792000007
+ "latencyMs": 2091.0124589999905
},
{
"questionId": "q73",
@@ -7961,7 +11932,18 @@
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 7,
- "latencyMs": 1446.9020000000019
+ "latencyMs": 1357.4467920000316
+ },
+ {
+ "questionId": "q73",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "371.91",
+ "actual": "371.91",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 6,
+ "latencyMs": 2377.229250000033
},
{
"questionId": "q73",
@@ -7972,7 +11954,7 @@
"isCorrect": true,
"inputTokens": 6781,
"outputTokens": 201,
- "latencyMs": 3450.5822500000068
+ "latencyMs": 2673.4793749999953
},
{
"questionId": "q73",
@@ -7983,29 +11965,51 @@
"isCorrect": true,
"inputTokens": 8414,
"outputTokens": 7,
- "latencyMs": 1291.2180410000146
+ "latencyMs": 1785.7454999999609
},
{
"questionId": "q73",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "371.91",
+ "actual": "371.91",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 6,
+ "latencyMs": 1956.5365410000086
+ },
+ {
+ "questionId": "q73",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "371.91",
"actual": "371.91",
"isCorrect": true,
- "inputTokens": 9158,
+ "inputTokens": 11037,
"outputTokens": 201,
- "latencyMs": 2803.9767500000016
+ "latencyMs": 2943.3867910000263
},
{
"questionId": "q73",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "371.91",
"actual": "371.91",
"isCorrect": true,
- "inputTokens": 9289,
+ "inputTokens": 13380,
"outputTokens": 7,
- "latencyMs": 1098.5968749999884
+ "latencyMs": 1264.3261250000214
+ },
+ {
+ "questionId": "q73",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "371.91",
+ "actual": "371.91",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 6,
+ "latencyMs": 1479.502083999978
},
{
"questionId": "q73",
@@ -8015,8 +12019,8 @@
"actual": "371.91",
"isCorrect": true,
"inputTokens": 7373,
- "outputTokens": 201,
- "latencyMs": 3047.8699999999953
+ "outputTokens": 137,
+ "latencyMs": 2697.696667000011
},
{
"questionId": "q73",
@@ -8027,7 +12031,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 7,
- "latencyMs": 1800.6882080000069
+ "latencyMs": 1319.8920829999843
+ },
+ {
+ "questionId": "q73",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "371.91",
+ "actual": "371.91",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 6,
+ "latencyMs": 1655.4022090000217
},
{
"questionId": "q74",
@@ -8037,8 +12052,8 @@
"actual": "pending",
"isCorrect": true,
"inputTokens": 9738,
- "outputTokens": 199,
- "latencyMs": 2957.2203330000048
+ "outputTokens": 327,
+ "latencyMs": 3728.9863749999786
},
{
"questionId": "q74",
@@ -8049,7 +12064,18 @@
"isCorrect": true,
"inputTokens": 11906,
"outputTokens": 4,
- "latencyMs": 1165.7748750000028
+ "latencyMs": 1403.8238750000019
+ },
+ {
+ "questionId": "q74",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 12112,
+ "outputTokens": 1,
+ "latencyMs": 1610.8924579999875
},
{
"questionId": "q74",
@@ -8059,8 +12085,8 @@
"actual": "pending",
"isCorrect": true,
"inputTokens": 6012,
- "outputTokens": 135,
- "latencyMs": 2362.283208000008
+ "outputTokens": 199,
+ "latencyMs": 3121.718416000018
},
{
"questionId": "q74",
@@ -8071,7 +12097,18 @@
"isCorrect": true,
"inputTokens": 6992,
"outputTokens": 4,
- "latencyMs": 1871.7275829999999
+ "latencyMs": 1051.426999999967
+ },
+ {
+ "questionId": "q74",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 1,
+ "latencyMs": 1171.1483340000268
},
{
"questionId": "q74",
@@ -8082,7 +12119,7 @@
"isCorrect": true,
"inputTokens": 6780,
"outputTokens": 263,
- "latencyMs": 4747.243208
+ "latencyMs": 2642.1894589999574
},
{
"questionId": "q74",
@@ -8093,29 +12130,51 @@
"isCorrect": true,
"inputTokens": 8413,
"outputTokens": 4,
- "latencyMs": 1275.342082999996
+ "latencyMs": 1286.3537080000388
},
{
"questionId": "q74",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 1,
+ "latencyMs": 3901.2503750000033
+ },
+ {
+ "questionId": "q74",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "pending",
"actual": "pending",
"isCorrect": true,
- "inputTokens": 9157,
- "outputTokens": 199,
- "latencyMs": 3180.0179160000116
+ "inputTokens": 11036,
+ "outputTokens": 263,
+ "latencyMs": 3386.3902919999673
},
{
"questionId": "q74",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "pending",
"actual": "pending",
"isCorrect": true,
- "inputTokens": 9288,
+ "inputTokens": 13379,
"outputTokens": 4,
- "latencyMs": 2343.5514580000017
+ "latencyMs": 1593.6848750000354
+ },
+ {
+ "questionId": "q74",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 1,
+ "latencyMs": 1085.9149159999797
},
{
"questionId": "q74",
@@ -8126,7 +12185,7 @@
"isCorrect": true,
"inputTokens": 7372,
"outputTokens": 135,
- "latencyMs": 2362.525915999984
+ "latencyMs": 2352.2881669999915
},
{
"questionId": "q74",
@@ -8137,7 +12196,18 @@
"isCorrect": true,
"inputTokens": 8384,
"outputTokens": 4,
- "latencyMs": 1231.4291669999948
+ "latencyMs": 1046.4814580000238
+ },
+ {
+ "questionId": "q74",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "pending",
+ "actual": "pending",
+ "isCorrect": true,
+ "inputTokens": 8426,
+ "outputTokens": 1,
+ "latencyMs": 1687.5740409999853
},
{
"questionId": "q75",
@@ -8147,8 +12217,8 @@
"actual": "1066",
"isCorrect": true,
"inputTokens": 9739,
- "outputTokens": 200,
- "latencyMs": 3091.9045840000035
+ "outputTokens": 264,
+ "latencyMs": 5460.0885409999755
},
{
"questionId": "q75",
@@ -8159,7 +12229,18 @@
"isCorrect": true,
"inputTokens": 11907,
"outputTokens": 6,
- "latencyMs": 1111.9695000000065
+ "latencyMs": 1246.0814159999718
+ },
+ {
+ "questionId": "q75",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "1066",
+ "actual": "1066",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 4,
+ "latencyMs": 1696.832666000002
},
{
"questionId": "q75",
@@ -8169,8 +12250,8 @@
"actual": "1066",
"isCorrect": true,
"inputTokens": 6013,
- "outputTokens": 264,
- "latencyMs": 3977.5146669999813
+ "outputTokens": 200,
+ "latencyMs": 2906.3054160000174
},
{
"questionId": "q75",
@@ -8181,7 +12262,18 @@
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 6,
- "latencyMs": 1195.262208
+ "latencyMs": 1201.3947090000147
+ },
+ {
+ "questionId": "q75",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "1066",
+ "actual": "1066.00",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 7,
+ "latencyMs": 1377.305457999988
},
{
"questionId": "q75",
@@ -8191,8 +12283,8 @@
"actual": "1066",
"isCorrect": true,
"inputTokens": 6781,
- "outputTokens": 328,
- "latencyMs": 3839.0627499999828
+ "outputTokens": 456,
+ "latencyMs": 8801.27112499997
},
{
"questionId": "q75",
@@ -8203,29 +12295,51 @@
"isCorrect": true,
"inputTokens": 8414,
"outputTokens": 6,
- "latencyMs": 2186.8021250000165
+ "latencyMs": 1433.466666000022
},
{
"questionId": "q75",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "1066",
+ "actual": "1066",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 4,
+ "latencyMs": 3448.654917000036
+ },
+ {
+ "questionId": "q75",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "1066",
"actual": "1066",
"isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 328,
- "latencyMs": 6945.004667000001
+ "inputTokens": 11037,
+ "outputTokens": 264,
+ "latencyMs": 4939.312791000004
},
{
"questionId": "q75",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "1066",
"actual": "1066",
"isCorrect": true,
- "inputTokens": 9289,
+ "inputTokens": 13380,
"outputTokens": 6,
- "latencyMs": 1103.6762919999892
+ "latencyMs": 1252.419332999969
+ },
+ {
+ "questionId": "q75",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "1066",
+ "actual": "1066.00",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 7,
+ "latencyMs": 1151.2592920000316
},
{
"questionId": "q75",
@@ -8235,8 +12349,8 @@
"actual": "1066",
"isCorrect": true,
"inputTokens": 7373,
- "outputTokens": 264,
- "latencyMs": 3924.5181250000023
+ "outputTokens": 136,
+ "latencyMs": 3143.9853749999893
},
{
"questionId": "q75",
@@ -8247,7 +12361,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 6,
- "latencyMs": 1023.334583000018
+ "latencyMs": 1177.0768329999992
+ },
+ {
+ "questionId": "q75",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "1066",
+ "actual": "1066.0",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 6,
+ "latencyMs": 1535.377165999962
},
{
"questionId": "q76",
@@ -8257,8 +12382,8 @@
"actual": "cancelled",
"isCorrect": true,
"inputTokens": 9738,
- "outputTokens": 264,
- "latencyMs": 4017.931666999997
+ "outputTokens": 328,
+ "latencyMs": 10990.360375000047
},
{
"questionId": "q76",
@@ -8269,7 +12394,18 @@
"isCorrect": true,
"inputTokens": 11906,
"outputTokens": 4,
- "latencyMs": 1278.6839580000087
+ "latencyMs": 1467.304375000007
+ },
+ {
+ "questionId": "q76",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 12112,
+ "outputTokens": 1,
+ "latencyMs": 1316.8680830000085
},
{
"questionId": "q76",
@@ -8279,206 +12415,305 @@
"actual": "cancelled",
"isCorrect": true,
"inputTokens": 6012,
- "outputTokens": 200,
- "latencyMs": 2566.9374580000003
- },
- {
- "questionId": "q76",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 6992,
- "outputTokens": 4,
- "latencyMs": 958.4104159999988
- },
- {
- "questionId": "q76",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 6780,
- "outputTokens": 264,
- "latencyMs": 3640.0960409999825
- },
- {
- "questionId": "q76",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 8413,
- "outputTokens": 4,
- "latencyMs": 1534.7306249999965
- },
- {
- "questionId": "q76",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 9157,
- "outputTokens": 328,
- "latencyMs": 3905.6711249999935
- },
- {
- "questionId": "q76",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 9288,
- "outputTokens": 4,
- "latencyMs": 2067.435375000001
- },
- {
- "questionId": "q76",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 7372,
- "outputTokens": 264,
- "latencyMs": 3613.7146249999932
- },
- {
- "questionId": "q76",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "cancelled",
- "actual": "cancelled",
- "isCorrect": true,
- "inputTokens": 8384,
- "outputTokens": 4,
- "latencyMs": 1154.955958000006
- },
- {
- "questionId": "q77",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "1697.4",
- "actual": "1697.4",
- "isCorrect": true,
- "inputTokens": 9739,
- "outputTokens": 330,
- "latencyMs": 3904.2146250000224
- },
- {
- "questionId": "q77",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "1697.4",
- "actual": "1697.4",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 8,
- "latencyMs": 1618.7487079999992
- },
- {
- "questionId": "q77",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "1697.4",
- "actual": "1697.4",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 202,
- "latencyMs": 2906.194541999983
- },
- {
- "questionId": "q77",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "1697.4",
- "actual": "1697.4",
- "isCorrect": true,
- "inputTokens": 6993,
- "outputTokens": 8,
- "latencyMs": 1481.559333000012
- },
- {
- "questionId": "q77",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "1697.4",
- "actual": "1697.4",
- "isCorrect": true,
- "inputTokens": 6781,
- "outputTokens": 266,
- "latencyMs": 3879.7539999999863
- },
- {
- "questionId": "q77",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "1697.4",
- "actual": "1697.4",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 8,
- "latencyMs": 1809.5822499999776
- },
- {
- "questionId": "q77",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "1697.4",
- "actual": "1697.4",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 202,
- "latencyMs": 3147.330500000011
- },
- {
- "questionId": "q77",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "1697.4",
- "actual": "1697.4",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 8,
- "latencyMs": 1297.2377080000006
- },
- {
- "questionId": "q77",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "1697.4",
- "actual": "1697.4",
- "isCorrect": true,
- "inputTokens": 7373,
- "outputTokens": 394,
- "latencyMs": 3710.157500000001
- },
- {
- "questionId": "q77",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "1697.4",
- "actual": "1697.4",
- "isCorrect": true,
- "inputTokens": 8385,
- "outputTokens": 8,
- "latencyMs": 1238.5442500000063
- },
- {
- "questionId": "q78",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "delivered",
- "actual": "delivered",
- "isCorrect": true,
- "inputTokens": 9738,
"outputTokens": 392,
- "latencyMs": 4101.743083999987
+ "latencyMs": 4399.92220900004
+ },
+ {
+ "questionId": "q76",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 6992,
+ "outputTokens": 4,
+ "latencyMs": 1077.4348749999772
+ },
+ {
+ "questionId": "q76",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 1,
+ "latencyMs": 1317.501791000017
+ },
+ {
+ "questionId": "q76",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 6780,
+ "outputTokens": 200,
+ "latencyMs": 4153.370333999977
+ },
+ {
+ "questionId": "q76",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 8413,
+ "outputTokens": 4,
+ "latencyMs": 1147.2140420000069
+ },
+ {
+ "questionId": "q76",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 1,
+ "latencyMs": 1243.451000000001
+ },
+ {
+ "questionId": "q76",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 11036,
+ "outputTokens": 328,
+ "latencyMs": 7804.228665999952
+ },
+ {
+ "questionId": "q76",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 13379,
+ "outputTokens": 4,
+ "latencyMs": 1144.1722500000033
+ },
+ {
+ "questionId": "q76",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 1,
+ "latencyMs": 857.7333750000107
+ },
+ {
+ "questionId": "q76",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 7372,
+ "outputTokens": 136,
+ "latencyMs": 2287.29574999999
+ },
+ {
+ "questionId": "q76",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 8384,
+ "outputTokens": 4,
+ "latencyMs": 1285.9760839999653
+ },
+ {
+ "questionId": "q76",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "cancelled",
+ "actual": "cancelled",
+ "isCorrect": true,
+ "inputTokens": 8426,
+ "outputTokens": 1,
+ "latencyMs": 1174.2349580000155
+ },
+ {
+ "questionId": "q77",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 9739,
+ "outputTokens": 266,
+ "latencyMs": 4109.542333999998
+ },
+ {
+ "questionId": "q77",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 8,
+ "latencyMs": 1433.0992499999702
+ },
+ {
+ "questionId": "q77",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 6,
+ "latencyMs": 3301.268875000009
+ },
+ {
+ "questionId": "q77",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 6013,
+ "outputTokens": 394,
+ "latencyMs": 4952.654542000033
+ },
+ {
+ "questionId": "q77",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 6993,
+ "outputTokens": 8,
+ "latencyMs": 1165.5959999999614
+ },
+ {
+ "questionId": "q77",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 6,
+ "latencyMs": 982.1686660000123
+ },
+ {
+ "questionId": "q77",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 6781,
+ "outputTokens": 266,
+ "latencyMs": 4735.772292000009
+ },
+ {
+ "questionId": "q77",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 8,
+ "latencyMs": 1361.5435829999624
+ },
+ {
+ "questionId": "q77",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 6,
+ "latencyMs": 2838.4672920000157
+ },
+ {
+ "questionId": "q77",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 394,
+ "latencyMs": 4771.182459000032
+ },
+ {
+ "questionId": "q77",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 8,
+ "latencyMs": 1202.4828330000164
+ },
+ {
+ "questionId": "q77",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 6,
+ "latencyMs": 1063.3247500000289
+ },
+ {
+ "questionId": "q77",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 202,
+ "latencyMs": 7751.146624999994
+ },
+ {
+ "questionId": "q77",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 8385,
+ "outputTokens": 8,
+ "latencyMs": 1352.936708000023
+ },
+ {
+ "questionId": "q77",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "1697.4",
+ "actual": "1697.4",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 6,
+ "latencyMs": 3135.286582999979
+ },
+ {
+ "questionId": "q78",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 9738,
+ "outputTokens": 264,
+ "latencyMs": 3105.402541999996
},
{
"questionId": "q78",
@@ -8489,7 +12724,18 @@
"isCorrect": true,
"inputTokens": 11906,
"outputTokens": 4,
- "latencyMs": 1170.750417000003
+ "latencyMs": 1140.6077500000247
+ },
+ {
+ "questionId": "q78",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 12112,
+ "outputTokens": 1,
+ "latencyMs": 1257.6969169999938
},
{
"questionId": "q78",
@@ -8499,8 +12745,8 @@
"actual": "delivered",
"isCorrect": true,
"inputTokens": 6012,
- "outputTokens": 264,
- "latencyMs": 8324.009665999998
+ "outputTokens": 72,
+ "latencyMs": 2142.8472499999916
},
{
"questionId": "q78",
@@ -8511,7 +12757,18 @@
"isCorrect": true,
"inputTokens": 6992,
"outputTokens": 4,
- "latencyMs": 1173.343790999992
+ "latencyMs": 1485.6063330000034
+ },
+ {
+ "questionId": "q78",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 1,
+ "latencyMs": 1350.4362079999992
},
{
"questionId": "q78",
@@ -8522,7 +12779,7 @@
"isCorrect": true,
"inputTokens": 6780,
"outputTokens": 264,
- "latencyMs": 3005.4394999999786
+ "latencyMs": 3870.94754199998
},
{
"questionId": "q78",
@@ -8533,29 +12790,51 @@
"isCorrect": true,
"inputTokens": 8413,
"outputTokens": 4,
- "latencyMs": 1376.5506659999955
+ "latencyMs": 1153.2942499999772
},
{
"questionId": "q78",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 1,
+ "latencyMs": 2935.8738330000197
+ },
+ {
+ "questionId": "q78",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "delivered",
"actual": "delivered",
"isCorrect": true,
- "inputTokens": 9157,
- "outputTokens": 136,
- "latencyMs": 3209.5317499999946
+ "inputTokens": 11036,
+ "outputTokens": 328,
+ "latencyMs": 4063.2786669999477
},
{
"questionId": "q78",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "delivered",
"actual": "delivered",
"isCorrect": true,
- "inputTokens": 9288,
+ "inputTokens": 13379,
"outputTokens": 4,
- "latencyMs": 1299.4064170000202
+ "latencyMs": 1202.6428329999908
+ },
+ {
+ "questionId": "q78",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 1,
+ "latencyMs": 1221.4335410000058
},
{
"questionId": "q78",
@@ -8565,8 +12844,8 @@
"actual": "delivered",
"isCorrect": true,
"inputTokens": 7372,
- "outputTokens": 264,
- "latencyMs": 3753.726042000024
+ "outputTokens": 200,
+ "latencyMs": 5382.740458999993
},
{
"questionId": "q78",
@@ -8577,7 +12856,18 @@
"isCorrect": true,
"inputTokens": 8384,
"outputTokens": 4,
- "latencyMs": 1134.558416999993
+ "latencyMs": 1434.1426659999997
+ },
+ {
+ "questionId": "q78",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "delivered",
+ "actual": "delivered",
+ "isCorrect": true,
+ "inputTokens": 8426,
+ "outputTokens": 1,
+ "latencyMs": 1046.8339999999735
},
{
"questionId": "q79",
@@ -8588,7 +12878,7 @@
"isCorrect": true,
"inputTokens": 9739,
"outputTokens": 73,
- "latencyMs": 2494.451874999999
+ "latencyMs": 2607.845874999999
},
{
"questionId": "q79",
@@ -8599,7 +12889,18 @@
"isCorrect": true,
"inputTokens": 11907,
"outputTokens": 9,
- "latencyMs": 1270.5290410000016
+ "latencyMs": 1676.4270830000169
+ },
+ {
+ "questionId": "q79",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Valerie Braun",
+ "actual": "Valerie Braun",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 3,
+ "latencyMs": 1219.0042910000193
},
{
"questionId": "q79",
@@ -8610,7 +12911,7 @@
"isCorrect": true,
"inputTokens": 6013,
"outputTokens": 137,
- "latencyMs": 2403.4134579999954
+ "latencyMs": 3378.1006669999915
},
{
"questionId": "q79",
@@ -8621,7 +12922,18 @@
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 9,
- "latencyMs": 1673.0169579999929
+ "latencyMs": 1979.5205839999835
+ },
+ {
+ "questionId": "q79",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Valerie Braun",
+ "actual": "Valerie Braun",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 3,
+ "latencyMs": 1439.3422910000081
},
{
"questionId": "q79",
@@ -8631,52 +12943,74 @@
"actual": "Valerie Braun",
"isCorrect": true,
"inputTokens": 6781,
- "outputTokens": 73,
- "latencyMs": 1704.8420409999962
- },
- {
- "questionId": "q79",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "Valerie Braun",
- "actual": "Valerie Braun",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 9,
- "latencyMs": 1447.5210840000072
- },
- {
- "questionId": "q79",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "Valerie Braun",
- "actual": "Valerie Braun",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 73,
- "latencyMs": 1638.756207999977
- },
- {
- "questionId": "q79",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "Valerie Braun",
- "actual": "Valerie Braun",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 9,
- "latencyMs": 1504.7892920000013
- },
- {
- "questionId": "q79",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "Valerie Braun",
- "actual": "Valerie Braun",
- "isCorrect": true,
- "inputTokens": 7373,
"outputTokens": 137,
- "latencyMs": 2409.509625000006
+ "latencyMs": 2889.578749999986
+ },
+ {
+ "questionId": "q79",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "Valerie Braun",
+ "actual": "Valerie Braun",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 9,
+ "latencyMs": 1190.1848750000354
+ },
+ {
+ "questionId": "q79",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Valerie Braun",
+ "actual": "Valerie Braun",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 3,
+ "latencyMs": 2444.884665999969
+ },
+ {
+ "questionId": "q79",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "Valerie Braun",
+ "actual": "Valerie Braun",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 73,
+ "latencyMs": 2360.869958999974
+ },
+ {
+ "questionId": "q79",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "Valerie Braun",
+ "actual": "Valerie Braun",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 9,
+ "latencyMs": 1299.0499999999884
+ },
+ {
+ "questionId": "q79",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Valerie Braun",
+ "actual": "Valerie Braun",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 3,
+ "latencyMs": 932.0124589999905
+ },
+ {
+ "questionId": "q79",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "Valerie Braun",
+ "actual": "Valerie Braun",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 73,
+ "latencyMs": 3092.9805410000263
},
{
"questionId": "q79",
@@ -8687,7 +13021,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 9,
- "latencyMs": 1318.699833999999
+ "latencyMs": 1872.3574159999844
+ },
+ {
+ "questionId": "q79",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Valerie Braun",
+ "actual": "Valerie Braun",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 3,
+ "latencyMs": 1216.4535000000033
},
{
"questionId": "q80",
@@ -8698,7 +13043,7 @@
"isCorrect": true,
"inputTokens": 9739,
"outputTokens": 138,
- "latencyMs": 2616.233749999985
+ "latencyMs": 2404.87479099998
},
{
"questionId": "q80",
@@ -8709,7 +13054,18 @@
"isCorrect": true,
"inputTokens": 11907,
"outputTokens": 9,
- "latencyMs": 1314.3836249999877
+ "latencyMs": 2182.619249999989
+ },
+ {
+ "questionId": "q80",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Anita Kozey",
+ "actual": "Anita Kozey",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 3,
+ "latencyMs": 1508.2469580000034
},
{
"questionId": "q80",
@@ -8720,7 +13076,7 @@
"isCorrect": true,
"inputTokens": 6013,
"outputTokens": 138,
- "latencyMs": 2722.7087499999907
+ "latencyMs": 3670.61050000001
},
{
"questionId": "q80",
@@ -8731,7 +13087,18 @@
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 9,
- "latencyMs": 1190.632500000007
+ "latencyMs": 1291.4328749999986
+ },
+ {
+ "questionId": "q80",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Anita Kozey",
+ "actual": "Anita Kozey",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 3,
+ "latencyMs": 1201.7425829999847
},
{
"questionId": "q80",
@@ -8741,1042 +13108,1559 @@
"actual": "Anita Kozey",
"isCorrect": true,
"inputTokens": 6781,
- "outputTokens": 330,
- "latencyMs": 4346.388291999989
- },
- {
- "questionId": "q80",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "Anita Kozey",
- "actual": "Anita Kozey",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 9,
- "latencyMs": 1327.8158750000002
- },
- {
- "questionId": "q80",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "Anita Kozey",
- "actual": "Anita Kozey",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 74,
- "latencyMs": 2443.0598340000142
- },
- {
- "questionId": "q80",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "Anita Kozey",
- "actual": "Anita Kozey",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 9,
- "latencyMs": 1396.4260829999985
- },
- {
- "questionId": "q80",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "Anita Kozey",
- "actual": "Anita Kozey",
- "isCorrect": true,
- "inputTokens": 7373,
- "outputTokens": 266,
- "latencyMs": 4886.8007919999945
- },
- {
- "questionId": "q80",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "Anita Kozey",
- "actual": "Anita Kozey",
- "isCorrect": true,
- "inputTokens": 8385,
- "outputTokens": 9,
- "latencyMs": 1469.287249999994
- },
- {
- "questionId": "q81",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "Elmer Kub PhD",
- "actual": "Elmer Kub PhD",
- "isCorrect": true,
- "inputTokens": 9739,
- "outputTokens": 139,
- "latencyMs": 2891.1199170000036
- },
- {
- "questionId": "q81",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "Elmer Kub PhD",
- "actual": "Elmer Kub PhD",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 10,
- "latencyMs": 1342.1902079999854
- },
- {
- "questionId": "q81",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "Elmer Kub PhD",
- "actual": "Elmer Kub PhD",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 139,
- "latencyMs": 2846.046624999988
- },
- {
- "questionId": "q81",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "Elmer Kub PhD",
- "actual": "Elmer Kub PhD",
- "isCorrect": true,
- "inputTokens": 6993,
- "outputTokens": 10,
- "latencyMs": 1327.919499999989
- },
- {
- "questionId": "q81",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "Elmer Kub PhD",
- "actual": "Elmer Kub PhD",
- "isCorrect": true,
- "inputTokens": 6781,
- "outputTokens": 139,
- "latencyMs": 4302.444041999988
- },
- {
- "questionId": "q81",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "Elmer Kub PhD",
- "actual": "Elmer Kub PhD",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 10,
- "latencyMs": 1207.6207500000019
- },
- {
- "questionId": "q81",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "Elmer Kub PhD",
- "actual": "Elmer Kub PhD",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 267,
- "latencyMs": 3389.5046659999934
- },
- {
- "questionId": "q81",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "Elmer Kub PhD",
- "actual": "Elmer Kub PhD",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 10,
- "latencyMs": 1236.2248340000224
- },
- {
- "questionId": "q81",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "Elmer Kub PhD",
- "actual": "Elmer Kub PhD",
- "isCorrect": true,
- "inputTokens": 7373,
- "outputTokens": 139,
- "latencyMs": 2138.4831669999985
- },
- {
- "questionId": "q81",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "Elmer Kub PhD",
- "actual": "Elmer Kub PhD",
- "isCorrect": true,
- "inputTokens": 8385,
- "outputTokens": 10,
- "latencyMs": 1233.3828330000106
- },
- {
- "questionId": "q82",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "Maxine Zemlak",
- "actual": "Maxine Zemlak",
- "isCorrect": true,
- "inputTokens": 9739,
- "outputTokens": 138,
- "latencyMs": 3346.8621669999848
- },
- {
- "questionId": "q82",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "Maxine Zemlak",
- "actual": "Maxine Zemlak",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 10,
- "latencyMs": 1321.650082999986
- },
- {
- "questionId": "q82",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "Maxine Zemlak",
- "actual": "Maxine Zemlak",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 138,
- "latencyMs": 2395.766499999998
- },
- {
- "questionId": "q82",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "Maxine Zemlak",
- "actual": "Maxine Zemlak",
- "isCorrect": true,
- "inputTokens": 6993,
- "outputTokens": 10,
- "latencyMs": 1749.51670800001
- },
- {
- "questionId": "q82",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "Maxine Zemlak",
- "actual": "Maxine Zemlak",
- "isCorrect": true,
- "inputTokens": 6781,
- "outputTokens": 330,
- "latencyMs": 4207.4487500000105
- },
- {
- "questionId": "q82",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "Maxine Zemlak",
- "actual": "Maxine Zemlak",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 10,
- "latencyMs": 1495.846125000011
- },
- {
- "questionId": "q82",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "Maxine Zemlak",
- "actual": "Maxine Zemlak",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 266,
- "latencyMs": 4258.881374999997
- },
- {
- "questionId": "q82",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "Maxine Zemlak",
- "actual": "Maxine Zemlak",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 10,
- "latencyMs": 1113.9782499999856
- },
- {
- "questionId": "q82",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "Maxine Zemlak",
- "actual": "Maxine Zemlak",
- "isCorrect": true,
- "inputTokens": 7373,
- "outputTokens": 74,
- "latencyMs": 1841.1115829999908
- },
- {
- "questionId": "q82",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "Maxine Zemlak",
- "actual": "Maxine Zemlak",
- "isCorrect": true,
- "inputTokens": 8385,
- "outputTokens": 10,
- "latencyMs": 1350.6631249999919
- },
- {
- "questionId": "q83",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "Emanuel Littel",
- "actual": "Emanuel Littel",
- "isCorrect": true,
- "inputTokens": 9739,
- "outputTokens": 138,
- "latencyMs": 2322.9531669999997
- },
- {
- "questionId": "q83",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "Emanuel Littel",
- "actual": "Emanuel Littel",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 7,
- "latencyMs": 1556.4763749999984
- },
- {
- "questionId": "q83",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "Emanuel Littel",
- "actual": "Emanuel Littel",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 74,
- "latencyMs": 2354.004667000001
- },
- {
- "questionId": "q83",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "Emanuel Littel",
- "actual": "Emanuel Littel",
- "isCorrect": true,
- "inputTokens": 6993,
- "outputTokens": 7,
- "latencyMs": 1314.1952909999818
- },
- {
- "questionId": "q83",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "Emanuel Littel",
- "actual": "Emanuel Littel",
- "isCorrect": true,
- "inputTokens": 6781,
- "outputTokens": 138,
- "latencyMs": 3437.8392080000194
- },
- {
- "questionId": "q83",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "Emanuel Littel",
- "actual": "Emanuel Littel",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 7,
- "latencyMs": 1131.0356249999895
- },
- {
- "questionId": "q83",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "Emanuel Littel",
- "actual": "Emanuel Littel",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 138,
- "latencyMs": 3209.646000000008
- },
- {
- "questionId": "q83",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "Emanuel Littel",
- "actual": "Emanuel Littel",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 7,
- "latencyMs": 1175.6475829999836
- },
- {
- "questionId": "q83",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "Emanuel Littel",
- "actual": "Emanuel Littel",
- "isCorrect": true,
- "inputTokens": 7373,
- "outputTokens": 266,
- "latencyMs": 3785.0792920000094
- },
- {
- "questionId": "q83",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "Emanuel Littel",
- "actual": "Emanuel Littel",
- "isCorrect": true,
- "inputTokens": 8385,
- "outputTokens": 7,
- "latencyMs": 1314.7905420000025
- },
- {
- "questionId": "q84",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "Andrew Kling",
- "actual": "Andrew Kling",
- "isCorrect": true,
- "inputTokens": 9739,
- "outputTokens": 72,
- "latencyMs": 2562.896166999999
- },
- {
- "questionId": "q84",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "Andrew Kling",
- "actual": "Andrew Kling",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 7,
- "latencyMs": 3205.178583000001
- },
- {
- "questionId": "q84",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "Andrew Kling",
- "actual": "Andrew Kling",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 136,
- "latencyMs": 3746.9874170000257
- },
- {
- "questionId": "q84",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "Andrew Kling",
- "actual": "Andrew Kling",
- "isCorrect": true,
- "inputTokens": 6993,
- "outputTokens": 7,
- "latencyMs": 1159.280584000022
- },
- {
- "questionId": "q84",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "Andrew Kling",
- "actual": "Marvin Thiel",
- "isCorrect": false,
- "inputTokens": 6781,
"outputTokens": 202,
- "latencyMs": 2584.499542000005
+ "latencyMs": 4846.332458000048
},
{
- "questionId": "q84",
+ "questionId": "q80",
"format": "csv",
"model": "claude-haiku-4-5",
- "expected": "Andrew Kling",
- "actual": "Andrew Kling",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 7,
- "latencyMs": 1249.9375
- },
- {
- "questionId": "q84",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "Andrew Kling",
- "actual": "Andrew Kling",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 136,
- "latencyMs": 2068.6956669999927
- },
- {
- "questionId": "q84",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "Andrew Kling",
- "actual": "Andrew Kling",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 7,
- "latencyMs": 1733.235834000021
- },
- {
- "questionId": "q84",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "Andrew Kling",
- "actual": "Andrew Kling",
- "isCorrect": true,
- "inputTokens": 7373,
- "outputTokens": 200,
- "latencyMs": 3831.721124999982
- },
- {
- "questionId": "q84",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "Andrew Kling",
- "actual": "Andrew Kling",
- "isCorrect": true,
- "inputTokens": 8385,
- "outputTokens": 7,
- "latencyMs": 1311.1745419999934
- },
- {
- "questionId": "q85",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "Morris O'Hara",
- "actual": "Morris O'Hara",
- "isCorrect": true,
- "inputTokens": 9739,
- "outputTokens": 139,
- "latencyMs": 5464.460791999998
- },
- {
- "questionId": "q85",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "Morris O'Hara",
- "actual": "Morris O'Hara",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 9,
- "latencyMs": 1266.8881249999977
- },
- {
- "questionId": "q85",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "Morris O'Hara",
- "actual": "Morris O'Hara",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 203,
- "latencyMs": 2957.0821250000154
- },
- {
- "questionId": "q85",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "Morris O'Hara",
- "actual": "Morris O'Hara",
- "isCorrect": true,
- "inputTokens": 6993,
- "outputTokens": 9,
- "latencyMs": 1264.50791700001
- },
- {
- "questionId": "q85",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "Morris O'Hara",
- "actual": "Morris O'Hara",
- "isCorrect": true,
- "inputTokens": 6781,
- "outputTokens": 331,
- "latencyMs": 3740.643666000018
- },
- {
- "questionId": "q85",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "Morris O'Hara",
- "actual": "Morris O'Hara",
+ "expected": "Anita Kozey",
+ "actual": "Anita Kozey",
"isCorrect": true,
"inputTokens": 8414,
"outputTokens": 9,
- "latencyMs": 1310.5358749999723
+ "latencyMs": 1134.4527920000255
},
{
- "questionId": "q85",
- "format": "markdown-kv",
+ "questionId": "q80",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Anita Kozey",
+ "actual": "Anita Kozey",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 3,
+ "latencyMs": 2760.9979579999927
+ },
+ {
+ "questionId": "q80",
+ "format": "xml",
"model": "gpt-5-nano",
- "expected": "Morris O'Hara",
- "actual": "Morris O'Hara",
+ "expected": "Anita Kozey",
+ "actual": "Anita Kozey",
"isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 139,
- "latencyMs": 2979.4539579999982
+ "inputTokens": 11037,
+ "outputTokens": 138,
+ "latencyMs": 4943.049208999961
},
{
- "questionId": "q85",
- "format": "markdown-kv",
+ "questionId": "q80",
+ "format": "xml",
"model": "claude-haiku-4-5",
- "expected": "Morris O'Hara",
- "actual": "Morris O'Hara",
+ "expected": "Anita Kozey",
+ "actual": "Anita Kozey",
"isCorrect": true,
- "inputTokens": 9289,
+ "inputTokens": 13380,
"outputTokens": 9,
- "latencyMs": 2026.8683329999913
+ "latencyMs": 1163.70645899995
},
{
- "questionId": "q85",
+ "questionId": "q80",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Anita Kozey",
+ "actual": "Anita Kozey",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 3,
+ "latencyMs": 2088.2969169999706
+ },
+ {
+ "questionId": "q80",
"format": "yaml",
"model": "gpt-5-nano",
- "expected": "Morris O'Hara",
- "actual": "Morris O'Hara",
+ "expected": "Anita Kozey",
+ "actual": "Anita Kozey",
"isCorrect": true,
"inputTokens": 7373,
- "outputTokens": 139,
- "latencyMs": 2932.0294159999758
- },
- {
- "questionId": "q85",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "Morris O'Hara",
- "actual": "Morris O'Hara",
- "isCorrect": true,
- "inputTokens": 8385,
- "outputTokens": 9,
- "latencyMs": 1130.2447079999838
- },
- {
- "questionId": "q86",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "Elijah Franecki",
- "actual": "Elijah Franecki",
- "isCorrect": true,
- "inputTokens": 9739,
- "outputTokens": 203,
- "latencyMs": 2576.945458000002
- },
- {
- "questionId": "q86",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "Elijah Franecki",
- "actual": "Elijah Franecki",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 9,
- "latencyMs": 1214.6620409999741
- },
- {
- "questionId": "q86",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "Elijah Franecki",
- "actual": "Elijah Franecki",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 203,
- "latencyMs": 3718.371167000005
- },
- {
- "questionId": "q86",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "Elijah Franecki",
- "actual": "Elijah Franecki",
- "isCorrect": true,
- "inputTokens": 6993,
- "outputTokens": 9,
- "latencyMs": 1374.984832999995
- },
- {
- "questionId": "q86",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "Elijah Franecki",
- "actual": "Elijah Franecki",
- "isCorrect": true,
- "inputTokens": 6781,
- "outputTokens": 139,
- "latencyMs": 2313.5867499999877
- },
- {
- "questionId": "q86",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "Elijah Franecki",
- "actual": "Elijah Franecki",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 9,
- "latencyMs": 1325.0793330000015
- },
- {
- "questionId": "q86",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "Elijah Franecki",
- "actual": "Elijah Franecki",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 139,
- "latencyMs": 2777.8669999999984
- },
- {
- "questionId": "q86",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "Elijah Franecki",
- "actual": "Elijah Franecki",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 9,
- "latencyMs": 1246.2134589999914
- },
- {
- "questionId": "q86",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "Elijah Franecki",
- "actual": "Elijah Franecki",
- "isCorrect": true,
- "inputTokens": 7373,
- "outputTokens": 75,
- "latencyMs": 2246.8254580000066
- },
- {
- "questionId": "q86",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "Elijah Franecki",
- "actual": "Elijah Franecki",
- "isCorrect": true,
- "inputTokens": 8385,
- "outputTokens": 9,
- "latencyMs": 1573.5733749999781
- },
- {
- "questionId": "q87",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "Malcolm Erdman",
- "actual": "Malcolm Erdman",
- "isCorrect": true,
- "inputTokens": 9739,
"outputTokens": 74,
- "latencyMs": 2494.7630000000063
+ "latencyMs": 1973.243833000015
},
{
- "questionId": "q87",
+ "questionId": "q80",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "Anita Kozey",
+ "actual": "Anita Kozey",
+ "isCorrect": true,
+ "inputTokens": 8385,
+ "outputTokens": 9,
+ "latencyMs": 1430.9339170000167
+ },
+ {
+ "questionId": "q80",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Anita Kozey",
+ "actual": "Anita Kozey",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 3,
+ "latencyMs": 1687.4137919999775
+ },
+ {
+ "questionId": "q81",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 9739,
+ "outputTokens": 203,
+ "latencyMs": 3178.392749999999
+ },
+ {
+ "questionId": "q81",
"format": "json",
"model": "claude-haiku-4-5",
- "expected": "Malcolm Erdman",
- "actual": "Malcolm Erdman",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
"isCorrect": true,
"inputTokens": 11907,
- "outputTokens": 7,
- "latencyMs": 1135.412083000003
+ "outputTokens": 10,
+ "latencyMs": 1213.1997499999707
},
{
- "questionId": "q87",
+ "questionId": "q81",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 4,
+ "latencyMs": 1591.6145830000169
+ },
+ {
+ "questionId": "q81",
"format": "toon",
"model": "gpt-5-nano",
- "expected": "Malcolm Erdman",
- "actual": "Malcolm Erdman",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 6013,
+ "outputTokens": 203,
+ "latencyMs": 3938.462541999994
+ },
+ {
+ "questionId": "q81",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 6993,
+ "outputTokens": 10,
+ "latencyMs": 1552.203542000032
+ },
+ {
+ "questionId": "q81",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 4,
+ "latencyMs": 1499.0997919999645
+ },
+ {
+ "questionId": "q81",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 6781,
+ "outputTokens": 203,
+ "latencyMs": 5183.275583000039
+ },
+ {
+ "questionId": "q81",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 10,
+ "latencyMs": 1740.2195410000277
+ },
+ {
+ "questionId": "q81",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 4,
+ "latencyMs": 3886.555624999979
+ },
+ {
+ "questionId": "q81",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 203,
+ "latencyMs": 6655.238542000006
+ },
+ {
+ "questionId": "q81",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 10,
+ "latencyMs": 1357.9108329999726
+ },
+ {
+ "questionId": "q81",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 4,
+ "latencyMs": 1344.8635829999694
+ },
+ {
+ "questionId": "q81",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 139,
+ "latencyMs": 10553.66091700003
+ },
+ {
+ "questionId": "q81",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 8385,
+ "outputTokens": 10,
+ "latencyMs": 1807.1954169999808
+ },
+ {
+ "questionId": "q81",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Elmer Kub PhD",
+ "actual": "Elmer Kub PhD",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 4,
+ "latencyMs": 2490.0647499999614
+ },
+ {
+ "questionId": "q82",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 9739,
+ "outputTokens": 138,
+ "latencyMs": 4916.117375000031
+ },
+ {
+ "questionId": "q82",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 10,
+ "latencyMs": 1074.780374999973
+ },
+ {
+ "questionId": "q82",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 4,
+ "latencyMs": 1412.95891700004
+ },
+ {
+ "questionId": "q82",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
"isCorrect": true,
"inputTokens": 6013,
"outputTokens": 138,
- "latencyMs": 2332.6303330000082
+ "latencyMs": 2372.7108339999686
},
{
- "questionId": "q87",
+ "questionId": "q82",
"format": "toon",
"model": "claude-haiku-4-5",
- "expected": "Malcolm Erdman",
- "actual": "Malcolm Erdman",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 6993,
+ "outputTokens": 10,
+ "latencyMs": 1261.033374999999
+ },
+ {
+ "questionId": "q82",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 4,
+ "latencyMs": 1507.3635420000064
+ },
+ {
+ "questionId": "q82",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 6781,
+ "outputTokens": 266,
+ "latencyMs": 4028.793000000005
+ },
+ {
+ "questionId": "q82",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 10,
+ "latencyMs": 1685.5001250000205
+ },
+ {
+ "questionId": "q82",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 4,
+ "latencyMs": 4534.999041000032
+ },
+ {
+ "questionId": "q82",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 202,
+ "latencyMs": 3417.137708000024
+ },
+ {
+ "questionId": "q82",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 10,
+ "latencyMs": 1361.4405830000178
+ },
+ {
+ "questionId": "q82",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 4,
+ "latencyMs": 2432.530415999994
+ },
+ {
+ "questionId": "q82",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 202,
+ "latencyMs": 5838.863542000006
+ },
+ {
+ "questionId": "q82",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 8385,
+ "outputTokens": 10,
+ "latencyMs": 1243.5272090000217
+ },
+ {
+ "questionId": "q82",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Maxine Zemlak",
+ "actual": "Maxine Zemlak",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 4,
+ "latencyMs": 3514.3164579999866
+ },
+ {
+ "questionId": "q83",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
+ "isCorrect": true,
+ "inputTokens": 9739,
+ "outputTokens": 202,
+ "latencyMs": 6595.4543330000015
+ },
+ {
+ "questionId": "q83",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 7,
+ "latencyMs": 1498.3081660000025
+ },
+ {
+ "questionId": "q83",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 4,
+ "latencyMs": 2013.447125000006
+ },
+ {
+ "questionId": "q83",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
+ "isCorrect": true,
+ "inputTokens": 6013,
+ "outputTokens": 202,
+ "latencyMs": 3336.2056250000023
+ },
+ {
+ "questionId": "q83",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 7,
- "latencyMs": 1175.6766249999928
+ "latencyMs": 1070.626500000013
},
{
- "questionId": "q87",
+ "questionId": "q83",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 4,
+ "latencyMs": 1394.0314590000198
+ },
+ {
+ "questionId": "q83",
"format": "csv",
"model": "gpt-5-nano",
- "expected": "Malcolm Erdman",
- "actual": "Malcolm Erdman",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
"isCorrect": true,
"inputTokens": 6781,
+ "outputTokens": 266,
+ "latencyMs": 4194.179917000001
+ },
+ {
+ "questionId": "q83",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 7,
+ "latencyMs": 1139.8458330000285
+ },
+ {
+ "questionId": "q83",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 4,
+ "latencyMs": 3437.878625000012
+ },
+ {
+ "questionId": "q83",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
+ "isCorrect": true,
+ "inputTokens": 11037,
"outputTokens": 458,
- "latencyMs": 4252.623416000017
+ "latencyMs": 13446.595333000005
},
{
- "questionId": "q87",
- "format": "csv",
+ "questionId": "q83",
+ "format": "xml",
"model": "claude-haiku-4-5",
- "expected": "Malcolm Erdman",
- "actual": "Malcolm Erdman",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
"isCorrect": true,
- "inputTokens": 8414,
+ "inputTokens": 13380,
"outputTokens": 7,
- "latencyMs": 1297.546416999976
+ "latencyMs": 2680.581542
},
{
- "questionId": "q87",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "Malcolm Erdman",
- "actual": "Malcolm Erdman",
+ "questionId": "q83",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
"isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 74,
- "latencyMs": 2264.2770829999936
+ "inputTokens": 13451,
+ "outputTokens": 4,
+ "latencyMs": 1203.1962920000078
},
{
- "questionId": "q87",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "Malcolm Erdman",
- "actual": "Malcolm Erdman",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 7,
- "latencyMs": 1055.0764170000039
- },
- {
- "questionId": "q87",
+ "questionId": "q83",
"format": "yaml",
"model": "gpt-5-nano",
- "expected": "Malcolm Erdman",
- "actual": "Malcolm Erdman",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
"isCorrect": true,
"inputTokens": 7373,
"outputTokens": 138,
- "latencyMs": 3193.2753749999974
+ "latencyMs": 4011.303083000006
},
{
- "questionId": "q87",
+ "questionId": "q83",
"format": "yaml",
"model": "claude-haiku-4-5",
- "expected": "Malcolm Erdman",
- "actual": "Malcolm Erdman",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 7,
- "latencyMs": 1912.7229999999981
+ "latencyMs": 1039.7921659999993
},
{
- "questionId": "q88",
+ "questionId": "q83",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Emanuel Littel",
+ "actual": "Emanuel Littel",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 4,
+ "latencyMs": 2480.1701660000253
+ },
+ {
+ "questionId": "q84",
"format": "json",
"model": "gpt-5-nano",
- "expected": "Fannie Skiles",
- "actual": "Fannie Skiles",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
"isCorrect": true,
"inputTokens": 9739,
- "outputTokens": 138,
- "latencyMs": 2147.5894160000025
+ "outputTokens": 136,
+ "latencyMs": 4735.566333000024
},
{
- "questionId": "q88",
+ "questionId": "q84",
"format": "json",
"model": "claude-haiku-4-5",
- "expected": "Fannie Skiles",
- "actual": "Fannie Skiles",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 7,
+ "latencyMs": 1280.546875
+ },
+ {
+ "questionId": "q84",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 2,
+ "latencyMs": 1865.3758329999982
+ },
+ {
+ "questionId": "q84",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 6013,
+ "outputTokens": 200,
+ "latencyMs": 2902.7560829999857
+ },
+ {
+ "questionId": "q84",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 6993,
+ "outputTokens": 7,
+ "latencyMs": 1081.401291999966
+ },
+ {
+ "questionId": "q84",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 2,
+ "latencyMs": 1030.250207999954
+ },
+ {
+ "questionId": "q84",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 6781,
+ "outputTokens": 264,
+ "latencyMs": 3382.8625409999513
+ },
+ {
+ "questionId": "q84",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 7,
+ "latencyMs": 1059.5115829999559
+ },
+ {
+ "questionId": "q84",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 2,
+ "latencyMs": 4047.5788749999483
+ },
+ {
+ "questionId": "q84",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 264,
+ "latencyMs": 4623.2353329999605
+ },
+ {
+ "questionId": "q84",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 7,
+ "latencyMs": 1069.810291999951
+ },
+ {
+ "questionId": "q84",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 2,
+ "latencyMs": 1081.8097089999937
+ },
+ {
+ "questionId": "q84",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 264,
+ "latencyMs": 8454.222833000007
+ },
+ {
+ "questionId": "q84",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 8385,
+ "outputTokens": 7,
+ "latencyMs": 1248.3214579999913
+ },
+ {
+ "questionId": "q84",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Andrew Kling",
+ "actual": "Andrew Kling",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 2,
+ "latencyMs": 3052.669667000009
+ },
+ {
+ "questionId": "q85",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
+ "isCorrect": true,
+ "inputTokens": 9739,
+ "outputTokens": 139,
+ "latencyMs": 6477.822083999985
+ },
+ {
+ "questionId": "q85",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
"isCorrect": true,
"inputTokens": 11907,
"outputTokens": 9,
- "latencyMs": 1377.5190409999923
+ "latencyMs": 1177.795124999946
},
{
- "questionId": "q88",
+ "questionId": "q85",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 4,
+ "latencyMs": 2578.6090829999885
+ },
+ {
+ "questionId": "q85",
"format": "toon",
"model": "gpt-5-nano",
- "expected": "Fannie Skiles",
- "actual": "Fannie Skiles",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
"isCorrect": true,
"inputTokens": 6013,
- "outputTokens": 202,
- "latencyMs": 4472.317459000013
+ "outputTokens": 139,
+ "latencyMs": 11574.13941599999
},
{
- "questionId": "q88",
+ "questionId": "q85",
"format": "toon",
"model": "claude-haiku-4-5",
- "expected": "Fannie Skiles",
- "actual": "Fannie Skiles",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 9,
- "latencyMs": 1376.0682919999817
+ "latencyMs": 1197.251500000013
},
{
- "questionId": "q88",
+ "questionId": "q85",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 4,
+ "latencyMs": 902.3842500000028
+ },
+ {
+ "questionId": "q85",
"format": "csv",
"model": "gpt-5-nano",
- "expected": "Fannie Skiles",
- "actual": "Fannie Skiles",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
"isCorrect": true,
"inputTokens": 6781,
- "outputTokens": 202,
- "latencyMs": 6952.122459000006
+ "outputTokens": 267,
+ "latencyMs": 5139.725291999988
},
{
- "questionId": "q88",
+ "questionId": "q85",
"format": "csv",
"model": "claude-haiku-4-5",
- "expected": "Fannie Skiles",
- "actual": "Fannie Skiles",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
"isCorrect": true,
"inputTokens": 8414,
"outputTokens": 9,
- "latencyMs": 1178.8732909999962
+ "latencyMs": 1539.0101670000004
},
{
- "questionId": "q88",
- "format": "markdown-kv",
+ "questionId": "q85",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 4,
+ "latencyMs": 5590.813292000035
+ },
+ {
+ "questionId": "q85",
+ "format": "xml",
"model": "gpt-5-nano",
- "expected": "Fannie Skiles",
- "actual": "Fannie Skiles",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
"isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 266,
- "latencyMs": 3619.214917000005
+ "inputTokens": 11037,
+ "outputTokens": 459,
+ "latencyMs": 5332.691916999989
},
{
- "questionId": "q88",
- "format": "markdown-kv",
+ "questionId": "q85",
+ "format": "xml",
"model": "claude-haiku-4-5",
- "expected": "Fannie Skiles",
- "actual": "Fannie Skiles",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
"isCorrect": true,
- "inputTokens": 9289,
+ "inputTokens": 13380,
"outputTokens": 9,
- "latencyMs": 1212.3732920000039
+ "latencyMs": 1692.4654169999994
},
{
- "questionId": "q88",
+ "questionId": "q85",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 4,
+ "latencyMs": 981.0666250000359
+ },
+ {
+ "questionId": "q85",
"format": "yaml",
"model": "gpt-5-nano",
- "expected": "Fannie Skiles",
- "actual": "Fannie Skiles",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
"isCorrect": true,
"inputTokens": 7373,
- "outputTokens": 202,
- "latencyMs": 5169.327332999994
+ "outputTokens": 331,
+ "latencyMs": 4571.373957999982
},
{
- "questionId": "q88",
+ "questionId": "q85",
"format": "yaml",
"model": "claude-haiku-4-5",
- "expected": "Fannie Skiles",
- "actual": "Fannie Skiles",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 9,
- "latencyMs": 1452.6941670000087
+ "latencyMs": 1186.5836659999914
},
{
- "questionId": "q89",
+ "questionId": "q85",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Morris O'Hara",
+ "actual": "Morris O'Hara",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 4,
+ "latencyMs": 3083.60266699997
+ },
+ {
+ "questionId": "q86",
"format": "json",
"model": "gpt-5-nano",
- "expected": "Sonja Emmerich",
- "actual": "Sonja Emmerich",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
"isCorrect": true,
"inputTokens": 9739,
- "outputTokens": 395,
- "latencyMs": 3384.798125000001
+ "outputTokens": 203,
+ "latencyMs": 6090.284833999991
},
{
- "questionId": "q89",
+ "questionId": "q86",
"format": "json",
"model": "claude-haiku-4-5",
- "expected": "Sonja Emmerich",
- "actual": "Sonja Emmerich",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
"isCorrect": true,
"inputTokens": 11907,
- "outputTokens": 10,
- "latencyMs": 1241.960665999999
+ "outputTokens": 9,
+ "latencyMs": 1271.532459000009
},
{
- "questionId": "q89",
+ "questionId": "q86",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 5,
+ "latencyMs": 1557.2529580000555
+ },
+ {
+ "questionId": "q86",
"format": "toon",
"model": "gpt-5-nano",
- "expected": "Sonja Emmerich",
- "actual": "Sonja Emmerich",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
"isCorrect": true,
"inputTokens": 6013,
- "outputTokens": 331,
- "latencyMs": 4747.914124999981
+ "outputTokens": 203,
+ "latencyMs": 3250.3466250000056
},
{
- "questionId": "q89",
+ "questionId": "q86",
"format": "toon",
"model": "claude-haiku-4-5",
- "expected": "Sonja Emmerich",
- "actual": "Sonja Emmerich",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
"isCorrect": true,
"inputTokens": 6993,
- "outputTokens": 10,
- "latencyMs": 1302.8907080000208
+ "outputTokens": 9,
+ "latencyMs": 1201.9044580000336
},
{
- "questionId": "q89",
+ "questionId": "q86",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 5,
+ "latencyMs": 874.0206250000047
+ },
+ {
+ "questionId": "q86",
"format": "csv",
"model": "gpt-5-nano",
- "expected": "Sonja Emmerich",
- "actual": "Sonja Emmerich",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
"isCorrect": true,
"inputTokens": 6781,
- "outputTokens": 331,
- "latencyMs": 3532.4660830000066
+ "outputTokens": 203,
+ "latencyMs": 9473.656583999982
},
{
- "questionId": "q89",
+ "questionId": "q86",
"format": "csv",
"model": "claude-haiku-4-5",
- "expected": "Sonja Emmerich",
- "actual": "Sonja Emmerich",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
"isCorrect": true,
"inputTokens": 8414,
- "outputTokens": 10,
- "latencyMs": 1203.086540999997
+ "outputTokens": 9,
+ "latencyMs": 1253.2470420000027
},
{
- "questionId": "q89",
- "format": "markdown-kv",
+ "questionId": "q86",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 5,
+ "latencyMs": 2383.5771250000107
+ },
+ {
+ "questionId": "q86",
+ "format": "xml",
"model": "gpt-5-nano",
- "expected": "Sonja Emmerich",
- "actual": "Sonja Emmerich",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
"isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 331,
- "latencyMs": 4074.5077089999977
+ "inputTokens": 11037,
+ "outputTokens": 267,
+ "latencyMs": 6551.133333000005
},
{
- "questionId": "q89",
- "format": "markdown-kv",
+ "questionId": "q86",
+ "format": "xml",
"model": "claude-haiku-4-5",
- "expected": "Sonja Emmerich",
- "actual": "Sonja Emmerich",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
"isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 10,
- "latencyMs": 1345.891499999998
+ "inputTokens": 13380,
+ "outputTokens": 9,
+ "latencyMs": 1116.6841669999994
},
{
- "questionId": "q89",
+ "questionId": "q86",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 5,
+ "latencyMs": 2014.7545000000391
+ },
+ {
+ "questionId": "q86",
"format": "yaml",
"model": "gpt-5-nano",
- "expected": "Sonja Emmerich",
- "actual": "Sonja Emmerich",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
"isCorrect": true,
"inputTokens": 7373,
"outputTokens": 75,
- "latencyMs": 1885.0838330000115
+ "latencyMs": 2472.76654099999
+ },
+ {
+ "questionId": "q86",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
+ "isCorrect": true,
+ "inputTokens": 8385,
+ "outputTokens": 9,
+ "latencyMs": 1175.5650410000235
+ },
+ {
+ "questionId": "q86",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Elijah Franecki",
+ "actual": "Elijah Franecki",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 5,
+ "latencyMs": 1389.2444590000086
+ },
+ {
+ "questionId": "q87",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 9739,
+ "outputTokens": 266,
+ "latencyMs": 4308.579541000014
+ },
+ {
+ "questionId": "q87",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 7,
+ "latencyMs": 1423.6036659999518
+ },
+ {
+ "questionId": "q87",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 3,
+ "latencyMs": 2240.639916999964
+ },
+ {
+ "questionId": "q87",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 6013,
+ "outputTokens": 202,
+ "latencyMs": 3581.8104590000003
+ },
+ {
+ "questionId": "q87",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 6993,
+ "outputTokens": 7,
+ "latencyMs": 1104.380625000049
+ },
+ {
+ "questionId": "q87",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 3,
+ "latencyMs": 1940.0862910000142
+ },
+ {
+ "questionId": "q87",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 6781,
+ "outputTokens": 202,
+ "latencyMs": 4205.585124999983
+ },
+ {
+ "questionId": "q87",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 7,
+ "latencyMs": 1249.4729159999988
+ },
+ {
+ "questionId": "q87",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 3,
+ "latencyMs": 3377.5699580000364
+ },
+ {
+ "questionId": "q87",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 266,
+ "latencyMs": 4378.770917000016
+ },
+ {
+ "questionId": "q87",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 7,
+ "latencyMs": 1283.0947499999893
+ },
+ {
+ "questionId": "q87",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 3,
+ "latencyMs": 1649.8935409999685
+ },
+ {
+ "questionId": "q87",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 138,
+ "latencyMs": 4596.174417000031
+ },
+ {
+ "questionId": "q87",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 8385,
+ "outputTokens": 7,
+ "latencyMs": 1117.4153749999823
+ },
+ {
+ "questionId": "q87",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Malcolm Erdman",
+ "actual": "Malcolm Erdman",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 3,
+ "latencyMs": 2916.328375000041
+ },
+ {
+ "questionId": "q88",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 9739,
+ "outputTokens": 202,
+ "latencyMs": 6150.88295900001
+ },
+ {
+ "questionId": "q88",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 9,
+ "latencyMs": 3154.254249999998
+ },
+ {
+ "questionId": "q88",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 4,
+ "latencyMs": 1595.2374999999884
+ },
+ {
+ "questionId": "q88",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 6013,
+ "outputTokens": 138,
+ "latencyMs": 2656.5287499999977
+ },
+ {
+ "questionId": "q88",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 6993,
+ "outputTokens": 9,
+ "latencyMs": 1990.0005419999943
+ },
+ {
+ "questionId": "q88",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 4,
+ "latencyMs": 2321.1809169999906
+ },
+ {
+ "questionId": "q88",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 6781,
+ "outputTokens": 266,
+ "latencyMs": 3915.817207999993
+ },
+ {
+ "questionId": "q88",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 9,
+ "latencyMs": 1246.5829580000136
+ },
+ {
+ "questionId": "q88",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 4,
+ "latencyMs": 4516.533583000011
+ },
+ {
+ "questionId": "q88",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 202,
+ "latencyMs": 5059.808416999993
+ },
+ {
+ "questionId": "q88",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 9,
+ "latencyMs": 1927.3214579999913
+ },
+ {
+ "questionId": "q88",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 4,
+ "latencyMs": 1175.4753750000382
+ },
+ {
+ "questionId": "q88",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 138,
+ "latencyMs": 6212.469625000027
+ },
+ {
+ "questionId": "q88",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 8385,
+ "outputTokens": 9,
+ "latencyMs": 1526.3683329999913
+ },
+ {
+ "questionId": "q88",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Fannie Skiles",
+ "actual": "Fannie Skiles",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 4,
+ "latencyMs": 3560.557833000028
+ },
+ {
+ "questionId": "q89",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 9739,
+ "outputTokens": 331,
+ "latencyMs": 4333.316457999987
+ },
+ {
+ "questionId": "q89",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 10,
+ "latencyMs": 1150.7639999999665
+ },
+ {
+ "questionId": "q89",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 4,
+ "latencyMs": 2529.932083999971
+ },
+ {
+ "questionId": "q89",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 6013,
+ "outputTokens": 203,
+ "latencyMs": 3581.042041000037
+ },
+ {
+ "questionId": "q89",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 6993,
+ "outputTokens": 10,
+ "latencyMs": 1568.8872919999994
+ },
+ {
+ "questionId": "q89",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 4,
+ "latencyMs": 1319.7952499999665
+ },
+ {
+ "questionId": "q89",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 6781,
+ "outputTokens": 331,
+ "latencyMs": 3538.970499999996
+ },
+ {
+ "questionId": "q89",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 10,
+ "latencyMs": 1241.5265000000363
+ },
+ {
+ "questionId": "q89",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 4,
+ "latencyMs": 3917.9875000000466
+ },
+ {
+ "questionId": "q89",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 395,
+ "latencyMs": 7058.911167000013
+ },
+ {
+ "questionId": "q89",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 10,
+ "latencyMs": 1205.0128329999861
+ },
+ {
+ "questionId": "q89",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 4,
+ "latencyMs": 1415.7616670000134
+ },
+ {
+ "questionId": "q89",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 139,
+ "latencyMs": 2635.5764160000253
},
{
"questionId": "q89",
@@ -9787,7 +14671,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 10,
- "latencyMs": 1182.5891669999983
+ "latencyMs": 1153.0579160000198
+ },
+ {
+ "questionId": "q89",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Sonja Emmerich",
+ "actual": "Sonja Emmerich",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 4,
+ "latencyMs": 2894.0762920000125
},
{
"questionId": "q90",
@@ -9798,7 +14693,7 @@
"isCorrect": true,
"inputTokens": 9739,
"outputTokens": 140,
- "latencyMs": 2772.3258339999884
+ "latencyMs": 6845.755584000028
},
{
"questionId": "q90",
@@ -9809,7 +14704,18 @@
"isCorrect": true,
"inputTokens": 11907,
"outputTokens": 10,
- "latencyMs": 1424.9674579999992
+ "latencyMs": 2363.831957999966
+ },
+ {
+ "questionId": "q90",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Frank Emmerich DVM",
+ "actual": "Frank Emmerich DVM",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 5,
+ "latencyMs": 2646.4628749999683
},
{
"questionId": "q90",
@@ -9819,8 +14725,8 @@
"actual": "Frank Emmerich DVM",
"isCorrect": true,
"inputTokens": 6013,
- "outputTokens": 204,
- "latencyMs": 2900.4731660000107
+ "outputTokens": 140,
+ "latencyMs": 2236.9238749999786
},
{
"questionId": "q90",
@@ -9831,7 +14737,18 @@
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 10,
- "latencyMs": 2815.817249999993
+ "latencyMs": 1023.8160830000415
+ },
+ {
+ "questionId": "q90",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Frank Emmerich DVM",
+ "actual": "Frank Emmerich DVM",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 5,
+ "latencyMs": 1165.2285000000265
},
{
"questionId": "q90",
@@ -9842,7 +14759,7 @@
"isCorrect": true,
"inputTokens": 6781,
"outputTokens": 268,
- "latencyMs": 3637.2442089999968
+ "latencyMs": 4066.1428750000196
},
{
"questionId": "q90",
@@ -9853,29 +14770,51 @@
"isCorrect": true,
"inputTokens": 8414,
"outputTokens": 10,
- "latencyMs": 1104.2333339999896
+ "latencyMs": 1570.4565409999923
},
{
"questionId": "q90",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Frank Emmerich DVM",
+ "actual": "Frank Emmerich DVM",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 5,
+ "latencyMs": 3472.6348330000183
+ },
+ {
+ "questionId": "q90",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Frank Emmerich DVM",
"actual": "Frank Emmerich DVM",
"isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 396,
- "latencyMs": 8213.703791999986
+ "inputTokens": 11037,
+ "outputTokens": 268,
+ "latencyMs": 3361.3982500000275
},
{
"questionId": "q90",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Frank Emmerich DVM",
"actual": "Frank Emmerich DVM",
"isCorrect": true,
- "inputTokens": 9289,
+ "inputTokens": 13380,
"outputTokens": 10,
- "latencyMs": 2875.9923749999725
+ "latencyMs": 1247.454334000009
+ },
+ {
+ "questionId": "q90",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Frank Emmerich DVM",
+ "actual": "Frank Emmerich DVM",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 5,
+ "latencyMs": 1382.5874590000021
},
{
"questionId": "q90",
@@ -9886,7 +14825,7 @@
"isCorrect": true,
"inputTokens": 7373,
"outputTokens": 140,
- "latencyMs": 2809.8342080000148
+ "latencyMs": 2949.110708000022
},
{
"questionId": "q90",
@@ -9897,7 +14836,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 10,
- "latencyMs": 1306.0824999999895
+ "latencyMs": 1160.699499999988
+ },
+ {
+ "questionId": "q90",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Frank Emmerich DVM",
+ "actual": "Frank Emmerich DVM",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 5,
+ "latencyMs": 3016.852790999983
},
{
"questionId": "q91",
@@ -9907,96 +14857,140 @@
"actual": "Ronald Collins",
"isCorrect": true,
"inputTokens": 9739,
- "outputTokens": 265,
- "latencyMs": 3632.680000000022
- },
- {
- "questionId": "q91",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "Ronald Collins",
- "actual": "Ronald Collins",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 5,
- "latencyMs": 1446.0535420000087
- },
- {
- "questionId": "q91",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "Ronald Collins",
- "actual": "Ronald Collins",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 201,
- "latencyMs": 2629.6447500000068
- },
- {
- "questionId": "q91",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "Ronald Collins",
- "actual": "Ronald Collins",
- "isCorrect": true,
- "inputTokens": 6993,
- "outputTokens": 5,
- "latencyMs": 1387.298958999978
- },
- {
- "questionId": "q91",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "Ronald Collins",
- "actual": "Ronald Collins",
- "isCorrect": true,
- "inputTokens": 6781,
- "outputTokens": 457,
- "latencyMs": 8303.644042
- },
- {
- "questionId": "q91",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "Ronald Collins",
- "actual": "Ronald Collins",
- "isCorrect": true,
- "inputTokens": 8414,
- "outputTokens": 5,
- "latencyMs": 1178.2771250000224
- },
- {
- "questionId": "q91",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "Ronald Collins",
- "actual": "Ronald Collins",
- "isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 329,
- "latencyMs": 3967.7135410000046
- },
- {
- "questionId": "q91",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "Ronald Collins",
- "actual": "Ronald Collins",
- "isCorrect": true,
- "inputTokens": 9289,
- "outputTokens": 5,
- "latencyMs": 1278.0479160000104
- },
- {
- "questionId": "q91",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "Ronald Collins",
- "actual": "Ronald Collins",
- "isCorrect": true,
- "inputTokens": 7373,
"outputTokens": 73,
- "latencyMs": 1974.7658750000119
+ "latencyMs": 2769.32262500003
+ },
+ {
+ "questionId": "q91",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 5,
+ "latencyMs": 1252.1112919999869
+ },
+ {
+ "questionId": "q91",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 2,
+ "latencyMs": 1906.2817499999655
+ },
+ {
+ "questionId": "q91",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 6013,
+ "outputTokens": 201,
+ "latencyMs": 5391.403708000027
+ },
+ {
+ "questionId": "q91",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 6993,
+ "outputTokens": 5,
+ "latencyMs": 1126.4195000000182
+ },
+ {
+ "questionId": "q91",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 2,
+ "latencyMs": 1148.1653749999823
+ },
+ {
+ "questionId": "q91",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 6781,
+ "outputTokens": 265,
+ "latencyMs": 3649.6608329999726
+ },
+ {
+ "questionId": "q91",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 8414,
+ "outputTokens": 5,
+ "latencyMs": 1054.9641670000274
+ },
+ {
+ "questionId": "q91",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 2,
+ "latencyMs": 4520.085083000013
+ },
+ {
+ "questionId": "q91",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 11037,
+ "outputTokens": 137,
+ "latencyMs": 3783.5575830000453
+ },
+ {
+ "questionId": "q91",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 13380,
+ "outputTokens": 5,
+ "latencyMs": 1200.0155000000377
+ },
+ {
+ "questionId": "q91",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 2,
+ "latencyMs": 1914.0702499999898
+ },
+ {
+ "questionId": "q91",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 7373,
+ "outputTokens": 265,
+ "latencyMs": 8789.486250000016
},
{
"questionId": "q91",
@@ -10007,7 +15001,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 5,
- "latencyMs": 1496.9746670000022
+ "latencyMs": 1445.0254999999888
+ },
+ {
+ "questionId": "q91",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Ronald Collins",
+ "actual": "Ronald Collins",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 2,
+ "latencyMs": 3330.7725830000127
},
{
"questionId": "q92",
@@ -10018,7 +15023,7 @@
"isCorrect": true,
"inputTokens": 9739,
"outputTokens": 201,
- "latencyMs": 4246.4962499999965
+ "latencyMs": 6413.151542000007
},
{
"questionId": "q92",
@@ -10029,7 +15034,18 @@
"isCorrect": true,
"inputTokens": 11907,
"outputTokens": 8,
- "latencyMs": 1322.2766660000198
+ "latencyMs": 1204.1578749999753
+ },
+ {
+ "questionId": "q92",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Jeannie Klein",
+ "actual": "Jeannie Klein",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 3,
+ "latencyMs": 1412.2799170000362
},
{
"questionId": "q92",
@@ -10040,7 +15056,7 @@
"isCorrect": true,
"inputTokens": 6013,
"outputTokens": 137,
- "latencyMs": 2135.097083999979
+ "latencyMs": 2630.434041999979
},
{
"questionId": "q92",
@@ -10051,7 +15067,18 @@
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 8,
- "latencyMs": 1213.9765000000189
+ "latencyMs": 1546.8669579999987
+ },
+ {
+ "questionId": "q92",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Jeannie Klein",
+ "actual": "Jeannie Klein",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 3,
+ "latencyMs": 2373.892125000013
},
{
"questionId": "q92",
@@ -10061,8 +15088,8 @@
"actual": "Jeannie Klein",
"isCorrect": true,
"inputTokens": 6781,
- "outputTokens": 265,
- "latencyMs": 3583.0762920000125
+ "outputTokens": 201,
+ "latencyMs": 3202.2820420000353
},
{
"questionId": "q92",
@@ -10073,29 +15100,51 @@
"isCorrect": true,
"inputTokens": 8414,
"outputTokens": 8,
- "latencyMs": 1353.168249999988
+ "latencyMs": 1227.2948330000509
},
{
"questionId": "q92",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Jeannie Klein",
+ "actual": "Jeannie Klein",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 3,
+ "latencyMs": 3743.526792000048
+ },
+ {
+ "questionId": "q92",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Jeannie Klein",
"actual": "Jeannie Klein",
"isCorrect": true,
- "inputTokens": 9158,
+ "inputTokens": 11037,
"outputTokens": 201,
- "latencyMs": 3724.366249999992
+ "latencyMs": 3238.171458000026
},
{
"questionId": "q92",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Jeannie Klein",
"actual": "Jeannie Klein",
"isCorrect": true,
- "inputTokens": 9289,
+ "inputTokens": 13380,
"outputTokens": 8,
- "latencyMs": 1239.5215000000026
+ "latencyMs": 1180.7857080000103
+ },
+ {
+ "questionId": "q92",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Jeannie Klein",
+ "actual": "Jeannie Klein",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 3,
+ "latencyMs": 1142.4927089999546
},
{
"questionId": "q92",
@@ -10106,7 +15155,7 @@
"isCorrect": true,
"inputTokens": 7373,
"outputTokens": 137,
- "latencyMs": 2863.772667000012
+ "latencyMs": 3021.9724590000114
},
{
"questionId": "q92",
@@ -10117,7 +15166,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 8,
- "latencyMs": 1297.5507919999945
+ "latencyMs": 1821.3516250000102
+ },
+ {
+ "questionId": "q92",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Jeannie Klein",
+ "actual": "Jeannie Klein",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 3,
+ "latencyMs": 2796.1425000000163
},
{
"questionId": "q93",
@@ -10127,30 +15187,41 @@
"actual": "Joshua Watsica",
"isCorrect": true,
"inputTokens": 9739,
+ "outputTokens": 138,
+ "latencyMs": 2788.065082999994
+ },
+ {
+ "questionId": "q93",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "Joshua Watsica",
+ "actual": "Joshua Watsica",
+ "isCorrect": true,
+ "inputTokens": 11907,
+ "outputTokens": 8,
+ "latencyMs": 1367.4712089999812
+ },
+ {
+ "questionId": "q93",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "Joshua Watsica",
+ "actual": "Joshua Watsica",
+ "isCorrect": true,
+ "inputTokens": 12113,
+ "outputTokens": 4,
+ "latencyMs": 1443.3402910000295
+ },
+ {
+ "questionId": "q93",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "Joshua Watsica",
+ "actual": "Joshua Watsica",
+ "isCorrect": true,
+ "inputTokens": 6013,
"outputTokens": 202,
- "latencyMs": 2533.5459160000028
- },
- {
- "questionId": "q93",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "Joshua Watsica",
- "actual": "Joshua Watsica",
- "isCorrect": true,
- "inputTokens": 11907,
- "outputTokens": 8,
- "latencyMs": 1313.4649999999965
- },
- {
- "questionId": "q93",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "Joshua Watsica",
- "actual": "Joshua Watsica",
- "isCorrect": true,
- "inputTokens": 6013,
- "outputTokens": 74,
- "latencyMs": 1609.448166999995
+ "latencyMs": 3654.0896250000224
},
{
"questionId": "q93",
@@ -10161,7 +15232,18 @@
"isCorrect": true,
"inputTokens": 6993,
"outputTokens": 8,
- "latencyMs": 1257.2229999999981
+ "latencyMs": 1028.997875000001
+ },
+ {
+ "questionId": "q93",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "Joshua Watsica",
+ "actual": "Joshua Watsica",
+ "isCorrect": true,
+ "inputTokens": 7201,
+ "outputTokens": 4,
+ "latencyMs": 996.1445419999654
},
{
"questionId": "q93",
@@ -10171,8 +15253,8 @@
"actual": "Joshua Watsica",
"isCorrect": true,
"inputTokens": 6781,
- "outputTokens": 458,
- "latencyMs": 5294.154332999984
+ "outputTokens": 266,
+ "latencyMs": 6677.9684579999885
},
{
"questionId": "q93",
@@ -10183,29 +15265,51 @@
"isCorrect": true,
"inputTokens": 8414,
"outputTokens": 8,
- "latencyMs": 1363.172208999982
+ "latencyMs": 1639.9640409999993
},
{
"questionId": "q93",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "Joshua Watsica",
+ "actual": "Joshua Watsica",
+ "isCorrect": true,
+ "inputTokens": 7838,
+ "outputTokens": 4,
+ "latencyMs": 1652.2167079999927
+ },
+ {
+ "questionId": "q93",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "Joshua Watsica",
"actual": "Joshua Watsica",
"isCorrect": true,
- "inputTokens": 9158,
- "outputTokens": 74,
- "latencyMs": 2154.742499999993
+ "inputTokens": 11037,
+ "outputTokens": 202,
+ "latencyMs": 3802.7754580000183
},
{
"questionId": "q93",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "Joshua Watsica",
"actual": "Joshua Watsica",
"isCorrect": true,
- "inputTokens": 9289,
+ "inputTokens": 13380,
"outputTokens": 8,
- "latencyMs": 1509.8229580000043
+ "latencyMs": 3327.393792000017
+ },
+ {
+ "questionId": "q93",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "Joshua Watsica",
+ "actual": "Joshua Watsica",
+ "isCorrect": true,
+ "inputTokens": 13451,
+ "outputTokens": 4,
+ "latencyMs": 1257.9510420000297
},
{
"questionId": "q93",
@@ -10215,8 +15319,8 @@
"actual": "Joshua Watsica",
"isCorrect": true,
"inputTokens": 7373,
- "outputTokens": 74,
- "latencyMs": 2010.5185419999762
+ "outputTokens": 202,
+ "latencyMs": 3074.6058750000084
},
{
"questionId": "q93",
@@ -10227,7 +15331,18 @@
"isCorrect": true,
"inputTokens": 8385,
"outputTokens": 8,
- "latencyMs": 1193.5151659999974
+ "latencyMs": 1146.4290829999954
+ },
+ {
+ "questionId": "q93",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "Joshua Watsica",
+ "actual": "Joshua Watsica",
+ "isCorrect": true,
+ "inputTokens": 8427,
+ "outputTokens": 4,
+ "latencyMs": 1712.0292920000502
},
{
"questionId": "q94",
@@ -10237,52 +15352,74 @@
"actual": "10",
"isCorrect": true,
"inputTokens": 9735,
- "outputTokens": 1031,
- "latencyMs": 9550.510582999996
- },
- {
- "questionId": "q94",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "10",
- "actual": "8",
- "isCorrect": false,
- "inputTokens": 11902,
- "outputTokens": 5,
- "latencyMs": 1146.0822499999776
- },
- {
- "questionId": "q94",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "10",
- "actual": "10",
- "isCorrect": true,
- "inputTokens": 6009,
- "outputTokens": 775,
- "latencyMs": 6479.700542000006
- },
- {
- "questionId": "q94",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "10",
- "actual": "8",
- "isCorrect": false,
- "inputTokens": 6988,
- "outputTokens": 5,
- "latencyMs": 1329.610708000022
- },
- {
- "questionId": "q94",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "10",
- "actual": "10",
- "isCorrect": true,
- "inputTokens": 6777,
"outputTokens": 967,
- "latencyMs": 15240.216207999998
+ "latencyMs": 11158.31029200001
+ },
+ {
+ "questionId": "q94",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "10",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 11902,
+ "outputTokens": 5,
+ "latencyMs": 1969.3274160000146
+ },
+ {
+ "questionId": "q94",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "9",
+ "isCorrect": false,
+ "inputTokens": 12107,
+ "outputTokens": 1,
+ "latencyMs": 1012.6363329999731
+ },
+ {
+ "questionId": "q94",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 6009,
+ "outputTokens": 839,
+ "latencyMs": 12387.267332999967
+ },
+ {
+ "questionId": "q94",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "10",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 6988,
+ "outputTokens": 5,
+ "latencyMs": 1146.578125
+ },
+ {
+ "questionId": "q94",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 7195,
+ "outputTokens": 2,
+ "latencyMs": 6065.854290999996
+ },
+ {
+ "questionId": "q94",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 6777,
+ "outputTokens": 583,
+ "latencyMs": 5722.737124999985
},
{
"questionId": "q94",
@@ -10293,29 +15430,51 @@
"isCorrect": false,
"inputTokens": 8409,
"outputTokens": 5,
- "latencyMs": 1203.151125000004
+ "latencyMs": 1162.2037910000072
},
{
"questionId": "q94",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 7832,
+ "outputTokens": 2,
+ "latencyMs": 5346.4215829999885
+ },
+ {
+ "questionId": "q94",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "10",
"actual": "10",
"isCorrect": true,
- "inputTokens": 9154,
- "outputTokens": 583,
- "latencyMs": 6073.186583000002
+ "inputTokens": 11033,
+ "outputTokens": 967,
+ "latencyMs": 9711.181042000011
},
{
"questionId": "q94",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "10",
"actual": "8",
"isCorrect": false,
- "inputTokens": 9284,
+ "inputTokens": 13375,
"outputTokens": 5,
- "latencyMs": 1452.6655419999734
+ "latencyMs": 1180.9850839999854
+ },
+ {
+ "questionId": "q94",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 13445,
+ "outputTokens": 2,
+ "latencyMs": 6629.622541000019
},
{
"questionId": "q94",
@@ -10325,8 +15484,8 @@
"actual": "10",
"isCorrect": true,
"inputTokens": 7369,
- "outputTokens": 647,
- "latencyMs": 7084.941665999999
+ "outputTokens": 583,
+ "latencyMs": 5019.671374999976
},
{
"questionId": "q94",
@@ -10337,7 +15496,18 @@
"isCorrect": false,
"inputTokens": 8380,
"outputTokens": 5,
- "latencyMs": 1120.7099159999925
+ "latencyMs": 1167.7568749999627
+ },
+ {
+ "questionId": "q94",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "9",
+ "isCorrect": false,
+ "inputTokens": 8421,
+ "outputTokens": 1,
+ "latencyMs": 1625.168708000041
},
{
"questionId": "q95",
@@ -10347,8 +15517,8 @@
"actual": "10",
"isCorrect": true,
"inputTokens": 9735,
- "outputTokens": 903,
- "latencyMs": 8906.334791000001
+ "outputTokens": 775,
+ "latencyMs": 7411.724082999979
},
{
"questionId": "q95",
@@ -10359,7 +15529,18 @@
"isCorrect": false,
"inputTokens": 11902,
"outputTokens": 5,
- "latencyMs": 1109.434333000012
+ "latencyMs": 1554.4648750000051
+ },
+ {
+ "questionId": "q95",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 12107,
+ "outputTokens": 2,
+ "latencyMs": 2038.4110000000219
},
{
"questionId": "q95",
@@ -10369,8 +15550,8 @@
"actual": "10",
"isCorrect": true,
"inputTokens": 6009,
- "outputTokens": 391,
- "latencyMs": 4955.000415999995
+ "outputTokens": 455,
+ "latencyMs": 8813.801208000048
},
{
"questionId": "q95",
@@ -10381,7 +15562,18 @@
"isCorrect": false,
"inputTokens": 6988,
"outputTokens": 5,
- "latencyMs": 1040.817624999996
+ "latencyMs": 1344.8304580000113
+ },
+ {
+ "questionId": "q95",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "9",
+ "isCorrect": false,
+ "inputTokens": 7195,
+ "outputTokens": 1,
+ "latencyMs": 795.6426249999786
},
{
"questionId": "q95",
@@ -10391,8 +15583,8 @@
"actual": "10",
"isCorrect": true,
"inputTokens": 6777,
- "outputTokens": 775,
- "latencyMs": 8308.952791000018
+ "outputTokens": 903,
+ "latencyMs": 9739.22754199995
},
{
"questionId": "q95",
@@ -10403,29 +15595,51 @@
"isCorrect": false,
"inputTokens": 8409,
"outputTokens": 5,
- "latencyMs": 1128.542833000014
+ "latencyMs": 1163.627124999999
},
{
"questionId": "q95",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 7832,
+ "outputTokens": 2,
+ "latencyMs": 4444.457624999981
+ },
+ {
+ "questionId": "q95",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "10",
"actual": "10",
"isCorrect": true,
- "inputTokens": 9154,
- "outputTokens": 775,
- "latencyMs": 7118.855291000014
+ "inputTokens": 11033,
+ "outputTokens": 1415,
+ "latencyMs": 14405.558917000017
},
{
"questionId": "q95",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "10",
"actual": "8",
"isCorrect": false,
- "inputTokens": 9284,
+ "inputTokens": 13375,
"outputTokens": 5,
- "latencyMs": 1232.1081249999988
+ "latencyMs": 1603.5181249999441
+ },
+ {
+ "questionId": "q95",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "9",
+ "isCorrect": false,
+ "inputTokens": 13445,
+ "outputTokens": 1,
+ "latencyMs": 1466.009625000006
},
{
"questionId": "q95",
@@ -10435,8 +15649,8 @@
"actual": "10",
"isCorrect": true,
"inputTokens": 7369,
- "outputTokens": 647,
- "latencyMs": 6776.706208000018
+ "outputTokens": 583,
+ "latencyMs": 50147.72520799999
},
{
"questionId": "q95",
@@ -10447,7 +15661,18 @@
"isCorrect": false,
"inputTokens": 8380,
"outputTokens": 5,
- "latencyMs": 1677.1033330000064
+ "latencyMs": 1600.4076660000137
+ },
+ {
+ "questionId": "q95",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "9",
+ "isCorrect": false,
+ "inputTokens": 8421,
+ "outputTokens": 1,
+ "latencyMs": 1974.6425419999869
},
{
"questionId": "q96",
@@ -10457,162 +15682,239 @@
"actual": "10",
"isCorrect": true,
"inputTokens": 9736,
- "outputTokens": 583,
- "latencyMs": 5866.636624999985
- },
- {
- "questionId": "q96",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "10",
- "actual": "8",
- "isCorrect": false,
- "inputTokens": 11902,
- "outputTokens": 5,
- "latencyMs": 1574.224125000008
- },
- {
- "questionId": "q96",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "10",
- "actual": "10",
- "isCorrect": true,
- "inputTokens": 6010,
- "outputTokens": 711,
- "latencyMs": 7998.43637499999
- },
- {
- "questionId": "q96",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "10",
- "actual": "7",
- "isCorrect": false,
- "inputTokens": 6988,
- "outputTokens": 5,
- "latencyMs": 1175.3050419999927
- },
- {
- "questionId": "q96",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "10",
- "actual": "10",
- "isCorrect": true,
- "inputTokens": 6778,
- "outputTokens": 647,
- "latencyMs": 6424.974583000003
- },
- {
- "questionId": "q96",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "10",
- "actual": "8",
- "isCorrect": false,
- "inputTokens": 8409,
- "outputTokens": 5,
- "latencyMs": 1352.1832500000019
- },
- {
- "questionId": "q96",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "10",
- "actual": "10",
- "isCorrect": true,
- "inputTokens": 9155,
- "outputTokens": 647,
- "latencyMs": 6132.921792000008
- },
- {
- "questionId": "q96",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "10",
- "actual": "8",
- "isCorrect": false,
- "inputTokens": 9284,
- "outputTokens": 5,
- "latencyMs": 1241.7496250000258
- },
- {
- "questionId": "q96",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "10",
- "actual": "10",
- "isCorrect": true,
- "inputTokens": 7370,
- "outputTokens": 455,
- "latencyMs": 8074.935457999993
- },
- {
- "questionId": "q96",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "10",
- "actual": "7",
- "isCorrect": false,
- "inputTokens": 8380,
- "outputTokens": 5,
- "latencyMs": 1294.4225830000069
- },
- {
- "questionId": "q97",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "10",
- "actual": "10",
- "isCorrect": true,
- "inputTokens": 9736,
- "outputTokens": 775,
- "latencyMs": 7724.665375000011
- },
- {
- "questionId": "q97",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "10",
- "actual": "10",
- "isCorrect": true,
- "inputTokens": 11902,
- "outputTokens": 5,
- "latencyMs": 1450.864333000005
- },
- {
- "questionId": "q97",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "10",
- "actual": "10",
- "isCorrect": true,
- "inputTokens": 6010,
- "outputTokens": 711,
- "latencyMs": 5055.026333999995
- },
- {
- "questionId": "q97",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "10",
- "actual": "10",
- "isCorrect": true,
- "inputTokens": 6988,
- "outputTokens": 5,
- "latencyMs": 1177.2059999999765
- },
- {
- "questionId": "q97",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "10",
- "actual": "10",
- "isCorrect": true,
- "inputTokens": 6778,
"outputTokens": 839,
- "latencyMs": 7951.241416999983
+ "latencyMs": 6029.78350000002
+ },
+ {
+ "questionId": "q96",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "10",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 11902,
+ "outputTokens": 5,
+ "latencyMs": 1108.4398330000113
+ },
+ {
+ "questionId": "q96",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 12108,
+ "outputTokens": 1,
+ "latencyMs": 1581.965291999979
+ },
+ {
+ "questionId": "q96",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 6010,
+ "outputTokens": 647,
+ "latencyMs": 21748.776332999987
+ },
+ {
+ "questionId": "q96",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "10",
+ "actual": "7",
+ "isCorrect": false,
+ "inputTokens": 6988,
+ "outputTokens": 5,
+ "latencyMs": 2333.9817080000066
+ },
+ {
+ "questionId": "q96",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 7196,
+ "outputTokens": 1,
+ "latencyMs": 1115.266958000022
+ },
+ {
+ "questionId": "q96",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 6778,
+ "outputTokens": 583,
+ "latencyMs": 5761.870166999986
+ },
+ {
+ "questionId": "q96",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "10",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 8409,
+ "outputTokens": 5,
+ "latencyMs": 1110.2957919999608
+ },
+ {
+ "questionId": "q96",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 7833,
+ "outputTokens": 2,
+ "latencyMs": 5206.065542000055
+ },
+ {
+ "questionId": "q96",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 11034,
+ "outputTokens": 839,
+ "latencyMs": 10213.124458000006
+ },
+ {
+ "questionId": "q96",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "10",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 13375,
+ "outputTokens": 5,
+ "latencyMs": 1085.2472919999855
+ },
+ {
+ "questionId": "q96",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 13446,
+ "outputTokens": 2,
+ "latencyMs": 6148.1957500000135
+ },
+ {
+ "questionId": "q96",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 7370,
+ "outputTokens": 647,
+ "latencyMs": 10606.282000000007
+ },
+ {
+ "questionId": "q96",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "10",
+ "actual": "7",
+ "isCorrect": false,
+ "inputTokens": 8380,
+ "outputTokens": 5,
+ "latencyMs": 1061.5612079999992
+ },
+ {
+ "questionId": "q96",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "8",
+ "isCorrect": false,
+ "inputTokens": 8422,
+ "outputTokens": 1,
+ "latencyMs": 940.8403330000001
+ },
+ {
+ "questionId": "q97",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 9736,
+ "outputTokens": 647,
+ "latencyMs": 6429.81362500001
+ },
+ {
+ "questionId": "q97",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 11902,
+ "outputTokens": 5,
+ "latencyMs": 1373.5127499999944
+ },
+ {
+ "questionId": "q97",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "9",
+ "isCorrect": false,
+ "inputTokens": 12107,
+ "outputTokens": 1,
+ "latencyMs": 1618.8752080000122
+ },
+ {
+ "questionId": "q97",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 6010,
+ "outputTokens": 583,
+ "latencyMs": 5288.105207999994
+ },
+ {
+ "questionId": "q97",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 6988,
+ "outputTokens": 5,
+ "latencyMs": 974.4008749999921
+ },
+ {
+ "questionId": "q97",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 7195,
+ "outputTokens": 2,
+ "latencyMs": 994.4026250000461
+ },
+ {
+ "questionId": "q97",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 6778,
+ "outputTokens": 1479,
+ "latencyMs": 44513.282000000065
},
{
"questionId": "q97",
@@ -10623,29 +15925,51 @@
"isCorrect": true,
"inputTokens": 8409,
"outputTokens": 5,
- "latencyMs": 1537.2077500000014
+ "latencyMs": 1579.2647080000024
},
{
"questionId": "q97",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 7832,
+ "outputTokens": 2,
+ "latencyMs": 6760.291374999972
+ },
+ {
+ "questionId": "q97",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "10",
"actual": "10",
"isCorrect": true,
- "inputTokens": 9155,
- "outputTokens": 519,
- "latencyMs": 9752.917709000001
+ "inputTokens": 11034,
+ "outputTokens": 647,
+ "latencyMs": 6886.205707999994
},
{
"questionId": "q97",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "10",
"actual": "10",
"isCorrect": true,
- "inputTokens": 9284,
+ "inputTokens": 13375,
"outputTokens": 5,
- "latencyMs": 1101.1202090000152
+ "latencyMs": 1140.8538749999716
+ },
+ {
+ "questionId": "q97",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 13445,
+ "outputTokens": 2,
+ "latencyMs": 5500.930916999932
},
{
"questionId": "q97",
@@ -10656,18 +15980,29 @@
"isCorrect": true,
"inputTokens": 7370,
"outputTokens": 647,
- "latencyMs": 5711.038375000004
+ "latencyMs": 6873.12387499999
},
{
"questionId": "q97",
"format": "yaml",
"model": "claude-haiku-4-5",
"expected": "10",
- "actual": "10",
- "isCorrect": true,
+ "actual": "9",
+ "isCorrect": false,
"inputTokens": 8380,
"outputTokens": 5,
- "latencyMs": 1208.3837910000002
+ "latencyMs": 1385.4246660000063
+ },
+ {
+ "questionId": "q97",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "9",
+ "isCorrect": false,
+ "inputTokens": 8421,
+ "outputTokens": 1,
+ "latencyMs": 1070.8007499999949
},
{
"questionId": "q98",
@@ -10678,7 +16013,7 @@
"isCorrect": true,
"inputTokens": 9736,
"outputTokens": 775,
- "latencyMs": 6578.005040999997
+ "latencyMs": 10215.419124999957
},
{
"questionId": "q98",
@@ -10689,7 +16024,18 @@
"isCorrect": false,
"inputTokens": 11902,
"outputTokens": 5,
- "latencyMs": 1351.4712499999732
+ "latencyMs": 1169.6882500000065
+ },
+ {
+ "questionId": "q98",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 12107,
+ "outputTokens": 2,
+ "latencyMs": 1497.445791999984
},
{
"questionId": "q98",
@@ -10700,7 +16046,7 @@
"isCorrect": true,
"inputTokens": 6010,
"outputTokens": 583,
- "latencyMs": 6437.821874999994
+ "latencyMs": 17780.296249999956
},
{
"questionId": "q98",
@@ -10711,7 +16057,18 @@
"isCorrect": false,
"inputTokens": 6988,
"outputTokens": 5,
- "latencyMs": 1155.7898750000168
+ "latencyMs": 1507.771624999994
+ },
+ {
+ "questionId": "q98",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 7195,
+ "outputTokens": 2,
+ "latencyMs": 1089.9117079999996
},
{
"questionId": "q98",
@@ -10721,8 +16078,8 @@
"actual": "10",
"isCorrect": true,
"inputTokens": 6778,
- "outputTokens": 647,
- "latencyMs": 6673.183250000002
+ "outputTokens": 583,
+ "latencyMs": 6443.644124999992
},
{
"questionId": "q98",
@@ -10733,29 +16090,51 @@
"isCorrect": true,
"inputTokens": 8409,
"outputTokens": 5,
- "latencyMs": 1359.994417000009
+ "latencyMs": 1212.1155410000356
},
{
"questionId": "q98",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 7832,
+ "outputTokens": 2,
+ "latencyMs": 5152.548582999967
+ },
+ {
+ "questionId": "q98",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "10",
"actual": "10",
"isCorrect": true,
- "inputTokens": 9155,
+ "inputTokens": 11034,
"outputTokens": 647,
- "latencyMs": 5806.33679099998
+ "latencyMs": 12689.804665999953
},
{
"questionId": "q98",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "10",
"actual": "10",
"isCorrect": true,
- "inputTokens": 9284,
+ "inputTokens": 13375,
"outputTokens": 5,
- "latencyMs": 1339.4869999999937
+ "latencyMs": 1122.1935420000227
+ },
+ {
+ "questionId": "q98",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 13445,
+ "outputTokens": 2,
+ "latencyMs": 1011.1309159999946
},
{
"questionId": "q98",
@@ -10765,8 +16144,8 @@
"actual": "10",
"isCorrect": true,
"inputTokens": 7370,
- "outputTokens": 519,
- "latencyMs": 6011.0411669999885
+ "outputTokens": 711,
+ "latencyMs": 9792.569583000033
},
{
"questionId": "q98",
@@ -10777,29 +16156,51 @@
"isCorrect": false,
"inputTokens": 8380,
"outputTokens": 5,
- "latencyMs": 1305.6029999999737
+ "latencyMs": 1111.848708000034
+ },
+ {
+ "questionId": "q98",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "10",
+ "actual": "10",
+ "isCorrect": true,
+ "inputTokens": 8421,
+ "outputTokens": 2,
+ "latencyMs": 868.7284579999978
},
{
"questionId": "q99",
"format": "json",
"model": "gpt-5-nano",
"expected": "42342.25",
- "actual": "41001.14",
+ "actual": "41304.82",
"isCorrect": false,
"inputTokens": 9736,
- "outputTokens": 1226,
- "latencyMs": 11276.714458000002
+ "outputTokens": 2698,
+ "latencyMs": 46504.10175000003
},
{
"questionId": "q99",
"format": "json",
"model": "claude-haiku-4-5",
"expected": "42342.25",
- "actual": "48,847.66",
+ "actual": "50,847.47",
"isCorrect": false,
"inputTokens": 11902,
"outputTokens": 9,
- "latencyMs": 1400.5162910000072
+ "latencyMs": 1987.3346250000177
+ },
+ {
+ "questionId": "q99",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "42342.25",
+ "actual": "40000.00",
+ "isCorrect": false,
+ "inputTokens": 12108,
+ "outputTokens": 8,
+ "latencyMs": 7707.775332999998
},
{
"questionId": "q99",
@@ -10809,8 +16210,8 @@
"actual": "42342.25",
"isCorrect": true,
"inputTokens": 6010,
- "outputTokens": 5962,
- "latencyMs": 50971.727667
+ "outputTokens": 5578,
+ "latencyMs": 48586.554000000004
},
{
"questionId": "q99",
@@ -10821,7 +16222,18 @@
"isCorrect": false,
"inputTokens": 6988,
"outputTokens": 9,
- "latencyMs": 1118.9986250000075
+ "latencyMs": 3438.9107920000097
+ },
+ {
+ "questionId": "q99",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "42342.25",
+ "actual": "40000.00",
+ "isCorrect": false,
+ "inputTokens": 7196,
+ "outputTokens": 8,
+ "latencyMs": 6512.329665999976
},
{
"questionId": "q99",
@@ -10831,8 +16243,8 @@
"actual": "42342.25",
"isCorrect": true,
"inputTokens": 6778,
- "outputTokens": 3082,
- "latencyMs": 22816.508165999985
+ "outputTokens": 4874,
+ "latencyMs": 37911.18645799998
},
{
"questionId": "q99",
@@ -10843,29 +16255,51 @@
"isCorrect": false,
"inputTokens": 8409,
"outputTokens": 9,
- "latencyMs": 1104.31912499998
+ "latencyMs": 1071.3846250000643
},
{
"questionId": "q99",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "42342.25",
+ "actual": "40000.00",
+ "isCorrect": false,
+ "inputTokens": 7833,
+ "outputTokens": 8,
+ "latencyMs": 7891.89620800002
+ },
+ {
+ "questionId": "q99",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "42342.25",
- "actual": "42425.97",
- "isCorrect": false,
- "inputTokens": 9155,
- "outputTokens": 2762,
- "latencyMs": 17412.623583000008
+ "actual": "42342.25",
+ "isCorrect": true,
+ "inputTokens": 11034,
+ "outputTokens": 3338,
+ "latencyMs": 23923.247208000044
},
{
"questionId": "q99",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "42342.25",
"actual": "47,847.47",
"isCorrect": false,
- "inputTokens": 9284,
+ "inputTokens": 13375,
"outputTokens": 9,
- "latencyMs": 1435.553082999977
+ "latencyMs": 1182.405207999982
+ },
+ {
+ "questionId": "q99",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "42342.25",
+ "actual": "43000.00",
+ "isCorrect": false,
+ "inputTokens": 13446,
+ "outputTokens": 8,
+ "latencyMs": 9388.739500000025
},
{
"questionId": "q99",
@@ -10875,19 +16309,30 @@
"actual": "42342.25",
"isCorrect": true,
"inputTokens": 7370,
- "outputTokens": 3402,
- "latencyMs": 26299.00112500001
+ "outputTokens": 3082,
+ "latencyMs": 31024.954041999998
},
{
"questionId": "q99",
"format": "yaml",
"model": "claude-haiku-4-5",
"expected": "42342.25",
- "actual": "41,847.47",
+ "actual": "47,847.89",
"isCorrect": false,
"inputTokens": 8380,
"outputTokens": 9,
- "latencyMs": 1272.4541250000184
+ "latencyMs": 1240.8969590000343
+ },
+ {
+ "questionId": "q99",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "42342.25",
+ "actual": "30900.09",
+ "isCorrect": false,
+ "inputTokens": 8422,
+ "outputTokens": 8,
+ "latencyMs": 2345.1206249999814
},
{
"questionId": "q100",
@@ -10897,8 +16342,8 @@
"actual": "44",
"isCorrect": true,
"inputTokens": 9738,
- "outputTokens": 1351,
- "latencyMs": 13461.932250000013
+ "outputTokens": 2567,
+ "latencyMs": 53935.78729200002
},
{
"questionId": "q100",
@@ -10909,7 +16354,18 @@
"isCorrect": false,
"inputTokens": 11904,
"outputTokens": 5,
- "latencyMs": 1772.9891250000219
+ "latencyMs": 1066.0944579999195
+ },
+ {
+ "questionId": "q100",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "44",
+ "actual": "45",
+ "isCorrect": false,
+ "inputTokens": 12112,
+ "outputTokens": 2,
+ "latencyMs": 1494.8697500000708
},
{
"questionId": "q100",
@@ -10919,8 +16375,8 @@
"actual": "44",
"isCorrect": true,
"inputTokens": 6012,
- "outputTokens": 1735,
- "latencyMs": 14196.807250000013
+ "outputTokens": 1351,
+ "latencyMs": 14949.407374999952
},
{
"questionId": "q100",
@@ -10931,7 +16387,18 @@
"isCorrect": false,
"inputTokens": 6990,
"outputTokens": 5,
- "latencyMs": 1749.7322920000006
+ "latencyMs": 967.8411250000354
+ },
+ {
+ "questionId": "q100",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "44",
+ "actual": "44",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 2,
+ "latencyMs": 12734.97745799995
},
{
"questionId": "q100",
@@ -10941,41 +16408,63 @@
"actual": "44",
"isCorrect": true,
"inputTokens": 6780,
- "outputTokens": 1863,
- "latencyMs": 14291.044916999992
+ "outputTokens": 1607,
+ "latencyMs": 15572.392542000045
},
{
"questionId": "q100",
"format": "csv",
"model": "claude-haiku-4-5",
"expected": "44",
- "actual": "47",
+ "actual": "48",
"isCorrect": false,
"inputTokens": 8411,
"outputTokens": 5,
- "latencyMs": 1453.1822079999838
+ "latencyMs": 2052.4572499999776
},
{
"questionId": "q100",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "44",
+ "actual": "44",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 2,
+ "latencyMs": 13219.975833000033
+ },
+ {
+ "questionId": "q100",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "44",
"actual": "44",
"isCorrect": true,
- "inputTokens": 9157,
- "outputTokens": 1799,
- "latencyMs": 16012.806332999986
+ "inputTokens": 11036,
+ "outputTokens": 1735,
+ "latencyMs": 69773.56662499998
},
{
"questionId": "q100",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "44",
- "actual": "48",
+ "actual": "45",
"isCorrect": false,
- "inputTokens": 9286,
+ "inputTokens": 13377,
"outputTokens": 5,
- "latencyMs": 1761.131041000015
+ "latencyMs": 1719.8178329999791
+ },
+ {
+ "questionId": "q100",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "44",
+ "actual": "44",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 2,
+ "latencyMs": 11322.527541999938
},
{
"questionId": "q100",
@@ -10985,8 +16474,8 @@
"actual": "44",
"isCorrect": true,
"inputTokens": 7372,
- "outputTokens": 1415,
- "latencyMs": 12218.14491599999
+ "outputTokens": 1607,
+ "latencyMs": 20736.131416000077
},
{
"questionId": "q100",
@@ -10997,7 +16486,18 @@
"isCorrect": false,
"inputTokens": 8382,
"outputTokens": 5,
- "latencyMs": 1255.681917000009
+ "latencyMs": 1052.186207999941
+ },
+ {
+ "questionId": "q100",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "44",
+ "actual": "47",
+ "isCorrect": false,
+ "inputTokens": 8426,
+ "outputTokens": 2,
+ "latencyMs": 1184.4893750000047
},
{
"questionId": "q101",
@@ -11007,8 +16507,8 @@
"actual": "39",
"isCorrect": true,
"inputTokens": 9738,
- "outputTokens": 2311,
- "latencyMs": 22316.87704199998
+ "outputTokens": 967,
+ "latencyMs": 12279.209374999977
},
{
"questionId": "q101",
@@ -11019,7 +16519,18 @@
"isCorrect": false,
"inputTokens": 11904,
"outputTokens": 5,
- "latencyMs": 1090.176792000013
+ "latencyMs": 1297.988250000053
+ },
+ {
+ "questionId": "q101",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "39",
+ "actual": "45",
+ "isCorrect": false,
+ "inputTokens": 12112,
+ "outputTokens": 2,
+ "latencyMs": 1760.7460000000428
},
{
"questionId": "q101",
@@ -11029,8 +16540,8 @@
"actual": "39",
"isCorrect": true,
"inputTokens": 6012,
- "outputTokens": 1095,
- "latencyMs": 7211.767082999984
+ "outputTokens": 1351,
+ "latencyMs": 10500.295707999961
},
{
"questionId": "q101",
@@ -11041,7 +16552,18 @@
"isCorrect": false,
"inputTokens": 6990,
"outputTokens": 5,
- "latencyMs": 1129.9290000000037
+ "latencyMs": 1138.843208999955
+ },
+ {
+ "questionId": "q101",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "39",
+ "actual": "39",
+ "isCorrect": true,
+ "inputTokens": 7200,
+ "outputTokens": 2,
+ "latencyMs": 9441.675416999962
},
{
"questionId": "q101",
@@ -11051,8 +16573,8 @@
"actual": "39",
"isCorrect": true,
"inputTokens": 6780,
- "outputTokens": 1415,
- "latencyMs": 15701.471499999985
+ "outputTokens": 1863,
+ "latencyMs": 19287.06454199995
},
{
"questionId": "q101",
@@ -11063,29 +16585,51 @@
"isCorrect": false,
"inputTokens": 8411,
"outputTokens": 5,
- "latencyMs": 1251.5472500000033
+ "latencyMs": 1490.810999999987
},
{
"questionId": "q101",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "39",
+ "actual": "39",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 2,
+ "latencyMs": 12331.178375000018
+ },
+ {
+ "questionId": "q101",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "39",
"actual": "39",
"isCorrect": true,
- "inputTokens": 9157,
- "outputTokens": 1799,
- "latencyMs": 16689.30345800001
+ "inputTokens": 11036,
+ "outputTokens": 3335,
+ "latencyMs": 26443.42041599995
},
{
"questionId": "q101",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "39",
- "actual": "41",
+ "actual": "38",
"isCorrect": false,
- "inputTokens": 9286,
+ "inputTokens": 13377,
"outputTokens": 5,
- "latencyMs": 1168.8190419999883
+ "latencyMs": 1419.3634590000147
+ },
+ {
+ "questionId": "q101",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "39",
+ "actual": "39",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 2,
+ "latencyMs": 11403.771042000037
},
{
"questionId": "q101",
@@ -11095,8 +16639,8 @@
"actual": "39",
"isCorrect": true,
"inputTokens": 7372,
- "outputTokens": 1863,
- "latencyMs": 14505.393958999979
+ "outputTokens": 1671,
+ "latencyMs": 14214.94204200001
},
{
"questionId": "q101",
@@ -11107,7 +16651,18 @@
"isCorrect": false,
"inputTokens": 8382,
"outputTokens": 5,
- "latencyMs": 1149.8783330000006
+ "latencyMs": 1183.1556669999845
+ },
+ {
+ "questionId": "q101",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "39",
+ "actual": "39",
+ "isCorrect": true,
+ "inputTokens": 8426,
+ "outputTokens": 2,
+ "latencyMs": 12192.347249999992
},
{
"questionId": "q102",
@@ -11117,8 +16672,8 @@
"actual": "32",
"isCorrect": true,
"inputTokens": 9738,
- "outputTokens": 1607,
- "latencyMs": 13945.93979200002
+ "outputTokens": 2311,
+ "latencyMs": 286602.893667
},
{
"questionId": "q102",
@@ -11129,7 +16684,18 @@
"isCorrect": false,
"inputTokens": 11904,
"outputTokens": 5,
- "latencyMs": 1175.8143749999872
+ "latencyMs": 1132.721833000076
+ },
+ {
+ "questionId": "q102",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "32",
+ "actual": "37",
+ "isCorrect": false,
+ "inputTokens": 12112,
+ "outputTokens": 2,
+ "latencyMs": 1632.5237090000883
},
{
"questionId": "q102",
@@ -11139,8 +16705,8 @@
"actual": "32",
"isCorrect": true,
"inputTokens": 6012,
- "outputTokens": 1351,
- "latencyMs": 11991.764750000002
+ "outputTokens": 839,
+ "latencyMs": 12142.227125000092
},
{
"questionId": "q102",
@@ -11151,7 +16717,18 @@
"isCorrect": false,
"inputTokens": 6990,
"outputTokens": 5,
- "latencyMs": 1643.4279169999936
+ "latencyMs": 1184.7071669999277
+ },
+ {
+ "questionId": "q102",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "32",
+ "actual": "37",
+ "isCorrect": false,
+ "inputTokens": 7200,
+ "outputTokens": 2,
+ "latencyMs": 1000.1081669999985
},
{
"questionId": "q102",
@@ -11161,8 +16738,8 @@
"actual": "32",
"isCorrect": true,
"inputTokens": 6780,
- "outputTokens": 1799,
- "latencyMs": 17324.695000000007
+ "outputTokens": 1287,
+ "latencyMs": 45846.97675000003
},
{
"questionId": "q102",
@@ -11173,29 +16750,51 @@
"isCorrect": false,
"inputTokens": 8411,
"outputTokens": 5,
- "latencyMs": 1197.7254160000011
+ "latencyMs": 1744.5200829999521
},
{
"questionId": "q102",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "32",
+ "actual": "32",
+ "isCorrect": true,
+ "inputTokens": 7837,
+ "outputTokens": 2,
+ "latencyMs": 12398.869249999989
+ },
+ {
+ "questionId": "q102",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "32",
"actual": "32",
"isCorrect": true,
- "inputTokens": 9157,
- "outputTokens": 1607,
- "latencyMs": 22426.01029199999
+ "inputTokens": 11036,
+ "outputTokens": 1351,
+ "latencyMs": 12448.268124999944
},
{
"questionId": "q102",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "32",
"actual": "28",
"isCorrect": false,
- "inputTokens": 9286,
+ "inputTokens": 13377,
"outputTokens": 5,
- "latencyMs": 1065.6509170000209
+ "latencyMs": 1155.887459000107
+ },
+ {
+ "questionId": "q102",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "32",
+ "actual": "32",
+ "isCorrect": true,
+ "inputTokens": 13450,
+ "outputTokens": 2,
+ "latencyMs": 12662.306666000048
},
{
"questionId": "q102",
@@ -11205,8 +16804,8 @@
"actual": "31",
"isCorrect": false,
"inputTokens": 7372,
- "outputTokens": 1543,
- "latencyMs": 12786.843416999996
+ "outputTokens": 1799,
+ "latencyMs": 15611.27658299997
},
{
"questionId": "q102",
@@ -11217,7 +16816,18 @@
"isCorrect": false,
"inputTokens": 8382,
"outputTokens": 5,
- "latencyMs": 2054.993749999994
+ "latencyMs": 1592.5243330000667
+ },
+ {
+ "questionId": "q102",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "32",
+ "actual": "37",
+ "isCorrect": false,
+ "inputTokens": 8426,
+ "outputTokens": 2,
+ "latencyMs": 1257.715124999988
},
{
"questionId": "q103",
@@ -11228,7 +16838,7 @@
"isCorrect": true,
"inputTokens": 3712,
"outputTokens": 72,
- "latencyMs": 2244.986208999995
+ "latencyMs": 1883.4624169999734
},
{
"questionId": "q103",
@@ -11239,7 +16849,18 @@
"isCorrect": true,
"inputTokens": 4080,
"outputTokens": 6,
- "latencyMs": 1162.9390420000127
+ "latencyMs": 1072.3808749999152
+ },
+ {
+ "questionId": "q103",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "6975",
+ "actual": "6975",
+ "isCorrect": true,
+ "inputTokens": 4784,
+ "outputTokens": 4,
+ "latencyMs": 2622.4323750000913
},
{
"questionId": "q103",
@@ -11250,7 +16871,7 @@
"isCorrect": true,
"inputTokens": 1563,
"outputTokens": 136,
- "latencyMs": 2179.3558330000087
+ "latencyMs": 15307.557292000041
},
{
"questionId": "q103",
@@ -11261,7 +16882,18 @@
"isCorrect": true,
"inputTokens": 1509,
"outputTokens": 6,
- "latencyMs": 1013.4975409999897
+ "latencyMs": 1084.2609999999404
+ },
+ {
+ "questionId": "q103",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "6975",
+ "actual": "6975",
+ "isCorrect": true,
+ "inputTokens": 2271,
+ "outputTokens": 4,
+ "latencyMs": 2758.0986669999547
},
{
"questionId": "q103",
@@ -11272,7 +16904,7 @@
"isCorrect": true,
"inputTokens": 1441,
"outputTokens": 72,
- "latencyMs": 4859.720833999978
+ "latencyMs": 1854.1639169999398
},
{
"questionId": "q103",
@@ -11283,29 +16915,51 @@
"isCorrect": true,
"inputTokens": 1445,
"outputTokens": 6,
- "latencyMs": 1437.758375000005
+ "latencyMs": 948.2132079999428
},
{
"questionId": "q103",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "6975",
+ "actual": "6975",
+ "isCorrect": true,
+ "inputTokens": 2208,
+ "outputTokens": 4,
+ "latencyMs": 2243.337582999957
+ },
+ {
+ "questionId": "q103",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "6975",
"actual": "6975",
"isCorrect": true,
- "inputTokens": 3829,
- "outputTokens": 72,
- "latencyMs": 3120.702874999988
+ "inputTokens": 4423,
+ "outputTokens": 200,
+ "latencyMs": 4750.478917
},
{
"questionId": "q103",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "6975",
"actual": "6975",
"isCorrect": true,
- "inputTokens": 3415,
+ "inputTokens": 4787,
"outputTokens": 6,
- "latencyMs": 1051.775708000001
+ "latencyMs": 1168.2797080000164
+ },
+ {
+ "questionId": "q103",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "6975",
+ "actual": "6975",
+ "isCorrect": true,
+ "inputTokens": 5431,
+ "outputTokens": 4,
+ "latencyMs": 1235.7723750000587
},
{
"questionId": "q103",
@@ -11316,7 +16970,7 @@
"isCorrect": true,
"inputTokens": 2985,
"outputTokens": 72,
- "latencyMs": 2182.880084000004
+ "latencyMs": 4593.343416000018
},
{
"questionId": "q103",
@@ -11327,7 +16981,18 @@
"isCorrect": true,
"inputTokens": 3110,
"outputTokens": 6,
- "latencyMs": 1045.2009580000013
+ "latencyMs": 1005.8936250000261
+ },
+ {
+ "questionId": "q103",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "6975",
+ "actual": "6975",
+ "isCorrect": true,
+ "inputTokens": 3814,
+ "outputTokens": 4,
+ "latencyMs": 1302.4004580000183
},
{
"questionId": "q104",
@@ -11338,7 +17003,7 @@
"isCorrect": true,
"inputTokens": 3711,
"outputTokens": 138,
- "latencyMs": 5291.923750000016
+ "latencyMs": 10838.235042000073
},
{
"questionId": "q104",
@@ -11349,7 +17014,18 @@
"isCorrect": true,
"inputTokens": 4079,
"outputTokens": 8,
- "latencyMs": 1009.6958750000049
+ "latencyMs": 1148.390958999982
+ },
+ {
+ "questionId": "q104",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "6686.23",
+ "actual": "6686.23",
+ "isCorrect": true,
+ "inputTokens": 4783,
+ "outputTokens": 7,
+ "latencyMs": 2339.6254999999655
},
{
"questionId": "q104",
@@ -11359,8 +17035,8 @@
"actual": "6686.23",
"isCorrect": true,
"inputTokens": 1562,
- "outputTokens": 74,
- "latencyMs": 2582.2320419999887
+ "outputTokens": 138,
+ "latencyMs": 7077.6732909999555
},
{
"questionId": "q104",
@@ -11371,7 +17047,18 @@
"isCorrect": true,
"inputTokens": 1508,
"outputTokens": 8,
- "latencyMs": 1203.816542000015
+ "latencyMs": 1064.9028750000289
+ },
+ {
+ "questionId": "q104",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "6686.23",
+ "actual": "6686.23",
+ "isCorrect": true,
+ "inputTokens": 2270,
+ "outputTokens": 7,
+ "latencyMs": 2335.216167000006
},
{
"questionId": "q104",
@@ -11381,8 +17068,8 @@
"actual": "6686.23",
"isCorrect": true,
"inputTokens": 1440,
- "outputTokens": 138,
- "latencyMs": 2774.835167000012
+ "outputTokens": 74,
+ "latencyMs": 5253.633124999935
},
{
"questionId": "q104",
@@ -11393,29 +17080,51 @@
"isCorrect": true,
"inputTokens": 1444,
"outputTokens": 8,
- "latencyMs": 979.9191669999855
+ "latencyMs": 1438.5572920000413
},
{
"questionId": "q104",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "6686.23",
+ "actual": "6686.23",
+ "isCorrect": true,
+ "inputTokens": 2207,
+ "outputTokens": 7,
+ "latencyMs": 1807.325458999956
+ },
+ {
+ "questionId": "q104",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "6686.23",
"actual": "6686.23",
"isCorrect": true,
- "inputTokens": 3828,
+ "inputTokens": 4422,
"outputTokens": 138,
- "latencyMs": 2616.684333000012
+ "latencyMs": 3436.290666999994
},
{
"questionId": "q104",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "6686.23",
"actual": "6686.23",
"isCorrect": true,
- "inputTokens": 3414,
+ "inputTokens": 4786,
"outputTokens": 8,
- "latencyMs": 1253.4844169999997
+ "latencyMs": 1125.5812910000095
+ },
+ {
+ "questionId": "q104",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "6686.23",
+ "actual": "6686.23",
+ "isCorrect": true,
+ "inputTokens": 5430,
+ "outputTokens": 7,
+ "latencyMs": 984.154334000079
},
{
"questionId": "q104",
@@ -11425,8 +17134,8 @@
"actual": "6686.23",
"isCorrect": true,
"inputTokens": 2984,
- "outputTokens": 74,
- "latencyMs": 2267.1155000000144
+ "outputTokens": 138,
+ "latencyMs": 4561.665000000037
},
{
"questionId": "q104",
@@ -11437,282 +17146,425 @@
"isCorrect": true,
"inputTokens": 3109,
"outputTokens": 8,
- "latencyMs": 1185.4212080000143
+ "latencyMs": 1273.080958000035
},
{
- "questionId": "q105",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "7500",
- "actual": "7500",
- "isCorrect": true,
- "inputTokens": 3712,
- "outputTokens": 136,
- "latencyMs": 2905.6011250000156
- },
- {
- "questionId": "q105",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "7500",
- "actual": "7500",
- "isCorrect": true,
- "inputTokens": 4080,
- "outputTokens": 6,
- "latencyMs": 1571.1469999999972
- },
- {
- "questionId": "q105",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "7500",
- "actual": "7500",
- "isCorrect": true,
- "inputTokens": 1563,
- "outputTokens": 328,
- "latencyMs": 3884.65858399999
- },
- {
- "questionId": "q105",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "7500",
- "actual": "7500",
- "isCorrect": true,
- "inputTokens": 1509,
- "outputTokens": 6,
- "latencyMs": 1207.1518330000108
- },
- {
- "questionId": "q105",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "7500",
- "actual": "7500",
- "isCorrect": true,
- "inputTokens": 1441,
- "outputTokens": 72,
- "latencyMs": 1995.0557919999992
- },
- {
- "questionId": "q105",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "7500",
- "actual": "7500",
- "isCorrect": true,
- "inputTokens": 1445,
- "outputTokens": 6,
- "latencyMs": 1238.8113749999902
- },
- {
- "questionId": "q105",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "7500",
- "actual": "7500",
- "isCorrect": true,
- "inputTokens": 3829,
- "outputTokens": 136,
- "latencyMs": 5824.06574999998
- },
- {
- "questionId": "q105",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "7500",
- "actual": "7500",
- "isCorrect": true,
- "inputTokens": 3415,
- "outputTokens": 6,
- "latencyMs": 1337.474749999994
- },
- {
- "questionId": "q105",
+ "questionId": "q104",
"format": "yaml",
- "model": "gpt-5-nano",
- "expected": "7500",
- "actual": "7500",
+ "model": "gemini-2.5-flash",
+ "expected": "6686.23",
+ "actual": "6686.23",
"isCorrect": true,
- "inputTokens": 2985,
- "outputTokens": 136,
- "latencyMs": 2286.1839580000087
+ "inputTokens": 3813,
+ "outputTokens": 7,
+ "latencyMs": 1065.2617909999099
},
{
"questionId": "q105",
- "format": "yaml",
- "model": "claude-haiku-4-5",
+ "format": "json",
+ "model": "gpt-5-nano",
"expected": "7500",
"actual": "7500",
"isCorrect": true,
- "inputTokens": 3110,
- "outputTokens": 6,
- "latencyMs": 1326.3640000000014
- },
- {
- "questionId": "q106",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "14297.05",
- "actual": "14297.05",
- "isCorrect": true,
- "inputTokens": 3711,
- "outputTokens": 138,
- "latencyMs": 3801.309249999991
- },
- {
- "questionId": "q106",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "14297.05",
- "actual": "14297.05",
- "isCorrect": true,
- "inputTokens": 4079,
- "outputTokens": 8,
- "latencyMs": 1054.8991249999963
- },
- {
- "questionId": "q106",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "14297.05",
- "actual": "14297.05",
- "isCorrect": true,
- "inputTokens": 1562,
- "outputTokens": 74,
- "latencyMs": 3338.1347499999974
- },
- {
- "questionId": "q106",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "14297.05",
- "actual": "14297.05",
- "isCorrect": true,
- "inputTokens": 1508,
- "outputTokens": 8,
- "latencyMs": 1393.589082999999
- },
- {
- "questionId": "q106",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "14297.05",
- "actual": "14297.05",
- "isCorrect": true,
- "inputTokens": 1440,
- "outputTokens": 202,
- "latencyMs": 3719.6092089999875
- },
- {
- "questionId": "q106",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "14297.05",
- "actual": "14297.05",
- "isCorrect": true,
- "inputTokens": 1444,
- "outputTokens": 8,
- "latencyMs": 1030.9656669999822
- },
- {
- "questionId": "q106",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "14297.05",
- "actual": "14297.05",
- "isCorrect": true,
- "inputTokens": 3828,
- "outputTokens": 74,
- "latencyMs": 2226.628250000009
- },
- {
- "questionId": "q106",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "14297.05",
- "actual": "14297.05",
- "isCorrect": true,
- "inputTokens": 3414,
- "outputTokens": 8,
- "latencyMs": 1154.132540999999
- },
- {
- "questionId": "q106",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "14297.05",
- "actual": "14297.05",
- "isCorrect": true,
- "inputTokens": 2984,
- "outputTokens": 138,
- "latencyMs": 2922.2590830000117
- },
- {
- "questionId": "q106",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "14297.05",
- "actual": "14297.05",
- "isCorrect": true,
- "inputTokens": 3109,
- "outputTokens": 8,
- "latencyMs": 2048.011916999996
- },
- {
- "questionId": "q107",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "6692",
- "actual": "6692",
- "isCorrect": true,
"inputTokens": 3712,
"outputTokens": 200,
- "latencyMs": 2520.5313329999917
+ "latencyMs": 3926.1200409999583
},
{
- "questionId": "q107",
+ "questionId": "q105",
"format": "json",
"model": "claude-haiku-4-5",
- "expected": "6692",
- "actual": "6692",
+ "expected": "7500",
+ "actual": "7500",
"isCorrect": true,
"inputTokens": 4080,
"outputTokens": 6,
- "latencyMs": 943.3422089999949
+ "latencyMs": 1170.2935419999994
},
{
- "questionId": "q107",
+ "questionId": "q105",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "7500",
+ "actual": "7500",
+ "isCorrect": true,
+ "inputTokens": 4784,
+ "outputTokens": 4,
+ "latencyMs": 2907.920374999987
+ },
+ {
+ "questionId": "q105",
"format": "toon",
"model": "gpt-5-nano",
- "expected": "6692",
- "actual": "6692",
+ "expected": "7500",
+ "actual": "7500",
"isCorrect": true,
"inputTokens": 1563,
"outputTokens": 136,
- "latencyMs": 2300.8406249999825
+ "latencyMs": 6013.766874999972
},
{
- "questionId": "q107",
+ "questionId": "q105",
"format": "toon",
"model": "claude-haiku-4-5",
- "expected": "6692",
- "actual": "6692",
+ "expected": "7500",
+ "actual": "7500",
"isCorrect": true,
"inputTokens": 1509,
"outputTokens": 6,
- "latencyMs": 1128.4146670000046
+ "latencyMs": 1029.452791999909
},
{
- "questionId": "q107",
+ "questionId": "q105",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "7500",
+ "actual": "7500",
+ "isCorrect": true,
+ "inputTokens": 2271,
+ "outputTokens": 4,
+ "latencyMs": 1767.9035409999778
+ },
+ {
+ "questionId": "q105",
"format": "csv",
"model": "gpt-5-nano",
- "expected": "6692",
- "actual": "6692",
+ "expected": "7500",
+ "actual": "7500",
"isCorrect": true,
"inputTokens": 1441,
"outputTokens": 200,
- "latencyMs": 2929.585208000004
+ "latencyMs": 2931.0335839999607
+ },
+ {
+ "questionId": "q105",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "7500",
+ "actual": "7500",
+ "isCorrect": true,
+ "inputTokens": 1445,
+ "outputTokens": 6,
+ "latencyMs": 857.5665409999201
+ },
+ {
+ "questionId": "q105",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "7500",
+ "actual": "7500",
+ "isCorrect": true,
+ "inputTokens": 2208,
+ "outputTokens": 4,
+ "latencyMs": 1870.161458000075
+ },
+ {
+ "questionId": "q105",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "7500",
+ "actual": "7500",
+ "isCorrect": true,
+ "inputTokens": 4423,
+ "outputTokens": 136,
+ "latencyMs": 2792.1963339999784
+ },
+ {
+ "questionId": "q105",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "7500",
+ "actual": "7500",
+ "isCorrect": true,
+ "inputTokens": 4787,
+ "outputTokens": 6,
+ "latencyMs": 1112.5085419999668
+ },
+ {
+ "questionId": "q105",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "7500",
+ "actual": "7500",
+ "isCorrect": true,
+ "inputTokens": 5431,
+ "outputTokens": 4,
+ "latencyMs": 2572.699583999929
+ },
+ {
+ "questionId": "q105",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "7500",
+ "actual": "7500",
+ "isCorrect": true,
+ "inputTokens": 2985,
+ "outputTokens": 136,
+ "latencyMs": 3129.4847079999745
+ },
+ {
+ "questionId": "q105",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "7500",
+ "actual": "7500",
+ "isCorrect": true,
+ "inputTokens": 3110,
+ "outputTokens": 6,
+ "latencyMs": 2352.252790999948
+ },
+ {
+ "questionId": "q105",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "7500",
+ "actual": "7500",
+ "isCorrect": true,
+ "inputTokens": 3814,
+ "outputTokens": 4,
+ "latencyMs": 1623.8393749999814
+ },
+ {
+ "questionId": "q106",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 3711,
+ "outputTokens": 74,
+ "latencyMs": 5410.545292000053
+ },
+ {
+ "questionId": "q106",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 4079,
+ "outputTokens": 8,
+ "latencyMs": 1382.8987500000512
+ },
+ {
+ "questionId": "q106",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 4783,
+ "outputTokens": 8,
+ "latencyMs": 2918.163458999945
+ },
+ {
+ "questionId": "q106",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 1562,
+ "outputTokens": 138,
+ "latencyMs": 2478.2083329999587
+ },
+ {
+ "questionId": "q106",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 1508,
+ "outputTokens": 8,
+ "latencyMs": 1265.4150420000078
+ },
+ {
+ "questionId": "q106",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 2270,
+ "outputTokens": 8,
+ "latencyMs": 1943.8234170000069
+ },
+ {
+ "questionId": "q106",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 1440,
+ "outputTokens": 138,
+ "latencyMs": 4516.7844160000095
+ },
+ {
+ "questionId": "q106",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 1444,
+ "outputTokens": 8,
+ "latencyMs": 1502.5052920000162
+ },
+ {
+ "questionId": "q106",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 2207,
+ "outputTokens": 8,
+ "latencyMs": 2691.783666000003
+ },
+ {
+ "questionId": "q106",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 4422,
+ "outputTokens": 138,
+ "latencyMs": 4047.482250000001
+ },
+ {
+ "questionId": "q106",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 4786,
+ "outputTokens": 8,
+ "latencyMs": 1547.010666999966
+ },
+ {
+ "questionId": "q106",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 5430,
+ "outputTokens": 8,
+ "latencyMs": 1679.222165999934
+ },
+ {
+ "questionId": "q106",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 2984,
+ "outputTokens": 202,
+ "latencyMs": 4740.509624999948
+ },
+ {
+ "questionId": "q106",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 3109,
+ "outputTokens": 8,
+ "latencyMs": 1271.0033330000006
+ },
+ {
+ "questionId": "q106",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "14297.05",
+ "actual": "14297.05",
+ "isCorrect": true,
+ "inputTokens": 3813,
+ "outputTokens": 8,
+ "latencyMs": 2636.093916999991
+ },
+ {
+ "questionId": "q107",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "6692",
+ "actual": "6692",
+ "isCorrect": true,
+ "inputTokens": 3712,
+ "outputTokens": 72,
+ "latencyMs": 8298.315874999971
+ },
+ {
+ "questionId": "q107",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "6692",
+ "actual": "6692",
+ "isCorrect": true,
+ "inputTokens": 4080,
+ "outputTokens": 6,
+ "latencyMs": 1520.9959589999635
+ },
+ {
+ "questionId": "q107",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "6692",
+ "actual": "6692",
+ "isCorrect": true,
+ "inputTokens": 4784,
+ "outputTokens": 4,
+ "latencyMs": 2487.122250000015
+ },
+ {
+ "questionId": "q107",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "6692",
+ "actual": "6692",
+ "isCorrect": true,
+ "inputTokens": 1563,
+ "outputTokens": 136,
+ "latencyMs": 2142.1067079999484
+ },
+ {
+ "questionId": "q107",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "6692",
+ "actual": "6692",
+ "isCorrect": true,
+ "inputTokens": 1509,
+ "outputTokens": 6,
+ "latencyMs": 1108.5955839999951
+ },
+ {
+ "questionId": "q107",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "6692",
+ "actual": "6692",
+ "isCorrect": true,
+ "inputTokens": 2271,
+ "outputTokens": 4,
+ "latencyMs": 2469.1304579999996
+ },
+ {
+ "questionId": "q107",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "6692",
+ "actual": "6692",
+ "isCorrect": true,
+ "inputTokens": 1441,
+ "outputTokens": 136,
+ "latencyMs": 2567.9449590001022
},
{
"questionId": "q107",
@@ -11723,29 +17575,51 @@
"isCorrect": true,
"inputTokens": 1445,
"outputTokens": 6,
- "latencyMs": 1230.4635420000122
+ "latencyMs": 1078.092707999982
},
{
"questionId": "q107",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "6692",
+ "actual": "6692",
+ "isCorrect": true,
+ "inputTokens": 2208,
+ "outputTokens": 4,
+ "latencyMs": 1809.784708000021
+ },
+ {
+ "questionId": "q107",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "6692",
"actual": "6692",
"isCorrect": true,
- "inputTokens": 3829,
- "outputTokens": 136,
- "latencyMs": 3650.3654169999936
+ "inputTokens": 4423,
+ "outputTokens": 200,
+ "latencyMs": 2525.847415999975
},
{
"questionId": "q107",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "6692",
"actual": "6692",
"isCorrect": true,
- "inputTokens": 3415,
+ "inputTokens": 4787,
"outputTokens": 6,
- "latencyMs": 985.8184590000019
+ "latencyMs": 1085.6306249999907
+ },
+ {
+ "questionId": "q107",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "6692",
+ "actual": "6692",
+ "isCorrect": true,
+ "inputTokens": 5431,
+ "outputTokens": 4,
+ "latencyMs": 2901.1133329999866
},
{
"questionId": "q107",
@@ -11755,8 +17629,8 @@
"actual": "6692",
"isCorrect": true,
"inputTokens": 2985,
- "outputTokens": 328,
- "latencyMs": 3772.2553330000082
+ "outputTokens": 200,
+ "latencyMs": 3336.295124999946
},
{
"questionId": "q107",
@@ -11767,7 +17641,18 @@
"isCorrect": true,
"inputTokens": 3110,
"outputTokens": 6,
- "latencyMs": 1311.8630419999827
+ "latencyMs": 1092.8172920000507
+ },
+ {
+ "questionId": "q107",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "6692",
+ "actual": "6692",
+ "isCorrect": true,
+ "inputTokens": 3814,
+ "outputTokens": 4,
+ "latencyMs": 1070.4765419999603
},
{
"questionId": "q108",
@@ -11777,8 +17662,8 @@
"actual": "9302.76",
"isCorrect": true,
"inputTokens": 3711,
- "outputTokens": 138,
- "latencyMs": 2935.785124999995
+ "outputTokens": 74,
+ "latencyMs": 4454.346332999994
},
{
"questionId": "q108",
@@ -11789,7 +17674,18 @@
"isCorrect": true,
"inputTokens": 4079,
"outputTokens": 8,
- "latencyMs": 1391.9168749999953
+ "latencyMs": 1455.8378749999683
+ },
+ {
+ "questionId": "q108",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "9302.76",
+ "actual": "9302.76",
+ "isCorrect": true,
+ "inputTokens": 4783,
+ "outputTokens": 7,
+ "latencyMs": 1775.3881249999395
},
{
"questionId": "q108",
@@ -11799,8 +17695,8 @@
"actual": "9302.76",
"isCorrect": true,
"inputTokens": 1562,
- "outputTokens": 138,
- "latencyMs": 5759.15529200001
+ "outputTokens": 74,
+ "latencyMs": 3750.9490000000224
},
{
"questionId": "q108",
@@ -11811,7 +17707,18 @@
"isCorrect": true,
"inputTokens": 1508,
"outputTokens": 8,
- "latencyMs": 1064.3980420000153
+ "latencyMs": 1294.0682909999741
+ },
+ {
+ "questionId": "q108",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "9302.76",
+ "actual": "9302.76",
+ "isCorrect": true,
+ "inputTokens": 2270,
+ "outputTokens": 7,
+ "latencyMs": 2086.9909169999883
},
{
"questionId": "q108",
@@ -11821,481 +17728,723 @@
"actual": "9302.76",
"isCorrect": true,
"inputTokens": 1440,
+ "outputTokens": 138,
+ "latencyMs": 2283.21883300005
+ },
+ {
+ "questionId": "q108",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "9302.76",
+ "actual": "9302.76",
+ "isCorrect": true,
+ "inputTokens": 1444,
+ "outputTokens": 8,
+ "latencyMs": 983.0039999999572
+ },
+ {
+ "questionId": "q108",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "9302.76",
+ "actual": "9302.76",
+ "isCorrect": true,
+ "inputTokens": 2207,
+ "outputTokens": 7,
+ "latencyMs": 2159.7753329999978
+ },
+ {
+ "questionId": "q108",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "9302.76",
+ "actual": "9302.76",
+ "isCorrect": true,
+ "inputTokens": 4422,
+ "outputTokens": 202,
+ "latencyMs": 6951.322584000067
+ },
+ {
+ "questionId": "q108",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "9302.76",
+ "actual": "9302.76",
+ "isCorrect": true,
+ "inputTokens": 4786,
+ "outputTokens": 8,
+ "latencyMs": 1090.7049170000246
+ },
+ {
+ "questionId": "q108",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "9302.76",
+ "actual": "9302.76",
+ "isCorrect": true,
+ "inputTokens": 5430,
+ "outputTokens": 7,
+ "latencyMs": 1449.565457999939
+ },
+ {
+ "questionId": "q108",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "9302.76",
+ "actual": "9302.76",
+ "isCorrect": true,
+ "inputTokens": 2984,
+ "outputTokens": 138,
+ "latencyMs": 3853.0687920000637
+ },
+ {
+ "questionId": "q108",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "9302.76",
+ "actual": "9302.76",
+ "isCorrect": true,
+ "inputTokens": 3109,
+ "outputTokens": 8,
+ "latencyMs": 1126.2435420000693
+ },
+ {
+ "questionId": "q108",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "9302.76",
+ "actual": "9302.76",
+ "isCorrect": true,
+ "inputTokens": 3813,
+ "outputTokens": 7,
+ "latencyMs": 1764.1200830000453
+ },
+ {
+ "questionId": "q109",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 3712,
+ "outputTokens": 136,
+ "latencyMs": 3300.9657910000533
+ },
+ {
+ "questionId": "q109",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 4080,
+ "outputTokens": 6,
+ "latencyMs": 1052.1962920000078
+ },
+ {
+ "questionId": "q109",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 4784,
+ "outputTokens": 4,
+ "latencyMs": 3287.65862500004
+ },
+ {
+ "questionId": "q109",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 1563,
+ "outputTokens": 200,
+ "latencyMs": 3891.706874999916
+ },
+ {
+ "questionId": "q109",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 1509,
+ "outputTokens": 6,
+ "latencyMs": 1081.2852920000441
+ },
+ {
+ "questionId": "q109",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 2271,
+ "outputTokens": 4,
+ "latencyMs": 2226.4307500000577
+ },
+ {
+ "questionId": "q109",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 1441,
+ "outputTokens": 72,
+ "latencyMs": 1982.5622910000384
+ },
+ {
+ "questionId": "q109",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 1445,
+ "outputTokens": 6,
+ "latencyMs": 929.4726250000531
+ },
+ {
+ "questionId": "q109",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 2208,
+ "outputTokens": 4,
+ "latencyMs": 1787.2903330000117
+ },
+ {
+ "questionId": "q109",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 4423,
+ "outputTokens": 264,
+ "latencyMs": 3257.529749999987
+ },
+ {
+ "questionId": "q109",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 4787,
+ "outputTokens": 6,
+ "latencyMs": 1576.1779170000227
+ },
+ {
+ "questionId": "q109",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 5431,
+ "outputTokens": 4,
+ "latencyMs": 2836.7503750000615
+ },
+ {
+ "questionId": "q109",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 2985,
+ "outputTokens": 136,
+ "latencyMs": 4072.856582999928
+ },
+ {
+ "questionId": "q109",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 3110,
+ "outputTokens": 6,
+ "latencyMs": 974.9362500000279
+ },
+ {
+ "questionId": "q109",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "3285",
+ "actual": "3285",
+ "isCorrect": true,
+ "inputTokens": 3814,
+ "outputTokens": 4,
+ "latencyMs": 1213.922833000077
+ },
+ {
+ "questionId": "q110",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 3711,
+ "outputTokens": 138,
+ "latencyMs": 3493.7957090000855
+ },
+ {
+ "questionId": "q110",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 4079,
+ "outputTokens": 8,
+ "latencyMs": 1142.0260000000708
+ },
+ {
+ "questionId": "q110",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 4783,
+ "outputTokens": 7,
+ "latencyMs": 2381.430916000041
+ },
+ {
+ "questionId": "q110",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 1562,
+ "outputTokens": 138,
+ "latencyMs": 2413.9573330000276
+ },
+ {
+ "questionId": "q110",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 1508,
+ "outputTokens": 8,
+ "latencyMs": 1847.1221249999944
+ },
+ {
+ "questionId": "q110",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 2270,
+ "outputTokens": 7,
+ "latencyMs": 2303.37033299997
+ },
+ {
+ "questionId": "q110",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 1440,
+ "outputTokens": 138,
+ "latencyMs": 2214.3459579999326
+ },
+ {
+ "questionId": "q110",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 1444,
+ "outputTokens": 8,
+ "latencyMs": 1087.8486249999842
+ },
+ {
+ "questionId": "q110",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 2207,
+ "outputTokens": 7,
+ "latencyMs": 1525.997917000088
+ },
+ {
+ "questionId": "q110",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 4422,
+ "outputTokens": 202,
+ "latencyMs": 2952.5206250000047
+ },
+ {
+ "questionId": "q110",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 4786,
+ "outputTokens": 8,
+ "latencyMs": 1203.7597079999978
+ },
+ {
+ "questionId": "q110",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 5430,
+ "outputTokens": 7,
+ "latencyMs": 1580.2738329999847
+ },
+ {
+ "questionId": "q110",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 2984,
+ "outputTokens": 138,
+ "latencyMs": 2473.919208999956
+ },
+ {
+ "questionId": "q110",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 3109,
+ "outputTokens": 8,
+ "latencyMs": 1452.058374999906
+ },
+ {
+ "questionId": "q110",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "3826.93",
+ "actual": "3826.93",
+ "isCorrect": true,
+ "inputTokens": 3813,
+ "outputTokens": 7,
+ "latencyMs": 2691.815042000031
+ },
+ {
+ "questionId": "q111",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 3712,
+ "outputTokens": 136,
+ "latencyMs": 2043.9027500000084
+ },
+ {
+ "questionId": "q111",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 4080,
+ "outputTokens": 6,
+ "latencyMs": 1085.5088339999784
+ },
+ {
+ "questionId": "q111",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 4784,
+ "outputTokens": 4,
+ "latencyMs": 1648.2013329999754
+ },
+ {
+ "questionId": "q111",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 1563,
+ "outputTokens": 136,
+ "latencyMs": 3078.3677920000628
+ },
+ {
+ "questionId": "q111",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 1509,
+ "outputTokens": 6,
+ "latencyMs": 953.482166999951
+ },
+ {
+ "questionId": "q111",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 2271,
+ "outputTokens": 4,
+ "latencyMs": 2107.5470000000205
+ },
+ {
+ "questionId": "q111",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 1441,
+ "outputTokens": 72,
+ "latencyMs": 2056.58216599992
+ },
+ {
+ "questionId": "q111",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 1445,
+ "outputTokens": 6,
+ "latencyMs": 1345.5024170000106
+ },
+ {
+ "questionId": "q111",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 2208,
+ "outputTokens": 4,
+ "latencyMs": 1387.981958999997
+ },
+ {
+ "questionId": "q111",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 4423,
+ "outputTokens": 136,
+ "latencyMs": 3227.920458999928
+ },
+ {
+ "questionId": "q111",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 4787,
+ "outputTokens": 6,
+ "latencyMs": 1789.7077919999138
+ },
+ {
+ "questionId": "q111",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 5431,
+ "outputTokens": 4,
+ "latencyMs": 3015.3227080000797
+ },
+ {
+ "questionId": "q111",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 2985,
+ "outputTokens": 200,
+ "latencyMs": 2481.5284170000814
+ },
+ {
+ "questionId": "q111",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 3110,
+ "outputTokens": 6,
+ "latencyMs": 2319.2710829999996
+ },
+ {
+ "questionId": "q111",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "6191",
+ "actual": "6191",
+ "isCorrect": true,
+ "inputTokens": 3814,
+ "outputTokens": 4,
+ "latencyMs": 1736.7912920000963
+ },
+ {
+ "questionId": "q112",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "1854.66",
+ "actual": "1854.66",
+ "isCorrect": true,
+ "inputTokens": 3711,
+ "outputTokens": 138,
+ "latencyMs": 2613.5518750000047
+ },
+ {
+ "questionId": "q112",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "1854.66",
+ "actual": "1854.66",
+ "isCorrect": true,
+ "inputTokens": 4079,
+ "outputTokens": 8,
+ "latencyMs": 1411.1959170000628
+ },
+ {
+ "questionId": "q112",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "1854.66",
+ "actual": "1854.66",
+ "isCorrect": true,
+ "inputTokens": 4783,
+ "outputTokens": 7,
+ "latencyMs": 2631.1534589999355
+ },
+ {
+ "questionId": "q112",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "1854.66",
+ "actual": "1854.66",
+ "isCorrect": true,
+ "inputTokens": 1562,
+ "outputTokens": 74,
+ "latencyMs": 2247.1309170000022
+ },
+ {
+ "questionId": "q112",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "1854.66",
+ "actual": "1854.66",
+ "isCorrect": true,
+ "inputTokens": 1508,
+ "outputTokens": 8,
+ "latencyMs": 935.4031660000328
+ },
+ {
+ "questionId": "q112",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "1854.66",
+ "actual": "1854.66",
+ "isCorrect": true,
+ "inputTokens": 2270,
+ "outputTokens": 7,
+ "latencyMs": 3261.111125000054
+ },
+ {
+ "questionId": "q112",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "1854.66",
+ "actual": "1854.66",
+ "isCorrect": true,
+ "inputTokens": 1440,
"outputTokens": 74,
- "latencyMs": 3640.193708000006
+ "latencyMs": 2420.4490409999853
},
{
- "questionId": "q108",
+ "questionId": "q112",
"format": "csv",
"model": "claude-haiku-4-5",
- "expected": "9302.76",
- "actual": "9302.76",
+ "expected": "1854.66",
+ "actual": "1854.66",
"isCorrect": true,
"inputTokens": 1444,
"outputTokens": 8,
- "latencyMs": 983.806166000024
- },
- {
- "questionId": "q108",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "9302.76",
- "actual": "9302.76",
- "isCorrect": true,
- "inputTokens": 3828,
- "outputTokens": 266,
- "latencyMs": 2604.2135000000126
- },
- {
- "questionId": "q108",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "9302.76",
- "actual": "9302.76",
- "isCorrect": true,
- "inputTokens": 3414,
- "outputTokens": 8,
- "latencyMs": 1128.6182499999995
- },
- {
- "questionId": "q108",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "9302.76",
- "actual": "9302.76",
- "isCorrect": true,
- "inputTokens": 2984,
- "outputTokens": 138,
- "latencyMs": 2548.5608749999956
- },
- {
- "questionId": "q108",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "9302.76",
- "actual": "9302.76",
- "isCorrect": true,
- "inputTokens": 3109,
- "outputTokens": 8,
- "latencyMs": 1029.5365000000165
- },
- {
- "questionId": "q109",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "3285",
- "actual": "3285",
- "isCorrect": true,
- "inputTokens": 3712,
- "outputTokens": 136,
- "latencyMs": 3983.6009170000034
- },
- {
- "questionId": "q109",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "3285",
- "actual": "3285",
- "isCorrect": true,
- "inputTokens": 4080,
- "outputTokens": 6,
- "latencyMs": 1095.2366250000196
- },
- {
- "questionId": "q109",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "3285",
- "actual": "3285",
- "isCorrect": true,
- "inputTokens": 1563,
- "outputTokens": 72,
- "latencyMs": 2207.884417000023
- },
- {
- "questionId": "q109",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "3285",
- "actual": "3285",
- "isCorrect": true,
- "inputTokens": 1509,
- "outputTokens": 6,
- "latencyMs": 2292.4111660000053
- },
- {
- "questionId": "q109",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "3285",
- "actual": "3285",
- "isCorrect": true,
- "inputTokens": 1441,
- "outputTokens": 136,
- "latencyMs": 2749.430541000009
- },
- {
- "questionId": "q109",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "3285",
- "actual": "3285",
- "isCorrect": true,
- "inputTokens": 1445,
- "outputTokens": 6,
- "latencyMs": 1215.8329999999842
- },
- {
- "questionId": "q109",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "3285",
- "actual": "3285",
- "isCorrect": true,
- "inputTokens": 3829,
- "outputTokens": 136,
- "latencyMs": 2086.6161659999925
- },
- {
- "questionId": "q109",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "3285",
- "actual": "3285",
- "isCorrect": true,
- "inputTokens": 3415,
- "outputTokens": 6,
- "latencyMs": 1299.715790999995
- },
- {
- "questionId": "q109",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "3285",
- "actual": "3285",
- "isCorrect": true,
- "inputTokens": 2985,
- "outputTokens": 136,
- "latencyMs": 7107.394916999998
- },
- {
- "questionId": "q109",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "3285",
- "actual": "3285",
- "isCorrect": true,
- "inputTokens": 3110,
- "outputTokens": 6,
- "latencyMs": 899.2319579999894
- },
- {
- "questionId": "q110",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "3826.93",
- "actual": "3826.93",
- "isCorrect": true,
- "inputTokens": 3711,
- "outputTokens": 138,
- "latencyMs": 2810.5213330000115
- },
- {
- "questionId": "q110",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "3826.93",
- "actual": "3826.93",
- "isCorrect": true,
- "inputTokens": 4079,
- "outputTokens": 8,
- "latencyMs": 989.2326659999962
- },
- {
- "questionId": "q110",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "3826.93",
- "actual": "3826.93",
- "isCorrect": true,
- "inputTokens": 1562,
- "outputTokens": 138,
- "latencyMs": 2622.7841670000053
- },
- {
- "questionId": "q110",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "3826.93",
- "actual": "3826.93",
- "isCorrect": true,
- "inputTokens": 1508,
- "outputTokens": 8,
- "latencyMs": 850.1227920000092
- },
- {
- "questionId": "q110",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "3826.93",
- "actual": "3826.93",
- "isCorrect": true,
- "inputTokens": 1440,
- "outputTokens": 138,
- "latencyMs": 3057.1578750000044
- },
- {
- "questionId": "q110",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "3826.93",
- "actual": "3826.93",
- "isCorrect": true,
- "inputTokens": 1444,
- "outputTokens": 8,
- "latencyMs": 1261.3340000000026
- },
- {
- "questionId": "q110",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "3826.93",
- "actual": "3826.93",
- "isCorrect": true,
- "inputTokens": 3828,
- "outputTokens": 202,
- "latencyMs": 3061.791499999992
- },
- {
- "questionId": "q110",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "3826.93",
- "actual": "3826.93",
- "isCorrect": true,
- "inputTokens": 3414,
- "outputTokens": 8,
- "latencyMs": 1196.6509999999835
- },
- {
- "questionId": "q110",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "3826.93",
- "actual": "3826.93",
- "isCorrect": true,
- "inputTokens": 2984,
- "outputTokens": 138,
- "latencyMs": 3567.4540839999972
- },
- {
- "questionId": "q110",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "3826.93",
- "actual": "3826.93",
- "isCorrect": true,
- "inputTokens": 3109,
- "outputTokens": 8,
- "latencyMs": 1033.8556249999965
- },
- {
- "questionId": "q111",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "6191",
- "actual": "6191",
- "isCorrect": true,
- "inputTokens": 3712,
- "outputTokens": 136,
- "latencyMs": 2842.961707999988
- },
- {
- "questionId": "q111",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "6191",
- "actual": "6191",
- "isCorrect": true,
- "inputTokens": 4080,
- "outputTokens": 6,
- "latencyMs": 1258.130582999991
- },
- {
- "questionId": "q111",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "6191",
- "actual": "6191",
- "isCorrect": true,
- "inputTokens": 1563,
- "outputTokens": 456,
- "latencyMs": 5828.652415999997
- },
- {
- "questionId": "q111",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "6191",
- "actual": "6191",
- "isCorrect": true,
- "inputTokens": 1509,
- "outputTokens": 6,
- "latencyMs": 1004.821958000015
- },
- {
- "questionId": "q111",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "6191",
- "actual": "6191",
- "isCorrect": true,
- "inputTokens": 1441,
- "outputTokens": 72,
- "latencyMs": 3102.38612499999
- },
- {
- "questionId": "q111",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "6191",
- "actual": "6191",
- "isCorrect": true,
- "inputTokens": 1445,
- "outputTokens": 6,
- "latencyMs": 1454.8658750000177
- },
- {
- "questionId": "q111",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "6191",
- "actual": "6191",
- "isCorrect": true,
- "inputTokens": 3829,
- "outputTokens": 136,
- "latencyMs": 2018.8434999999881
- },
- {
- "questionId": "q111",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "6191",
- "actual": "6191",
- "isCorrect": true,
- "inputTokens": 3415,
- "outputTokens": 6,
- "latencyMs": 1237.4057080000057
- },
- {
- "questionId": "q111",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "6191",
- "actual": "6191",
- "isCorrect": true,
- "inputTokens": 2985,
- "outputTokens": 136,
- "latencyMs": 3670.7451670000155
- },
- {
- "questionId": "q111",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "6191",
- "actual": "6191",
- "isCorrect": true,
- "inputTokens": 3110,
- "outputTokens": 6,
- "latencyMs": 1070.646584000002
+ "latencyMs": 1112.1383340000175
},
{
"questionId": "q112",
- "format": "json",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "1854.66",
+ "actual": "1854.66",
+ "isCorrect": true,
+ "inputTokens": 2207,
+ "outputTokens": 7,
+ "latencyMs": 2340.017957999953
+ },
+ {
+ "questionId": "q112",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "1854.66",
"actual": "1854.66",
"isCorrect": true,
- "inputTokens": 3711,
- "outputTokens": 202,
- "latencyMs": 3731.3879579999775
- },
- {
- "questionId": "q112",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "1854.66",
- "actual": "1854.66",
- "isCorrect": true,
- "inputTokens": 4079,
- "outputTokens": 8,
- "latencyMs": 1387.9798329999903
- },
- {
- "questionId": "q112",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "1854.66",
- "actual": "1854.66",
- "isCorrect": true,
- "inputTokens": 1562,
+ "inputTokens": 4422,
"outputTokens": 394,
- "latencyMs": 5560.397957999987
+ "latencyMs": 17092.246334000025
},
{
"questionId": "q112",
- "format": "toon",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "1854.66",
"actual": "1854.66",
"isCorrect": true,
- "inputTokens": 1508,
+ "inputTokens": 4786,
"outputTokens": 8,
- "latencyMs": 1552.963958999986
+ "latencyMs": 1153.1710829999065
},
{
"questionId": "q112",
- "format": "csv",
- "model": "gpt-5-nano",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
"expected": "1854.66",
"actual": "1854.66",
"isCorrect": true,
- "inputTokens": 1440,
- "outputTokens": 138,
- "latencyMs": 21759.84366700001
- },
- {
- "questionId": "q112",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "1854.66",
- "actual": "1854.66",
- "isCorrect": true,
- "inputTokens": 1444,
- "outputTokens": 8,
- "latencyMs": 1132.519083000021
- },
- {
- "questionId": "q112",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "1854.66",
- "actual": "1854.66",
- "isCorrect": true,
- "inputTokens": 3828,
- "outputTokens": 138,
- "latencyMs": 2277.2652499999967
- },
- {
- "questionId": "q112",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "1854.66",
- "actual": "1854.66",
- "isCorrect": true,
- "inputTokens": 3414,
- "outputTokens": 8,
- "latencyMs": 1098.0825420000183
+ "inputTokens": 5430,
+ "outputTokens": 7,
+ "latencyMs": 1490.9894589999458
},
{
"questionId": "q112",
@@ -12306,7 +18455,7 @@
"isCorrect": true,
"inputTokens": 2984,
"outputTokens": 202,
- "latencyMs": 2813.10504200001
+ "latencyMs": 3339.092583000078
},
{
"questionId": "q112",
@@ -12317,7 +18466,18 @@
"isCorrect": true,
"inputTokens": 3109,
"outputTokens": 8,
- "latencyMs": 1131.9674159999995
+ "latencyMs": 1555.5642919999082
+ },
+ {
+ "questionId": "q112",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "1854.66",
+ "actual": "1854.66",
+ "isCorrect": true,
+ "inputTokens": 3813,
+ "outputTokens": 7,
+ "latencyMs": 2120.2490830000024
},
{
"questionId": "q113",
@@ -12327,8 +18487,8 @@
"actual": "4696",
"isCorrect": true,
"inputTokens": 3712,
- "outputTokens": 136,
- "latencyMs": 6657.446207999979
+ "outputTokens": 200,
+ "latencyMs": 3111.5985420000507
},
{
"questionId": "q113",
@@ -12339,7 +18499,18 @@
"isCorrect": true,
"inputTokens": 4080,
"outputTokens": 6,
- "latencyMs": 1265.4548749999958
+ "latencyMs": 968.7054999999236
+ },
+ {
+ "questionId": "q113",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "4696",
+ "actual": "4696",
+ "isCorrect": true,
+ "inputTokens": 4784,
+ "outputTokens": 4,
+ "latencyMs": 3022.979249999975
},
{
"questionId": "q113",
@@ -12350,7 +18521,7 @@
"isCorrect": true,
"inputTokens": 1563,
"outputTokens": 136,
- "latencyMs": 3299.298792000016
+ "latencyMs": 3835.2764579999493
},
{
"questionId": "q113",
@@ -12361,7 +18532,18 @@
"isCorrect": true,
"inputTokens": 1509,
"outputTokens": 6,
- "latencyMs": 1618.5091249999823
+ "latencyMs": 1366.261957999901
+ },
+ {
+ "questionId": "q113",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "4696",
+ "actual": "4696",
+ "isCorrect": true,
+ "inputTokens": 2271,
+ "outputTokens": 4,
+ "latencyMs": 1964.8687499999069
},
{
"questionId": "q113",
@@ -12371,8 +18553,8 @@
"actual": "4696",
"isCorrect": true,
"inputTokens": 1441,
- "outputTokens": 136,
- "latencyMs": 5353.29241699999
+ "outputTokens": 264,
+ "latencyMs": 3045.071499999962
},
{
"questionId": "q113",
@@ -12383,29 +18565,51 @@
"isCorrect": true,
"inputTokens": 1445,
"outputTokens": 6,
- "latencyMs": 870.5113749999728
+ "latencyMs": 804.4215829999885
},
{
"questionId": "q113",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "4696",
+ "actual": "4696",
+ "isCorrect": true,
+ "inputTokens": 2208,
+ "outputTokens": 4,
+ "latencyMs": 1822.1931249999907
+ },
+ {
+ "questionId": "q113",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "4696",
"actual": "4696",
"isCorrect": true,
- "inputTokens": 3829,
- "outputTokens": 200,
- "latencyMs": 2780.5659159999923
+ "inputTokens": 4423,
+ "outputTokens": 136,
+ "latencyMs": 2214.7718329998897
},
{
"questionId": "q113",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "4696",
"actual": "4696",
"isCorrect": true,
- "inputTokens": 3415,
+ "inputTokens": 4787,
"outputTokens": 6,
- "latencyMs": 1069.2415409999958
+ "latencyMs": 1151.622665999923
+ },
+ {
+ "questionId": "q113",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "4696",
+ "actual": "4696",
+ "isCorrect": true,
+ "inputTokens": 5431,
+ "outputTokens": 4,
+ "latencyMs": 1762.1509579999838
},
{
"questionId": "q113",
@@ -12416,7 +18620,7 @@
"isCorrect": true,
"inputTokens": 2985,
"outputTokens": 200,
- "latencyMs": 3036.145666999975
+ "latencyMs": 2739.4318329999223
},
{
"questionId": "q113",
@@ -12427,7 +18631,18 @@
"isCorrect": true,
"inputTokens": 3110,
"outputTokens": 6,
- "latencyMs": 1252.9633329999924
+ "latencyMs": 1074.2716670000227
+ },
+ {
+ "questionId": "q113",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "4696",
+ "actual": "4696",
+ "isCorrect": true,
+ "inputTokens": 3814,
+ "outputTokens": 4,
+ "latencyMs": 1362.9514999999665
},
{
"questionId": "q114",
@@ -12438,7 +18653,7 @@
"isCorrect": true,
"inputTokens": 3711,
"outputTokens": 138,
- "latencyMs": 2617.047249999974
+ "latencyMs": 2877.9115410000086
},
{
"questionId": "q114",
@@ -12449,7 +18664,18 @@
"isCorrect": true,
"inputTokens": 4079,
"outputTokens": 8,
- "latencyMs": 1261.9117079999996
+ "latencyMs": 1239.7438750000438
+ },
+ {
+ "questionId": "q114",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "4211.6",
+ "actual": "4211.6",
+ "isCorrect": true,
+ "inputTokens": 4783,
+ "outputTokens": 6,
+ "latencyMs": 1514.1683330000378
},
{
"questionId": "q114",
@@ -12460,7 +18686,7 @@
"isCorrect": true,
"inputTokens": 1562,
"outputTokens": 202,
- "latencyMs": 6192.06358300001
+ "latencyMs": 2804.6751670000376
},
{
"questionId": "q114",
@@ -12471,7 +18697,18 @@
"isCorrect": true,
"inputTokens": 1508,
"outputTokens": 8,
- "latencyMs": 1158.3806249999907
+ "latencyMs": 979.8223330000183
+ },
+ {
+ "questionId": "q114",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "4211.6",
+ "actual": "4211.6",
+ "isCorrect": true,
+ "inputTokens": 2270,
+ "outputTokens": 6,
+ "latencyMs": 2323.508334000013
},
{
"questionId": "q114",
@@ -12481,8 +18718,8 @@
"actual": "4211.6",
"isCorrect": true,
"inputTokens": 1440,
- "outputTokens": 138,
- "latencyMs": 2867.840083999996
+ "outputTokens": 74,
+ "latencyMs": 1690.5704579999438
},
{
"questionId": "q114",
@@ -12493,29 +18730,51 @@
"isCorrect": true,
"inputTokens": 1444,
"outputTokens": 8,
- "latencyMs": 856.2939580000238
+ "latencyMs": 886.4768329999642
},
{
"questionId": "q114",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "4211.6",
+ "actual": "4211.6",
+ "isCorrect": true,
+ "inputTokens": 2207,
+ "outputTokens": 6,
+ "latencyMs": 1805.5540000000037
+ },
+ {
+ "questionId": "q114",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "4211.6",
"actual": "4211.6",
"isCorrect": true,
- "inputTokens": 3828,
- "outputTokens": 138,
- "latencyMs": 2329.6339579999913
+ "inputTokens": 4422,
+ "outputTokens": 266,
+ "latencyMs": 4743.464458000031
},
{
"questionId": "q114",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "4211.6",
"actual": "4211.6",
"isCorrect": true,
- "inputTokens": 3414,
+ "inputTokens": 4786,
"outputTokens": 8,
- "latencyMs": 1106.5591669999994
+ "latencyMs": 1165.764332999941
+ },
+ {
+ "questionId": "q114",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "4211.6",
+ "actual": "4211.6",
+ "isCorrect": true,
+ "inputTokens": 5430,
+ "outputTokens": 6,
+ "latencyMs": 2148.3432500000345
},
{
"questionId": "q114",
@@ -12526,7 +18785,7 @@
"isCorrect": true,
"inputTokens": 2984,
"outputTokens": 138,
- "latencyMs": 2590.7533330000006
+ "latencyMs": 2704.757041999954
},
{
"questionId": "q114",
@@ -12537,7 +18796,18 @@
"isCorrect": true,
"inputTokens": 3109,
"outputTokens": 8,
- "latencyMs": 1007.0892920000188
+ "latencyMs": 1058.6455829999177
+ },
+ {
+ "questionId": "q114",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "4211.6",
+ "actual": "4211.6",
+ "isCorrect": true,
+ "inputTokens": 3813,
+ "outputTokens": 6,
+ "latencyMs": 2256.7089169999817
},
{
"questionId": "q115",
@@ -12547,8 +18817,8 @@
"actual": "6196",
"isCorrect": true,
"inputTokens": 3712,
- "outputTokens": 200,
- "latencyMs": 3839.2745000000286
+ "outputTokens": 136,
+ "latencyMs": 2360.8099159999983
},
{
"questionId": "q115",
@@ -12559,7 +18829,18 @@
"isCorrect": true,
"inputTokens": 4080,
"outputTokens": 6,
- "latencyMs": 1388.2399160000205
+ "latencyMs": 1535.8384579999838
+ },
+ {
+ "questionId": "q115",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "6196",
+ "actual": "6196",
+ "isCorrect": true,
+ "inputTokens": 4784,
+ "outputTokens": 4,
+ "latencyMs": 3278.595083000022
},
{
"questionId": "q115",
@@ -12569,8 +18850,8 @@
"actual": "6196",
"isCorrect": true,
"inputTokens": 1563,
- "outputTokens": 200,
- "latencyMs": 3955.22095800002
+ "outputTokens": 328,
+ "latencyMs": 7969.119124999968
},
{
"questionId": "q115",
@@ -12581,7 +18862,18 @@
"isCorrect": true,
"inputTokens": 1509,
"outputTokens": 6,
- "latencyMs": 1036.567458000005
+ "latencyMs": 1099.6044580000453
+ },
+ {
+ "questionId": "q115",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "6196",
+ "actual": "6196",
+ "isCorrect": true,
+ "inputTokens": 2271,
+ "outputTokens": 4,
+ "latencyMs": 2112.666833000025
},
{
"questionId": "q115",
@@ -12591,8 +18883,8 @@
"actual": "6196",
"isCorrect": true,
"inputTokens": 1441,
- "outputTokens": 200,
- "latencyMs": 5566.705209000007
+ "outputTokens": 72,
+ "latencyMs": 1636.6678329999559
},
{
"questionId": "q115",
@@ -12603,29 +18895,51 @@
"isCorrect": true,
"inputTokens": 1445,
"outputTokens": 6,
- "latencyMs": 1078.5011670000094
+ "latencyMs": 902.907957999967
},
{
"questionId": "q115",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "6196",
+ "actual": "6196",
+ "isCorrect": true,
+ "inputTokens": 2208,
+ "outputTokens": 4,
+ "latencyMs": 1787.2734170000767
+ },
+ {
+ "questionId": "q115",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "6196",
"actual": "6196",
"isCorrect": true,
- "inputTokens": 3829,
- "outputTokens": 200,
- "latencyMs": 2956.9618330000376
+ "inputTokens": 4423,
+ "outputTokens": 264,
+ "latencyMs": 3207.286208000034
},
{
"questionId": "q115",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "6196",
"actual": "6196",
"isCorrect": true,
- "inputTokens": 3415,
+ "inputTokens": 4787,
"outputTokens": 6,
- "latencyMs": 1797.4496250000084
+ "latencyMs": 1176.4805000000633
+ },
+ {
+ "questionId": "q115",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "6196",
+ "actual": "6196",
+ "isCorrect": true,
+ "inputTokens": 5431,
+ "outputTokens": 4,
+ "latencyMs": 3314.0558330001077
},
{
"questionId": "q115",
@@ -12635,8 +18949,8 @@
"actual": "6196",
"isCorrect": true,
"inputTokens": 2985,
- "outputTokens": 136,
- "latencyMs": 2647.741832999978
+ "outputTokens": 200,
+ "latencyMs": 5537.94308300002
},
{
"questionId": "q115",
@@ -12647,7 +18961,18 @@
"isCorrect": true,
"inputTokens": 3110,
"outputTokens": 6,
- "latencyMs": 1221.9055410000146
+ "latencyMs": 914.5840419998858
+ },
+ {
+ "questionId": "q115",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "6196",
+ "actual": "6196",
+ "isCorrect": true,
+ "inputTokens": 3814,
+ "outputTokens": 4,
+ "latencyMs": 1747.4003750000848
},
{
"questionId": "q116",
@@ -12657,8 +18982,8 @@
"actual": "6105.3",
"isCorrect": true,
"inputTokens": 3711,
- "outputTokens": 138,
- "latencyMs": 3783.334333000006
+ "outputTokens": 202,
+ "latencyMs": 5452.725000000093
},
{
"questionId": "q116",
@@ -12669,7 +18994,18 @@
"isCorrect": true,
"inputTokens": 4079,
"outputTokens": 8,
- "latencyMs": 1135.7771670000511
+ "latencyMs": 1257.8495419999817
+ },
+ {
+ "questionId": "q116",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "6105.3",
+ "actual": "6105.3",
+ "isCorrect": true,
+ "inputTokens": 4783,
+ "outputTokens": 6,
+ "latencyMs": 1183.2777500000084
},
{
"questionId": "q116",
@@ -12679,668 +19015,998 @@
"actual": "6105.3",
"isCorrect": true,
"inputTokens": 1562,
- "outputTokens": 266,
- "latencyMs": 3364.4232920000213
- },
- {
- "questionId": "q116",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "6105.3",
- "actual": "6105.3",
- "isCorrect": true,
- "inputTokens": 1508,
- "outputTokens": 8,
- "latencyMs": 1161.263666999992
- },
- {
- "questionId": "q116",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "6105.3",
- "actual": "6105.3",
- "isCorrect": true,
- "inputTokens": 1440,
- "outputTokens": 74,
- "latencyMs": 3646.0659589999705
- },
- {
- "questionId": "q116",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "6105.3",
- "actual": "6105.3",
- "isCorrect": true,
- "inputTokens": 1444,
- "outputTokens": 8,
- "latencyMs": 955.7597500000265
- },
- {
- "questionId": "q116",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "6105.3",
- "actual": "6105.3",
- "isCorrect": true,
- "inputTokens": 3828,
- "outputTokens": 74,
- "latencyMs": 2345.2203750000335
- },
- {
- "questionId": "q116",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "6105.3",
- "actual": "6105.3",
- "isCorrect": true,
- "inputTokens": 3414,
- "outputTokens": 8,
- "latencyMs": 1541.918249999988
- },
- {
- "questionId": "q116",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "6105.3",
- "actual": "6105.3",
- "isCorrect": true,
- "inputTokens": 2984,
- "outputTokens": 138,
- "latencyMs": 6126.976708000002
- },
- {
- "questionId": "q116",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "6105.3",
- "actual": "6105.3",
- "isCorrect": true,
- "inputTokens": 3109,
- "outputTokens": 8,
- "latencyMs": 1097.440709000046
- },
- {
- "questionId": "q117",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "6528",
- "actual": "6528",
- "isCorrect": true,
- "inputTokens": 3712,
- "outputTokens": 264,
- "latencyMs": 3404.643708999967
- },
- {
- "questionId": "q117",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "6528",
- "actual": "6528",
- "isCorrect": true,
- "inputTokens": 4080,
- "outputTokens": 6,
- "latencyMs": 1227.7047499999753
- },
- {
- "questionId": "q117",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "6528",
- "actual": "6528",
- "isCorrect": true,
- "inputTokens": 1563,
- "outputTokens": 136,
- "latencyMs": 2495.85037499998
- },
- {
- "questionId": "q117",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "6528",
- "actual": "6528",
- "isCorrect": true,
- "inputTokens": 1509,
- "outputTokens": 6,
- "latencyMs": 1048.344832999981
- },
- {
- "questionId": "q117",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "6528",
- "actual": "6528",
- "isCorrect": true,
- "inputTokens": 1441,
- "outputTokens": 136,
- "latencyMs": 3007.2462499999674
- },
- {
- "questionId": "q117",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "6528",
- "actual": "6528",
- "isCorrect": true,
- "inputTokens": 1445,
- "outputTokens": 6,
- "latencyMs": 840.0351669999654
- },
- {
- "questionId": "q117",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "6528",
- "actual": "6528",
- "isCorrect": true,
- "inputTokens": 3829,
- "outputTokens": 328,
- "latencyMs": 3149.872374999977
- },
- {
- "questionId": "q117",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "6528",
- "actual": "6528",
- "isCorrect": true,
- "inputTokens": 3415,
- "outputTokens": 6,
- "latencyMs": 973.716167000006
- },
- {
- "questionId": "q117",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "6528",
- "actual": "6528",
- "isCorrect": true,
- "inputTokens": 2985,
- "outputTokens": 456,
- "latencyMs": 5305.827791999967
- },
- {
- "questionId": "q117",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "6528",
- "actual": "6528",
- "isCorrect": true,
- "inputTokens": 3110,
- "outputTokens": 6,
- "latencyMs": 953.3122500000172
- },
- {
- "questionId": "q118",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "1136.09",
- "actual": "1136.09",
- "isCorrect": true,
- "inputTokens": 3711,
- "outputTokens": 138,
- "latencyMs": 3435.850167000026
- },
- {
- "questionId": "q118",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "1136.09",
- "actual": "1136.09",
- "isCorrect": true,
- "inputTokens": 4079,
- "outputTokens": 8,
- "latencyMs": 1110.8856249999953
- },
- {
- "questionId": "q118",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "1136.09",
- "actual": "1136.09",
- "isCorrect": true,
- "inputTokens": 1562,
- "outputTokens": 266,
- "latencyMs": 3303.3427500000107
- },
- {
- "questionId": "q118",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "1136.09",
- "actual": "1136.09",
- "isCorrect": true,
- "inputTokens": 1508,
- "outputTokens": 8,
- "latencyMs": 954.5857910000486
- },
- {
- "questionId": "q118",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "1136.09",
- "actual": "1136.09",
- "isCorrect": true,
- "inputTokens": 1440,
- "outputTokens": 138,
- "latencyMs": 5035.666582999984
- },
- {
- "questionId": "q118",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "1136.09",
- "actual": "1136.09",
- "isCorrect": true,
- "inputTokens": 1444,
- "outputTokens": 8,
- "latencyMs": 867.9529159999802
- },
- {
- "questionId": "q118",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "1136.09",
- "actual": "1136.09",
- "isCorrect": true,
- "inputTokens": 3828,
- "outputTokens": 202,
- "latencyMs": 2817.1118750000023
- },
- {
- "questionId": "q118",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "1136.09",
- "actual": "1136.09",
- "isCorrect": true,
- "inputTokens": 3414,
- "outputTokens": 8,
- "latencyMs": 1029.4406660000095
- },
- {
- "questionId": "q118",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "1136.09",
- "actual": "1136.09",
- "isCorrect": true,
- "inputTokens": 2984,
- "outputTokens": 138,
- "latencyMs": 2521.28145900002
- },
- {
- "questionId": "q118",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "1136.09",
- "actual": "1136.09",
- "isCorrect": true,
- "inputTokens": 3109,
- "outputTokens": 8,
- "latencyMs": 1266.9695000000065
- },
- {
- "questionId": "q119",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "4689",
- "actual": "4689",
- "isCorrect": true,
- "inputTokens": 3712,
- "outputTokens": 72,
- "latencyMs": 2383.6225830000476
- },
- {
- "questionId": "q119",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "4689",
- "actual": "4689",
- "isCorrect": true,
- "inputTokens": 4080,
- "outputTokens": 6,
- "latencyMs": 1100.3007499999949
- },
- {
- "questionId": "q119",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "4689",
- "actual": "4689",
- "isCorrect": true,
- "inputTokens": 1563,
- "outputTokens": 200,
- "latencyMs": 2816.252374999982
- },
- {
- "questionId": "q119",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "4689",
- "actual": "4689",
- "isCorrect": true,
- "inputTokens": 1509,
- "outputTokens": 6,
- "latencyMs": 1030.0248330000322
- },
- {
- "questionId": "q119",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "4689",
- "actual": "4689",
- "isCorrect": true,
- "inputTokens": 1441,
- "outputTokens": 72,
- "latencyMs": 1819.5161669999943
- },
- {
- "questionId": "q119",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "4689",
- "actual": "4689",
- "isCorrect": true,
- "inputTokens": 1445,
- "outputTokens": 6,
- "latencyMs": 1012.0581670000101
- },
- {
- "questionId": "q119",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "4689",
- "actual": "4689",
- "isCorrect": true,
- "inputTokens": 3829,
- "outputTokens": 136,
- "latencyMs": 2960.8910000000033
- },
- {
- "questionId": "q119",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "4689",
- "actual": "4689",
- "isCorrect": true,
- "inputTokens": 3415,
- "outputTokens": 6,
- "latencyMs": 1346.7110000000102
- },
- {
- "questionId": "q119",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "4689",
- "actual": "4689",
- "isCorrect": true,
- "inputTokens": 2985,
- "outputTokens": 136,
- "latencyMs": 3081.40625
- },
- {
- "questionId": "q119",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "4689",
- "actual": "4689",
- "isCorrect": true,
- "inputTokens": 3110,
- "outputTokens": 6,
- "latencyMs": 1485.0133330000099
- },
- {
- "questionId": "q120",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "2637.73",
- "actual": "2637.73",
- "isCorrect": true,
- "inputTokens": 3711,
- "outputTokens": 138,
- "latencyMs": 3632.860875000013
- },
- {
- "questionId": "q120",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "2637.73",
- "actual": "2637.73",
- "isCorrect": true,
- "inputTokens": 4079,
- "outputTokens": 8,
- "latencyMs": 1224.803750000021
- },
- {
- "questionId": "q120",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "2637.73",
- "actual": "2637.73",
- "isCorrect": true,
- "inputTokens": 1562,
- "outputTokens": 138,
- "latencyMs": 2323.675958000007
- },
- {
- "questionId": "q120",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "2637.73",
- "actual": "2637.73",
- "isCorrect": true,
- "inputTokens": 1508,
- "outputTokens": 8,
- "latencyMs": 1114.0831669999752
- },
- {
- "questionId": "q120",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "2637.73",
- "actual": "2637.73",
- "isCorrect": true,
- "inputTokens": 1440,
- "outputTokens": 202,
- "latencyMs": 3465.111333000008
- },
- {
- "questionId": "q120",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "2637.73",
- "actual": "2637.73",
- "isCorrect": true,
- "inputTokens": 1444,
- "outputTokens": 8,
- "latencyMs": 1082.4990419999813
- },
- {
- "questionId": "q120",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "2637.73",
- "actual": "2637.73",
- "isCorrect": true,
- "inputTokens": 3828,
- "outputTokens": 138,
- "latencyMs": 5648.285415999999
- },
- {
- "questionId": "q120",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "2637.73",
- "actual": "2637.73",
- "isCorrect": true,
- "inputTokens": 3414,
- "outputTokens": 8,
- "latencyMs": 1087.8757500000065
- },
- {
- "questionId": "q120",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "2637.73",
- "actual": "2637.73",
- "isCorrect": true,
- "inputTokens": 2984,
- "outputTokens": 138,
- "latencyMs": 4587.399166000017
- },
- {
- "questionId": "q120",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "2637.73",
- "actual": "2637.73",
- "isCorrect": true,
- "inputTokens": 3109,
- "outputTokens": 8,
- "latencyMs": 1007.4333340000012
- },
- {
- "questionId": "q121",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "5685",
- "actual": "5685",
- "isCorrect": true,
- "inputTokens": 3712,
- "outputTokens": 72,
- "latencyMs": 2307.9398339999607
- },
- {
- "questionId": "q121",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "5685",
- "actual": "5685",
- "isCorrect": true,
- "inputTokens": 4080,
- "outputTokens": 6,
- "latencyMs": 2368.3719580000034
- },
- {
- "questionId": "q121",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "5685",
- "actual": "5685",
- "isCorrect": true,
- "inputTokens": 1563,
- "outputTokens": 200,
- "latencyMs": 3587.720166999963
- },
- {
- "questionId": "q121",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "5685",
- "actual": "5685",
- "isCorrect": true,
- "inputTokens": 1509,
- "outputTokens": 6,
- "latencyMs": 1053.9867080000113
- },
- {
- "questionId": "q121",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "5685",
- "actual": "5685",
- "isCorrect": true,
- "inputTokens": 1441,
- "outputTokens": 136,
- "latencyMs": 1593.4699169999803
- },
- {
- "questionId": "q121",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "5685",
- "actual": "5685",
- "isCorrect": true,
- "inputTokens": 1445,
- "outputTokens": 6,
- "latencyMs": 2256.4729170000064
- },
- {
- "questionId": "q121",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "5685",
- "actual": "5685",
- "isCorrect": true,
- "inputTokens": 3829,
- "outputTokens": 200,
- "latencyMs": 4466.158916999993
- },
- {
- "questionId": "q121",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "5685",
- "actual": "5685",
- "isCorrect": true,
- "inputTokens": 3415,
- "outputTokens": 6,
- "latencyMs": 1305.1236670000362
- },
- {
- "questionId": "q121",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "5685",
- "actual": "5685",
- "isCorrect": true,
- "inputTokens": 2985,
- "outputTokens": 136,
- "latencyMs": 3014.9748339999933
- },
- {
- "questionId": "q121",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "5685",
- "actual": "5685",
- "isCorrect": true,
- "inputTokens": 3110,
- "outputTokens": 6,
- "latencyMs": 1421.9597920000087
- },
- {
- "questionId": "q122",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "3421.06",
- "actual": "3421.06",
- "isCorrect": true,
- "inputTokens": 3711,
- "outputTokens": 202,
- "latencyMs": 19503.25695900002
- },
- {
- "questionId": "q122",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "3421.06",
- "actual": "3421.06",
- "isCorrect": true,
- "inputTokens": 4079,
- "outputTokens": 8,
- "latencyMs": 1164.002959000005
- },
- {
- "questionId": "q122",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "3421.06",
- "actual": "3421.06",
- "isCorrect": true,
- "inputTokens": 1562,
"outputTokens": 330,
- "latencyMs": 4662.637042000017
+ "latencyMs": 7140.693124999991
+ },
+ {
+ "questionId": "q116",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "6105.3",
+ "actual": "6105.3",
+ "isCorrect": true,
+ "inputTokens": 1508,
+ "outputTokens": 8,
+ "latencyMs": 1131.5447919999715
+ },
+ {
+ "questionId": "q116",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "6105.3",
+ "actual": "6105.3",
+ "isCorrect": true,
+ "inputTokens": 2270,
+ "outputTokens": 6,
+ "latencyMs": 2556.5294579999754
+ },
+ {
+ "questionId": "q116",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "6105.3",
+ "actual": "6105.3",
+ "isCorrect": true,
+ "inputTokens": 1440,
+ "outputTokens": 266,
+ "latencyMs": 3158.0195420000236
+ },
+ {
+ "questionId": "q116",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "6105.3",
+ "actual": "6105.3",
+ "isCorrect": true,
+ "inputTokens": 1444,
+ "outputTokens": 8,
+ "latencyMs": 926.703375000041
+ },
+ {
+ "questionId": "q116",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "6105.3",
+ "actual": "6105.3",
+ "isCorrect": true,
+ "inputTokens": 2207,
+ "outputTokens": 6,
+ "latencyMs": 2144.0341659999685
+ },
+ {
+ "questionId": "q116",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "6105.3",
+ "actual": "6105.3",
+ "isCorrect": true,
+ "inputTokens": 4422,
+ "outputTokens": 202,
+ "latencyMs": 3109.7603749999544
+ },
+ {
+ "questionId": "q116",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "6105.3",
+ "actual": "6105.30",
+ "isCorrect": true,
+ "inputTokens": 4786,
+ "outputTokens": 8,
+ "latencyMs": 1212.1927079999587
+ },
+ {
+ "questionId": "q116",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "6105.3",
+ "actual": "6105.3",
+ "isCorrect": true,
+ "inputTokens": 5430,
+ "outputTokens": 6,
+ "latencyMs": 3449.487916999962
+ },
+ {
+ "questionId": "q116",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "6105.3",
+ "actual": "6105.3",
+ "isCorrect": true,
+ "inputTokens": 2984,
+ "outputTokens": 138,
+ "latencyMs": 2570.9303749999963
+ },
+ {
+ "questionId": "q116",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "6105.3",
+ "actual": "6105.3",
+ "isCorrect": true,
+ "inputTokens": 3109,
+ "outputTokens": 8,
+ "latencyMs": 1058.9517500000075
+ },
+ {
+ "questionId": "q116",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "6105.3",
+ "actual": "6105.3",
+ "isCorrect": true,
+ "inputTokens": 3813,
+ "outputTokens": 6,
+ "latencyMs": 1379.4884169999277
+ },
+ {
+ "questionId": "q117",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 3712,
+ "outputTokens": 200,
+ "latencyMs": 2630.738624999998
+ },
+ {
+ "questionId": "q117",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 4080,
+ "outputTokens": 6,
+ "latencyMs": 884.325959000038
+ },
+ {
+ "questionId": "q117",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 4784,
+ "outputTokens": 4,
+ "latencyMs": 2599.299457999994
+ },
+ {
+ "questionId": "q117",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 1563,
+ "outputTokens": 200,
+ "latencyMs": 5174.115041999961
+ },
+ {
+ "questionId": "q117",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 1509,
+ "outputTokens": 6,
+ "latencyMs": 1230.3996659999248
+ },
+ {
+ "questionId": "q117",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 2271,
+ "outputTokens": 4,
+ "latencyMs": 2081.4514590000035
+ },
+ {
+ "questionId": "q117",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 1441,
+ "outputTokens": 456,
+ "latencyMs": 4708.666958000045
+ },
+ {
+ "questionId": "q117",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 1445,
+ "outputTokens": 6,
+ "latencyMs": 1065.470417000004
+ },
+ {
+ "questionId": "q117",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 2208,
+ "outputTokens": 4,
+ "latencyMs": 1987.3131250001024
+ },
+ {
+ "questionId": "q117",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 4423,
+ "outputTokens": 200,
+ "latencyMs": 3420.324041999993
+ },
+ {
+ "questionId": "q117",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 4787,
+ "outputTokens": 6,
+ "latencyMs": 897.2685829999391
+ },
+ {
+ "questionId": "q117",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 5431,
+ "outputTokens": 4,
+ "latencyMs": 1442.7957500000484
+ },
+ {
+ "questionId": "q117",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 2985,
+ "outputTokens": 264,
+ "latencyMs": 3038.6226250000764
+ },
+ {
+ "questionId": "q117",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 3110,
+ "outputTokens": 6,
+ "latencyMs": 1260.5887920000823
+ },
+ {
+ "questionId": "q117",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "6528",
+ "actual": "6528",
+ "isCorrect": true,
+ "inputTokens": 3814,
+ "outputTokens": 4,
+ "latencyMs": 1877.516042000032
+ },
+ {
+ "questionId": "q118",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 3711,
+ "outputTokens": 266,
+ "latencyMs": 40974.3431249999
+ },
+ {
+ "questionId": "q118",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 4079,
+ "outputTokens": 8,
+ "latencyMs": 867.1927500000456
+ },
+ {
+ "questionId": "q118",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 4783,
+ "outputTokens": 7,
+ "latencyMs": 3284.4902500000317
+ },
+ {
+ "questionId": "q118",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 1562,
+ "outputTokens": 586,
+ "latencyMs": 5396.599999999977
+ },
+ {
+ "questionId": "q118",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 1508,
+ "outputTokens": 8,
+ "latencyMs": 1174.796290999977
+ },
+ {
+ "questionId": "q118",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 2270,
+ "outputTokens": 7,
+ "latencyMs": 2751.699709000066
+ },
+ {
+ "questionId": "q118",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 1440,
+ "outputTokens": 138,
+ "latencyMs": 3463.471459000022
+ },
+ {
+ "questionId": "q118",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 1444,
+ "outputTokens": 8,
+ "latencyMs": 925.253083000076
+ },
+ {
+ "questionId": "q118",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 2207,
+ "outputTokens": 7,
+ "latencyMs": 3240.4625000000233
+ },
+ {
+ "questionId": "q118",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 4422,
+ "outputTokens": 138,
+ "latencyMs": 7405.421083000023
+ },
+ {
+ "questionId": "q118",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 4786,
+ "outputTokens": 8,
+ "latencyMs": 1061.0794160000514
+ },
+ {
+ "questionId": "q118",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 5430,
+ "outputTokens": 7,
+ "latencyMs": 1512.5596659999574
+ },
+ {
+ "questionId": "q118",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 2984,
+ "outputTokens": 138,
+ "latencyMs": 2445.1606250000186
+ },
+ {
+ "questionId": "q118",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 3109,
+ "outputTokens": 8,
+ "latencyMs": 1296.5266660000198
+ },
+ {
+ "questionId": "q118",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "1136.09",
+ "actual": "1136.09",
+ "isCorrect": true,
+ "inputTokens": 3813,
+ "outputTokens": 7,
+ "latencyMs": 1523.473083000048
+ },
+ {
+ "questionId": "q119",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 3712,
+ "outputTokens": 392,
+ "latencyMs": 4885.794165999978
+ },
+ {
+ "questionId": "q119",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 4080,
+ "outputTokens": 6,
+ "latencyMs": 958.9109579999931
+ },
+ {
+ "questionId": "q119",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 4784,
+ "outputTokens": 4,
+ "latencyMs": 2268.0900839999085
+ },
+ {
+ "questionId": "q119",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 1563,
+ "outputTokens": 648,
+ "latencyMs": 12410.339000000036
+ },
+ {
+ "questionId": "q119",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 1509,
+ "outputTokens": 6,
+ "latencyMs": 1124.1954169999808
+ },
+ {
+ "questionId": "q119",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 2271,
+ "outputTokens": 4,
+ "latencyMs": 1842.937042000005
+ },
+ {
+ "questionId": "q119",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 1441,
+ "outputTokens": 200,
+ "latencyMs": 14746.862250000006
+ },
+ {
+ "questionId": "q119",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 1445,
+ "outputTokens": 6,
+ "latencyMs": 1070.885459000012
+ },
+ {
+ "questionId": "q119",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 2208,
+ "outputTokens": 4,
+ "latencyMs": 2808.225791999954
+ },
+ {
+ "questionId": "q119",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 4423,
+ "outputTokens": 264,
+ "latencyMs": 2815.092042000033
+ },
+ {
+ "questionId": "q119",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 4787,
+ "outputTokens": 6,
+ "latencyMs": 1285.6015419999603
+ },
+ {
+ "questionId": "q119",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 5431,
+ "outputTokens": 4,
+ "latencyMs": 1620.0065000000177
+ },
+ {
+ "questionId": "q119",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 2985,
+ "outputTokens": 136,
+ "latencyMs": 3353.4782089999644
+ },
+ {
+ "questionId": "q119",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 3110,
+ "outputTokens": 6,
+ "latencyMs": 1281.6234170000535
+ },
+ {
+ "questionId": "q119",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "4689",
+ "actual": "4689",
+ "isCorrect": true,
+ "inputTokens": 3814,
+ "outputTokens": 4,
+ "latencyMs": 1903.9000839999644
+ },
+ {
+ "questionId": "q120",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 3711,
+ "outputTokens": 330,
+ "latencyMs": 3469.9373749999795
+ },
+ {
+ "questionId": "q120",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 4079,
+ "outputTokens": 8,
+ "latencyMs": 1129.299417000031
+ },
+ {
+ "questionId": "q120",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 4783,
+ "outputTokens": 7,
+ "latencyMs": 1843.423833000008
+ },
+ {
+ "questionId": "q120",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 1562,
+ "outputTokens": 74,
+ "latencyMs": 3029.9955000000773
+ },
+ {
+ "questionId": "q120",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 1508,
+ "outputTokens": 8,
+ "latencyMs": 976.265458000009
+ },
+ {
+ "questionId": "q120",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 2270,
+ "outputTokens": 7,
+ "latencyMs": 1941.5176659999415
+ },
+ {
+ "questionId": "q120",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 1440,
+ "outputTokens": 138,
+ "latencyMs": 2326.60387500003
+ },
+ {
+ "questionId": "q120",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 1444,
+ "outputTokens": 8,
+ "latencyMs": 1340.7505420000525
+ },
+ {
+ "questionId": "q120",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 2207,
+ "outputTokens": 7,
+ "latencyMs": 3061.3734159999294
+ },
+ {
+ "questionId": "q120",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 4422,
+ "outputTokens": 330,
+ "latencyMs": 18444.37216700008
+ },
+ {
+ "questionId": "q120",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 4786,
+ "outputTokens": 8,
+ "latencyMs": 1472.8980000000447
+ },
+ {
+ "questionId": "q120",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 5430,
+ "outputTokens": 7,
+ "latencyMs": 1203.1091250000754
+ },
+ {
+ "questionId": "q120",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 2984,
+ "outputTokens": 266,
+ "latencyMs": 6852.723041999969
+ },
+ {
+ "questionId": "q120",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 3109,
+ "outputTokens": 8,
+ "latencyMs": 1186.3190000000177
+ },
+ {
+ "questionId": "q120",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "2637.73",
+ "actual": "2637.73",
+ "isCorrect": true,
+ "inputTokens": 3813,
+ "outputTokens": 7,
+ "latencyMs": 2720.8557080000173
+ },
+ {
+ "questionId": "q121",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 3712,
+ "outputTokens": 200,
+ "latencyMs": 9941.250375000061
+ },
+ {
+ "questionId": "q121",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 4080,
+ "outputTokens": 6,
+ "latencyMs": 1254.0278750000289
+ },
+ {
+ "questionId": "q121",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 4784,
+ "outputTokens": 4,
+ "latencyMs": 3998.6611660000635
+ },
+ {
+ "questionId": "q121",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 1563,
+ "outputTokens": 72,
+ "latencyMs": 2154.672750000027
+ },
+ {
+ "questionId": "q121",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 1509,
+ "outputTokens": 6,
+ "latencyMs": 1019.1613750000251
+ },
+ {
+ "questionId": "q121",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 2271,
+ "outputTokens": 4,
+ "latencyMs": 1623.1509579999838
+ },
+ {
+ "questionId": "q121",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 1441,
+ "outputTokens": 200,
+ "latencyMs": 5643.6689169999445
+ },
+ {
+ "questionId": "q121",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 1445,
+ "outputTokens": 6,
+ "latencyMs": 908.8649170000572
+ },
+ {
+ "questionId": "q121",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 2208,
+ "outputTokens": 4,
+ "latencyMs": 1939.4002079999773
+ },
+ {
+ "questionId": "q121",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "5685",
+ "actual": "7409",
+ "isCorrect": false,
+ "inputTokens": 4423,
+ "outputTokens": 392,
+ "latencyMs": 18020.185499999905
+ },
+ {
+ "questionId": "q121",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 4787,
+ "outputTokens": 6,
+ "latencyMs": 1167.9574999999022
+ },
+ {
+ "questionId": "q121",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 5431,
+ "outputTokens": 4,
+ "latencyMs": 2516.0782500000205
+ },
+ {
+ "questionId": "q121",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 2985,
+ "outputTokens": 136,
+ "latencyMs": 3538.66266599996
+ },
+ {
+ "questionId": "q121",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 3110,
+ "outputTokens": 6,
+ "latencyMs": 1074.641707999981
+ },
+ {
+ "questionId": "q121",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "5685",
+ "actual": "5685",
+ "isCorrect": true,
+ "inputTokens": 3814,
+ "outputTokens": 4,
+ "latencyMs": 1611.2575829999987
+ },
+ {
+ "questionId": "q122",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "3421.06",
+ "actual": "3421.06",
+ "isCorrect": true,
+ "inputTokens": 3711,
+ "outputTokens": 202,
+ "latencyMs": 3097.4197080000304
+ },
+ {
+ "questionId": "q122",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "3421.06",
+ "actual": "3421.06",
+ "isCorrect": true,
+ "inputTokens": 4079,
+ "outputTokens": 8,
+ "latencyMs": 1068.923999999999
+ },
+ {
+ "questionId": "q122",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "3421.06",
+ "actual": "3421.06",
+ "isCorrect": true,
+ "inputTokens": 4783,
+ "outputTokens": 7,
+ "latencyMs": 1952.0416250000708
+ },
+ {
+ "questionId": "q122",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "3421.06",
+ "actual": "3421.06",
+ "isCorrect": true,
+ "inputTokens": 1562,
+ "outputTokens": 906,
+ "latencyMs": 11804.22670800006
},
{
"questionId": "q122",
@@ -13351,7 +20017,18 @@
"isCorrect": true,
"inputTokens": 1508,
"outputTokens": 8,
- "latencyMs": 1086.9569170000032
+ "latencyMs": 1140.642707999912
+ },
+ {
+ "questionId": "q122",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "3421.06",
+ "actual": "3421.06",
+ "isCorrect": true,
+ "inputTokens": 2270,
+ "outputTokens": 7,
+ "latencyMs": 3323.8447500000475
},
{
"questionId": "q122",
@@ -13362,7 +20039,7 @@
"isCorrect": true,
"inputTokens": 1440,
"outputTokens": 202,
- "latencyMs": 2683.73904200003
+ "latencyMs": 5759.3412499999395
},
{
"questionId": "q122",
@@ -13373,29 +20050,51 @@
"isCorrect": true,
"inputTokens": 1444,
"outputTokens": 8,
- "latencyMs": 2289.0300419999985
+ "latencyMs": 1174.6347079999978
},
{
"questionId": "q122",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "3421.06",
+ "actual": "3421.06",
+ "isCorrect": true,
+ "inputTokens": 2207,
+ "outputTokens": 7,
+ "latencyMs": 1816.737458000076
+ },
+ {
+ "questionId": "q122",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "3421.06",
"actual": "3421.06",
"isCorrect": true,
- "inputTokens": 3828,
- "outputTokens": 74,
- "latencyMs": 1877.1760409999988
+ "inputTokens": 4422,
+ "outputTokens": 138,
+ "latencyMs": 14154.70395799994
},
{
"questionId": "q122",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "3421.06",
"actual": "3421.06",
"isCorrect": true,
- "inputTokens": 3414,
+ "inputTokens": 4786,
"outputTokens": 8,
- "latencyMs": 1460.1729160000104
+ "latencyMs": 1000.3886250000214
+ },
+ {
+ "questionId": "q122",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "3421.06",
+ "actual": "3421.06",
+ "isCorrect": true,
+ "inputTokens": 5430,
+ "outputTokens": 7,
+ "latencyMs": 1258.68512499996
},
{
"questionId": "q122",
@@ -13405,8 +20104,8 @@
"actual": "3421.06",
"isCorrect": true,
"inputTokens": 2984,
- "outputTokens": 138,
- "latencyMs": 2582.983708999993
+ "outputTokens": 202,
+ "latencyMs": 2957.2190829999745
},
{
"questionId": "q122",
@@ -13417,7 +20116,18 @@
"isCorrect": true,
"inputTokens": 3109,
"outputTokens": 8,
- "latencyMs": 1014.1320839999826
+ "latencyMs": 1128.0480420000385
+ },
+ {
+ "questionId": "q122",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "3421.06",
+ "actual": "3421.06",
+ "isCorrect": true,
+ "inputTokens": 3813,
+ "outputTokens": 7,
+ "latencyMs": 1714.4717499999097
},
{
"questionId": "q123",
@@ -13427,8 +20137,8 @@
"actual": "344498",
"isCorrect": true,
"inputTokens": 3709,
- "outputTokens": 2376,
- "latencyMs": 26290.846458000015
+ "outputTokens": 2632,
+ "latencyMs": 31555.039709000033
},
{
"questionId": "q123",
@@ -13439,7 +20149,18 @@
"isCorrect": false,
"inputTokens": 4077,
"outputTokens": 7,
- "latencyMs": 1288.6627500000177
+ "latencyMs": 1094.905458000023
+ },
+ {
+ "questionId": "q123",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "344498",
+ "actual": "340900",
+ "isCorrect": false,
+ "inputTokens": 4777,
+ "outputTokens": 6,
+ "latencyMs": 11993.166834000032
},
{
"questionId": "q123",
@@ -13449,8 +20170,8 @@
"actual": "344498",
"isCorrect": true,
"inputTokens": 1560,
- "outputTokens": 1736,
- "latencyMs": 13565.930124999955
+ "outputTokens": 4360,
+ "latencyMs": 47190.18545800005
},
{
"questionId": "q123",
@@ -13461,7 +20182,18 @@
"isCorrect": false,
"inputTokens": 1506,
"outputTokens": 7,
- "latencyMs": 1190.8501249999972
+ "latencyMs": 1098.8443330000155
+ },
+ {
+ "questionId": "q123",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "344498",
+ "actual": "344900",
+ "isCorrect": false,
+ "inputTokens": 2264,
+ "outputTokens": 6,
+ "latencyMs": 5982.8935409999685
},
{
"questionId": "q123",
@@ -13471,8 +20203,8 @@
"actual": "344498",
"isCorrect": true,
"inputTokens": 1438,
- "outputTokens": 2888,
- "latencyMs": 21377.612083000015
+ "outputTokens": 3080,
+ "latencyMs": 27390.594666999998
},
{
"questionId": "q123",
@@ -13483,29 +20215,51 @@
"isCorrect": false,
"inputTokens": 1442,
"outputTokens": 7,
- "latencyMs": 931.349749999994
+ "latencyMs": 1168.8217080000322
},
{
"questionId": "q123",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "344498",
+ "actual": "349900",
+ "isCorrect": false,
+ "inputTokens": 2201,
+ "outputTokens": 6,
+ "latencyMs": 5658.501500000013
+ },
+ {
+ "questionId": "q123",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "344498",
"actual": "344498",
"isCorrect": true,
- "inputTokens": 3826,
- "outputTokens": 3208,
- "latencyMs": 18997.804958999972
+ "inputTokens": 4420,
+ "outputTokens": 3592,
+ "latencyMs": 25827.663583000074
},
{
"questionId": "q123",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "344498",
- "actual": "188,647",
+ "actual": "372,089",
"isCorrect": false,
- "inputTokens": 3412,
+ "inputTokens": 4784,
"outputTokens": 7,
- "latencyMs": 1185.3518330000225
+ "latencyMs": 1297.9579999999842
+ },
+ {
+ "questionId": "q123",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "344498",
+ "actual": "340900",
+ "isCorrect": false,
+ "inputTokens": 5424,
+ "outputTokens": 6,
+ "latencyMs": 7942.432666000095
},
{
"questionId": "q123",
@@ -13515,8 +20269,8 @@
"actual": "344498",
"isCorrect": true,
"inputTokens": 2982,
- "outputTokens": 2184,
- "latencyMs": 23924.366792000015
+ "outputTokens": 3144,
+ "latencyMs": 26846.991665999987
},
{
"questionId": "q123",
@@ -13527,7 +20281,18 @@
"isCorrect": false,
"inputTokens": 3107,
"outputTokens": 7,
- "latencyMs": 2958.913666999957
+ "latencyMs": 1012.253665999975
+ },
+ {
+ "questionId": "q123",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "344498",
+ "actual": "300900",
+ "isCorrect": false,
+ "inputTokens": 3807,
+ "outputTokens": 6,
+ "latencyMs": 1351.5872090000194
},
{
"questionId": "q124",
@@ -13537,8 +20302,8 @@
"actual": "312818.50",
"isCorrect": true,
"inputTokens": 3707,
- "outputTokens": 4170,
- "latencyMs": 29361.525874999992
+ "outputTokens": 4746,
+ "latencyMs": 38656.80637499993
},
{
"questionId": "q124",
@@ -13549,7 +20314,18 @@
"isCorrect": false,
"inputTokens": 4075,
"outputTokens": 9,
- "latencyMs": 1325.5311249999795
+ "latencyMs": 1336.5668340000557
+ },
+ {
+ "questionId": "q124",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "312818.50",
+ "actual": "300000.00",
+ "isCorrect": false,
+ "inputTokens": 4775,
+ "outputTokens": 9,
+ "latencyMs": 45570.00233399996
},
{
"questionId": "q124",
@@ -13559,8 +20335,8 @@
"actual": "312818.50",
"isCorrect": true,
"inputTokens": 1558,
- "outputTokens": 4106,
- "latencyMs": 37997.09958400001
+ "outputTokens": 3594,
+ "latencyMs": 36589.136415999965
},
{
"questionId": "q124",
@@ -13571,7 +20347,18 @@
"isCorrect": false,
"inputTokens": 1504,
"outputTokens": 9,
- "latencyMs": 1184.0957090000156
+ "latencyMs": 1009.5284579999279
+ },
+ {
+ "questionId": "q124",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "312818.50",
+ "actual": "320000.00",
+ "isCorrect": false,
+ "inputTokens": 2262,
+ "outputTokens": 9,
+ "latencyMs": 11883.04608400003
},
{
"questionId": "q124",
@@ -13581,8 +20368,8 @@
"actual": "312818.50",
"isCorrect": true,
"inputTokens": 1436,
- "outputTokens": 3658,
- "latencyMs": 26945.63508400001
+ "outputTokens": 3402,
+ "latencyMs": 209516.903208
},
{
"questionId": "q124",
@@ -13593,29 +20380,51 @@
"isCorrect": false,
"inputTokens": 1440,
"outputTokens": 9,
- "latencyMs": 1162.16949999996
+ "latencyMs": 1453.1753339999123
},
{
"questionId": "q124",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "312818.50",
+ "actual": "329999.99",
+ "isCorrect": false,
+ "inputTokens": 2199,
+ "outputTokens": 9,
+ "latencyMs": 12329.097540999996
+ },
+ {
+ "questionId": "q124",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "312818.50",
"actual": "312818.50",
"isCorrect": true,
- "inputTokens": 3824,
- "outputTokens": 3722,
- "latencyMs": 27321.698167000024
+ "inputTokens": 4418,
+ "outputTokens": 3274,
+ "latencyMs": 32337.936125000007
},
{
"questionId": "q124",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "312818.50",
- "actual": "381,968.89",
+ "actual": "381,847.89",
"isCorrect": false,
- "inputTokens": 3410,
+ "inputTokens": 4782,
"outputTokens": 9,
- "latencyMs": 2065.7583339999546
+ "latencyMs": 990.2755830000388
+ },
+ {
+ "questionId": "q124",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "312818.50",
+ "actual": "300000.00",
+ "isCorrect": false,
+ "inputTokens": 5422,
+ "outputTokens": 9,
+ "latencyMs": 12093.661916999961
},
{
"questionId": "q124",
@@ -13625,8 +20434,8 @@
"actual": "312818.50",
"isCorrect": true,
"inputTokens": 2980,
- "outputTokens": 3658,
- "latencyMs": 28778.99891600001
+ "outputTokens": 6730,
+ "latencyMs": 45238.25570800004
},
{
"questionId": "q124",
@@ -13637,7 +20446,18 @@
"isCorrect": false,
"inputTokens": 3105,
"outputTokens": 9,
- "latencyMs": 1233.4267090000212
+ "latencyMs": 1242.9971659999574
+ },
+ {
+ "questionId": "q124",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "312818.50",
+ "actual": "369000.00",
+ "isCorrect": false,
+ "inputTokens": 3805,
+ "outputTokens": 9,
+ "latencyMs": 1604.1214169999585
},
{
"questionId": "q125",
@@ -13647,8 +20467,8 @@
"actual": "1811",
"isCorrect": true,
"inputTokens": 3709,
- "outputTokens": 2568,
- "latencyMs": 28626.692666999996
+ "outputTokens": 2184,
+ "latencyMs": 22585.809791999985
},
{
"questionId": "q125",
@@ -13659,7 +20479,18 @@
"isCorrect": false,
"inputTokens": 4078,
"outputTokens": 7,
- "latencyMs": 1133.735584000009
+ "latencyMs": 1230.1040829999838
+ },
+ {
+ "questionId": "q125",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "1811",
+ "actual": "1811",
+ "isCorrect": true,
+ "inputTokens": 4777,
+ "outputTokens": 4,
+ "latencyMs": 9357.454415999935
},
{
"questionId": "q125",
@@ -13669,8 +20500,8 @@
"actual": "1811",
"isCorrect": true,
"inputTokens": 1560,
- "outputTokens": 1672,
- "latencyMs": 14898.688125000044
+ "outputTokens": 2888,
+ "latencyMs": 19966.08491700003
},
{
"questionId": "q125",
@@ -13681,7 +20512,18 @@
"isCorrect": false,
"inputTokens": 1507,
"outputTokens": 7,
- "latencyMs": 1178.2744999999995
+ "latencyMs": 961.2437919999938
+ },
+ {
+ "questionId": "q125",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "1811",
+ "actual": "1811",
+ "isCorrect": true,
+ "inputTokens": 2264,
+ "outputTokens": 4,
+ "latencyMs": 9139.956667000079
},
{
"questionId": "q125",
@@ -13691,8 +20533,8 @@
"actual": "1811",
"isCorrect": true,
"inputTokens": 1438,
- "outputTokens": 1864,
- "latencyMs": 15225.964540999965
+ "outputTokens": 2504,
+ "latencyMs": 21066.86054100003
},
{
"questionId": "q125",
@@ -13703,29 +20545,51 @@
"isCorrect": false,
"inputTokens": 1443,
"outputTokens": 7,
- "latencyMs": 1077.2695419999654
+ "latencyMs": 902.673208000022
},
{
"questionId": "q125",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "1811",
+ "actual": "1811",
+ "isCorrect": true,
+ "inputTokens": 2201,
+ "outputTokens": 4,
+ "latencyMs": 7727.039290999994
+ },
+ {
+ "questionId": "q125",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "1811",
"actual": "1811",
"isCorrect": true,
- "inputTokens": 3826,
- "outputTokens": 1928,
- "latencyMs": 14057.434583000024
+ "inputTokens": 4420,
+ "outputTokens": 1864,
+ "latencyMs": 15644.210124999983
},
{
"questionId": "q125",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "1811",
- "actual": "1,454",
+ "actual": "1,532",
"isCorrect": false,
- "inputTokens": 3413,
+ "inputTokens": 4785,
"outputTokens": 7,
- "latencyMs": 1177.537500000035
+ "latencyMs": 1311.9297919999808
+ },
+ {
+ "questionId": "q125",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "1811",
+ "actual": "1811",
+ "isCorrect": true,
+ "inputTokens": 5424,
+ "outputTokens": 4,
+ "latencyMs": 11031.984583999962
},
{
"questionId": "q125",
@@ -13735,8 +20599,8 @@
"actual": "1811",
"isCorrect": true,
"inputTokens": 2982,
- "outputTokens": 2312,
- "latencyMs": 19125.74099999998
+ "outputTokens": 1928,
+ "latencyMs": 26268.215167000075
},
{
"questionId": "q125",
@@ -13747,7 +20611,18 @@
"isCorrect": false,
"inputTokens": 3108,
"outputTokens": 7,
- "latencyMs": 1047.243833000015
+ "latencyMs": 1283.3860000000568
+ },
+ {
+ "questionId": "q125",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "1811",
+ "actual": "1560",
+ "isCorrect": false,
+ "inputTokens": 3807,
+ "outputTokens": 4,
+ "latencyMs": 1390.9544999999925
},
{
"questionId": "q126",
@@ -13757,8 +20632,8 @@
"actual": "42",
"isCorrect": true,
"inputTokens": 3709,
- "outputTokens": 1735,
- "latencyMs": 14875.021707999986
+ "outputTokens": 1671,
+ "latencyMs": 18722.413541999995
},
{
"questionId": "q126",
@@ -13769,7 +20644,18 @@
"isCorrect": true,
"inputTokens": 4078,
"outputTokens": 5,
- "latencyMs": 1076.5694999999832
+ "latencyMs": 957.5536249999423
+ },
+ {
+ "questionId": "q126",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "42",
+ "actual": "47",
+ "isCorrect": false,
+ "inputTokens": 4779,
+ "outputTokens": 2,
+ "latencyMs": 1718.3615829999326
},
{
"questionId": "q126",
@@ -13779,8 +20665,8 @@
"actual": "42",
"isCorrect": true,
"inputTokens": 1560,
- "outputTokens": 2823,
- "latencyMs": 22604.422416999994
+ "outputTokens": 2439,
+ "latencyMs": 20739.166833000025
},
{
"questionId": "q126",
@@ -13791,7 +20677,18 @@
"isCorrect": true,
"inputTokens": 1507,
"outputTokens": 5,
- "latencyMs": 1451.705666999973
+ "latencyMs": 1305.5439999999944
+ },
+ {
+ "questionId": "q126",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "42",
+ "actual": "42",
+ "isCorrect": true,
+ "inputTokens": 2266,
+ "outputTokens": 2,
+ "latencyMs": 13351.089582999935
},
{
"questionId": "q126",
@@ -13801,8 +20698,8 @@
"actual": "42",
"isCorrect": true,
"inputTokens": 1438,
- "outputTokens": 2183,
- "latencyMs": 16916.007042000012
+ "outputTokens": 2567,
+ "latencyMs": 23067.457167000044
},
{
"questionId": "q126",
@@ -13813,29 +20710,51 @@
"isCorrect": true,
"inputTokens": 1443,
"outputTokens": 5,
- "latencyMs": 1103.1098750000237
+ "latencyMs": 1073.1606669999892
},
{
"questionId": "q126",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "42",
+ "actual": "42",
+ "isCorrect": true,
+ "inputTokens": 2203,
+ "outputTokens": 2,
+ "latencyMs": 22770.808125000098
+ },
+ {
+ "questionId": "q126",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "42",
"actual": "42",
"isCorrect": true,
- "inputTokens": 3826,
- "outputTokens": 2055,
- "latencyMs": 17162.629124999978
+ "inputTokens": 4420,
+ "outputTokens": 2439,
+ "latencyMs": 28125.872208000044
},
{
"questionId": "q126",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "42",
- "actual": "47",
+ "actual": "54",
"isCorrect": false,
- "inputTokens": 3413,
+ "inputTokens": 4785,
"outputTokens": 5,
- "latencyMs": 1150.0435000000289
+ "latencyMs": 1046.3992919999873
+ },
+ {
+ "questionId": "q126",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "42",
+ "actual": "42",
+ "isCorrect": true,
+ "inputTokens": 5426,
+ "outputTokens": 2,
+ "latencyMs": 12982.094000000041
},
{
"questionId": "q126",
@@ -13845,8 +20764,8 @@
"actual": "42",
"isCorrect": true,
"inputTokens": 2982,
- "outputTokens": 1607,
- "latencyMs": 14835.323333000008
+ "outputTokens": 2631,
+ "latencyMs": 31181.451875000028
},
{
"questionId": "q126",
@@ -13857,7 +20776,18 @@
"isCorrect": false,
"inputTokens": 3108,
"outputTokens": 5,
- "latencyMs": 1206.8219590000226
+ "latencyMs": 1418.826708000037
+ },
+ {
+ "questionId": "q126",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "42",
+ "actual": "49",
+ "isCorrect": false,
+ "inputTokens": 3809,
+ "outputTokens": 2,
+ "latencyMs": 2009.2083750000456
},
{
"questionId": "q127",
@@ -13867,8 +20797,8 @@
"actual": "28",
"isCorrect": true,
"inputTokens": 3709,
- "outputTokens": 1479,
- "latencyMs": 11560.967958000023
+ "outputTokens": 2503,
+ "latencyMs": 26827.34341699991
},
{
"questionId": "q127",
@@ -13879,7 +20809,18 @@
"isCorrect": false,
"inputTokens": 4078,
"outputTokens": 5,
- "latencyMs": 1151.9984169999952
+ "latencyMs": 1093.9559999998892
+ },
+ {
+ "questionId": "q127",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "28",
+ "actual": "28",
+ "isCorrect": true,
+ "inputTokens": 4779,
+ "outputTokens": 2,
+ "latencyMs": 18861.496042000013
},
{
"questionId": "q127",
@@ -13889,8 +20830,8 @@
"actual": "28",
"isCorrect": true,
"inputTokens": 1560,
- "outputTokens": 1927,
- "latencyMs": 15431.08262499998
+ "outputTokens": 1799,
+ "latencyMs": 18378.229374999995
},
{
"questionId": "q127",
@@ -13901,7 +20842,18 @@
"isCorrect": false,
"inputTokens": 1507,
"outputTokens": 5,
- "latencyMs": 1032.7485419999575
+ "latencyMs": 1111.1742920000106
+ },
+ {
+ "questionId": "q127",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "28",
+ "actual": "28",
+ "isCorrect": true,
+ "inputTokens": 2266,
+ "outputTokens": 2,
+ "latencyMs": 12380.956957999966
},
{
"questionId": "q127",
@@ -13911,8 +20863,8 @@
"actual": "28",
"isCorrect": true,
"inputTokens": 1438,
- "outputTokens": 1607,
- "latencyMs": 9425.883957999991
+ "outputTokens": 2055,
+ "latencyMs": 112325.29683300003
},
{
"questionId": "q127",
@@ -13923,29 +20875,51 @@
"isCorrect": false,
"inputTokens": 1443,
"outputTokens": 5,
- "latencyMs": 943.5942919999943
+ "latencyMs": 1231.2409169999883
},
{
"questionId": "q127",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "28",
+ "actual": "28",
+ "isCorrect": true,
+ "inputTokens": 2203,
+ "outputTokens": 2,
+ "latencyMs": 20394.07720900001
+ },
+ {
+ "questionId": "q127",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "28",
"actual": "28",
"isCorrect": true,
- "inputTokens": 3826,
- "outputTokens": 1927,
- "latencyMs": 16529.66529199999
+ "inputTokens": 4420,
+ "outputTokens": 1799,
+ "latencyMs": 22818.38325000007
},
{
"questionId": "q127",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "28",
"actual": "24",
"isCorrect": false,
- "inputTokens": 3413,
+ "inputTokens": 4785,
"outputTokens": 5,
- "latencyMs": 1107.5635419999599
+ "latencyMs": 1324.3675420000218
+ },
+ {
+ "questionId": "q127",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "28",
+ "actual": "28",
+ "isCorrect": true,
+ "inputTokens": 5426,
+ "outputTokens": 2,
+ "latencyMs": 14308.32895799994
},
{
"questionId": "q127",
@@ -13955,8 +20929,8 @@
"actual": "28",
"isCorrect": true,
"inputTokens": 2982,
- "outputTokens": 1863,
- "latencyMs": 21071.067082999973
+ "outputTokens": 2055,
+ "latencyMs": 22493.268166999915
},
{
"questionId": "q127",
@@ -13967,7 +20941,18 @@
"isCorrect": false,
"inputTokens": 3108,
"outputTokens": 5,
- "latencyMs": 1018.46212500002
+ "latencyMs": 1449.5348340000492
+ },
+ {
+ "questionId": "q127",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "28",
+ "actual": "31",
+ "isCorrect": false,
+ "inputTokens": 3809,
+ "outputTokens": 2,
+ "latencyMs": 1329.5626659999834
},
{
"questionId": "q128",
@@ -13977,8 +20962,8 @@
"actual": "11",
"isCorrect": true,
"inputTokens": 3709,
- "outputTokens": 1223,
- "latencyMs": 8242.37608300004
+ "outputTokens": 2183,
+ "latencyMs": 20410.59154199995
},
{
"questionId": "q128",
@@ -13989,7 +20974,18 @@
"isCorrect": true,
"inputTokens": 4078,
"outputTokens": 5,
- "latencyMs": 1052.7201249999925
+ "latencyMs": 1137.8916250000475
+ },
+ {
+ "questionId": "q128",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 4779,
+ "outputTokens": 2,
+ "latencyMs": 15306.355875000008
},
{
"questionId": "q128",
@@ -13999,8 +20995,8 @@
"actual": "11",
"isCorrect": true,
"inputTokens": 1560,
- "outputTokens": 903,
- "latencyMs": 5430.806291999994
+ "outputTokens": 967,
+ "latencyMs": 9355.326041999971
},
{
"questionId": "q128",
@@ -14011,7 +21007,18 @@
"isCorrect": false,
"inputTokens": 1507,
"outputTokens": 5,
- "latencyMs": 2354.328999999969
+ "latencyMs": 970.5706669999054
+ },
+ {
+ "questionId": "q128",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 2266,
+ "outputTokens": 2,
+ "latencyMs": 12738.58170900005
},
{
"questionId": "q128",
@@ -14021,8 +21028,8 @@
"actual": "11",
"isCorrect": true,
"inputTokens": 1438,
- "outputTokens": 1607,
- "latencyMs": 21944.211458000005
+ "outputTokens": 1095,
+ "latencyMs": 11532.495875000022
},
{
"questionId": "q128",
@@ -14033,29 +21040,51 @@
"isCorrect": true,
"inputTokens": 1443,
"outputTokens": 5,
- "latencyMs": 1249.9959590000217
+ "latencyMs": 1092.326875000028
},
{
"questionId": "q128",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 2203,
+ "outputTokens": 2,
+ "latencyMs": 9477.962708000094
+ },
+ {
+ "questionId": "q128",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "11",
"actual": "11",
"isCorrect": true,
- "inputTokens": 3826,
- "outputTokens": 1415,
- "latencyMs": 15465.409875000012
+ "inputTokens": 4420,
+ "outputTokens": 1287,
+ "latencyMs": 12363.918167000054
},
{
"questionId": "q128",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "11",
"actual": "11",
"isCorrect": true,
- "inputTokens": 3413,
+ "inputTokens": 4785,
"outputTokens": 5,
- "latencyMs": 1131.9575830000103
+ "latencyMs": 1086.439250000054
+ },
+ {
+ "questionId": "q128",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 5426,
+ "outputTokens": 2,
+ "latencyMs": 13847.167500000098
},
{
"questionId": "q128",
@@ -14065,8 +21094,8 @@
"actual": "11",
"isCorrect": true,
"inputTokens": 2982,
- "outputTokens": 2503,
- "latencyMs": 24744.971958999988
+ "outputTokens": 1607,
+ "latencyMs": 18025.304333999986
},
{
"questionId": "q128",
@@ -14077,7 +21106,18 @@
"isCorrect": true,
"inputTokens": 3108,
"outputTokens": 5,
- "latencyMs": 1274.6952499999898
+ "latencyMs": 1525.7963329999475
+ },
+ {
+ "questionId": "q128",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "11",
+ "actual": "11",
+ "isCorrect": true,
+ "inputTokens": 3809,
+ "outputTokens": 2,
+ "latencyMs": 11297.281415999983
},
{
"questionId": "q129",
@@ -14087,8 +21127,8 @@
"actual": "58",
"isCorrect": true,
"inputTokens": 3708,
- "outputTokens": 1351,
- "latencyMs": 12546.867542000022
+ "outputTokens": 1607,
+ "latencyMs": 16793.02033300011
},
{
"questionId": "q129",
@@ -14099,7 +21139,18 @@
"isCorrect": false,
"inputTokens": 4078,
"outputTokens": 5,
- "latencyMs": 1231.453749999986
+ "latencyMs": 1524.2867090000072
+ },
+ {
+ "questionId": "q129",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "58",
+ "actual": "58",
+ "isCorrect": true,
+ "inputTokens": 4777,
+ "outputTokens": 2,
+ "latencyMs": 20291.370166999986
},
{
"questionId": "q129",
@@ -14109,8 +21160,8 @@
"actual": "58",
"isCorrect": true,
"inputTokens": 1559,
- "outputTokens": 1543,
- "latencyMs": 16593.402166999993
+ "outputTokens": 2631,
+ "latencyMs": 31767.777667000075
},
{
"questionId": "q129",
@@ -14121,7 +21172,18 @@
"isCorrect": false,
"inputTokens": 1507,
"outputTokens": 5,
- "latencyMs": 1079.0991659999709
+ "latencyMs": 1128.108874999918
+ },
+ {
+ "questionId": "q129",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "58",
+ "actual": "58",
+ "isCorrect": true,
+ "inputTokens": 2264,
+ "outputTokens": 2,
+ "latencyMs": 17774.151832999894
},
{
"questionId": "q129",
@@ -14131,8 +21193,8 @@
"actual": "58",
"isCorrect": true,
"inputTokens": 1437,
- "outputTokens": 1543,
- "latencyMs": 10956.456084000005
+ "outputTokens": 2887,
+ "latencyMs": 24058.048583999975
},
{
"questionId": "q129",
@@ -14143,249 +21205,381 @@
"isCorrect": false,
"inputTokens": 1443,
"outputTokens": 5,
- "latencyMs": 2018.3774170000106
+ "latencyMs": 833.2049999999581
},
{
"questionId": "q129",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "58",
+ "actual": "58",
+ "isCorrect": true,
+ "inputTokens": 2201,
+ "outputTokens": 2,
+ "latencyMs": 7901.533541000099
+ },
+ {
+ "questionId": "q129",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "58",
"actual": "58",
"isCorrect": true,
- "inputTokens": 3825,
- "outputTokens": 1351,
- "latencyMs": 10537.598500000022
- },
- {
- "questionId": "q129",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "58",
- "actual": "47",
- "isCorrect": false,
- "inputTokens": 3413,
- "outputTokens": 5,
- "latencyMs": 1039.2452080000076
- },
- {
- "questionId": "q129",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "58",
- "actual": "58",
- "isCorrect": true,
- "inputTokens": 2981,
- "outputTokens": 839,
- "latencyMs": 8039.237708000001
- },
- {
- "questionId": "q129",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "58",
- "actual": "54",
- "isCorrect": false,
- "inputTokens": 3108,
- "outputTokens": 5,
- "latencyMs": 1264.6740829999908
- },
- {
- "questionId": "q130",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "41",
- "actual": "41",
- "isCorrect": true,
- "inputTokens": 3708,
- "outputTokens": 1863,
- "latencyMs": 14310.697374999989
- },
- {
- "questionId": "q130",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "41",
- "actual": "31",
- "isCorrect": false,
- "inputTokens": 4078,
- "outputTokens": 5,
- "latencyMs": 1138.4443339999998
- },
- {
- "questionId": "q130",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "41",
- "actual": "41",
- "isCorrect": true,
- "inputTokens": 1559,
- "outputTokens": 1927,
- "latencyMs": 16487.508375000034
- },
- {
- "questionId": "q130",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "41",
- "actual": "38",
- "isCorrect": false,
- "inputTokens": 1507,
- "outputTokens": 5,
- "latencyMs": 1104.2365410000202
- },
- {
- "questionId": "q130",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "41",
- "actual": "41",
- "isCorrect": true,
- "inputTokens": 1437,
- "outputTokens": 3015,
- "latencyMs": 23688.737208999984
- },
- {
- "questionId": "q130",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "41",
- "actual": "38",
- "isCorrect": false,
- "inputTokens": 1443,
- "outputTokens": 5,
- "latencyMs": 1026.8166249999776
- },
- {
- "questionId": "q130",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "41",
- "actual": "41",
- "isCorrect": true,
- "inputTokens": 3825,
- "outputTokens": 1671,
- "latencyMs": 12415.87070899998
- },
- {
- "questionId": "q130",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "41",
- "actual": "31",
- "isCorrect": false,
- "inputTokens": 3413,
- "outputTokens": 5,
- "latencyMs": 1062.2278749999823
- },
- {
- "questionId": "q130",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "41",
- "actual": "41",
- "isCorrect": true,
- "inputTokens": 2981,
- "outputTokens": 1799,
- "latencyMs": 15901.829415999993
- },
- {
- "questionId": "q130",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "41",
- "actual": "31",
- "isCorrect": false,
- "inputTokens": 3108,
- "outputTokens": 5,
- "latencyMs": 1051.6962910000002
- },
- {
- "questionId": "q131",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "23",
- "actual": "23",
- "isCorrect": true,
- "inputTokens": 3708,
- "outputTokens": 1863,
- "latencyMs": 15216.926500000001
- },
- {
- "questionId": "q131",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "23",
- "actual": "20",
- "isCorrect": false,
- "inputTokens": 4078,
- "outputTokens": 5,
- "latencyMs": 1460.9212079999852
- },
- {
- "questionId": "q131",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "23",
- "actual": "23",
- "isCorrect": true,
- "inputTokens": 1559,
- "outputTokens": 2567,
- "latencyMs": 27103.083999999973
- },
- {
- "questionId": "q131",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "23",
- "actual": "20",
- "isCorrect": false,
- "inputTokens": 1507,
- "outputTokens": 5,
- "latencyMs": 1101.5416669999831
- },
- {
- "questionId": "q131",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "23",
- "actual": "23",
- "isCorrect": true,
- "inputTokens": 1437,
- "outputTokens": 1543,
- "latencyMs": 14598.558207999973
- },
- {
- "questionId": "q131",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "23",
- "actual": "20",
- "isCorrect": false,
- "inputTokens": 1443,
- "outputTokens": 5,
- "latencyMs": 1270.7722910000011
- },
- {
- "questionId": "q131",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "23",
- "actual": "23",
- "isCorrect": true,
- "inputTokens": 3825,
+ "inputTokens": 4419,
"outputTokens": 1415,
- "latencyMs": 14102.604708999977
+ "latencyMs": 13345.296500000055
+ },
+ {
+ "questionId": "q129",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "58",
+ "actual": "54",
+ "isCorrect": false,
+ "inputTokens": 4785,
+ "outputTokens": 5,
+ "latencyMs": 1001.3450419999426
+ },
+ {
+ "questionId": "q129",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "58",
+ "actual": "55",
+ "isCorrect": false,
+ "inputTokens": 5424,
+ "outputTokens": 2,
+ "latencyMs": 2326.790707999957
+ },
+ {
+ "questionId": "q129",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "58",
+ "actual": "58",
+ "isCorrect": true,
+ "inputTokens": 2981,
+ "outputTokens": 1287,
+ "latencyMs": 14444.245874999906
+ },
+ {
+ "questionId": "q129",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "58",
+ "actual": "54",
+ "isCorrect": false,
+ "inputTokens": 3108,
+ "outputTokens": 5,
+ "latencyMs": 1060.1971249999478
+ },
+ {
+ "questionId": "q129",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "58",
+ "actual": "59",
+ "isCorrect": false,
+ "inputTokens": 3807,
+ "outputTokens": 2,
+ "latencyMs": 2816.4778749999823
+ },
+ {
+ "questionId": "q130",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 3708,
+ "outputTokens": 3015,
+ "latencyMs": 190630.39133400004
+ },
+ {
+ "questionId": "q130",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "41",
+ "actual": "31",
+ "isCorrect": false,
+ "inputTokens": 4078,
+ "outputTokens": 5,
+ "latencyMs": 5375.239707999979
+ },
+ {
+ "questionId": "q130",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 4777,
+ "outputTokens": 2,
+ "latencyMs": 19789.381042000023
+ },
+ {
+ "questionId": "q130",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 1559,
+ "outputTokens": 2055,
+ "latencyMs": 16472.23841599992
+ },
+ {
+ "questionId": "q130",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "41",
+ "actual": "38",
+ "isCorrect": false,
+ "inputTokens": 1507,
+ "outputTokens": 5,
+ "latencyMs": 1042.922583000036
+ },
+ {
+ "questionId": "q130",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 2264,
+ "outputTokens": 2,
+ "latencyMs": 13095.397083000047
+ },
+ {
+ "questionId": "q130",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 1437,
+ "outputTokens": 2311,
+ "latencyMs": 26893.475125000114
+ },
+ {
+ "questionId": "q130",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "41",
+ "actual": "38",
+ "isCorrect": false,
+ "inputTokens": 1443,
+ "outputTokens": 5,
+ "latencyMs": 1042.875250000041
+ },
+ {
+ "questionId": "q130",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 2201,
+ "outputTokens": 2,
+ "latencyMs": 28097.87474999996
+ },
+ {
+ "questionId": "q130",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "41",
+ "actual": "42",
+ "isCorrect": false,
+ "inputTokens": 4419,
+ "outputTokens": 1735,
+ "latencyMs": 14091.963709000032
+ },
+ {
+ "questionId": "q130",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "41",
+ "actual": "31",
+ "isCorrect": false,
+ "inputTokens": 4785,
+ "outputTokens": 5,
+ "latencyMs": 1151.6397919999436
+ },
+ {
+ "questionId": "q130",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 5424,
+ "outputTokens": 2,
+ "latencyMs": 15769.612874999992
+ },
+ {
+ "questionId": "q130",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 2981,
+ "outputTokens": 1799,
+ "latencyMs": 18804.838290999993
+ },
+ {
+ "questionId": "q130",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "41",
+ "actual": "31",
+ "isCorrect": false,
+ "inputTokens": 3108,
+ "outputTokens": 5,
+ "latencyMs": 1030.810417000088
+ },
+ {
+ "questionId": "q130",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "41",
+ "actual": "41",
+ "isCorrect": true,
+ "inputTokens": 3807,
+ "outputTokens": 2,
+ "latencyMs": 14482.474917000043
},
{
"questionId": "q131",
- "format": "markdown-kv",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "23",
+ "actual": "23",
+ "isCorrect": true,
+ "inputTokens": 3708,
+ "outputTokens": 1351,
+ "latencyMs": 21887.844958
+ },
+ {
+ "questionId": "q131",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "23",
+ "actual": "20",
+ "isCorrect": false,
+ "inputTokens": 4078,
+ "outputTokens": 5,
+ "latencyMs": 1332.5089160000207
+ },
+ {
+ "questionId": "q131",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "23",
+ "actual": "23",
+ "isCorrect": true,
+ "inputTokens": 4777,
+ "outputTokens": 2,
+ "latencyMs": 17226.03358399996
+ },
+ {
+ "questionId": "q131",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "23",
+ "actual": "23",
+ "isCorrect": true,
+ "inputTokens": 1559,
+ "outputTokens": 2055,
+ "latencyMs": 20772.763792000012
+ },
+ {
+ "questionId": "q131",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "23",
+ "actual": "20",
+ "isCorrect": false,
+ "inputTokens": 1507,
+ "outputTokens": 5,
+ "latencyMs": 966.6354170000413
+ },
+ {
+ "questionId": "q131",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "23",
+ "actual": "23",
+ "isCorrect": true,
+ "inputTokens": 2264,
+ "outputTokens": 2,
+ "latencyMs": 10442.985291999998
+ },
+ {
+ "questionId": "q131",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "23",
+ "actual": "23",
+ "isCorrect": true,
+ "inputTokens": 1437,
+ "outputTokens": 1095,
+ "latencyMs": 10072.030124999932
+ },
+ {
+ "questionId": "q131",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "23",
+ "actual": "20",
+ "isCorrect": false,
+ "inputTokens": 1443,
+ "outputTokens": 5,
+ "latencyMs": 1233.0955420000246
+ },
+ {
+ "questionId": "q131",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "23",
+ "actual": "23",
+ "isCorrect": true,
+ "inputTokens": 2201,
+ "outputTokens": 2,
+ "latencyMs": 18590.031917000073
+ },
+ {
+ "questionId": "q131",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "23",
+ "actual": "23",
+ "isCorrect": true,
+ "inputTokens": 4419,
+ "outputTokens": 1735,
+ "latencyMs": 17035.41470799991
+ },
+ {
+ "questionId": "q131",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "23",
"actual": "21",
"isCorrect": false,
- "inputTokens": 3413,
+ "inputTokens": 4785,
"outputTokens": 5,
- "latencyMs": 1251.4159170000348
+ "latencyMs": 994.0176249999786
+ },
+ {
+ "questionId": "q131",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "23",
+ "actual": "23",
+ "isCorrect": true,
+ "inputTokens": 5424,
+ "outputTokens": 2,
+ "latencyMs": 12477.123250000062
},
{
"questionId": "q131",
@@ -14395,8 +21589,8 @@
"actual": "23",
"isCorrect": true,
"inputTokens": 2981,
- "outputTokens": 1799,
- "latencyMs": 18696.684999999998
+ "outputTokens": 1479,
+ "latencyMs": 14346.053416999988
},
{
"questionId": "q131",
@@ -14407,7 +21601,18 @@
"isCorrect": false,
"inputTokens": 3108,
"outputTokens": 5,
- "latencyMs": 1170.9401669999934
+ "latencyMs": 1269.5552920000628
+ },
+ {
+ "questionId": "q131",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "23",
+ "actual": "23",
+ "isCorrect": true,
+ "inputTokens": 3807,
+ "outputTokens": 2,
+ "latencyMs": 13739.479209000012
},
{
"questionId": "q132",
@@ -14418,7 +21623,7 @@
"isCorrect": true,
"inputTokens": 15187,
"outputTokens": 136,
- "latencyMs": 2872.1482499999693
+ "latencyMs": 3680.113916000002
},
{
"questionId": "q132",
@@ -14429,7 +21634,18 @@
"isCorrect": true,
"inputTokens": 17409,
"outputTokens": 6,
- "latencyMs": 1382.586333000043
+ "latencyMs": 1548.528917000047
+ },
+ {
+ "questionId": "q132",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "430828",
+ "actual": "430828",
+ "isCorrect": true,
+ "inputTokens": 19991,
+ "outputTokens": 6,
+ "latencyMs": 1637.454792000004
},
{
"questionId": "q132",
@@ -14439,8 +21655,8 @@
"actual": "430828",
"isCorrect": true,
"inputTokens": 8788,
- "outputTokens": 904,
- "latencyMs": 9130.657125000027
+ "outputTokens": 776,
+ "latencyMs": 8918.199665999971
},
{
"questionId": "q132",
@@ -14451,7 +21667,18 @@
"isCorrect": true,
"inputTokens": 9279,
"outputTokens": 6,
- "latencyMs": 1164.3372080000117
+ "latencyMs": 1900.8446669999976
+ },
+ {
+ "questionId": "q132",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "430828",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12337,
+ "outputTokens": 1,
+ "latencyMs": 2677.7128749999683
},
{
"questionId": "q132",
@@ -14461,8 +21688,8 @@
"actual": "430828",
"isCorrect": true,
"inputTokens": 8556,
- "outputTokens": 648,
- "latencyMs": 7763.659999999974
+ "outputTokens": 712,
+ "latencyMs": 10733.462500000023
},
{
"questionId": "q132",
@@ -14473,29 +21700,51 @@
"isCorrect": true,
"inputTokens": 9125,
"outputTokens": 6,
- "latencyMs": 1331.3139999999548
+ "latencyMs": 1135.363000000012
},
{
"questionId": "q132",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "430828",
+ "actual": "430828",
+ "isCorrect": true,
+ "inputTokens": 12207,
+ "outputTokens": 6,
+ "latencyMs": 1007.8897500000894
+ },
+ {
+ "questionId": "q132",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "430828",
"actual": "430828",
"isCorrect": true,
- "inputTokens": 15481,
- "outputTokens": 584,
- "latencyMs": 9411.661499999987
+ "inputTokens": 17138,
+ "outputTokens": 328,
+ "latencyMs": 7708.789500000072
},
{
"questionId": "q132",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "430828",
"actual": "430828",
"isCorrect": true,
- "inputTokens": 15367,
+ "inputTokens": 19804,
"outputTokens": 6,
- "latencyMs": 1272.1991249999846
+ "latencyMs": 1477.8527500000782
+ },
+ {
+ "questionId": "q132",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "430828",
+ "actual": "430828",
+ "isCorrect": true,
+ "inputTokens": 21881,
+ "outputTokens": 6,
+ "latencyMs": 2380.750500000082
},
{
"questionId": "q132",
@@ -14505,8 +21754,8 @@
"actual": "430828",
"isCorrect": true,
"inputTokens": 13171,
- "outputTokens": 200,
- "latencyMs": 3587.8712090000045
+ "outputTokens": 328,
+ "latencyMs": 9429.131750000059
},
{
"questionId": "q132",
@@ -14517,7 +21766,18 @@
"isCorrect": true,
"inputTokens": 14483,
"outputTokens": 6,
- "latencyMs": 1710.5899999999674
+ "latencyMs": 1359.2385419999482
+ },
+ {
+ "questionId": "q132",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "430828",
+ "actual": "430828",
+ "isCorrect": true,
+ "inputTokens": 17076,
+ "outputTokens": 6,
+ "latencyMs": 1939.293042000034
},
{
"questionId": "q133",
@@ -14527,8 +21787,8 @@
"actual": "11798",
"isCorrect": true,
"inputTokens": 15189,
- "outputTokens": 328,
- "latencyMs": 3625.780167000019
+ "outputTokens": 392,
+ "latencyMs": 6479.065457999939
},
{
"questionId": "q133",
@@ -14539,7 +21799,18 @@
"isCorrect": true,
"inputTokens": 17410,
"outputTokens": 6,
- "latencyMs": 1785.2782080000034
+ "latencyMs": 1155.017041999963
+ },
+ {
+ "questionId": "q133",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "11798",
+ "actual": "11798",
+ "isCorrect": true,
+ "inputTokens": 19992,
+ "outputTokens": 5,
+ "latencyMs": 2049.621832999983
},
{
"questionId": "q133",
@@ -14549,8 +21820,8 @@
"actual": "11798",
"isCorrect": true,
"inputTokens": 8790,
- "outputTokens": 712,
- "latencyMs": 6381.770374999964
+ "outputTokens": 648,
+ "latencyMs": 11672.019874999998
},
{
"questionId": "q133",
@@ -14561,7 +21832,18 @@
"isCorrect": true,
"inputTokens": 9280,
"outputTokens": 6,
- "latencyMs": 1352.5436660000123
+ "latencyMs": 1597.3725000000559
+ },
+ {
+ "questionId": "q133",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "11798",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12338,
+ "outputTokens": 1,
+ "latencyMs": 11414.63520800008
},
{
"questionId": "q133",
@@ -14571,8 +21853,8 @@
"actual": "11798",
"isCorrect": true,
"inputTokens": 8558,
- "outputTokens": 520,
- "latencyMs": 27916.417874999985
+ "outputTokens": 584,
+ "latencyMs": 15138.947667
},
{
"questionId": "q133",
@@ -14583,29 +21865,51 @@
"isCorrect": true,
"inputTokens": 9126,
"outputTokens": 6,
- "latencyMs": 2073.8068330000388
+ "latencyMs": 1173.9259160000365
},
{
"questionId": "q133",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "11798",
+ "actual": "11798",
+ "isCorrect": true,
+ "inputTokens": 12208,
+ "outputTokens": 5,
+ "latencyMs": 2788.6645000000717
+ },
+ {
+ "questionId": "q133",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "11798",
"actual": "11798",
"isCorrect": true,
- "inputTokens": 15483,
+ "inputTokens": 17140,
"outputTokens": 328,
- "latencyMs": 5943.872542000026
+ "latencyMs": 4541.789875000017
},
{
"questionId": "q133",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "11798",
"actual": "11798",
"isCorrect": true,
- "inputTokens": 15368,
+ "inputTokens": 19805,
"outputTokens": 6,
- "latencyMs": 1767.4393339999951
+ "latencyMs": 1787.0144160001073
+ },
+ {
+ "questionId": "q133",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "11798",
+ "actual": "11798",
+ "isCorrect": true,
+ "inputTokens": 21882,
+ "outputTokens": 5,
+ "latencyMs": 3930.188833000022
},
{
"questionId": "q133",
@@ -14616,7 +21920,7 @@
"isCorrect": true,
"inputTokens": 13173,
"outputTokens": 264,
- "latencyMs": 3115.895124999981
+ "latencyMs": 4459.655541999964
},
{
"questionId": "q133",
@@ -14627,7 +21931,18 @@
"isCorrect": true,
"inputTokens": 14484,
"outputTokens": 6,
- "latencyMs": 1183.2249999999767
+ "latencyMs": 1239.003000000026
+ },
+ {
+ "questionId": "q133",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "11798",
+ "actual": "11798",
+ "isCorrect": true,
+ "inputTokens": 17077,
+ "outputTokens": 5,
+ "latencyMs": 4828.425707999966
},
{
"questionId": "q134",
@@ -14637,8 +21952,8 @@
"actual": "183631",
"isCorrect": true,
"inputTokens": 15192,
- "outputTokens": 392,
- "latencyMs": 4991.646125000028
+ "outputTokens": 200,
+ "latencyMs": 4039.568958000047
},
{
"questionId": "q134",
@@ -14649,7 +21964,18 @@
"isCorrect": true,
"inputTokens": 17412,
"outputTokens": 6,
- "latencyMs": 1835.4077919999836
+ "latencyMs": 1455.9585000000661
+ },
+ {
+ "questionId": "q134",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "183631",
+ "actual": "183631",
+ "isCorrect": true,
+ "inputTokens": 19995,
+ "outputTokens": 6,
+ "latencyMs": 1600.7708750000456
},
{
"questionId": "q134",
@@ -14659,8 +21985,8 @@
"actual": "183631",
"isCorrect": true,
"inputTokens": 8793,
- "outputTokens": 712,
- "latencyMs": 7788.013291999989
+ "outputTokens": 456,
+ "latencyMs": 5973.896042000037
},
{
"questionId": "q134",
@@ -14671,7 +21997,18 @@
"isCorrect": true,
"inputTokens": 9282,
"outputTokens": 6,
- "latencyMs": 1082.4066669999738
+ "latencyMs": 2000.6470419999678
+ },
+ {
+ "questionId": "q134",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "183631",
+ "actual": "183631",
+ "isCorrect": true,
+ "inputTokens": 12341,
+ "outputTokens": 6,
+ "latencyMs": 2543.431542000035
},
{
"questionId": "q134",
@@ -14681,8 +22018,8 @@
"actual": "183631",
"isCorrect": true,
"inputTokens": 8561,
- "outputTokens": 520,
- "latencyMs": 5664.896500000032
+ "outputTokens": 648,
+ "latencyMs": 6973.037040999974
},
{
"questionId": "q134",
@@ -14693,29 +22030,51 @@
"isCorrect": true,
"inputTokens": 9128,
"outputTokens": 6,
- "latencyMs": 1215.8875830000034
+ "latencyMs": 1655.6718330000294
},
{
"questionId": "q134",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "183631",
+ "actual": "183631",
+ "isCorrect": true,
+ "inputTokens": 12211,
+ "outputTokens": 6,
+ "latencyMs": 2357.3444590000436
+ },
+ {
+ "questionId": "q134",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "183631",
"actual": "183631",
"isCorrect": true,
- "inputTokens": 15486,
- "outputTokens": 456,
- "latencyMs": 5141.449292000034
+ "inputTokens": 17143,
+ "outputTokens": 392,
+ "latencyMs": 6136.790167000028
},
{
"questionId": "q134",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "183631",
"actual": "183631",
"isCorrect": true,
- "inputTokens": 15370,
+ "inputTokens": 19807,
"outputTokens": 6,
- "latencyMs": 1483.2090420000022
+ "latencyMs": 2510.24762499996
+ },
+ {
+ "questionId": "q134",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "183631",
+ "actual": "183631",
+ "isCorrect": true,
+ "inputTokens": 21885,
+ "outputTokens": 6,
+ "latencyMs": 1737.0276670000749
},
{
"questionId": "q134",
@@ -14725,8 +22084,8 @@
"actual": "183631",
"isCorrect": true,
"inputTokens": 13176,
- "outputTokens": 328,
- "latencyMs": 7532.760624999995
+ "outputTokens": 520,
+ "latencyMs": 5081.17487499991
},
{
"questionId": "q134",
@@ -14737,7 +22096,18 @@
"isCorrect": true,
"inputTokens": 14486,
"outputTokens": 6,
- "latencyMs": 1458.0657500000088
+ "latencyMs": 1191.4632079999428
+ },
+ {
+ "questionId": "q134",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "183631",
+ "actual": "183631",
+ "isCorrect": true,
+ "inputTokens": 17080,
+ "outputTokens": 6,
+ "latencyMs": 1325.217249999987
},
{
"questionId": "q135",
@@ -14747,8 +22117,8 @@
"actual": "29246",
"isCorrect": true,
"inputTokens": 15191,
- "outputTokens": 392,
- "latencyMs": 7922.4705829999875
+ "outputTokens": 328,
+ "latencyMs": 3314.1483749999898
},
{
"questionId": "q135",
@@ -14759,7 +22129,18 @@
"isCorrect": true,
"inputTokens": 17412,
"outputTokens": 6,
- "latencyMs": 1510.0054579999996
+ "latencyMs": 1204.2171249999665
+ },
+ {
+ "questionId": "q135",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "29246",
+ "actual": "29246",
+ "isCorrect": true,
+ "inputTokens": 19994,
+ "outputTokens": 5,
+ "latencyMs": 2558.019417000003
},
{
"questionId": "q135",
@@ -14769,8 +22150,8 @@
"actual": "29246",
"isCorrect": true,
"inputTokens": 8792,
- "outputTokens": 776,
- "latencyMs": 8475.77466699999
+ "outputTokens": 968,
+ "latencyMs": 11319.296415999997
},
{
"questionId": "q135",
@@ -14781,7 +22162,18 @@
"isCorrect": true,
"inputTokens": 9282,
"outputTokens": 6,
- "latencyMs": 1203.3620419999934
+ "latencyMs": 1324.4548749999376
+ },
+ {
+ "questionId": "q135",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "29246",
+ "actual": "29246",
+ "isCorrect": true,
+ "inputTokens": 12340,
+ "outputTokens": 5,
+ "latencyMs": 2740.4004170000553
},
{
"questionId": "q135",
@@ -14791,8 +22183,8 @@
"actual": "29246",
"isCorrect": true,
"inputTokens": 8560,
- "outputTokens": 776,
- "latencyMs": 7283.84258300002
+ "outputTokens": 392,
+ "latencyMs": 7471.323291999986
},
{
"questionId": "q135",
@@ -14803,29 +22195,51 @@
"isCorrect": true,
"inputTokens": 9128,
"outputTokens": 6,
- "latencyMs": 1365.2434169999906
+ "latencyMs": 1267.6016660000896
},
{
"questionId": "q135",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "29246",
+ "actual": "29246",
+ "isCorrect": true,
+ "inputTokens": 12210,
+ "outputTokens": 5,
+ "latencyMs": 28672.12370799994
+ },
+ {
+ "questionId": "q135",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "29246",
"actual": "29246",
"isCorrect": true,
- "inputTokens": 15485,
- "outputTokens": 520,
- "latencyMs": 5846.538916999998
+ "inputTokens": 17142,
+ "outputTokens": 392,
+ "latencyMs": 12836.502833000035
},
{
"questionId": "q135",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "29246",
"actual": "29246",
"isCorrect": true,
- "inputTokens": 15370,
+ "inputTokens": 19807,
"outputTokens": 6,
- "latencyMs": 1203.6220829999656
+ "latencyMs": 2346.9032910000533
+ },
+ {
+ "questionId": "q135",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "29246",
+ "actual": "29246",
+ "isCorrect": true,
+ "inputTokens": 21884,
+ "outputTokens": 5,
+ "latencyMs": 2969.614082999993
},
{
"questionId": "q135",
@@ -14835,8 +22249,8 @@
"actual": "29246",
"isCorrect": true,
"inputTokens": 13175,
- "outputTokens": 456,
- "latencyMs": 5973.848832999996
+ "outputTokens": 392,
+ "latencyMs": 5687.641541999998
},
{
"questionId": "q135",
@@ -14847,7 +22261,18 @@
"isCorrect": true,
"inputTokens": 14486,
"outputTokens": 6,
- "latencyMs": 1189.811875000014
+ "latencyMs": 1316.798792000045
+ },
+ {
+ "questionId": "q135",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "29246",
+ "actual": "29246",
+ "isCorrect": true,
+ "inputTokens": 17079,
+ "outputTokens": 5,
+ "latencyMs": 2823.280541000073
},
{
"questionId": "q136",
@@ -14857,8 +22282,8 @@
"actual": "135306",
"isCorrect": true,
"inputTokens": 15187,
- "outputTokens": 328,
- "latencyMs": 8872.252957999997
+ "outputTokens": 392,
+ "latencyMs": 5053.899791999953
},
{
"questionId": "q136",
@@ -14869,7 +22294,18 @@
"isCorrect": true,
"inputTokens": 17407,
"outputTokens": 6,
- "latencyMs": 1775.476083000016
+ "latencyMs": 2537.008167000022
+ },
+ {
+ "questionId": "q136",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "135306",
+ "actual": "135306",
+ "isCorrect": true,
+ "inputTokens": 19991,
+ "outputTokens": 6,
+ "latencyMs": 1954.4713340000017
},
{
"questionId": "q136",
@@ -14879,8 +22315,8 @@
"actual": "135306",
"isCorrect": true,
"inputTokens": 8788,
- "outputTokens": 648,
- "latencyMs": 7149.649291000038
+ "outputTokens": 3208,
+ "latencyMs": 26572.223459
},
{
"questionId": "q136",
@@ -14891,7 +22327,18 @@
"isCorrect": true,
"inputTokens": 9277,
"outputTokens": 6,
- "latencyMs": 1577.2079999999842
+ "latencyMs": 1112.2888329999987
+ },
+ {
+ "questionId": "q136",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "135306",
+ "actual": "135306",
+ "isCorrect": true,
+ "inputTokens": 12337,
+ "outputTokens": 6,
+ "latencyMs": 2422.114500000025
},
{
"questionId": "q136",
@@ -14901,8 +22348,8 @@
"actual": "135306",
"isCorrect": true,
"inputTokens": 8556,
- "outputTokens": 1288,
- "latencyMs": 11344.462834000005
+ "outputTokens": 1352,
+ "latencyMs": 15821.266082999995
},
{
"questionId": "q136",
@@ -14913,29 +22360,51 @@
"isCorrect": true,
"inputTokens": 9123,
"outputTokens": 6,
- "latencyMs": 1340.27887499996
+ "latencyMs": 1033.3786669999827
},
{
"questionId": "q136",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "135306",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12207,
+ "outputTokens": 1,
+ "latencyMs": 1657.3498749999562
+ },
+ {
+ "questionId": "q136",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "135306",
"actual": "135306",
"isCorrect": true,
- "inputTokens": 15481,
- "outputTokens": 392,
- "latencyMs": 6256.696250000037
+ "inputTokens": 17138,
+ "outputTokens": 328,
+ "latencyMs": 4357.477583000087
},
{
"questionId": "q136",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "135306",
"actual": "135306",
"isCorrect": true,
- "inputTokens": 15365,
+ "inputTokens": 19802,
"outputTokens": 6,
- "latencyMs": 1604.6909999999916
+ "latencyMs": 1578.6591250000056
+ },
+ {
+ "questionId": "q136",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "135306",
+ "actual": "135306",
+ "isCorrect": true,
+ "inputTokens": 21881,
+ "outputTokens": 6,
+ "latencyMs": 16684.568500000052
},
{
"questionId": "q136",
@@ -14945,8 +22414,8 @@
"actual": "135306",
"isCorrect": true,
"inputTokens": 13171,
- "outputTokens": 456,
- "latencyMs": 5982.022666999954
+ "outputTokens": 712,
+ "latencyMs": 7845.738333999994
},
{
"questionId": "q136",
@@ -14957,18 +22426,29 @@
"isCorrect": true,
"inputTokens": 14481,
"outputTokens": 6,
- "latencyMs": 1259.2409589999588
+ "latencyMs": 1408.234832999995
+ },
+ {
+ "questionId": "q136",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "135306",
+ "actual": "135306",
+ "isCorrect": true,
+ "inputTokens": 17076,
+ "outputTokens": 6,
+ "latencyMs": 3420.9656670000404
},
{
"questionId": "q137",
"format": "json",
"model": "gpt-5-nano",
"expected": "24914",
- "actual": "24914",
- "isCorrect": true,
+ "actual": "not found",
+ "isCorrect": false,
"inputTokens": 15186,
- "outputTokens": 200,
- "latencyMs": 2858.1693749999977
+ "outputTokens": 1608,
+ "latencyMs": 16271.314957999974
},
{
"questionId": "q137",
@@ -14979,7 +22459,18 @@
"isCorrect": true,
"inputTokens": 17408,
"outputTokens": 6,
- "latencyMs": 1786.5725000000093
+ "latencyMs": 1741.4425829999382
+ },
+ {
+ "questionId": "q137",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "24914",
+ "actual": "24914",
+ "isCorrect": true,
+ "inputTokens": 19991,
+ "outputTokens": 5,
+ "latencyMs": 4409.774542000028
},
{
"questionId": "q137",
@@ -14989,8 +22480,8 @@
"actual": "24914",
"isCorrect": true,
"inputTokens": 8787,
- "outputTokens": 2696,
- "latencyMs": 23868.72975
+ "outputTokens": 1736,
+ "latencyMs": 16616.36137499998
},
{
"questionId": "q137",
@@ -15001,18 +22492,29 @@
"isCorrect": true,
"inputTokens": 9278,
"outputTokens": 6,
- "latencyMs": 1116.0275000000256
+ "latencyMs": 1489.443333000061
+ },
+ {
+ "questionId": "q137",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "24914",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12337,
+ "outputTokens": 1,
+ "latencyMs": 2424.8680840000743
},
{
"questionId": "q137",
"format": "csv",
"model": "gpt-5-nano",
"expected": "24914",
- "actual": "0",
- "isCorrect": false,
+ "actual": "24914",
+ "isCorrect": true,
"inputTokens": 8555,
- "outputTokens": 1543,
- "latencyMs": 17006.341916999954
+ "outputTokens": 2952,
+ "latencyMs": 26078.49774999998
},
{
"questionId": "q137",
@@ -15023,29 +22525,51 @@
"isCorrect": true,
"inputTokens": 9124,
"outputTokens": 6,
- "latencyMs": 1425.7799160000286
+ "latencyMs": 1111.9479170000413
},
{
"questionId": "q137",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
"expected": "24914",
"actual": "24914",
"isCorrect": true,
- "inputTokens": 15480,
- "outputTokens": 648,
- "latencyMs": 8414.583791000012
+ "inputTokens": 12207,
+ "outputTokens": 5,
+ "latencyMs": 2661.1345420000143
},
{
"questionId": "q137",
- "format": "markdown-kv",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "24914",
+ "actual": "not found",
+ "isCorrect": false,
+ "inputTokens": 17137,
+ "outputTokens": 3464,
+ "latencyMs": 36029.06325000001
+ },
+ {
+ "questionId": "q137",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "24914",
"actual": "24914",
"isCorrect": true,
- "inputTokens": 15366,
+ "inputTokens": 19803,
"outputTokens": 6,
- "latencyMs": 1374.9217920000083
+ "latencyMs": 1756.511334000039
+ },
+ {
+ "questionId": "q137",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "24914",
+ "actual": "24914",
+ "isCorrect": true,
+ "inputTokens": 21881,
+ "outputTokens": 5,
+ "latencyMs": 1706.1073340000585
},
{
"questionId": "q137",
@@ -15055,8 +22579,8 @@
"actual": "24914",
"isCorrect": true,
"inputTokens": 13170,
- "outputTokens": 456,
- "latencyMs": 6113.31808300002
+ "outputTokens": 968,
+ "latencyMs": 8245.267290999996
},
{
"questionId": "q137",
@@ -15067,7 +22591,18 @@
"isCorrect": true,
"inputTokens": 14482,
"outputTokens": 6,
- "latencyMs": 1374.9246660000063
+ "latencyMs": 1405.9593330000062
+ },
+ {
+ "questionId": "q137",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "24914",
+ "actual": "24914",
+ "isCorrect": true,
+ "inputTokens": 17076,
+ "outputTokens": 5,
+ "latencyMs": 2634.141583000077
},
{
"questionId": "q138",
@@ -15077,8 +22612,8 @@
"actual": "111683",
"isCorrect": true,
"inputTokens": 15186,
- "outputTokens": 392,
- "latencyMs": 5410.596499999985
+ "outputTokens": 520,
+ "latencyMs": 6238.670834000106
},
{
"questionId": "q138",
@@ -15089,7 +22624,18 @@
"isCorrect": true,
"inputTokens": 17407,
"outputTokens": 6,
- "latencyMs": 1607.6261659999727
+ "latencyMs": 1915.2061669999966
+ },
+ {
+ "questionId": "q138",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "111683",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 19990,
+ "outputTokens": 1,
+ "latencyMs": 15802.735749999993
},
{
"questionId": "q138",
@@ -15099,8 +22645,8 @@
"actual": "111683",
"isCorrect": true,
"inputTokens": 8787,
- "outputTokens": 520,
- "latencyMs": 6469.81479199999
+ "outputTokens": 840,
+ "latencyMs": 9492.533834000002
},
{
"questionId": "q138",
@@ -15111,7 +22657,18 @@
"isCorrect": true,
"inputTokens": 9277,
"outputTokens": 6,
- "latencyMs": 1103.9521250000107
+ "latencyMs": 1264.6480839999858
+ },
+ {
+ "questionId": "q138",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "111683",
+ "actual": "111683",
+ "isCorrect": true,
+ "inputTokens": 12336,
+ "outputTokens": 6,
+ "latencyMs": 2581.858165999991
},
{
"questionId": "q138",
@@ -15121,8 +22678,8 @@
"actual": "111683",
"isCorrect": true,
"inputTokens": 8555,
- "outputTokens": 904,
- "latencyMs": 8993.236791000003
+ "outputTokens": 1736,
+ "latencyMs": 20963.487291999976
},
{
"questionId": "q138",
@@ -15133,29 +22690,51 @@
"isCorrect": true,
"inputTokens": 9123,
"outputTokens": 6,
- "latencyMs": 1118.0249590000021
+ "latencyMs": 2031.7733340000268
},
{
"questionId": "q138",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "111683",
+ "actual": "111683",
+ "isCorrect": true,
+ "inputTokens": 12206,
+ "outputTokens": 6,
+ "latencyMs": 2651.7060409999685
+ },
+ {
+ "questionId": "q138",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "111683",
"actual": "111683",
"isCorrect": true,
- "inputTokens": 15480,
- "outputTokens": 392,
- "latencyMs": 4705.902084000001
+ "inputTokens": 17137,
+ "outputTokens": 520,
+ "latencyMs": 5960.176208000048
},
{
"questionId": "q138",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "111683",
"actual": "111683",
"isCorrect": true,
- "inputTokens": 15365,
+ "inputTokens": 19802,
"outputTokens": 6,
- "latencyMs": 1454.1250839999993
+ "latencyMs": 1636.6764170000097
+ },
+ {
+ "questionId": "q138",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "111683",
+ "actual": "111683",
+ "isCorrect": true,
+ "inputTokens": 21880,
+ "outputTokens": 6,
+ "latencyMs": 1322.0868340000743
},
{
"questionId": "q138",
@@ -15165,8 +22744,8 @@
"actual": "111683",
"isCorrect": true,
"inputTokens": 13170,
- "outputTokens": 456,
- "latencyMs": 5041.734750000003
+ "outputTokens": 264,
+ "latencyMs": 5836.014208000037
},
{
"questionId": "q138",
@@ -15177,7 +22756,18 @@
"isCorrect": true,
"inputTokens": 14481,
"outputTokens": 6,
- "latencyMs": 1199.9473330000183
+ "latencyMs": 1280.6878750000615
+ },
+ {
+ "questionId": "q138",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "111683",
+ "actual": "111683",
+ "isCorrect": true,
+ "inputTokens": 17075,
+ "outputTokens": 6,
+ "latencyMs": 3788.612332999939
},
{
"questionId": "q139",
@@ -15187,8 +22777,8 @@
"actual": "13364",
"isCorrect": true,
"inputTokens": 15193,
- "outputTokens": 328,
- "latencyMs": 4364.900083000015
+ "outputTokens": 456,
+ "latencyMs": 6374.532041999977
},
{
"questionId": "q139",
@@ -15199,7 +22789,18 @@
"isCorrect": true,
"inputTokens": 17412,
"outputTokens": 6,
- "latencyMs": 1320.7056250000023
+ "latencyMs": 1435.1170410000486
+ },
+ {
+ "questionId": "q139",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "13364",
+ "actual": "13364",
+ "isCorrect": true,
+ "inputTokens": 19995,
+ "outputTokens": 5,
+ "latencyMs": 2480.6709170000395
},
{
"questionId": "q139",
@@ -15210,7 +22811,7 @@
"isCorrect": true,
"inputTokens": 8794,
"outputTokens": 904,
- "latencyMs": 8590.36599999998
+ "latencyMs": 10770.860708000022
},
{
"questionId": "q139",
@@ -15221,7 +22822,18 @@
"isCorrect": true,
"inputTokens": 9282,
"outputTokens": 6,
- "latencyMs": 1166.0237089999719
+ "latencyMs": 1362.2076670000097
+ },
+ {
+ "questionId": "q139",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "13364",
+ "actual": "13364",
+ "isCorrect": true,
+ "inputTokens": 12341,
+ "outputTokens": 5,
+ "latencyMs": 1725.4546669999836
},
{
"questionId": "q139",
@@ -15231,8 +22843,8 @@
"actual": "13364",
"isCorrect": true,
"inputTokens": 8562,
- "outputTokens": 648,
- "latencyMs": 6442.057417000004
+ "outputTokens": 776,
+ "latencyMs": 7485.538915999932
},
{
"questionId": "q139",
@@ -15243,29 +22855,51 @@
"isCorrect": true,
"inputTokens": 9128,
"outputTokens": 6,
- "latencyMs": 1342.8652910000528
+ "latencyMs": 1517.6439580000006
},
{
"questionId": "q139",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "13364",
+ "actual": "13364",
+ "isCorrect": true,
+ "inputTokens": 12211,
+ "outputTokens": 5,
+ "latencyMs": 3422.7879589999793
+ },
+ {
+ "questionId": "q139",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "13364",
"actual": "13364",
"isCorrect": true,
- "inputTokens": 15487,
- "outputTokens": 264,
- "latencyMs": 4450.340833000024
+ "inputTokens": 17144,
+ "outputTokens": 456,
+ "latencyMs": 9032.850083000027
},
{
"questionId": "q139",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "13364",
"actual": "13364",
"isCorrect": true,
- "inputTokens": 15370,
+ "inputTokens": 19807,
"outputTokens": 6,
- "latencyMs": 1551.4001249999856
+ "latencyMs": 1400.4656250000698
+ },
+ {
+ "questionId": "q139",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "13364",
+ "actual": "13364",
+ "isCorrect": true,
+ "inputTokens": 21885,
+ "outputTokens": 5,
+ "latencyMs": 1666.045665999991
},
{
"questionId": "q139",
@@ -15275,8 +22909,8 @@
"actual": "13364",
"isCorrect": true,
"inputTokens": 13177,
- "outputTokens": 520,
- "latencyMs": 5858.679374999949
+ "outputTokens": 264,
+ "latencyMs": 3696.009834000026
},
{
"questionId": "q139",
@@ -15287,7 +22921,18 @@
"isCorrect": true,
"inputTokens": 14486,
"outputTokens": 6,
- "latencyMs": 1173.6422499999753
+ "latencyMs": 1177.9945420000004
+ },
+ {
+ "questionId": "q139",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "13364",
+ "actual": "13364",
+ "isCorrect": true,
+ "inputTokens": 17080,
+ "outputTokens": 5,
+ "latencyMs": 1399.2657909999834
},
{
"questionId": "q140",
@@ -15297,8 +22942,8 @@
"actual": "98464",
"isCorrect": true,
"inputTokens": 15185,
- "outputTokens": 456,
- "latencyMs": 6377.878708000004
+ "outputTokens": 520,
+ "latencyMs": 8902.311666999944
},
{
"questionId": "q140",
@@ -15309,7 +22954,18 @@
"isCorrect": true,
"inputTokens": 17405,
"outputTokens": 6,
- "latencyMs": 1312.9188750000321
+ "latencyMs": 1588.589624999906
+ },
+ {
+ "questionId": "q140",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "98464",
+ "actual": "98464",
+ "isCorrect": true,
+ "inputTokens": 19989,
+ "outputTokens": 5,
+ "latencyMs": 2070.6354159999173
},
{
"questionId": "q140",
@@ -15319,8 +22975,8 @@
"actual": "98464",
"isCorrect": true,
"inputTokens": 8786,
- "outputTokens": 4680,
- "latencyMs": 36395.80937499995
+ "outputTokens": 1736,
+ "latencyMs": 19399.512374999933
},
{
"questionId": "q140",
@@ -15331,18 +22987,29 @@
"isCorrect": true,
"inputTokens": 9275,
"outputTokens": 6,
- "latencyMs": 2024.6539580000099
+ "latencyMs": 1322.7961249999935
+ },
+ {
+ "questionId": "q140",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "98464",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12335,
+ "outputTokens": 1,
+ "latencyMs": 2467.938582999981
},
{
"questionId": "q140",
"format": "csv",
"model": "gpt-5-nano",
"expected": "98464",
- "actual": "98464",
- "isCorrect": true,
+ "actual": "Not found",
+ "isCorrect": false,
"inputTokens": 8554,
- "outputTokens": 3784,
- "latencyMs": 30336.309707999986
+ "outputTokens": 4808,
+ "latencyMs": 46970.624375000014
},
{
"questionId": "q140",
@@ -15353,29 +23020,51 @@
"isCorrect": true,
"inputTokens": 9121,
"outputTokens": 6,
- "latencyMs": 1237.6976249999716
+ "latencyMs": 1310.4520839999896
},
{
"questionId": "q140",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "98464",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12205,
+ "outputTokens": 1,
+ "latencyMs": 3555.658332999912
+ },
+ {
+ "questionId": "q140",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "98464",
- "actual": "98464",
- "isCorrect": true,
- "inputTokens": 15479,
- "outputTokens": 264,
- "latencyMs": 5297.444375000021
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 17136,
+ "outputTokens": 1735,
+ "latencyMs": 16477.424583000015
},
{
"questionId": "q140",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "98464",
"actual": "98464",
"isCorrect": true,
- "inputTokens": 15363,
+ "inputTokens": 19800,
"outputTokens": 6,
- "latencyMs": 1775.3334170000162
+ "latencyMs": 1970.4299579999642
+ },
+ {
+ "questionId": "q140",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "98464",
+ "actual": "98464",
+ "isCorrect": true,
+ "inputTokens": 21879,
+ "outputTokens": 5,
+ "latencyMs": 26671.477541
},
{
"questionId": "q140",
@@ -15385,8 +23074,8 @@
"actual": "98464",
"isCorrect": true,
"inputTokens": 13169,
- "outputTokens": 392,
- "latencyMs": 8030.958958000003
+ "outputTokens": 1096,
+ "latencyMs": 10919.952667000005
},
{
"questionId": "q140",
@@ -15397,7 +23086,18 @@
"isCorrect": true,
"inputTokens": 14479,
"outputTokens": 6,
- "latencyMs": 1401.1453330000513
+ "latencyMs": 1168.6287909999955
+ },
+ {
+ "questionId": "q140",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "98464",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 17074,
+ "outputTokens": 1,
+ "latencyMs": 2765.029874999891
},
{
"questionId": "q141",
@@ -15407,8 +23107,8 @@
"actual": "6378",
"isCorrect": true,
"inputTokens": 15187,
- "outputTokens": 264,
- "latencyMs": 6193.845583000046
+ "outputTokens": 200,
+ "latencyMs": 6004.068291999982
},
{
"questionId": "q141",
@@ -15419,7 +23119,18 @@
"isCorrect": true,
"inputTokens": 17408,
"outputTokens": 6,
- "latencyMs": 2449.4082920000073
+ "latencyMs": 1499.0042079999112
+ },
+ {
+ "questionId": "q141",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "6378",
+ "actual": "6378",
+ "isCorrect": true,
+ "inputTokens": 19991,
+ "outputTokens": 4,
+ "latencyMs": 2506.4855830000015
},
{
"questionId": "q141",
@@ -15429,8 +23140,8 @@
"actual": "6378",
"isCorrect": true,
"inputTokens": 8788,
- "outputTokens": 2568,
- "latencyMs": 25386.850749999983
+ "outputTokens": 1032,
+ "latencyMs": 16463.560791999917
},
{
"questionId": "q141",
@@ -15441,7 +23152,18 @@
"isCorrect": true,
"inputTokens": 9278,
"outputTokens": 6,
- "latencyMs": 1351.401165999996
+ "latencyMs": 1441.4096249999711
+ },
+ {
+ "questionId": "q141",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "6378",
+ "actual": "6378",
+ "isCorrect": true,
+ "inputTokens": 12337,
+ "outputTokens": 4,
+ "latencyMs": 2663.2737919999054
},
{
"questionId": "q141",
@@ -15451,8 +23173,8 @@
"actual": "6378",
"isCorrect": true,
"inputTokens": 8556,
- "outputTokens": 456,
- "latencyMs": 5087.453167000029
+ "outputTokens": 904,
+ "latencyMs": 9668.898624999914
},
{
"questionId": "q141",
@@ -15463,29 +23185,51 @@
"isCorrect": true,
"inputTokens": 9124,
"outputTokens": 6,
- "latencyMs": 1229.4187500000116
+ "latencyMs": 1173.9928749999963
},
{
"questionId": "q141",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "6378",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12207,
+ "outputTokens": 1,
+ "latencyMs": 9857.754333000048
+ },
+ {
+ "questionId": "q141",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "6378",
"actual": "6378",
"isCorrect": true,
- "inputTokens": 15481,
- "outputTokens": 520,
- "latencyMs": 6781.348249999981
+ "inputTokens": 17138,
+ "outputTokens": 392,
+ "latencyMs": 9638.438333999948
},
{
"questionId": "q141",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "6378",
"actual": "6378",
"isCorrect": true,
- "inputTokens": 15366,
+ "inputTokens": 19803,
"outputTokens": 6,
- "latencyMs": 1411.0081670000218
+ "latencyMs": 1636.777374999947
+ },
+ {
+ "questionId": "q141",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "6378",
+ "actual": "6378",
+ "isCorrect": true,
+ "inputTokens": 21881,
+ "outputTokens": 4,
+ "latencyMs": 1841.5572499999544
},
{
"questionId": "q141",
@@ -15496,7 +23240,7 @@
"isCorrect": true,
"inputTokens": 13171,
"outputTokens": 328,
- "latencyMs": 9405.325083000003
+ "latencyMs": 5539.711917000008
},
{
"questionId": "q141",
@@ -15507,7 +23251,18 @@
"isCorrect": true,
"inputTokens": 14482,
"outputTokens": 6,
- "latencyMs": 1575.9942499999888
+ "latencyMs": 1485.2025829999475
+ },
+ {
+ "questionId": "q141",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "6378",
+ "actual": "6378",
+ "isCorrect": true,
+ "inputTokens": 17076,
+ "outputTokens": 4,
+ "latencyMs": 1622.3209579999093
},
{
"questionId": "q142",
@@ -15518,7 +23273,7 @@
"isCorrect": true,
"inputTokens": 15189,
"outputTokens": 456,
- "latencyMs": 7723.79820900003
+ "latencyMs": 5173.022708000033
},
{
"questionId": "q142",
@@ -15529,7 +23284,18 @@
"isCorrect": true,
"inputTokens": 17409,
"outputTokens": 6,
- "latencyMs": 1496.878625000012
+ "latencyMs": 1700.1781669999473
+ },
+ {
+ "questionId": "q142",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "254916",
+ "actual": "254916",
+ "isCorrect": true,
+ "inputTokens": 19992,
+ "outputTokens": 6,
+ "latencyMs": 2883.810959000024
},
{
"questionId": "q142",
@@ -15539,8 +23305,8 @@
"actual": "254916",
"isCorrect": true,
"inputTokens": 8790,
- "outputTokens": 328,
- "latencyMs": 5231.312959000003
+ "outputTokens": 1352,
+ "latencyMs": 14519.361791000003
},
{
"questionId": "q142",
@@ -15551,7 +23317,18 @@
"isCorrect": true,
"inputTokens": 9279,
"outputTokens": 6,
- "latencyMs": 1145.5107919999864
+ "latencyMs": 1391.6377499999944
+ },
+ {
+ "questionId": "q142",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "254916",
+ "actual": "254916",
+ "isCorrect": true,
+ "inputTokens": 12338,
+ "outputTokens": 6,
+ "latencyMs": 2150.8105409999844
},
{
"questionId": "q142",
@@ -15561,8 +23338,8 @@
"actual": "254916",
"isCorrect": true,
"inputTokens": 8558,
- "outputTokens": 392,
- "latencyMs": 4585.943417000002
+ "outputTokens": 968,
+ "latencyMs": 12890.400166000007
},
{
"questionId": "q142",
@@ -15573,29 +23350,51 @@
"isCorrect": true,
"inputTokens": 9125,
"outputTokens": 6,
- "latencyMs": 1386.1237079999992
+ "latencyMs": 1352.297750000027
},
{
"questionId": "q142",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "254916",
+ "actual": "254916",
+ "isCorrect": true,
+ "inputTokens": 12208,
+ "outputTokens": 6,
+ "latencyMs": 3035.361290999921
+ },
+ {
+ "questionId": "q142",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "254916",
"actual": "254916",
"isCorrect": true,
- "inputTokens": 15483,
- "outputTokens": 328,
- "latencyMs": 9374.248917000019
+ "inputTokens": 17140,
+ "outputTokens": 648,
+ "latencyMs": 26188.04208299995
},
{
"questionId": "q142",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "254916",
"actual": "254916",
"isCorrect": true,
- "inputTokens": 15367,
+ "inputTokens": 19804,
"outputTokens": 6,
- "latencyMs": 1332.4388340000296
+ "latencyMs": 1935.45787500008
+ },
+ {
+ "questionId": "q142",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "254916",
+ "actual": "254916",
+ "isCorrect": true,
+ "inputTokens": 21882,
+ "outputTokens": 6,
+ "latencyMs": 5415.2192920000525
},
{
"questionId": "q142",
@@ -15605,8 +23404,8 @@
"actual": "254916",
"isCorrect": true,
"inputTokens": 13173,
- "outputTokens": 200,
- "latencyMs": 3953.8284580000327
+ "outputTokens": 648,
+ "latencyMs": 6512.995166999986
},
{
"questionId": "q142",
@@ -15617,7 +23416,18 @@
"isCorrect": true,
"inputTokens": 14483,
"outputTokens": 6,
- "latencyMs": 1294.3535840000259
+ "latencyMs": 1957.1825840000529
+ },
+ {
+ "questionId": "q142",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "254916",
+ "actual": "254916",
+ "isCorrect": true,
+ "inputTokens": 17077,
+ "outputTokens": 6,
+ "latencyMs": 1273.1987079998944
},
{
"questionId": "q143",
@@ -15627,8 +23437,8 @@
"actual": "32413",
"isCorrect": true,
"inputTokens": 15187,
- "outputTokens": 584,
- "latencyMs": 8515.676582999993
+ "outputTokens": 712,
+ "latencyMs": 7402.821666999953
},
{
"questionId": "q143",
@@ -15639,7 +23449,18 @@
"isCorrect": true,
"inputTokens": 17410,
"outputTokens": 6,
- "latencyMs": 2508.0940420000115
+ "latencyMs": 1297.3980420000153
+ },
+ {
+ "questionId": "q143",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "32413",
+ "actual": "32413",
+ "isCorrect": true,
+ "inputTokens": 19993,
+ "outputTokens": 5,
+ "latencyMs": 1398.1769159999676
},
{
"questionId": "q143",
@@ -15649,8 +23470,8 @@
"actual": "32413",
"isCorrect": true,
"inputTokens": 8788,
- "outputTokens": 584,
- "latencyMs": 6331.0320000000065
+ "outputTokens": 520,
+ "latencyMs": 8047.9024590000045
},
{
"questionId": "q143",
@@ -15661,7 +23482,18 @@
"isCorrect": true,
"inputTokens": 9280,
"outputTokens": 6,
- "latencyMs": 1249.4856250000303
+ "latencyMs": 1149.3695000000298
+ },
+ {
+ "questionId": "q143",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "32413",
+ "actual": "32413",
+ "isCorrect": true,
+ "inputTokens": 12339,
+ "outputTokens": 5,
+ "latencyMs": 3275.751125000068
},
{
"questionId": "q143",
@@ -15671,8 +23503,8 @@
"actual": "32413",
"isCorrect": true,
"inputTokens": 8556,
- "outputTokens": 648,
- "latencyMs": 8463.519499999995
+ "outputTokens": 520,
+ "latencyMs": 10626.252958000056
},
{
"questionId": "q143",
@@ -15683,29 +23515,51 @@
"isCorrect": true,
"inputTokens": 9126,
"outputTokens": 6,
- "latencyMs": 1035.4223750000237
+ "latencyMs": 1084.1253329999745
},
{
"questionId": "q143",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
"expected": "32413",
"actual": "32413",
"isCorrect": true,
- "inputTokens": 15481,
- "outputTokens": 520,
- "latencyMs": 9625.975833999983
+ "inputTokens": 12209,
+ "outputTokens": 5,
+ "latencyMs": 2478.551666000043
},
{
"questionId": "q143",
- "format": "markdown-kv",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "32413",
+ "actual": "43222",
+ "isCorrect": false,
+ "inputTokens": 17138,
+ "outputTokens": 2248,
+ "latencyMs": 24645.130125000025
+ },
+ {
+ "questionId": "q143",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "32413",
"actual": "32413",
"isCorrect": true,
- "inputTokens": 15368,
+ "inputTokens": 19805,
"outputTokens": 6,
- "latencyMs": 1460.7396250000456
+ "latencyMs": 1504.6681670000544
+ },
+ {
+ "questionId": "q143",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "32413",
+ "actual": "32413",
+ "isCorrect": true,
+ "inputTokens": 21883,
+ "outputTokens": 5,
+ "latencyMs": 1577.2633330000099
},
{
"questionId": "q143",
@@ -15715,8 +23569,8 @@
"actual": "32413",
"isCorrect": true,
"inputTokens": 13171,
- "outputTokens": 712,
- "latencyMs": 7525.112709000008
+ "outputTokens": 776,
+ "latencyMs": 8342.271167000057
},
{
"questionId": "q143",
@@ -15727,18 +23581,29 @@
"isCorrect": true,
"inputTokens": 14484,
"outputTokens": 6,
- "latencyMs": 1488.0029170000344
+ "latencyMs": 1397.2225839999737
+ },
+ {
+ "questionId": "q143",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "32413",
+ "actual": "32413",
+ "isCorrect": true,
+ "inputTokens": 17078,
+ "outputTokens": 5,
+ "latencyMs": 2600.8139589999337
},
{
"questionId": "q144",
"format": "json",
"model": "gpt-5-nano",
"expected": "240059",
- "actual": "not found",
- "isCorrect": false,
+ "actual": "240059",
+ "isCorrect": true,
"inputTokens": 15185,
- "outputTokens": 1352,
- "latencyMs": 8303.157542
+ "outputTokens": 648,
+ "latencyMs": 10642.901458999957
},
{
"questionId": "q144",
@@ -15749,18 +23614,29 @@
"isCorrect": true,
"inputTokens": 17405,
"outputTokens": 6,
- "latencyMs": 1515.7900000000373
+ "latencyMs": 1309.3054169999668
+ },
+ {
+ "questionId": "q144",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "240059",
+ "actual": "240059",
+ "isCorrect": true,
+ "inputTokens": 19989,
+ "outputTokens": 6,
+ "latencyMs": 1797.455083000008
},
{
"questionId": "q144",
"format": "toon",
"model": "gpt-5-nano",
"expected": "240059",
- "actual": "0",
- "isCorrect": false,
+ "actual": "240059",
+ "isCorrect": true,
"inputTokens": 8786,
- "outputTokens": 2503,
- "latencyMs": 20915.808583000035
+ "outputTokens": 1096,
+ "latencyMs": 11485.876249999972
},
{
"questionId": "q144",
@@ -15771,18 +23647,29 @@
"isCorrect": true,
"inputTokens": 9275,
"outputTokens": 6,
- "latencyMs": 1193.4237079999875
+ "latencyMs": 1909.1485000000102
+ },
+ {
+ "questionId": "q144",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "240059",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12335,
+ "outputTokens": 1,
+ "latencyMs": 2114.457832999993
},
{
"questionId": "q144",
"format": "csv",
"model": "gpt-5-nano",
"expected": "240059",
- "actual": "240059",
- "isCorrect": true,
+ "actual": "Not found",
+ "isCorrect": false,
"inputTokens": 8554,
- "outputTokens": 4360,
- "latencyMs": 34760.80329100002
+ "outputTokens": 2760,
+ "latencyMs": 36680.54220799997
},
{
"questionId": "q144",
@@ -15793,40 +23680,62 @@
"isCorrect": true,
"inputTokens": 9121,
"outputTokens": 6,
- "latencyMs": 3022.242749999976
+ "latencyMs": 1069.4299589999719
},
{
"questionId": "q144",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "240059",
+ "actual": "240059",
+ "isCorrect": true,
+ "inputTokens": 12205,
+ "outputTokens": 6,
+ "latencyMs": 2047.3995000000577
+ },
+ {
+ "questionId": "q144",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "240059",
- "actual": "0",
- "isCorrect": false,
- "inputTokens": 15479,
- "outputTokens": 2567,
- "latencyMs": 15901.546999999962
+ "actual": "240059",
+ "isCorrect": true,
+ "inputTokens": 17136,
+ "outputTokens": 456,
+ "latencyMs": 8763.321875000023
},
{
"questionId": "q144",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "240059",
"actual": "240059",
"isCorrect": true,
- "inputTokens": 15363,
+ "inputTokens": 19800,
"outputTokens": 6,
- "latencyMs": 1358.283374999999
+ "latencyMs": 1591.410208000103
+ },
+ {
+ "questionId": "q144",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "240059",
+ "actual": "240059",
+ "isCorrect": true,
+ "inputTokens": 21879,
+ "outputTokens": 6,
+ "latencyMs": 1814.5240000000922
},
{
"questionId": "q144",
"format": "yaml",
"model": "gpt-5-nano",
"expected": "240059",
- "actual": "240059",
- "isCorrect": true,
+ "actual": "0",
+ "isCorrect": false,
"inputTokens": 13169,
- "outputTokens": 584,
- "latencyMs": 10520.349042000016
+ "outputTokens": 2951,
+ "latencyMs": 28527.662250000052
},
{
"questionId": "q144",
@@ -15837,7 +23746,18 @@
"isCorrect": true,
"inputTokens": 14479,
"outputTokens": 6,
- "latencyMs": 1426.0678330000374
+ "latencyMs": 1341.8624169999966
+ },
+ {
+ "questionId": "q144",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "240059",
+ "actual": "240059",
+ "isCorrect": true,
+ "inputTokens": 17074,
+ "outputTokens": 6,
+ "latencyMs": 2672.0011249999516
},
{
"questionId": "q145",
@@ -15847,8 +23767,8 @@
"actual": "48986",
"isCorrect": true,
"inputTokens": 15186,
- "outputTokens": 712,
- "latencyMs": 7069.827042000019
+ "outputTokens": 1288,
+ "latencyMs": 11650.464916000026
},
{
"questionId": "q145",
@@ -15859,7 +23779,18 @@
"isCorrect": true,
"inputTokens": 17406,
"outputTokens": 6,
- "latencyMs": 1507.9525419999845
+ "latencyMs": 1736.123957999982
+ },
+ {
+ "questionId": "q145",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "48986",
+ "actual": "48986",
+ "isCorrect": true,
+ "inputTokens": 19989,
+ "outputTokens": 5,
+ "latencyMs": 2115.1809580000117
},
{
"questionId": "q145",
@@ -15869,8 +23800,8 @@
"actual": "undefined",
"isCorrect": false,
"inputTokens": 8787,
- "outputTokens": 2311,
- "latencyMs": 18257.385332999984
+ "outputTokens": 2119,
+ "latencyMs": 22429.965708000003
},
{
"questionId": "q145",
@@ -15881,7 +23812,18 @@
"isCorrect": true,
"inputTokens": 9276,
"outputTokens": 6,
- "latencyMs": 1397.3040420000325
+ "latencyMs": 1280.45074999996
+ },
+ {
+ "questionId": "q145",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "48986",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12335,
+ "outputTokens": 1,
+ "latencyMs": 2039.6975419999799
},
{
"questionId": "q145",
@@ -15891,8 +23833,8 @@
"actual": "48986",
"isCorrect": true,
"inputTokens": 8555,
- "outputTokens": 3976,
- "latencyMs": 29865.140291999967
+ "outputTokens": 1352,
+ "latencyMs": 13713.023125000065
},
{
"questionId": "q145",
@@ -15903,29 +23845,51 @@
"isCorrect": true,
"inputTokens": 9122,
"outputTokens": 6,
- "latencyMs": 1218.4357079999754
+ "latencyMs": 1190.7314999999944
},
{
"questionId": "q145",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "48986",
+ "actual": "None",
+ "isCorrect": false,
+ "inputTokens": 12205,
+ "outputTokens": 1,
+ "latencyMs": 3054.557584000053
+ },
+ {
+ "questionId": "q145",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "48986",
"actual": "48986",
"isCorrect": true,
- "inputTokens": 15480,
- "outputTokens": 904,
- "latencyMs": 8906.708750000049
+ "inputTokens": 17137,
+ "outputTokens": 456,
+ "latencyMs": 8163.3440420000115
},
{
"questionId": "q145",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "48986",
"actual": "48986",
"isCorrect": true,
- "inputTokens": 15364,
+ "inputTokens": 19801,
"outputTokens": 6,
- "latencyMs": 1917.3721249999944
+ "latencyMs": 2508.831208000076
+ },
+ {
+ "questionId": "q145",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "48986",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 21879,
+ "outputTokens": 1,
+ "latencyMs": 13907.184875000035
},
{
"questionId": "q145",
@@ -15935,8 +23899,8 @@
"actual": "48986",
"isCorrect": true,
"inputTokens": 13170,
- "outputTokens": 1160,
- "latencyMs": 9665.802708000003
+ "outputTokens": 968,
+ "latencyMs": 9999.614625000046
},
{
"questionId": "q145",
@@ -15947,18 +23911,29 @@
"isCorrect": true,
"inputTokens": 14480,
"outputTokens": 6,
- "latencyMs": 1342.7929170000134
+ "latencyMs": 1401.668834000011
+ },
+ {
+ "questionId": "q145",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "48986",
+ "actual": "48986",
+ "isCorrect": true,
+ "inputTokens": 17074,
+ "outputTokens": 5,
+ "latencyMs": 3342.504416999989
},
{
"questionId": "q146",
"format": "json",
"model": "gpt-5-nano",
"expected": "209624",
- "actual": "209624",
- "isCorrect": true,
+ "actual": "0",
+ "isCorrect": false,
"inputTokens": 15185,
- "outputTokens": 648,
- "latencyMs": 6259.387500000012
+ "outputTokens": 1607,
+ "latencyMs": 14253.204374999972
},
{
"questionId": "q146",
@@ -15969,7 +23944,18 @@
"isCorrect": true,
"inputTokens": 17405,
"outputTokens": 6,
- "latencyMs": 1860.1597499999916
+ "latencyMs": 1633.1817499999888
+ },
+ {
+ "questionId": "q146",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "209624",
+ "actual": "209624",
+ "isCorrect": true,
+ "inputTokens": 19989,
+ "outputTokens": 6,
+ "latencyMs": 4013.2274579999503
},
{
"questionId": "q146",
@@ -15979,8 +23965,8 @@
"actual": "209624",
"isCorrect": true,
"inputTokens": 8786,
- "outputTokens": 3336,
- "latencyMs": 23288.63820799999
+ "outputTokens": 1864,
+ "latencyMs": 18068.214749999926
},
{
"questionId": "q146",
@@ -15991,7 +23977,18 @@
"isCorrect": true,
"inputTokens": 9275,
"outputTokens": 6,
- "latencyMs": 1180.5804169999901
+ "latencyMs": 2633.8406670000404
+ },
+ {
+ "questionId": "q146",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "209624",
+ "actual": "209624",
+ "isCorrect": true,
+ "inputTokens": 12335,
+ "outputTokens": 6,
+ "latencyMs": 2308.719957999885
},
{
"questionId": "q146",
@@ -16001,8 +23998,8 @@
"actual": "209624",
"isCorrect": true,
"inputTokens": 8554,
- "outputTokens": 840,
- "latencyMs": 6988.782166000048
+ "outputTokens": 3592,
+ "latencyMs": 34956.612250000006
},
{
"questionId": "q146",
@@ -16013,29 +24010,51 @@
"isCorrect": true,
"inputTokens": 9121,
"outputTokens": 6,
- "latencyMs": 1391.326041000022
+ "latencyMs": 1042.174875000026
},
{
"questionId": "q146",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "209624",
+ "actual": "Not found",
+ "isCorrect": false,
+ "inputTokens": 12205,
+ "outputTokens": 2,
+ "latencyMs": 3570.2167079999344
+ },
+ {
+ "questionId": "q146",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "209624",
"actual": "209624",
"isCorrect": true,
- "inputTokens": 15479,
- "outputTokens": 648,
- "latencyMs": 6708.915624999965
+ "inputTokens": 17136,
+ "outputTokens": 584,
+ "latencyMs": 8155.267999999924
},
{
"questionId": "q146",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "209624",
"actual": "209624",
"isCorrect": true,
- "inputTokens": 15363,
+ "inputTokens": 19800,
"outputTokens": 6,
- "latencyMs": 1364.766833999951
+ "latencyMs": 1908.0532499999972
+ },
+ {
+ "questionId": "q146",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "209624",
+ "actual": "209624",
+ "isCorrect": true,
+ "inputTokens": 21879,
+ "outputTokens": 6,
+ "latencyMs": 4646.213583000004
},
{
"questionId": "q146",
@@ -16045,8 +24064,8 @@
"actual": "209624",
"isCorrect": true,
"inputTokens": 13169,
- "outputTokens": 328,
- "latencyMs": 3396.199416999996
+ "outputTokens": 392,
+ "latencyMs": 8023.040708000073
},
{
"questionId": "q146",
@@ -16057,7 +24076,18 @@
"isCorrect": true,
"inputTokens": 14479,
"outputTokens": 6,
- "latencyMs": 1378.3461249999818
+ "latencyMs": 1252.574666999979
+ },
+ {
+ "questionId": "q146",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "209624",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 17074,
+ "outputTokens": 1,
+ "latencyMs": 9256.544125000015
},
{
"questionId": "q147",
@@ -16067,8 +24097,8 @@
"actual": "58023",
"isCorrect": true,
"inputTokens": 15185,
- "outputTokens": 200,
- "latencyMs": 2947.7053750000196
+ "outputTokens": 328,
+ "latencyMs": 6800.243999999948
},
{
"questionId": "q147",
@@ -16079,7 +24109,18 @@
"isCorrect": true,
"inputTokens": 17406,
"outputTokens": 6,
- "latencyMs": 1512.1218329999829
+ "latencyMs": 1856.026916999952
+ },
+ {
+ "questionId": "q147",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "58023",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 19989,
+ "outputTokens": 1,
+ "latencyMs": 1783.4203330000164
},
{
"questionId": "q147",
@@ -16089,8 +24130,8 @@
"actual": "58023",
"isCorrect": true,
"inputTokens": 8786,
- "outputTokens": 840,
- "latencyMs": 7657.443458000023
+ "outputTokens": 904,
+ "latencyMs": 8408.46395799995
},
{
"questionId": "q147",
@@ -16101,7 +24142,18 @@
"isCorrect": true,
"inputTokens": 9276,
"outputTokens": 6,
- "latencyMs": 1119.6807499999995
+ "latencyMs": 1048.0284159999574
+ },
+ {
+ "questionId": "q147",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "58023",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12335,
+ "outputTokens": 1,
+ "latencyMs": 2309.89829199994
},
{
"questionId": "q147",
@@ -16111,8 +24163,8 @@
"actual": "58023",
"isCorrect": true,
"inputTokens": 8554,
- "outputTokens": 392,
- "latencyMs": 4410.906208000029
+ "outputTokens": 456,
+ "latencyMs": 7778.412583000027
},
{
"questionId": "q147",
@@ -16123,29 +24175,51 @@
"isCorrect": true,
"inputTokens": 9122,
"outputTokens": 6,
- "latencyMs": 1227.467249999987
+ "latencyMs": 1095.3032080000266
},
{
"questionId": "q147",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "58023",
+ "actual": "58023",
+ "isCorrect": true,
+ "inputTokens": 12205,
+ "outputTokens": 5,
+ "latencyMs": 2191.419332999969
+ },
+ {
+ "questionId": "q147",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "58023",
"actual": "58023",
"isCorrect": true,
- "inputTokens": 15479,
+ "inputTokens": 17136,
"outputTokens": 328,
- "latencyMs": 4168.014292000036
+ "latencyMs": 5028.444708000054
},
{
"questionId": "q147",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "58023",
"actual": "58023",
"isCorrect": true,
- "inputTokens": 15364,
+ "inputTokens": 19801,
"outputTokens": 6,
- "latencyMs": 1878.2624590000487
+ "latencyMs": 1697.0504170000786
+ },
+ {
+ "questionId": "q147",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "58023",
+ "actual": "58023",
+ "isCorrect": true,
+ "inputTokens": 21879,
+ "outputTokens": 5,
+ "latencyMs": 1800.0818329999456
},
{
"questionId": "q147",
@@ -16155,8 +24229,8 @@
"actual": "58023",
"isCorrect": true,
"inputTokens": 13169,
- "outputTokens": 456,
- "latencyMs": 4726.903416000016
+ "outputTokens": 712,
+ "latencyMs": 8022.871625000029
},
{
"questionId": "q147",
@@ -16167,7 +24241,18 @@
"isCorrect": true,
"inputTokens": 14480,
"outputTokens": 6,
- "latencyMs": 1665.950124999974
+ "latencyMs": 1105.1744999999646
+ },
+ {
+ "questionId": "q147",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "58023",
+ "actual": "58023",
+ "isCorrect": true,
+ "inputTokens": 17074,
+ "outputTokens": 5,
+ "latencyMs": 2765.7437500000233
},
{
"questionId": "q148",
@@ -16177,8 +24262,8 @@
"actual": "196024",
"isCorrect": true,
"inputTokens": 15188,
- "outputTokens": 456,
- "latencyMs": 5633.756834
+ "outputTokens": 328,
+ "latencyMs": 4684.178457999951
},
{
"questionId": "q148",
@@ -16189,7 +24274,18 @@
"isCorrect": true,
"inputTokens": 17407,
"outputTokens": 6,
- "latencyMs": 1482.6277910000063
+ "latencyMs": 1856.438208000036
+ },
+ {
+ "questionId": "q148",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "196024",
+ "actual": "196024",
+ "isCorrect": true,
+ "inputTokens": 19991,
+ "outputTokens": 6,
+ "latencyMs": 4894.268209000002
},
{
"questionId": "q148",
@@ -16199,8 +24295,8 @@
"actual": "196024",
"isCorrect": true,
"inputTokens": 8789,
- "outputTokens": 1416,
- "latencyMs": 11371.267457999988
+ "outputTokens": 1608,
+ "latencyMs": 19985.54383400001
},
{
"questionId": "q148",
@@ -16211,18 +24307,29 @@
"isCorrect": true,
"inputTokens": 9277,
"outputTokens": 6,
- "latencyMs": 1690.2400420000195
+ "latencyMs": 1212.5407500000438
+ },
+ {
+ "questionId": "q148",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "196024",
+ "actual": "N/A",
+ "isCorrect": false,
+ "inputTokens": 12337,
+ "outputTokens": 3,
+ "latencyMs": 12548.686624999973
},
{
"questionId": "q148",
"format": "csv",
"model": "gpt-5-nano",
"expected": "196024",
- "actual": "Repo not found",
- "isCorrect": false,
+ "actual": "196024",
+ "isCorrect": true,
"inputTokens": 8557,
- "outputTokens": 3273,
- "latencyMs": 28731.530667000043
+ "outputTokens": 2760,
+ "latencyMs": 20131.88070800004
},
{
"questionId": "q148",
@@ -16233,29 +24340,51 @@
"isCorrect": true,
"inputTokens": 9123,
"outputTokens": 6,
- "latencyMs": 1070.5141670000157
+ "latencyMs": 1217.2275000000373
},
{
"questionId": "q148",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "196024",
+ "actual": "196024",
+ "isCorrect": true,
+ "inputTokens": 12207,
+ "outputTokens": 6,
+ "latencyMs": 2748.620916999993
+ },
+ {
+ "questionId": "q148",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "196024",
"actual": "196024",
"isCorrect": true,
- "inputTokens": 15482,
- "outputTokens": 520,
- "latencyMs": 7021.771125000028
+ "inputTokens": 17139,
+ "outputTokens": 392,
+ "latencyMs": 6418.833957999945
},
{
"questionId": "q148",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "196024",
"actual": "196024",
"isCorrect": true,
- "inputTokens": 15365,
+ "inputTokens": 19802,
"outputTokens": 6,
- "latencyMs": 1243.7466250000289
+ "latencyMs": 2019.8872089999495
+ },
+ {
+ "questionId": "q148",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "196024",
+ "actual": "196024",
+ "isCorrect": true,
+ "inputTokens": 21881,
+ "outputTokens": 6,
+ "latencyMs": 2523.128167000017
},
{
"questionId": "q148",
@@ -16265,8 +24394,8 @@
"actual": "196024",
"isCorrect": true,
"inputTokens": 13172,
- "outputTokens": 456,
- "latencyMs": 5286.169750000001
+ "outputTokens": 584,
+ "latencyMs": 8212.874959000037
},
{
"questionId": "q148",
@@ -16277,7 +24406,18 @@
"isCorrect": true,
"inputTokens": 14481,
"outputTokens": 6,
- "latencyMs": 1450.456957999966
+ "latencyMs": 1151.26241700002
+ },
+ {
+ "questionId": "q148",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "196024",
+ "actual": "196024",
+ "isCorrect": true,
+ "inputTokens": 17076,
+ "outputTokens": 6,
+ "latencyMs": 3479.8169999999227
},
{
"questionId": "q149",
@@ -16288,7 +24428,7 @@
"isCorrect": true,
"inputTokens": 15188,
"outputTokens": 456,
- "latencyMs": 5440.864250000042
+ "latencyMs": 6856.402957999962
},
{
"questionId": "q149",
@@ -16299,7 +24439,18 @@
"isCorrect": true,
"inputTokens": 17408,
"outputTokens": 6,
- "latencyMs": 1369.6618330000201
+ "latencyMs": 1727.7318750000559
+ },
+ {
+ "questionId": "q149",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "30919",
+ "actual": "30919",
+ "isCorrect": true,
+ "inputTokens": 19991,
+ "outputTokens": 5,
+ "latencyMs": 5595.708332999959
},
{
"questionId": "q149",
@@ -16309,8 +24460,8 @@
"actual": "30919",
"isCorrect": true,
"inputTokens": 8789,
- "outputTokens": 712,
- "latencyMs": 6130.9379999999655
+ "outputTokens": 584,
+ "latencyMs": 5889.62179200002
},
{
"questionId": "q149",
@@ -16321,18 +24472,29 @@
"isCorrect": true,
"inputTokens": 9278,
"outputTokens": 6,
- "latencyMs": 1635.81579100003
+ "latencyMs": 1206.469458000036
+ },
+ {
+ "questionId": "q149",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "30919",
+ "actual": "30919",
+ "isCorrect": true,
+ "inputTokens": 12337,
+ "outputTokens": 5,
+ "latencyMs": 2057.8787500000326
},
{
"questionId": "q149",
"format": "csv",
"model": "gpt-5-nano",
"expected": "30919",
- "actual": "N/A",
- "isCorrect": false,
+ "actual": "30919",
+ "isCorrect": true,
"inputTokens": 8557,
- "outputTokens": 1288,
- "latencyMs": 20319.653374999994
+ "outputTokens": 584,
+ "latencyMs": 6905.8247499999125
},
{
"questionId": "q149",
@@ -16343,29 +24505,51 @@
"isCorrect": true,
"inputTokens": 9124,
"outputTokens": 6,
- "latencyMs": 1381.8252079999656
+ "latencyMs": 1003.953542000032
},
{
"questionId": "q149",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "30919",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12207,
+ "outputTokens": 1,
+ "latencyMs": 2500.2377919999417
+ },
+ {
+ "questionId": "q149",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "30919",
"actual": "30919",
"isCorrect": true,
- "inputTokens": 15482,
- "outputTokens": 328,
- "latencyMs": 5951.751374999993
+ "inputTokens": 17139,
+ "outputTokens": 264,
+ "latencyMs": 4909.18979199999
},
{
"questionId": "q149",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "30919",
"actual": "30919",
"isCorrect": true,
- "inputTokens": 15366,
+ "inputTokens": 19803,
"outputTokens": 6,
- "latencyMs": 1367.1241670000018
+ "latencyMs": 2457.2324580000713
+ },
+ {
+ "questionId": "q149",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "30919",
+ "actual": "30919",
+ "isCorrect": true,
+ "inputTokens": 21881,
+ "outputTokens": 5,
+ "latencyMs": 1428.471666000085
},
{
"questionId": "q149",
@@ -16375,8 +24559,8 @@
"actual": "30919",
"isCorrect": true,
"inputTokens": 13172,
- "outputTokens": 328,
- "latencyMs": 3499.136334000039
+ "outputTokens": 392,
+ "latencyMs": 5668.693708000006
},
{
"questionId": "q149",
@@ -16387,7 +24571,18 @@
"isCorrect": true,
"inputTokens": 14482,
"outputTokens": 6,
- "latencyMs": 1573.7027499999967
+ "latencyMs": 1222.2983330000425
+ },
+ {
+ "questionId": "q149",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "30919",
+ "actual": "30919",
+ "isCorrect": true,
+ "inputTokens": 17076,
+ "outputTokens": 5,
+ "latencyMs": 3050.278290999937
},
{
"questionId": "q150",
@@ -16397,8 +24592,8 @@
"actual": "192220",
"isCorrect": true,
"inputTokens": 15187,
- "outputTokens": 392,
- "latencyMs": 7833.668625000049
+ "outputTokens": 456,
+ "latencyMs": 7561.326083000051
},
{
"questionId": "q150",
@@ -16409,7 +24604,18 @@
"isCorrect": true,
"inputTokens": 17405,
"outputTokens": 6,
- "latencyMs": 1477.048582999967
+ "latencyMs": 2041.015417000046
+ },
+ {
+ "questionId": "q150",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "192220",
+ "actual": "192220",
+ "isCorrect": true,
+ "inputTokens": 19989,
+ "outputTokens": 6,
+ "latencyMs": 1918.6380409999983
},
{
"questionId": "q150",
@@ -16419,8 +24625,8 @@
"actual": "192220",
"isCorrect": true,
"inputTokens": 8788,
- "outputTokens": 520,
- "latencyMs": 4880.817959000007
+ "outputTokens": 776,
+ "latencyMs": 7871.997415999998
},
{
"questionId": "q150",
@@ -16431,18 +24637,29 @@
"isCorrect": true,
"inputTokens": 9275,
"outputTokens": 6,
- "latencyMs": 1081.6979169999831
+ "latencyMs": 1578.9285829999717
+ },
+ {
+ "questionId": "q150",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "192220",
+ "actual": "192220",
+ "isCorrect": true,
+ "inputTokens": 12335,
+ "outputTokens": 6,
+ "latencyMs": 2032.75475000008
},
{
"questionId": "q150",
"format": "csv",
"model": "gpt-5-nano",
"expected": "192220",
- "actual": "192220",
- "isCorrect": true,
+ "actual": "0",
+ "isCorrect": false,
"inputTokens": 8556,
- "outputTokens": 1992,
- "latencyMs": 14180.11841699999
+ "outputTokens": 1159,
+ "latencyMs": 30959.83791699994
},
{
"questionId": "q150",
@@ -16453,29 +24670,51 @@
"isCorrect": true,
"inputTokens": 9121,
"outputTokens": 6,
- "latencyMs": 1393.665417000011
+ "latencyMs": 1389.4868339999812
},
{
"questionId": "q150",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "192220",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12205,
+ "outputTokens": 1,
+ "latencyMs": 3573.9437089998974
+ },
+ {
+ "questionId": "q150",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "192220",
"actual": "192220",
"isCorrect": true,
- "inputTokens": 15481,
+ "inputTokens": 17138,
"outputTokens": 392,
- "latencyMs": 4068.912416999985
+ "latencyMs": 6992.854374999995
},
{
"questionId": "q150",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "192220",
"actual": "192220",
"isCorrect": true,
- "inputTokens": 15363,
+ "inputTokens": 19800,
"outputTokens": 6,
- "latencyMs": 1687.0724170000176
+ "latencyMs": 1679.577958000009
+ },
+ {
+ "questionId": "q150",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "192220",
+ "actual": "192220",
+ "isCorrect": true,
+ "inputTokens": 21879,
+ "outputTokens": 6,
+ "latencyMs": 1553.5702499999898
},
{
"questionId": "q150",
@@ -16485,8 +24724,8 @@
"actual": "192220",
"isCorrect": true,
"inputTokens": 13171,
- "outputTokens": 392,
- "latencyMs": 4048.8707089999807
+ "outputTokens": 328,
+ "latencyMs": 4169.634166999953
},
{
"questionId": "q150",
@@ -16497,7 +24736,18 @@
"isCorrect": true,
"inputTokens": 14479,
"outputTokens": 6,
- "latencyMs": 1441.8594579999917
+ "latencyMs": 1384.3902089999756
+ },
+ {
+ "questionId": "q150",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "192220",
+ "actual": "192220",
+ "isCorrect": true,
+ "inputTokens": 17074,
+ "outputTokens": 6,
+ "latencyMs": 2953.2877919999883
},
{
"questionId": "q151",
@@ -16507,8 +24757,8 @@
"actual": "11763",
"isCorrect": true,
"inputTokens": 15190,
- "outputTokens": 392,
- "latencyMs": 4563.366041000001
+ "outputTokens": 584,
+ "latencyMs": 6612.153208000003
},
{
"questionId": "q151",
@@ -16519,7 +24769,18 @@
"isCorrect": true,
"inputTokens": 17414,
"outputTokens": 6,
- "latencyMs": 1361.9952920000069
+ "latencyMs": 2259.919874999905
+ },
+ {
+ "questionId": "q151",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "11763",
+ "actual": "11763",
+ "isCorrect": true,
+ "inputTokens": 19997,
+ "outputTokens": 5,
+ "latencyMs": 4557.873041000101
},
{
"questionId": "q151",
@@ -16529,8 +24790,8 @@
"actual": "11763",
"isCorrect": true,
"inputTokens": 8791,
- "outputTokens": 904,
- "latencyMs": 9523.924416000023
+ "outputTokens": 712,
+ "latencyMs": 7556.261375000002
},
{
"questionId": "q151",
@@ -16541,7 +24802,18 @@
"isCorrect": true,
"inputTokens": 9284,
"outputTokens": 6,
- "latencyMs": 1235.863416999986
+ "latencyMs": 1012.9206669999985
+ },
+ {
+ "questionId": "q151",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "11763",
+ "actual": "11763",
+ "isCorrect": true,
+ "inputTokens": 12343,
+ "outputTokens": 5,
+ "latencyMs": 6754.191916999989
},
{
"questionId": "q151",
@@ -16551,8 +24823,8 @@
"actual": "11763",
"isCorrect": true,
"inputTokens": 8559,
- "outputTokens": 584,
- "latencyMs": 5264.637583000003
+ "outputTokens": 712,
+ "latencyMs": 7742.647875000024
},
{
"questionId": "q151",
@@ -16563,29 +24835,51 @@
"isCorrect": true,
"inputTokens": 9130,
"outputTokens": 6,
- "latencyMs": 1307.1584169999696
+ "latencyMs": 1578.1971669999184
},
{
"questionId": "q151",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "11763",
+ "actual": "11763",
+ "isCorrect": true,
+ "inputTokens": 12213,
+ "outputTokens": 5,
+ "latencyMs": 7366.954833999975
+ },
+ {
+ "questionId": "q151",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "11763",
"actual": "11763",
"isCorrect": true,
- "inputTokens": 15484,
+ "inputTokens": 17141,
"outputTokens": 328,
- "latencyMs": 8621.355207999994
+ "latencyMs": 6099.567540999968
},
{
"questionId": "q151",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "11763",
"actual": "11763",
"isCorrect": true,
- "inputTokens": 15372,
+ "inputTokens": 19809,
"outputTokens": 6,
- "latencyMs": 1464.8200829999987
+ "latencyMs": 1278.9319580000592
+ },
+ {
+ "questionId": "q151",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "11763",
+ "actual": "11763",
+ "isCorrect": true,
+ "inputTokens": 21887,
+ "outputTokens": 5,
+ "latencyMs": 4035.024666000041
},
{
"questionId": "q151",
@@ -16595,8 +24889,8 @@
"actual": "11763",
"isCorrect": true,
"inputTokens": 13174,
- "outputTokens": 264,
- "latencyMs": 3034.7359999999753
+ "outputTokens": 456,
+ "latencyMs": 4068.7430829999503
},
{
"questionId": "q151",
@@ -16607,18 +24901,29 @@
"isCorrect": true,
"inputTokens": 14488,
"outputTokens": 6,
- "latencyMs": 1959.3285000000033
+ "latencyMs": 1183.168624999933
+ },
+ {
+ "questionId": "q151",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "11763",
+ "actual": "11763",
+ "isCorrect": true,
+ "inputTokens": 17082,
+ "outputTokens": 5,
+ "latencyMs": 1311.251791000017
},
{
"questionId": "q152",
"format": "json",
"model": "gpt-5-nano",
"expected": "100",
- "actual": "100",
- "isCorrect": true,
+ "actual": "114",
+ "isCorrect": false,
"inputTokens": 15187,
- "outputTokens": 2055,
- "latencyMs": 16430.930082999985
+ "outputTokens": 3271,
+ "latencyMs": 26292.3486250001
},
{
"questionId": "q152",
@@ -16629,7 +24934,18 @@
"isCorrect": false,
"inputTokens": 17406,
"outputTokens": 5,
- "latencyMs": 1730.124458999955
+ "latencyMs": 1269.8386670000618
+ },
+ {
+ "questionId": "q152",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "2",
+ "isCorrect": false,
+ "inputTokens": 19990,
+ "outputTokens": 1,
+ "latencyMs": 1418.8326250000391
},
{
"questionId": "q152",
@@ -16639,8 +24955,8 @@
"actual": "100",
"isCorrect": true,
"inputTokens": 8788,
- "outputTokens": 839,
- "latencyMs": 7275.640458000009
+ "outputTokens": 711,
+ "latencyMs": 7467.631458999938
},
{
"questionId": "q152",
@@ -16651,7 +24967,18 @@
"isCorrect": false,
"inputTokens": 9276,
"outputTokens": 5,
- "latencyMs": 1286.8315839999705
+ "latencyMs": 1310.1392090000445
+ },
+ {
+ "questionId": "q152",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12336,
+ "outputTokens": 1,
+ "latencyMs": 2714.426749999984
},
{
"questionId": "q152",
@@ -16661,8 +24988,8 @@
"actual": "0",
"isCorrect": false,
"inputTokens": 8556,
- "outputTokens": 2695,
- "latencyMs": 24177.570000000007
+ "outputTokens": 903,
+ "latencyMs": 10460.54125000001
},
{
"questionId": "q152",
@@ -16673,40 +25000,62 @@
"isCorrect": false,
"inputTokens": 9122,
"outputTokens": 5,
- "latencyMs": 1102.5337500000023
+ "latencyMs": 1165.5718329999363
},
{
"questionId": "q152",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 12206,
+ "outputTokens": 1,
+ "latencyMs": 6584.999583999976
+ },
+ {
+ "questionId": "q152",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 15481,
- "outputTokens": 1671,
- "latencyMs": 14929.856415999995
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 17138,
+ "outputTokens": 519,
+ "latencyMs": 7805.630750000011
},
{
"questionId": "q152",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 15364,
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 19801,
"outputTokens": 5,
- "latencyMs": 1227.103541999997
+ "latencyMs": 1370.0252500000643
+ },
+ {
+ "questionId": "q152",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "0",
+ "isCorrect": false,
+ "inputTokens": 21880,
+ "outputTokens": 1,
+ "latencyMs": 1457.9777079999913
},
{
"questionId": "q152",
"format": "yaml",
"model": "gpt-5-nano",
"expected": "100",
- "actual": "0",
- "isCorrect": false,
+ "actual": "100",
+ "isCorrect": true,
"inputTokens": 13171,
- "outputTokens": 583,
- "latencyMs": 5785.248666999978
+ "outputTokens": 2055,
+ "latencyMs": 73627.54529200005
},
{
"questionId": "q152",
@@ -16717,18 +25066,29 @@
"isCorrect": false,
"inputTokens": 14480,
"outputTokens": 5,
- "latencyMs": 1959.456125000026
+ "latencyMs": 1786.1586249999236
+ },
+ {
+ "questionId": "q152",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "2",
+ "isCorrect": false,
+ "inputTokens": 17075,
+ "outputTokens": 1,
+ "latencyMs": 19150.725124999997
},
{
"questionId": "q153",
"format": "json",
"model": "gpt-5-nano",
"expected": "15404143",
- "actual": "19196630",
+ "actual": "13886916",
"isCorrect": false,
"inputTokens": 15188,
- "outputTokens": 13385,
- "latencyMs": 239619.323125
+ "outputTokens": 5833,
+ "latencyMs": 354484.18529200007
},
{
"questionId": "q153",
@@ -16739,7 +25099,18 @@
"isCorrect": false,
"inputTokens": 17407,
"outputTokens": 9,
- "latencyMs": 1838.8340420000022
+ "latencyMs": 1871.1713750000345
+ },
+ {
+ "questionId": "q153",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "15404143",
+ "actual": "12990000",
+ "isCorrect": false,
+ "inputTokens": 19991,
+ "outputTokens": 8,
+ "latencyMs": 155538.94058299996
},
{
"questionId": "q153",
@@ -16749,8 +25120,8 @@
"actual": "15404143",
"isCorrect": true,
"inputTokens": 8789,
- "outputTokens": 12169,
- "latencyMs": 109453.991416
+ "outputTokens": 5577,
+ "latencyMs": 46411.59825000004
},
{
"questionId": "q153",
@@ -16761,7 +25132,18 @@
"isCorrect": false,
"inputTokens": 9277,
"outputTokens": 9,
- "latencyMs": 1443.470417000004
+ "latencyMs": 1184.7457910000812
+ },
+ {
+ "questionId": "q153",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "15404143",
+ "actual": "14371343",
+ "isCorrect": false,
+ "inputTokens": 12337,
+ "outputTokens": 8,
+ "latencyMs": 27093.977375000017
},
{
"questionId": "q153",
@@ -16771,8 +25153,8 @@
"actual": "15404143",
"isCorrect": true,
"inputTokens": 8557,
- "outputTokens": 6281,
- "latencyMs": 45474.442209
+ "outputTokens": 5321,
+ "latencyMs": 40838.23450000002
},
{
"questionId": "q153",
@@ -16783,40 +25165,62 @@
"isCorrect": false,
"inputTokens": 9123,
"outputTokens": 9,
- "latencyMs": 1361.6022089999751
+ "latencyMs": 1243.0417080000043
},
{
"questionId": "q153",
- "format": "markdown-kv",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "15404143",
+ "actual": "10000000",
+ "isCorrect": false,
+ "inputTokens": 12207,
+ "outputTokens": 8,
+ "latencyMs": 1697.566125000012
+ },
+ {
+ "questionId": "q153",
+ "format": "xml",
"model": "gpt-5-nano",
"expected": "15404143",
- "actual": "15404143",
- "isCorrect": true,
- "inputTokens": 15482,
- "outputTokens": 4489,
- "latencyMs": 29654.25554099999
+ "actual": "11887802",
+ "isCorrect": false,
+ "inputTokens": 17139,
+ "outputTokens": 3465,
+ "latencyMs": 35017.48091599997
},
{
"questionId": "q153",
- "format": "markdown-kv",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "15404143",
- "actual": "13,847,892",
+ "actual": "10,847,892",
"isCorrect": false,
- "inputTokens": 15365,
+ "inputTokens": 19802,
"outputTokens": 9,
- "latencyMs": 1796.0902500000084
+ "latencyMs": 1783.1710419999436
+ },
+ {
+ "questionId": "q153",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "15404143",
+ "actual": "14000000",
+ "isCorrect": false,
+ "inputTokens": 21881,
+ "outputTokens": 8,
+ "latencyMs": 20208.78741599992
},
{
"questionId": "q153",
"format": "yaml",
"model": "gpt-5-nano",
"expected": "15404143",
- "actual": "15404143",
- "isCorrect": true,
+ "actual": "14012139",
+ "isCorrect": false,
"inputTokens": 13172,
- "outputTokens": 6409,
- "latencyMs": 70234.84133299999
+ "outputTokens": 14601,
+ "latencyMs": 139937.6586659999
},
{
"questionId": "q153",
@@ -16827,384 +25231,87 @@
"isCorrect": false,
"inputTokens": 14481,
"outputTokens": 9,
- "latencyMs": 1965.7452919999487
+ "latencyMs": 1949.8563330000034
},
{
- "questionId": "q154",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "100",
- "actual": "60",
+ "questionId": "q153",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "15404143",
+ "actual": "10999999",
"isCorrect": false,
- "inputTokens": 15188,
- "outputTokens": 7495,
- "latencyMs": 72992.43658400001
+ "inputTokens": 17076,
+ "outputTokens": 8,
+ "latencyMs": 1061.2076249999227
},
{
"questionId": "q154",
"format": "json",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 17408,
- "outputTokens": 5,
- "latencyMs": 1772.3059999999823
- },
- {
- "questionId": "q154",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 8789,
- "outputTokens": 2759,
- "latencyMs": 19214.133417000005
- },
- {
- "questionId": "q154",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 9278,
- "outputTokens": 5,
- "latencyMs": 1115.5979170000064
- },
- {
- "questionId": "q154",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 8557,
- "outputTokens": 2439,
- "latencyMs": 27365.987334000005
- },
- {
- "questionId": "q154",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 9124,
- "outputTokens": 5,
- "latencyMs": 1322.4322910000337
- },
- {
- "questionId": "q154",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 15482,
- "outputTokens": 5767,
- "latencyMs": 60524.90554200002
- },
- {
- "questionId": "q154",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 15366,
- "outputTokens": 5,
- "latencyMs": 1597.7364170000073
- },
- {
- "questionId": "q154",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 13172,
- "outputTokens": 4039,
- "latencyMs": 28819.869999999995
- },
- {
- "questionId": "q154",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 14482,
- "outputTokens": 5,
- "latencyMs": 1798.9455409999937
- },
- {
- "questionId": "q155",
- "format": "json",
"model": "gpt-5-nano",
"expected": "100",
"actual": "86",
"isCorrect": false,
"inputTokens": 15188,
- "outputTokens": 2375,
- "latencyMs": 23963.549916999997
+ "outputTokens": 3591,
+ "latencyMs": 186054.49916699994
},
{
- "questionId": "q155",
+ "questionId": "q154",
"format": "json",
"model": "claude-haiku-4-5",
"expected": "100",
- "actual": "71",
- "isCorrect": false,
+ "actual": "100",
+ "isCorrect": true,
"inputTokens": 17408,
"outputTokens": 5,
- "latencyMs": 1836.1375000000116
+ "latencyMs": 1541.018458000035
},
{
- "questionId": "q155",
+ "questionId": "q154",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "59",
+ "isCorrect": false,
+ "inputTokens": 19994,
+ "outputTokens": 2,
+ "latencyMs": 1209.527832999942
+ },
+ {
+ "questionId": "q154",
"format": "toon",
"model": "gpt-5-nano",
"expected": "100",
"actual": "100",
"isCorrect": true,
"inputTokens": 8789,
- "outputTokens": 3079,
- "latencyMs": 26957.04420799995
+ "outputTokens": 2311,
+ "latencyMs": 20000.66104200005
},
{
- "questionId": "q155",
+ "questionId": "q154",
"format": "toon",
"model": "claude-haiku-4-5",
"expected": "100",
- "actual": "42",
- "isCorrect": false,
+ "actual": "100",
+ "isCorrect": true,
"inputTokens": 9278,
"outputTokens": 5,
- "latencyMs": 1209.7997920000344
+ "latencyMs": 1125.2787499999395
},
{
- "questionId": "q155",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 8557,
- "outputTokens": 2887,
- "latencyMs": 27174.970375000034
- },
- {
- "questionId": "q155",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "47",
- "isCorrect": false,
- "inputTokens": 9124,
- "outputTokens": 5,
- "latencyMs": 1293.6252920000115
- },
- {
- "questionId": "q155",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "100",
- "actual": "98",
- "isCorrect": false,
- "inputTokens": 15482,
- "outputTokens": 2567,
- "latencyMs": 29565.065250000043
- },
- {
- "questionId": "q155",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "71",
- "isCorrect": false,
- "inputTokens": 15366,
- "outputTokens": 5,
- "latencyMs": 1230.7459160000435
- },
- {
- "questionId": "q155",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 13172,
- "outputTokens": 2695,
- "latencyMs": 20706.84841700003
- },
- {
- "questionId": "q155",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "71",
- "isCorrect": false,
- "inputTokens": 14482,
- "outputTokens": 5,
- "latencyMs": 1743.1536249999772
- },
- {
- "questionId": "q156",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "76",
- "actual": "41",
- "isCorrect": false,
- "inputTokens": 15188,
- "outputTokens": 8263,
- "latencyMs": 60899.858959000034
- },
- {
- "questionId": "q156",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "76",
- "actual": "100",
- "isCorrect": false,
- "inputTokens": 17408,
- "outputTokens": 5,
- "latencyMs": 1350.1540420000092
- },
- {
- "questionId": "q156",
+ "questionId": "q154",
"format": "toon",
- "model": "gpt-5-nano",
- "expected": "76",
- "actual": "76",
- "isCorrect": true,
- "inputTokens": 8789,
- "outputTokens": 3847,
- "latencyMs": 30491.779582999996
- },
- {
- "questionId": "q156",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "76",
- "actual": "100",
- "isCorrect": false,
- "inputTokens": 9278,
- "outputTokens": 5,
- "latencyMs": 1513.2665410000482
- },
- {
- "questionId": "q156",
- "format": "csv",
- "model": "gpt-5-nano",
- "expected": "76",
- "actual": "76",
- "isCorrect": true,
- "inputTokens": 8557,
- "outputTokens": 3847,
- "latencyMs": 25522.397125000018
- },
- {
- "questionId": "q156",
- "format": "csv",
- "model": "claude-haiku-4-5",
- "expected": "76",
- "actual": "100",
- "isCorrect": false,
- "inputTokens": 9124,
- "outputTokens": 5,
- "latencyMs": 1150.7281660000444
- },
- {
- "questionId": "q156",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "76",
- "actual": "76",
- "isCorrect": true,
- "inputTokens": 15482,
- "outputTokens": 2631,
- "latencyMs": 22525.465083000017
- },
- {
- "questionId": "q156",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "76",
- "actual": "100",
- "isCorrect": false,
- "inputTokens": 15366,
- "outputTokens": 5,
- "latencyMs": 1438.5829169999924
- },
- {
- "questionId": "q156",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "76",
- "actual": "62",
- "isCorrect": false,
- "inputTokens": 13172,
- "outputTokens": 1351,
- "latencyMs": 11162.623291999975
- },
- {
- "questionId": "q156",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "76",
- "actual": "100",
- "isCorrect": false,
- "inputTokens": 14482,
- "outputTokens": 5,
- "latencyMs": 1305.162249999994
- },
- {
- "questionId": "q157",
- "format": "json",
- "model": "gpt-5-nano",
+ "model": "gemini-2.5-flash",
"expected": "100",
- "actual": "129",
+ "actual": "50",
"isCorrect": false,
- "inputTokens": 15188,
- "outputTokens": 6599,
- "latencyMs": 49590.68900000001
+ "inputTokens": 12340,
+ "outputTokens": 2,
+ "latencyMs": 2061.19062499993
},
{
- "questionId": "q157",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "89",
- "isCorrect": false,
- "inputTokens": 17409,
- "outputTokens": 5,
- "latencyMs": 1750.9506249999977
- },
- {
- "questionId": "q157",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "100",
- "actual": "100",
- "isCorrect": true,
- "inputTokens": 8789,
- "outputTokens": 8903,
- "latencyMs": 68556.36550000001
- },
- {
- "questionId": "q157",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "73",
- "isCorrect": false,
- "inputTokens": 9279,
- "outputTokens": 5,
- "latencyMs": 1148.3701669999864
- },
- {
- "questionId": "q157",
+ "questionId": "q154",
"format": "csv",
"model": "gpt-5-nano",
"expected": "100",
@@ -17212,7 +25319,502 @@
"isCorrect": true,
"inputTokens": 8557,
"outputTokens": 3271,
- "latencyMs": 36128.254709
+ "latencyMs": 29091.357792000053
+ },
+ {
+ "questionId": "q154",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 9124,
+ "outputTokens": 5,
+ "latencyMs": 1029.3966670000227
+ },
+ {
+ "questionId": "q154",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "59",
+ "isCorrect": false,
+ "inputTokens": 12210,
+ "outputTokens": 2,
+ "latencyMs": 2304.6412080000155
+ },
+ {
+ "questionId": "q154",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "88",
+ "isCorrect": false,
+ "inputTokens": 17139,
+ "outputTokens": 2375,
+ "latencyMs": 25588.054458
+ },
+ {
+ "questionId": "q154",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 19803,
+ "outputTokens": 5,
+ "latencyMs": 1378.1570839999476
+ },
+ {
+ "questionId": "q154",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 21884,
+ "outputTokens": 3,
+ "latencyMs": 28098.016750000068
+ },
+ {
+ "questionId": "q154",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "88",
+ "isCorrect": false,
+ "inputTokens": 13172,
+ "outputTokens": 4359,
+ "latencyMs": 47106.68116699997
+ },
+ {
+ "questionId": "q154",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 14482,
+ "outputTokens": 5,
+ "latencyMs": 2077.1985829999903
+ },
+ {
+ "questionId": "q154",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "50",
+ "isCorrect": false,
+ "inputTokens": 17079,
+ "outputTokens": 2,
+ "latencyMs": 1049.9515410000458
+ },
+ {
+ "questionId": "q155",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 15188,
+ "outputTokens": 5639,
+ "latencyMs": 52034.31104199996
+ },
+ {
+ "questionId": "q155",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "100",
+ "actual": "71",
+ "isCorrect": false,
+ "inputTokens": 17408,
+ "outputTokens": 5,
+ "latencyMs": 1774.2209169999696
+ },
+ {
+ "questionId": "q155",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "60",
+ "isCorrect": false,
+ "inputTokens": 19994,
+ "outputTokens": 2,
+ "latencyMs": 1397.8998329999158
+ },
+ {
+ "questionId": "q155",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 8789,
+ "outputTokens": 2823,
+ "latencyMs": 26509.484792000032
+ },
+ {
+ "questionId": "q155",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "100",
+ "actual": "42",
+ "isCorrect": false,
+ "inputTokens": 9278,
+ "outputTokens": 5,
+ "latencyMs": 1028.7182500000345
+ },
+ {
+ "questionId": "q155",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 12340,
+ "outputTokens": 3,
+ "latencyMs": 21919.32149999996
+ },
+ {
+ "questionId": "q155",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 8557,
+ "outputTokens": 2631,
+ "latencyMs": 32920.081041999976
+ },
+ {
+ "questionId": "q155",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "100",
+ "actual": "47",
+ "isCorrect": false,
+ "inputTokens": 9124,
+ "outputTokens": 5,
+ "latencyMs": 1246.9641250000568
+ },
+ {
+ "questionId": "q155",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 12210,
+ "outputTokens": 3,
+ "latencyMs": 17704.908124999958
+ },
+ {
+ "questionId": "q155",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "79",
+ "isCorrect": false,
+ "inputTokens": 17139,
+ "outputTokens": 4359,
+ "latencyMs": 36706.952500000014
+ },
+ {
+ "questionId": "q155",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "100",
+ "actual": "50",
+ "isCorrect": false,
+ "inputTokens": 19803,
+ "outputTokens": 5,
+ "latencyMs": 1653.922874999931
+ },
+ {
+ "questionId": "q155",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 21884,
+ "outputTokens": 3,
+ "latencyMs": 18907.825375000015
+ },
+ {
+ "questionId": "q155",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "88",
+ "isCorrect": false,
+ "inputTokens": 13172,
+ "outputTokens": 2567,
+ "latencyMs": 29826.266333999927
+ },
+ {
+ "questionId": "q155",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "100",
+ "actual": "71",
+ "isCorrect": false,
+ "inputTokens": 14482,
+ "outputTokens": 5,
+ "latencyMs": 1877.8078329999698
+ },
+ {
+ "questionId": "q155",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "60",
+ "isCorrect": false,
+ "inputTokens": 17079,
+ "outputTokens": 2,
+ "latencyMs": 1709.5576250000158
+ },
+ {
+ "questionId": "q156",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "76",
+ "actual": "61",
+ "isCorrect": false,
+ "inputTokens": 15188,
+ "outputTokens": 3015,
+ "latencyMs": 27373.73904200003
+ },
+ {
+ "questionId": "q156",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "76",
+ "actual": "100",
+ "isCorrect": false,
+ "inputTokens": 17408,
+ "outputTokens": 5,
+ "latencyMs": 2553.873874999932
+ },
+ {
+ "questionId": "q156",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "76",
+ "actual": "50",
+ "isCorrect": false,
+ "inputTokens": 19995,
+ "outputTokens": 2,
+ "latencyMs": 1292.7788750000764
+ },
+ {
+ "questionId": "q156",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "76",
+ "actual": "76",
+ "isCorrect": true,
+ "inputTokens": 8789,
+ "outputTokens": 3911,
+ "latencyMs": 38466.93025000009
+ },
+ {
+ "questionId": "q156",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "76",
+ "actual": "100",
+ "isCorrect": false,
+ "inputTokens": 9278,
+ "outputTokens": 5,
+ "latencyMs": 1207.3981249999488
+ },
+ {
+ "questionId": "q156",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "76",
+ "actual": "76",
+ "isCorrect": true,
+ "inputTokens": 12341,
+ "outputTokens": 2,
+ "latencyMs": 21904.33095799992
+ },
+ {
+ "questionId": "q156",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "76",
+ "actual": "75",
+ "isCorrect": false,
+ "inputTokens": 8557,
+ "outputTokens": 2951,
+ "latencyMs": 38943.062832999974
+ },
+ {
+ "questionId": "q156",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "76",
+ "actual": "100",
+ "isCorrect": false,
+ "inputTokens": 9124,
+ "outputTokens": 5,
+ "latencyMs": 1096.0891670000274
+ },
+ {
+ "questionId": "q156",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "76",
+ "actual": "76",
+ "isCorrect": true,
+ "inputTokens": 12211,
+ "outputTokens": 2,
+ "latencyMs": 16468.647499999963
+ },
+ {
+ "questionId": "q156",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "76",
+ "actual": "64",
+ "isCorrect": false,
+ "inputTokens": 17139,
+ "outputTokens": 1863,
+ "latencyMs": 18473.753917000024
+ },
+ {
+ "questionId": "q156",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "76",
+ "actual": "100",
+ "isCorrect": false,
+ "inputTokens": 19803,
+ "outputTokens": 5,
+ "latencyMs": 1316.2989590000361
+ },
+ {
+ "questionId": "q156",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "76",
+ "actual": "47",
+ "isCorrect": false,
+ "inputTokens": 21885,
+ "outputTokens": 2,
+ "latencyMs": 1786.060832999996
+ },
+ {
+ "questionId": "q156",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "76",
+ "actual": "72",
+ "isCorrect": false,
+ "inputTokens": 13172,
+ "outputTokens": 8711,
+ "latencyMs": 86456.99716699996
+ },
+ {
+ "questionId": "q156",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "76",
+ "actual": "100",
+ "isCorrect": false,
+ "inputTokens": 14482,
+ "outputTokens": 5,
+ "latencyMs": 1337.9467500000028
+ },
+ {
+ "questionId": "q156",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "76",
+ "actual": "42",
+ "isCorrect": false,
+ "inputTokens": 17080,
+ "outputTokens": 2,
+ "latencyMs": 1272.1261659999145
+ },
+ {
+ "questionId": "q157",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "139",
+ "isCorrect": false,
+ "inputTokens": 15188,
+ "outputTokens": 8199,
+ "latencyMs": 117751.80679199996
+ },
+ {
+ "questionId": "q157",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "100",
+ "actual": "89",
+ "isCorrect": false,
+ "inputTokens": 17409,
+ "outputTokens": 5,
+ "latencyMs": 6994.20404099999
+ },
+ {
+ "questionId": "q157",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "60",
+ "isCorrect": false,
+ "inputTokens": 19993,
+ "outputTokens": 2,
+ "latencyMs": 1664.0891249999404
+ },
+ {
+ "questionId": "q157",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 8789,
+ "outputTokens": 4103,
+ "latencyMs": 33535.55912499991
+ },
+ {
+ "questionId": "q157",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "100",
+ "actual": "73",
+ "isCorrect": false,
+ "inputTokens": 9279,
+ "outputTokens": 5,
+ "latencyMs": 1228.1867499999935
+ },
+ {
+ "questionId": "q157",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "60",
+ "isCorrect": false,
+ "inputTokens": 12339,
+ "outputTokens": 2,
+ "latencyMs": 1517.6247079999885
+ },
+ {
+ "questionId": "q157",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "87",
+ "isCorrect": false,
+ "inputTokens": 8557,
+ "outputTokens": 3079,
+ "latencyMs": 27126.57024999999
},
{
"questionId": "q157",
@@ -17223,172 +25825,260 @@
"isCorrect": false,
"inputTokens": 9125,
"outputTokens": 5,
- "latencyMs": 1137.2578750000102
+ "latencyMs": 949.5018749999581
},
{
"questionId": "q157",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "100",
- "actual": "79",
- "isCorrect": false,
- "inputTokens": 15482,
- "outputTokens": 3527,
- "latencyMs": 35526.23958300002
- },
- {
- "questionId": "q157",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "95",
- "isCorrect": false,
- "inputTokens": 15367,
- "outputTokens": 5,
- "latencyMs": 1501.6561670000083
- },
- {
- "questionId": "q157",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "100",
- "actual": "99",
- "isCorrect": false,
- "inputTokens": 13172,
- "outputTokens": 3143,
- "latencyMs": 26700.229333000025
- },
- {
- "questionId": "q157",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "100",
- "actual": "95",
- "isCorrect": false,
- "inputTokens": 14483,
- "outputTokens": 5,
- "latencyMs": 1159.0904580000206
- },
- {
- "questionId": "q158",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "95",
- "actual": "94",
- "isCorrect": false,
- "inputTokens": 15188,
- "outputTokens": 4999,
- "latencyMs": 32710.407750000013
- },
- {
- "questionId": "q158",
- "format": "json",
- "model": "claude-haiku-4-5",
- "expected": "95",
- "actual": "42",
- "isCorrect": false,
- "inputTokens": 17409,
- "outputTokens": 5,
- "latencyMs": 1451.6710420000018
- },
- {
- "questionId": "q158",
- "format": "toon",
- "model": "gpt-5-nano",
- "expected": "95",
- "actual": "82",
- "isCorrect": false,
- "inputTokens": 8789,
- "outputTokens": 3143,
- "latencyMs": 18360.73424999998
- },
- {
- "questionId": "q158",
- "format": "toon",
- "model": "claude-haiku-4-5",
- "expected": "95",
- "actual": "42",
- "isCorrect": false,
- "inputTokens": 9279,
- "outputTokens": 5,
- "latencyMs": 1035.2159160000156
- },
- {
- "questionId": "q158",
"format": "csv",
- "model": "gpt-5-nano",
- "expected": "95",
- "actual": "95",
- "isCorrect": true,
- "inputTokens": 8557,
- "outputTokens": 4487,
- "latencyMs": 28020.044915999984
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "60",
+ "isCorrect": false,
+ "inputTokens": 12209,
+ "outputTokens": 2,
+ "latencyMs": 2366.7855419999687
},
{
- "questionId": "q158",
- "format": "csv",
+ "questionId": "q157",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "69",
+ "isCorrect": false,
+ "inputTokens": 17139,
+ "outputTokens": 2183,
+ "latencyMs": 35555.629874999984
+ },
+ {
+ "questionId": "q157",
+ "format": "xml",
"model": "claude-haiku-4-5",
- "expected": "95",
- "actual": "42",
- "isCorrect": false,
- "inputTokens": 9125,
- "outputTokens": 5,
- "latencyMs": 1175.8671249999898
- },
- {
- "questionId": "q158",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
- "expected": "95",
- "actual": "77",
- "isCorrect": false,
- "inputTokens": 15482,
- "outputTokens": 2887,
- "latencyMs": 24031.185459
- },
- {
- "questionId": "q158",
- "format": "markdown-kv",
- "model": "claude-haiku-4-5",
- "expected": "95",
- "actual": "47",
- "isCorrect": false,
- "inputTokens": 15367,
- "outputTokens": 5,
- "latencyMs": 1724.9393750000163
- },
- {
- "questionId": "q158",
- "format": "yaml",
- "model": "gpt-5-nano",
- "expected": "95",
- "actual": "81",
- "isCorrect": false,
- "inputTokens": 13172,
- "outputTokens": 4359,
- "latencyMs": 35723.19641699997
- },
- {
- "questionId": "q158",
- "format": "yaml",
- "model": "claude-haiku-4-5",
- "expected": "95",
- "actual": "47",
- "isCorrect": false,
- "inputTokens": 14483,
- "outputTokens": 5,
- "latencyMs": 1663.259167000011
- },
- {
- "questionId": "q159",
- "format": "json",
- "model": "gpt-5-nano",
- "expected": "83",
+ "expected": "100",
"actual": "71",
"isCorrect": false,
+ "inputTokens": 19804,
+ "outputTokens": 5,
+ "latencyMs": 1865.6005420000292
+ },
+ {
+ "questionId": "q157",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 21883,
+ "outputTokens": 3,
+ "latencyMs": 22966.85654200008
+ },
+ {
+ "questionId": "q157",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "100",
+ "actual": "100",
+ "isCorrect": true,
+ "inputTokens": 13172,
+ "outputTokens": 2503,
+ "latencyMs": 23299.811666000052
+ },
+ {
+ "questionId": "q157",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "100",
+ "actual": "95",
+ "isCorrect": false,
+ "inputTokens": 14483,
+ "outputTokens": 5,
+ "latencyMs": 1111.9951249998994
+ },
+ {
+ "questionId": "q157",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "100",
+ "actual": "50",
+ "isCorrect": false,
+ "inputTokens": 17078,
+ "outputTokens": 2,
+ "latencyMs": 1229.8220420000143
+ },
+ {
+ "questionId": "q158",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "95",
+ "actual": "60",
+ "isCorrect": false,
"inputTokens": 15188,
"outputTokens": 2439,
- "latencyMs": 18168.518166999973
+ "latencyMs": 23952.90112500009
+ },
+ {
+ "questionId": "q158",
+ "format": "json",
+ "model": "claude-haiku-4-5",
+ "expected": "95",
+ "actual": "42",
+ "isCorrect": false,
+ "inputTokens": 17409,
+ "outputTokens": 5,
+ "latencyMs": 2635.0509999999776
+ },
+ {
+ "questionId": "q158",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "95",
+ "actual": "59",
+ "isCorrect": false,
+ "inputTokens": 19993,
+ "outputTokens": 2,
+ "latencyMs": 1382.6497909999453
+ },
+ {
+ "questionId": "q158",
+ "format": "toon",
+ "model": "gpt-5-nano",
+ "expected": "95",
+ "actual": "95",
+ "isCorrect": true,
+ "inputTokens": 8789,
+ "outputTokens": 5255,
+ "latencyMs": 52427.638499999885
+ },
+ {
+ "questionId": "q158",
+ "format": "toon",
+ "model": "claude-haiku-4-5",
+ "expected": "95",
+ "actual": "42",
+ "isCorrect": false,
+ "inputTokens": 9279,
+ "outputTokens": 5,
+ "latencyMs": 1752.1665410000132
+ },
+ {
+ "questionId": "q158",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "95",
+ "actual": "95",
+ "isCorrect": true,
+ "inputTokens": 12339,
+ "outputTokens": 2,
+ "latencyMs": 30665.240666999947
+ },
+ {
+ "questionId": "q158",
+ "format": "csv",
+ "model": "gpt-5-nano",
+ "expected": "95",
+ "actual": "96",
+ "isCorrect": false,
+ "inputTokens": 8557,
+ "outputTokens": 4999,
+ "latencyMs": 52545.94787500007
+ },
+ {
+ "questionId": "q158",
+ "format": "csv",
+ "model": "claude-haiku-4-5",
+ "expected": "95",
+ "actual": "42",
+ "isCorrect": false,
+ "inputTokens": 9125,
+ "outputTokens": 5,
+ "latencyMs": 1330.860624999972
+ },
+ {
+ "questionId": "q158",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
+ "expected": "95",
+ "actual": "60",
+ "isCorrect": false,
+ "inputTokens": 12209,
+ "outputTokens": 2,
+ "latencyMs": 2559.635125000146
+ },
+ {
+ "questionId": "q158",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "95",
+ "actual": "96",
+ "isCorrect": false,
+ "inputTokens": 17139,
+ "outputTokens": 13447,
+ "latencyMs": 177292.60950000002
+ },
+ {
+ "questionId": "q158",
+ "format": "xml",
+ "model": "claude-haiku-4-5",
+ "expected": "95",
+ "actual": "32",
+ "isCorrect": false,
+ "inputTokens": 19804,
+ "outputTokens": 5,
+ "latencyMs": 1816.5423749999609
+ },
+ {
+ "questionId": "q158",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "95",
+ "actual": "60",
+ "isCorrect": false,
+ "inputTokens": 21883,
+ "outputTokens": 2,
+ "latencyMs": 3004.8347500001546
+ },
+ {
+ "questionId": "q158",
+ "format": "yaml",
+ "model": "gpt-5-nano",
+ "expected": "95",
+ "actual": "96",
+ "isCorrect": false,
+ "inputTokens": 13172,
+ "outputTokens": 3975,
+ "latencyMs": 42573.26512499992
+ },
+ {
+ "questionId": "q158",
+ "format": "yaml",
+ "model": "claude-haiku-4-5",
+ "expected": "95",
+ "actual": "47",
+ "isCorrect": false,
+ "inputTokens": 14483,
+ "outputTokens": 5,
+ "latencyMs": 1499.2267080000602
+ },
+ {
+ "questionId": "q158",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "95",
+ "actual": "60",
+ "isCorrect": false,
+ "inputTokens": 17078,
+ "outputTokens": 2,
+ "latencyMs": 1173.8084579999559
+ },
+ {
+ "questionId": "q159",
+ "format": "json",
+ "model": "gpt-5-nano",
+ "expected": "83",
+ "actual": "50",
+ "isCorrect": false,
+ "inputTokens": 15188,
+ "outputTokens": 11719,
+ "latencyMs": 109516.51062500011
},
{
"questionId": "q159",
@@ -17399,18 +26089,29 @@
"isCorrect": false,
"inputTokens": 17409,
"outputTokens": 5,
- "latencyMs": 1390.1757499999949
+ "latencyMs": 1886.0561250001192
+ },
+ {
+ "questionId": "q159",
+ "format": "json",
+ "model": "gemini-2.5-flash",
+ "expected": "83",
+ "actual": "59",
+ "isCorrect": false,
+ "inputTokens": 19994,
+ "outputTokens": 2,
+ "latencyMs": 2211.6038330001757
},
{
"questionId": "q159",
"format": "toon",
"model": "gpt-5-nano",
"expected": "83",
- "actual": "57",
- "isCorrect": false,
+ "actual": "83",
+ "isCorrect": true,
"inputTokens": 8789,
- "outputTokens": 4423,
- "latencyMs": 41240.42016700003
+ "outputTokens": 3463,
+ "latencyMs": 36709.80866700015
},
{
"questionId": "q159",
@@ -17421,7 +26122,18 @@
"isCorrect": false,
"inputTokens": 9279,
"outputTokens": 5,
- "latencyMs": 1066.675458999991
+ "latencyMs": 1961.9631250000093
+ },
+ {
+ "questionId": "q159",
+ "format": "toon",
+ "model": "gemini-2.5-flash",
+ "expected": "83",
+ "actual": "83",
+ "isCorrect": true,
+ "inputTokens": 12340,
+ "outputTokens": 2,
+ "latencyMs": 18972.830374999903
},
{
"questionId": "q159",
@@ -17431,8 +26143,8 @@
"actual": "83",
"isCorrect": true,
"inputTokens": 8557,
- "outputTokens": 5831,
- "latencyMs": 40638.93858400005
+ "outputTokens": 6919,
+ "latencyMs": 69083.2129579999
},
{
"questionId": "q159",
@@ -17443,40 +26155,62 @@
"isCorrect": false,
"inputTokens": 9125,
"outputTokens": 5,
- "latencyMs": 1394.1952499999898
+ "latencyMs": 1200.284708000021
},
{
"questionId": "q159",
- "format": "markdown-kv",
- "model": "gpt-5-nano",
+ "format": "csv",
+ "model": "gemini-2.5-flash",
"expected": "83",
"actual": "83",
"isCorrect": true,
- "inputTokens": 15482,
- "outputTokens": 3591,
- "latencyMs": 25356.36183400004
+ "inputTokens": 12210,
+ "outputTokens": 2,
+ "latencyMs": 33046.47866699984
},
{
"questionId": "q159",
- "format": "markdown-kv",
+ "format": "xml",
+ "model": "gpt-5-nano",
+ "expected": "83",
+ "actual": "112",
+ "isCorrect": false,
+ "inputTokens": 17139,
+ "outputTokens": 6535,
+ "latencyMs": 62622.555124999955
+ },
+ {
+ "questionId": "q159",
+ "format": "xml",
"model": "claude-haiku-4-5",
"expected": "83",
- "actual": "71",
+ "actual": "47",
"isCorrect": false,
- "inputTokens": 15367,
+ "inputTokens": 19804,
"outputTokens": 5,
- "latencyMs": 1238.0827089999802
+ "latencyMs": 1500.2770829999354
+ },
+ {
+ "questionId": "q159",
+ "format": "xml",
+ "model": "gemini-2.5-flash",
+ "expected": "83",
+ "actual": "49",
+ "isCorrect": false,
+ "inputTokens": 21884,
+ "outputTokens": 2,
+ "latencyMs": 2811.6203749999404
},
{
"questionId": "q159",
"format": "yaml",
"model": "gpt-5-nano",
"expected": "83",
- "actual": "72",
+ "actual": "90",
"isCorrect": false,
"inputTokens": 13172,
- "outputTokens": 2567,
- "latencyMs": 25124.520583999984
+ "outputTokens": 25095,
+ "latencyMs": 237521.54700000002
},
{
"questionId": "q159",
@@ -17487,6 +26221,17 @@
"isCorrect": false,
"inputTokens": 14483,
"outputTokens": 5,
- "latencyMs": 2058.834957999992
+ "latencyMs": 1567.613791000098
+ },
+ {
+ "questionId": "q159",
+ "format": "yaml",
+ "model": "gemini-2.5-flash",
+ "expected": "83",
+ "actual": "49",
+ "isCorrect": false,
+ "inputTokens": 17079,
+ "outputTokens": 2,
+ "latencyMs": 1373.2515409998596
}
]
diff --git a/benchmarks/results/accuracy/report.md b/benchmarks/results/accuracy/report.md
index eccae66..9cb96ae 100644
--- a/benchmarks/results/accuracy/report.md
+++ b/benchmarks/results/accuracy/report.md
@@ -1,24 +1,31 @@
### Retrieval Accuracy
-Tested across **2 LLMs** with data retrieval tasks:
+Tested across **3 LLMs** with data retrieval tasks:
```
gpt-5-nano
- toon ███████████████████░ 97.5% (155/159)
- markdown-kv ███████████████████░ 95.6% (152/159)
- yaml ███████████████████░ 94.3% (150/159)
- json ███████████████████░ 93.7% (149/159)
- csv ███████████████████░ 93.7% (149/159)
+ toon ████████████████████ 99.4% (158/159)
+ yaml ███████████████████░ 95.0% (151/159)
+ csv ██████████████████░░ 92.5% (147/159)
+ json ██████████████████░░ 92.5% (147/159)
+ xml ██████████████████░░ 91.2% (145/159)
claude-haiku-4-5
- markdown-kv ███████████████░░░░░ 76.7% (122/159)
toon ███████████████░░░░░ 75.5% (120/159)
- json ███████████████░░░░░ 75.5% (120/159)
+ xml ███████████████░░░░░ 75.5% (120/159)
csv ███████████████░░░░░ 75.5% (120/159)
- yaml ███████████████░░░░░ 74.8% (119/159)
+ json ███████████████░░░░░ 75.5% (120/159)
+ yaml ███████████████░░░░░ 74.2% (118/159)
+
+gemini-2.5-flash
+ xml ██████████████████░░ 91.8% (146/159)
+ csv █████████████████░░░ 86.2% (137/159)
+ toon █████████████████░░░ 84.9% (135/159)
+ json ████████████████░░░░ 81.8% (130/159)
+ yaml ████████████████░░░░ 78.6% (125/159)
```
-**Advantage:** TOON achieves **86.5% accuracy** (vs JSON's 84.6%) while using **46.3% fewer tokens**.
+**Advantage:** TOON achieves **86.6% accuracy** (vs JSON's 83.2%) while using **46.3% fewer tokens**.
View detailed breakdown by dataset and model
@@ -29,41 +36,41 @@ claude-haiku-4-5
| Format | Accuracy | Tokens | Correct/Total |
| ------ | -------- | ------ | ------------- |
-| `toon` | 86.2% | 2.483 | 100/116 |
-| `csv` | 80.2% | 2.337 | 93/116 |
-| `yaml` | 82.8% | 4.969 | 96/116 |
-| `markdown-kv` | 84.5% | 6.270 | 98/116 |
-| `json` | 84.5% | 6.347 | 98/116 |
+| `toon` | 87.4% | 2.483 | 152/174 |
+| `csv` | 82.8% | 2.337 | 144/174 |
+| `yaml` | 83.9% | 4.969 | 146/174 |
+| `json` | 83.9% | 6.347 | 146/174 |
+| `xml` | 88.5% | 7.314 | 154/174 |
##### E-commerce orders with nested structures
| Format | Accuracy | Tokens | Correct/Total |
| ------ | -------- | ------ | ------------- |
-| `toon` | 90.9% | 5.967 | 80/88 |
-| `csv` | 90.9% | 6.735 | 80/88 |
-| `yaml` | 89.8% | 7.328 | 79/88 |
-| `markdown-kv` | 90.9% | 9.110 | 80/88 |
-| `json` | 89.8% | 9.694 | 79/88 |
+| `toon` | 90.9% | 5.967 | 120/132 |
+| `csv` | 93.9% | 6.735 | 124/132 |
+| `yaml` | 87.1% | 7.328 | 115/132 |
+| `json` | 87.9% | 9.694 | 116/132 |
+| `xml` | 93.2% | 10.992 | 123/132 |
##### Time-series analytics data
| Format | Accuracy | Tokens | Correct/Total |
| ------ | -------- | ------ | ------------- |
-| `csv` | 87.9% | 1.393 | 51/58 |
-| `toon` | 86.2% | 1.515 | 50/58 |
-| `yaml` | 86.2% | 2.938 | 50/58 |
-| `json` | 87.9% | 3.665 | 51/58 |
-| `markdown-kv` | 86.2% | 3.779 | 50/58 |
+| `csv` | 89.7% | 1.393 | 78/87 |
+| `toon` | 88.5% | 1.515 | 77/87 |
+| `yaml` | 83.9% | 2.938 | 73/87 |
+| `json` | 88.5% | 3.665 | 77/87 |
+| `xml` | 85.1% | 4.376 | 74/87 |
##### Top 100 GitHub repositories
| Format | Accuracy | Tokens | Correct/Total |
| ------ | -------- | ------ | ------------- |
-| `csv` | 80.4% | 8.513 | 45/56 |
-| `toon` | 80.4% | 8.745 | 45/56 |
-| `yaml` | 78.6% | 13.129 | 44/56 |
-| `markdown-kv` | 82.1% | 15.436 | 46/56 |
-| `json` | 73.2% | 15.145 | 41/56 |
+| `toon` | 76.2% | 8.745 | 64/84 |
+| `csv` | 69.0% | 8.513 | 58/84 |
+| `yaml` | 71.4% | 13.129 | 60/84 |
+| `json` | 69.0% | 15.145 | 58/84 |
+| `xml` | 71.4% | 17.095 | 60/84 |
#### Performance by Model
@@ -71,27 +78,37 @@ claude-haiku-4-5
| Format | Accuracy | Correct/Total |
| ------ | -------- | ------------- |
-| `toon` | 97.5% | 155/159 |
-| `markdown-kv` | 95.6% | 152/159 |
-| `yaml` | 94.3% | 150/159 |
-| `json` | 93.7% | 149/159 |
-| `csv` | 93.7% | 149/159 |
+| `toon` | 99.4% | 158/159 |
+| `yaml` | 95.0% | 151/159 |
+| `csv` | 92.5% | 147/159 |
+| `json` | 92.5% | 147/159 |
+| `xml` | 91.2% | 145/159 |
##### claude-haiku-4-5
| Format | Accuracy | Correct/Total |
| ------ | -------- | ------------- |
-| `markdown-kv` | 76.7% | 122/159 |
| `toon` | 75.5% | 120/159 |
-| `json` | 75.5% | 120/159 |
+| `xml` | 75.5% | 120/159 |
| `csv` | 75.5% | 120/159 |
-| `yaml` | 74.8% | 119/159 |
+| `json` | 75.5% | 120/159 |
+| `yaml` | 74.2% | 118/159 |
+
+##### gemini-2.5-flash
+
+| Format | Accuracy | Correct/Total |
+| ------ | -------- | ------------- |
+| `xml` | 91.8% | 146/159 |
+| `csv` | 86.2% | 137/159 |
+| `toon` | 84.9% | 135/159 |
+| `json` | 81.8% | 130/159 |
+| `yaml` | 78.6% | 125/159 |
#### Methodology
- **Semantic validation**: LLM-as-judge validates responses semantically (not exact string matching).
- **Token counting**: Using `gpt-tokenizer` with `o200k_base` encoding.
-- **Question types**: Field retrieval, aggregation, and filtering tasks.
-- **Real data**: Faker.js-generated datasets + GitHub repositories.
+- **Question types**: ~160 questions across field retrieval, aggregation, and filtering tasks.
+- **Datasets**: Faker.js-generated datasets (seeded) + GitHub repositories.
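The overall accuracy figures in the **Advantage** line can be cross-checked against the per-model tables in this report. The following is a minimal sketch, assuming the overall figure is the pooled correct/total count across the three models; the `ModelScore` type and the `pooledAccuracy` helper are illustrative names, not part of the benchmark code.

```typescript
// Correct/total counts per model, copied from the per-model tables in this report.
interface ModelScore {
  correct: number
  total: number
}

const toonScores: ModelScore[] = [
  { correct: 158, total: 159 }, // gpt-5-nano
  { correct: 120, total: 159 }, // claude-haiku-4-5
  { correct: 135, total: 159 }, // gemini-2.5-flash
]

const jsonScores: ModelScore[] = [
  { correct: 147, total: 159 }, // gpt-5-nano
  { correct: 120, total: 159 }, // claude-haiku-4-5
  { correct: 130, total: 159 }, // gemini-2.5-flash
]

// Hypothetical helper: pool the counts across models, then divide once.
function pooledAccuracy(scores: ModelScore[]): number {
  const correct = scores.reduce((sum, s) => sum + s.correct, 0)
  const total = scores.reduce((sum, s) => sum + s.total, 0)
  return correct / total
}

const toonAccuracy = pooledAccuracy(toonScores) // 413 / 477
const jsonAccuracy = pooledAccuracy(jsonScores) // 397 / 477

console.log(`toon: ${(toonAccuracy * 100).toFixed(1)}%`) // toon: 86.6%
console.log(`json: ${(jsonAccuracy * 100).toFixed(1)}%`) // json: 83.2%
```

Pooling counts rather than averaging the three per-model percentages gives the same result here only because every model answered the same 159 questions.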
diff --git a/benchmarks/results/accuracy/summary.json b/benchmarks/results/accuracy/summary.json
index abbcc06..688a296 100644
--- a/benchmarks/results/accuracy/summary.json
+++ b/benchmarks/results/accuracy/summary.json
@@ -2,49 +2,50 @@
"formatResults": [
{
"format": "toon",
- "accuracy": 0.8647798742138365,
+ "accuracy": 0.8658280922431866,
"totalTokens": 4678,
- "averageLatency": 5016,
- "correctCount": 275,
- "totalCount": 318
+ "averageLatency": 5321,
+ "correctCount": 413,
+ "totalCount": 477
},
{
- "format": "markdown-kv",
+ "format": "xml",
"accuracy": 0.8616352201257862,
- "totalTokens": 8649,
- "averageLatency": 4628,
- "correctCount": 274,
- "totalCount": 318
- },
- {
- "format": "json",
- "accuracy": 0.8459119496855346,
- "totalTokens": 8713,
- "averageLatency": 5369,
- "correctCount": 269,
- "totalCount": 318
+ "totalTokens": 9944,
+ "averageLatency": 6035,
+ "correctCount": 411,
+ "totalCount": 477
},
{
"format": "csv",
- "accuracy": 0.8459119496855346,
+ "accuracy": 0.8469601677148847,
"totalTokens": 4745,
- "averageLatency": 5168,
- "correctCount": 269,
- "totalCount": 318
+ "averageLatency": 6551,
+ "correctCount": 404,
+ "totalCount": 477
+ },
+ {
+ "format": "json",
+ "accuracy": 0.8322851153039832,
+ "totalTokens": 8713,
+ "averageLatency": 7981,
+ "correctCount": 397,
+ "totalCount": 477
},
{
"format": "yaml",
- "accuracy": 0.8459119496855346,
+ "accuracy": 0.8259958071278826,
"totalTokens": 7091,
- "averageLatency": 4299,
- "correctCount": 269,
- "totalCount": 318
+ "averageLatency": 5561,
+ "correctCount": 394,
+ "totalCount": 477
}
],
"questions": 159,
"models": [
"gpt-5-nano",
- "claude-haiku-4-5"
+ "claude-haiku-4-5",
+ "gemini-2.5-flash"
],
"datasets": [
{
@@ -77,14 +78,14 @@
"csv-nested": 6735,
"csv-analytics": 1393,
"csv-github": 8513,
- "markdown-kv-tabular": 6270,
- "markdown-kv-nested": 9110,
- "markdown-kv-analytics": 3779,
- "markdown-kv-github": 15436,
+ "xml-tabular": 7314,
+ "xml-nested": 10992,
+ "xml-analytics": 4376,
+ "xml-github": 17095,
"yaml-tabular": 4969,
"yaml-nested": 7328,
"yaml-analytics": 2938,
"yaml-github": 13129
},
- "timestamp": "2025-10-27T13:17:28.071Z"
+ "timestamp": "2025-10-27T15:01:57.523Z"
}
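The `totalTokens` values in this summary appear consistent with a rounded mean of the per-dataset token counts listed in the report tables, and the "46.3% fewer tokens" claim follows from the TOON and JSON means. The sketch below checks that arithmetic; the averaging rule is an inference from the numbers, not something the diff states, and `mean` is an illustrative helper.

```typescript
// Per-dataset token counts in the order: tabular, nested, analytics, github.
// Values are copied from the report tables; the averaging rule is an assumption.
const toonTokens: number[] = [2483, 5967, 1515, 8745]
const jsonTokens: number[] = [6347, 9694, 3665, 15145]
const xmlTokens: number[] = [7314, 10992, 4376, 17095]

// Hypothetical helper: arithmetic mean of a non-empty list.
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length
}

const toonMean = Math.round(mean(toonTokens)) // 4678
const jsonMean = Math.round(mean(jsonTokens)) // 8713
const xmlMean = Math.round(mean(xmlTokens)) // 9944

// Relative token reduction of TOON against JSON, in percent.
const reductionVsJson = (1 - toonMean / jsonMean) * 100

console.log(toonMean, jsonMean, xmlMean) // 4678 8713 9944
console.log(`${reductionVsJson.toFixed(1)}% fewer tokens than JSON`) // 46.3% fewer tokens than JSON
```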
diff --git a/benchmarks/results/token-efficiency.md b/benchmarks/results/token-efficiency.md
index 85fb74d..8c382e8 100644
--- a/benchmarks/results/token-efficiency.md
+++ b/benchmarks/results/token-efficiency.md
@@ -1,13 +1,23 @@
### Token Efficiency
```
-⭐ GitHub Repositories ██████████████░░░░░░░░░░░ 8,745 tokens (JSON: 15,145) 💰 42.3% saved
-📈 Daily Analytics ██████████░░░░░░░░░░░░░░░ 3,630 tokens (JSON: 9,023) 💰 59.8% saved
-👥 API Response ██████████████░░░░░░░░░░░ 2,597 tokens (JSON: 4,589) 💰 43.4% saved
-🛒 E-Commerce Order ████████████████░░░░░░░░░ 164 tokens (JSON: 256) 💰 35.9% saved
-```
+⭐ GitHub Repositories ██████████████░░░░░░░░░░░ 8,745 tokens
+ vs JSON: 15,145 💰 42.3% saved
+ vs XML: 17,095 💰 48.8% saved
-**Total:** 15,136 tokens (TOON) vs 29,013 tokens (JSON) → 47.8% savings
+📈 Daily Analytics ██████████░░░░░░░░░░░░░░░ 4,507 tokens
+ vs JSON: 10,977 💰 58.9% saved
+ vs XML: 13,128 💰 65.7% saved
+
+🛒 E-Commerce Order ████████████████░░░░░░░░░ 166 tokens
+ vs JSON: 257 💰 35.4% saved
+ vs XML: 271 💰 38.7% saved
+
+─────────────────────────────────────────────────────────────────────
+Total ████████████░░░░░░░░░░░░░ 13,418 tokens
+ vs JSON: 26,379 💰 49.1% saved
+ vs XML: 30,494 💰 56.0% saved
+```
View detailed examples
@@ -16,7 +26,7 @@
**Configuration:** Top 100 GitHub repositories with stars, forks, and metadata
-**Savings:** 6,400 tokens (42.3% reduction)
+**Savings:** 6,400 tokens (42.3% reduction vs JSON)
**JSON** (15,145 tokens):
@@ -27,7 +37,7 @@
"id": 28457823,
"name": "freeCodeCamp",
"repo": "freeCodeCamp/freeCodeCamp",
- "description": "freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,...",
+ "description": "freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…",
"createdAt": "2014-12-24T17:49:19Z",
"updatedAt": "2025-10-27T07:40:58Z",
"pushedAt": "2025-10-26T11:31:08Z",
@@ -70,7 +80,7 @@
```
repositories[3]{id,name,repo,description,createdAt,updatedAt,pushedAt,stars,watchers,forks,defaultBranch}:
- 28457823,freeCodeCamp,freeCodeCamp/freeCodeCamp,"freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,...","2014-12-24T17:49:19Z","2025-10-27T07:40:58Z","2025-10-26T11:31:08Z",430828,8582,42136,main
+ 28457823,freeCodeCamp,freeCodeCamp/freeCodeCamp,"freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming,…","2014-12-24T17:49:19Z","2025-10-27T07:40:58Z","2025-10-26T11:31:08Z",430828,8582,42136,main
132750724,build-your-own-x,codecrafters-io/build-your-own-x,Master programming by recreating your favorite technologies from scratch.,"2018-05-09T12:03:18Z","2025-10-27T07:43:25Z","2025-10-10T18:45:01Z",430102,6322,40388,master
21737465,awesome,sindresorhus/awesome,😎 Awesome lists about all kinds of interesting topics,"2014-07-11T13:42:37Z","2025-10-27T07:44:27Z","2025-10-23T17:26:53Z",409760,8016,32015,main
```
@@ -81,61 +91,66 @@ repositories[3]{id,name,repo,description,createdAt,updatedAt,pushedAt,stars,watc
**Configuration:** 180 days of web metrics (views, clicks, conversions, revenue)
-**Savings:** 5,393 tokens (59.8% reduction)
+**Savings:** 6,470 tokens (58.9% reduction vs JSON)
-**JSON** (9,023 tokens):
+**JSON** (10,977 tokens):
```json
{
"metrics": [
- {
- "date": "2024-12-31",
- "views": 1953,
- "clicks": 224,
- "conversions": 60,
- "revenue": 409.79
- },
{
"date": "2025-01-01",
- "views": 2981,
- "clicks": 242,
- "conversions": 109,
- "revenue": 467.73
+ "views": 6890,
+ "clicks": 401,
+ "conversions": 23,
+ "revenue": 6015.59,
+ "bounceRate": 0.63
},
{
"date": "2025-01-02",
- "views": 3842,
- "clicks": 100,
- "conversions": 15,
- "revenue": 569.44
+ "views": 6940,
+ "clicks": 323,
+ "conversions": 37,
+ "revenue": 9086.44,
+ "bounceRate": 0.36
},
{
"date": "2025-01-03",
- "views": 4083,
- "clicks": 161,
- "conversions": 73,
- "revenue": 444.75
+ "views": 4390,
+ "clicks": 346,
+ "conversions": 26,
+ "revenue": 6360.75,
+ "bounceRate": 0.48
},
{
"date": "2025-01-04",
- "views": 5382,
- "clicks": 257,
- "conversions": 63,
- "revenue": 457.28
+ "views": 3429,
+ "clicks": 231,
+ "conversions": 13,
+ "revenue": 2360.96,
+ "bounceRate": 0.65
+ },
+ {
+ "date": "2025-01-05",
+ "views": 5804,
+ "clicks": 186,
+ "conversions": 22,
+ "revenue": 2535.96,
+ "bounceRate": 0.37
}
]
}
```
-**TOON** (3,630 tokens):
+**TOON** (4,507 tokens):
```
-metrics[5]{date,views,clicks,conversions,revenue}:
- 2024-12-31,1953,224,60,409.79
- 2025-01-01,2981,242,109,467.73
- 2025-01-02,3842,100,15,569.44
- 2025-01-03,4083,161,73,444.75
- 2025-01-04,5382,257,63,457.28
+metrics[5]{date,views,clicks,conversions,revenue,bounceRate}:
+ 2025-01-01,6890,401,23,6015.59,0.63
+ 2025-01-02,6940,323,37,9086.44,0.36
+ 2025-01-03,4390,346,26,6360.75,0.48
+ 2025-01-04,3429,231,13,2360.96,0.65
+ 2025-01-05,5804,186,22,2535.96,0.37
```
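The savings percentages in the chart above can be reproduced from the raw token counts. This is a minimal sketch that mirrors the `(baseline - toon) / baseline` formula used in `token-efficiency-benchmark.ts`; the `TokenCounts` type, the `datasets` object, and the `savingsPercent` helper are illustrative names.

```typescript
interface TokenCounts {
  toon: number
  json: number
  xml: number
}

// Token counts copied from the chart above.
const datasets: Record<string, TokenCounts> = {
  'GitHub Repositories': { toon: 8745, json: 15145, xml: 17095 },
  'Daily Analytics': { toon: 4507, json: 10977, xml: 13128 },
  'E-Commerce Order': { toon: 166, json: 257, xml: 271 },
}

// Hypothetical helper: percentage of baseline tokens saved by TOON, one decimal place.
function savingsPercent(baseline: number, toon: number): string {
  return (((baseline - toon) / baseline) * 100).toFixed(1)
}

const totals: TokenCounts = { toon: 0, json: 0, xml: 0 }

for (const [name, counts] of Object.entries(datasets)) {
  totals.toon += counts.toon
  totals.json += counts.json
  totals.xml += counts.xml
  console.log(
    `${name}: ${savingsPercent(counts.json, counts.toon)}% vs JSON, `
    + `${savingsPercent(counts.xml, counts.toon)}% vs XML`,
  )
}

// Totals row: 13,418 TOON tokens vs 26,379 (JSON) and 30,494 (XML).
console.log(
  `Total: ${savingsPercent(totals.json, totals.toon)}% vs JSON, `
  + `${savingsPercent(totals.xml, totals.toon)}% vs XML`,
)
```

Note that the totals are computed from summed token counts, so larger datasets dominate the overall percentage.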
diff --git a/benchmarks/scripts/accuracy-benchmark.ts b/benchmarks/scripts/accuracy-benchmark.ts
index b467c63..1f0e3ab 100644
--- a/benchmarks/scripts/accuracy-benchmark.ts
+++ b/benchmarks/scripts/accuracy-benchmark.ts
@@ -116,6 +116,7 @@ else {
formatName: task.formatName,
formattedData,
model,
+ modelName: task.modelName,
})
// Progress update after task completes
diff --git a/benchmarks/scripts/token-efficiency-benchmark.ts b/benchmarks/scripts/token-efficiency-benchmark.ts
index 85ed1f9..c110a12 100644
--- a/benchmarks/scripts/token-efficiency-benchmark.ts
+++ b/benchmarks/scripts/token-efficiency-benchmark.ts
@@ -7,16 +7,20 @@ import { encode } from '../../src/index'
import githubRepos from '../data/github-repos.json' with { type: 'json' }
import { BENCHMARKS_DIR, ROOT_DIR } from '../src/constants'
import { generateAnalyticsData } from '../src/datasets'
+import { formatters } from '../src/formatters'
interface BenchmarkResult {
name: string
emoji: string
description: string
- data: any
+ data: Record<string, any>
jsonTokens: number
toonTokens: number
- savings: number
- savingsPercent: string
+ xmlTokens: number
+ jsonSavings: number
+ jsonSavingsPercent: string
+ xmlSavings: number
+ xmlSavingsPercent: string
showDetailed: boolean
}
@@ -37,13 +41,6 @@ const BENCHMARK_EXAMPLES = [
getData: () => generateAnalyticsData(180),
showDetailed: true,
},
- {
- name: 'API Response',
- emoji: '👥',
- description: '50 user records with metadata and timestamps',
- getData: () => generateUsers(50),
- showDetailed: false,
- },
{
name: 'E-Commerce Order',
emoji: '🛒',
@@ -56,6 +53,7 @@ const BENCHMARK_EXAMPLES = [
// Calculate total savings
let totalJsonTokens = 0
let totalToonTokens = 0
+let totalXmlTokens = 0
const results: BenchmarkResult[] = []
@@ -64,14 +62,21 @@ for (const example of BENCHMARK_EXAMPLES) {
const jsonString = JSON.stringify(data, undefined, 2)
const toonString = encode(data)
+ const xmlString = formatters.xml(data)
const jsonTokens = encodeTokens(jsonString).length
const toonTokens = encodeTokens(toonString).length
- const savings = jsonTokens - toonTokens
- const savingsPercent = ((savings / jsonTokens) * 100).toFixed(1)
+ const xmlTokens = encodeTokens(xmlString).length
+
+ const jsonSavings = jsonTokens - toonTokens
+ const jsonSavingsPercent = ((jsonSavings / jsonTokens) * 100).toFixed(1)
+
+ const xmlSavings = xmlTokens - toonTokens
+ const xmlSavingsPercent = ((xmlSavings / xmlTokens) * 100).toFixed(1)
totalJsonTokens += jsonTokens
totalToonTokens += toonTokens
+ totalXmlTokens += xmlTokens
results.push({
name: example.name,
@@ -80,25 +85,51 @@ for (const example of BENCHMARK_EXAMPLES) {
data,
jsonTokens,
toonTokens,
- savings,
- savingsPercent,
+ xmlTokens,
+ jsonSavings,
+ jsonSavingsPercent,
+ xmlSavings,
+ xmlSavingsPercent,
showDetailed: example.showDetailed,
})
}
-const totalSavings = totalJsonTokens - totalToonTokens
-const totalSavingsPercent = ((totalSavings / totalJsonTokens) * 100).toFixed(1)
+const totalJsonSavings = totalJsonTokens - totalToonTokens
+const totalJsonSavingsPercent = ((totalJsonSavings / totalJsonTokens) * 100).toFixed(1)
-// Generate ASCII bar chart visualization
-const barChartSection = results
+const totalXmlSavings = totalXmlTokens - totalToonTokens
+const totalXmlSavingsPercent = ((totalXmlSavings / totalXmlTokens) * 100).toFixed(1)
+
+// Generate ASCII bar chart visualization (stacked compact format)
+const datasetRows = results
.map((result) => {
- const percentage = Number.parseFloat(result.savingsPercent)
+ const percentage = Number.parseFloat(result.jsonSavingsPercent)
const bar = generateBarChart(100 - percentage) // Invert to show TOON tokens
- const jsonStr = result.jsonTokens.toLocaleString('en-US')
const toonStr = result.toonTokens.toLocaleString('en-US')
- return `${result.emoji} ${result.name.padEnd(25)} ${bar} ${toonStr.padStart(6)} tokens (JSON: ${jsonStr.padStart(6)}) 💰 ${result.savingsPercent}% saved`
+ const jsonStr = result.jsonTokens.toLocaleString('en-US')
+ const xmlStr = result.xmlTokens.toLocaleString('en-US')
+
+ const line1 = `${result.emoji} ${result.name.padEnd(25)} ${bar} ${toonStr.padStart(6)} tokens`
+ const line2 = ` vs JSON: ${jsonStr.padStart(6)} 💰 ${result.jsonSavingsPercent}% saved`
+ const line3 = ` vs XML: ${xmlStr.padStart(6)} 💰 ${result.xmlSavingsPercent}% saved`
+
+ return `${line1}\n${line2}\n${line3}`
})
- .join('\n')
+ .join('\n\n')
+
+// Add separator and totals row
+const separator = '─────────────────────────────────────────────────────────────────────'
+
+// Calculate bar for totals (TOON vs average of JSON+XML)
+const averageComparisonTokens = (totalJsonTokens + totalXmlTokens) / 2
+const totalPercentage = (totalToonTokens / averageComparisonTokens) * 100
+const totalBar = generateBarChart(totalPercentage)
+
+const totalLine1 = `Total ${totalBar} ${totalToonTokens.toLocaleString('en-US').padStart(6)} tokens`
+const totalLine2 = ` vs JSON: ${totalJsonTokens.toLocaleString('en-US').padStart(6)} 💰 ${totalJsonSavingsPercent}% saved`
+const totalLine3 = ` vs XML: ${totalXmlTokens.toLocaleString('en-US').padStart(6)} 💰 ${totalXmlSavingsPercent}% saved`
+
+const barChartSection = `${datasetRows}\n\n${separator}\n${totalLine1}\n${totalLine2}\n${totalLine3}`
// Generate detailed examples (only for selected examples)
const detailedExamples = results
@@ -108,9 +139,9 @@ const detailedExamples = results
let displayData = result.data
if (result.name === 'GitHub Repositories') {
displayData = {
- repositories: result.data.repositories.slice(0, 3).map((repo: any) => ({
+ repositories: result.data.repositories.slice(0, 3).map((repo: Record<string, any>) => ({
...repo,
- description: repo.description?.slice(0, 80) + (repo.description?.length > 80 ? '...' : ''),
+ description: repo.description?.slice(0, 80) + (repo.description?.length > 80 ? '…' : ''),
})),
}
}
@@ -124,7 +155,7 @@ const detailedExamples = results
**Configuration:** ${result.description}
-**Savings:** ${result.savings.toLocaleString('en-US')} tokens (${result.savingsPercent}% reduction)
+**Savings:** ${result.jsonSavings.toLocaleString('en-US')} tokens (${result.jsonSavingsPercent}% reduction vs JSON)
**JSON** (${result.jsonTokens.toLocaleString('en-US')} tokens):
@@ -146,8 +177,6 @@ const markdown = `### Token Efficiency
${barChartSection}
\`\`\`
-**Total:** ${totalToonTokens.toLocaleString('en-US')} tokens (TOON) vs ${totalJsonTokens.toLocaleString('en-US')} tokens (JSON) → ${totalSavingsPercent}% savings
-
View detailed examples
@@ -170,23 +199,6 @@ function generateBarChart(percentage: number, maxWidth: number = 25): string {
return '█'.repeat(filled) + '░'.repeat(empty)
}
-// Generate user API response
-function generateUsers(count: number) {
- return {
- users: Array.from({ length: count }, (_, i) => ({
- id: i + 1,
- name: faker.person.fullName(),
- email: faker.internet.email(),
- role: faker.helpers.arrayElement(['admin', 'user', 'moderator']),
- active: faker.datatype.boolean(),
- createdAt: faker.date.past({ years: 2 }).toISOString(),
- lastLogin: faker.date.recent({ days: 30 }).toISOString(),
- })),
- total: count,
- page: 1,
- }
-}
-
// Generate nested e-commerce order
function generateOrder() {
return {
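The diff shows only the signature and the final line of `generateBarChart`, so the way the filled width is derived is not visible here. The sketch below is one plausible reading, assuming the percentage is scaled to `maxWidth` and rounded to the nearest cell; the name `sketchBarChart`, the rounding rule, and the clamping are assumptions for illustration.

```typescript
// Illustrative stand-in for generateBarChart; rounding and clamping are assumptions.
function sketchBarChart(percentage: number, maxWidth: number = 25): string {
  // Clamp so out-of-range input cannot produce a negative repeat count.
  const clamped = Math.min(100, Math.max(0, percentage))
  const filled = Math.round((clamped / 100) * maxWidth)
  const empty = maxWidth - filled
  return '█'.repeat(filled) + '░'.repeat(empty)
}

// Per-dataset rows pass the share of tokens TOON still uses relative to JSON.
const githubBar = sketchBarChart(100 - 42.3)

// The totals row compares TOON against the average of the JSON and XML totals.
const averageComparisonTokens = (26379 + 30494) / 2
const totalBar = sketchBarChart((13418 / averageComparisonTokens) * 100)

console.log(githubBar)
console.log(totalBar)
```

Under these assumptions the GitHub row gets 14 filled cells and the totals row gets 12, which appears to match the chart in `token-efficiency.md`.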
diff --git a/benchmarks/src/evaluate.ts b/benchmarks/src/evaluate.ts
index e6e490b..f3701d1 100644
--- a/benchmarks/src/evaluate.ts
+++ b/benchmarks/src/evaluate.ts
@@ -10,6 +10,7 @@
import type { LanguageModelV2 } from '@ai-sdk/provider'
import type { EvaluationResult, Question } from './types'
import { anthropic } from '@ai-sdk/anthropic'
+import { google } from '@ai-sdk/google'
import { openai } from '@ai-sdk/openai'
import { generateText } from 'ai'
import { consola } from 'consola'
@@ -20,16 +21,18 @@ import { consola } from 'consola'
export const models: Record<string, LanguageModelV2> = {
'gpt-5-nano': openai('gpt-5-nano'),
'claude-haiku-4-5': anthropic('claude-haiku-4-5-20251001'),
+ 'gemini-2.5-flash': google('gemini-2.5-flash'),
}
/**
* Evaluate a single question with a specific format and model
*/
export async function evaluateQuestion(
- { question, formatName, formattedData, model}:
- { question: Question, formatName: string, formattedData: string, model: LanguageModelV2 },
+ { question, formatName, formattedData, model, modelName}:
+ { question: Question, formatName: string, formattedData: string, model: LanguageModelV2, modelName: string },
): Promise<EvaluationResult> {
- const prompt = `Given the following data in ${formatName} format:
+ const prompt = `
+Given the following data in ${formatName} format:
\`\`\`
${formattedData}
@@ -37,13 +40,14 @@ ${formattedData}
Question: ${question.prompt}
-Provide only the direct answer, without any additional explanation or formatting.`
+Provide only the direct answer, without any additional explanation or formatting.
+`.trim()
const startTime = performance.now()
const { text, usage } = await generateText({
model,
prompt,
- temperature: model.modelId.startsWith('gpt-') ? undefined : 0,
+ temperature: !model.modelId.startsWith('gpt-') ? 0 : undefined,
})
const latencyMs = performance.now() - startTime
@@ -56,7 +60,7 @@ Provide only the direct answer, without any additional explanation or formatting
return {
questionId: question.id,
format: formatName,
- model: model.modelId,
+ model: modelName,
expected: question.groundTruth,
actual: text.trim(),
isCorrect,
@@ -93,9 +97,8 @@ Respond with only "YES" or "NO".`
try {
const { text } = await generateText({
- model: models['claude-haiku-4-5']!,
+ model: models['gpt-5-nano']!,
prompt,
- temperature: 0,
})
return text.trim().toUpperCase() === 'YES'
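The three small decisions in `evaluate.ts` (the trimmed prompt template, the per-model temperature, and the YES/NO verdict check) can be isolated as pure functions for testing without calling any model. This is a sketch only; `buildPrompt`, `pickTemperature`, and `parseJudgeVerdict` are hypothetical names that do not exist in the repository.

```typescript
// Three backticks, built at runtime so this example stays a single fenced block.
const fence = '`'.repeat(3)

// Mirrors the template in the diff: the surrounding newlines are removed by trim().
function buildPrompt(formatName: string, formattedData: string, question: string): string {
  return `
Given the following data in ${formatName} format:

${fence}
${formattedData}
${fence}

Question: ${question}

Provide only the direct answer, without any additional explanation or formatting.
`.trim()
}

// Models whose id starts with "gpt-" get no explicit temperature; all others get 0.
function pickTemperature(modelId: string): number | undefined {
  return modelId.startsWith('gpt-') ? undefined : 0
}

// The judge reply counts as correct only if it is exactly "YES" after trimming and upper-casing.
function parseJudgeVerdict(text: string): boolean {
  return text.trim().toUpperCase() === 'YES'
}

const prompt = buildPrompt('toon', 'metrics[1]{date,views}:\n  2025-01-01,6890', 'How many views on 2025-01-01?')
console.log(prompt.split('\n')[0]) // Given the following data in toon format:
console.log(pickTemperature('gpt-5-nano'), pickTemperature('gemini-2.5-flash')) // undefined 0
console.log(parseJudgeVerdict(' yes\n'), parseJudgeVerdict('Yes.')) // true false
```

The strict equality check means a reply such as "Yes." is scored as incorrect, which is worth keeping in mind when the judge model changes.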
diff --git a/benchmarks/src/report.ts b/benchmarks/src/report.ts
index 8da12b8..af41f26 100644
--- a/benchmarks/src/report.ts
+++ b/benchmarks/src/report.ts
@@ -201,8 +201,8 @@ ${modelPerformance}
- **Semantic validation**: LLM-as-judge validates responses semantically (not exact string matching).
- **Token counting**: Using \`gpt-tokenizer\` with \`o200k_base\` encoding.
-- **Question types**: Field retrieval, aggregation, and filtering tasks.
-- **Real data**: Faker.js-generated datasets + GitHub repositories.
+- **Question types**: ~160 questions across field retrieval, aggregation, and filtering tasks.
+- **Datasets**: Faker.js-generated datasets (seeded) + GitHub repositories.
`.trimStart()
diff --git a/benchmarks/src/types.ts b/benchmarks/src/types.ts
index 399a167..c6719e7 100644
--- a/benchmarks/src/types.ts
+++ b/benchmarks/src/types.ts
@@ -1,7 +1,7 @@
export interface Dataset {
name: string
description: string
- data: any
+ data: Record<string, any>
}
export interface Question {