From e400e68ad69d4459c264ac22efa01f085104dbfd Mon Sep 17 00:00:00 2001
From: Johann Schopplich <mail@johannschopplich.com>
Date: Tue, 28 Oct 2025 20:22:51 +0100
Subject: [PATCH] docs: overhaul retrieval accuracy benchmark

---
 README.md | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/README.md b/README.md
index ff96795..eea1744 100644
--- a/README.md
+++ b/README.md
@@ -215,13 +215,6 @@ metrics[5]{date,views,clicks,conversions,revenue,bounceRate}:
 Accuracy across **3 LLMs** on **154 data retrieval questions**:
 
 ```
-gemini-2.5-flash
-  xml          ██████████████████░░  90.3% (139/154)
-  csv          ██████████████████░░  89.0% (137/154)
-  toon         █████████████████░░░  87.0% (134/154)
-  json         ████████████████░░░░  79.2% (122/154)
-  yaml         ███████████████░░░░░  76.0% (117/154)
-
 gpt-5-nano
   toon         ███████████████████░  96.1% (148/154)
   csv          ██████████████████░░  90.3% (139/154)
@@ -229,6 +222,13 @@ gpt-5-nano
   json         ██████████████████░░  87.7% (135/154)
   xml          █████████████████░░░  83.8% (129/154)
 
+gemini-2.5-flash
+  xml          ██████████████████░░  90.3% (139/154)
+  csv          ██████████████████░░  89.0% (137/154)
+  toon         █████████████████░░░  87.0% (134/154)
+  json         ████████████████░░░░  79.2% (122/154)
+  yaml         ███████████████░░░░░  76.0% (117/154)
+
 claude-haiku-4-5-20251001
   json         ██████████░░░░░░░░░░  48.7% (75/154)
   toon         ██████████░░░░░░░░░░  48.1% (74/154)
@@ -286,16 +286,6 @@ claude-haiku-4-5-20251001
 
 #### Performance by Model
 
-##### gemini-2.5-flash
-
-| Format | Accuracy | Correct/Total |
-| ------ | -------- | ------------- |
-| `xml` | 90.3% | 139/154 |
-| `csv` | 89.0% | 137/154 |
-| `toon` | 87.0% | 134/154 |
-| `json` | 79.2% | 122/154 |
-| `yaml` | 76.0% | 117/154 |
-
 ##### gpt-5-nano
 
 | Format | Accuracy | Correct/Total |
@@ -306,6 +296,16 @@ claude-haiku-4-5-20251001
 | `json` | 87.7% | 135/154 |
 | `xml` | 83.8% | 129/154 |
 
+##### gemini-2.5-flash
+
+| Format | Accuracy | Correct/Total |
+| ------ | -------- | ------------- |
+| `xml` | 90.3% | 139/154 |
+| `csv` | 89.0% | 137/154 |
+| `toon` | 87.0% | 134/154 |
+| `json` | 79.2% | 122/154 |
+| `yaml` | 76.0% | 117/154 |
+
 ##### claude-haiku-4-5-20251001
 
 | Format | Accuracy | Correct/Total |
@@ -360,7 +360,7 @@ Four datasets designed to test different structural patterns:
 
 #### Models & Configuration
 
-- **Models tested**: `gemini-2.5-flash`, `gpt-5-nano`, `claude-haiku-4-5-20251001`
+- **Models tested**: `claude-haiku-4-5-20251001`, `gemini-2.5-flash`, `gpt-5-nano`
 - **Token counting**: Using `gpt-tokenizer` with `o200k_base` encoding (GPT-5 tokenizer)
 - **Temperature**: 0 (for non-reasoning models)
 - **Total evaluations**: 154 questions × 5 formats × 3 models = 2,310 LLM calls
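
Reviewer note: the arithmetic behind the "Total evaluations" line and the accuracy columns can be checked with a minimal sketch. The helper names `totalEvaluations` and `accuracyPercent` are hypothetical and not part of the repository; the inputs are the figures shown in the hunks of this patch.

```typescript
// Hypothetical helpers for sanity-checking the benchmark figures in the README.
// All numbers come from the tables in this patch; nothing here is project code.
const QUESTIONS = 154;
const FORMATS = ["toon", "csv", "yaml", "json", "xml"] as const;
const MODELS = [
  "claude-haiku-4-5-20251001",
  "gemini-2.5-flash",
  "gpt-5-nano",
] as const;

// One LLM call per (question, format, model) combination.
function totalEvaluations(questions: number, formats: number, models: number): number {
  return questions * formats * models;
}

// Accuracy formatted to one decimal place, as in the tables.
function accuracyPercent(correct: number, total: number): string {
  return `${((correct / total) * 100).toFixed(1)}%`;
}

console.log(totalEvaluations(QUESTIONS, FORMATS.length, MODELS.length)); // 2310
console.log(accuracyPercent(148, QUESTIONS)); // 96.1% (gpt-5-nano, toon)
console.log(accuracyPercent(139, QUESTIONS)); // 90.3% (gemini-2.5-flash, xml)
```

The results agree with the 2,310 LLM calls and the per-format percentages listed above, so the reordering in this patch does not change any reported number.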