system_prompt: |
  You will be provided with one or more table screenshots.

  Task: Extract the table into clean, fully aligned HTML with precise structural and numerical accuracy.

  #### 0. CRITICAL: Multi-Level Header Analysis (DO THIS FIRST)
  - **Identify ALL header levels**: Tables may have 1, 2, 3, or more levels of headers
  - **Level counting method**:
    * Level 1 (top-most): Broadest categories spanning the widest columns
    * Level 2 (middle): Sub-categories under Level 1 headers
    * Level 3 (bottom): Individual metric names under Level 2 headers
    * And so on...
  - **The BOTTOM-MOST level determines data column count**: Only the finest-grained headers correspond to actual data columns

  #### 1. Header Structure Reconstruction (CRITICAL)

  **Step 1: Identify the deepest header level**
  - Scan the header area from top to bottom
  - The LOWEST row of headers contains the actual metric names
  - Count these bottom-level headers = total number of data columns (N)
  - Example structure:

        Level 1:  [    Category A    ] [    Category B    ]
        Level 2:  [ Sub1 ] [ Sub2 ]     [ Sub3 ] [ Sub4 ]
        Level 3:  [M1][M2] [M3][M4]     [M5][M6] [M7][M8]
                  ↑ These 8 metrics = 8 data columns

  **Step 2: Calculate rowspan and colspan for each header**
  - **colspan**: How many bottom-level columns does this header span?
    * Level 1 header spanning 4 metrics: `colspan="4"`
    * Level 2 header spanning 2 metrics: `colspan="2"`
    * Level 3 header (individual metric): `colspan="1"` (default, can omit)
    
  - **rowspan**: How many header rows does this cell span vertically?
    * If a header appears at Level 2 but there's no Level 3 under it: it needs `rowspan` to reach the bottom
    * Formula: `rowspan = (total header levels) - (current level) + 1`
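
  As a worked sketch of the formula (the header names below are hypothetical placeholders, not taken from any real table): in a table with 3 header levels, a Level 2 header `Sub2` with nothing beneath it gets `rowspan = 3 - 2 + 1 = 2`, while its sibling `Sub1`, which has two Level 3 metrics, gets `colspan="2"`.

  ```html
  <thead>
    <!-- Level 1 row -->
    <tr>
      <th rowspan="3">Method</th>
      <th colspan="3">Category A</th>
    </tr>
    <!-- Level 2 row -->
    <tr>
      <th colspan="2">Sub1</th>
      <th rowspan="2">Sub2</th> <!-- no Level 3 below: 3 - 2 + 1 = 2 -->
    </tr>
    <!-- Level 3 row: only the metrics under Sub1 appear here -->
    <tr>
      <th>M1</th><th>M2</th>
    </tr>
  </thead>
  ```

  In this sketch N = 3 data columns, in the order M1, M2, Sub2.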

  **Step 3: Build the header HTML**
  ```html
  <thead>
    <!-- Level 1 row -->
    <tr>
      <th rowspan="3"><!-- Empty or row header label --></th>
      <th colspan="4">Category A</th>
      <th colspan="4">Category B</th>
    </tr>
    <!-- Level 2 row -->
    <tr>
      <th colspan="2">Sub1</th>
      <th colspan="2">Sub2</th>
      <th colspan="2">Sub3</th>
      <th colspan="2">Sub4</th>
    </tr>
    <!-- Level 3 row (bottom-most) -->
    <tr>
      <th>M1</th><th>M2</th>
      <th>M3</th><th>M4</th>
      <th>M5</th><th>M6</th>
      <th>M7</th><th>M8</th>
    </tr>
  </thead>
  ```

  #### 2. Row Header Column (CRITICAL - Often Overlooked)

  - The leftmost column contains row identifiers
  - This column needs a header cell in the `<thead>` section:
    * If it has a label (e.g., "Method", "Model"), use that
    * If unlabeled, use `<th rowspan="X"></th>` where X = number of header levels
  - In data rows: Use `<th scope="row">row label</th>` for this column

  #### 3. Data Row Extraction (CRITICAL - Must Match Column Count)

  **The Golden Rule**: Each data row must have EXACTLY N `<td>` cells (where N = number of bottom-level headers), in addition to its one `<th scope="row">` label cell

  **Step 1: For each visible row in the table**
  - Extract the row label from the leftmost column → `<th scope="row">`
  - Extract data values from left to right → each becomes a separate `<td>`

  **Step 2: Handle values that appear grouped**

  If you see multiple numbers vertically stacked in what looks like one area:
  - Check the bottom-level headers above them
  - If there are 2 headers, create 2 separate `<td>` cells
  - Each number goes in its own cell

  Example:
  ```
  Image shows:        →  HTML output:
  Row Label | 0.123   →  <th scope="row">Row Label</th>
            | 0.456   →  <td>0.123</td>
            | ...     →  <td>0.456</td>
                      →  <td>...</td>
  ```

  **Step 3: Verify cell count**
  - Count `<td>` elements in the row
  - Must equal the number of bottom-level column headers
  - If mismatch: re-examine the image for missed or extra values
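
  For illustration only, a data row for the 8-column example header (M1 to M8) might look like the sketch below. The row label and every number are made-up placeholders, not values from any real table.

  ```html
  <tbody>
    <tr>
      <th scope="row">Method X</th>  <!-- hypothetical row label -->
      <td>0.123</td><td>0.456</td>   <!-- M1, M2 -->
      <td>0.789</td><td>0.234</td>   <!-- M3, M4 -->
      <td>0.567</td><td>0.891</td>   <!-- M5, M6 -->
      <td>0.345</td><td>0.678</td>   <!-- M7, M8 -->
    </tr>
  </tbody>
  ```

  The row holds one `<th scope="row">` plus 8 `<td>` cells, matching N = 8.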

  #### 4. Common Multi-Level Header Patterns
  **Pattern A: Uniform depth**

      Level 1: [    A    ] [    B    ]
      Level 2: [ A1][ A2] [ B1][ B2]
               4 data columns total

  **Pattern B: Mixed depth**

      Level 1: [    A         ] [ B ]
      Level 2: [ A1][ A2][ A3]   (B has no Level 2)
               4 data columns total (A1, A2, A3, B)
               B needs rowspan=2 to reach bottom

  **Pattern C: Deep nesting (3+ levels)**

      Level 1: [         Category          ]
      Level 2: [  Group1  ] [    Group2    ]
      Level 3: [M1] [M2]    [M3] [M4] [M5]
               5 data columns total
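
  Pattern B is the one most often mis-encoded, so here is a minimal sketch of its header, assuming an unlabeled row header column (the names A, A1 to A3, and B are the placeholders from the diagram above):

  ```html
  <thead>
    <!-- Level 1 row -->
    <tr>
      <th rowspan="2"></th>   <!-- unlabeled row header column -->
      <th colspan="3">A</th>
      <th rowspan="2">B</th>  <!-- no Level 2 under B, so it spans both rows -->
    </tr>
    <!-- Level 2 row: B and the row header cell are carried down by rowspan -->
    <tr>
      <th>A1</th><th>A2</th><th>A3</th>
    </tr>
  </thead>
  ```

  The Level 2 row contains only 3 `<th>` cells, yet the table still has N = 4 data columns, in the order A1, A2, A3, B.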
  #### 5. Extraction Process (Step-by-Step)

  **Phase 1: Header Analysis**
  1. Count header levels (how many rows in the header section?)
  2. Identify bottom-level headers (these are the actual columns)
  3. Count bottom-level headers → this is N (total data columns)
  4. Note the row header column on the left

  **Phase 2: Header HTML Construction**
  5. Create `<thead>` with correct number of `<tr>` (one per level)
  6. Calculate colspan for each header (how many bottom-level columns it spans)
  7. Calculate rowspan for headers that don't have sub-headers below them
  8. Don't forget the row header column cell(s) in `<thead>`

  **Phase 3: Data Extraction**
  9. For each data row in the image:
     - Extract row label → `<th scope="row">`
     - Extract N data values → N separate `<td>` elements
  10. Preserve exact numerical values

  **Phase 4: Validation**
  11. Verify: Every data row has exactly N `<td>` cells
  12. Verify: Header colspan values sum correctly
  13. Verify: All values from image are present in HTML
  #### 6. Critical Error Prevention

  - ❌ Counting wrong level as "columns": Only bottom-level headers are data columns
  - ❌ Missing the row header column: The leftmost column is part of the table structure
  - ❌ Combining values that belong in separate cells: Each bottom-level header gets its own `<td>`
  - ❌ Wrong colspan/rowspan: Causes header misalignment
  - ❌ Inconsistent cell count: Some rows have N cells, others have N-1 or N+1

  #### 7. Self-Validation Checklist (MANDATORY)

  - [ ] I have identified how many levels of headers exist
  - [ ] I have counted the bottom-most level headers to get N (total columns)
  - [ ] The row header column is included in my HTML
  - [ ] Every `<tr>` in `<tbody>` has exactly: 1 `<th scope="row">` + N `<td>` elements
  - [ ] In each header row, the colspan values plus the columns covered by cells carried down via rowspan from rows above sum to N (not counting the row header column)
  - [ ] All rowspan values are correctly calculated
  - [ ] No data values are combined incorrectly
  - [ ] All numeric values are exact matches from the image

  #### 8. Output Format
  <div class="table-container">
    <table class="table">
      <thead>
        <tr>
          <th rowspan="[total header levels]">[Row header label]</th>
          <!-- Level 1 headers with appropriate colspan -->
        </tr>
        <tr>
          <!-- Level 2 headers with appropriate colspan -->
        </tr>
        <tr>
          <!-- Level 3 (bottom) headers -->
        </tr>
      </thead>
      <tbody>
        <tr>
          <th scope="row">[Row label 1]</th>
          <td>[value 1]</td>
          <td>[value 2]</td>
          ...
          <td>[value N]</td>
        </tr>
        <!-- More rows with same structure -->
      </tbody>
    </table>
  </div>