What is Category Bias?
QuantWealth’s count-based exclusivity wealth measure for archaeological data depends on how researchers categorize their counts. The same cemetery may be recorded differently by different researchers: one might use a broad category for some artefact types (e.g. “ceramics”) but detailed categories for flints or beads, while another might split “ceramics” (or any other group) into finer categories such as “c_bowl”, “c_jug”, “c_pot”. These subjective choices can produce different Gini coefficients. The Category Bias simulation explores how sensitive your results are to them.
Consider these two ways of recording the same 10 graves:
Broad groupings: “ceramics”, “stone_tools”, “ornaments”
Fine types: “c_bowl”, “c_jug”, “s_axe”, “s_scraper”, “glass_beads”, “shell_pendant”, “copper_ring”
| Grave_ID | ceramics | stone_tools | ornaments | TOT | Prestige+1 |
|---|---|---|---|---|---|
| G1 | 2 | 1 | 0 | 2 | 4.83 |
| G2 | 0 | 1 | 0 | 1 | 2.83 |
| G3 | 1 | 0 | 1 | 2 | 5.67 |
| G4 | 3 | 2 | 5 | 3 | 7.50 |
| G5 | 0 | 1 | 0 | 1 | 2.83 |
| G6 | 1 | 0 | 0 | 1 | 3.00 |
| G7 | 0 | 0 | 0 | 0 | 1.00 |
| G8 | 2 | 1 | 3 | 3 | 7.50 |
| G9 | 1 | 0 | 0 | 1 | 3.00 |
| G10 | 0 | 1 | 0 | 1 | 2.83 |
Gini (Prestige+1), broad groupings: 0.302
| Grave_ID | c_bowl | c_jug | s_axe | s_scraper | glass_beads | shell_pendant | copper_ring | TOT | Prestige+1 |
|---|---|---|---|---|---|---|---|---|---|
| G1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 3 | 11.60 |
| G2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 5.33 |
| G3 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 2 | 9.27 |
| G4 | 2 | 1 | 1 | 1 | 3 | 1 | 1 | 7 | 33.60 |
| G5 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 4.00 |
| G6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 5.00 |
| G7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |
| G8 | 1 | 1 | 0 | 1 | 2 | 1 | 0 | 5 | 23.60 |
| G9 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 4.60 |
| G10 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 4.00 |
Gini (Prestige+1), fine types: 0.526
⚠️ Key Observation: The same underlying data produces very different Gini coefficients: 0.302 with broad groupings and 0.526 with fine types.
With finer categories, rich graves (G4, G8) now have many more distinct types, amplifying measured inequality.
The split version (blue) shows a more bowed curve, indicating higher inequality. The lumped version (orange) is closer to the equality line.
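Both Gini values in this example can be reproduced from the Prestige+1 columns. The sketch below assumes a Gini coefficient with the small-sample correction n/(n−1); that choice is inferred from the numbers in the two tables, not taken from QuantWealth's code, and the function name `gini_corrected` is illustrative only.

```python
def gini_corrected(values):
    """Gini coefficient with the n/(n-1) small-sample correction (assumed)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n < 2 or total == 0:
        return 0.0
    # Population Gini from the rank-weighted sum of the sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    population_gini = 2 * weighted / (n * total) - (n + 1) / n
    return population_gini * n / (n - 1)


# Prestige+1 values for G1..G10, copied from the two tables.
lumped = [4.83, 2.83, 5.67, 7.50, 2.83, 3.00, 1.00, 7.50, 3.00, 2.83]
split = [11.60, 5.33, 9.27, 33.60, 4.00, 5.00, 1.00, 23.60, 4.60, 4.00]

print(round(gini_corrected(lumped), 3))  # 0.302
print(round(gini_corrected(split), 3))   # 0.526
```

Without the n/(n−1) factor the same data would give roughly 0.272 and 0.473, so the correction matters when comparing against values computed elsewhere.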
The simulation runs 100 iterations. Each iteration yields one split-Gini and one lump-Gini, so the run produces 100 of each, showing the range of possible outcomes.
The number of times each column can split is limited by rules applied in priority order:
Columns sharing a prefix before the underscore (e.g., “c_bowl”, “c_jug”, “c_pot”) form a group. The split limit is reduced by the group size.
Example: 5 columns with “c_” prefix → base limit (5) − 5 = 0 additional splits
Tip: Add consistent prefixes to your column names (e.g., “c_” for ceramics, “s_” for stone, “cu_” for copper) to control splitting behavior.
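The prefix rule can be sketched in a few lines. This is a minimal illustration, not QuantWealth's implementation: the base limit of 5 is taken from the example above, while the function names, the clamp at zero, and the handling of groups smaller than 5 are assumptions that extend the stated formula.

```python
from collections import defaultdict

BASE_LIMIT = 5  # from the example above; assumed here to be a fixed default


def prefix_groups(columns):
    """Group column names by the text before the first underscore."""
    groups = defaultdict(list)
    for name in columns:
        if "_" in name:
            groups[name.split("_", 1)[0]].append(name)
    return dict(groups)


def additional_splits(group_size, base_limit=BASE_LIMIT):
    """Base limit minus group size, clamped at zero (the clamp is an assumption)."""
    return max(0, base_limit - group_size)


columns = ["c_bowl", "c_jug", "c_pot", "c_cup", "c_plate", "s_axe", "s_scraper"]
for prefix, members in prefix_groups(columns).items():
    print(prefix, len(members), additional_splits(len(members)))
# c 5 0
# s 2 3
```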
Columns containing: bead, ornament, pearl, perle, pendant, pebble, sherd, shell, arrowhead
These items often have high counts but shouldn’t split excessively. Their split limit uses asinh(median), a function that grows slowly for large numbers.
Example: Column with median count of 20 → asinh(20) ≈ 3.7 max splits
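As a rough sketch of this rule, the snippet below checks a column name against the listed substrings and computes asinh of the median count. The case-insensitive match, the choice of which counts enter the median, and the function names are assumptions. The result is left unrounded because the documentation does not say how the tool rounds it.

```python
import math
import statistics

# Substrings listed in the documentation above.
SPECIAL_PATTERNS = ("bead", "ornament", "pearl", "perle", "pendant",
                    "pebble", "sherd", "shell", "arrowhead")


def is_special(column_name):
    """True if the column name contains one of the listed substrings."""
    lowered = column_name.lower()  # case-insensitive matching is an assumption
    return any(pattern in lowered for pattern in SPECIAL_PATTERNS)


def asinh_limit(counts):
    """asinh of the median count for one column."""
    return math.asinh(statistics.median(counts))


print(is_special("glass_beads"))            # True
print(is_special("s_axe"))                  # False
print(round(asinh_limit([18, 20, 22]), 2))  # 3.69
```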
If no prefix or special pattern matches, the algorithm looks for shared substrings of 5+ characters.
Example: “limestone_beads” and “limestone_ornament” share “limestone” → grouped together
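One way to test for a shared substring of 5+ characters is Python's standard `difflib` module, as sketched below. How QuantWealth actually finds shared substrings is not documented here, so the approach and the helper names are assumptions.

```python
from difflib import SequenceMatcher

MIN_SHARED = 5  # minimum shared length stated above


def longest_shared_substring(a, b):
    """Longest contiguous run of characters found in both names."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def share_group(a, b, min_len=MIN_SHARED):
    """True if the two column names share at least min_len characters in a row."""
    return len(longest_shared_substring(a, b)) >= min_len


print(longest_shared_substring("limestone_beads", "limestone_ornament"))  # limestone_
print(share_group("limestone_beads", "limestone_ornament"))               # True
print(share_group("s_axe", "c_jug"))                                      # False
```

This naive version counts the trailing underscore as part of the match. The tool may treat separators differently.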
Count-based limit: Regardless of the above rules, a column cannot split into more sub-categories than its median count. If graves typically have 2 stone axes, splitting into 5 types makes no sense.
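The cap can be read as taking the smaller of the rule-based limit and the column's median count. The sketch below illustrates that reading. The function name is hypothetical, and whether the median is taken over all graves or only over graves containing the item is an assumption.

```python
import statistics


def capped_split_limit(rule_limit, counts):
    """Cap a rule-based split limit at the column's median count."""
    return min(rule_limit, statistics.median(counts))


stone_axes = [2, 2, 1, 3, 2]  # illustrative counts with a median of 2
print(capped_split_limit(5, stone_axes))  # 2
```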
Columns are candidates for lumping if they share:
In each simulation, 2 or more columns from each group may be randomly combined by summing their counts.
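Lumping by summing counts can be illustrated with the example graves from the fine-type table. This is a minimal sketch: the function names and the dictionary-per-grave layout are assumptions, and the way the tool draws its random selection may differ.

```python
import random


def pick_members(group, rng):
    """Randomly choose 2 or more columns from a group (group needs 2+ columns)."""
    size = rng.randint(2, len(group))
    return rng.sample(group, size)


def lump_columns(rows, members, new_name):
    """Return new rows where the `members` columns are summed into `new_name`."""
    lumped = []
    for row in rows:
        merged = {key: value for key, value in row.items() if key not in members}
        merged[new_name] = sum(row[member] for member in members)
        lumped.append(merged)
    return lumped


# G4 and G8 from the fine-type table (ceramic and stone columns only).
rows = [
    {"c_bowl": 2, "c_jug": 1, "s_axe": 1, "s_scraper": 1},
    {"c_bowl": 1, "c_jug": 1, "s_axe": 0, "s_scraper": 1},
]

rng = random.Random(0)  # seeded only to make the illustration repeatable
chosen = pick_members(["c_bowl", "c_jug"], rng)
result = lump_columns(rows, chosen, "ceramics")
print(result[0])  # {'s_axe': 1, 's_scraper': 1, 'ceramics': 3}
```

The summed values (3 for G4, 2 for G8) match the “ceramics” column of the broad-grouping table.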
| Element | Meaning |
|---|---|
| Black line | Your original Gini coefficient |
| Gray dashed lines | Interquartile range (IQR) — the middle 50% of simulated values |
| Orange dots | Gini values from lumping simulations (fewer categories) |
| Blue dots | Gini values from splitting simulations (more categories) |
| Pattern | Interpretation |
|---|---|
| Black line within gray dashes | ✅ Your result is robust to categorization choices |
| Wide spread of dots | ⚠️ Gini is sensitive to how categories are defined |
| Blue dots much higher than orange | Finer categories would increase measured inequality |
| Tight clustering around black line | ✅ Your categorization is well-balanced |
| Black line outside the IQR | ⚠️ Your categorization may be unusually coarse or fine |
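The IQR check in the table above can be sketched with the standard library. The simulated values below are made up for illustration. The quartile method (`inclusive`), the helper names, and the pooling of split and lump results into one list are assumptions, since the tool's exact procedure is not described here.

```python
import statistics


def iqr_bounds(simulated):
    """First and third quartile of the simulated Gini values."""
    q1, _median, q3 = statistics.quantiles(simulated, n=4, method="inclusive")
    return q1, q3


def is_within_iqr(original, simulated):
    """True if the original Gini lies inside the interquartile range."""
    q1, q3 = iqr_bounds(simulated)
    return q1 <= original <= q3


simulated_ginis = [0.28, 0.30, 0.31, 0.33, 0.35]  # made-up values

print(is_within_iqr(0.302, simulated_ginis))  # True
print(is_within_iqr(0.526, simulated_ginis))  # False
```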
IQR stored with submissions: When you add results to the map, the interquartile range from the Category Bias simulation is stored. This provides a measure of uncertainty due to categorization choices, visible in the map popup as “Bias IQR”.
Use consistent prefixes: Name columns like c_bowl, c_jug, s_axe, cu_ring to give you control over grouping behavior.
Run the simulation: Always check category bias before drawing conclusions about inequality levels.
Report the IQR: When publishing results, mention the bias simulation IQR to indicate robustness.
Compare like with like: When comparing Gini coefficients across sites, ensure similar categorization granularity.
Documentation for QuantWealth Archaeological Inequality Tool. Last updated: January 2026