Eighteen categories of design tokens, and why we refused to ship sixteen.

The eighteen-list, and why each category exists.

Color palette, gradients, typography, font-loading strategy, spacing scale, container max-widths, gutters, shadows, borders, border widths, border styles, CSS variables, motion timings, z-index ladder, breakpoints, component tree, image profile, font-face details. Eighteen categories, and every one of them produces a structured token group inside the JSON we return.

The list is not arbitrary. It is the smallest set we could find that lets an agent generate a parity component without guessing. Drop CSS variables and the agent doesn't know that a color is bound to a semantic name. Drop motion timings and the hover state will feel wrong by twenty milliseconds in either direction. Drop the component tree and the agent has no way to understand the structural hierarchy of the page. Each category is one less guess.
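To make "one structured token group per category" concrete, here is a minimal sketch of how such a result could be typed. The key names, the `TokenGroup` shape, and `emptyResult` are assumptions made for this example; the actual JSON schema is not shown in this article.

```typescript
// Hypothetical key names, one per category in the list above.
const TOKEN_CATEGORIES = [
  "colorPalette",
  "gradients",
  "typography",
  "fontLoadingStrategy",
  "spacingScale",
  "containerMaxWidths",
  "gutters",
  "shadows",
  "borders",
  "borderWidths",
  "borderStyles",
  "cssVariables",
  "motionTimings",
  "zIndexLadder",
  "breakpoints",
  "componentTree",
  "imageProfile",
  "fontFaceDetails",
] as const;

type TokenCategory = (typeof TOKEN_CATEGORIES)[number];

// Assumed minimal shape of a token group: a list of named values.
interface TokenGroup {
  tokens: { name: string; value: string }[];
}

// Every category is a required key, so a missing group is a type error.
type ExtractionResult = Record<TokenCategory, TokenGroup>;

// Builds a result with all eighteen groups present and empty.
function emptyResult(): ExtractionResult {
  const result = {} as ExtractionResult;
  for (const category of TOKEN_CATEGORIES) {
    result[category] = { tokens: [] };
  }
  return result;
}
```

Typing the result as a `Record` over a fixed key list means an agent consuming the JSON never has to check whether a category exists, only whether its group is empty.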

Palette role inference, in HSL space.

We started with the obvious approach: cluster the captured colors by RGB distance, threshold at twenty-five units, deduplicate. It was a disaster. Linear's brand purple cluster pulled in icon shades that were not brand colors. Resend's signature neon green clustered with three unrelated text colors that happened to share a green channel. The clustering was distance-correct and category-wrong.

We rebuilt around HSL. Hue gets a forty-five-degree bucket. Saturation gets a twenty-percent bucket. Lightness gets a twenty-five-percent bucket — eight percent for grays, where small differences read as drift. The clusters now align with how a designer would group the same colors visually. Linear's brand purple stays together; the icon shades land in their own bucket. Resend's neon stays a neon and stops borrowing from text.

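lib/extraction/engine/extractors/colors.ts

```typescript
// Groups color tokens into coarse HSL buckets. ColorToken comes from the
// surrounding codebase and is assumed to expose hsl as { h, s, l }, with h in
// degrees and s and l in percent.
function clusterColorsHsl(colors: ColorToken[]): ColorToken[][] {
  const buckets = new Map<string, ColorToken[]>();
  for (const c of colors) {
    const { h, s, l } = c.hsl;
    // Near-zero saturation is treated as gray and gets the finer lightness step.
    const isGray = s < 8;
    const lumStep = isGray ? 8 : 25;
    // Key: 45-degree hue bucket, 20% saturation bucket, lightness bucket.
    const key = `${Math.floor(h / 45)}-${Math.floor(s / 20)}-${Math.floor(l / lumStep)}`;
    const bucket = buckets.get(key) ?? [];
    bucket.push(c);
    buckets.set(key, bucket);
  }
  return [...buckets.values()];
}
```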
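The function above assumes each token already carries an `hsl` field. A minimal sketch of how that field might be derived from 8-bit RGB, using the standard RGB-to-HSL conversion, follows. `Hsl`, `rgbToHsl`, and `bucketKey` are names introduced for this example; `bucketKey` simply repeats the keying rule from `clusterColorsHsl` so the effect of the bucket sizes can be inspected on its own.

```typescript
interface Hsl {
  h: number; // hue in degrees, 0 (inclusive) to 360 (exclusive)
  s: number; // saturation in percent, 0 to 100
  l: number; // lightness in percent, 0 to 100
}

// Standard RGB -> HSL conversion for 8-bit channels.
function rgbToHsl(r: number, g: number, b: number): Hsl {
  const rn = r / 255;
  const gn = g / 255;
  const bn = b / 255;
  const max = Math.max(rn, gn, bn);
  const min = Math.min(rn, gn, bn);
  const delta = max - min;
  const l = (max + min) / 2;

  // Pure grays have no hue and no saturation.
  if (delta === 0) return { h: 0, s: 0, l: l * 100 };

  const s = delta / (1 - Math.abs(2 * l - 1));
  let h: number;
  if (max === rn) h = ((gn - bn) / delta) % 6;
  else if (max === gn) h = (bn - rn) / delta + 2;
  else h = (rn - gn) / delta + 4;
  h *= 60;
  if (h < 0) h += 360;

  return { h, s: s * 100, l: l * 100 };
}

// Same keying rule as clusterColorsHsl: 45-degree hue buckets, 20% saturation
// buckets, and 25% lightness buckets (8% when saturation is below 8).
function bucketKey({ h, s, l }: Hsl): string {
  const lumStep = s < 8 ? 8 : 25;
  return `${Math.floor(h / 45)}-${Math.floor(s / 20)}-${Math.floor(l / lumStep)}`;
}
```

Two mid-grays a few lightness points apart can land in different buckets under the 8-point step, while two saturated colors the same distance apart share one.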

The brand-tier heuristic, or: which colors actually carry the brand.

Once colors cluster correctly, the next question is harder: which clusters are brand colors and which are neutrals? A single hex value cannot tell you. A frequency count alone cannot tell you either: a brand color might appear three times on a landing page while the body text color appears two hundred times. We needed a perceptual signal.

The signal is a saturation-plus-luminance-plus-frequency heuristic. Saturation above fifty percent. Luminance between twenty-five and seventy-five percent. Frequency at five or more. Hit all three thresholds, you get the brand-tier flag. The flag is exposed in the JSON as isBrandTier and used by the curator brief to bold the brand colors when describing the palette.
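The three thresholds translate directly into a predicate. The sketch below is one possible reading, not the shipped implementation: `ClusterStats` and its field names are hypothetical, "luminance" is assumed to mean HSL lightness, and the 25 to 75 range is assumed to be inclusive at both ends.

```typescript
// Hypothetical per-cluster summary; field names are assumptions.
interface ClusterStats {
  saturation: number; // percent, 0 to 100
  luminance: number; // percent, 0 to 100 (HSL lightness assumed)
  frequency: number; // how many times the color was captured on the page
}

// Thresholds as stated in the text above.
const MIN_SATURATION = 50; // strictly above
const MIN_LUMINANCE = 25; // inclusive (assumed)
const MAX_LUMINANCE = 75; // inclusive (assumed)
const MIN_FREQUENCY = 5; // five or more

// True only when all three conditions hold.
function isBrandTier(c: ClusterStats): boolean {
  const saturated = c.saturation > MIN_SATURATION;
  const midLuminance = c.luminance >= MIN_LUMINANCE && c.luminance <= MAX_LUMINANCE;
  const frequentEnough = c.frequency >= MIN_FREQUENCY;
  return saturated && midLuminance && frequentEnough;
}
```

Under this reading, a body text color seen two hundred times still fails, because frequency is only one of three conditions and its saturation is too low.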

Motion library detection, the hard way.

The naive approach is URL pattern matching against script tags. Find a Framer Motion bundle, flag Framer Motion. It works for marketing pages with vendored libraries and fails completely for modern bundlers — webpack, esbuild, Vite, Turbopack — that hash and concatenate library code into anonymous chunks. Maze runs GSAP. Their bundle has no string "gsap" anywhere. URL pattern matching missed it for months.

The fix is a window-global probe. Each library leaves runtime artifacts: GSAP attaches an internal _gsap store to elements, Framer Motion sets MotionIsMounted on initial render, Rive registers __rive_runtime as an instance hook. We extended the Playwright page.evaluate pass to inspect fifteen named globals at the end of the capture. Either signal is enough on its own: a URL match or a global hit flags the library. False positives are rare; across twelve smoke runs, false negatives dropped from sixty percent to zero.
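A minimal sketch of the either-signal rule, limited to the three libraries named above. Everything here is illustrative: the URL patterns are hypothetical, `probeGlobals` and `detectMotionLibraries` are names invented for this example, and where exactly `MotionIsMounted` and `__rive_runtime` live is assumed to be the window object. In a real capture the probe body would run inside the browser context (for example through Playwright's `page.evaluate`); here it takes plain objects so the logic can be read and run on its own.

```typescript
type MotionLibrary = "gsap" | "framer-motion" | "rive";

// Which runtime artifacts the in-page probe saw.
type ProbeHits = Partial<Record<MotionLibrary, boolean>>;

// Hypothetical URL patterns for the script-tag pass.
const URL_PATTERNS: Record<MotionLibrary, RegExp> = {
  gsap: /\bgsap\b/i,
  "framer-motion": /\bframer-motion\b/i,
  rive: /\brive\b/i,
};

// Checks a window-like object and a sample of element-like objects for the
// artifacts named in the text. The placement of each artifact is an assumption.
function probeGlobals(
  win: Record<string, unknown>,
  elements: Record<string, unknown>[],
): ProbeHits {
  return {
    gsap: elements.some((el) => "_gsap" in el),
    "framer-motion": "MotionIsMounted" in win,
    rive: "__rive_runtime" in win,
  };
}

// A library is flagged when either signal fires: URL match or probe hit.
function detectMotionLibraries(scriptUrls: string[], hits: ProbeHits): MotionLibrary[] {
  const libraries = Object.keys(URL_PATTERNS) as MotionLibrary[];
  return libraries.filter(
    (lib) => hits[lib] === true || scriptUrls.some((url) => URL_PATTERNS[lib].test(url)),
  );
}
```

With a hashed chunk name such as `/static/chunks/4f2a.js`, the URL pass finds nothing, and only the probe result can flag the library.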

The two analyzers we almost cut.

Two categories survived a deletion review by the smallest of margins. Component tree was the first; it walks the DOM in a depth-first pass and tags semantic landmarks — header, main, footer, navigation. We almost cut it because the output is verbose and most agents do not consume it directly. We kept it because the day we cut it is the day a designer asks "what's the structural hierarchy underneath this page," and we have nothing to show.
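The depth-first landmark pass can be sketched over a simplified node type rather than the live DOM. `PageNode`, `LandmarkEntry`, and `collectLandmarks` are hypothetical names for this example, and mapping the `nav` tag to the "navigation" landmark is an assumption about how the tagging works.

```typescript
// Simplified stand-in for a DOM element.
interface PageNode {
  tag: string;
  children: PageNode[];
}

interface LandmarkEntry {
  landmark: string;
  depth: number; // 0 for the root node
}

// Tags treated as semantic landmarks (assumed mapping).
const LANDMARK_TAGS = new Map<string, string>([
  ["header", "header"],
  ["main", "main"],
  ["footer", "footer"],
  ["nav", "navigation"],
]);

// Pre-order depth-first walk: record a node, then descend into its children.
function collectLandmarks(root: PageNode): LandmarkEntry[] {
  const found: LandmarkEntry[] = [];
  const visit = (node: PageNode, depth: number): void => {
    const landmark = LANDMARK_TAGS.get(node.tag.toLowerCase());
    if (landmark !== undefined) found.push({ landmark, depth });
    for (const child of node.children) visit(child, depth + 1);
  };
  visit(root, 0);
  return found;
}
```

Recording depth alongside each landmark keeps the output flat while still letting a reader reconstruct which landmark sits inside which.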

Image profile was the second. It computes the lazy-loading ratio, the decoding-async ratio, and the format distribution across all images on the page. Almost no agent needs this. We kept it because the operations team needed it — knowing that Stripe ships ninety-percent webp tells you something about their performance discipline that no other category captures.
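A minimal sketch of the three image-profile figures, computed over plain records instead of live `img` elements. `ImageInfo`, `ImageProfile`, `formatOf`, and `profileImages` are hypothetical names, and inferring the format from the file extension in the URL is an assumption; a real analyzer might look at response content types instead.

```typescript
// Simplified stand-in for an <img> element's relevant attributes.
interface ImageInfo {
  src: string;
  loading?: string; // value of the loading attribute, e.g. "lazy"
  decoding?: string; // value of the decoding attribute, e.g. "async"
}

interface ImageProfile {
  lazyRatio: number; // share of images with loading="lazy"
  asyncDecodingRatio: number; // share of images with decoding="async"
  formats: Record<string, number>; // share of images per format
}

// Extension-based format guess; ignores any query string or fragment.
function formatOf(src: string): string {
  const match = /\.([a-z0-9]+)(?:[?#].*)?$/i.exec(src);
  return match?.[1]?.toLowerCase() ?? "unknown";
}

function profileImages(images: ImageInfo[]): ImageProfile {
  const formats: Record<string, number> = {};
  const total = images.length;
  if (total === 0) return { lazyRatio: 0, asyncDecodingRatio: 0, formats };

  let lazy = 0;
  let asyncDecoding = 0;
  for (const img of images) {
    if (img.loading === "lazy") lazy += 1;
    if (img.decoding === "async") asyncDecoding += 1;
    const format = formatOf(img.src);
    formats[format] = (formats[format] ?? 0) + 1;
  }
  // Convert raw counts into shares of the total.
  for (const format of Object.keys(formats)) {
    formats[format] = (formats[format] ?? 0) / total;
  }
  return { lazyRatio: lazy / total, asyncDecodingRatio: asyncDecoding / total, formats };
}
```

Reporting shares rather than raw counts makes pages of very different sizes comparable on the same three numbers.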

What comes next, and where we are still wrong.

Two known gaps survive. The first is hover-state extraction: we capture only the initial computed style, so a button that changes color on hover gives us the resting color, not the hovered one. The fix is technically straightforward (dispatch hover, recompute style, restore), but making it reliable across thousands of components is hard. We deferred it. The second is animation timeline extraction beyond the cubic-bezier triple: for sites running scroll-linked Theatre.js or Rive sequences, we identify the library but not the timeline. Same status: known, deferred, on the roadmap.

Eighteen categories is not a finish line. It is the smallest correct surface for the agent moment we shipped against. The next two — hover state and animation timeline — turn it into nineteen and twenty before the year is out. The list grows when the agent's appetite grows. The list does not grow because we ran out of features to ship.

Berkay Erdinc · Curated by Claude

Founder & editor, AI2 Design. Fifteen years in product design, one stubborn opinion: depth still beats breadth.

@ai2design_
