Tag: machine learning

The Hidden Geometry of Chess
Why “solving chess” is really a question about structure, not speed

People talk about “solving chess” as if it’s just a matter of more computing power or a slightly better engine. That’s wrong.

We aren’t blocked because Stockfish isn’t fast enough. We’re blocked because, as far as we can tell, chess looks like an enormous pile of unrelated positions. Engines thrash through that pile with astonishing efficiency, but they don’t compress it, they don’t explain it, and they definitely don’t solve it in any mathematical sense.

If chess ever becomes solvable in a meaningful way, it will be because someone finds a hidden structure that lets us treat that huge pile of positions as one object with internal geometry.

This piece is about what that would actually mean.

1. What “solving chess” really is

Forget engines for a moment.

Mathematically, solving chess means:

For every legal position, assign a value: “win”, “draw”, or “loss”, under perfect play, and give a corresponding best move.

You can think of this as a gigantic lookup table:
- Each position is a point in an absurdly large space
- The perfect-play value is a label on that point

Right now, we know a lot about tiny corners of this space:
- 7-piece endgames are solved exactly
- Some opening lines are mapped out very deeply
- Engines can estimate values locally with terrifying precision

But the global map (the full geometry of “win / draw / loss” across all positions) is completely opaque.
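To make the lookup-table picture concrete, here is a minimal sketch for a game small enough to solve outright. It uses tic-tac-toe as a stand-in (an assumption of this example, not something the argument depends on) and labels every reachable position with its perfect-play value. The names `solve`, `winner` and `LINES` are illustrative only.

```python
from functools import lru_cache

# All eight three-in-a-row lines on a 3x3 board indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def solve(board, to_move):
    """Perfect-play value for the side to move: +1 win, 0 draw, -1 loss."""
    if winner(board) is not None:
        return -1          # the previous player just completed a line
    if "." not in board:
        return 0           # full board, nobody won
    other = "O" if to_move == "X" else "X"
    best = -1
    for i, cell in enumerate(board):
        if cell == ".":
            child = board[:i] + to_move + board[i + 1:]
            best = max(best, -solve(child, other))
    return best

start = "." * 9
root_value = solve(start, "X")               # value of the empty board
table_size = solve.cache_info().currsize     # number of labelled positions
print(root_value, table_size)
```

The cache that `solve` fills is exactly the "gigantic lookup table" described above, just for a game with a few thousand positions. The same recursion is correct for chess in principle; the table is simply far too large to build.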

The key question is not:

“Can we search deeper?”

It’s:

“Is there any structure in that function from positions → values, or is it essentially random at scale?”

If it’s random, we’re done. No cleverness will save us: solving chess is just a matter of raw brute force at inhuman scales.

If there is structure, then the interesting work is to find it and formalise it.

2. Think of chess as a weird landscape

One useful way to think about chess:
- Imagine every legal position as a point in a high-dimensional space
- To each point, attach a number between –1 and +1 (loss to win); call this number the “value”
- We get a bizarre landscape: hills (winning positions), valleys (losing positions), long plateaus (drawn positions)

Engines don’t see the whole landscape. They only see:
- The immediate neighbours of where they stand (positions reachable in a few moves)
- Short local paths through the terrain (search trees)
- A heuristic sense of where the hills and valleys might be (evaluations)

What we don’t know is whether this landscape is:
- Structured — smooth in some hidden sense, decomposable, compressible
- Or chaotic: values fluctuate in a way that, beyond small endgame islands, is essentially intractable

To ask “is there structure?” in a serious way, we need more than metaphors. We need to propose what “structure” would look like in concrete, testable terms.

3. Three ways chess might secretly be structured

Here are three concrete structural possibilities. If any of them turn out to be true (even approximately), they’d radically change how we think about solving chess.

3.1. Low-dimensional geometry: the “few hidden directions” hypothesis

This is the idea that:

Although the state space of chess is astronomically large, the value function is governed by a small number of underlying “directions”.

Analogies:
- In physics, complex systems often reduce to a few dominant modes (think of how a vibrating drum can be described by a few main frequencies).
- In machine learning, deep networks often implicitly compress data into low-dimensional features.

Translated to chess:
- Take a big sample of positions from strong engine games.
- For each position, record a good evaluation (e.g. win probability from a top engine).
- Build a graph that connects positions which are both:
  - closely related (one move apart, or structurally similar), and
  - have similar evaluation values.
- Now ask:
“Can we describe this evaluation function mostly using a small number of ‘basis patterns’ on this graph?”

If the answer is yes — if the evaluation surface can be well-approximated by combining, say, 50 or 100 patterns on a graph with millions of positions — then chess has a kind of hidden geometry. That would be a big structural claim.

If the answer is no — if you need thousands or millions of independent patterns — then the “few hidden directions” hypothesis dies, and with it any hope of that particular kind of compression.

Either way, it’s a concrete empirical question.
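One way to phrase the "basis patterns" test is in terms of graph Laplacian eigenvectors: project the evaluation onto the smoothest modes of the graph and ask how much of its energy the first few capture. The sketch below is only an illustration under strong assumptions. The "position graph" is a toy ring of 200 nodes, and the two value functions are synthetic, not engine output.

```python
import numpy as np

def laplacian_modes(adj):
    """Eigen-decomposition of the combinatorial Laplacian L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    eigvals, eigvecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    return eigvals, eigvecs

def energy_in_first_k(values, eigvecs, k):
    """Fraction of the centred signal's squared norm in the k smoothest modes."""
    centred = values - values.mean()
    coeffs = eigvecs.T @ centred
    return float(np.sum(coeffs[:k] ** 2) / np.sum(coeffs ** 2))

# Toy stand-in for a position graph: a ring of n nodes.
n = 200
idx = np.arange(n)
adj = np.zeros((n, n))
adj[idx, (idx + 1) % n] = 1.0
adj[(idx + 1) % n, idx] = 1.0

_, modes = laplacian_modes(adj)
rng = np.random.default_rng(0)
smooth_values = np.sin(2 * np.pi * idx / n) + 0.5 * np.cos(4 * np.pi * idx / n)
random_values = rng.uniform(-1.0, 1.0, size=n)

k = 10
smooth_energy = energy_in_first_k(smooth_values, modes, k)
random_energy = energy_in_first_k(random_values, modes, k)
print(smooth_energy, random_energy)
```

A structured value function behaves like `smooth_values`: almost all of its energy sits in a handful of modes. An incompressible one behaves like `random_values`, where the first k modes capture only about their proportional share.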

3.2. Coarse-graining: the “macroscopic chess” hypothesis

Renormalisation in physics works like this:
- You ignore microscopic details
- You look at a system at a larger scale (block spins, average behaviour)
- Amazingly, the large-scale behaviour often obeys simple, stable laws

Is there an analogue for chess?

That would mean something like:

If you group positions by certain coarse features — material balance, pawn structure, blocked vs. open, king safety patterns — the average value within each group behaves in a stable, self-consistent way.

Concretely, you could try things like:
- Group positions that have:
  - the same material counts, or
  - exactly the same pawn structure, or
  - the same “blocked board” when you tile the board into 2×2 or 4×4 squares and only record whether each tile has minor pieces, majors, kings, etc.
- For each group, compute the average engine evaluation. That gives you a coarse “macroscopic” evaluation.
- Now “zoom out” again: group these coarse states in an even rougher way and check if the new averages are consistent.

If, after a few such compressions, things stabilise (i.e. the coarse description repeats itself up to small noise), then chess has a phase structure: macroscopic classes that behave predictably regardless of micro-detail.

If nothing stabilises and everything stays sensitive to microscopic details all the way up, then chess is “RG-hostile”: no renormalisation structure to exploit.

Again: this is testable.
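One simple way to quantify "the averages stabilise" is to ask how much of the variance in evaluations is explained by group means, first for a fine grouping and then for a coarser one. The sketch below assumes synthetic data: `material_diff` is a hypothetical coarse feature and `evals` is a made-up function of it plus noise, standing in for real engine output.

```python
import numpy as np

def explained_variance(values, group_ids):
    """Share of the variance in `values` captured by per-group means."""
    values = np.asarray(values, dtype=float)
    group_ids = np.asarray(group_ids)
    fitted = np.empty_like(values)
    for g in np.unique(group_ids):
        mask = group_ids == g
        fitted[mask] = values[mask].mean()
    return float(1.0 - np.var(values - fitted) / np.var(values))

rng = np.random.default_rng(0)
n = 5000
material_diff = rng.integers(-3, 4, size=n)     # hypothetical feature in {-3, ..., 3}
# Synthetic "engine" values: a smooth function of the feature plus noise.
evals = np.tanh(0.5 * material_diff) + 0.1 * rng.normal(size=n)

# Fine grouping: exact material difference.
fine_r2 = explained_variance(evals, material_diff)
# Zoom out: only record whether the side is behind, level, or ahead.
coarse_r2 = explained_variance(evals, np.sign(material_diff))
print(fine_r2, coarse_r2)
```

If the explained variance stays high as the grouping gets rougher, the coarse description is carrying most of the information. If it collapses at the first zoom-out, the value depends on micro-detail that the grouping throws away.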

3.3. Decomposition: the “sum of local fights” hypothesis

This is the most intuitive one.

Informally:

Most real positions feel like a few largely independent local fights (king side vs queen side, a pawn majority, a piece trap), plus some interaction between them. Could the value of the position be approximated as “sum of local values plus a small correction”?

Rough sketch of how you’d test this:
1. For a given position, build an “influence graph” over the board: connect squares/pieces that directly attack or defend each other.
2. Partition this influence graph into a few regions (clusters with strong internal connections, weak connections between clusters).
3. For each region, treat it as a smaller sub-position and run a local engine evaluation on it (with a simple way of handling the “outside world”, e.g. frozen pieces).
4. Add up these local evaluations and compare to the full-engine evaluation of the original position.

If you find that:
- For most positions arising in serious play,
- The difference between “sum of local evaluations” and “true evaluation” is small and bounded,

then chess is decomposable: the global value almost always splits into a sum of local parts plus a modest interaction term.

If that difference is often huge and scales with position complexity, then the “sum of local fights” intuition is simply wrong at the value level, however psychologically natural it feels to humans.

And once again: this is something you can actually measure.
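The partition-and-compare steps can be sketched in a few lines. Everything here is hypothetical: the influence graph, its edge weights, the threshold, and the evaluation numbers are invented for illustration, and a real experiment would replace them with attack/defence relations from actual positions and with engine output.

```python
def find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def regions(num_pieces, edges, threshold):
    """Cluster pieces, merging any pair joined by an edge of weight >= threshold."""
    parent = list(range(num_pieces))
    for a, b, weight in edges:
        if weight >= threshold:
            parent[find(parent, a)] = find(parent, b)
    groups = {}
    for piece in range(num_pieces):
        groups.setdefault(find(parent, piece), []).append(piece)
    return sorted(groups.values())

def decomposition_gap(full_eval, local_evals):
    """Absolute difference between the whole-board value and the sum of local values."""
    return abs(full_eval - sum(local_evals))

# Hypothetical influence graph: pieces 0-2 fight on one wing, 3-5 on the other,
# with one weak attack/defence link (weight 0.1) between the wings.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 0.1)]
parts = regions(6, edges, threshold=0.5)

# Made-up numbers standing in for engine output on each region and on the full board.
gap = decomposition_gap(full_eval=0.35, local_evals=[0.50, -0.20])
print(parts, gap)
```

The quantity to track across a large sample of positions is the distribution of `gap`: small and bounded supports the hypothesis, while a gap that grows with position complexity counts against it.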

[Interactive figure: The Chess Geometry Explorer. Testing the “Inherent Order” vs “Random Chaos” of the value function. Panels: 1. Low-Dim Geometry (values follow smooth “hills”, i.e. dominant modes); 2. Coarse-Graining (averages stabilize into macroscopic grids); 3. Local Decomposition (value is a sum of separated local fights); 4. Chaos, RG-hostile (no structure; random, incompressible complexity). Structural hypothesis shown, low-dimensionality: $V(P) \approx \sum_{i=1}^{K} \alpha_i \varphi_i(P)$ for small $K$.]

4. How you’d test these hypotheses in practice

All three ideas (low-dimensional geometry, coarse-grained phases, decomposability) can be tested with the same basic recipe:
1. Generate lots of positions
  - Sample from strong engine self-play (Stockfish / Leela).
  - Include a mix of openings, middlegames, and endgames.
2. Evaluate them with a very strong engine
  - Use deep search or a strong neural net head to get a “ground truth” evaluation U(P) for each position P.
3. Build the structures you care about
  - For geometry: build the value-similarity graph and do a spectral analysis (see how fast “energy” collapses into a few modes).
  - For coarse-graining: group positions by material/pawns/blocked tiles/king-zones and see whether averages stabilise when you compress repeatedly.
  - For decomposition: partition positions into regions and see how well the sum-of-local-values matches the whole.
4. Look for clean patterns or clear failures
  - Either: “most of the structure is captured by K ≈ log N patterns / groups / local terms”,
  - or: “no such collapse happens; complexity stays high everywhere.”

In other words, this is not about faith in hidden order. It’s about specifying exactly what kind of order would help, and then going looking for it with real data.

5. The hard counter-arguments (and why they matter)

There are good reasons this might all fail.
- In complexity theory, “most” Boolean functions are essentially incompressible: to describe them you need something as big as the truth table itself.
- Certain games can encode instances of SAT or other hard problems; their value functions inherit this hardness.
- Large graphs can be expanders: highly connected in a way that destroys nice clustering or low-dimensional embeddings.

If chess’s value landscape is “generic” in these senses, no amount of clever geometry will save us. Any function that compresses it would also compress hard problems we don’t know how to tame.
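The first bullet rests on a standard counting argument, sketched here for Boolean functions on $n$ bits, with $s$ denoting the length in bits of a candidate description (both symbols are introduced only for this illustration). There are $2^{2^n}$ such functions, but fewer than $2^{s+1}$ binary strings of length at most $s$, so

$$\Pr_{f}\big[\, f \text{ has a description of at most } s \text{ bits} \,\big] \;<\; \frac{2^{s+1}}{2^{2^n}} \;=\; 2^{\,s+1-2^n}.$$

Taking $s = 2^{n-1}$, half the size of the truth table, the bound is $2^{\,1-2^{n-1}}$, which is already below $10^{-9}$ for $n = 6$. The argument says nothing about chess specifically. It only shows that compressibility is the exception among arbitrary functions, which is why it has to be demonstrated rather than assumed.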

The point of making explicit structural hypotheses is that disproving them is also progress: if you can show that the value landscape fails every reasonable notion of structure, that’s a strong argument that “solving chess” really is computationally hopeless beyond small fragments.

6. Why this matters even if we never solve chess

Even if grand “solve chess” ambitions die, this structural line of attack matters for other reasons:
- It forces us to think about chess positions as a population with statistical and geometric properties, not just individual puzzles.
- It links computer chess to serious areas of mathematics and theoretical computer science: spectral graph theory, discrete harmonic analysis, renormalisation ideas, additive combinatorics.
- It gives a principled way to design better evaluation architectures: if you know decomposability holds, you’d design networks and search schemes that exploit it.

And more broadly, it’s a prototype for how to reason about other large decision spaces where we suspect “hidden structure” but don’t want to just chant that phrase and move on.

7. The honest bottom line

Right now, this is a sophisticated promissory note:

Either the chess value function has some kind of global structure — spectral, coarse-grained, or decomposable — or it doesn’t. If it does, we should be able to find evidence for it with the kinds of experiments sketched above. If it doesn’t, we should be able to demonstrate that too.

Engines answered the question “how strong can a machine play?”

This is aimed at a different question:

“Is there any mathematical order in the perfect-play value of chess, or is the game, at that level, structurally indistinguishable from a random hard function?”

That’s a much less romantic question than “who wins with best play?”, but it’s arguably more fundamental. If we knew the answer, the whole conversation around “solving chess” would finally stop being handwavy and become a matter of actual geometry.

https://thinkinginstructure.substack.com/p/the-hidden-geometry-of-chess
December 14, 2025

The Hidden Geometry of Clumping

Why galaxies, web networks, optimization landscapes — and perhaps even chess — form clusters, and what those clusters reveal about the structure of the underlying system

Clumping looks universal.

Galaxies condense out of nearly uniform early-universe matter.
PageRank concentrates probability on a handful of influential webpages.
Combinatorial optimization problems produce dense pockets of near-solutions.
Even chess positions seem to fall into plateaus and pits where evaluation changes slowly or chaotically.

The similarity is tempting — but misleading.

Across physics, networks, complexity theory, and even games, clumping is not a mechanism.
It is a diagnostic: the visible footprint of something deeper.

The geometry of the low-eigenvalue modes of the operator governing a system determines where its clumps form, and what those clumps mean.

Some systems have a handful of smooth, dominant modes (gravity).
Some have intermediate spectral bottlenecks (graphs).
Some have dense, ungapped spectra (NP-hard optimization).

Each produces clumps — but for radically different reasons.

Understanding that spectrum tells us how predictable a system is, how compressible it is, how learnable it is — and how hard.

1. Why low modes are the unifying principle

Every system considered here has three ingredients:

A state space
Density fields, directed graphs, bitstrings, chess positions.

A functional
Gravitational potential; random-walk operator; Hamiltonian or cost function; value function of a game.

A flow rule
Physical dynamics; Markov chain convergence; local search; neural evaluation.

Clumping occurs where this flow slows, accumulates, or fails to escape.

Across all these systems, such regions are controlled by small eigenvalues:

directions where the functional changes least,
nearly invariant subspaces under dynamics,
flat or marginal directions of the Hessian,
low-conductance sets in a graph,
rugged basins formed by many near-degenerate minima.

That is why low modes unify gravity, PageRank, spin glasses, and evaluation landscapes:
they determine the shape, scale, and meaning of clumps.

2. Gravity: clumps from smooth, low-dimensional instabilities

(Jeans 1902; Binney & Tremaine)

Gravity is the canonical structured landscape.

A small density fluctuation $\delta_k(t)$ with wavenumber $k$, in a fluid of density $\rho$ and sound speed $c_s$, evolves in the linearised Jeans analysis as $\delta_k(t) \propto \exp\!\left(\sqrt{4\pi G\rho - c_s^2 k^2}\, t\right).$

For long wavelengths (small $k$) such that $4\pi G\rho > c_s^2 k^2$, the oscillation frequency becomes imaginary and perturbations grow exponentially in time, signaling gravitational instability.

Worked example

Let $G = \rho = 1$ and $c_s = 0$. Then $\delta_k(t) \propto e^{\sqrt{4\pi}\, t} \approx e^{3.54 t}.$

A 0.1% perturbation grows tenfold by $t = \ln 10 / \sqrt{4\pi} \approx 0.65$ in these units, that is, in under one dynamical time $1/\sqrt{G\rho}$. Large-scale overdensities collapse into galaxies.
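The arithmetic in this worked example is easy to check numerically. The sketch below evaluates the growth rate $\sqrt{4\pi G\rho - c_s^2 k^2}$ in the same dimensionless units, and also the critical wavenumber at which the rate vanishes. The function names are illustrative, and the second call uses $c_s = 1$ purely as an assumed value so that a finite critical wavenumber exists.

```python
import math

def jeans_growth_rate(G, rho, c_s, k):
    """Growth rate sqrt(4*pi*G*rho - c_s^2 k^2); 0.0 if the mode is stable."""
    disc = 4.0 * math.pi * G * rho - (c_s * k) ** 2
    return math.sqrt(disc) if disc > 0 else 0.0

def jeans_wavenumber(G, rho, c_s):
    """Critical wavenumber at which the growth rate vanishes (needs c_s > 0)."""
    return math.sqrt(4.0 * math.pi * G * rho) / c_s

# The worked example: G = rho = 1, no pressure support (c_s = 0).
rate = jeans_growth_rate(G=1.0, rho=1.0, c_s=0.0, k=1.0)
tenfold_time = math.log(10.0) / rate

# With pressure (assumed c_s = 1), modes shorter than the Jeans scale are stable.
k_j = jeans_wavenumber(G=1.0, rho=1.0, c_s=1.0)
stable_rate = jeans_growth_rate(G=1.0, rho=1.0, c_s=1.0, k=k_j + 1.0)
print(round(rate, 3), round(tenfold_time, 3), round(k_j, 3), stable_rate)
```

With $c_s = 0$ every wavelength grows at the same rate. Switching on pressure introduces the cutoff at `k_j`, which is what singles out long wavelengths as the unstable ones.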

Interpretation

Gravity has very few dominant modes.
Structure formation is governed by long-wavelength instabilities.
The clumps are smooth, coherent, and predictable.
The system is highly compressible.

3. Web networks: clumps from spectral bottlenecks

(Brin & Page 1998; Chung 1997; Cheeger 1970)

PageRank computes the stationary distribution $v$ of the Google matrix: $v = \alpha u + (1 - \alpha) P v.$

PageRank does not use the graph Laplacian explicitly — but slow-mixing regions of the random walk correspond to:

nearly invariant subspaces of $P$,
which correspond to low-conductance sets,
which correspond to small Laplacian eigenvalues (via Cheeger’s inequality).

Thus clumping remains spectral, tied to bottlenecks in the graph.

Worked example

Construct two triangles connected by a single edge.
Random walks mix rapidly within each triangle but leak slowly between them.
The Laplacian’s second eigenvalue $\lambda_2$ is small.
With two identical triangles, PageRank splits its mass evenly between them; when one cluster is more densely connected internally, it draws a disproportionate share of the mass.
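The two-triangle graph is small enough to compute directly. The sketch below builds it, extracts the Laplacian's second eigenvalue, and runs PageRank by power iteration on the formula $v = \alpha u + (1 - \alpha) P v$. The teleport weight $\alpha = 0.15$, the uniform $u$, and the choice of the unnormalised Laplacian are assumptions of this example.

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
adj = np.zeros((n, n))
for a, b in edges:
    adj[a, b] = adj[b, a] = 1.0

# Second-smallest eigenvalue of the combinatorial Laplacian L = D - A.
lap = np.diag(adj.sum(axis=1)) - adj
lam2 = float(np.linalg.eigvalsh(lap)[1])

# PageRank by power iteration, with P column-stochastic and u uniform.
alpha = 0.15                      # teleport weight: an assumed, conventional value
P = adj / adj.sum(axis=0)         # column j holds the walker's next-step probabilities from j
u = np.full(n, 1.0 / n)
v = u.copy()
for _ in range(200):
    v = alpha * u + (1.0 - alpha) * (P @ v)

print(lam2, v)
```

Here $\lambda_2 = (5 - \sqrt{17})/2 \approx 0.44$, well below the value of 3 that an isolated triangle has, which is the bottleneck showing up in the spectrum. Because the two triangles are identical, each holds exactly half the PageRank mass, with the two bridge nodes ranked highest.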

Interpretation

Clumps reveal topology, not physics.
There are more modes than in gravity, fewer than in NP-hard landscapes.
Compressibility is intermediate.

4. NP-hard optimization: clumps from rugged structure

(Sherrington & Kirkpatrick 1975; Mézard, Parisi & Virasoro 1987)

Take subset-sum: $f(S) = \left| \sum_{i \in S} a_i - T \right|.$

Plot this objective over the hypercube $\{0,1\}^n$.
You obtain a landscape analogous to a spin glass:

exponentially many local minima,
barriers growing with dimension,
flat directions interspersed with sharp cliffs,
a dense spectrum of near-zero eigenvalues.

Worked example

Let $n = 12$ and $a_i \in [1,1000]$ be random integers.
Evaluating all $2^{12} = 4096$ configurations reveals:

many distinct local minima,
no dominant basin,
no coarse structure persisting across scales.
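The enumeration is cheap enough to run directly. The sketch below draws 12 random integers, evaluates the objective on all 4096 subsets, and counts local minima under single-bit flips. The seed, the target $T$ (set to half the total, since the text leaves it unspecified), and the single-flip neighbourhood are all assumptions of this example.

```python
import random

rng = random.Random(0)
n = 12
a = [rng.randint(1, 1000) for _ in range(n)]
target = sum(a) // 2              # assumed target; the text leaves T unspecified

def cost(mask):
    """|sum of selected a_i - T| for the subset encoded by the bits of mask."""
    return abs(sum(a[i] for i in range(n) if (mask >> i) & 1) - target)

costs = [cost(mask) for mask in range(1 << n)]

def is_local_min(mask):
    """True if no single-bit flip strictly lowers the cost."""
    return all(costs[mask ^ (1 << i)] >= costs[mask] for i in range(n))

local_minima = [m for m in range(1 << n) if is_local_min(m)]
best = min(costs)
print(len(costs), len(local_minima), best)
```

The count of local minima depends on the instance and on the neighbourhood chosen, so it is worth rerunning with several seeds before reading anything into a single number.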

Interpretation

Clumping arises from too many competing minima.
The system is maximally incompressible.
Low modes are dense and uninformative.
This is the opposite of gravity.

5. The compressibility spectrum

These systems lie along a single axis determined by their low-eigenvalue structure:

| System | Operator | Low-mode structure | Basin geometry | Compressibility |
| --- | --- | --- | --- | --- |
| Gravity | Poisson / Jeans | Few, smooth | Large coherent wells | High |
| Web graphs | Random walk | Moderate, topological | Community clusters | Medium |
| NP-hard | Discrete Hamiltonian | Dense, ungapped | Fragmented minima | Low |

Principle

Few low modes → structured clumps (predictable)
Several low modes → spectral clumps (clusterable)
Many low modes → rugged clumps (hard)

6. Edge cases and transitions

Protein folding
Smooth funnels mixed with glassy regions — a hybrid spectrum.

Hierarchical networks
Successive spectral gaps → layered clumps.

Turbulence
Energy cascades generate multi-scale spectral structure.

Phase transitions
In spin glasses and constraint-satisfaction problems, the low-mode spectrum densifies abruptly.

7. Why this matters: prediction, learning, hardness

Predictability
Gravity is predictable at large scales; NP-hard landscapes are not.

Learnability
Neural networks readily learn spectral structure; they struggle with rugged landscapes.

Computational hardness
Smooth → polynomial approximations possible.
Spectral → clustering helps.
Rugged → exponential barriers dominate.

Clump structure indicates what kinds of inference are fundamentally possible.

8. Chess: a system on the boundary

Chess appears to occupy a hybrid regime.

AlphaZero
Rapid spectral decay in value networks (Silver et al., 2018).

Leela Zero
Strong compression in CNN representations.

Stockfish NNUE
A comparatively small network suffices, suggesting inherent compressibility.

Measurement is feasible
Sampling $\sim 10^6$ positions and extracting leading eigenvalues via randomized SVD is practical.
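As a rough illustration of why this is cheap, here is a minimal sketch of a randomized SVD built on a random range finder. The text does not say which matrix would be decomposed, so the example uses a synthetic positions-by-features matrix with five dominant directions plus small noise. Its sizes, the oversampling parameter, and the function name are all assumptions.

```python
import numpy as np

def randomized_svd(matrix, rank, oversample=10, seed=0):
    """Approximate the top `rank` singular triplets via a random range finder."""
    rng = np.random.default_rng(seed)
    # Multiply by a thin Gaussian matrix to sample the dominant column space.
    sketch = matrix @ rng.normal(size=(matrix.shape[1], rank + oversample))
    q, _ = np.linalg.qr(sketch)                   # orthonormal basis for the sampled range
    # Exact SVD of the much smaller projected matrix.
    u_small, s, vt = np.linalg.svd(q.T @ matrix, full_matrices=False)
    return (q @ u_small)[:, :rank], s[:rank], vt[:rank]

# Synthetic stand-in: 2000 "positions" by 300 "features", rank 5 plus noise.
rng = np.random.default_rng(1)
data = rng.normal(size=(2000, 5)) @ rng.normal(size=(5, 300))
data += 1e-3 * rng.normal(size=data.shape)

_, s_approx, _ = randomized_svd(data, rank=8)
s_exact = np.linalg.svd(data, compute_uv=False)[:8]
print(s_approx)
```

The signature of a compressible system is the cliff in `s_approx` after the fifth value. On real chess data the interesting question is whether any such cliff exists, and at what rank.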

Hypothesis (testable)

Chess lies mid-spectrum: globally compressible, locally rugged in tactical regions.

A sharp spectral gap would point towards structural solvability.
A dense near-zero spectrum would point towards inherent NP-like complexity.

Either result is meaningful.

9. Bottom line

Clumping is ubiquitous — but not universal in cause.

Gravity: smooth physical instabilities
Networks: spectral bottlenecks
NP-hard systems: competing minima

Across all cases:

Clumps reflect the geometry of the low-eigenvalue spectrum — the determinant of predictability, learnability, and complexity.

Clumping is not the phenomenon.
It is the footprint of the geometry underneath.

Formal timestamp:
The Chess Eigenspectrum Hypothesis was published at Zenodo:
https://doi.org/10.5281/zenodo.17845086

https://thinkinginstructure.substack.com/p/the-hidden-geometry-of-clumping

December 12, 2025
