Tag: ai

  • Beyond Transformers: Three Ways to Build Global Structure — and How the Field Is Actually Moving Forward

    1. Introduction

    For the past several years, nearly every successful large-scale sequence model has converged on the same architectural pattern: transformers and their variants. Sparse attention, linear attention, grouped-query attention, kernel tricks — the surface details change, but the underlying mechanism remains the same.

    This has produced a familiar question:

    Are transformers inevitable, or are we simply stuck?

    The answer is neither. What is happening is more specific: the field has largely committed to one particular way of building global structure, and transformers saturate that choice extremely well.

    Once the alternatives are made explicit, both the limits of transformers and the shape of what comes next become much clearer.


    2. The Core Question: How Is Global Structure Built?

    Any sequence model that aims to perform non-trivial reasoning must answer one fundamental question:

    How does information from distant parts of the sequence come together?

    There are only a few fundamentally different answers. Everything else is variation.


    3. Explicit Comparison: The Transformer Regime

    Transformers build global structure by explicitly comparing tokens to each other.

    Each layer:

    1. embeds tokens in a shared space,
    2. computes similarity scores between all token pairs,
    3. aggregates information based on those scores,
    4. repeats the process in bounded depth.
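
    The four steps above can be sketched as a single-head layer in NumPy. This is a minimal illustration under stated assumptions, not a production implementation. It has one head, no causal mask, no positional encoding, and no residual connection or MLP. The projection matrices are random, and their names (`Wq`, `Wk`, `Wv`) are chosen here for illustration. The `(n, n)` score matrix is where the O(n²) cost lives.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(X, Wq, Wk, Wv):
    """One single-head attention layer (illustrative: no mask, no positional encoding)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # 1. embed tokens in a shared space
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # 2. similarity for all (n, n) token pairs
    weights = softmax(scores, axis=-1)         # each row becomes a set of mixing weights
    return weights @ V, weights                # 3. aggregate information by those scores

rng = np.random.default_rng(0)
n, d = 6, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out = X
for _ in range(2):   # 4. repeat for a bounded number of layers (weights shared only for brevity)
    out, weights = attention_layer(out, Wq, Wk, Wv)
print(weights.shape)  # (6, 6): one score per token pair
```

    Nothing in this layer depends on position. Permuting the rows of `X` therefore permutes the output rows in the same way. That is the order-agnosticism behind the symmetry property below, and it is why real models inject positional information explicitly.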

    This gives transformers two defining properties:

    • Random access: any token can directly query any other.
    • Symmetry: without positional encodings or causal masks, attention is permutation-equivariant, so relationships are not tied to sequence order or direction.

    The cost is obvious: O(n²) interactions. The payoff is equally clear: maximal expressiveness for arbitrary global retrieval and comparison.

    This is why transformers dominate tasks such as:

    • language modeling,
    • code understanding,
    • cross-document reasoning,
    • retrieval-augmented generation.

    Variants that keep explicit comparison but reduce cost (sparsity, kernels, approximations) remain inside this regime. They change how efficiently comparison is approximated, not what kind of structure is being computed.


    3.1 Hardware Alignment of Transformers

    The persistence of transformers is not just architectural — it is also hardware-driven.

    Dense attention has:

    • high arithmetic intensity,
    • predictable memory access patterns,
    • minimal control flow,
    • excellent tiling into SRAM / shared memory.

    In practice, large attention blocks amortize memory movement from high-bandwidth memory (HBM) and keep GPUs saturated. By contrast, many “efficient” alternatives reduce FLOPs but introduce:

    • serial dependencies,
    • irregular memory access,
    • lower arithmetic intensity.

    As a result, O(n²) attention often runs closer to peak hardware utilization than O(n) alternatives, particularly on modern accelerators.
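
    A back-of-the-envelope sketch makes the arithmetic-intensity contrast concrete. It assumes ideal tiling, so each operand is read from HBM once and each result is written once. It also assumes 16-bit storage. The function names and the five-operand recurrence update are illustrative assumptions, not measurements of any real kernel. The attention figure counts writing the `(n, n)` score matrix, which fused kernels such as FlashAttention avoid, so real attention kernels do better still.

```python
def matmul_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for an (m x k) @ (k x n) matmul, assuming each operand and the result cross HBM once."""
    flops = 2 * m * k * n                                   # one multiply and one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

def recurrence_intensity(n_elems: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for an elementwise state update h = a * h + b * x."""
    flops = 3 * n_elems                                     # a*h, b*x, and their sum
    bytes_moved = bytes_per_elem * 5 * n_elems              # read a, h, b, x; write h
    return flops / bytes_moved

# Q @ K^T for 4096 tokens and head dimension 128, counting the (n, n) score write.
print(round(matmul_intensity(4096, 128, 4096), 1))          # 120.5
print(recurrence_intensity(4096 * 128))                     # 0.3
```

    The matmul does roughly 400 times more arithmetic per byte moved. That is why an O(n²) kernel can keep an accelerator busy while an O(n) elementwise recurrence stalls on memory.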


    3.2 The KV Cache Problem

    In practice, the dominant bottleneck for long-context transformers is no longer raw attention FLOPs, but the memory footprint and bandwidth of the key–value (KV) cache during inference.

    For autoregressive generation, the KV cache grows linearly with context length and must be:

    • stored in high-bandwidth memory,
    • read at every decoding step,
    • kept resident to avoid recomputation.

    As context windows push into hundreds of thousands or millions of tokens, KV cache traffic — not attention compute — becomes the primary scaling limit.
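
    The scaling can be estimated with the standard KV-cache size formula. Each layer stores two tensors, keys and values, with one entry per cached token, KV head, and head dimension. The configuration below is a hypothetical example chosen for round numbers: 32 layers, 8 KV heads under grouped-query attention, head dimension 128, and 16-bit values. The SSM comparison counts only a per-layer recurrent state of assumed size `d_inner × d_state`. It ignores smaller buffers such as convolution state.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Keys + values cached for every past token in every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

def ssm_state_bytes(n_layers, d_inner, d_state, batch=1, bytes_per_elem=2):
    """A fixed-size recurrent state per layer; seq_len does not appear at all."""
    return n_layers * d_inner * d_state * batch * bytes_per_elem

GiB = 2 ** 30
for seq_len in (8_192, 131_072, 1_048_576):
    kv = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=seq_len)
    ssm = ssm_state_bytes(n_layers=32, d_inner=4096, d_state=16)
    print(f"{seq_len:>9} tokens: KV cache {kv / GiB:6.1f} GiB, SSM state {ssm / GiB:.4f} GiB")
```

    This configuration uses 128 KiB of cache per token. A million-token context therefore needs 128 GiB for a single sequence, and all of it is read at every decoding step. The assumed SSM state stays at 4 MiB regardless of length.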

    This is the concrete pain point that hardware-aware state-space models address. By replacing explicit token–token comparison with a constant-sized state, models such as Mamba eliminate the KV cache entirely. The trade is explicit: memory and bandwidth savings that grow linearly with context length, in exchange for compressed global structure.

    This reframes the comparison:

    • Transformers pay for expressiveness primarily in memory bandwidth.
    • SSMs buy efficiency by fixing memory cost at O(1) per layer with respect to context length.

    The architectural divide is therefore as much about memory systems as about computation.


    4. Explicit Dynamics: The State-Space Regime

    State-space models (SSMs) such as S4 and Mamba take a genuinely different approach. So do closely related recurrent and long-convolution models such as RWKV and Hyena.

    Instead of explicitly comparing tokens, they:

    1. maintain a finite-dimensional state,
    2. update it sequentially as tokens arrive,
    3. let global context accumulate implicitly through dynamics.

    This replaces explicit comparison with state evolution.
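
    The three steps can be sketched as a linear, time-invariant recurrence with a diagonal state matrix, in the spirit of S4-style models. This is a simplification under stated assumptions. It uses a single scalar input channel, fixed parameters, and no discretization step. Mamba's selective variant makes the parameters functions of the input, and this sketch omits that. The names `A_diag`, `B` and `C` follow the usual state-space notation. The values are arbitrary.

```python
import numpy as np

def ssm_scan(x, A_diag, B, C):
    """Run h_t = A h_{t-1} + B x_t and y_t = C . h_t over a sequence, with A diagonal."""
    h = np.zeros_like(A_diag)          # 1. finite-dimensional state of size d, fixed up front
    ys = []
    for x_t in x:                      # 2. update sequentially as tokens arrive
        h = A_diag * h + B * x_t       # elementwise because A is diagonal
        ys.append(C @ h)               # 3. each output reads the accumulated context
    return np.array(ys), h

d, n = 4, 10
A_diag = np.array([0.9, 0.5, 0.99, 0.1])   # |A| < 1 keeps the state bounded
B = np.ones(d)
C = np.array([1.0, -1.0, 0.5, 2.0])
impulse = np.zeros(n)
impulse[0] = 1.0
y, h = ssm_scan(impulse, A_diag, B, C)
print(h.shape)  # (4,): the state never grows with n
```

    Feeding an impulse shows how context decays. Each y_t equals the sum over i of C_i A_i^t B_i, so every state channel remembers at its own rate. Anything that does not fit into those d channels is gone.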

    The benefits are real:

    • linear-time computation,
    • streaming capability,
    • low memory footprint,
    • strong performance on very long sequences with local or structured dependencies.

    But the limitation is structural:

    If the state has dimension d and is stored at finite precision, it cannot faithfully encode O(n²) independent token–token relationships when n ≫ d.

    Information is compressed as it flows forward. Some distinctions are lost by design.
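
    One way to make the limit precise is a counting argument. Assume, for simplicity, that each of the d state entries is stored with b bits. Assume also that the relationships among n tokens amount to one independent bit per unordered pair. Both are modelling assumptions, not a description of any particular architecture.

```latex
\underbrace{d\,b}_{\text{bits in the state}}
\;\ge\;
\underbrace{\binom{n}{2} = \frac{n(n-1)}{2}}_{\text{independent pairwise bits}}
\quad\Longrightarrow\quad
n \lesssim \sqrt{2\,d\,b}
```

    For example, d = 4096 and b = 16 give n ≲ 362. Beyond roughly that many tokens, arbitrary pairwise structure can only be summarized, not stored.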

    This is not a flaw. It is the tradeoff.

    SSMs excel when:

    • long-range dependencies are compressible,
    • locality dominates,
    • throughput and context length matter more than arbitrary retrieval.

    5. The Role of Data (Often Under-Emphasized)

    Architecture alone does not determine how global structure is learned.

    Training data matters enormously:

    • Natural language has strong locality, redundancy, and hierarchical structure.
    • Code has explicit scoping, repetition, and long-range references.
    • Video and audio have smooth temporal dynamics.

    Transformers succeed partly because:

    • their inductive bias is weak,
    • large datasets teach them which comparisons matter.

    SSMs succeed where:

    • the data itself is compressible,
    • long-range dependencies can be summarized rather than retrieved exactly.

    In other words:

    Architecture determines what can be represented; data determines what needs to be represented.


    6. Implicit Constraints: The Variational / Lagrangian Regime

    A third regime replaces explicit comparison and explicit dynamics with implicit global constraints.

    These models define:

    • an energy, action, or constraint functional,
    • whose stationary point defines the representation.

    Examples include:

    • Deep Equilibrium Models (DEQs),
    • closed-loop / equilibrium transformers,
    • modern Hopfield-style associative memory networks.

    6.1 Implicit Depth and Gradient Flow

    In these models:

    • depth is not the number of layers,
    • it is the number of iterations required to reach equilibrium.

    This yields effectively unbounded depth without explicit stacking.
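
    The idea of depth as iteration count can be sketched with plain fixed-point iteration on a single weight-tied layer, z ← tanh(W z + U x). This is an illustrative assumption, not a production DEQ. Real implementations typically use faster root-finders such as Broyden's method or Anderson acceleration. Rescaling `W` to spectral norm 0.8 is an assumed way to guarantee a contraction: tanh is 1-Lipschitz, so ‖W‖₂ < 1 suffices.

```python
import numpy as np

def layer(z, x, W, U):
    # One weight-tied layer; the same map is applied at every "depth".
    return np.tanh(W @ z + U @ x)

def deq_forward(x, W, U, tol=1e-10, max_iter=1000):
    """Iterate z <- layer(z, x) until it stops moving; return the equilibrium and the step count."""
    z = np.zeros(W.shape[0])
    for step in range(1, max_iter + 1):
        z_next = layer(z, x, W, U)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, step                  # effective depth = iterations actually used
        z = z_next
    raise RuntimeError("no equilibrium within max_iter")  # convergence is not guaranteed in general

rng = np.random.default_rng(1)
d = 16
W = rng.normal(size=(d, d))
W *= 0.8 / np.linalg.norm(W, 2)                  # spectral norm 0.8 makes the map a contraction
U = rng.normal(size=(d, d))
x = rng.normal(size=d)

z_star, depth = deq_forward(x, W, U)
print(depth)  # data-dependent: a different x, or W closer to norm 1, changes it
```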

    Gradients are computed via implicit differentiation, rather than back-propagating through each iteration step. This mitigates classical vanishing/exploding gradient issues, but shifts sensitivity to conditioning and solver stability.
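
    The implicit-differentiation step follows from applying the implicit function theorem to the equilibrium condition. The notation here is assumed, following the standard DEQ derivation: f is the layer, θ its parameters, and z* the equilibrium.

```latex
z^\star = f(z^\star, x;\theta)
\;\Longrightarrow\;
\frac{dz^\star}{d\theta} = J_f\,\frac{dz^\star}{d\theta} + \frac{\partial f}{\partial \theta}\Big|_{z^\star}
\;\Longrightarrow\;
\frac{dz^\star}{d\theta} = \bigl(I - J_f\bigr)^{-1}\frac{\partial f}{\partial \theta}\Big|_{z^\star},
\qquad J_f = \frac{\partial f}{\partial z}\Big|_{z^\star}
```

    Only the equilibrium point enters the formula, so training memory does not grow with the number of solver iterations. The difficulty moves into the linear system with matrix I − J_f. When J_f has an eigenvalue near 1, that matrix is nearly singular and the gradient becomes ill-conditioned. This is the sensitivity to conditioning described above.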


    6.2 Practical Costs

    • inference time is data-dependent,
    • convergence is not guaranteed in bounded steps,
    • conditioning matters enormously,
    • hardware utilization is poor due to iterative solvers and control flow.

    These models are powerful for:

    • global consistency,
    • constraint satisfaction,
    • associative reasoning,

    but remain operationally fragile at scale.


    6.3 Quantization and Numerical Stability

    An under-appreciated advantage of transformers is their robustness to aggressive quantization. Attention-based models routinely operate at 8-bit — and increasingly 4-bit — precision with minimal degradation.

    This robustness follows from:

    • feed-forward algebraic structure,
    • bounded activations via normalization,
    • absence of iterative convergence during inference.

    By contrast, it remains an open question whether variational and equilibrium models can maintain stable convergence under heavy quantization. Because these models rely on:

    • fixed-point iteration,
    • implicit solvers,
    • conditioning-sensitive dynamics,

    reduced numerical precision may affect convergence guarantees directly, rather than merely degrading output quality.
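
    A scalar toy model shows how precision can interfere with convergence itself, rather than merely adding noise. It assumes the state is rounded to a uniform grid of spacing `step` after every iteration of the contraction z ← a·z + b, whose exact fixed point is z* = b / (1 − a). The grid, the map and the parameter values are illustrative assumptions. Iteration stalls as soon as one update moves z by less than half a grid cell, that is, when |(1 − a)(z* − z)| < step/2. The stall can therefore land anywhere within step / (2(1 − a)) of z*, a gap that grows without bound as a → 1.

```python
def quantize(v: float, step: float) -> float:
    # Round to the nearest point on a uniform grid (a stand-in for low-precision storage).
    return round(v / step) * step

def quantized_fixed_point(a: float, b: float, step: float, max_iter: int = 10_000):
    """Iterate z <- Q(a*z + b) from z = 0 until the quantized state stops changing."""
    z = 0.0
    for i in range(max_iter):
        z_next = quantize(a * z + b, step)
        if z_next == z:
            return z, i
        z = z_next
    return z, max_iter

step = 2 ** -7                         # assumed grid spacing, roughly 8-bit fixed point on [-1, 1)
for a, b in [(0.5, 0.5), (0.999, 0.001)]:
    z_star = b / (1 - a)               # both maps share the exact fixed point z* = 1
    z_q, iters = quantized_fixed_point(a, b, step)
    print(f"a={a}: stalled at {z_q:.4f} after {iters} steps (exact {z_star:.4f})")
```

    The well-conditioned map reaches z* = 1 exactly. The poorly conditioned one never leaves its starting point, because its first update is smaller than half a grid cell.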

    As hardware efficiency increasingly depends on low-precision arithmetic, quantization tolerance becomes a first-class architectural constraint.


    7. Empirical Signatures of the Three Regimes

    • Transformers excel at precise global retrieval when data supports it and hardware can sustain dense compute.
    • SSMs excel when data structure allows aggressive compression and long sequential propagation.
    • Variational models excel when the task is fundamentally about satisfying constraints rather than retrieving facts.

    8. A Practical Decision Guide

    The right architectural question is not “what’s best?”, but:

    What must be preserved — and what can be traded away?

    • Need arbitrary random access → Transformers
    • Dependencies compressible, very long context → SSMs
    • Need global consistency → Variational components
    • Need multiple capabilities → Hybrid designs

    9. Hybrids: Not Speculative, Already Here

    Hybrid systems are not just algorithmic compromises — they are hardware-aware decompositions:

    • dense attention where arithmetic intensity is high,
    • state-space models where memory bandwidth dominates,
    • retrieval and tools where exact operations matter,
    • variational components where constraint satisfaction outweighs throughput.

    Successful hybrids reflect a single principle: explicit comparison is powerful but expensive, and should be used only where it is indispensable.

    An illustrative analogy.
    The distinction between explicit comparison and state-based dynamics can be made intuitive by analogy with composition versus continuation in music. Writing a new piece requires global structural decisions: motif selection, contrast, recurrence, and long-range planning. This is analogous to explicit comparison, where distant elements are actively related and reinterpreted. By contrast, extending an already-determined piece—maintaining its harmonic field, texture, and atmosphere—is primarily a matter of smooth propagation of state. This is where state-space dynamics excel. The analogy helps clarify why hybrid systems work best when these roles are separated in time or function: explicit mechanisms for planning and constraint-setting, followed by dynamic mechanisms for execution and continuation.

    This also explains why many naïve hybrids fail. When multiple mechanisms are applied indiscriminately to the same global-structure problem, the system pays the costs of each without gaining the benefits of either. Effective hybrids are not blends; they are partitions, with clear division of responsibility between comparison, propagation, and constraint enforcement.


    9.1 Hybrids as the Emerging Production Consensus

    The move toward hybrid architectures is no longer speculative. By 2025, it has become a prominent and fast-growing pattern in large-scale production models, particularly for long-context workloads where both expressiveness and efficiency matter.

    Several recent systems exemplify this convergence:

    • Jamba (AI21) combines state-space layers with transformer attention and mixture-of-experts routing, supporting context lengths of up to 256K tokens while maintaining high throughput.
    • Falcon-H1 (TII) runs attention and Mamba-2 heads in parallel within each block, targeting multilingual and long-context settings where memory bandwidth is the primary constraint.
    • Bamba (IBM) provides an open-source hybrid explicitly designed to reduce the memory overhead associated with full attention.
    • Related architectures (e.g. Zamba, Heracles, and similar designs) typically allocate 10–50% of layers to explicit attention, with the remainder implemented as state-space dynamics.
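
    A hybrid's layer mix can be written as a simple schedule. Only attention layers keep a KV cache, so the attention fraction directly sets how much of a pure transformer's cache remains. The helper below, its "one attention layer every k layers" rule, and the 32-layer depth are all hypothetical. Published hybrids use their own placements.

```python
def hybrid_schedule(n_layers: int, attention_every: int) -> list[str]:
    """Place one attention layer after every (attention_every - 1) SSM layers."""
    return ["attention" if (i + 1) % attention_every == 0 else "ssm" for i in range(n_layers)]

def kv_cache_fraction(schedule: list[str]) -> float:
    # Only attention layers cache keys and values; SSM layers carry a fixed-size state instead.
    return schedule.count("attention") / len(schedule)

for k in (2, 4, 8):
    sched = hybrid_schedule(32, attention_every=k)
    print(f"1 attention layer in {k}: {kv_cache_fraction(sched):.1%} of a full transformer's KV cache")
```

    Ratios from 1-in-8 to 1-in-2 span 12.5% to 50%, which matches the allocation range above. The KV cache shrinks by the same factor.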

    On mixed benchmarks, these hybrids are frequently reported to match or outperform pure transformers and pure SSMs of comparable size. They do so not by inventing new primitives, but by assigning each mechanism to the role it performs best.

    This pattern reinforces the central claim of this paper: progress is not coming from replacing attention wholesale, but from restricting its use to the subproblems that genuinely require explicit comparison, while delegating long-range propagation and continuity to more efficient dynamics.


    10. Additional Axes and Open Frontiers

    The three-regime framework captures the dominant architectural tradeoffs, but several additional axes sharpen the picture.


    10.1 Recurrence vs. Parallelization

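    • Transformers are fundamentally parallelizable across sequence length, both in training and in prompt processing.
    • SSMs are recurrent at inference, because each state update depends on the previous one. Linear SSMs can still be trained in parallel, via convolution (S4) or parallel scans (Mamba). Classical nonlinear recurrences generally cannot.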

    This affects not just inference, but training efficiency and scalability. Parallelism enables higher utilization and faster convergence per wall-clock time; recurrence enables constant memory and streaming computation. This is a deep computational divide.
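
    Linear SSMs can train in parallel for a specific reason. The recurrence h_t = a_t·h_{t−1} + b_t composes affine maps, and composition is associative, so all prefix results can be computed in O(log n) parallel rounds. The sketch below assumes scalar states and uses a simple Hillis–Steele scan. Real kernels use work-efficient, hardware-specific scans instead.

```python
def combine(left, right):
    """Compose two affine maps h -> a*h + b: apply `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def sequential_scan(a, b):
    # The recurrent form used at inference: one dependent step per token.
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out

def parallel_scan(a, b):
    # Inclusive scan in ceil(log2 n) rounds; every element within a round is independent.
    elems = list(zip(a, b))
    offset = 1
    while offset < len(elems):
        elems = [
            elems[i] if i < offset else combine(elems[i - offset], elems[i])
            for i in range(len(elems))
        ]
        offset *= 2
    # Starting from h = 0, the offset term of each prefix map is exactly h_t.
    return [b_t for _, b_t in elems]

a = [0.5, 0.9, 0.1, 1.0, 0.3, 0.7]
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(sequential_scan(a, b))
print(parallel_scan(a, b))   # same values, up to floating-point rounding
```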


    10.2 Generalization and Out-of-Distribution Behavior

    Different inductive biases lead to different generalization properties:

    • Transformers often generalize better on compositional and retrieval-based tasks.
    • SSMs often generalize better on temporal extrapolation and dynamical continuation.

    OOD reliability is therefore architecture-dependent, not merely data-dependent.


    10.3 Explicit Externalization: Tools and Memory

    When global structure cannot be efficiently computed or compressed internally, it is externalized:

    • retrieval systems,
    • databases,
    • code interpreters,
    • symbolic engines.

    This is not a failure mode but a fourth regime: explicit externalization of global structure. Modern systems already rely on this pathway to route around O(n²) limits.


    10.4 The Long Tail of Specialized Inductive Biases

    Highly structured data (graphs, sets, geometry) often favors specialized architectures:

    • graph neural networks,
    • equivariant models,
    • domain-specific solvers.

    These increasingly appear as components in hybrid systems, reinforcing the shift toward modular design.


    11. “But Large Transformers Already Work — Isn’t That Enough?”

    Yes — when O(n²) is affordable.

    But context windows are already pressing hardware limits, and many domains (video, audio, large codebases, agent memory) naturally exceed them. Existing systems already rely on retrieval, chunking, tools, and external structure.

    Hybrids are not about replacing transformers. They are about extending the regimes where transformers remain usable.


    12. Conclusion: Strategic Hybridization, Not Architectural Revolution

    Transformers dominate not because they are inevitable, but because they sit at the intersection of:

    • expressive global comparison,
    • data regimes that tolerate weak inductive bias,
    • hardware that rewards dense, regular computation.

    Progress beyond them is not coming from overthrow, but from strategic hybridization:

    • identifying where explicit comparison is indispensable,
    • replacing it elsewhere with dynamics, constraints, or external tools,
    • and aligning architecture choices with data structure and hardware realities.

    This is not stagnation. It is the mark of a maturing engineering discipline — one that understands its tradeoffs and designs accordingly.

  • When Intelligence Breaks the Systems It Touches

    Extraction, Pressure, and the Limits of Scalable Insight

    There is a class of systems in which intelligence becomes self-defeating once it scales.

    Not because the intelligence is wrong. Not because the models fail. But because extraction is inseparable from perturbation.

    In these systems, insight exists only while it is applied gently. Push too hard, and the structure that made the insight possible erodes. This is not a moral problem. It is a structural one.

    Markets belong to this class — though not every strategy reaches the boundary at the same speed, and not every domain with gradients rewards intelligence equally quickly.


    1. The Hidden Assumption

    Throughout this essay, “intelligence” means the same thing in every domain: the ability to identify, exploit, and systematically amplify a gradient in a complex system.

    That gradient may be informational (markets), physical (oil reservoirs, power grids), institutional (tax codes, regulation), or logistical (networks, supply chains). The form differs; the force does not.

    Much modern thinking quietly assumes a separation between knowing and acting. We behave as if intelligence can observe a system, extract information, and scale that extraction without altering the system itself.

    That assumption holds in static or weakly coupled environments. It fails in feedback-coupled ones.

    In such systems, observation requires interaction; interaction alters structure; and scaling induces regime change, not linear improvement. The system tolerates probing, but not sustained pressure.

    Automation does not change this structure, but it compresses the timescale: what once took years of primary extraction may now be exhausted in moments, making unrestrained intelligence catastrophic rather than merely erosive.

    The limit is not cognitive. It is structural.


    2. Two Kinds of Landscapes

    To understand the limit, we need a simple taxonomy — not about epistemology, but about what happens when intelligence scales.

    Type I: Weakly coupled landscapes

    • Analysis minimally alters the environment
    • Computation scales with limited back-reaction
    • Structure largely survives scrutiny

    Examples:

    • Mathematics
    • Formal optimisation problems

    Type II: Feedback-coupled landscapes

    • Observation changes dynamics
    • Exploitation alters the payoff surface
    • Scaling erodes the very structure being exploited

    Examples:

    • Financial markets
    • Ecosystems under harvesting
    • Adversarial regulatory systems

    The distinction is not philosophical. It is about capacity limits under scaling.


    3. Why “Alpha” Is the Wrong Metaphor

    Finance treats alpha as if it were a resource: something you find, bottle, and scale.

    This is a category error.

    Alpha is not a substance. It is a gradient.

    It exists only while the system is lightly perturbed. As extraction increases, the gradient flattens — not because intelligence weakens, but because the environment adapts.

    Different strategies encounter this limit at different capital thresholds.
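
    A standard toy model makes "different capital thresholds" concrete. Assume, purely for illustration, that a strategy's gross edge per dollar is α. Assume also that its own trading erodes that edge linearly with deployed capital C at rate λ, a crude stand-in for market impact and imitation. Profit is then P(C) = C(α − λC). It peaks at C* = α / (2λ) and turns negative beyond 2C*. The parameter values below are hypothetical.

```python
def net_profit(capital: float, alpha: float, impact: float) -> float:
    # Edge per dollar shrinks as more capital pushes on the same gradient.
    return capital * (alpha - impact * capital)

def optimal_capital(alpha: float, impact: float) -> float:
    # Setting dP/dC = alpha - 2 * impact * C to zero.
    return alpha / (2 * impact)

alpha, impact = 0.10, 1e-9     # hypothetical: 10% raw edge, losing 0.1 percentage points per $1M deployed
c_star = optimal_capital(alpha, impact)
for c in (0.5 * c_star, c_star, 1.5 * c_star, 2.5 * c_star):
    print(f"capital ${c / 1e6:6.1f}M -> profit ${net_profit(c, alpha, impact) / 1e6:6.2f}M")
```

    A larger edge or a deeper market (smaller λ) raises the threshold, but in this model every strategy has one. Beyond C*, adding capital lowers total profit, and past 2C* it destroys it.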


    4. The Petroleum Engineering Analogy

    Petroleum extraction provides the cleanest physical analogue for what happens to alpha under scale, because it separates discovery, extraction, and environmental redesign with engineering precision.

    Primary Recovery: Natural Pressure

    An oil reservoir begins pressurised by geology. Oil flows naturally toward wells with minimal intervention. Extraction is cheap, local, and highly profitable.

    This corresponds to high-Sharpe, low-capacity strategies: small capital, steep gradients, minimal impact on the environment. Intelligence merely finds what already exists.

    Depletion: Extraction Degrades the Gradient

    As oil is removed, reservoir pressure drops. Flow slows. Each additional barrel is harder to extract, not because the oil has disappeared, but because extraction itself has degraded the enabling structure.

    In markets, this happens faster and more aggressively: arbitrage is competitive, gradients are informational rather than physical, and extraction actively destroys the signal through imitation and price response.

    Secondary Recovery: Pressure Maintenance

    To continue extraction, engineers inject water or gas to maintain pressure.

    This is not discovering new oil. It is intervening in the system to preserve extractability.

    Secondary recovery increases total yield — but only by redesigning the environment. It is capital-intensive, fragile, and fundamentally different from primary extraction.

    In markets, the analogue would be engineering volatility, preserving informational asymmetries, or structurally maintaining gradients. This is where regulation tightens.

    Enhanced Recovery: Environmental Redesign

    At the extreme, reservoirs are chemically or thermally altered to force oil out. The field is no longer natural; it has been redesigned around extraction.

    Markets explicitly forbid this stage when it serves private extraction.

    The legal and regulatory boundary in finance sits exactly here:

    • extraction is permitted,
    • pressure maintenance is constrained,
    • environmental redesign is prohibited.

    That boundary explains why alpha scales only so far.


    5. Persistence Requires Restraint

    The existence of limits does not mean extraction is fleeting.

    Some strategies persist for decades because they exercise restraint:

    • they remain below capacity thresholds,
    • exploit slowly renewing structure,
    • and avoid redesigning the environment that feeds them.

    This is why Jim Simons’ Medallion Fund worked for so long. It stayed small by design. Capacity was treated as a constraint, not a challenge.

    Persistence is achieved not by domination, but by self-limitation.

    Even when restraint is rational at the system level, it is often psychologically and institutionally unstable, because individual incentives reward immediate extraction over long-term preservation.

    This insight generalises.


    6. Adversarial Dynamics and Phase Transitions

    In feedback-coupled systems, competition does more than erase signal.

    It selects for opacity.

    Visible edges are copied and flattened. Surviving edges migrate into secrecy, latency, complexity, or institutional friction. What persists is not the best model, but the hardest one to observe.

    As coupling strengthens, systems do not degrade smoothly. They undergo phase transitions.

    A canonical example is the 2010 Flash Crash. Market intelligence had optimised normal-time efficiency so thoroughly that the system became hyper-fragile. When stress arrived, liquidity vanished discontinuously, prices collapsed, and recovery required external intervention.

    This is what “the system breaks” looks like: not gradual inefficiency, but abrupt loss of function.


    7. Why Infrastructure Cannot Exercise Restraint

    Infrastructure, logistics, and energy systems do not “fight back” when improved. Gains are cumulative, not self-erasing.

    Yet intelligence does not flood into them.

    The reason is not a lack of gradients. It is that infrastructure structurally cannot exercise restraint.

    Infrastructure creates value only when optimisation becomes common. A trading edge is profitable because others do not use it; an infrastructure improvement matters only when everyone does. Scale is not a side effect — it is the point.

    This has three structural consequences.

    First, infrastructure intelligence cannot remain small or selective. The moment it works, it demands broad rollout.

    Second, success forces visibility. Cables, grids, ports, and rights-of-way are physically anchored and jurisdictionally legible. Optimisation immediately collides with planning law, regulation, and the state.

    Third, optimisation destroys its own optionality. Gains are standardised, competitors free-ride, rents collapse, and political bargaining replaces technical optimisation.

    A contemporary illustration is renewable energy grid investment. Intelligence applied to generation, storage, and load balancing produces real gains — but once deployed, those gains become public infrastructure, not a defensible edge. Returns flatten precisely because the optimisation succeeds.

    This is why early infrastructure intelligence — exemplified by Paul Allen’s repeated investments in fibre and backbone capacity — failed to capture durable rents. The failure was not technical. It was structural.


    8. Deliberate Under-Optimisation in Fiscal Systems

    Tax enforcement often appears to fail because of weak resources, political hesitation, or legal complexity. This appearance is misleading.

    In reality, modern fiscal systems stabilise at a point of deliberate under-optimisation — not because enforcement intelligence is unavailable, but because scaling it further becomes self-destabilising.

    The United Kingdom provides a clean illustration. The UK has repeatedly committed to tackling offshore tax abuse, yet has consistently failed to enforce transparency measures — such as public beneficial ownership registers — across its own Overseas Territories, despite clear legal authority and repeated deadlines.

    Aggressive enforcement intelligence in a globalised system triggers feedback effects: capital relocation, legal arbitrage, retaliatory policy competition, and concentrated political backlash from embedded financial and legal interests. The legal distinction between avoidance and evasion functions as a pressure-release valve, allowing optimisation without collapse.

    Beyond a threshold, enforcement ceases to be stabilising and becomes destructive.

    As a result, fiscal systems do not maximise compliance. They select a survivable equilibrium: enough enforcement to maintain legitimacy, but not so much that intelligence destabilises capital flows, institutional networks, or political coalitions.

    Markets must restrain themselves to survive. Infrastructure cannot restrain itself. Fiscal systems restrain intelligence by design, even while rhetorically demanding more of it.


    9. The Boundary Condition

    Some systems allow extraction without redesign. Some systems constrain redesign and therefore self-limit extraction.

    Persistence depends on restraint — whether imposed by rules, chosen strategically, or structurally unavailable.

    Alpha fades not because intelligence weakens, but because systems break when intelligence refuses to stop.

    That is not ideology. That is systems theory.

    https://thinkinginstructure.substack.com/p/when-intelligence-breaks-the-systems

  • Why the AGI Architecture Isn’t Discussed Plainly — Even Though the Components Are Everywhere

    AI discussion tends to oscillate between two poles:

    • corporate optimism (“assistants and copilots”), and
    • superhuman speculation (“godlike AGI”).

    What we rarely see in public-facing discourse is the middle framing, the systems view familiar to cognitive science and robotics:

    Modern AI research is quietly assembling the classic ingredients of a cognitive architecture: memory, perception, world-modelling, action, and reward.

    This isn’t hidden knowledge. It’s referenced constantly in technical settings.

    The puzzle isn’t “why doesn’t anyone know this?” The puzzle is “why doesn’t this framing show up in public conversation?”

    Below is a grounded explanation: not secrecy, not conspiracy, just incentives, rhetoric, and communication asymmetry.


    1. The Research Community Already Talks This Way

    Cognitive architectures are not new ideas:

    • SOAR
    • ACT-R
    • Global Workspace Theory
    • Predictive Processing
    • reinforcement learners with learned world models
    • multi-agent planning systems
    • modern world-model agents (Dreamer, MuZero, etc.)

    At NeurIPS, ICML, RSS, or CogSci, researchers routinely discuss:

    • memory structures
    • planning modules
    • latent world representations
    • reward shaping
    • embodied control loops

    None of this is taboo in research.

    What’s striking is how little this framing appears in public-facing AI conversation.


    2. Concrete Example: The Gato Case Study

    When DeepMind released Gato — a single model performing hundreds of tasks (vision, action, dialogue) with a shared latent representation — the technical discussion revolved around:

    • unified policy representations
    • cross-modal generalisation
    • steps toward cognitive integration

    Public coverage, however, called it:

    • “a more flexible chatbot,”
    • “a general-purpose assistant,”
    • “a precursor to better robots.”

    Same system. Two completely different framings.

    This is not deception. It’s communication strategy.


    3. Why Companies Avoid the Cognitive-Architecture Frame

    The reason is simple and unromantic: it’s an unhelpful narrative for selling products or explaining risk.

    • “Copilot” is safe.
    • “Synthetic agent with persistence and goal formation” triggers legal, regulatory, and reputational complications.

    Other practical reasons:

    • Regulatory optics: Any hint of autonomous goal systems invites scrutiny under emerging AI regulations.
    • Product boundary clarity: A “tool” has clear affordances. A “mind-like architecture” does not.
    • Internal alignment: Corporate AI teams often work in silos; no one wants to declare they’re building a cross-silo cognitive system.

    Nothing here is secret. It’s just commercially rational framing.


    4. The Military Factor: Bureaucratic, Not Covert

    Defence-funded research actively explores:

    • autonomous navigation
    • multi-modal perception
    • world-model planning
    • reward-driven RL agents
    • robust robotic control

    But it is framed bureaucratically as:

    • “autonomy improvements,”
    • “mission planning,”
    • “navigation robustness,”
    • “decision-support tools.”

    Not because the unified architecture is forbidden, but because “synthetic cognition” triggers political, ethical, and policy complications that defence institutions are structurally incentivised to avoid.

    This is bureaucracy, not secrecy.


    5. Why the “Superhuman AI” Narrative Wins Public Mindshare

    Here is the genuinely under-discussed psychological factor:

    Superhumanism preserves distance. It keeps AI safely “other.”

    People are more comfortable imagining:

    • an alien superintelligence,
    • a godlike optimizer,
    • a transcendent reasoning entity

    than confronting the idea that AI might instead become:

    • familiar,
    • continuous with us,
    • running versions of mechanisms cognitive science already attributes to human minds.

    Decades of empirical work show that people routinely resist mechanistic framings of human cognition, not because those framings are wrong, but because they feel deflationary. We’ve seen this with:

    • predictive-processing accounts of perception
    • computational theories of memory
    • mechanistic models of emotion and decision-making

    So yes, human exceptionalism plays a role, but it’s one factor among several — not the whole story.


    6. Counterexample: Attempts at This Framing Rarely Stick

    Occasionally, major researchers do attempt the unified-systems framing:

    • Yann LeCun talks openly about “autonomous agents with world models.”
    • Demis Hassabis has described AI as “systems that can plan, remember, and act.”
    • Microsoft’s research on memory-augmented agents frames models as long-term planners.

    But these statements rarely propagate beyond technical audiences. In the press and on social platforms, they get flattened into:

    • “smarter assistants,”
    • “more capable models,”
    • “steps toward AGI.”

    This isn’t suppression. It’s a translation problem. Mind-like systems don’t fit easily into existing public narratives.


    7. What’s Actually Missing: A Middle Vocabulary

    The public currently has two dominant frames:

    • AI as tool (assistants, copilots, automation)
    • AI as godlike other (superintelligence, existential risk)

    What’s missing is the middle frame:

    AI as an evolving systems-integration project that overlaps heavily with cognitive science.

    This framing is accurate, grounded in decades of research, and describes what is actually happening in labs, but it lacks a natural constituency:

    • too technical for the general audience
    • too philosophical for PR
    • too messy for regulators
    • too mundane for futurists

    So it drifts into the background.


    Conclusion: No Taboo. Just a Framing Asymmetry

    There is no “forbidden AGI blueprint.” No secret knowledge. No institutional conspiracy of silence.

    Researchers openly study memory, control, world models, perception, planning, and reward integration. The ingredients of cognition have been on the table for decades.

    The silence comes from incentives and rhetoric:

    • Companies prefer tool framing.
    • Defence prefers subsystem framing.
    • Media prefers superhuman narratives.
    • The public struggles with mechanistic accounts of minds.
    • And nobody “owns” the systems-integration story.

    The result is a framing gap:

    The public is told stories, while the research world builds systems.

    https://thinkinginstructure.substack.com/p/why-the-agi-architecture-isnt-discussed

  • The Hidden NP-Complete Problem Sitting in Your Accounting Department

    Why matching payments to invoices sometimes defeats software — and what that reveals about modern work.

    Everyone learns about NP-complete problems in computer science.
    Almost nobody realises that one of them is hidden in the most routine corner of business life:

    applying a customer payment to a list of open invoices.

    This isn’t a metaphor.
    It is literally the subset-sum problem — formally catalogued by Garey & Johnson (Computers and Intractability, 1979) — and explicitly discussed in accounting-reconciliation research such as Pettersson & Strömberg (2007), who identify multi-item invoice matching as a computationally hard variant of subset selection.

    But the important point is not that the equivalence exists.
    It’s that everyday business practice routinely generates worst-case instances of a famous computational barrier — and accountants are the ones who run into it.


    A Worked Example That Shows the Entire Problem

    Take a payment of £4,215.

    The customer has nine open items:

    • £600
    • £615
    • £700
    • £1,200
    • £1,300
    • £1,415
    • £2,000
    • £2,015
    • (£300) credit note

    Try the obvious strategies:

    • Greedy (largest-first) → fails
    • Date proximity → fails
    • Similar-amount grouping → fails

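    A correct match?

    £1,415 + £1,300 + £1,200 + £600 + (£300 credit note) = £4,215.

    And it is not the only one: four other combinations of these items also total exactly £4,215, every one of them using the credit note.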

    This kind of combination is common in real accounts — especially when customers drip payments or credit notes distort the pattern.
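    A brute-force check makes both halves of the example concrete: the greedy rule gets stuck, and exhaustive search finds every exact match. This is a minimal sketch; the greedy rule here (largest first, taking any item that does not overshoot the remainder) is one plausible reading of "largest-first", and the function names are illustrative only.

```python
from itertools import combinations

# Open items from the worked example, in pounds; the credit note is negative.
OPEN_ITEMS = [600, 615, 700, 1200, 1300, 1415, 2000, 2015, -300]
PAYMENT = 4215


def greedy_largest_first(items, target):
    """Take items in descending order whenever they do not overshoot the remainder."""
    chosen, remaining = [], target
    for amount in sorted(items, reverse=True):
        if amount <= remaining:
            chosen.append(amount)
            remaining -= amount
    return chosen if remaining == 0 else None


def all_exact_matches(items, target):
    """Check every non-empty subset (2**len(items) - 1 of them) for an exact match."""
    matches = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            if sum(combo) == target:
                matches.append(combo)
    return matches


print(greedy_largest_first(OPEN_ITEMS, PAYMENT))  # None: greedy gets stuck
for match in all_exact_matches(OPEN_ITEMS, PAYMENT):
    print(match)
```

    Greedy takes £2,015 and £2,000, then has nothing left that fits; the exhaustive search prints five distinct combinations, each using the credit note. Nine items mean only 512 subsets; the same loop over 1,000 items is hopeless.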

    And the combinatorics behind the scenes are brutal.
    With 1,000 open invoices, the search space is 2¹⁰⁰⁰ — vastly more than atoms in the observable universe.
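    For scale: 2¹⁰⁰⁰ converted to a power of ten, set against the commonly quoted estimate of roughly 10⁸⁰ atoms in the observable universe.

```latex
2^{1000} = 10^{1000 \log_{10} 2} \approx 10^{301.03} \approx 1.07 \times 10^{301} \gg 10^{80}
```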

    This is what ERP systems quietly face.


    Why This Isn’t Just a Trivia Fact

    A few operations-research papers note the connection between reconciliation and subset-sum, but very little writing explains why real-world accounting systems produce the hardest instance types:

    1. Repeated invoice amounts
      Creates dense clusters → many candidate subsets.
    2. Staggered and partial payments
      Three small payments → exponential branching across ten invoices.
    3. Credit notes and adjustments (negative numbers)
      Multiply the space of feasible combinations.
    4. Long account histories
      5–15 years of open items is normal in large ERPs.
    5. Exact-to-the-penny matching
      No numerical tolerance → no approximate shortcuts.

    In other words:
    ordinary bookkeeping practices routinely generate pathological subset-sum instances.


    ERP Systems Know This — They Just Don’t Say It

    When an ERP displays:

    “Unable to automatically apply payment.”

    the real meaning is:

    “You have asked me to solve an NP-complete instance for which no guaranteed fast method exists. Please be the algorithm.”

    And this is not speculation.
    Real ERP documentation says exactly this — but in more diplomatic language.

    • SAP Note 310597 (“Automatic Clearing: Limitations and Manual Intervention”) explicitly acknowledges that SAP’s F.13 auto-clearing fails for “complex multi-item combinations” or when credit memos create ambiguous matches, and must be resolved manually.
    • NetSuite’s Help Center — “Applying Payments to Multiple Invoices” states that automatic application may not complete when invoice/credit memo combinations “require user judgment.”
    • Oracle Receivables User Guide — “Automatic Receipt Processing Limitations” similarly lists cases where auto-apply halts because “multiple plausible matches exist.”

    All three systems — along with Microsoft Dynamics — converge on the same truth:

    The software stalls exactly where the mathematics becomes hostile.

    Meanwhile, credit controllers perform live combinatorial optimisation.


    What Matching Engines Actually Do

    Commercial reconciliation tools survive by using layered heuristics:

    • date proximity
    • behavioural priors (typical ways a customer pays)
    • amount clustering
    • machine-learned likelihood scoring
    • ILP solvers for isolated subproblems
    • manual review for anything ambiguous

    These handle most cases.
    But substantial manual effort persists across large organisations, even after decades of automation attempts — because the bottleneck isn’t a missing feature, it’s a mathematical limit.

    AI doesn’t escape this.
    Machine-learning tools don’t “solve” the problem; they learn better heuristics for navigating an NP-complete search space.
    Manual review remains essential because the hardness is structural, not technological.

    And once you accept that, the deeper point comes into view.


    The Larger, More Interesting Point

    This isn’t really about accounting or ERP failures.
    It’s about a much broader phenomenon:

    Many workflows in modern organisations look trivial on the surface yet sit directly on top of computationally hard problems.

    Invoice matching is just the clearest example.
    Other cases include:

    • multi-leg cash application
    • FX netting across global entities
    • portfolio allocation under constraints
    • warehouse picking optimisation
    • shift scheduling
    • bundled-product revenue recognition
    • supply-chain backorder allocation

    The “clerical” layer often conceals a theoretical limit — and a persistent research opportunity.

    Research implication:
    Domain-specific versions of subset-sum may admit specialised algorithms far more efficient than generic formulations. This is an underexplored intersection of computer science, accounting, and operations research.
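    One classical illustration of how integer structure can soften the worst case (a textbook technique, not a claim about what any ERP does): because amounts are whole pence, subset-sum admits a dynamic programme whose cost grows with the number of items times the number of distinct reachable totals, rather than with 2ⁿ. That is pseudo-polynomial: it grows with the size of the amounts themselves, so the problem stays NP-complete in general, but for a single payment of bounded size it can be very practical. The sketch assumes integer pence and lets totals go negative so credit notes fit in naturally; `find_match` is a hypothetical name.

```python
def find_match(items_pence, target_pence):
    """Pseudo-polynomial subset-sum: track every reachable total, not every subset.

    Returns a list of item indices whose amounts sum to target_pence, or None.
    Amounts are integers (pence); negative amounts model credit notes.
    """
    # reachable maps each running total to the indices that first produced it.
    reachable = {0: []}
    for i, amount in enumerate(items_pence):
        # Iterate over a snapshot so each item is used at most once.
        for total, picked in list(reachable.items()):
            new_total = total + amount
            if new_total not in reachable:
                reachable[new_total] = picked + [i]
    return reachable.get(target_pence)


# The worked example, converted to pence.
items = [p * 100 for p in [600, 615, 700, 1200, 1300, 1415, 2000, 2015, -300]]
match = find_match(items, 421500)
print(match, [items[i] // 100 for i in match])
```

    The dictionary never holds more entries than there are distinct totals, which is the whole trick; when amounts are large and varied, that number can itself explode, and the hardness returns.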

    The next time an ERP system refuses to apply a payment automatically, don’t assume incompetence.
    Sometimes it’s telling you the truth:

    Some tasks in modern business are small on the surface — and NP-complete underneath.

  • The Hidden Geometry of Chess

    Why “solving chess” is really a question about structure, not speed

    People talk about “solving chess” as if it’s just a matter of more computing power or a slightly better engine. That’s wrong.

    We aren’t blocked because Stockfish isn’t fast enough. We’re blocked because, as far as we can tell, chess looks like an enormous pile of unrelated positions. Engines thrash through that pile with astonishing efficiency, but they don’t compress it, they don’t explain it, and they definitely don’t solve it in any mathematical sense.

    If chess ever becomes solvable in a meaningful way, it will be because someone finds a hidden structure that lets us treat that huge pile of positions as one object with internal geometry.

    This piece is about what that would actually mean.


    1. What “solving chess” really is

    Forget engines for a moment.

    Mathematically, solving chess means:

    For every legal position, assign a value: “win”, “draw”, or “loss”, under perfect play, and give a corresponding best move.

    You can think of this as a gigantic lookup table:

    • Each position is a point in an absurdly large space
    • The perfect-play value is a label on that point
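    Chess is far too large to tabulate, but the idea of a solved game as a lookup table can be shown on a toy. The sketch assumes a simple subtraction game chosen purely for illustration (not chess, and with no draws): a pile of n stones, each move removes 1, 2 or 3, and whoever takes the last stone wins. Backward induction labels every position.

```python
def solve_subtraction_game(max_pile, moves=(1, 2, 3)):
    """Label each pile size 0..max_pile as 'win' or 'loss' for the player to move.

    Backward induction: a position is a win if some move reaches a loss for the
    opponent; with no stones left, the player to move has already lost.
    """
    table = {}
    for n in range(max_pile + 1):
        successors = [n - m for m in moves if m <= n]
        table[n] = "win" if any(table[s] == "loss" for s in successors) else "loss"
    return table


table = solve_subtraction_game(20)
losses = [n for n, value in table.items() if value == "loss"]
print(losses)  # [0, 4, 8, 12, 16, 20]
```

    Here the whole table compresses to one rule: a pile is lost exactly when n is a multiple of 4. The open question is whether the chess table admits anything remotely like that.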

    Right now, we know a lot about tiny corners of this space:

    • 7-piece endgames are solved exactly
    • Some opening lines are mapped out very deeply
    • Engines can estimate values locally with terrifying precision

    But the global map — the full geometry of “win / draw / loss” across all positions — is completely opaque.

    The key question is not:

    “Can we search deeper?”

    It’s:

    “Is there any structure in that function from positions → values, or is it essentially random at scale?”

    If it’s random, we’re done. No cleverness will save us: solving chess is just a matter of raw brute force at inhuman scales.

    If there is structure, then the interesting work is to find it and formalise it.


    2. Think of chess as a weird landscape

    One useful way to think about chess:

    • Imagine every legal position as a point in a high-dimensional space
    • To each point, attach a number between –1 and +1 (loss to win); call this number the “value”
    • We get a bizarre landscape: hills (winning positions), valleys (losing positions), long plateaus (drawn positions)

    Engines don’t see the whole landscape. They only see:

    • The immediate neighbours of where they stand (positions reachable in a few moves)
    • Short local paths through the terrain (search trees)
    • A heuristic sense of where the hills and valleys might be (evaluations)

    What we don’t know is whether this landscape is:

    • Structured — smooth in some hidden sense, decomposable, compressible
    • Or chaotic — values fluctuate in a way that, beyond small endgame islands, is essentially intractable

    To ask “is there structure?” in a serious way, we need more than metaphors. We need to propose what “structure” would look like in concrete, testable terms.


    3. Three ways chess might secretly be structured

    Here are three concrete structural possibilities. If any of them turn out to be true (even approximately), they’d radically change how we think about solving chess.

    3.1. Low-dimensional geometry: the “few hidden directions” hypothesis

    This is the idea that:

    Although the state space of chess is astronomically large, the value function is governed by a small number of underlying “directions”.

    Analogies:

    • In physics, complex systems often reduce to a few dominant modes (think of how a vibrating drum can be described by a few main frequencies).
    • In machine learning, deep networks often implicitly compress data into low-dimensional features.

    Translated to chess:

    • Take a big sample of positions from strong engine games.
    • For each position, record a good evaluation (e.g. win probability from a top engine).
    • Build a graph that connects positions which are both:
      • closely related (one move apart, or structurally similar), and
      • have similar evaluation values.
    • Now ask:

    “Can we describe this evaluation function mostly using a small number of ‘basis patterns’ on this graph?”

    If the answer is yes — if the evaluation surface can be well-approximated by combining, say, 50 or 100 patterns on a graph with millions of positions — then chess has a kind of hidden geometry. That would be a big structural claim.

    If the answer is no — if you need thousands or millions of independent patterns — then the “few hidden directions” hypothesis dies, and with it any hope of that particular kind of compression.

    Either way, it’s a concrete empirical question.
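    A toy version of the measurement, assuming NumPy: project a signal onto the k smoothest eigenvectors of the graph Laplacian (one standard way to make "a small number of basis patterns" precise) and report how much is lost. The 200-node path graph and both signals are hypothetical stand-ins for a position graph and its evaluations, not chess data. In this sketch the graph is built from adjacency alone, so smoothness is measured rather than built into the edges.

```python
import numpy as np


def spectral_error(adjacency, values, k):
    """Relative error of approximating `values` with the k smoothest graph modes.

    The modes are eigenvectors of the graph Laplacian L = D - A with the smallest
    eigenvalues; a small error for small k means the signal is smooth on the graph.
    """
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, modes = np.linalg.eigh(laplacian)  # columns sorted by ascending eigenvalue
    basis = modes[:, :k]
    approx = basis @ (basis.T @ values)  # orthogonal projection onto k modes
    return np.linalg.norm(values - approx) / np.linalg.norm(values)


# Hypothetical stand-in for a position graph: a path of 200 nodes.
n = 200
adjacency = np.zeros((n, n))
for i in range(n - 1):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0

idx = np.arange(n)
smooth = np.cos(np.pi * (idx + 0.5) / n)  # a slowly varying "value landscape"
rng = np.random.default_rng(0)
noisy = rng.standard_normal(n)  # an incompressible one

print(spectral_error(adjacency, smooth, k=5))  # close to 0
print(spectral_error(adjacency, noisy, k=5))  # close to 1
```

    With real data, the path would be replaced by positions linked by moves or structural similarity, and the signals by engine evaluations; the question is how large k must be before the error becomes small.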


    3.2. Coarse-graining: the “macroscopic chess” hypothesis

    Renormalisation in physics works like this:

    • You ignore microscopic details
    • You look at a system at a larger scale (block spins, average behaviour)
    • Amazingly, the large-scale behaviour often obeys simple, stable laws

    Is there an analogue for chess?

    That would mean something like:

    If you group positions by certain coarse features — material balance, pawn structure, blocked vs. open, king safety patterns — the average value within each group behaves in a stable, self-consistent way.

    Concretely, you could try things like:

    • Group positions that have:
      • the same material counts, or
      • exactly the same pawn structure, or
      • the same “blocked board” when you tile the board into 2×2 or 4×4 squares and only record whether each tile has minor pieces, majors, kings, etc.
    • For each group, compute the average engine evaluation. That gives you a coarse “macroscopic” evaluation.
    • Now “zoom out” again: group these coarse states in an even rougher way and check if the new averages are consistent.

    If, after a few such compressions, things stabilise — i.e. the coarse description repeats itself up to small noise — then chess has a phase structure: macroscopic classes that behave predictably regardless of micro-detail.

    If nothing stabilises and everything stays sensitive to microscopic details all the way up, then chess is “RG-hostile” (RG for renormalisation group): there is no renormalisation structure to exploit.

    Again: this is testable.
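    One way to score a single coarse-graining step, sketched under stated assumptions: replace each group by its mean evaluation and ask what fraction of the variance in evaluations that coarse description still explains. The feature names (`material`, `pawn_islands`) and the data are synthetic and hypothetical; with real positions, the key functions would encode material counts, pawn structures or tiled boards, and "zooming out" means swapping in a coarser key function and checking whether the score holds up.

```python
import random
from collections import defaultdict
from statistics import pvariance


def variance_explained(samples, key_fn):
    """Fraction of the variance in evaluations explained by grouping on key_fn.

    samples: list of (features, evaluation) pairs. Each group is summarised by its
    mean (the coarse "macroscopic" value); the score is 1 - within / total, so 1.0
    means the coarse description predicts the evaluation exactly and 0.0 nothing.
    """
    groups = defaultdict(list)
    for features, value in samples:
        groups[key_fn(features)].append(value)
    total = pvariance([value for _, value in samples])
    within = sum(len(vals) * pvariance(vals) for vals in groups.values()) / len(samples)
    return 1.0 - within / total


# Synthetic, hypothetical data: evaluation depends on material balance only.
rng = random.Random(1)
samples = []
for _ in range(1000):
    features = {"material": rng.randint(-5, 5), "pawn_islands": rng.randint(1, 4)}
    samples.append((features, features["material"] / 10))


def sign(x):
    return (x > 0) - (x < 0)


print(variance_explained(samples, lambda f: f["material"]))  # exact grouping
print(variance_explained(samples, lambda f: sign(f["material"])))  # coarser
print(variance_explained(samples, lambda f: f["pawn_islands"]))  # irrelevant
```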


    3.3. Decomposition: the “sum of local fights” hypothesis

    This is the most intuitive one.

    Informally:

    Most real positions feel like a few largely independent local fights (king side vs queen side, a pawn majority, a piece trap), plus some interaction between them. Could the value of the position be approximated as “sum of local values plus a small correction”?

    Rough sketch of how you’d test this:

    1. For a given position, build an “influence graph” over the board: connect squares/pieces that directly attack or defend each other.
    2. Partition this influence graph into a few regions (clusters with strong internal connections, weak connections between clusters).
    3. For each region, treat it as a smaller sub-position and run a local engine evaluation on it (with a simple way of handling the “outside world”, e.g. frozen pieces).
    4. Add up these local evaluations and compare to the full-engine evaluation of the original position.

    If you find that:

    • For most positions arising in serious play,
    • The difference between “sum of local evaluations” and “true evaluation” is small and bounded,

    then chess is decomposable: the global value almost always factorises into local parts plus a modest interaction term.

    If that difference is often huge and scales with position complexity, then the “sum of local fights” intuition is simply wrong at the value level, however psychologically natural it feels to humans.

    And once again: this is something you can actually measure.
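    Steps 2 and 4 can be sketched in a few lines, with heavy simplifications: the partition below uses connected components of the influence graph, the crudest possible clustering (real positions would need a weighted graph-clustering method), and the square names and evaluation numbers in the usage lines are hypothetical placeholders for what an engine would supply.

```python
from collections import defaultdict


def regions(influence_edges):
    """Split an influence graph into connected regions of mutually linked squares.

    influence_edges: iterable of (square_a, square_b) pairs, one per attack or
    defence relation. Components have no edges between them at all.
    """
    neighbours = defaultdict(set)
    for a, b in influence_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, parts = set(), []
    for start in neighbours:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:  # depth-first flood fill
            square = stack.pop()
            if square in part:
                continue
            part.add(square)
            stack.extend(neighbours[square] - part)
        seen |= part
        parts.append(part)
    return parts


def interaction_term(full_eval, local_evals):
    """What the 'sum of local fights' misses: full evaluation minus summed local ones."""
    return full_eval - sum(local_evals)


print(regions([("g1", "f3"), ("f3", "e5"), ("b7", "c6"), ("c6", "d8")]))
print(interaction_term(0.35, [0.4, -0.1]))  # hypothetical engine numbers
```

    The decomposability hypothesis is then a statement about the distribution of `interaction_term` across many positions: small and bounded if it holds, large and growing with complexity if it fails.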

    The Chess Geometry Explorer (interactive figure): testing “inherent order” against “random chaos” in the value function, across four panels.

    1. Low-dimensional geometry: values follow smooth “hills” (dominant modes).
    2. Coarse-graining: averages stabilise into macroscopic grids.
    3. Local decomposition: value is a sum of separated local fights.
    4. Chaos (RG-hostile): no structure; random, incompressible complexity.

    Structural hypothesis (low-dimensionality): $V(P) \approx \sum_{i=1}^{K} \alpha_i \varphi_i(P)$ for a small number K of basis patterns $\varphi_i$.

    4. How you’d test these hypotheses in practice

    All three ideas (low-dimensional geometry, coarse-grained phases, decomposability) can be tested with the same basic recipe:

    1. Generate lots of positions
      • Sample from strong engine self-play (Stockfish / Leela).
      • Include a mix of openings, middlegames, and endgames.
    2. Evaluate them with a very strong engine
      • Use deep search or a strong neural net head to get a “ground truth” evaluation U(P) for each position P.
    3. Build the structures you care about
      • For geometry: build the value-similarity graph and do a spectral analysis (see how fast “energy” collapses into a few modes).
      • For coarse-graining: group positions by material/pawns/blocked tiles/king-zones and see whether averages stabilise when you compress repeatedly.
      • For decomposition: partition positions into regions and see how well the sum-of-local-values matches the whole.
    4. Look for clean patterns or clear failures
      • Either: “most of the structure is captured by K ≈ log N patterns / groups / local terms” (N being the number of sampled positions),
      • or: “no such collapse happens; complexity stays high everywhere.”

    In other words, this is not about faith in hidden order. It’s about specifying exactly what kind of order would help, and then going looking for it with real data.


    5. The hard counter-arguments (and why they matter)

    There are good reasons this might all fail.

    • In complexity theory, “most” Boolean functions are essentially incompressible: to describe them you need something as big as the truth table itself.
    • Certain games can encode instances of SAT or other hard problems; their value functions inherit this hardness.
    • Large graphs can be expanders — highly connected in a way that destroys nice clustering or low-dimensional embeddings.
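    The first point is Shannon's counting argument, which fits in one line. There are 2^(2^n) Boolean functions on n input bits, but fewer than 2^m binary descriptions shorter than m bits, so descriptions that beat the truth table by c bits cover at most a 2^(−c) fraction of all functions:

```latex
\frac{\#\{\text{descriptions shorter than } 2^n - c \text{ bits}\}}{\#\{f : \{0,1\}^n \to \{0,1\}\}}
  < \frac{2^{\,2^n - c}}{2^{\,2^n}} = 2^{-c}
```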

    If chess’s value landscape is “generic” in these senses, no amount of clever geometry will save us. Any function that compresses it would also compress hard problems we don’t know how to tame.

    The point of making explicit structural hypotheses is that disproving them is also progress: if you can show that the value landscape fails every reasonable notion of structure, that’s a strong argument that “solving chess” really is computationally hopeless beyond small fragments.


    6. Why this matters even if we never solve chess

    Even if grand “solve chess” ambitions die, this structural line of attack matters for other reasons:

    • It forces us to think about chess positions as a population with statistical and geometric properties, not just individual puzzles.
    • It links computer chess to serious areas of mathematics and theoretical computer science: spectral graph theory, discrete harmonic analysis, renormalisation ideas, additive combinatorics.
    • It gives a principled way to design better evaluation architectures: if you know decomposability holds, you’d design networks and search schemes that exploit it.

    And more broadly, it’s a prototype for how to reason about other large decision spaces where we suspect “hidden structure” but don’t want to just chant that phrase and move on.


    7. The honest bottom line

    Right now, this is a sophisticated promissory note:

    Either the chess value function has some kind of global structure — spectral, coarse-grained, or decomposable — or it doesn’t. If it does, we should be able to find evidence for it with the kinds of experiments sketched above. If it doesn’t, we should be able to demonstrate that too.

    Engines answered the question “how strong can a machine play?”

    This is aimed at a different question:

    “Is there any mathematical order in the perfect-play value of chess, or is the game, at that level, structurally indistinguishable from a random hard function?”

    That’s a much less romantic question than “who wins with best play?”, but it’s arguably more fundamental. If we knew the answer, the whole conversation around “solving chess” would finally stop being handwavy and become a matter of actual geometry.

    https://thinkinginstructure.substack.com/p/the-hidden-geometry-of-chess

  • The Achilles Limit: When Quantum Feedback Can’t Quite Keep Pace

    Modern quantum computers are increasingly limited not just by noise in their components, but by the difficulty of acting on quantum information fast enough to matter.

    This is not a failure of materials or fabrication. It is a consequence of control: the unavoidable fact that acting on a quantum system means responding to information that is already out of date.

    This is not a new problem — but it is an old one we have forgotten how to recognize.

    More than two thousand years ago, Zeno described a paradox in which Achilles can never overtake a tortoise, because before he reaches where the tortoise is, he must first reach where it was. By the time he arrives, the tortoise has moved on.

    Mathematically, the paradox dissolves. Achilles wins.

    Physically, however, the structure of the problem has quietly returned — inside the control loops of quantum machines.


    Control Is Always Late

    To control any physical system, three steps are unavoidable:

    • Measurement — extracting information about the system
    • Inference — processing that information to decide what to do
    • Actuation — applying a control signal to correct or stabilize the system

    In classical engineering, these steps can often be made fast enough that delay is negligible. The system barely changes while the controller thinks.

    Quantum systems are different.

    Measurement disturbs the system being measured. Information arrives stochastically rather than deterministically. And the system continues evolving — sometimes rapidly — during every moment of inference and actuation.

    Control, in other words, is always aimed at the past.

    Achilles runs. The quantum state moves. Feedback chases where it was.


    Where This Shows Up in Hardware

    The Achilles problem is not abstract. It appears in real quantum machines.

    In trapped-ion systems, logical operations often proceed via Rabi oscillations at tens to hundreds of kilohertz. Errors accumulate on comparable timescales.

    By contrast, high-fidelity state measurement typically takes tens to hundreds of microseconds. During that window, before any correction can even be decided, the quantum state continues evolving through many cycles of the very dynamics one is trying to control.

    The tortoise is moving at tens or hundreds of kilohertz. Achilles must stop for microseconds to look.

    Superconducting qubits exhibit a related tension. Signals must travel from millikelvin cryogenic hardware to room-temperature electronics and back. Even at near–speed-of-light propagation in cryogenic cabling — roughly 5 nanoseconds per meter — a few meters of wiring introduce tens of nanoseconds of irreducible delay before any classical processing occurs.
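    The arithmetic behind these comparisons is simple enough to write down. The numbers in the usage lines are illustrative assumptions (a 100 kHz drive, a 100 µs readout, 3 m of cable each way at the roughly 5 ns per meter quoted above), not measurements of any particular machine, and the helper names are hypothetical.

```python
def cycles_during(frequency_hz, duration_s):
    """How many oscillation cycles elapse while the controller is still waiting."""
    return frequency_hz * duration_s


def round_trip_delay_s(cable_length_m, ns_per_m=5.0):
    """Propagation delay out to room-temperature electronics and back."""
    return 2 * cable_length_m * ns_per_m * 1e-9


print(cycles_during(100e3, 100e-6))  # about 10 cycles during one readout
print(round_trip_delay_s(3.0) * 1e9)  # about 30 ns before any processing starts
```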

    These delays are not accidents of poor engineering. They are consequences of how quantum information must be extracted, transmitted, and acted upon in a hybrid quantum–classical system.


    Why This Is Structurally Hard

    Quantum computers survive only because of feedback. Error correction, state stabilization, and adaptive control all depend on monitoring fragile quantum states and responding in real time.

    But the architecture is inherently hybrid:

    • The quantum system evolves continuously and probabilistically.
    • The classical controller operates discretely, downstream from measurement.
    • The interface between them is noisy, delayed, and irreversible.

    Extracting more information helps only up to a point. Measurement introduces backaction. Acting faster risks injecting additional noise. Acting more gently allows errors to grow.

    Achilles does not fail categorically. He may catch the tortoise locally. But doing so becomes progressively more costly as the system evolves faster than the controller can respond without destabilizing it.


    A Necessary Detour: Prediction and the Quantum Zeno Effect

    Two obvious objections arise at this point.

    Why Not Aim Ahead?

    Modern control theory does not simply chase the present; it predicts the future. Kalman filters, model-predictive control, and observers all attempt to act on where the system will be, not where it was.

    These techniques are already used in quantum control, and they can dramatically reduce effective latency.

    But prediction comes at a price. It relies on accurate models. In quantum systems, modeling error does not merely reduce performance — it feeds directly into backaction, instability, or decoherence. A controller that aims ahead and misses does not merely lag; it perturbs the system in the wrong direction.

    Prediction shifts the Achilles problem forward in time. It does not eliminate it.
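    A classical toy makes this point quantitative. It is a deliberately simple, hypothetical scalar system, not a quantum model: x[k+1] = a·x[k] + u[k] + w[k] with Gaussian noise w, a controller that only sees the state from `delay` steps ago, and an exact model used to roll that stale measurement forward (a model-based predictor in the spirit of a Smith predictor, not a Kalman filter). Even with a perfect model, noise that arrives during the delay cannot be corrected, and the residual variance grows as σ²(1 + a² + … + a^(2·delay)).

```python
import random


def closed_loop_variance(delay, a=1.05, sigma=1.0, steps=20000, seed=0):
    """Empirical variance of x[k+1] = a*x[k] + u[k] + w[k] under delayed feedback.

    The controller sees only x[k - delay]. It rolls that stale value forward with
    an exact model and the inputs it already applied, then cancels the predicted
    state. Only the noise injected during the delay is left uncorrected.
    """
    rng = random.Random(seed)
    xs, us = [0.0], []
    for k in range(steps):
        k_obs = max(k - delay, 0)
        x_hat = xs[k_obs]
        for j in range(k_obs, k):  # predict forward through the delay
            x_hat = a * x_hat + us[j]
        u = -a * x_hat  # aim at the predicted present, not the stale past
        us.append(u)
        xs.append(a * xs[k] + u + rng.gauss(0.0, sigma))
    tail = xs[steps // 10:]  # drop the start-up transient
    mean = sum(tail) / len(tail)
    return sum((x - mean) ** 2 for x in tail) / len(tail)


def predicted_variance(delay, a=1.05, sigma=1.0):
    """Closed form for the same loop: sigma^2 * (1 + a^2 + ... + a^(2*delay))."""
    return sigma ** 2 * sum(a ** (2 * j) for j in range(delay + 1))


for d in (0, 2, 8):
    print(d, round(closed_loop_variance(d), 2), round(predicted_variance(d), 2))
```

    With a = 1.05 the closed form gives about 1, 3.32 and 13.72 for delays of 0, 2 and 8 steps. The open-loop system is unstable, and the predictor keeps the loop stable, but the cost of delay is still paid.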

    Why Not Measure Faster?

    At the opposite extreme lies the Quantum Zeno Effect: measure frequently enough, and evolution can be frozen altogether.

    Here the Achilles metaphor turns ironic. If Achilles looks too often, the tortoise stops moving.

    But this too reveals a tradeoff rather than an escape. Zeno-style stabilization relies on strong, frequent measurement — precisely the regime where backaction dominates and usable dynamics are suppressed. One can halt motion, but not compute.
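    The textbook idealisation behind this trade-off, as a sketch: a qubit driven resonantly so that, left alone, it would flip completely (a rotation of π on the Bloch sphere), interrupted by n equally spaced, instantaneous projective measurements. The model and its parameters are assumptions chosen for illustration.

```python
import math


def zeno_survival(n_measurements, total_angle=math.pi):
    """Probability that a resonantly driven qubit is found in its initial state
    at every one of n equally spaced ideal projective measurements.

    Unobserved, it rotates by total_angle and stays with probability
    cos^2(total_angle / 2); n measurements give (cos^2(total_angle / (2n)))^n.
    """
    return math.cos(total_angle / (2 * n_measurements)) ** (2 * n_measurements)


for n in (1, 10, 100, 1000):
    print(n, round(zeno_survival(n), 4))
```

    One measurement leaves essentially no chance of staying put; a thousand leave it above 99%. The flip the drive was meant to perform never happens: motion halted, computation included.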

    Between slow pursuit and frozen observation lies a narrow operating regime. It is there — not at either extreme — that scalable quantum control must live.


    Feedback, Tradeoffs, and the Waterbed Question

    From a classical control perspective, this entire discussion may sound familiar.

    The Bode sensitivity integral tells us that reducing sensitivity in one frequency band necessarily increases it elsewhere. Push the waterbed down here, and it rises there.
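    For reference, the standard continuous-time form, stated under its usual assumptions (a stable closed loop, an open-loop transfer function L(s) of relative degree at least two, and p_k ranging over the open-loop poles in the right half-plane):

```latex
\int_{0}^{\infty} \ln \lvert S(j\omega) \rvert \, d\omega = \pi \sum_{k} \operatorname{Re}(p_k),
\qquad S(s) = \frac{1}{1 + L(s)}
```

    With no unstable open-loop poles the right-hand side is zero, so any band where |S| < 1 must be paid for by a band where |S| > 1.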

    One interpretation of the Achilles problem is that it is simply the quantum manifestation of this principle.

    The conjecture raised here is more cautious — and more specific:

    Quantum systems may impose a hard floor on how far such tradeoffs can be pushed, because delay, measurement backaction, and finite signal propagation are not merely engineering imperfections but physical constraints.

    In classical systems, delay can often be absorbed into redesigned controllers without changing long-term stability. In quantum systems, the same delay is entangled with disturbance, irreversibility, and probabilistic state update.

    Whether this distinction is fundamental or merely contingent remains an open question.


    Engineered Dissipation: Winning by Not Chasing

    Notably, some of the most robust quantum stabilization strategies avoid active pursuit altogether.

    Engineered dissipation, autonomous error correction, and attractor-based dynamics succeed precisely because they replace real-time inference with geometry. Instead of chasing the state, they shape the landscape so that unwanted motion decays on its own.

    These approaches work not because feedback is ineffective, but because pursuit itself has limits.

    Achilles does best when the track tilts toward the finish line.


    A Testable Conjecture

    The conjecture is simple to state, and careful in scope:

    It remains an open question whether control latency in quantum systems can always be absorbed into feedback laws without introducing new stability costs or unfavorable scaling constraints.

    If true, this would mean that some errors persist not because qubits are too noisy, but because information about their state arrives too late to be acted upon without causing further disturbance.

    This is not a claim about slow computers or inadequate electronics. Even with arbitrarily fast classical processing, measurement takes time, signals take time to propagate, and the quantum system does not wait.


    What Would Prove This Wrong?

    A strong idea must name its own failure modes.

    The Achilles conjecture would be falsified by a control protocol that achieves arbitrarily low steady-state error in a continuously evolving quantum system despite finite, nonzero delay between measurement and actuation.

    Alternatively, a proof that feedback delay can always be absorbed into a redefinition of the control law — without degrading long-term stability or scaling — would render the conjecture false.

    Such results may already exist. Or they may not.

    Either way, the question has rarely been asked this directly.


    Why This Matters Now

    As quantum hardware improves, control — not materials — is becoming the bottleneck. Coherence times are longer. Noise is better understood. What increasingly limits performance is the ability to respond fast enough, gently enough, and accurately enough to what the system is doing right now.

    If control latency imposes a fundamental constraint, it will shape which architectures scale and which do not. It may also explain why some of the most promising approaches rely less on active feedback and more on engineered dissipation — not because feedback fails, but because pursuit has limits.

    Achilles eventually overtakes the tortoise on paper.

    The question is whether physics has already answered the race — or whether Achilles is still running.

    https://thinkinginstructure.substack.com/p/the-achilles-limit-when-quantum-feedback

  • THE ANALYTIC STRUCTURE OF CONSTANTS

    How singularities and symmetry determine the speed of numerical approximation

    Some mathematical constants are easy to approximate. Others converge painfully slowly. A few remain stubborn even after centuries of work. This variation is not random. It reflects the analytic structure of the functions that define the constants.

    The central idea of this article is simple:

    The ability of a function to continue analytically beyond the real line determines how fast any basic approximation method can converge. The location of singularities and the presence of global symmetries influence the decay of coefficients in Taylor, Fourier, or related expansions, and that decay controls the speed of computation.

    This gives us a clear way to understand why certain constants are intrinsically slow and why others allow rapid algorithms once the right structure is identified.


    1. Local and Global Analytic Structure

    Constants inherit their computational difficulty from the analytic behaviour of the functions behind them.

    Local structure

    Some functions have singularities on or very close to the real axis. For example:

    • arctan has singularities at ±i

    • 1/x has a pole at 0

    • algebraic functions have branch points near their roots

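    Such functions have a limited radius of convergence for their power series. Their coefficients decay at best geometrically, at a rate set by the distance to the nearest singularity; when that radius is 1, as for arctan expanded at 0, no geometric factor is left and the decay is only polynomial. This restricts how fast any elementary approximation can converge. By “elementary,” we mean methods that use: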

    • Taylor expansions

    • Euler–Maclaurin corrections

    • Riemann sums and trapezoidal rules

    • simple algebraic transformations

    • Machin-type arctan decompositions

    These methods rely solely on real-line information and do not use any global structures such as periodicity or modular symmetry.

    A brief historical aside

    The contrast between “local” and “global” structure is not just a theoretical classification. When modular-form formulas for π were discovered and refined, the speed was so extraordinary that the Chudnovsky brothers built a home-made supercomputer in their New York apartment in the 1990s specifically to exploit them. The machine, assembled from spare parts and cooled with improvised plumbing, set world records for digits of π. It remains one of the clearest demonstrations of how global analytic structure can translate directly into raw computational power.

    Global structure

    Other functions behave nicely over large regions of the complex plane. Examples include:

    • sin(πx), which is entire and periodic

    • modular forms, which are analytic on the upper half-plane and satisfy transformation laws

    • elliptic functions, which are doubly periodic

    Their Fourier or spectral coefficients decay exponentially or faster, and this creates the possibility of very rapid convergence. Algorithms that use these structures are not elementary in the sense defined above. They rely on analytic continuation and global symmetry.
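    The same trapezoidal rule behaves completely differently once the integrand is periodic and analytic. The sketch below (helper names hypothetical) integrates 1/(2 + cos θ) over one period, where the exact value is 2π/√3. The integrand's complex singularities sit at θ = π ± i·ln(2 + √3), and the error decays like (2 − √3)ⁿ ≈ 0.27ⁿ: a fixed number of extra digits per extra node rather than a fixed fraction of a digit.

```python
import math

EXACT = 2 * math.pi / math.sqrt(3)  # integral of 1/(2 + cos t) over [0, 2*pi)


def integrand(theta: float) -> float:
    return 1.0 / (2.0 + math.cos(theta))


def periodic_trapezoid(f, n: int) -> float:
    """Trapezoidal rule with n equally spaced nodes over one period [0, 2*pi)."""
    h = 2 * math.pi / n
    return h * math.fsum(f(j * h) for j in range(n))


for n in (4, 8, 16, 24):
    err = abs(periodic_trapezoid(integrand, n) - EXACT)
    print(f"n = {n:2d}   error = {err:.3e}")
```

    With 24 nodes the result is already accurate to roughly machine precision, whereas the non-periodic example needs thousands of subintervals for far fewer digits.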


    2. Why Analytic Structure Determines Convergence

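    The mechanism behind the phenomenon is classical. If f(z) = Σ aₙzⁿ is analytic inside a disk of radius R, Cauchy's estimate bounds its Taylor coefficients: for every r < R, |aₙ| ≤ M(r)/rⁿ, where M(r) is the maximum of |f| on the circle |z| = r. This means:

    • a nearby singularity (small R) limits the coefficients to slow decay, no faster than roughly R⁻ⁿ

    • a large radius gives fast geometric decay, and entire behaviour (R = ∞) gives decay faster than any geometric rate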

    • modular or elliptic symmetries can create even faster decay

    Since all basic approximation schemes ultimately depend on expansions of this sort, the rate of coefficient decay sets a hard limit on the speed of convergence.

    The coefficient bound itself is a precise mathematical fact (Cauchy's estimate), not a heuristic.
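    For readers who want the step from coefficient bounds to convergence speed written out, here is the standard tail estimate, assuming f(z) = Σ aₙzⁿ is analytic for |z| < R and choosing any r with |z| < r < R:

```latex
|a_n| \le \frac{M(r)}{r^n}, \qquad M(r) = \max_{|w| = r} |f(w)|,

\left| f(z) - \sum_{n=0}^{N-1} a_n z^n \right|
  \le \sum_{n \ge N} M(r) \left( \frac{|z|}{r} \right)^{n}
  = M(r) \, \frac{(|z|/r)^N}{1 - |z|/r}.
```

    The truncation error shrinks geometrically with ratio |z|/r, which can be pushed towards |z|/R but no further. As the evaluation point approaches the circle |z| = R the guarantee weakens, and on that circle (arctan at z = 1 is the standard case) it gives no geometric rate at all.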


    3. Constants Limited by Local Singularities

    Approached through their most direct elementary definitions, these constants converge only slowly.

    π through arctan

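    The singularities of arctan at ±i lie at distance 1 from the origin, so its Taylor series has radius of convergence 1, and x = 1 (which gives π/4) sits exactly on the boundary of the disk. There the terms behave like 1/n, which gives an error of order 1/n for the usual Gregory series. This is why real-line Taylor methods evaluated at x = 1 must be slow.

    Machin-type formulas help because evaluating arctan at x = 1/q moves the evaluation point well inside the disk of convergence: the terms shrink like 1/q^(2k), so convergence becomes geometric, about 1.4 decimal digits per term for arctan(1/5). The singularities at ±i still fix that ratio, so each term buys only a modest, fixed number of digits.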
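    A quick numerical check of the Gregory series 4·(1 − 1/3 + 1/5 − …), i.e. arctan's Taylor series at the boundary point x = 1. The function name `gregory_pi` is hypothetical; the printed product error · n hovers near 1, confirming an error of order 1/n.

```python
import math


def gregory_pi(n_terms: int) -> float:
    """4 * (1 - 1/3 + 1/5 - ...) truncated after n_terms terms."""
    return 4 * math.fsum((-1) ** k / (2 * k + 1) for k in range(n_terms))


for n in (10, 100, 1000, 10000):
    err = abs(gregory_pi(n) - math.pi)
    print(f"{n:6d} terms   error = {err:.3e}   error * n = {err * n:.4f}")
```

    Ten thousand terms still leave only about four correct decimal places.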
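    Machin's formula π = 16·arctan(1/5) − 4·arctan(1/239) can be sketched with integer fixed-point arithmetic. The helper names `arctan_inv` and `machin_pi`, and the choice of 10 guard digits, are illustrative assumptions. The term counts it reports show the geometric rate: each arctan(1/5) term is 25 times smaller than the last, and each arctan(1/239) term 57121 times smaller.

```python
def arctan_inv(q: int, scale: int) -> tuple[int, int]:
    """Fixed-point arctan(1/q) * scale via its Taylor series; returns (value, terms used)."""
    total = 0
    power = scale // q          # scale / q^(2k+1), starting at k = 0
    k = 0
    while power:
        term = power // (2 * k + 1)
        total += term if k % 2 == 0 else -term
        power //= q * q         # consecutive terms shrink by a factor of q^2
        k += 1
    return total, k


def machin_pi(digits: int) -> tuple[str, int, int]:
    """pi = 16 arctan(1/5) - 4 arctan(1/239), truncated to `digits` decimals."""
    guard = 10                  # extra digits absorb per-term truncation error
    scale = 10 ** (digits + guard)
    a, terms_5 = arctan_inv(5, scale)
    b, terms_239 = arctan_inv(239, scale)
    text = str((16 * a - 4 * b) // 10 ** guard)
    return text[0] + "." + text[1:], terms_5, terms_239


pi_text, n5, n239 = machin_pi(50)
print(pi_text)
print(f"arctan(1/5): {n5} terms, arctan(1/239): {n239} terms for 60 working digits")
```

    Roughly 43 terms of arctan(1/5) cover 60 working digits, about 1.4 digits per term: far better than the Gregory series, yet still a fixed gain per term set by the distance to the singularities at ±i.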

    e and the logarithm

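    The standard definitions through integrals or ODEs involve local behaviour: ln x = ∫₁ˣ dt/t, for instance, is built from 1/t, which has a pole at 0. Any Riemann-sum or Euler–Maclaurin approach to these definitions remains slow for the same analytic reason. By contrast, the series e = Σ 1/n! comes from the entire function exp and converges very quickly, which is the global-structure case described in section 1.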
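    The ODE route to e can be illustrated with forward Euler steps for y′ = y, y(0) = 1: n steps of size 1/n reproduce (1 + 1/n)ⁿ exactly, whose error is about e/(2n). For contrast the sketch also sums Σ 1/k!, the Taylor series of the entire function exp. Both helper names are hypothetical.

```python
import math


def euler_steps_e(n: int) -> float:
    """Forward Euler for y' = y, y(0) = 1, with n steps of size 1/n; equals (1 + 1/n)^n."""
    y, h = 1.0, 1.0 / n
    for _ in range(n):
        y += h * y
    return y


def series_e(n_terms: int) -> float:
    """Partial sum of e = sum 1/k!, the Taylor series of exp at 1."""
    return math.fsum(1.0 / math.factorial(k) for k in range(n_terms))


for n in (10, 100, 1000):
    print(f"Euler, {n:4d} steps: error = {abs(euler_steps_e(n) - math.e):.3e}")
for n in (5, 10, 18):
    print(f"series, {n:2d} terms: error = {abs(series_e(n) - math.e):.3e}")
```

    A thousand Euler steps give about three correct digits; eighteen series terms exhaust double precision.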

    γ (Euler–Mascheroni)

    The constant γ is the limit of Hₙ minus ln n. The defining function 1/x has a singularity at 0, so any elementary method that uses derivative information of 1/x, including Euler–Maclaurin with any fixed number of correction terms, can only achieve polynomial convergence. There is no known elementary method that gives exponential decay of coefficients.
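    A small experiment with the asymptotic expansion Hₙ = ln n + γ + 1/(2n) − 1/(12n²) + 1/(120n⁴) − … shows both behaviours. The naive difference Hₙ − ln n has error about 1/(2n); subtracting the first two Euler–Maclaurin terms leaves an error of about 1/(120n⁴). Doubling n shrinks the corrected error by roughly 16, which is faster but still polynomial. The hard-coded value of γ and the helper names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # gamma to double precision (known value, hard-coded)


def harmonic(n: int) -> float:
    return math.fsum(1.0 / k for k in range(1, n + 1))


def gamma_naive(n: int) -> float:
    """H_n - ln n, with error about 1/(2n)."""
    return harmonic(n) - math.log(n)


def gamma_em(n: int) -> float:
    """Remove the first two Euler-Maclaurin terms; error about 1/(120 n^4)."""
    return harmonic(n) - math.log(n) - 1 / (2 * n) + 1 / (12 * n * n)


for n in (10, 20, 40, 80):
    print(f"n = {n:2d}   naive error = {gamma_naive(n) - EULER_GAMMA:.3e}"
          f"   corrected error = {gamma_em(n) - EULER_GAMMA:.3e}")
```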


    4. Constants that Become Fast Once Their Global Structure Is Recognized

    ζ(2)

    The naive series 1 + 1/2² + 1/3² + … converges slowly. This is exactly what the coefficient-decay principle predicts.

    The situation changes completely once ζ(2) is linked to the sine function. Euler's product sin(πx) = πx ∏ (1 − x²/n²) describes an entire, periodic function, and comparing its x³ coefficient with that of the Taylor series sin(πx) = πx − (πx)³/6 + … gives the closed form ζ(2) = π²/6. Fourier expansions lead to the same identity. Computing ζ(2) then reduces to computing π, for which rapidly convergent methods exist.

    This is the clearest example of how identifying the right global structure can transform a slow constant into a fast one.
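    The slow side of this contrast is easy to check numerically. The sketch below (helper name hypothetical) sums 1/k² directly and uses π²/6 from the sine-product argument as the reference value; the tail behaves like 1/n, so each extra digit costs ten times as many terms.

```python
import math

CLOSED_FORM = math.pi ** 2 / 6


def zeta2_partial(n: int) -> float:
    return math.fsum(1.0 / (k * k) for k in range(1, n + 1))


for n in (10, 100, 1000):
    tail = CLOSED_FORM - zeta2_partial(n)
    print(f"n = {n:4d}   tail = {tail:.3e}   tail * n = {tail * n:.4f}")
```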

    [Interactive figure: The Analytic Speed Limit. Bars show digits gained per iteration; local singularities (red, polynomial) cap progress, while global symmetries (green, exponential) accelerate it.]

    5. Constants With No Known Usable Global Structure

    ζ(3)

    The constant ζ(3) is analytically well-defined, and many series exist for it. Some converge geometrically: Apéry's series, built from central binomial coefficients, gains about 0.6 decimal digits per term. What is missing is the kind of structure that made ζ(2) easy. No closed form is known, and there is no known periodic expansion, simple entire product, or modular-form identity that plays the role the sine product plays for ζ(2).
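    The sketch below compares the naive series Σ 1/k³, whose tail is about 1/(2n²), with Apéry's series ζ(3) = (5/2) Σ (−1)^(k+1) / (k³·C(2k, k)), whose terms shrink by roughly a factor of 4 each step. The hard-coded reference value and helper names are illustrative; `math.comb` requires Python 3.8 or later.

```python
import math

ZETA3 = 1.2020569031595942  # zeta(3) to double precision (known value, hard-coded)


def zeta3_naive(n: int) -> float:
    return math.fsum(1.0 / k ** 3 for k in range(1, n + 1))


def zeta3_apery(n_terms: int) -> float:
    """(5/2) * sum_{k>=1} (-1)^(k+1) / (k^3 * C(2k, k))."""
    return 2.5 * math.fsum((-1) ** (k + 1) / (k ** 3 * math.comb(2 * k, k))
                           for k in range(1, n_terms + 1))


for n in (10, 100, 1000):
    print(f"naive, {n:4d} terms: error = {ZETA3 - zeta3_naive(n):.3e}")
for n in (5, 10, 20):
    print(f"Apery, {n:2d} terms: error = {abs(zeta3_apery(n) - ZETA3):.3e}")
```

    Geometric convergence at this modest ratio is useful in practice, but it is a long way from the roughly 14 digits per term that modular structure delivers for π.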

    Catalan and elliptic constants

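    These constants are connected to functions with branch cuts and deep symmetries that are difficult to exploit. No representation built only from the elementary methods above is known to give rapid coefficient decay; where fast methods do exist, as with the arithmetic–geometric mean for complete elliptic integrals, they work by exploiting exactly this deeper structure.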


    6. The Mechanistic Pattern

    The behaviour of constants now follows a very simple pattern:

    Local singularities produce polynomial convergence. Examples include π via the Gregory series, the integral definitions of e and the logarithm, γ via Hₙ − ln n, and the naive series for ζ(2) and ζ(3).

    Global periodicity or entire behaviour produces exponential convergence once the structure is used. Examples include ζ(2) through the sine product, and fast π algorithms based on modular forms.

    Deep analytic structure without accessible symmetry produces no known fast elementary convergence. Examples include ζ(3), Catalan’s constant, and elliptic integrals.

    The pattern is not historical. It is a direct consequence of standard complex analysis.


    7. Why Modular Forms Create Fast Algorithms for π

    Modular forms satisfy transformation laws that relate values at different points in the upper half-plane. By moving to regions where q = exp(2πiτ) is extremely small, one obtains series whose terms shrink geometrically with an extraordinarily small ratio: each term of the Chudnovsky series adds roughly 14 decimal digits. This behaviour is the reason the Chudnovsky and Ramanujan series converge so quickly. They harness global symmetry that elementary methods cannot access.

    This explains why polygon-based approximations are slow and why modular methods are exceptionally fast. The analytic behaviour is fundamentally different.
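    In place of the page's interactive calculator, here is a minimal sketch of the Chudnovsky series using the standard constants: π = 426880·√10005 / Σ (−1)ᵏ (6k)! (13591409 + 545140134k) / ((3k)! (k!)³ 640320^(3k)). The function name `chudnovsky_pi`, the integer fixed-point scheme, and the 10 guard digits are illustrative choices, not the original widget's code. Consecutive factorial quotients are updated with the ratio 24(6k − 5)(2k − 1)(6k − 1)/k³.

```python
from math import isqrt


def chudnovsky_pi(digits: int) -> tuple[str, int]:
    """Return pi truncated to `digits` decimals and the number of series terms used."""
    guard = 10                      # extra digits absorb truncation error
    one = 10 ** (digits + guard)    # fixed-point scale
    c3 = 640320 ** 3
    magnitude = one                 # |term k| * one, without the linear factor; k = 0
    sign = 1
    a_sum = one                     # sum of signed terms
    b_sum = 0                       # sum of k * signed terms
    k = 0
    while True:
        k += 1
        magnitude = magnitude * 24 * (6 * k - 5) * (2 * k - 1) * (6 * k - 1) // (k ** 3 * c3)
        if magnitude == 0:          # term k is below the working precision
            break
        sign = -sign
        a_sum += sign * magnitude
        b_sum += sign * k * magnitude
    series = 13591409 * a_sum + 545140134 * b_sum
    sqrt_10005 = isqrt(10005 * one * one)   # floor(sqrt(10005) * one)
    pi_fixed = 426880 * sqrt_10005 * one // series
    text = str(pi_fixed // 10 ** guard)
    return text[0] + "." + text[1:], k      # terms k = 0 .. k-1 were used


pi_text, terms = chudnovsky_pi(50)
print(pi_text)
print(f"{terms} terms for 60 working digits")
```

    Five terms cover 60 working digits, compared with dozens of Machin terms and millions of Gregory terms for far fewer digits.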

    [Interactive widget: Chudnovsky π Calculator]

    8. Counterexamples and Edge Cases

    BBP formulas for π

    Although the BBP series looks elementary, its derivation relies on analytic continuation of polylogarithms and special algebraic identities. It does not fall under the elementary methods described here.
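    For reference, the BBP series is π = Σ 16⁻ᵏ (4/(8k + 1) − 2/(8k + 4) − 1/(8k + 5) − 1/(8k + 6)). The sketch below (helper name hypothetical) sums it exactly with rationals; each term is roughly 16 times smaller than the previous one, about 1.2 decimal digits per term. The digit-extraction property that made BBP famous is not shown here.

```python
import math
from fractions import Fraction


def bbp_partial(n_terms: int) -> Fraction:
    """Partial sum of the BBP series, computed exactly with rationals."""
    total = Fraction(0)
    for k in range(n_terms):
        total += Fraction(1, 16 ** k) * (Fraction(4, 8 * k + 1) - Fraction(2, 8 * k + 4)
                                         - Fraction(1, 8 * k + 5) - Fraction(1, 8 * k + 6))
    return total


for n in (1, 4, 8, 12):
    print(f"{n:2d} terms: error = {abs(float(bbp_partial(n)) - math.pi):.3e}")
```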

    Euler–Maclaurin for γ

    Each correction term raises the power of n in the error, from 1/n to 1/n² and then 1/n⁴, but with any fixed number of terms the convergence remains polynomial.

    Continued fractions

    Some continued fractions converge quickly for algebraic constants, but analytic limitations prevent them from giving exponential speed for transcendental constants like π or γ without global structure.
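    A classical illustration is Brouncker's continued fraction 4/π = 1 + 1²/(2 + 3²/(2 + 5²/(2 + …))). Its convergents are exactly the reciprocals of Gregory partial sums, so rewriting the series as a continued fraction does not escape the 1/n rate. The sketch below (helper names hypothetical) checks that identity with exact rationals.

```python
import math
from fractions import Fraction


def gregory_partial(n_terms: int) -> Fraction:
    """1 - 1/3 + 1/5 - ... with n_terms terms, as an exact rational."""
    return sum((Fraction((-1) ** k, 2 * k + 1) for k in range(n_terms)), Fraction(0))


def brouncker(depth: int) -> Fraction:
    """1 + 1^2/(2 + 3^2/(2 + ... + (2*depth - 1)^2 / 2)), evaluated from the bottom up."""
    tail = Fraction(2)
    for j in range(depth, 1, -1):
        tail = 2 + Fraction((2 * j - 1) ** 2) / tail
    return 1 + 1 / tail


for depth in (10, 100):
    err = abs(float(4 / brouncker(depth)) - math.pi)
    print(f"depth {depth:3d}: |4/x - pi| = {err:.3e}")
```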

    Nothing here contradicts the mechanism.


    9. Why These Ideas Matter

    The analytic structure of a constant provides a practical guide to its computational difficulty. It tells us:

    • no simple fast algorithm for γ exists unless new global structure is found

    • ζ(3) will not yield rapid convergence without discovering symmetry now unknown

    • every fast algorithm for π must rely on entire or modular behaviour

    These are clear predictions grounded in complex analysis.

    The principle is concise. The decay of coefficients controls convergence. The analytic continuation of a function controls the decay of its coefficients.

    Local structure gives slow convergence. Global structure gives fast convergence. Deep structure remains inaccessible without heavy machinery.

    This is why some constants are easy and others are not, and why the discovery of global analytic structure has such dramatic computational consequences.

    https://thinkinginstructure.substack.com/p/the-analytic-structure-of-constants

  • Iain M. Banks: The Structural Genius and Hidden Hollow at the Heart of The Culture

    Iain M. Banks built one of the most audacious futures in modern science fiction: a galaxy-spanning civilisation of abundance, wit, ethics, and machine gods, the Minds, who run everything.

    The Culture novels are dazzling. They are also strangely unsatisfying.

    You close them impressed but not moved, awed but unanchored. As though you’ve glimpsed a universe of extraordinary machinery in which the human layer is somehow… thin.

    There’s a structural reason for this. Banks wrote systems with depth and humans with surface detail, and that contradiction defines his entire fictional universe.


    1. Banks Writes Worlds From the Outside In

    Banks’s signature technique is the cascading scale reveal:

    • a detail
    • a chamber
    • a valley
    • a continent
    • a megastructure
    • a ship the size of nations

    He zooms outward until the human layer is dwarfed by the machinery of the world.

    This is not simply style; it is worldview. Banks writes like an engineer describing an operating system, not a novelist exploring interior life.

    The result: Culture novels are intoxicating on the architectural level and emotionally underpowered on the human one.


    2. The Minds Are the Real Characters

    Banks’s affection lies with his AIs, and it shows.

    The Minds have:

    • wit
    • history
    • moral uncertainty
    • ambition
    • interior conflict
    • personality
    • actual stakes

    They drive the plot. They embody the ethical arguments. They make the decisions that matter.

    By contrast, Culture humans are:

    • reversible
    • consequence-free
    • post-gender
    • chemically modulated
    • psychologically unscarred
    • eternally cushioned

    They speak with the same tonal varnish. They rarely undergo irreversible change. They exist in a world that protects them from their own choices.

    Narratively, the Minds carry the novels. Humans decorate them.


    3. The Endings Don’t Land. Because They Can’t

    Banks’s novels expand brilliantly but resolve weakly. This is not a writing flaw but a structural inevitability.

    In a post-scarcity civilisation with:

    • no real danger,
    • no irreversible loss,
    • no meaningful political conflict,
    • and superintelligences capable of averting catastrophe…

    human decisions cannot generate narrative stakes.

    Every genuine crisis resolves the same way:

    a Mind intervenes.

    Thus the endings become:

    • spectacle without consequence
    • philosophy without resolution
    • fade-out instead of closure

    Banks raises moral questions his world cannot structurally answer.


    Nuance A: Banks could write human depth … when the world allowed it

    Characters like:

    • Zakalwe (Use of Weapons),
    • Gurgeh (The Player of Games),
    • Byr Genar-Hofoen (Excession),

    prove Banks had the ability to write interiority, trauma, and moral weight.

    But these characters stand out precisely because they push against the gravitational pull of the Culture’s architecture. The civilisation itself flattens human lives into pleasant, reversible experiences.

    Individual brilliance exists; the system does not support it.


    4. Surface Detail Exposes the Fault Line… With a Necessary Caveat

    The Hell subplot in Surface Detail is Banks’s most conceptually ambitious idea:

    • simulated afterlives,
    • eternal punishment as political technology,
    • consciousness trapped in constructed torment.

    But the execution feels strangely hollow. Traditional Hell demands metaphysics:

    • guilt
    • spiritual dread
    • shame
    • religious terror

    Banks instead gives us:

    • infrastructure
    • architecture
    • system design
    • torture as software

    Many readers find this spiritually empty. It’s a metaphysical idea rendered as technical spectacle.

    But here’s an important nuance:

    The hollowness may be deliberate.

    Even so, the narrative effect is unchanged: the system is vivid, the interior torment thin. The philosophical ambition exceeds the emotional grounding.

    The fault line remains visible.


    Nuance B: Some argue the imbalance is intentional

    There is a legitimate counterargument that:

    The Culture’s hollowness is deliberate. It’s a vision of a civilisation so perfected that humanity’s psychological depth has evaporated.

    A fair interpretation. But even if intentional, the narrative effect remains the same:

    The novels soar when the Minds are present and sag when the humans take the stage.

    Structure trumps intent.


    5. Utopia by Deletion

    The Culture avoids drama not through wisdom but through removal. It deletes the forces that shape real human societies:

    • scarcity
    • ideology
    • religion
    • taboo
    • shame
    • generational trauma
    • political faction
    • meaningful death

    In eliminating these, Banks creates a civilisation of ease but also one in which human interiority has almost nothing to push against.

    He compensates by importing external conflict (Special Circumstances, wars, interventions). This only exposes the contradiction:

    The Culture claims moral purity while outsourcing violence to deniable AIs.

    It is utopia by subtraction, held together by the benevolence of gods.


    Final Thoughts

    Banks was a visionary system-builder with a political conscience. He wanted:

    • perfect ethics,
    • perfect abundance,
    • perfect freedom,
    • perfect intelligence.

    But perfect systems erase the very conditions under which human stories acquire meaning.

    The Minds embody Banks’s brilliance. The humans embody his ideology. The gap between them is the hollowness many readers feel.

    The Culture is a post-human AI theocracy wrapped in humanist rhetoric. It is a utopia whose perfection makes its human layer narratively weightless.

    This is the contradiction at the heart of Banks’s work:

    • His worlds are breathtaking.
    • His systems are immaculate.
    • His ideas are audacious.
    • But the humanity inside them is often surface detail.

    Banks wrote universes worth remembering, even if the people who inhabit them seem to dissolve as soon as you close the book.

    https://thinkinginstructure.substack.com/p/iain-m-banks-the-structural-genius