Why Neural Networks Deserved a "Physics" Prize

A physics-first reading of the 2024 Nobel Prize in Physics, awarded for foundational work on artificial neural networks.

Published
2026-04-05
Updated
2026-04-21
Author
Editorial Team
Tags
physics, ai, neural-networks

A Physics Question, a Computer-Science Answer

The 2024 Nobel Prize in Physics went to John Hopfield and Geoffrey Hinton for foundational work on artificial neural networks, a body of ideas most non-specialists file under computer science. The honest first reaction, one the reader may already share, is to ask whether this is really a prize in physics. The position taken here is that the core ideas honored by the award descend directly from statistical mechanics, and that the award, on inspection, marks a rare case of physical thinking taking permanent root in a neighboring discipline: not a borderline case, but an illustration of how far physics can travel.

What follows traces that lineage in the vocabulary of physics, starting from the Ising model of the 1920s and walking forward through the Hopfield network, the Boltzmann machine, and the revival of deep learning.

The Ising Model at the Source

The story starts in 1920s Germany. In 1920 Wilhelm Lenz proposed a lattice of spins, each pointing up or down, and later gave the problem to his student Ernst Ising, who solved the one-dimensional chain in his 1924 doctoral thesis at the University of Hamburg. It is, in effect, the smallest possible model of ferromagnetism: two states per site, nearest-neighbor couplings, nothing else.

One-dimensional Ising chains have no phase transition, a conclusion that was something of a disappointment at the time. Twenty years later, in 1944, Lars Onsager published the exact solution of the two-dimensional case, showing a genuine finite-temperature phase transition. The model became the lingua franca of statistical mechanics, the cleanest possible object through which to teach and think about collective behavior. More importantly for this article, it established the habit of thought that would carry forward: write down a global energy function over many variables, and look for the configurations that minimize it.
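That habit of thought fits in a few lines of code. The sketch below is a minimal illustration, assuming an open chain with a uniform ferromagnetic coupling $J = 1$ and no external field (choices made for this example, not taken from Ising's thesis): it writes down the energy $E = -J\sum_i s_i s_{i+1}$ and finds its minima by brute force.

```python
import itertools

import numpy as np


def ising_energy(spins: np.ndarray, J: float = 1.0) -> float:
    """Energy of an open 1D Ising chain: E = -J * sum_i s_i * s_{i+1}."""
    return -J * float(np.sum(spins[:-1] * spins[1:]))


def ground_states(n: int, J: float = 1.0):
    """Enumerate all 2**n configurations and keep those of lowest energy."""
    configs = [np.array(c) for c in itertools.product([-1, 1], repeat=n)]
    energies = [ising_energy(c, J) for c in configs]
    e_min = min(energies)
    return e_min, [c for c, e in zip(configs, energies) if e == e_min]


e_min, minima = ground_states(6)
print(e_min)                          # -5.0: all five bonds satisfied
print([m.tolist() for m in minima])   # the all-down and all-up chains
```

Brute force visits all $2^n$ configurations, so it only works for tiny systems; the two fully aligned chains it returns are the ferromagnetic ground states.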

The Hopfield Network (1982)

In 1982, John Hopfield, then at the California Institute of Technology and Bell Laboratories, published "Neural networks and physical systems with emergent collective computational abilities" in Proceedings of the National Academy of Sciences 79(8), 2554–2558. This is the paper in which neural networks get rewritten in the language of statistical physics.

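Each unit $s_i$ takes the value $+1$ or $-1$, as in the Ising model. (Hopfield's paper used 0/1 units; the $\pm 1$ form used here is the standard equivalent.) Symmetric weights $w_{ij}$ couple pairs of units, with zero self-connections. The global energy of the network is $E = -\tfrac{1}{2}\sum_{i,j} w_{ij} s_i s_j$, computed as: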

```python
import numpy as np

# Hopfield energy: E = -1/2 * s^T W s
def energy(state: np.ndarray, weights: np.ndarray) -> float:
    # state: +/-1 vector; weights: symmetric matrix with zero diagonal (no self-connections)
    return float(-0.5 * state @ weights @ state)
```

Under asynchronous updates, in which one unit at a time is set to the sign of its weighted input, this energy never increases, so the network settles into one of the local minima its weights define. Store the patterns to be remembered as those minima, start from a noisy version of one of them, and the network descends the energy landscape to a nearby minimum, ideally the stored memory closest to the starting state. That is the core of Hopfield's proposal: associative memory as energy descent.
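The descent claim can be checked numerically. The sketch below assumes random symmetric couplings with a zero diagonal rather than stored memories, and breaks ties by leaving a unit unchanged when its input is exactly zero; both are illustrative choices. Each update sets one unit to the sign of its local field $h_i = \sum_j w_{ij} s_j$ and records the energy.

```python
import numpy as np


def energy(state, weights):
    """Hopfield energy E = -1/2 * s^T W s."""
    return float(-0.5 * state @ weights @ state)


def async_run(state, weights, sweeps=5, rng=None):
    """Asynchronous dynamics: visit units in random order, set s_i <- sign(h_i),
    and record the energy after every single-unit update."""
    rng = np.random.default_rng() if rng is None else rng
    s = state.copy()
    trace = [energy(s, weights)]
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            h = weights[i] @ s           # local field on unit i
            if h != 0:                   # tie: leave the unit unchanged
                s[i] = 1 if h > 0 else -1
            trace.append(energy(s, weights))
    return s, trace


rng = np.random.default_rng(0)
n = 50
a = rng.normal(size=(n, n))
w = (a + a.T) / 2                        # symmetric couplings
np.fill_diagonal(w, 0.0)                 # no self-connections
s0 = rng.choice([-1, 1], size=n)
s_final, trace = async_run(s0, w, rng=rng)
print(trace[0], trace[-1])
```

Flipping unit $i$ to agree with $h_i$ changes the energy by $-2|h_i|$, which is never positive; the symmetric weights and zero diagonal are what make that bookkeeping hold.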

The storage rule is Hebbian: sum the outer products of the $p$ stored patterns $\xi^\mu$ and divide by the number of units $N$, giving $w_{ij} = \frac{1}{N}\sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu$ for $i \neq j$ and $w_{ii} = 0$. How many patterns fit? In 1985, Daniel Amit, Hanoch Gutfreund and Haim Sompolinsky applied spin-glass methods in Physical Review Letters and showed that, for large $N$, a critical load $\alpha = p/N \approx 0.138$ separates reliable recall from catastrophic failure; a network of $N = 1000$ units can therefore hold roughly 138 random patterns. That number has been the textbook storage capacity ever since.
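A minimal sketch of storage and recall, assuming random $\pm 1$ patterns, $N = 200$ units and $p = 5$ patterns (a load of $\alpha = 0.025$, well below the critical ratio), with 10% of one pattern's bits flipped before recall. All sizes and the seed are illustrative.

```python
import numpy as np


def hebbian_weights(patterns: np.ndarray) -> np.ndarray:
    """w_ij = (1/N) * sum_mu xi_i^mu xi_j^mu with a zero diagonal.
    patterns has shape (p, N) and +/-1 entries."""
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w


def recall(state, weights, sweeps=10, rng=None):
    """Asynchronous sign updates until a full sweep changes nothing."""
    rng = np.random.default_rng() if rng is None else rng
    s = state.copy()
    for _ in range(sweeps):
        changed = False
        for i in rng.permutation(len(s)):
            h = weights[i] @ s
            new = s[i] if h == 0 else (1 if h > 0 else -1)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:                  # reached a fixed point
            break
    return s


rng = np.random.default_rng(1)
N, p = 200, 5
patterns = rng.choice([-1, 1], size=(p, N))
W = hebbian_weights(patterns)

noisy = patterns[0].copy()
flip = rng.choice(N, size=20, replace=False)   # corrupt 10% of the bits
noisy[flip] *= -1

recovered = recall(noisy, W, rng=rng)
overlap = recovered @ patterns[0] / N          # 1.0 means perfect recall
print(overlap)
```

Raising `p` toward $0.138N$ and beyond is a quick way to watch recall degrade, though a finite network blurs the sharp transition that the large-$N$ theory predicts.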

The Boltzmann Machine (1985)

Deterministic descent into stored minima cannot capture the statistical structure of rich data. In 1985 David Ackley, Geoffrey Hinton and Terrence Sejnowski published "A Learning Algorithm for Boltzmann Machines" in Cognitive Science 9, 147–169, introducing three changes.

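First, they added hidden units alongside visible ones, so that the network could represent latent structure not directly present in the inputs. Second, they made the units stochastic: each unit switches on with a probability given by a logistic function of its energy gap divided by a temperature $T$, so that at thermal equilibrium the network's global states follow the Boltzmann distribution $P(s) \propto \exp(-E(s)/T)$. In form, the machine is now a literal statistical-mechanical system. Third, they derived a learning rule from the difference between pairwise statistics collected in a "clamped" phase, with the visible units fixed to data, and a "free" phase, with the network running on its own: $\Delta w_{ij} \propto \langle s_i s_j \rangle_{\text{clamped}} - \langle s_i s_j \rangle_{\text{free}}$. Following this rule performs gradient descent on the KL divergence $\mathrm{KL}(P_{\text{data}} \,\|\, P_{\text{model}})$ between the data distribution and the model's distribution over visible states.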
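A minimal sketch of the stochastic units, assuming binary 0/1 units as in the 1985 paper, an energy with bias terms, $T = 1$, and arbitrary illustrative weights and biases. Each unit switches on with a logistic probability of its energy gap; run long enough, the visited states should match the Boltzmann distribution computed by brute-force enumeration.

```python
import itertools

import numpy as np


def bm_energy(s, W, b):
    """E(s) = -1/2 s^T W s - b^T s for 0/1 units (W symmetric, zero diagonal)."""
    return float(-0.5 * s @ W @ s - b @ s)


def gibbs_sample(W, b, T=1.0, n_sweeps=20000, rng=None):
    """Unit i turns on with probability sigmoid(gap_i / T), where
    gap_i = E(s_i=0) - E(s_i=1) = sum_j W_ij s_j + b_i.
    Returns the empirical frequency of each visited global state."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(b)
    s = rng.integers(0, 2, size=n).astype(float)
    counts = {}
    for _ in range(n_sweeps):
        for i in range(n):
            gap = W[i] @ s + b[i]
            s[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-gap / T)))
        key = tuple(int(x) for x in s)
        counts[key] = counts.get(key, 0) + 1
    return {k: v / n_sweeps for k, v in counts.items()}


def exact_boltzmann(W, b, T=1.0):
    """P(s) = exp(-E(s)/T) / Z by enumerating all 2**n states."""
    states = [np.array(c, dtype=float) for c in itertools.product([0, 1], repeat=len(b))]
    weights = np.array([np.exp(-bm_energy(s, W, b) / T) for s in states])
    probs = weights / weights.sum()
    return {tuple(int(x) for x in s): p for s, p in zip(states, probs)}


W = np.array([[0.0, 1.5, -1.0],
              [1.5, 0.0, 0.5],
              [-1.0, 0.5, 0.0]])
b = np.array([-0.5, 0.2, 0.1])
empirical = gibbs_sample(W, b, rng=np.random.default_rng(0))
exact = exact_boltzmann(W, b)
tv = 0.5 * sum(abs(empirical.get(k, 0.0) - p) for k, p in exact.items())
print(f"total variation distance: {tv:.3f}")
```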

The formulation was elegant. Training was not. Both phases require samples from an equilibrium distribution, and the time the network needs to reach equilibrium grows sharply with its size. Through the 1990s, this sampling bottleneck helped stall progress on deeper networks.
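To see where the cost comes from, here is a minimal sketch of the clamped-minus-free rule in a deliberately tiny case. It assumes a fully visible machine (no hidden units, so the clamped phase reduces to averaging over the data), 0/1 units, $T = 1$, and exact free-phase statistics obtained by enumerating every state instead of sampling. The four-unit toy data set is made up for illustration.

```python
import itertools

import numpy as np


def all_states(n):
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)


def model_stats(W, b, states):
    """Free-phase statistics by brute force: a sum over all 2**n states."""
    E = -0.5 * np.einsum("ki,ij,kj->k", states, W, states) - states @ b
    p = np.exp(-E)
    p /= p.sum()
    return states.T @ (p[:, None] * states), p @ states, p


def avg_log_likelihood(data, W, b, states):
    _, _, p = model_stats(W, b, states)
    index = {tuple(s): k for k, s in enumerate(states.astype(int))}
    return float(np.mean([np.log(p[index[tuple(x)]]) for x in data.astype(int)]))


def train(data, lr=0.1, steps=500):
    n = data.shape[1]
    states = all_states(n)                    # 2**n rows: the cost wall
    W, b = np.zeros((n, n)), np.zeros(n)
    clamped_ss = data.T @ data / len(data)    # <s_i s_j> with data clamped
    clamped_s = data.mean(axis=0)             # <s_i> with data clamped
    for _ in range(steps):
        free_ss, free_s, _ = model_stats(W, b, states)
        dW = clamped_ss - free_ss             # clamped minus free correlations
        np.fill_diagonal(dW, 0.0)             # keep self-connections at zero
        W += lr * dW
        b += lr * (clamped_s - free_s)
    return W, b, states


data = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
n = data.shape[1]
before = avg_log_likelihood(data, np.zeros((n, n)), np.zeros(n), all_states(n))
W, b, states = train(data)
after = avg_log_likelihood(data, W, b, states)
print(before, after)
```

The `states` array has $2^n$ rows: 16 here, but about $10^{30}$ for a machine of 100 units, which is why practical Boltzmann machines had to estimate both phases by slow equilibrium sampling instead.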

From RBMs to Deep Belief Nets

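The breakthrough arrived in 2006, and it rested on the Restricted Boltzmann Machine (RBM): a bipartite graph of visible and hidden layers with no connections within a layer. The architecture itself was older (Paul Smolensky described it in 1986 under the name "harmonium"), and Hinton's contrastive-divergence method of 2002 had made it cheap to train. With no within-layer connections, the hidden units are conditionally independent given the visible layer, and vice versa, so an entire layer can be sampled in one parallel Gibbs step, and training cost drops dramatically.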
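A minimal sketch of why the restriction helps, written as a training step for a binary RBM. It uses one-step contrastive divergence (CD-1), the method from Hinton's 2002 work that is the usual way to train RBMs; it stands in here as an illustration, and the toy data, layer sizes and learning rate are assumptions of this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_update(v0, W, b_vis, b_hid, lr, rng):
    """One CD-1 step on a batch v0, updating W, b_vis, b_hid in place.

    With no within-layer connections, each whole layer is sampled in one
    vectorized step given the other layer (block Gibbs sampling)."""
    p_h0 = sigmoid(v0 @ W + b_hid)                        # P(h=1 | v0)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # sample hidden layer
    p_v1 = sigmoid(h0 @ W.T + b_vis)                      # P(v=1 | h0)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)   # one-step reconstruction
    p_h1 = sigmoid(v1 @ W + b_hid)
    n = len(v0)
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n             # data minus reconstruction
    b_vis += lr * (v0 - v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return float(np.mean((v0 - p_v1) ** 2))               # reconstruction error


rng = np.random.default_rng(0)
prototypes = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = prototypes[rng.integers(0, 2, size=200)]           # toy data: two repeated patterns
W = 0.01 * rng.normal(size=(6, 3))
b_vis, b_hid = np.zeros(6), np.zeros(3)
errors = [cd1_update(data, W, b_vis, b_hid, lr=0.1, rng=rng) for _ in range(1000)]
print(errors[0], errors[-1])
```

CD-1 replaces the long free-phase equilibrium run with a single reconstruction step, a biased but cheap estimate of the gradient.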

In 2006, Hinton, Simon Osindero and Yee-Whye Teh published "A Fast Learning Algorithm for Deep Belief Nets" in Neural Computation 18(7), 1527–1554. Their method stacks RBMs, pre-trains them greedily one layer at a time, and then fine-tunes the whole network as a single system. In practice, that combination broke through the wall of the 1990s, the widespread sense that deep networks could not be trained. A companion paper in Science that July, "Reducing the Dimensionality of Data with Neural Networks" with Ruslan Salakhutdinov, used the same layer-wise pre-training to initialize deep autoencoders for dimensionality reduction. Pre-training rooted in physics-style thinking gave deep learning its second spring.
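The greedy, layer-by-layer recipe can be sketched in a few lines. This is a simplified illustration, not the procedure of the 2006 paper: each RBM is trained with full-batch CD-1 and a mean-field reconstruction, hidden-unit probabilities are passed up as the next layer's data, and the fine-tuning stage is omitted. The random binary data and layer sizes are placeholders.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, epochs=200, lr=0.1, rng=None):
    """Train one binary RBM with full-batch CD-1; returns (W, b_vis, b_hid)."""
    rng = np.random.default_rng() if rng is None else rng
    n_visible = data.shape[1]
    W = 0.01 * rng.normal(size=(n_visible, n_hidden))
    b_vis, b_hid = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        p_h0 = sigmoid(data @ W + b_hid)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ W.T + b_vis)          # mean-field reconstruction
        p_h1 = sigmoid(p_v1 @ W + b_hid)
        W += lr * (data.T @ p_h0 - p_v1.T @ p_h1) / len(data)
        b_vis += lr * (data - p_v1).mean(axis=0)
        b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid


def greedy_pretrain(data, layer_sizes, rng=None):
    """Stack RBMs: each one is trained on the hidden activations of the layer below."""
    rng = np.random.default_rng() if rng is None else rng
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, _, b_hid = train_rbm(x, n_hidden, rng=rng)
        layers.append((W, b_hid))
        x = sigmoid(x @ W + b_hid)                # becomes the "data" for the next RBM
    return layers, x


rng = np.random.default_rng(0)
data = (rng.random((100, 16)) < 0.3).astype(float)   # placeholder binary data
layers, top = greedy_pretrain(data, [12, 8, 4], rng=rng)
print([W.shape for W, _ in layers], top.shape)
```

In the full recipe, the stacked weights are only a starting point: the whole network is then fine-tuned as a single system.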

From Hebbian Rules to Gradient Descent

A pause to line up the learning rules. In 1949, Donald Hebb's The Organization of Behavior proposed that a connection strengthens when the neurons on either side of it are active together, a postulate later summarized as "neurons that fire together wire together." That principle becomes, quite literally, the weight construction for a Hopfield network: a sum of outer products of the stored patterns. Boltzmann machines train by climbing the log-likelihood of the data (equivalently, descending the KL divergence), and the quantity being climbed is a difference between two free energies, one with the data clamped and one with the network running free. Modern deep networks descend a loss landscape by stochastic gradient descent.

From Hebb in 1949 to the transformers of the 2020s, the backbone of the story is the same: a dynamics that walks downhill on an engineered, energy-like surface. The first strong, deliberate import of that idea into neural networks came from Hopfield and Hinton.

The Echoes Still Ringing

The inheritance of statistical mechanics is not confined to the work the 2024 prize honored.

In 2015, Jascha Sohl-Dickstein and colleagues published "Deep Unsupervised Learning using Nonequilibrium Thermodynamics," recasting generative modeling as learning to reverse a gradual diffusion process that turns data into noise. Refined by later work, that proposal became the foundation of the diffusion models behind Stable Diffusion and Imagen, both released in 2022. Seen through the right lens, a modern image generator is a statistical-mechanical device that numerically solves a reverse-time stochastic differential equation.
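The forward, data-to-noise half of a diffusion model has a closed form. The sketch below uses the parameterization and linear noise schedule popularized by denoising diffusion probabilistic models (Ho, Jain and Abbeel, 2020) rather than the exact formulation of the 2015 paper, and it omits the learned reverse process that actually generates images. The schedule endpoints and the toy one-dimensional data are illustrative assumptions.

```python
import numpy as np


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t; a linear schedule is one common choice."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in one step."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)       # abar_t = prod_{s <= t} (1 - beta_s)

rng = np.random.default_rng(0)
x0 = np.full(10_000, 3.0)                 # toy "data set" concentrated at the value 3
for t in [0, 250, 999]:
    xt = q_sample(x0, t, alpha_bar, rng)
    print(t, round(float(xt.mean()), 2), round(float(xt.std()), 2))
```

With these endpoints $\bar\alpha_T = \prod_t (1-\beta_t)$ is roughly $e^{-10}$, so by the last step almost nothing of the data survives and $x_T$ is close to a standard normal sample; generation runs the process in reverse.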

Energy-based models more broadly have returned to the research conversation. The relationship between renormalization-group flow and depth in neural networks is an active area where mathematical physics meets machine learning. The border between the two fields keeps thinning.

Not every modern AI advance descends from physics, and it would be unfair to pretend otherwise. Transformers come from sequence modeling and attention, and their design owes little to energy landscapes. But within the specific lineage that the 2024 Physics Prize recognized — spins to associative memory, Boltzmann distributions to deep learning, and on to diffusion models — the case for calling this a physics story holds. It is one of the cleanest examples we have of a physical idea becoming the beating heart of another discipline.
