25/04/2026
Learning Mechanics and the Second Formation of Deep Learning Theory
The recent paper “There Will Be a Scientific Theory of Deep Learning” should not be read as a conventional critique of deep learning, nor as a narrow technical paper offering one new theorem, model, or experiment. Its importance lies elsewhere. It is best understood as a field-formation paper: a paper that attempts to organize scattered lines of deep learning theory into a coherent foundational research programme. In this sense, it has the character of a literature review, but it is more ambitious than an ordinary review. It does not simply summarize existing work. It proposes that several strands of theory are converging toward what the authors call learning mechanics, a future scientific theory of the learning process itself.
The phrase “trial and error” in the paper should therefore be understood carefully. The authors are not saying that deep learning researchers merely guess randomly. Deep learning already has sophisticated engineering methods, including neural architecture search, scaling laws, optimizer heuristics, hyperparameter transfer, and accumulated architectural wisdom. Neural Architecture Search (NAS), for example, can systematically explore model-design spaces before a final architecture is chosen. But from the viewpoint of this paper, NAS still belongs largely to the empirical-engineering regime. It automates exploration; it does not yet provide a first-principles explanation of why a particular architecture should work, how its representations will form, what scaling behavior it will follow, or what failure modes will emerge. NAS searches the design space. Learning mechanics would explain the design space.
This is why the paper reads like a literature review, especially around Table 1. The authors group existing research into five major bodies of evidence: analytically solvable settings, simplifying limits, simple empirical laws, theories of hyperparameters, and universal phenomena across models and tasks. These categories are more than a convenient taxonomy. They are presented as the beginnings of a systematic science. The authors compare them to the role of solvable models, limiting regimes, empirical laws, scaling analysis, and universal behavior in physics. In doing so, they are trying to give deep learning theory a disciplinary map.
This makes the paper valuable as foundational knowledge. A useful analogy is economics. Economics did not become a systematic field all at once. It passed through political economy, marginalism, neoclassical theory, econometrics, macroeconomic formalization, game theory, information economics, behavioral economics, and computational approaches. Each stage contributed different primitives, methods, and explanatory ambitions. Political economy supplied broad conceptual framing. Neoclassical theory formalized agents, constraints, equilibria, and optimization. Econometrics introduced measurement, estimation, and predictive testing. Later traditions revised or extended the framework, but the field became powerful because it developed a layered foundation of concepts, mathematical tools, and empirical methods.
The paper is attempting something similar for deep learning. It asks whether the field can move from accumulated engineering success toward a more mature scientific theory. The proposed objects of this theory are architecture, data, objective function, learning rule, initialization, optimizer, hyperparameters, scale, representations, weights, and performance. The desired theory would not predict every microscopic detail of every neural network. Rather, like physics or economics, it would operate at the right level of abstraction. It would explain and predict coarse but meaningful quantities: loss curves, scaling laws, training regimes, feature formation, sharpness, hyperparameter behavior, and representation geometry.
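To make "coarse but meaningful quantities" concrete, here is a minimal sketch of what predicting one such quantity can look like in practice: fitting a power-law scaling curve to a handful of (model size, loss) points and extrapolating one step further. Everything in it is an assumption made for illustration and none of it comes from the paper: the functional form loss ≈ a · size^(-b), the constants 5.0 and 0.3, the synthetic noisy data, and the helper name fit_power_law.

```python
import math
import random


def fit_power_law(sizes, losses):
    """Fit loss ~ a * size**(-b) by ordinary least squares in log-log space.

    Returns (a, b). Hypothetical helper written for this illustration.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # log(loss) = log(a) - b * log(size), so a = exp(intercept) and b = -slope
    return math.exp(intercept), -slope


# Synthetic data from an assumed law with small multiplicative noise.
random.seed(0)
TRUE_A, TRUE_B = 5.0, 0.3
sizes = [1e3, 1e4, 1e5, 1e6, 1e7]
losses = [
    TRUE_A * s ** (-TRUE_B) * math.exp(random.gauss(0.0, 0.01))
    for s in sizes
]

a_hat, b_hat = fit_power_law(sizes, losses)

# Extrapolate one order of magnitude beyond the largest observed size.
predicted_loss = a_hat * 1e8 ** (-b_hat)
print(f"fitted a={a_hat:.3f}, b={b_hat:.3f}, predicted loss at 1e8: {predicted_loss:.4f}")
```

A fit of this kind only describes the curve. The ambition sketched above is a theory that would say in advance what the exponent should be and when the power law should break down.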
This is also why Imre Lakatos provides a strong philosophical frame for understanding the paper. In Lakatosian terms, the paper is not presenting a completed theory. It is formulating a research programme. The proposed hard core is that deep learning can be scientifically explained through the mathematical dynamics of learning. Neural networks are not permanently inscrutable black boxes; they are complex but measurable systems whose training processes may obey discoverable laws. Around that hard core, the paper identifies a protective belt: solvable toy models, infinite-width and infinite-depth limits, neural scaling laws, edge-of-stability phenomena, hyperparameter-scaling theory, µP, neural collapse, representation universality, and related empirical regularities. Section 5 then functions as the positive heuristic: it tells future researchers what problems to pursue if they want to make the programme progressive.
But this raises an obvious historical question: did deep learning not already have such a research programme long ago, with Hinton, Bengio, and LeCun? The answer is yes, but with an important distinction. The older deep learning programme was primarily constructive. Its hard core was that intelligence-like behavior could emerge from distributed representations learned by multilayer neural networks trained through gradient-based methods. This programme gave us backpropagation, representation learning, convolutional networks, deep architectures, pretraining, feature learning, and eventually the practical foundations of modern AI.
The new paper does not deny that earlier programme. Instead, it suggests that the field is now entering a second phase. The first phase showed that deep learning works. The second phase asks whether we can explain and predict why it works, and control how it works. Hinton, Bengio, and LeCun helped establish the constructive research programme of deep learning. This paper tries to formulate the theoretical protective belt needed to turn that programme into a mature science.
Last week’s AI research monitoring also suggested that this “learning mechanics” shift is not merely philosophical; it is already appearing at the frontier of LLM research. The strongest papers selected by the monitor were not simply about making models produce better answers, but about making their reasoning processes observable, steerable, testable, and auditable. Work on shared logical subspaces, contrastive prompt optimization, CFG interpretation, process reward models, grounded pausing, neural garbage collection, calibrated multi-attempt reasoning, causal attention alignment, and sabotage auditing all points in the same direction: LLMs are being treated less like mysterious text oracles and more like instrumentable systems. The old stack was largely prompt, answer, score, and tune; the emerging stack is internal representation, reasoning trace, process reward, formal testbed, memory policy, causal pathway, and controlled behavior. In Lakatosian terms, this looks like a new protective belt around the LLM reasoning programme, one that moves beyond prompt engineering and benchmark chasing toward telemetry, process supervision, activation steering, formal diagnostics, and alignment-by-control. This is why the monitoring results felt coherent. They captured the operational birth of what we might call LLM reasoning mechanics: a practical counterpart to the broader theoretical ambition of learning mechanics, where the goal is not only to make models perform, but to understand, shape, and govern the process by which they perform.