Or Why Effective Theories are so … Effective
Solids, liquids, gases. What do these words really mean? How about cell, organ, or human? At a fundamental level, these things are complex assemblages of interacting subatomic particles. But you probably have an easy time recognizing a human without knowing about their electron configurations; you might instead identify key characteristics like physical appearance or behavior. These various abstractions help us understand the macro-world, but a seemingly naive philosophical question is “why can we do that?”
Take gas as an example. For most purposes, we can describe a gas by its density, temperature, and pressure, regardless of its microscopic makeup. Helium and nitrogen gas are made of different atoms, but zoomed out, that difference doesn’t matter. This feature is ubiquitous in science: at large enough or coarse enough scales, the underlying microscopic details become irrelevant, and we can describe the collective behavior quite accurately using just a few emergent parameters. In a practical sense, our abstractions are the identification and naming of various collective phenomena.
Science works so incredibly well, in part for this very reason. However, it is far from obvious why theories of higher-level behavior can work so well in the midst of ridiculous small-scale or low-level uncertainties. With a nod to Eugene Wigner, why are these effective theories so … effective? To get at the heart of the matter, let’s consider a simple game of flipping coins.
Flipping coins and flowing theories
Flip a coin. Heads or tails? Flip it again. And again. How often did the coin land heads up? You should of course expect and find heads about half the time.
Although this experiment is perhaps laughably simple, there is surprisingly much more to explore. Notice that I said we expect heads about half the time—how variable is that result? This is a high-level question: a single coin flip can only be heads or tails, but variance in results emerges once a collection of coin flips is considered. Motivated by this, let’s instead flip the coin ten times and count the number of heads. Then do it again. And again. We of course expect the average number of heads to be about five, but sometimes it’s four or maybe eight. In a sense, we have coarse-grained our system, aggregating and averaging our flips, similar to digitizing a photograph into pixels. I “performed” (via computer magic) a series of these experiments, and got the following results.
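The “computer magic” is easy to reproduce yourself. Here is a minimal sketch in Python of one such experiment (the function name and the seed are my own choices, not anything canonical):

```python
import random
from statistics import mean, stdev

def heads_counts(flips_per_trial, trials, seed=42):
    """Run `trials` experiments, each counting heads in `flips_per_trial` fair coin flips."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(flips_per_trial))
            for _ in range(trials)]

counts = heads_counts(flips_per_trial=10, trials=10_000)
print(mean(counts))   # close to 5, the expected number of heads
print(stdev(counts))  # close to sqrt(10 * 0.5 * 0.5), about 1.58
```

Changing `flips_per_trial` changes how aggressively we coarse-grain; the histogram of `counts` is what the figures below are drawn from.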
The Gaussian at a glance
The Gaussian, often referred to as the “Bell curve” because of its bell-like shape, is a probability distribution describing how likely an outcome is to fall at a given distance from the average. The distribution is centered on the average value (labeled “μ” above), such that about 68% of the outcomes should fall within one standard deviation “σ” above or below the average. About 95% of the outcomes should fall within two standard deviations of the average. The Gaussian is perhaps the most commonly encountered distribution in statistics, as it accurately describes the aggregate outcome of a large number of random, independent events. The justification for its prominence is provided by the Central Limit Theorem (see text).
We find something familiar to many students: the more flips per trial, the closer our results look to the famous Bell curve, more commonly referred to as the Gaussian or Normal distribution in scientific work (see inset for a refresher). The more we coarse-grain our system (larger aggregates of flips per trial), the more the distribution of frequencies behaves like a Gaussian. This idea is so common and powerful in statistics that it gets its own name: the Central Limit Theorem.
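The 68% and 95% figures from the inset can be checked numerically. The sketch below (my own illustrative numbers: 400 flips per trial, 5,000 trials) counts how many trials land within one and two standard deviations of the mean; because the counts are discrete, the fractions come out slightly above the continuous-Gaussian values:

```python
import random
from statistics import mean, stdev

rng = random.Random(0)
# Heads in 400 flips, repeated 5,000 times: by the CLT, nearly Gaussian.
counts = [sum(rng.random() < 0.5 for _ in range(400)) for _ in range(5000)]

mu, sigma = mean(counts), stdev(counts)
within_1 = sum(abs(c - mu) <= sigma for c in counts) / len(counts)
within_2 = sum(abs(c - mu) <= 2 * sigma for c in counts) / len(counts)
print(within_1, within_2)  # roughly 0.68 and 0.95, as the Gaussian predicts
```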
The Central Limit Theorem is a powerful tool, and explains why we can be so successful at dealing with statistical uncertainties. But this is also a consequence of a much deeper idea that goes under the arcane name Renormalization Group.
The renormalization group (or RG) was first developed in the context of particle physics, but matured with the pioneering work of Kadanoff and Wilson in the late 60s and 70s (for which Wilson won the Nobel Prize). RG is in some sense a “theory of theories,” explaining how different microscopic descriptions, when zoomed out or coarse-grained, tend to “flow” to a common theory, or Universality Class. In our coin-flipping example, we saw our theory (distribution of frequencies) flow to the Gaussian.
When coarse-graining a system, many of the microscopic parameters combine to give some effective emergent parameters, like the mean and variance from our coin flips, or temperature and pressure of gases. Upon further rescaling or coarse-graining, some of these emergent parameters may survive, but may also change value or renormalize (the “R” in “RG”). When the theories no longer change under rescaling, they fall into some universality class and can all be accurately described by the same set of parameters. These would be our gases, or magnets, or perhaps even our humans.
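This renormalization can be watched directly in the coin-flip system. In the sketch below (function name and block size are my own choices), we repeatedly replace neighboring pairs of flips with their average—one “zoom out” step—and track the emergent parameters:

```python
import random
from statistics import mean, variance

def coarse_grain(values, block=2):
    """Replace each block of neighbors with its average: one step of 'zooming out'."""
    return [mean(values[i:i + block]) for i in range(0, len(values) - block + 1, block)]

rng = random.Random(1)
signal = [rng.choice([0.0, 1.0]) for _ in range(2 ** 14)]  # microscopic coin flips

level = signal
for step in range(4):
    level = coarse_grain(level)
    print(step, round(mean(level), 3), round(variance(level), 4))
# The mean survives rescaling (it stays near 0.5), while the variance
# renormalizes: it shrinks by a factor of ~2 at every coarse-graining step.
```

The mean is a parameter that survives unchanged; the variance survives but changes value under each rescaling—precisely the “R” in RG.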
Stiff or sloppy?
RG helps explain why we can describe systems accurately at a given scale using only a small number of relevant parameters. But there is another way to look at it. In our coin-flipping game, we looked for the average number of heads after a set of flips, as well as the variance of that result. But these “observables” lack information about individual flips—information has been lost. In this sense, coarse-graining is akin to information compression, and RG tells us what information is relevant enough to still identify or distinguish the system.
This view was recently explored by Benjamin Machta and coworkers at Cornell, who introduced a means of quantifying how sensitive a theory is to its parameters. The initially enormous sets of parameters describing physical or biological systems tend to collapse naturally under coarse-graining into sets of relevant “stiff” and irrelevant “sloppy” emergent parameters. The stiff parameters are key to determining overall system behavior, but even relatively large changes in the sloppy parameters leave the behavior essentially unchanged. From an experimental viewpoint, an inordinate amount of data is required to even begin seeing an effect from the sloppy parameters. In other words, we can perform successful science while leaving the sloppy parameters wildly uncertain.
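To make “stiff” and “sloppy” concrete, here is a toy example—not the Cornell group’s actual analysis, just an illustration in the same spirit. I take a hypothetical two-parameter model, a sum of two exponential decays with nearly equal rates (1.0 and 1.2, chosen by me), and compute how sensitive its predictions are to each direction in parameter space:

```python
import math

def model(params, ts):
    a, b = params
    return [math.exp(-a * t) + math.exp(-b * t) for t in ts]

ts = [0.5 * k for k in range(1, 21)]
p0 = (1.0, 1.2)  # hypothetical decay rates, deliberately nearly degenerate

# Sensitivity (Jacobian) of the model outputs to each parameter, by finite differences.
eps = 1e-6
base = model(p0, ts)
J = []
for i in range(2):
    p = list(p0)
    p[i] += eps
    J.append([(y1 - y0) / eps for y1, y0 in zip(model(p, ts), base)])

# Fisher-information-like matrix J^T J and its eigenvalues (2x2, done by hand).
m = [[sum(J[i][k] * J[j][k] for k in range(len(ts))) for j in range(2)] for i in range(2)]
tr, det = m[0][0] + m[1][1], m[0][0] * m[1][1] - m[0][1] * m[1][0]
disc = math.sqrt(tr * tr - 4 * det)
stiff, sloppy = (tr + disc) / 2, (tr - disc) / 2
print(stiff / sloppy)  # a large ratio: the stiff direction dwarfs the sloppy one
```

The stiff direction (roughly, increasing both rates together) strongly changes the predictions; the sloppy direction (trading one rate off against the other) barely changes them at all—so data pins down the first while leaving the second wildly uncertain.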
So do we understand?
Introducing an abstract “theory of theories” hopefully illuminates some of the reasons behind the success of science. Understanding the behavior of a gas, let alone the rest of the universe, would be intractable—if not impossible—if we needed to know every exact detail. It seems the success of science is due in part to a hidden hierarchy of scales, which can be unearthed and explored by RG and some information theory. But must such a hierarchy appear? Are we that fortunate, or are our scientific methods just nicely tuned to find the cases where it is so? Although we now understand more about what enables science to be so successful, we have only slightly breached the barriers of philosophy. To fully understand why will be a continuing, fruitful quest.
Benjamin B. Machta, Ricky Chachra, Mark K. Transtrum, and James P. Sethna. “Parameter Space Compression Underlies Emergent Theories and Predictive Models.” Science 342, no. 6158 (1 November 2013): 604–607. [DOI:10.1126/science.1238723]