The Unreasonable Effectiveness of Science

Or Why Effective Theories are so … Effective

Solids, liquids, gases.  What do these words really mean?  How about cell, organ, or human?  At a fundamental level, these things are complex assemblages of interacting subatomic particles.  But you probably have an easy time recognizing a human without knowing about their electron configurations; you might instead identify key characteristics like physical appearance or behavior.  These various abstractions help us understand the macro-world, but a seemingly naive philosophical question is “why can we do that?”

Take gas as an example.  For most purposes, we can describe a gas by its density, temperature, and pressure, regardless of its microscopic makeup.  Helium and nitrogen gas are made of different atoms, but once we zoom out, the difference hardly matters.  This feature is ubiquitous in science: at large enough or coarse enough scales, the underlying microscopic details become irrelevant and we can describe the collective behavior quite accurately using just a few emergent parameters.  In a practical sense, our abstractions are the identification and naming of various collective phenomena.

Science works so incredibly well in part for this very reason.  However, it is far from obvious why theories of higher-level behavior can work so well amid enormous small-scale or low-level uncertainties.  With a nod to Eugene Wigner, why are these effective theories so … effective?  To get at the heart of the matter, let’s consider a simple game of flipping coins.

Flipping coins and flowing theories

Flip a coin.  Heads or tails?  Flip it again.  And again.  How often did the coin land heads up?  You should of course expect and find heads about half the time.

Although this experiment is perhaps laughably simple, there is surprisingly more to explore.  Notice that I said we expect heads about half the time: how variable is that result?  This is a high-level question: a single coin flip can only be heads or tails, but variance in results emerges once a collection of coin flips is considered.  Motivated by this, let’s instead flip the coin ten times and count the number of heads.  Then do it again.  And again.  We of course expect the average number of heads to be about five, but sometimes it’s four or maybe eight.  In a sense, we have coarse-grained our system, aggregating and averaging our flips, similar to digitizing a photograph into pixels.  I “performed” (via computer magic) a series of these experiments, and got the following results.

Histograms of coin-flipping experiments show the approach to a Gaussian under coarse-graining.

These plots show histograms of my coin flipping experiments.  Each “experiment” consisted of 40 trials, and for each trial I “flipped” a coin a certain number of times and counted up the number of heads.  The red bars show the deviation over ten experiments.  Notice that the more we coarse-grain the system (more flips per trial), the more the distribution looks Gaussian—just what we would expect from the Central Limit Theorem.
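For readers who want to try the “computer magic” themselves, here is a minimal sketch of one such experiment.  It is not the code behind the plots above; the function names (`run_experiment`, `summarize`), the fixed seed, and the choice to report the fraction of heads rather than the raw count are my own assumptions for illustration.

```python
import random
import statistics


def run_experiment(flips_per_trial, trials=40, rng=None):
    """Return the number of heads seen in each of `trials` trials."""
    rng = rng or random.Random()
    return [
        sum(rng.random() < 0.5 for _ in range(flips_per_trial))
        for _ in range(trials)
    ]


def summarize(flips_per_trial, seed=0):
    """Mean and spread of the *fraction* of heads across one experiment."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = run_experiment(flips_per_trial, rng=rng)
    fractions = [c / flips_per_trial for c in counts]
    return statistics.mean(fractions), statistics.pstdev(fractions)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        mean, spread = summarize(n)
        print(f"{n:5d} flips per trial: mean = {mean:.3f}, spread = {spread:.3f}")
```

The mean fraction should hover near one half at every level of coarse-graining, while the spread should shrink as the number of flips per trial grows.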

The Gaussian at a glance

Gaussian distribution

The Gaussian, often referred to as the “bell curve” because of its bell-like shape, is a distribution of probabilities or expectations that an event will occur with some frequency.  The distribution is centered on the average value (labeled “μ” above) such that about 68% of the outcomes should fall within one standard deviation “σ” above or below the average.  About 95% of the outcomes should fall within two standard deviations about the average.  The Gaussian is perhaps the most commonly encountered distribution in statistics as it accurately describes the outcomes of a large number of random, independent events.  The justification for its prominence is provided by the Central Limit Theorem (see text).
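As a quick sanity check of the 68% and 95% figures, the fraction of a Gaussian lying within k standard deviations of the mean is erf(k/√2).  The helper name `fraction_within` below is just an illustrative choice.

```python
import math


def fraction_within(k):
    """Probability that a Gaussian outcome lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


print(f"within 1 sigma: {fraction_within(1):.4f}")  # about 0.6827
print(f"within 2 sigma: {fraction_within(2):.4f}")  # about 0.9545
```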

We find something familiar to many students: the more flips per trial, the closer our results look to the famous bell curve, more commonly referred to as the Gaussian or Normal distribution in scientific work (see inset for a refresher).  The more we coarse-grain our system (larger aggregation of flips per trial), the more the distribution of frequencies behaves like a Gaussian.  This idea is so common and powerful in statistics that it gets its own name: the Central Limit Theorem.

The Central Limit Theorem is a powerful tool, and explains why we can be so successful at dealing with statistical uncertainties.  But the theorem is itself a consequence of a much deeper idea that goes by the arcane name of the Renormalization Group.

The renormalization group (or RG) was first developed in the context of particle physics, but matured with the pioneering work of Kadanoff and Wilson in the late 60s and 70s (for which Wilson won the Nobel Prize).  RG is in some sense a “theory of theories,” explaining how different microscopic descriptions, when zoomed out or coarse-grained, tend to “flow” to a common theory, or Universality Class.  In our coin-flipping example, we saw our theory (distribution of frequencies) flow to the Gaussian.

When coarse-graining a system, many of the microscopic parameters combine to give some effective emergent parameters, like the mean and variance from our coin flips, or temperature and pressure of gases.  Upon further rescaling or coarse-graining, some of these emergent parameters may survive, but may also change value or renormalize (the “R” in “RG”).  When the theories no longer change under rescaling, they fall into some universality class and can all be accurately described by the same set of parameters.  These would be our gases, or magnets, or perhaps even our humans.
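To make “renormalize” concrete for the coin flips, here is a short worked sketch.  The symbols (X_i for a single flip, p for the probability of heads, b for the coarse-graining factor, z_n for the rescaled count) are my own notation, and the flips are assumed independent.

```latex
% X_i = 1 for heads, 0 for tails, with P(X_i = 1) = p  (p = 1/2 for a fair coin)
S_n = \sum_{i=1}^{n} X_i, \qquad
\langle S_n \rangle = n p, \qquad
\mathrm{Var}(S_n) = n p (1 - p)

% Fraction of heads in one trial of n flips
f_n = \frac{S_n}{n}, \qquad
\langle f_n \rangle = p, \qquad
\sigma_{f_n} = \sqrt{\frac{p (1 - p)}{n}}

% Coarse-graining by a factor b (n -> b n): the mean survives, the width renormalizes
\langle f_{bn} \rangle = \langle f_n \rangle, \qquad
\sigma_{f_{bn}} = \frac{\sigma_{f_n}}{\sqrt{b}}

% The rescaled variable keeps the same mean and variance at every scale
z_n = \frac{S_n - n p}{\sqrt{n p (1 - p)}}, \qquad
\langle z_n \rangle = 0, \qquad
\mathrm{Var}(z_n) = 1 \quad \text{for every } n
```

The mean is an emergent parameter that survives unchanged, the width is one that changes value under coarse-graining, and it is the distribution of the rescaled variable z_n that settles down to a fixed shape.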

The Gaussian is an RG fixed point in the space of probability distributions.

As we saw in the coin-flipping experiment, aggregating or coarse-graining the trials pushed the distribution to look more Gaussian.  In fact, the Central Limit Theorem (CLT) tells us that any distribution of random, independent variables with a finite variance will flow to a Gaussian under coarse-graining.  However, the CLT is really just a special case of RG flow to a fixed point (Universality Class) in a broader, more abstract space of theories.  Other distributions or theories that don’t obey the requirements of the CLT, such as those with correlated trials or unbounded variance, may flow somewhere else. The renormalization group tells us how and where theories flow.
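A rough numerical sketch of “flowing somewhere else”: averaging blocks of uniform random numbers (finite variance) narrows the distribution, while averaging blocks of Cauchy random numbers (unbounded variance) does not, because the Cauchy distribution is its own fixed point under averaging.  The function names, the seed, and the use of the interquartile range as a measure of width (the variance is useless for the Cauchy case) are assumptions made for this illustration.

```python
import math
import random
import statistics


def iqr(samples):
    """Interquartile range: a width measure that exists even without a variance."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return q3 - q1


def uniform(rng):
    return rng.uniform(-1.0, 1.0)


def cauchy(rng):
    # Inverse-CDF sampling of a standard Cauchy variable
    return math.tan(math.pi * (rng.random() - 0.5))


def averaged_samples(draw, block, count, rng):
    """Coarse-grain: replace each block of `block` draws by its average."""
    return [sum(draw(rng) for _ in range(block)) / block for _ in range(count)]


def spread_table(seed=0, count=2000, blocks=(1, 10, 100)):
    """Width of the block-averaged distribution for each block size."""
    rng = random.Random(seed)
    return {
        name: [iqr(averaged_samples(draw, b, count, rng)) for b in blocks]
        for name, draw in (("uniform", uniform), ("cauchy", cauchy))
    }


if __name__ == "__main__":
    for name, widths in spread_table().items():
        print(name, [round(w, 3) for w in widths])
```

The uniform widths should fall roughly as one over the square root of the block size, while the Cauchy widths should stay near 2 no matter how much we coarse-grain.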

Stiff or sloppy?

Parameter space compression. The parameters reduce to a set of stiff parameters, which determine the behavior, and sloppy parameters, which are irrelevant.

Imagine a simple system described by two parameters, p1 and p2, in which regions within the gray ellipses above correspond to roughly the same system behavior.  If we allow the parameters to vary, the behavior is left essentially unchanged so long as their values stay within the ellipses.  The system, however, is better described by the collective emergent parameters p1′ and p2′, which are special combinations of the old ones.  Notice that moving along the “stiff” p2′ direction quickly leads us out of the ellipse and consequently has large, important effects on the system behavior.  In contrast, p1′ can take on a much wider range of values without leaving the ellipse, implying we can be pretty sloppy about its value without worrying about it changing the system behavior.

RG helps explain why we can describe systems accurately at a given scale using only a small number of relevant parameters. But there is another way to look at it.  In our coin-flipping game, we looked for the average number of heads after a set of flips, as well as the variance of that result.  But these “observables” lack information about individual flips—information has been lost.  In this sense, coarse-graining is akin to information compression, and RG tells us what information is relevant enough to still identify or distinguish the system.

This view was recently explored by Benjamin Machta and coworkers at Cornell, where they introduced a means of quantifying how sensitive a theory is to its parameters.[1]  Initially enormous sets of parameters describing physical or biological systems tend to naturally collapse under coarse-graining into sets of relevant “stiff” and irrelevant “sloppy” emergent parameters. The stiff parameters are key to determining overall system behavior, but even relatively large changes in the sloppy parameters leave the behavior essentially unchanged.  From an experimental viewpoint, an inordinate amount of data is required to even begin seeing an effect from “sloppy” parameters. In other words, we can perform successful science while leaving the sloppy parameters wildly uncertain.
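To get a feel for stiff and sloppy directions, here is a toy sketch.  It is not the calculation from the paper: the model (a sum of two decaying exponentials with rates p1 and p2), the parameter values, the sampling times, and all function names are hypothetical choices of mine.  The idea is to build the sensitivity matrix JᵀJ from the derivatives of the model output with respect to each parameter (this coincides with the Fisher information if one assumes unit Gaussian noise on each measurement) and look at its eigenvalues.  A large eigenvalue marks a stiff combination of parameters, a small one a sloppy combination.

```python
import math


def jacobian(p1, p2, times):
    """Derivatives of y(t) = exp(-p1*t) + exp(-p2*t) with respect to p1 and p2."""
    return [(-t * math.exp(-p1 * t), -t * math.exp(-p2 * t)) for t in times]


def sensitivity_matrix(p1, p2, times):
    """Entries (a, b, d) of the symmetric matrix J^T J = [[a, b], [b, d]]."""
    rows = jacobian(p1, p2, times)
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows)
    return a, b, d


def eigen_2x2_symmetric(a, b, d):
    """Return (stiff eigenvalue, sloppy eigenvalue, unit vector of the stiff direction)."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    angle = 0.5 * math.atan2(2 * b, a - d)  # orientation of the larger eigenvalue
    return mean + radius, mean - radius, (math.cos(angle), math.sin(angle))


if __name__ == "__main__":
    times = [0.5 * k for k in range(1, 11)]  # hypothetical measurement times
    stiff, sloppy, direction = eigen_2x2_symmetric(
        *sensitivity_matrix(1.0, 1.2, times)
    )
    print(f"stiff eigenvalue  = {stiff:.4g}")
    print(f"sloppy eigenvalue = {sloppy:.4g}")
    print(f"stiff direction   = ({direction[0]:.3f}, {direction[1]:.3f})")
```

In this toy model the stiff direction mixes p1 and p2 with the same sign, so raising both rates together changes the output quickly, whereas trading one rate against the other barely changes it.  That is the two-parameter analogue of the ellipse picture in the inset.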

So do we understand?

Introducing an abstract “theory of theories” hopefully illuminates some of what contributes to the success of science.  Understanding the behavior of a gas, let alone the rest of the universe, would be intractable, if not impossible, if we needed to know every exact detail.  It seems the success of science is due in part to a hidden hierarchy of scales, which can be unearthed and explored with RG and some information theory.  But does a hierarchy need to appear?  Are we that fortunate, or are our scientific methods just nicely tuned to find the cases where it is so?  Although we now understand more about what enables science to be so successful, we have only slightly breached the barriers of philosophy.  Fully understanding why will be a continuing and fruitful quest.

[1] Benjamin B. Machta, Ricky Chachra, Mark K. Transtrum, and James P. Sethna. “Parameter Space Compression Underlies Emergent Theories and Predictive Models.” Science 1 November 2013: 342 (6158), 604-607. [DOI:10.1126/science.1238723]


