Title: Uncontrolled: The Surprising Payoff of Trial-and-Error in Business, Politics and Society
Author: Jim Manzi
Scope: 5 stars
Readability: 4 stars
My personal rating: 5 stars
See more on my book rating system.
If you enjoy this summary, please support the author by buying the book.
Topic of Book
Manzi traces the history of randomized controlled trials (RCTs) in science, business, and social science. He argues that running vast numbers of RCTs can help solve problems in a very complicated world.
Manzi has an interesting background that gives his thinking a very broad range. He was educated as a physicist and then went into business consulting. In order to help business managers make good decisions, he gradually came around to running large numbers of RCTs.
I stumbled across the book by accident, and was pleasantly surprised. This is one of the best books that I have read in a long time. I highly recommend it.
- As we go from physics to biology to social science, we deal with increasingly complex phenomena. This makes it difficult to build compact and comprehensive predictive theories.
- Randomized controlled trials (RCTs) are the best method for identifying what works in a complex environment.
- RCTs started in science and then grew rapidly in medical research and business. They are just getting started in social science and public policy.
Important Quotes from Book
Science and technology have made astounding advances over the past half-century. The most significant relevant developments have been in biology and information technology. The tradition of liberty has always had a strong “evolutionist” bent… the mechanics of genetic evolution provide a clear and compelling picture of how a system can capture and exploit implicit insight without creating explicit knowledge, and this naturally becomes the model for the mechanism by which trial and error advances society’s material interests without conscious knowledge or planning. A further technical development enabled by information technology—the explosion in randomized clinical trials that first achieved scale in clinical biology, and has started to move tentatively into social program evaluation—provides a crucial tool that could be much more widely applied to testing claims for many political and economic policies.
Combining these ideas of evolution and randomized trials led Donald T. Campbell, a twentieth-century social scientist at Northwestern University, to create a theory of knowledge, which he termed “evolutionary epistemology.”
The foundation is unstructured trial and error.
The reason we have increasing trouble building compact and comprehensive predictive theories as we go from physics to biology to social science is the increasing complexity of the phenomena under investigation. But this same increasing complexity has another pernicious effect: it becomes far harder to generalize the results of experiments.
This is because the vast majority of reasonable-sounding interventions will work under at least some conditions, and not under others. For the hypothetical literacy program described above, an experiment to test the program is not really a test of the program; it is a test of how well the program applies to a specific situation.
Of course, this would require that each experiment be cheap enough to make this many tests feasible. Over the past couple of decades, this has been accomplished for certain kinds of tests. The capability has emerged not within formal social science, but in commercial enterprises. The motivation has been the desire to more reliably predict the causal effects of business interventions like the example of the retail-store upgrade program that opened this book. The enabling technological development has been the radical decreases in the costs of storing, processing, and transmitting information created by Moore’s Law. The method has been to use information technology to routinize, and ultimately automate, many aspects of testing.
This division of labor should not be surprising. Biological and social science researchers developed the randomized trial, and then the conceptual apparatus for thinking rigorously about the problem of generalization. Commercial enterprises have figured out how, in specific contexts, to convert this kind of experimentation from a customized craft to a high-volume, low-cost, and partially automated process.
I believe that by more widely applying the commercial techniques of radically scaling up the rate of experimentation, we can do better than we are now: somewhat improve the rate of development of social science; somewhat improve our decisions about what social programs we choose to implement; and somewhat improve our overall political economy. Spread across a very big world, this would justify a large absolute investment of resources and hopefully would help to avoid at least a few extremely costly errors.
The thesis of this book can therefore be summarized in five points:
- Nonexperimental social science currently is not capable of making useful, reliable, and nonobvious predictions for the effects of most proposed policy interventions.
- Social science very likely can improve its practical utility by conducting many more experiments, and should do so.
- Even with such improvement, it will not be able to adjudicate most important policy debates.
- Recognition of this uncertainty calls for a heavy reliance on unstructured trial-and-error progress.
- The limits to the use of trial and error are established predominantly by the need for strategy and long-term vision.
Francis Bacon’s text Novum Organum, written almost four hundred years ago, prophesied the modern scientific method.
Bacon began with a theory that combined two key elements. The first was the observation that nature is extraordinarily complicated as compared to human mental capacities… The second element of his theory was his belief that humans tend to overinterpret data into unreliable patterns and therefore leap to faulty conclusions.
Bacon believed that this combination of errors had consistently led natural philosophers to enshrine premature theories as comprehensive certainties that discouraged further discovery. Proponents of alternative theories, all of whom had also made faulty extrapolations from limited data to create their theories, would then attempt to apply logic to decide between them through competitive debate. The result was a closed intellectual system whose adherents spent their energies in ceaseless argumentation based on false premises, rather than seeking new information.
Bacon proposed a new method… He called this method induction. The practical manifestation of his proposed approach came to be called the scientific method.
The ultimate goal of Baconian science is not philosophical truth; it is improved engineering.
Writing a little more than a century after Bacon, skeptical British philosopher David Hume focused on the problem of how we can generalize from a finite list of instances to a general rule.
Hume’s observation is that to the extent that my belief in a particular cause-and-effect relationship relies on induction, this belief must always remain provisional. I must always remain open to the possibility that although I have never seen an exception to this rule, I might encounter one at some point in the future… This claim is what I will mean by the Problem of Induction throughout this book.
The Problem of Induction can be restated usefully as the observation that there may always be hidden conditionals to any causal rule that is currently believed to be valid. As Hume argued, this problem of hidden conditionals is always present philosophically…, a more generalized version of the Problem of Induction is the central practical problem in developing useful predictive rules in the social sciences.
Experiments can disprove theories by providing counterexamples but cannot prove theories no matter how many times we repeat them, since it is always theoretically possible that in some future experiment we will find a counterexample.
We know our physical theories are true in the sense that they enable human capability. Science does not tell us whether theories are true in the classic philosophical sense of accurately corresponding to reality, only that they are true in the sense of allowing us to make reliable, nonobvious predictions.
Deep commonalities across activities such as science, markets, common law, and representative democracy at least indicate the plausibility of a common underlying structure. Each is a noncoercive system for human social organizations to increase material well-being in the face of a complex environment.
The purposes of randomization, therefore, are: (1) to help prevent experimenter bias from assigning systematically different patients to the test versus control groups, consciously or unconsciously, and (2) more subtly, yet more profoundly, to hold approximately equal even those potential sources of bias between the test and control groups of which we are ignorant. It is a method designed to create controlled experiments in the presence of rampant, material hidden conditionals.
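The second purpose above — holding unknown sources of bias approximately equal — can be illustrated with a minimal simulation (my own sketch, not from the book). Each "patient" carries a hidden attribute we never measure, standing in for a hidden conditional; random assignment balances it across groups anyway:

```python
import random
import statistics

random.seed(0)

# Each "patient" has a hidden attribute we never measure
# (a stand-in for a hidden conditional).
patients = [{"hidden": random.gauss(0, 1)} for _ in range(10_000)]

# Random assignment: no experimenter judgment, conscious or
# unconscious, can leak into who lands in which group.
random.shuffle(patients)
test, control = patients[:5_000], patients[5_000:]

mean_test = statistics.mean(p["hidden"] for p in test)
mean_control = statistics.mean(p["hidden"] for p in control)

# With large n, the unmeasured attribute is approximately balanced
# across groups, even though we never observed it.
print(abs(mean_test - mean_control))
```

The same logic applies simultaneously to every unmeasured attribute, which is what makes randomization so much stronger than matching groups only on the variables we happen to know about.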
The RFT [randomized field trial] had been developed in its modern form by the 1940s.
The RFT is a relatively new piece of technology—newer than the automobile or the airplane, and about the same age as color television or the electronic computer. Over the past sixty years, it has driven out alternative means of determining therapeutic efficacy wherever it can be applied practically.
In the current era, nonexperimental therapeutic findings remain dramatically less reliable than proper experimental findings. In a well-known 2005 paper, Dr. John Ioannidis, a professor at the Stanford School of Medicine, evaluated the reliability of forty-nine influential studies (each cited more than 1,000 times) published in major journals between 1990 and 2003 that reported effective interventions based on either experimental or nonexperimental methods. More than 80 percent of nonrandomized studies that had subsequent replications using stronger research designs were contradicted or had found stronger effects than the replication, whereas this was true for only 10 percent of findings shown initially in large RFTs. To repeat: 90 percent of large randomized experiments produced results that stood up to replication, as compared to only 20 percent of nonrandomized studies.
The randomized experiment is the scientific gold standard of certainty of predictive accuracy in business, just as it is in therapeutic medicine. If a program is practically testable and an experiment is cost-justified (i.e., if the expected value of the incremental information is worth the cost of the test), experimentation dominates all other methods of evaluation and prediction.
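The "cost-justified" condition can be made concrete with a toy value-of-information calculation. All the numbers below are hypothetical, and for simplicity the test is assumed perfectly informative; the point is only the decision rule Manzi states:

```python
# Hypothetical numbers: rolling out an untested program has a 30%
# chance of a $2M gain and a 70% chance of a $1M loss; a test costs
# $100K and (for simplicity) reveals the outcome before full rollout.
p_success = 0.30
gain, loss = 2_000_000, 1_000_000
test_cost = 100_000

# Without testing: either roll out blind, or do nothing (value 0).
ev_blind = p_success * gain - (1 - p_success) * loss
best_without_test = max(ev_blind, 0.0)

# With a perfectly informative test: roll out only on success,
# so failures are caught before they cost anything.
ev_with_test = p_success * gain

# Expected value of the incremental information vs. cost of the test.
value_of_information = ev_with_test - best_without_test
run_the_test = value_of_information > test_cost
print(value_of_information, run_the_test)
```

Here rolling out blind has negative expected value, so the best untested option is to do nothing; the test is worth running because the information it yields ($600K in expectation) far exceeds its $100K cost.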
I have observed the results of thousands of business experiments. The two most important substantive observations across them are stark. First, innovative ideas rarely work. Second, those that do work typically create improvements that are small compared to the size of the strategic issues they are intended to address, or as compared to the size of the dreams of those who invent them.
This is very similar to the results achieved in clinical drug trials.
A distinct organizational entity, normally quite small, must be created to design experiments, and then provide their canonical interpretation… It should have no incentives other than scorekeeping; therefore, it should never develop program ideas or ever be a decision-making body.
The orientation should not be toward big, onetime “moon shot” tests, but instead toward many fast, cheap tests in rapid succession whenever feasible.
This template sounds simple but is hard to implement. Like physicians, business professionals often resist constraints on their autonomy, and no matter what they say, often also resist being held accountable for results.
First, controlled experiments are a necessary but not sufficient condition for scientific progress, and randomization, whenever available, is the best method to establish control in an experiment that can be embedded in a sequence of experiments and supported by theory to allow us to draw reliable causal conclusions about the effects of a specific social action.
Second, replicating experiments is required before drawing a conclusion.
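Why replication matters can be shown with a small simulation (my own illustration, assuming normally distributed noise): for a program with zero true effect, a single small trial scatters widely around zero and can easily look like a positive result, while averaging several replications tightens the spread:

```python
import random
import statistics

random.seed(1)

def run_trial(true_effect, n=50, noise=1.0):
    """One small RCT: returns the observed test-minus-control difference."""
    test = [true_effect + random.gauss(0, noise) for _ in range(n)]
    control = [random.gauss(0, noise) for _ in range(n)]
    return statistics.mean(test) - statistics.mean(control)

# A program with zero true effect, measured by one trial vs. the
# average of five replications, repeated 200 times each.
singles = [run_trial(0.0) for _ in range(200)]
replicated = [statistics.mean(run_trial(0.0) for _ in range(5))
              for _ in range(200)]

spread_single = statistics.stdev(singles)
spread_replicated = statistics.stdev(replicated)
print(spread_single, spread_replicated)
```

The replicated estimates scatter much less than single trials, so an ineffective program is far less likely to produce a spuriously impressive result after replication.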
Therapeutic medicine, business, and policy-making institutions that have overcome these challenges have evolved a very similar package of coping mechanisms. First, they have political sponsorship from ultimate decision-makers. Without this, the whole exercise is academic. Second, the political sponsor is paired with an agitator who directly leads the development of a testing infrastructure and community of experts with scientific values, ensures that this function delivers tangible value, and protects it from political interference… Third… integration with operational data systems, focus on multiple iterative simple tests, repeatable testing procedures, a combination of flexibility in choosing programs to test with consistent rigor in testing method, and others—that have the effects of lowering the cost per test, and making replication and near-replication far more practical and useful.
The characteristic error of the contemporary Right and Left in this regard is enforcing too many social norms on a national basis.
More generally, innovation appears to be built upon the kind of trial-and-error learning mediated by markets. It requires that we allow people to do things that seem stupid to most informed observers—even though we know that most of these would-be innovators will in fact fail. This is premised on epistemic humility. We should not unduly restrain experimentation, because we are not sure we are right about very much. For such a system to avoid becoming completely undisciplined, however, we must not prop up failed experiments. And to induce people to take such risks, we must not restrict huge rewards for success, even as we recognize that luck plays a role in success and failure.
We can visualize the evolution of the economy as being like a huge conveyor belt in which sectors originate in entrepreneurial innovation, proceed through scale-up, and end with being compacted into low-employment/high-productivity automated sectors. It is “discover, routinize, automate” at the highest level. Looking today at entrepreneurial start-ups, then at high-growth technology sectors, then at more mature parts of the service economy, then at manufacturing, and then at agriculture is like burrowing down through layers of an archaeological dig. We see the various stages of this process in cross-section.
The federal government should establish an agency analogous to the Food and Drug Administration (FDA) to develop, promulgate, and enforce standards for designing and interpreting social-policy randomized experiments.
We should invest in human and technology infrastructures to increase radically the annual volume of RFTs that can be conducted economically. To put a stake in the ground, I propose that we set a medium-term goal of conducting as many social policy RFTs in the United States each year as we do clinical trials: about 10,000.