The publication of Gerolamo Cardano’s (1501-1576) Ars Magna in 1545 is generally recognized as the beginning of modern mathematics. At roughly the same time, the mathematician Georg Joachim Rheticus (1514-1576) made arrangements for Andreas Osiander to publish De Revolutionibus, in which Nicolaus Copernicus (1473-1543) proposed a heliocentric theory of the solar system (published in the year of Copernicus’ death). Contrary to the prevailing belief of the time, Copernicus claimed that the earth circled the sun rather than the other way around. Copernicus and his successors set the world straight, but more importantly, heliocentrism kick-started a burst of scientific discovery by notable Europeans such as Tycho Brahe, Johannes Kepler, Galileo Galilei, René Descartes, Isaac Newton, and Gottfried Leibniz. The Age of Enlightenment (circa 1650-1750) generally promoted a mechanical view of the world: the earth circles the sun with regularity, and Newton’s falling apple behaves the same way no matter who observes it.

Some scholars place the beginning of this burst of rational thought at the publication of Isaac Newton’s (1642-1727) Philosophiæ Naturalis Principia Mathematica (1687), which postulated deterministic rules for how the universe works. Newton’s law of gravitation and three laws of motion took us to the moon nearly 300 years later.

In Newton’s deterministic universe, an apple falls from its tree in exactly the same way every time. Its path is predictable in every detail. A trip to the moon can be carried out in exactly the same way each time the same forces are applied to a projectile. Newton’s world is so regular and predictable that reality and mathematical models are perfectly aligned. Determinism has served humanity well.

But Newtonian physics and heliocentric models are unable to predict when an apple will fall from its tree, how a modern personal computer computes, or why the Black Death (1348-1350) killed roughly 30% of the European population. Deterministic laws cannot explain (or predict) unexpected catastrophes that “just happen”. Newton’s mechanical universe was too simple to explain complex, interconnected, non-linear reality, to say nothing of relativity theory.

Newtonian physics began to crumble almost immediately. Rigorous mathematical models were excellent for some explanations, but completely unable to describe many other everyday phenomena. Accidents, diseases, and disasters remained the province of the divine, impossible to explain by Newton’s laws. That is, until a teenager named Blaise Pascal began his ascent into the world of reason.

In Against The Gods: The Remarkable Story of Risk, Peter L. Bernstein (1919 – 2009) documents the downfall of determinism and the rise of chance. His primary thesis is that humans often turn to mysticism or spiritualism to explain chance events that cannot be explained by the mechanical world of determinism. In a review of Bernstein’s book, Peter Coy writes, “Statisticians, in the telling of Peter L. Bernstein, are nothing less than Promethean heroes. [Bernstein] argues that the people who mastered the calculation of probabilities, beginning in 16th century Italy, stole from the gods something more precious than fire--namely, the understanding of risk. The mastery of risk is the foundation of modern life, he contends, from insurance to the stock market to engineering, science, and medicine. Bernstein argues that the mastery of risk is what divides modern from ancient times.”

One of Bernstein’s central characters is Blaise Pascal (1623-1662), a prodigy not unlike today’s Internet entrepreneurs. Beginning in his late teens, for example, Pascal built some 50 mechanical calculators. In 1654, at about the age of thirty, Pascal took up a gambling problem posed to him by the Chevalier de Méré and solved it in a famous correspondence with the mathematician Pierre de Fermat. In the process, Pascal invented probability theory, which laid the foundation for risk analysis. More profoundly, Pascal’s invention introduced a new idea: that the world is neither entirely deterministic nor entirely beyond the ken of humans. Pascal’s breakthrough came by indirect means. He invented probability theory to solve a real-world problem, namely how to win games of chance. Gambling motivated the creation of non-deterministic science.


The “problem of the points” asks, “What is the likelihood of winning a game of chance, given that we know the mid-play scores of the two players?” For example, in tennis the first player to score past 40 wins. But if the score is 40-30, what is the probability that the trailing player comes from behind and wins? Here is a game of chance that has more than one outcome, depending on skill and “luck”. Unlike a falling apple, the outcome of this game can differ each time it is played. Gambling obeys an entirely different set of rules than Newton’s laws of motion.


Pascal’s solution was based on a simple assumption: that probability is the number of ways of winning divided by the total number of ways of winning and losing. To come from behind and make two points before the leader with 40 points scores one more point, our underdog has to be either very skilled or very lucky. Assuming equal skill, the chance of scoring two points before the leader scores one point is identical to the chance of getting two heads in a row when tossing a coin twice. Let H and T represent heads and tails, respectively. Then, according to Pascal’s newfound technology, only one of the four possible outcomes of two tosses (HH, HT, TH, TT) is HH. Therefore, the probability of winning the tennis match when the underdog is down 30-40 is 25%, because only one of the four possible ways of finishing the game ends with the underdog reaching the winning score first. According to Pascal, when the players are evenly matched, the difference between winning and losing is pure luck. More importantly, Pascal showed that pure luck could be represented precisely by mathematics. No Gods of Chance needed.
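Pascal’s enumeration is simple enough to check by brute force. The sketch below (in Python, with names chosen here for illustration) lists the four outcomes of two fair coin tosses and counts the one favorable to the underdog:

```python
from itertools import product

# Enumerate all outcomes of two fair coin tosses:
# H = the underdog scores the point, T = the leader scores it.
outcomes = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

# Only HH lets the underdog take two points in a row and win.
wins = [o for o in outcomes if o == ("H", "H")]

probability = len(wins) / len(outcomes)
print(probability)  # 0.25
```

Pascal’s answer, 25%, falls out of counting alone; neither history nor skill enters the calculation.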

Pascal defined probability as a number in the interval [0, 1]. Zero represents an impossible outcome; one represents a certain outcome; and a fraction in between represents the likelihood of the desired outcome. A probability between zero and one also expresses uncertainty: 0.5 represents maximum uncertainty, because the desired outcome is no more likely to occur than not to occur. Tossing a balanced coin is equally likely to turn up heads (H) as tails (T). Chances are 50-50 that a single toss lands heads, 25% (one in four) that two tosses produce two heads (HH), and so on.


Pascal’s method of calculating probability is considered an a priori approach because it is predictive: it calculates the probability of a possible event before it happens. Various a priori methods of prediction continue to be used today. For example, we attempt to predict the likelihood of a terrorist attack before it happens by considering all the ways an attack can take place. Pascal’s method of a priori probability calculation is based on combinatorial enumeration: Pr(favorable outcome) = (number of ways the favorable outcome can happen) / (number of all possible outcomes).
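Combinatorial enumeration can be captured in a small helper function. This is only a sketch; `a_priori` and its arguments are hypothetical names, not a standard library API:

```python
from fractions import Fraction
from itertools import product

def a_priori(sample_space, favorable):
    """Pascal-style a priori probability: favorable outcomes / all possible outcomes."""
    outcomes = list(sample_space)
    return Fraction(sum(1 for o in outcomes if favorable(o)), len(outcomes))

# Example: probability of at least two heads in three fair coin tosses.
p = a_priori(product("HT", repeat=3), lambda o: o.count("H") >= 2)
print(p)  # 1/2  (4 of the 8 equally likely outcomes qualify)
```

Note that the function assumes every outcome in the sample space is equally likely, exactly as Pascal did for fair coins.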

Note that Pascal’s method ignores history – it ignores what has happened in the past. For example, it ignores how many times in the past a tennis player has won when leading by 40-30. It also ignores the history of terrorism, earthquakes, floods, and cyber attacks. This is the meaning of a priori – before it happens.

Another one hundred years would pass before a priori probability theory was extended by a posteriori probability theory – predictions based on past events and historical evidence. French mathematician and astronomer Pierre-Simon, marquis de Laplace (1749 – 1827) asked, “What is the probability the sun will rise tomorrow?” Laplace conjectured that future events are a consequence of past events and therefore, their likelihood of occurring in the future can be calculated by simply counting the number of similar events that have happened in the past and dividing by all possible opportunities for the events to happen again.

Here is his result. Let S be the number of times in the past that the sun appeared on schedule. Then the probability that it will rise again is (S+1)/(S+2). Given that the sun has never failed to rise, this number is very close to 1.0, or certainty. Conversely, the probability that the mother of all catastrophes will terminate life as we know it is 1.0 – (S+1)/(S+2), or very close to zero. But, it is not exactly zero. Why?


Laplace’s sunrise problem illustrates an important development in thinking about risk. Laplace’s interpretation of probability is motivated by evidence, while Pascal’s combinatorial enumeration is motivated by mathematics. Laplace’s interpretation contains an element of uncertainty, while Pascal’s does not. In Laplace’s world, risk assessment involves an element of uncertainty. In Pascal’s world, risk is mathematically precise and without “noise”.

The presence of uncertainty in Laplace’s model explains why the probability of the sun rising isn’t exactly 100% and the probability of it not rising isn’t exactly zero. This “side effect of uncertainty” is called Laplace’s rule of succession. It says that when nothing is known about past performance, the probability of any event occurring in the future is (0+1)/(0+2) = 50%. In other words, the event either happens or not, with equal probability. As pointed out above, 50% represents “maximum ignorance” in situations where nothing is known about the likelihood of the event. As evidence mounts to the contrary (the sun has risen billions of times), the residue of uncertainty diminishes, but a small amount always remains in the estimate. Laplace’s probability of the sun rising tomorrow is 99.9999….%, not 100%.


In contrast to a priori probability, a posteriori probability ignores combinatorial mathematics. Instead, it applies historical data to the problem of predicting the future. Laplace’s method has been criticized because of that famous stockbroker caveat, “past performance is no guarantee of future performance”. And indeed, this is a valid criticism. Even after a stock has risen 1,000 days in the past, there remains a residue of uncertainty that it will rise once again tomorrow. Laplace might respond to this criticism by arguing that accuracy can be (partially) improved by simply collecting more evidence! Laplace’s residue of uncertainty will be explored in more detail when I describe Bayesian conditional probability theory.

Laplace’s method is readily applicable to the problem of estimating the probability of a successful terrorist attack, given T attempts and S successful attacks in the past: Pr(successful terrorist attack) = (S+1)/(T+2). For example, since 2001 there have been S = 3 successful attacks in the US out of T = 26 attempts. Thus the probability of a successful (future) attack, based on the past, is (3+1)/(26+2), or about 14%.
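Laplace’s rule reduces to a one-line calculation. A minimal sketch (the function name is hypothetical) that reproduces both the maximum-uncertainty case and the attack example above:

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's rule of succession: Pr(success on the next trial) = (S+1)/(T+2)."""
    return Fraction(successes + 1, trials + 2)

# With no evidence at all, the rule yields maximum uncertainty.
print(rule_of_succession(0, 0))   # 1/2

# The example from the text: 3 successful attacks in 26 attempts.
p = rule_of_succession(3, 26)
print(p, float(p))                # 1/7 0.14285714285714285
```

Because the +1 and +2 terms never vanish, the estimate approaches, but never reaches, 0 or 1, which is precisely Laplace’s residue of uncertainty.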

Probability theory was devised largely for a very practical reason: gambling. Predicting the amount of money one could make by “risking” capital at the card table and roulette wheel was probability theory’s “killer app”. A handsome profit can be had by correctly predicting the future outcome of a game of chance. Indeed, investigation of this killer app accelerated over the next 200 years and continues today. Perhaps the most dramatic application of probability theory to gambling came from Edward Oakley Thorp (1932-), an American mathematics professor, author, hedge fund manager, and blackjack player best known as the “father of the wearable computer”. In 1961 Thorp used probabilities worked out on an IBM computer to beat the blackjack tables in Las Vegas, and he documented his technique, the famous card counting method, in the 1962 best seller Beat the Dealer. (The concealed wearable computer he built with Claude Shannon was aimed at roulette, not blackjack.)

Thorp was following in the footsteps of Gerolamo Cardano (1501-1576), one of Peter Bernstein’s probability heroes. Cardano was a famous Milanese physician, but more importantly, he was also a compulsive gambler. Gambling drove Cardano to formulate early ideas that later became the basis of modern risk assessment. He combined probability estimates with gains and losses (consequences) to formulate the early idea of risk. He was concerned with predicting how much money might be made by repeatedly playing a certain game and, on the downside, how much money might be lost. Cardano intuitively understood risk as his expected gain or loss after a hard day of gambling.

Cardano’s intuition was formalized by Daniel Bernoulli (1700-1782), a member of the famous family of Swiss mathematicians. Bernoulli’s risk equation (1738) is the foundation of modern expected utility theory (EUT). According to Bernoulli, risk is the product of the probability of a certain outcome and its consequence: R = Pr(C) × C, where Pr(C) is the probability of losing C dollars, say, and C is the loss measured in dollars. When n independent events are possible, risk is simply the sum of all expected values: Pr(C1)C1 + Pr(C2)C2 + … + Pr(Cn)Cn. This breakthrough in risk calculation continues to be used today in financial and engineering calculations.
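Bernoulli’s sum of expected values is a one-liner in code. The loss scenarios below are hypothetical numbers, chosen only to show the arithmetic:

```python
def risk(events):
    """Bernoulli-style risk: sum of probability x consequence over independent events."""
    return sum(p * c for p, c in events)

# Hypothetical (probability, consequence-in-dollars) pairs for three independent events.
events = [(0.10, 1_000),       # expected loss 100
          (0.01, 50_000),      # expected loss 500
          (0.001, 1_000_000)]  # expected loss 1,000
print(risk(events))  # about 1600.0 dollars of expected loss
```

Notice that the rare, high-consequence event contributes the most to the total, a theme that recurs throughout risk analysis.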

Of course consequence can also be measured in terms of fatalities, economic decline, loss of productivity, and other measures. Probability can be calculated in a number of ways (to be discussed later on), but it is always a unit-less number in the interval [0, 1]. Thus, risk is measured in the same units as consequence. If consequence is given in terms of fatalities, then risk is given in terms of loss of life. If measured in terms of dollars, then risk is expressed in terms of dollars.

It is important to note that risk is not a probability, and probability is not a risk. Rather, the elements of risk are likelihood as measured by a probability, and gain/loss as measured by a consequence. For example, the likelihood of having a computer virus attack your personal computer is rather high, but the risk is rather low if we measure consequence as the cost associated with removing the virus. On the other hand, the likelihood of another 9/11-size terrorist attack is rather small, but the consequence is very high. The risk may be large, if the product of likelihood and consequence is large, but small if the product is small. Probability and consequence are handmaidens in the estimation of risk.

Risk is also not vulnerability or threat. These two terms are often mistaken for risk because they are closely related to it. Generally, vulnerability is a weakness in an asset that may be exploited to cause damage. It can be quantified as a probability, but it is not a form of risk, because it does not measure expected gain or loss. Similarly, threat is a potential for harm that can also be quantified as a probability, but it is not a form of risk for the same reason. Generally, threat is quantified as the probability of an attack or catastrophic event, a number between zero and one. But as discussed later, this definition of threat is controversial.

We are now in a position to understand the modern manifestations of risk-informed decision-making used to decide how best to allocate resources to either increase expected gains or reduce expected losses. The challenge of risk assessment comes down to the challenges posed by calculating probabilities and consequences. How do we estimate the probability of a future event, and how do we know the extent of damages? As it turns out, this is more complicated than the pioneers of expected utility theory ever imagined.