I’ve been a big fan of Nassim Taleb’s Incerto. He wrote a series of essays on life, where all the topics revolve around decision making under uncertainty. I wanted to dig deeper into some of the more technical concepts he alluded to, so last year I explored a few textbooks on probability theory.

I was surprised by how elegant the field was. The most inspiring idea to me was how the originators interpreted probability through set theory. Not only is it a beautiful way to look at things, but by seeing it this way, they could apply a few axioms, leverage set theory, and badabing badaboom they had a whole field’s worth of discoveries.

I want to share an example from one of the textbooks that illustrates the power of seeing probability through this lens, and shows how you can begin deriving complex ideas from the simplest kernel.

Let’s say you have 2 boxes. In Box 1, you have 99 red balls, and 1 white ball. In Box 2, you have 99 white balls, and 1 red ball.

**You pick a box at random, then, you pick a ball from that box.**

This can get a bit tough to reason about. There’s a 50% chance you pick Box 1. If you picked Box 1, you have a 99% chance of drawing a red ball. If you picked Box 2, you have a 1% chance. How do we combine these probabilities to find the overall chance of drawing a red ball?

If you reason the way you were taught in high school, you may think like this:

Well, there’s a 50% chance I pick Box 1, and a 99% chance after that to pick a red ball. And there’s a 50% chance I pick Box 2, and a 1% chance after that to pick a red ball.

So the total probability can be `50% * 99% + 50% * 1%`

Which is 49.5% + 0.5% which is…50%
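To double-check the arithmetic, here is a minimal Python sketch of that high-school calculation. The names `p_box` and `p_red_given_box` are mine, not from the textbook, and I use exact fractions so no rounding sneaks in.

```python
from fractions import Fraction

# Chance of choosing each box (a fair pick between two boxes).
p_box = {"Box 1": Fraction(1, 2), "Box 2": Fraction(1, 2)}

# Chance of drawing a red ball once a box has been chosen.
p_red_given_box = {"Box 1": Fraction(99, 100), "Box 2": Fraction(1, 100)}

# Weight each box's red-ball chance by the chance of choosing that box, then add.
p_red = sum(p_box[b] * p_red_given_box[b] for b in p_box)

print(p_red)  # 1/2
```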

Now this works, but notice that the probability came out to exactly 50%. Did you really need to do all that work to figure this out? (1)

Let’s reimagine what *probability* here means. First, let’s consider: what are *all* *the possible outcomes*?

For our experiment, an outcome must contain two choices: the box we chose, and the ball we chose after that. We could represent it as a pair, like `(Box 1, Red Ball 1)`:

This is *one outcome — we picked Box 1, then picked Red Ball 1.*

How many of these outcomes do we have?

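We can list them out: Box 1 with Red Ball 1, Box 1 with Red Ball 2, and so on up to Box 1 with Red Ball 99, then Box 1 with the white ball, and the same again for Box 2. Each box holds 100 balls, so in total we have 200 possible outcomes. And because both boxes hold the same number of balls, every one of those outcomes is equally likely.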

Now that we have all the outcomes in mind, we can answer the question: *what’s the probability that we pick a red ball?*

Well, how many outcomes contain a “red ball”?

Looks like 100: 99 from Box 1, plus 1 from Box 2. This means that 100 of 200 outcomes would give us the result “we got a red ball”, and 100/200 makes 50%.
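If you want to see the counting happen, here is a small Python sketch that builds every outcome as a (box, ball) pair and counts the red ones. The labels and variable names are my own choices for illustration. Plain counting is only valid here because both boxes hold 100 balls, which makes every pair equally likely.

```python
from fractions import Fraction

# Each box as a list of individually labelled balls: (colour, number).
boxes = {
    "Box 1": [("red", i) for i in range(1, 100)] + [("white", 1)],
    "Box 2": [("white", i) for i in range(1, 100)] + [("red", 1)],
}

# The set of all outcomes: one (box, ball) pair per ball in each box.
outcomes = [(box, ball) for box, balls in boxes.items() for ball in balls]

# The event "we got a red ball" is just the subset of outcomes with a red ball.
red_outcomes = [(box, ball) for box, ball in outcomes if ball[0] == "red"]

p_red = Fraction(len(red_outcomes), len(outcomes))
print(len(outcomes), len(red_outcomes), p_red)  # 200 100 1/2
```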

Note how this boiled down to just “counting” the outcomes we cared about. Is it really that easy? Let’s try with a harder example.

Here’s a question that can get pretty hairy to answer with what we learned in high school: *given* that we chose a red ball, what’s the chance that it came from Box 1? Well, there are 99 red balls in Box 1, and only 1 red ball in Box 2, so the chance that it came from Box 1 is *very high.* But how high?

We may recall Bayes’ Theorem here, but the formula can be hard to remember.

However, if we think in sets, we can *kind of* derive Bayes’ Theorem. Let’s look at our outcomes again:

How many of these outcomes contain “red ball”?

Yup, 100 total. Since we *know* we got a red ball, this means that we could have *only gotten* one of these 100 outcomes.

Out of these outcomes, how many come from “Box 1”?

That's 99 outcomes. So out of 100 outcomes that could have happened, 99 of those came from Box 1. That's 99/100: given a red ball, there's a 99% chance it came from Box 1.
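Here is the same "shrink the set, then count" idea as a Python sketch, with a cross-check against the usual statement of Bayes' Theorem. The variable names are mine, and the numbers plugged into the formula are the ones from the boxes above.

```python
from fractions import Fraction

# Each box as a list of individually labelled balls: (colour, number).
boxes = {
    "Box 1": [("red", i) for i in range(1, 100)] + [("white", 1)],
    "Box 2": [("white", i) for i in range(1, 100)] + [("red", 1)],
}
outcomes = [(box, ball) for box, balls in boxes.items() for ball in balls]

# Knowing we got a red ball shrinks the set of possible outcomes to the red ones.
red = [o for o in outcomes if o[1][0] == "red"]

# Within that smaller set, count the outcomes that came from Box 1.
red_from_box1 = [o for o in red if o[0] == "Box 1"]
p_box1_given_red = Fraction(len(red_from_box1), len(red))

# Cross-check: P(Box 1 | red) = P(red | Box 1) * P(Box 1) / P(red)
bayes = Fraction(99, 100) * Fraction(1, 2) / Fraction(1, 2)

print(p_box1_given_red, bayes)  # 99/100 99/100
```

Both routes land on the same number, which is the sense in which counting inside the smaller set "kind of" gives you Bayes' Theorem.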

This, too, just came down to counting outcomes. Now, it can get a lot more difficult: what if you can’t possibly count the outcomes? What if each outcome has a different probability? But just from this notion of events as sets of possible outcomes, we can chug along and derive quite a bit.
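To give a taste of the "different probability" case, here is a rough sketch under a made-up variation that is not in the textbook example: Box 2 now holds only 10 balls (9 white, 1 red). The outcomes are no longer equally likely, so instead of counting them we give each one a weight and add up the weights. All names here are hypothetical.

```python
from fractions import Fraction

# Hypothetical variation (not from the essay): Box 2 now holds only 10 balls.
boxes = {
    "Box 1": ["red"] * 99 + ["white"],
    "Box 2": ["white"] * 9 + ["red"],
}

# Each outcome carries its own weight: P(box) * P(ball | box).
weights = {}
for box, balls in boxes.items():
    for i, colour in enumerate(balls):
        weights[(box, i, colour)] = Fraction(1, len(boxes)) * Fraction(1, len(balls))

# The weights must add up to 1 to form a probability distribution.
total = sum(weights.values())

# Instead of counting outcomes in a set, add up their weights.
p_red = sum(w for (box, i, colour), w in weights.items() if colour == "red")
p_red_and_box1 = sum(
    w for (box, i, colour), w in weights.items()
    if colour == "red" and box == "Box 1"
)

print(total)                   # 1
print(p_red)                   # 109/200
print(p_red_and_box1 / p_red)  # 99/109
```

The set-based picture survives intact: an event is still a subset of outcomes, and its probability is still a sum over that subset.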

(1) Alexandre came up with a beautifully intuitive solution to the first question: consider symmetry. Swapping white and red gives you the same setup with the box labels exchanged, so red and white must be equally likely, and the only possible answer is 50%.

*Thanks to Daniel Woelfel, Alexandre Lebrun, Bipin Suresh, Mark Shlick, Davit Magaltadze for reviewing drafts of this essay*