Bayes Theorem from Scratch

Whenever you hear a poorly descriptive name like “Bayes Theorem”, the last thing you’d want to do is learn what it is. Probability is one of those topics that comes up all the time but is poorly taught in high school (I’m looking at you, Personal Finance). I’m here to show you it’s a simple idea, and it really should’ve been named The Easiest-to-Understand Theorem in Probability. After reading this, you’ll always be able to derive it from scratch, and you won’t be scared of calculating probabilities anymore. The one thing to keep in mind is that learning the notation and terminology takes more effort than understanding the concept itself.

Let’s start with the most basic concept you already know – the probability of something happening.

It’s easy to visualize things when we have examples, so let’s take my favorite active NBA player today – Klay Thompson. Let’s say Klay takes 10 shots and makes 6 of them. Given this set of data, his probability of making a shot is 6 out of 10 (or 60%).

In this scenario, there are only two outcomes – he either makes a shot or misses a shot. Both outcomes need to add up to 100% since they represent all the possibilities that can happen.

That means his probability of missing a shot is 40%, or 4/10.
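If you like seeing numbers in code, here is a minimal sketch of the same arithmetic. The variable names are my own, and I’m using Python’s `fractions` module so the values stay exact instead of turning into rounded decimals.

```python
from fractions import Fraction

shots_taken = 10
shots_made = 6

# Probability of a make = makes / attempts
p_make = Fraction(shots_made, shots_taken)   # 6/10, stored in lowest terms as 3/5

# Only two outcomes exist, so the miss probability is whatever is left over
p_miss = 1 - p_make                          # 4/10, stored as 2/5

print(p_make, p_miss)    # 3/5 2/5
print(p_make + p_miss)   # 1
```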

Let’s visualize this:

Let’s call Klay a “node”, and the possibilities coming from this node are called “branches”. There is a 6/10 chance to follow the “Makes” branch, and a 4/10 chance to follow the “Misses” branch. All branches stemming from a node need to sum to 1, since the branches represent all the possible outcomes. In reality, there can be more than 2 branches – in fact, as many branches as there are possible outcomes! We stick with 2 here to keep things simple. In physics, we jokingly call this kind of simplification a “spherical cow” (the joke is that nothing in the real world is a perfect sphere, yet we solve all our problems as if it were).

Klay is feeling a little lonely, so let’s introduce his Splash Brother – another NBA player named Steph Curry.

Steph Curry is a great shooter. He breaks shooting records with no regard for human life. As a main contributor, he is on the basketball court (also known as being “on the floor”) most of the time. Let’s say that at any given moment, there is a ¾ chance that Steph is on the floor.

When he’s on the basketball court, players tend to be more aware of him. Since defenders are occupied with Steph, Klay is more open and tends to make more of his shots. Klay can make shots without Steph on the floor, as we showed above, but he tends to make more of them when Steph is on the floor. When Steph is on the floor, Klay makes 80% (8/10) of his shots, and therefore misses 20% (2/10) of them. Let’s visualize all of this in a tree:

Just to recap, when Steph is on the floor, Klay makes 8/10 of his shots. When Steph is off the floor, Klay makes 6/10 of his shots. There is a ¾ chance that Steph is on the floor at any given moment. As a reminder, all branches stemming from a node sum to 1.
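One way to picture the tree in code is as a nested dictionary. This is only a sketch: the `tree` layout and the key names are hypothetical choices of mine, not a standard structure. It checks the “branches sum to 1” rule at every node and then reads one conditional probability straight off the tree.

```python
from fractions import Fraction

# Hypothetical layout: tree[steph_state] = (P(steph_state), {klay_outcome: P(outcome | steph_state)})
tree = {
    "on":  (Fraction(3, 4), {"makes": Fraction(8, 10), "misses": Fraction(2, 10)}),
    "off": (Fraction(1, 4), {"makes": Fraction(6, 10), "misses": Fraction(4, 10)}),
}

# Branches leaving the root (Steph on / off) must sum to 1.
assert sum(p_state for p_state, _ in tree.values()) == 1

# Branches leaving each Steph node (Klay makes / misses) must also sum to 1.
for p_state, klay_branches in tree.values():
    assert sum(klay_branches.values()) == 1

# Reading a branch off the tree: Klay makes a shot when Steph is off the floor.
p_make_given_off = tree["off"][1]["makes"]
print(p_make_given_off)  # 3/5, i.e. 6/10
```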

Now we can answer all sorts of questions with the above graph.

For instance, what is the probability that Klay misses a shot when Steph is on the floor? (Answer: 2/10)

What is the probability that Klay makes a shot when Steph is off the floor? (Answer: 6/10)

These kinds of probabilities are called “conditional probabilities”, because some “event” needs to happen before we answer the “probability” of something happening. In the 2 examples above, that “event” is whether or not Steph is on the floor, and the “probability” that we’re asking is whether Klay makes his shot.

We can represent a conditional probability like this:

P(k = \text{makes} \:|\: s = \text{on the floor})

That is read: “What is the probability that Klay makes his shot given that Steph is on the floor”. A shorter version is this:

P(k|s)

This is another way of saying “What is the probability of k given s”. As you can guess, the first parameter is the outcome whose probability we want, the “|” translates to “given”, and the last parameter is the event that must take place. Don’t worry if that doesn’t make sense – you get used to it the more you see it. It’s just notation that mathematicians have agreed upon.

Another way to look at conditional probabilities is using a Venn diagram.

We notice from the diagram that “Steph being on the floor” is the “given” part of the conditional probability – this is everything inside the red circle.

Using this Venn diagram, we can come up with an equation for

P(k|s)

Where k = Klay makes a shot, and s = Steph is on the floor. In other words, again, it is the probability that Klay makes a shot given that Steph is on the floor.

We know that all the possible outcomes of the probability must include “Steph being on the floor”. This is represented by the red circle above. We see that Klay making a shot AND Steph being on the floor is represented by the overlap of the red and blue circles, or the area that is shaded in purple.

The purple area is also known as the “intersection between Klay making a shot and Steph being on the floor”. We can represent it in this notation:

P(k \cap s)

This is read as the probability of “Klay making a shot and Steph being on the floor”, or “the probability of k intersect s, where k = Klay making a shot, and s = Steph being on the floor”.

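Once we know Steph is on the floor, the red circle becomes our whole world of possibilities, and the purple overlap is the part of that world where Klay makes his shot. So the probability of Klay making a shot given that Steph is on the floor is equal to the probability of Klay making a shot AND Steph being on the floor, over the probability of Steph being on the floor. Mathematically, we can say: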

P(k|s)=\cfrac{P(k \: \cap \: s)}{P(s)}

Where k = Klay makes a shot, and s = Steph is on the floor.
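To make the Venn diagram concrete, here is a small sketch that counts shots instead of shading circles. The shot log below is made up by me (40 imaginary shots chosen to match the tree diagram), so treat the counts as an assumption rather than real data.

```python
from fractions import Fraction

# Hypothetical shot log: (steph_on_floor, klay_made_shot, number_of_shots)
shot_counts = [
    (True,  True,  24),   # purple overlap: Steph on the floor AND Klay makes it
    (True,  False, 6),
    (False, True,  6),
    (False, False, 4),
]

total = sum(n for _, _, n in shot_counts)  # 40 shots in all

# P(s): everything inside the "Steph is on the floor" circle
p_s = Fraction(sum(n for on, _, n in shot_counts if on), total)

# P(k ∩ s): only the overlap
p_k_and_s = Fraction(sum(n for on, made, n in shot_counts if on and made), total)

# P(k|s) = P(k ∩ s) / P(s)
p_k_given_s = p_k_and_s / p_s
print(p_k_given_s)  # 4/5, i.e. 8/10
```

Notice that the total of 40 cancels out: the answer is just 24 makes out of the 30 shots taken with Steph on the floor.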

Let’s suppose we want to answer the question, “what is the probability that Klay makes a shot AND Steph is on the floor ?”

Rearranging the equation, we get:

P(k \cap s) = P(k|s) \cdot P(s)

As you can see, it’s as simple as multiplying “the probability of Klay making a shot given Steph is on the floor” by “the probability that Steph is on the floor”.

Looking at the tree diagram from earlier, let’s answer the question, “what is the probability that Klay makes a shot AND Steph is on the floor?”:

  • The probability of Klay making a shot given Steph is on the floor = 8/10
  • The probability that Steph is on the floor = ¾.
  • Multiplying these two together gives you (8/10) * (3/4) = 24/40, or 3/5

So there is a 3/5 chance that Klay makes a shot and Steph is on the floor.
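The same “multiply along the branches” trick works for every path through the tree. Here is a sketch that does it for all four paths; the `joint` dictionary and its keys are names I made up for illustration. As a sanity check, the four results should add up to 1, since together they cover everything that can happen.

```python
from fractions import Fraction

p_s = Fraction(3, 4)                # P(s): Steph is on the floor
p_k_given_s = Fraction(8, 10)       # P(k|s): Klay makes it, given Steph is on
p_k_given_not_s = Fraction(6, 10)   # P(k|s'): Klay makes it, given Steph is off

# Joint probability of each path = P(first branch) * P(second branch | first branch)
joint = {
    ("on", "makes"):   p_s * p_k_given_s,
    ("on", "misses"):  p_s * (1 - p_k_given_s),
    ("off", "makes"):  (1 - p_s) * p_k_given_not_s,
    ("off", "misses"): (1 - p_s) * (1 - p_k_given_not_s),
}

print(joint[("on", "makes")])   # 3/5
print(sum(joint.values()))      # 1
```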

What if we wanted to answer the question, what is the probability that Steph is on the floor, given Klay makes a shot? This is the reverse of the conditional probability we had earlier. We can represent this probability like this:

P(s|k)

We know that this must equal the probability of Steph being on the floor and Klay making a shot divided by the probability of Klay making a shot. In other words,

P(s|k) = \cfrac{P(s \: \cap \: k)}{P(k)}

How do we find the probability that Klay makes a shot? We need to find the probability that Klay makes a shot and Steph is on the floor, and add it to the probability that Klay makes a shot and Steph is not on the floor. We already found the probability that Klay makes the shot and Steph is on the floor – which is 3/5.

To find the probability that Klay makes a shot and Steph is not on the floor, we can do a similar calculation using the tree diagram. Let’s also introduce one final piece of notation. The “apostrophe” next to a variable means the complement of that event, that is, the event not happening. For instance, if s means that Steph is on the floor, then s’ means that Steph is not on the floor.

P(s')

This is read “the probability of Steph not being on the floor”, which the tree diagram tells us is ¼.

Therefore, the probability of Klay making a shot and Steph not being on the floor is:

P(k \cap s') = P(s') \cdot P(k|s')
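Plugging in the numbers from the tree diagram, where P(s') = 1/4 and P(k|s') = 6/10, this works out to:

P(k \cap s') = \cfrac{1}{4} \cdot \cfrac{6}{10} = \cfrac{6}{40} = \cfrac{3}{20}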

To get the probability of Klay making a shot, we add this to the probability of Klay making a shot and Steph being on the floor:

P(k) = P(k \cap s) + P(k \cap s') = P(k|s) \cdot P(s) + P(k|s') \cdot P(s')
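As a quick check of this sum, here is a sketch with a small helper function. The name `total_probability` and its parameters are my own; it assumes the binary case from this post, where Steph is either on the floor or he isn’t.

```python
from fractions import Fraction

def total_probability(p_k_given_s, p_s, p_k_given_not_s):
    """P(k) = P(k|s)*P(s) + P(k|s')*P(s'), where P(s') = 1 - P(s)."""
    p_not_s = 1 - p_s
    return p_k_given_s * p_s + p_k_given_not_s * p_not_s

# Numbers from the tree diagram: P(k|s) = 8/10, P(s) = 3/4, P(k|s') = 6/10
p_k = total_probability(Fraction(8, 10), Fraction(3, 4), Fraction(6, 10))
print(p_k)  # 3/4
```

So overall, Klay makes 3/4 of his shots, which sits between his 6/10 without Steph and his 8/10 with Steph, weighted toward the latter because Steph is on the floor most of the time.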

Therefore, the probability of Steph being on the floor given that Klay has made a shot is:

P(s|k) = \cfrac{P(s\:\cap\:k)}{P(k|s) \cdot P(s) + P(k|s') \cdot P(s')}

Furthermore, we can see that:

P(s \cap k) = P (k \cap s)

Where we know that the right hand side is really:

P(k \cap s) = P(k|s) \cdot P(s)

Finally, we can rewrite the probability of Steph being on the floor given that Klay has made a shot:

P(s|k) = \cfrac{P(k|s) \cdot P(s)}{P(k|s) \cdot P(s) + P(k|s') \cdot P(s')}

And now, we’re done!

Wait a sec, you protest. Where did Bayes theorem come in ?

We actually derived it much earlier, and then went on to show the “extended” form when the events are binary (here, our events are binary because either Steph is on the floor or he isn’t, and either Klay makes a shot or he doesn’t).

The whole point of Bayes theorem is to get from

P(k|s)

to

P(s|k)

We showed this earlier:

P(s|k) = \cfrac{P(s \cap k)}{P(k)}

Which could be simplified to:

P(s|k) = \cfrac{P(k|s) \cdot P(s)}{P(k)}

That is Bayes theorem. When the events are binary, then we can show what P(k) really is:

P(s|k) = \cfrac{P(k|s) \cdot P(s)}{P(k|s) \cdot P(s) + P(k|s') \cdot P(s')}

That wasn’t so bad, was it? See, I told you the hardest part was learning the notation!

Let’s actually find this using the tree diagram!

P(s|k) = \cfrac{8/10 \cdot 3/4}{8/10 \cdot 3/4 + 6/10 \cdot 1/4} = \cfrac{3/5}{3/5 + 3/20} = \cfrac{3/5}{15/20} = \cfrac{3/5}{3/4} = 4/5

There’s a 4/5, or 80%, chance that Steph is on the floor given that Klay made a shot!
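If you’d rather not take the algebra on faith, here is a sketch that computes the answer two ways: once exactly with the binary form of the formula, and once by simulating a lot of imaginary shots. The function names `bayes_binary` and `simulate`, the trial count, and the seed are all my own choices, and the simulation assumes every shot is independent of the others.

```python
import random
from fractions import Fraction

def bayes_binary(p_k_given_s, p_s, p_k_given_not_s):
    """P(s|k) = P(k|s)*P(s) / (P(k|s)*P(s) + P(k|s')*P(s'))."""
    numerator = p_k_given_s * p_s
    denominator = numerator + p_k_given_not_s * (1 - p_s)
    return numerator / denominator

def simulate(trials=100_000, seed=0):
    """Estimate P(s|k) by counting: of all made shots, how many had Steph on the floor?"""
    rng = random.Random(seed)
    makes = 0
    makes_with_steph = 0
    for _ in range(trials):
        steph_on = rng.random() < 0.75       # Steph is on the floor 3/4 of the time
        p_make = 0.8 if steph_on else 0.6    # Klay's accuracy depends on Steph
        if rng.random() < p_make:
            makes += 1
            if steph_on:
                makes_with_steph += 1
    return makes_with_steph / makes

exact = bayes_binary(Fraction(8, 10), Fraction(3, 4), Fraction(6, 10))
estimate = simulate()

print(exact)      # 4/5
print(estimate)   # should land close to 0.8
```

The simulation never uses Bayes’ formula at all. It just keeps the made shots and counts how often Steph was there, which is exactly what “given that Klay made a shot” means.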