Expectation and Variance
We often want to distill a random variable's distribution down to a single number. For example, consider the height of an individual selected uniformly at random from a given population. This is a random variable, and communicating its distribution would involve communicating the heights of every person in the population. However, we can summarize the distribution by reporting an average height: we add up the heights of the people in the population and divide by the number of people.
If the random individual is selected according to some non-uniform probability distribution on the population, then it makes sense to calculate a weighted average instead, weighting each height by the probability of selecting that individual.
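The expectation $\mathbb{E}[X]$ (or mean) of a random variable $X$ is the probability-weighted average of $X$:

$$\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)\, m(\omega),$$

where $m$ is the probability mass function on the sample space $\Omega$.

For example, the expected number of heads in two fair coin flips is

$$\mathbb{E}[X] = \tfrac{1}{4}\cdot 2 + \tfrac{1}{4}\cdot 1 + \tfrac{1}{4}\cdot 1 + \tfrac{1}{4}\cdot 0 = 1.$$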
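As a sanity check on the definition, the two-coin-flip expectation can be computed by enumerating the sample space directly. The following is a minimal sketch; the names `omega`, `mass`, and `heads` are illustrative choices rather than notation from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: each outcome is a pair like ("H", "T").
omega = list(product("HT", repeat=2))

# Uniform probability mass: each of the four outcomes has mass 1/4.
mass = {outcome: Fraction(1, 4) for outcome in omega}

def heads(outcome):
    """Random variable X: the number of heads in the outcome."""
    return outcome.count("H")

# Probability-weighted average of X over the sample space.
expected_heads = sum(heads(w) * mass[w] for w in omega)
print(expected_heads)  # 1
```

Using `Fraction` keeps the arithmetic exact, so the result is the integer 1 rather than a floating-point approximation.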
There are two common ways of interpreting expected value.
- The expectation may be thought of as the value of a random game with payout $X$. According to this interpretation, you should be willing to pay anything less than $1 to play the game where you get a dollar for each head in two fair coin flips. For more than $1 you should be unwilling to play the game, and at $1 you should be indifferent.
- The second way of thinking about expected value is as a long-run average. If you play the dollar-per-head two-coin-flip game a very large number of times, then your average payout per play is very likely to be close to $1.
We can test this second interpretation out:
Use the Python expression

    sum(randint(0,2) + randint(0,2) for _ in range(10**6))/10**6

or the Julia expression

    mean(rand(0:1) + rand(0:1) for k=1:10^6)

to play the dollar-per-head two-coin-flip game a million times and calculate the average payout in those million runs.
How close to 1 is the result typically? Choose the best answer.
Python:

    from numpy.random import randint
    sum(randint(0,2) + randint(0,2) for _ in range(10**6))/10**6

Julia:

    using Statistics
    mean(rand(0:1) + rand(0:1) for k=1:10^6)
Solution. Running the code several times, we see that the error is seldom as large as 0.01 or as small as 0.0000001. So the correct answer choice is the third one.
We will see that this second interpretation is actually a theorem in probability, called the law of large numbers. In the meantime, however, this interpretation gives us a useful tool for investigation: if a random variable is easy to simulate, then we can sample from it many times and calculate the average of the resulting samples. This will not give us the expected value exactly, but we can get as close as desired by using sufficiently many samples. This is called the Monte Carlo method of approximating the expectation of a random variable.
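The Monte Carlo idea described above can be sketched as a small reusable function. This is an illustrative sketch using only the standard library; the names `monte_carlo_mean` and `two_flip_payout` and the fixed seed are assumptions made for the example, not part of the original text.

```python
import random

def monte_carlo_mean(sample, num_samples, seed=0):
    """Estimate E[X] by averaging num_samples independent draws of sample(rng)."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate
    total = 0.0
    for _ in range(num_samples):
        total += sample(rng)
    return total / num_samples

def two_flip_payout(rng):
    """Dollar-per-head payout for two fair coin flips."""
    return rng.randint(0, 1) + rng.randint(0, 1)

# The estimate should settle closer to 1 as the number of samples grows.
for n in (100, 10_000, 100_000):
    print(n, monte_carlo_mean(two_flip_payout, n))
```

Passing the sampler as a function means the same averaging loop can be reused for any random variable that is easy to simulate.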
Use a Monte Carlo simulation to estimate the expectation of $X/Y$, where $X$ and $Y$ are independent die rolls.
Solution. In Python, after running

    from numpy.random import randint

the expression

    sum(randint(1,7)/randint(1,7) for i in range(10_000_000))/10_000_000

returns approximately 1.43. The actual mean is

    sum(x/y for x in range(1,7) for y in range(1,7))/36

which is $343/240 \approx 1.4292$. So we can say that the Monte Carlo result with 10 million trials is quite close to the correct value.

In Julia, with the Statistics package loaded (using Statistics), the expression

    mean(rand(1:6)/rand(1:6) for i=1:10^8)

returns approximately 1.43. The actual mean is

    mean(x/y for x=1:6, y=1:6)

which is again $343/240 \approx 1.4292$, so we can say that the Monte Carlo result with 100 million trials is very close to the correct value.
The following exercise confirms an intuitive fact about expectation: a random variable which is always larger than another has a larger mean. We will state this idea with "larger" replaced by its weak version "at least as large as".
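Explain why $\mathbb{E}[X] \le \mathbb{E}[Y]$ if $X(\omega) \le Y(\omega)$ for all $\omega \in \Omega$.
Solution. If $X(\omega) \le Y(\omega)$ for all $\omega \in \Omega$, then, since every probability mass $m(\omega)$ is nonnegative, we have

$$\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)\, m(\omega) \le \sum_{\omega \in \Omega} Y(\omega)\, m(\omega) = \mathbb{E}[Y].$$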
Expectation and distribution
Although the definition involves the probability space $\Omega$, we can also write a formula for expectation in terms of the probability mass function of the distribution of $X$:
The expectation of a discrete random variable $X$ is equal to

$$\mathbb{E}[X] = \sum_{x \in \mathbb{R}} x\, \mathbb{P}(X = x).$$
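The idea is that the given formula is just a rearrangement of the terms in the definition of expectation. Let's begin by considering an example. Suppose $\Omega = \{\omega_1, \omega_2, \omega_3\}$ with probability mass function $m$ satisfying $m(\omega_1) = p_1$, $m(\omega_2) = p_2$, and $m(\omega_3) = p_3$. Suppose $X(\omega_1) = X(\omega_2) = a$ and $X(\omega_3) = b$, with $a \ne b$. Then

$$\mathbb{E}[X] = p_1 a + p_2 a + p_3 b.$$

We can group the first two terms together to get

$$\mathbb{E}[X] = (p_1 + p_2)\, a + p_3\, b.$$

This expression is the one we would get if we wrote out

$$\sum_{x \in \mathbb{R}} x\, \mathbb{P}(X = x),$$

since $\mathbb{P}(X = a) = p_1 + p_2$ and $\mathbb{P}(X = b) = p_3$.
Therefore, we can see that the two sides are the same.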
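Let's write this idea down in general form. We group terms on the right-hand side in the formula $\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)\, m(\omega)$ according to the value of $X(\omega)$:

$$\mathbb{E}[X] = \sum_{x \in \mathbb{R}} \ \sum_{\omega \,:\, X(\omega) = x} X(\omega)\, m(\omega).$$

Then we can replace $X(\omega)$ with $x$ and pull it out of the inside sum to get

$$\mathbb{E}[X] = \sum_{x \in \mathbb{R}} x \sum_{\omega \,:\, X(\omega) = x} m(\omega).$$

Since $\sum_{\omega \,:\, X(\omega) = x} m(\omega)$ is equal to $\mathbb{P}(X = x)$, we get

$$\mathbb{E}[X] = \sum_{x \in \mathbb{R}} x\, \mathbb{P}(X = x).$$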
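The regrouping argument can be checked numerically on a small probability space. The sketch below uses a hypothetical three-outcome space (the masses 1/6, 2/6, 3/6 and the values 5, 5, 7 are chosen purely for illustration) and computes the mean both ways.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical three-point probability space (values chosen for illustration).
mass = {"w1": Fraction(1, 6), "w2": Fraction(2, 6), "w3": Fraction(3, 6)}
X = {"w1": 5, "w2": 5, "w3": 7}

# Definition: sum over outcomes w of X(w) * m(w).
mean_by_outcomes = sum(X[w] * mass[w] for w in mass)

# Distribution of X: group outcomes by the value X takes on them.
pmf = defaultdict(Fraction)
for w, p in mass.items():
    pmf[X[w]] += p

# Distribution formula: sum over values x of x * P(X = x).
mean_by_values = sum(x * p for x, p in pmf.items())

print(mean_by_outcomes, mean_by_values)
```

The loop that builds `pmf` is exactly the grouping step in the proof: outcomes sharing a value of $X$ have their masses added together.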
The expectation of a random variable need not be finite or even well-defined. Show that the expectation of the random variable which assigns a probability mass of $2^{-n}$ to the point $2^n$ (for all $n \ge 1$) is not finite.
Consider a random variable $X$ whose distribution assigns a probability mass of $2^{-n-1}$ to each point $2^n$ for $n \ge 1$ and a probability mass of $2^{-n-1}$ to $-2^n$ for each $n \ge 1$. Show that $\mathbb{E}[X]$ is not well-defined. (Note: a sum $\sum_x f(x)$ is not defined if $\sum_{x \,:\, f(x) > 0} f(x)$ and $\sum_{x \,:\, f(x) < 0} f(x)$ are equal to $\infty$ and $-\infty$, respectively.)
Solution. We multiply the probability mass at each point by the location and sum to get

$$\sum_{n=1}^{\infty} 2^n \cdot 2^{-n} = 1 + 1 + 1 + \cdots = \infty.$$

For the second distribution, the positive and negative parts of the sum are both infinite for the same reason. Therefore, the sum does not make sense and the mean is not well-defined.
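We can also work out the expectation of a function of two discrete random variables in terms of their joint distribution:
If $f: \mathbb{R}^2 \to \mathbb{R}$, and $X$ and $Y$ are discrete random variables defined on the same probability space, then

$$\mathbb{E}[f(X,Y)] = \sum_{(x,y) \in \mathbb{R}^2} f(x,y)\, \mathbb{P}(X = x \text{ and } Y = y).$$

Proof. We use the same idea we used in the proof of the expectation formula: group terms in the definition of expectation according to the value of the pair $(X(\omega), Y(\omega))$. Writing $m$ for the probability mass function on $\Omega$, we get

$$\mathbb{E}[f(X,Y)] = \sum_{\omega \in \Omega} f(X(\omega), Y(\omega))\, m(\omega) = \sum_{(x,y) \in \mathbb{R}^2} \ \sum_{\omega \,:\, X(\omega) = x,\ Y(\omega) = y} f(x,y)\, m(\omega) = \sum_{(x,y) \in \mathbb{R}^2} f(x,y)\, \mathbb{P}(X = x \text{ and } Y = y).$$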
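To make the two-variable formula concrete, here is a short sketch that evaluates it exactly for two independent fair dice with $f(x, y) = x/y$. The helper name `expectation_of_function` and the dictionary representation of the joint distribution are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

def expectation_of_function(f, joint_pmf):
    """Sum f(x, y) * P(X = x and Y = y) over all pairs in the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in joint_pmf.items())

# Joint pmf of two independent fair dice: every pair has probability 1/36.
dice_joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

ratio_mean = expectation_of_function(lambda x, y: Fraction(x, y), dice_joint)
print(ratio_mean, float(ratio_mean))
```

The exact value agrees with the Monte Carlo estimate of roughly 1.43 from the die-roll exercise earlier in the section.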
We can use this theorem to show that expectation distributes across multiplication for independent random variables:
Exercise (independence product formula)
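Show that $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$ if $X$ and $Y$ are independent random variables.
Solution. Using the definition of independence, we have

$$\mathbb{E}[XY] = \sum_{(x,y) \in \mathbb{R}^2} xy\, \mathbb{P}(X = x \text{ and } Y = y) = \sum_{x \in \mathbb{R}} \sum_{y \in \mathbb{R}} xy\, \mathbb{P}(X = x)\, \mathbb{P}(Y = y) = \left(\sum_{x \in \mathbb{R}} x\, \mathbb{P}(X = x)\right)\left(\sum_{y \in \mathbb{R}} y\, \mathbb{P}(Y = y)\right) = \mathbb{E}[X]\,\mathbb{E}[Y].$$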
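The product formula can be verified exactly for two independent fair dice, and contrasted with a dependent pair where it fails. This is an illustrative sketch; the variable names are chosen for the example.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die

# E[X] for a single die.
mean_die = sum(x * p for x in faces)

# E[XY] from the joint distribution of two independent dice,
# where P(X = x and Y = y) = P(X = x) * P(Y = y).
mean_product = sum(x * y * p * p for x in faces for y in faces)

# For contrast, take Y = X (fully dependent): E[X * X] differs from E[X]^2.
mean_square = sum(x * x * p for x in faces)

print(mean_product, mean_die**2, mean_square)
```

The first two printed values coincide, while the third is strictly larger, which shows that independence is doing real work in the formula.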
The expectation of a random variable gives us some coarse information about where on the number line the random variable's probability mass is located. The variance gives us some information about how widely the probability mass is spread around its mean. A random variable whose distribution is highly concentrated about its mean will have a small variance, and a random variable which is likely to be very far from its mean will have a large variance. We define the variance of a random variable $X$ to be the average squared distance from $X$ to its mean:
The variance of a random variable $X$ is defined to be

$$\operatorname{Var} X = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big].$$

The standard deviation $\sigma(X)$ of $X$ is the square root of the variance:

$$\sigma(X) = \sqrt{\operatorname{Var} X}.$$
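As a rough illustration of these definitions, the sketch below computes the variance and standard deviation for a uniform selection from a list. The two sample lists are hypothetical and chosen only to contrast a concentrated distribution with a spread-out one that has the same mean.

```python
import math

def mean(values):
    """Average of the values, i.e. the mean of a uniform selection from them."""
    return sum(values) / len(values)

def variance(values):
    """Average squared distance from the mean (uniform weights)."""
    mu = mean(values)
    return mean([(v - mu) ** 2 for v in values])

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

tight = [4.9, 5.0, 5.1]    # concentrated near the mean
spread = [0.0, 5.0, 10.0]  # same mean, far more dispersed

print(variance(tight), standard_deviation(tight))
print(variance(spread), standard_deviation(spread))
```

Note that `variance` divides by the number of values rather than by one less, matching the definition as an expectation instead of a sample-variance estimator.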
Consider a random variable which is obtained by making a selection from the list

    [0.245, 0.874, 0.998, 0.567, 0.482]

uniformly at random. Make a rough estimate of the mean and variance of this random variable just from looking at the number line. Then use Python to calculate the mean and variance exactly to see how close your estimates were.
import numpy as np
Solution. My estimate of the mean and variance are and ,
Calculating the mean exactly using

    A = [0.245, 0.874, 0.998, 0.567, 0.482]
    m = np.mean(A)

we get a value of 0.6332. Calculating the variance exactly using

    np.mean([(a-m)**2 for a in A])

we get a value of 0.074. Therefore, my estimate was a little high.
Consider the following game. We begin by picking a number in $\{0, \tfrac{1}{1000}, \tfrac{2}{1000}, \ldots, \tfrac{1000}{1000}\}$ with uniform probability. If that number is less than $1$, we pick another number from the same distribution and add it to the first. We repeat this procedure until the running sum exceeds $1$. Let $X$ be the random variable whose value is the number of draws needed to end the game. Use a simulation to approximate the expected value and variance of $X$. Include your code in your answer as well as some discussion of your results.
Hint: in Julia, rand(0:1000)/1000 returns a sample from the desired distribution; in Python, np.random.randint(0,1001)/1000 does the same. Also, it's a good idea to wrap a single run of the game into a zero-argument function.
import numpy as np
Solution. We define a function runs_till_over which plays the game once, and we record the result of the game over a million runs. We estimate the mean as the mean of the resulting list, and we estimate the variance using the mean squared deviation of the results from the estimated mean.

Python:

    import numpy as np

    def runs_till_over():
        s = 0    # running sum
        ctr = 0  # number of draws so far
        while s < 1.0:
            s += np.random.randint(0,1001)/1000
            ctr += 1
        return ctr

    A = [runs_till_over() for _ in range(1_000_000)]
    μ = np.mean(A)
    var = np.mean([(a-μ)**2 for a in A])
    μ, var

Julia:

    using Statistics

    function runs_till_over()
        s = 0    # running sum
        ctr = 0  # number of draws so far
        while s < 1.0
            s += rand(0:1000)/1000
            ctr += 1
        end
        ctr
    end

    A = [runs_till_over() for _ in 1:1_000_000]
    μ = mean(A)
    variance = mean((a-μ)^2 for a in A)
    μ, variance
We get a mean of about 2.72 and a variance of about 0.77.
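We can use linearity of expectation to rewrite the formula for variance in a simpler form:

$$\operatorname{Var} X = \mathbb{E}\big[X^2 - 2X\,\mathbb{E}[X] + \mathbb{E}[X]^2\big] = \mathbb{E}[X^2] - 2\,\mathbb{E}[X]^2 + \mathbb{E}[X]^2 = \mathbb{E}[X^2] - \mathbb{E}[X]^2.$$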
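The simplified form $\operatorname{Var} X = \mathbb{E}[X^2] - \mathbb{E}[X]^2$ can be checked against the original definition on the five-element list from the earlier exercise. This is a small sketch; the variable names are illustrative.

```python
import math

A = [0.245, 0.874, 0.998, 0.567, 0.482]

def mean(values):
    """Average of the values (uniform weights)."""
    return sum(values) / len(values)

mu = mean(A)

# Definition: average squared distance from the mean.
var_definition = mean([(a - mu) ** 2 for a in A])

# Simplified form: mean of the squares minus the square of the mean.
var_shortcut = mean([a ** 2 for a in A]) - mu ** 2

print(var_definition, var_shortcut)
```

The two values agree up to floating-point rounding. The simplified form subtracts two nearly equal numbers when the variance is small relative to the mean, so the definition can be the numerically safer choice in that situation.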
We can use this formula to show how variance interacts with linear operations:
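Show that variance satisfies the properties

$$\operatorname{Var}(aX) = a^2 \operatorname{Var} X$$

if $a$ is a real number and $X$ is a random variable, and

$$\operatorname{Var}(X + Y) = \operatorname{Var} X + \operatorname{Var} Y$$

if $X$ and $Y$ are independent random variables.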
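Proof. The first part of the statement follows easily from linearity of expectation:

$$\operatorname{Var}(aX) = \mathbb{E}[a^2 X^2] - \mathbb{E}[aX]^2 = a^2\big(\mathbb{E}[X^2] - \mathbb{E}[X]^2\big) = a^2 \operatorname{Var} X.$$

Since $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$ by linearity, we have

$$\operatorname{Var}(X + Y) = \mathbb{E}\big[(X + Y)^2\big] - \big(\mathbb{E}[X] + \mathbb{E}[Y]\big)^2.$$

Rearranging and using linearity of expectation, we get

$$\operatorname{Var}(X + Y) = \operatorname{Var} X + \operatorname{Var} Y + 2\big(\mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]\big).$$

The desired result follows because $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$ if $X$ and $Y$ are independent, by the independence product formula.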
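Both properties can be confirmed exactly for fair dice. The sketch below represents a distribution as a dictionary from values to probabilities (a representation chosen for this example) and builds the distributions of $aX$ with $a = 3$ and of $X + Y$ for independent $X$ and $Y$.

```python
from fractions import Fraction
from itertools import product

def variance(pmf):
    """Var X = E[X^2] - E[X]^2 for a pmf given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum(x * x * p for x, p in pmf.items()) - mean**2

die = {x: Fraction(1, 6) for x in range(1, 7)}

# pmf of aX for a = 3: same probabilities, scaled values.
a = 3
scaled = {a * x: p for x, p in die.items()}

# pmf of X + Y for independent dice: accumulate products of probabilities.
total = {}
for (x, px), (y, py) in product(die.items(), repeat=2):
    total[x + y] = total.get(x + y, 0) + px * py

print(variance(die), variance(scaled), variance(total))
```

The printed values show the variance multiplying by $a^2 = 9$ under scaling and doubling when two independent copies are added.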
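Consider the distribution which assigns a probability mass of $c/n^3$ to each integer point $n \ge 1$, where $c$ is equal to the reciprocal of $\sum_{n=1}^{\infty} \frac{1}{n^3}$.
Show that this distribution has a finite mean but not a finite variance.
Solution. Let $X$ be a random variable with this distribution. Then

$$\mathbb{E}[X] = \sum_{n=1}^{\infty} n \cdot \frac{c}{n^3} = c \sum_{n=1}^{\infty} \frac{1}{n^2}.$$

Since the sum on the right converges by the $p$-test, it follows that $\mathbb{E}[X]$ is finite. On the other hand,

$$\mathbb{E}[X^2] = \sum_{n=1}^{\infty} n^2 \cdot \frac{c}{n^3} = c \sum_{n=1}^{\infty} \frac{1}{n}$$

does not converge, because the sum on the right is the harmonic series. Therefore $\operatorname{Var} X = \mathbb{E}[X^2] - \mathbb{E}[X]^2$ is infinite.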
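Partial sums give a numerical feel for this contrast. The sketch below assumes the probability mass at $n$ is proportional to $1/n^3$ and omits the normalizing constant, since it does not affect convergence; the function name `partial_sums` is hypothetical.

```python
def partial_sums(num_terms):
    """Unnormalized partial sums of n * n**-3 and n**2 * n**-3 up to num_terms."""
    first_moment = sum(1 / n**2 for n in range(1, num_terms + 1))
    second_moment = sum(1 / n for n in range(1, num_terms + 1))
    return first_moment, second_moment

# The first column of sums levels off; the second keeps growing like log(num_terms).
for num_terms in (10, 1_000, 100_000):
    print(num_terms, partial_sums(num_terms))
```

A finite computation cannot prove divergence, so this only illustrates the behavior that the $p$-test and the harmonic series establish rigorously.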