2 Foundations
2.1 Probability Theory
- Ω : A space of possible outcomes.
- S : A set of measurable events to which we are willing to assign probabilities. ∀α∈S ⋅ α⊆Ω
Probability theory requires that S satisfy three basic properties:
- ∅∈S,Ω∈S
- Closed under union: ∀α,β∈S ⋅ α∪β∈S
- Closed under complementation: ∀α∈S ⋅ (Ω−α)∈S
The latter two properties imply that S is also closed under other Boolean operations, such as intersection and set difference.
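For example, closure under intersection follows from the union and complementation properties via De Morgan's law: α∩β = Ω − ((Ω−α) ∪ (Ω−β)).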
Definition 2.1 (Probability Distribution) A probability distribution P over (Ω,S) is a mapping from events in S to real values that satisfies the following conditions:
- Probabilities are nonnegative: ∀α∈S ⋅ P(α)≥0.
- The trivial event Ω has the maximal possible probability of 1: P(Ω)=1.
- Probability is additive over disjoint events: ∀α,β∈S ⋅ α∩β=∅ ⇒ P(α∪β)=P(α)+P(β).
This definition implies:
- P(∅)=0.
- P(α∪β)=P(α)+P(β)−P(α∩β).
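As a quick sanity check of Definition 2.1 and the two implied facts above, here is a minimal Python sketch; the fair six-sided die and the particular events are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

# Outcome space of a fair six-sided die (illustrative assumption).
omega = frozenset(range(1, 7))

def P(event):
    """Probability of an event (a subset of omega) under the uniform distribution."""
    assert event <= omega
    return Fraction(len(event), len(omega))

alpha = frozenset({1, 2, 3})   # "roll at most 3"
beta = frozenset({2, 4, 6})    # "roll is even"

assert P(omega) == 1                                              # P(Ω) = 1
assert P(frozenset()) == 0                                        # P(∅) = 0
assert P(alpha | beta) == P(alpha) + P(beta) - P(alpha & beta)    # inclusion-exclusion
```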
The conditional probability of β given α is defined as:
P(β∣α) = P(α∩β) / P(α)
- The conditional probability is not defined when P(α)=0
- The conditional probability given an event (say α) satisfies the properties of Definition 2.1, and thus it is a probability distribution in its own right. Hence, we can think of the conditioning operation as taking one distribution and returning another over the same probability space.
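A small sketch of this point, again on a fair die (an illustrative assumption): conditioning on α simply renormalizes probability mass to α, and the result behaves like an ordinary distribution.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # fair six-sided die (illustrative assumption)

def P(event):
    return Fraction(len(event & omega), len(omega))

def P_given(event, alpha):
    """P(event | alpha) = P(event ∩ alpha) / P(alpha); undefined when P(alpha) = 0."""
    assert P(alpha) > 0
    return P(event & alpha) / P(alpha)

alpha = frozenset({2, 4, 6})   # condition on "the roll is even"
beta = frozenset({1, 2, 3})    # "roll at most 3"

assert P_given(omega, alpha) == 1               # P(Ω | α) = 1: still a distribution
assert P_given(beta, alpha) == Fraction(1, 3)   # only outcome 2 survives the conditioning
```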
The chain rule of conditional probabilities:
- P(α∩β)=P(α)P(β∣α).
- P(α1∩α2⋯∩αk)=P(α1)P(α2∣α1)P(α3∣α1∩α2)⋯P(αk∣α1∩α2⋯∩αk−1)
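A worked instance of the chain rule, sketched in Python; the card-drawing scenario and the numbers are illustrative assumptions.

```python
from fractions import Fraction

# Probability of drawing three aces in a row from a shuffled 52-card deck:
# P(α1 ∩ α2 ∩ α3) = P(α1) · P(α2 | α1) · P(α3 | α1 ∩ α2)
p_a1 = Fraction(4, 52)             # first card is an ace
p_a2_given_a1 = Fraction(3, 51)    # second is an ace, given the first was
p_a3_given_a1a2 = Fraction(2, 50)  # third is an ace, given the first two were

print(p_a1 * p_a2_given_a1 * p_a3_given_a1a2)   # 1/5525
```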
Bayes' rule
P(α∣β) = P(β∣α)P(α) / P(β)
P(α∣β∩γ) = P(β∣α∩γ)P(α∣γ) / P(β∣γ)
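A standard Bayes' rule computation, sketched in Python; the diagnostic-test numbers are hypothetical and only illustrate how the rule inverts a conditional.

```python
from fractions import Fraction

# Hypothetical diagnostic test (all numbers are assumptions for illustration):
p_d = Fraction(1, 100)               # P(α): prior probability of the disease
p_pos_given_d = Fraction(9, 10)      # P(β | α): probability of a positive test if diseased
p_pos_given_not_d = Fraction(1, 10)  # P(β | Ω−α): false-positive rate

# P(β) via the law of total probability.
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' rule: P(α | β) = P(β | α) P(α) / P(β)
print(p_pos_given_d * p_d / p_pos)   # 1/12, i.e. about 0.083
```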
A random variable is a way of reporting an attribute of the outcome: formally, it is a function that associates a value with each outcome in Ω. Common kinds include:
- Categorical random variables that take one of a few values.
- Random variables that take integer values.
- Random variables that take real values.
Val(X): the set of values that a random variable X can take.
Marginal distribution: A probability distribution over one random variable.
Joint distribution: A probability distribution over two or more random variables.
Conditional distribution: P(X∣Y) denotes a set of conditional probability distributions, one distribution over X for each value y∈Val(Y).
Bayes' rule in term of conditional probability distributions:
P(X∣Y) = P(X)P(Y∣X) / P(Y)
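The following Python sketch ties these notions together on a small hypothetical joint table (the numbers are illustrative assumptions): marginals are obtained by summing the joint, conditionals by renormalizing, and Bayes' rule in distribution form holds entry by entry.

```python
import numpy as np

# A hypothetical joint distribution P(X, Y) with Val(X) = {0, 1}, Val(Y) = {0, 1, 2}.
P_XY = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
assert np.isclose(P_XY.sum(), 1.0)

P_X = P_XY.sum(axis=1)     # marginal distribution P(X)
P_Y = P_XY.sum(axis=0)     # marginal distribution P(Y)

P_X_given_Y = P_XY / P_Y             # P(X | Y): each column normalized by P(Y = y)
P_Y_given_X = P_XY / P_X[:, None]    # P(Y | X): each row normalized by P(X = x)

# Bayes' rule in distribution form: P(X | Y) = P(X) P(Y | X) / P(Y)
assert np.allclose(P_X_given_Y, P_X[:, None] * P_Y_given_X / P_Y)
```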
Independence
Definition 2.2 (Independent events) An event α is independent of event β in P, denoted P⊨(α⊥β), if P(α∣β)=P(α) or if P(β)=0.
Proposition 2.1 A distribution P satisfies (α⊥β) if and only if P(α∩β)=P(α)P(β).
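A tiny numeric check of Proposition 2.1 on a fair die (the events are illustrative assumptions): "the roll is even" and "the roll is at most 2" turn out to be independent, since knowing one does not change the probability of the other.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # fair six-sided die (illustrative assumption)

def P(event):
    return Fraction(len(event & omega), len(omega))

alpha = frozenset({2, 4, 6})   # "the roll is even"
beta = frozenset({1, 2})       # "the roll is at most 2"

# Proposition 2.1: (α ⊥ β) iff P(α ∩ β) = P(α) P(β)
assert P(alpha & beta) == P(alpha) * P(beta)   # 1/6 == 1/2 · 1/3
```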
Conditional Independence
Definition 2.3 (Conditional Independence) An event α is conditionally independent of event β given event γ, denoted P⊨(α⊥β∣γ), if P(α∣β∩γ)=P(α∣γ), or if P(β∩γ)=0.
Proposition 2.2 P⊨(α⊥β∣γ) if and only if P(α∩β∣γ)=P(α∣γ)P(β∣γ).
Independence of Random Variables
Definition 2.4 (Conditional Independence) Let X,Y,Z be sets of random variables. X is conditionally independent of Y given Z in a distribution P, if P satisfies (X=x⊥Y=y∣Z=z) for all values x∈Val(X), y∈Val(Y), and z∈Val(Z).
If Z is empty, then we write (X⊥Y), and say that the two sets of random variables are marginally independent.
Proposition 2.3 The distribution P satisfies (X⊥Y∣Z), if and only if P(X,Y∣Z)=P(X∣Z)P(Y∣Z).
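The sketch below builds a joint distribution in which (X ⊥ Y ∣ Z) holds by construction and then verifies Proposition 2.3 numerically; all tables are illustrative assumptions.

```python
import numpy as np

P_Z = np.array([0.3, 0.7])              # P(Z)
P_X_given_Z = np.array([[0.9, 0.2],     # P(X | Z): rows index x, columns index z
                        [0.1, 0.8]])
P_Y_given_Z = np.array([[0.5, 0.3],     # P(Y | Z)
                        [0.5, 0.7]])

# P(x, y, z) = P(z) P(x | z) P(y | z), so (X ⊥ Y | Z) holds by construction.
P_XYZ = np.einsum('z,xz,yz->xyz', P_Z, P_X_given_Z, P_Y_given_Z)
assert np.isclose(P_XYZ.sum(), 1.0)

# Proposition 2.3: P(X, Y | Z) = P(X | Z) P(Y | Z)
P_XY_given_Z = P_XYZ / P_XYZ.sum(axis=(0, 1))                  # normalize each z-slice
product = P_X_given_Z[:, None, :] * P_Y_given_Z[None, :, :]
assert np.allclose(P_XY_given_Z, product)
```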
- Symmetry: (X⊥Y∣Z)⇒(Y⊥X∣Z).
- Decomposition: (X⊥Y,W∣Z)⇒(X⊥Y∣Z).
Proof:
P(X,Y∣Z) = ∑_w P(X,Y,w∣Z) = ∑_w P(X∣Z)P(Y,w∣Z) = P(X∣Z) ∑_w P(Y,w∣Z) = P(X∣Z)P(Y∣Z)
- Weak union: (X⊥Y,W∣Z)⇒(X⊥Y∣Z,W)
Proof:
P(X,Y∣Z,W) = P(X,Y,W∣Z) / P(W∣Z) = P(X∣Z)P(Y,W∣Z) / P(W∣Z) = P(X∣Z)P(W∣Z)P(Y∣Z,W) / P(W∣Z) = P(X,W∣Z)P(Y∣Z,W) / P(W∣Z) = P(X∣Z,W)P(Y∣Z,W)
(The step P(X∣Z)P(W∣Z) = P(X,W∣Z) uses (X⊥W∣Z), which follows from the premise (X⊥Y,W∣Z) by decomposition.)
- Contraction: (X⊥W∣Z,Y)&(X⊥Y∣Z)⇒(X⊥Y,W∣Z)
Proof:
P(X,Y,W∣Z) = P(X,W∣Z,Y)P(Y∣Z) = P(X∣Z,Y)P(W∣Z,Y)P(Y∣Z) = [P(X,Y∣Z)/P(Y∣Z)]·[P(W,Y∣Z)/P(Y∣Z)]·P(Y∣Z) = P(X,Y∣Z)P(W,Y∣Z) / P(Y∣Z) = P(X∣Z)P(W,Y∣Z)
(The last step uses (X⊥Y∣Z) to replace P(X,Y∣Z) with P(X∣Z)P(Y∣Z).)
Definition 2.5 (Positive Distribution) A distribution P is said to be positive if ∀α∈S ⋅ α≠∅⇒P(α)>0.
- Intersection: For a positive distribution P, and for mutually disjoint sets X,Y,Z,W:
(X⊥Y∣Z,W)&(X⊥W∣Z,Y)⇒(X⊥Y,W∣Z)
Proof:
P(X,Y,W∣Z) = P(X,Y∣Z,W)P(W∣Z) = P(X∣Z,W)P(Y∣Z,W)P(W∣Z) = P(X∣Z,W)·[P(Y,W∣Z)/P(W∣Z)]·P(W∣Z) = P(X∣Z,W)P(Y,W∣Z)
By the symmetric derivation with the roles of Y and W exchanged (using (X⊥W∣Z,Y)), we also have P(X,Y,W∣Z) = P(X∣Z,Y)P(Y,W∣Z). Positivity guarantees P(Y,W∣Z)>0, so P(X∣Z,W=w) = P(X∣Z,Y=y) for all values y and w; in particular P(X∣Z,W=w) takes the same value for every w, and marginalizing over w gives P(X∣Z) = ∑_w P(X∣Z,w)P(w∣Z) = P(X∣Z,W). Therefore P(X,Y,W∣Z) = P(X∣Z)P(Y,W∣Z), which is exactly (X⊥Y,W∣Z).
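To see why positivity matters for the intersection property, here is a sketch of a standard counterexample (the specific construction is an assumption for illustration): with Z taken to be empty and X, Y, W perfectly correlated, both premises hold but the conclusion fails.

```python
import numpy as np

# Non-positive distribution: X = Y = W, each 0 or 1 with probability 1/2.
P = np.zeros((2, 2, 2))   # indexed as P[x, y, w]
P[0, 0, 0] = 0.5
P[1, 1, 1] = 0.5

def cond_indep(P, ax_a, ax_b):
    """Check (A ⊥ B | C), where C is the remaining axis, over values of C with positive mass."""
    ax_c = 3 - ax_a - ax_b
    for c in range(P.shape[ax_c]):
        slice_c = np.take(P, c, axis=ax_c)      # joint over (A, B) with C = c fixed
        if slice_c.sum() == 0:
            continue                            # conditioning event has probability 0
        joint = slice_c / slice_c.sum()
        prod = np.outer(joint.sum(axis=1), joint.sum(axis=0))
        if not np.allclose(joint, prod):
            return False
    return True

print(cond_indep(P, 0, 1))   # (X ⊥ Y | W): True
print(cond_indep(P, 0, 2))   # (X ⊥ W | Y): True
# But (X ⊥ Y, W) fails: P(X=0, Y=0, W=0) = 0.5, while P(X=0) · P(Y=0, W=0) = 0.5 · 0.5 = 0.25.
```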