2 Foundations

2.1 Probability Theory

  • $\Omega$: a space of possible outcomes.
  • $\mathcal{S}$: a set of measurable events to which we are willing to assign probabilities; $\forall \alpha \in \mathcal{S} \ \cdot \ \alpha \subseteq \Omega$.

Probability theory requires that $\mathcal{S}$ satisfy three basic properties:

  • It contains the empty event and the trivial event: $\varnothing \in \mathcal{S}$, $\Omega \in \mathcal{S}$.
  • Closed under union: $\forall \alpha, \beta \in \mathcal{S} \ \cdot \ \alpha \cup \beta \in \mathcal{S}$.
  • Closed under complementation: $\forall \alpha \in \mathcal{S} \ \cdot \ (\Omega - \alpha) \in \mathcal{S}$.

The latter two properties imply that $\mathcal{S}$ is also closed under other Boolean operations, such as intersection and set difference.
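As a quick sanity check, the power set of a finite outcome space satisfies all three properties. A minimal Python sketch (the example space is my choice, not from the text):

```python
from itertools import chain, combinations

omega = frozenset({1, 2, 3})

def power_set(s):
    """Return all subsets of s, each as a frozenset."""
    items = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

S = power_set(omega)

assert frozenset() in S and omega in S           # contains both ∅ and Ω
assert all(a | b in S for a in S for b in S)     # closed under union
assert all(omega - a in S for a in S)            # closed under complementation
# Closure under intersection follows: α ∩ β = Ω - ((Ω - α) ∪ (Ω - β))
assert all(a & b in S for a in S for b in S)
print("the power set of Ω is a valid event space")
```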

Definition 2.1 (Probability Distribution) A probability distribution $P$ over $(\Omega, \mathcal{S})$ is a mapping from events in $\mathcal{S}$ to real values that satisfies the following conditions:

  • Probabilities are nonnegative: $\forall \alpha \in \mathcal{S} \ \cdot \ P(\alpha) \ge 0$.
  • The trivial event has the maximal possible probability of 1: $P(\Omega) = 1$.
  • Probabilities of disjoint events add: $\forall \alpha, \beta \in \mathcal{S} \ \cdot \ \alpha \cap \beta = \varnothing \Rightarrow P(\alpha \cup \beta) = P(\alpha) + P(\beta)$.

This definition implies:

  • $P(\varnothing) = 0$.
  • $P(\alpha \cup \beta) = P(\alpha) + P(\beta) - P(\alpha \cap \beta)$.
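Both implications are easy to check numerically. A small sketch with a fair six-sided die, where $P(\alpha) = |\alpha| / |\Omega|$ (the events are my choice):

```python
omega = {1, 2, 3, 4, 5, 6}          # fair die: P(event) = |event| / |Ω|
P = lambda ev: len(ev & omega) / len(omega)

alpha = {1, 2, 3}                   # "at most 3"
beta = {2, 4, 6}                    # "even"

assert P(set()) == 0                # P(∅) = 0
# Inclusion-exclusion: P(α ∪ β) = P(α) + P(β) - P(α ∩ β)
assert abs(P(alpha | beta) - (P(alpha) + P(beta) - P(alpha & beta))) < 1e-12
print(P(alpha | beta))              # 5/6: outcomes {1, 2, 3, 4, 6}
```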

The conditional probability of $\beta$ given $\alpha$ is defined as:

$$P(\beta | \alpha) = \dfrac{P(\alpha \cap \beta)}{P(\alpha)}$$

  • The conditional probability is not defined when $P(\alpha) = 0$.
  • The conditional probability given an event (say $\alpha$) satisfies the properties of Definition 2.1, and thus it is a probability distribution in its own right. Hence, we can think of the conditioning operation as taking one distribution and returning another over the same probability space.
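A sketch of this "conditioning returns a new distribution" view, again on the fair die (the conditioning event is mine):

```python
omega = {1, 2, 3, 4, 5, 6}
P = lambda ev: len(ev & omega) / len(omega)

alpha = {2, 4, 6}                   # conditioning event; P(α) = 1/2 > 0

def P_given_alpha(beta):
    """P(β | α) = P(α ∩ β) / P(α)."""
    return P(alpha & beta) / P(alpha)

# The conditional satisfies Definition 2.1, so it is a distribution itself:
assert P_given_alpha(omega) == 1    # the trivial event gets probability 1
assert abs(P_given_alpha({2, 4, 6})
           - (P_given_alpha({2}) + P_given_alpha({4, 6}))) < 1e-12
print(P_given_alpha({2}))           # 1/3: each even face is now equally likely
```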

The chain rule of conditional probabilities:

  • $P(\alpha \cap \beta) = P(\alpha) P(\beta | \alpha)$.
  • $P(\alpha_1 \cap \alpha_2 \cap \cdots \cap \alpha_k) = P(\alpha_1) P(\alpha_2 | \alpha_1) P(\alpha_3 | \alpha_1 \cap \alpha_2) \cdots P(\alpha_k | \alpha_1 \cap \cdots \cap \alpha_{k-1})$.
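A numeric check of the chain rule for three events on the fair die (events chosen arbitrarily by me):

```python
omega = {1, 2, 3, 4, 5, 6}
P = lambda ev: len(ev & omega) / len(omega)

def P_cond(b, a):
    """P(b | a), assuming P(a) > 0."""
    return P(a & b) / P(a)

a1, a2, a3 = {1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4}

lhs = P(a1 & a2 & a3)
rhs = P(a1) * P_cond(a2, a1) * P_cond(a3, a1 & a2)
assert abs(lhs - rhs) < 1e-12   # P(α₁ ∩ α₂ ∩ α₃) = P(α₁)P(α₂|α₁)P(α₃|α₁∩α₂)
print(lhs, rhs)                 # both 1/3
```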

Bayes' rule

$$P(\alpha | \beta) = \dfrac{P(\beta | \alpha)P(\alpha)}{P(\beta)}$$

A more general version, conditioned on a background event $\gamma$:

$$P(\alpha | \beta \cap \gamma) = \dfrac{P(\beta | \alpha \cap \gamma)P(\alpha | \gamma)}{P(\beta | \gamma)}$$
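The usual illustration of why Bayes' rule matters: inverting a diagnostic test. The numbers below are made up for the example:

```python
p_d = 0.01                  # prior: P(disease)
p_pos_given_d = 0.99        # sensitivity: P(positive | disease)
p_pos_given_nd = 0.05       # false-positive rate: P(positive | no disease)

# Total probability: P(positive) = P(pos|d)P(d) + P(pos|¬d)P(¬d)
p_pos = p_pos_given_d * p_d + p_pos_given_nd * (1 - p_d)

# Bayes' rule: P(disease | positive) = P(positive | disease)P(disease) / P(positive)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(f"P(disease | positive) = {p_d_given_pos:.3f}")   # ≈ 0.167, despite 99% sensitivity
```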

A random variable is a way of reporting an attribute of the outcome.

Formally, a random variable is a function that associates with each outcome in $\Omega$ a value. Common kinds include:

  • Categorical random variables that take one of a few values.
  • Random variables that take integer values.
  • Random variables that take real values.

$Val(X)$: the set of values that a random variable $X$ can take.
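A literal rendering of this definition (the parity example is mine): the random variable is a plain function on $\Omega$, and $P(X = x)$ is the probability of the event it induces.

```python
omega = {1, 2, 3, 4, 5, 6}
P = lambda ev: len(ev) / len(omega)

X = lambda outcome: "even" if outcome % 2 == 0 else "odd"   # a random variable

val_X = {X(o) for o in omega}                   # Val(X) = {"even", "odd"}
# The distribution of X: P(X = x) = P({ω ∈ Ω : X(ω) = x})
p_X = {x: P({o for o in omega if X(o) == x}) for x in val_X}
print(val_X, p_X)                               # {'even': 0.5, 'odd': 0.5}
```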

Marginal distribution: A probability distribution over one random variable.

Joint distribution: A probability distribution over two or more random variables.

Conditional probability distribution: $P(X | Y)$ denotes a set of conditional probability distributions, one distribution over $X$ for each value of $Y$.

Bayes' rule in terms of conditional probability distributions:

$$P(X | Y) = \dfrac{P(X)P(Y | X)}{P(Y)}$$
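A sketch tying the last few definitions together: a joint table $P(X, Y)$ with made-up numbers, its marginals, and an entry-by-entry check of Bayes' rule.

```python
# Joint distribution P(X = x, Y = y) over two binary variables (numbers are mine)
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# Marginal distributions, obtained by summing out the other variable
p_X = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_Y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

for x in (0, 1):
    for y in (0, 1):
        p_x_given_y = joint[(x, y)] / p_Y[y]    # P(X = x | Y = y)
        p_y_given_x = joint[(x, y)] / p_X[x]    # P(Y = y | X = x)
        # Bayes' rule: P(X|Y) = P(X) P(Y|X) / P(Y)
        assert abs(p_x_given_y - p_X[x] * p_y_given_x / p_Y[y]) < 1e-12
print("Bayes' rule holds for every (x, y) pair")
```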

Independence

Definition 2.2 (Independent Events) An event $\alpha$ is independent of event $\beta$ in $P$, denoted $P \models (\alpha \bot \beta)$, if $P(\alpha | \beta) = P(\alpha)$ or if $P(\beta) = 0$.

Proposition 2.1 A distribution $P$ satisfies $(\alpha \bot \beta)$ if and only if $P(\alpha \cap \beta) = P(\alpha)P(\beta)$.
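For example, on the fair die, "even" and "at most 4" are independent; a quick check of both characterizations (events chosen by me):

```python
omega = {1, 2, 3, 4, 5, 6}
P = lambda ev: len(ev) / len(omega)

alpha, beta = {2, 4, 6}, {1, 2, 3, 4}

assert abs(P(alpha & beta) - P(alpha) * P(beta)) < 1e-12   # Proposition 2.1
assert abs(P(alpha & beta) / P(beta) - P(alpha)) < 1e-12   # Definition 2.2: P(α|β) = P(α)
print(P(alpha), P(beta), P(alpha & beta))                  # 1/2, 2/3, 1/3
```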

Conditional Independence

Definition 2.3 (Conditional Independence) An event $\alpha$ is conditionally independent of event $\beta$ given event $\gamma$, denoted $P \models (\alpha \bot \beta | \gamma)$, if $P(\alpha | \beta \cap \gamma) = P(\alpha | \gamma)$, or if $P(\beta \cap \gamma) = 0$.

Proposition 2.2 $P \models (\alpha \bot \beta | \gamma)$ if and only if $P(\alpha \cap \beta | \gamma) = P(\alpha | \gamma) P(\beta | \gamma)$.
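A standard example of events that are dependent marginally but conditionally independent: flip a randomly chosen coin (fair or biased) twice. The mixture numbers below are mine; the check uses Proposition 2.2.

```python
from itertools import product

# Outcome = (coin, flip1, flip2); choose a coin uniformly, then flip it twice
outcomes = {}
for coin, p in (("fair", 0.5), ("biased", 0.9)):
    for f1, f2 in product((0, 1), repeat=2):
        pr = 0.5 * (p if f1 else 1 - p) * (p if f2 else 1 - p)
        outcomes[(coin, f1, f2)] = pr

P = lambda ev: sum(pr for o, pr in outcomes.items() if ev(o))
both = lambda *evs: (lambda o: all(e(o) for e in evs))

alpha = lambda o: o[1] == 1         # first flip is heads
beta = lambda o: o[2] == 1          # second flip is heads
gamma = lambda o: o[0] == "fair"    # the fair coin was picked

# Marginally dependent: P(α ∩ β) ≠ P(α)P(β)
assert abs(P(both(alpha, beta)) - P(alpha) * P(beta)) > 1e-6
# Conditionally independent given γ: P(α ∩ β | γ) = P(α | γ)P(β | γ)
lhs = P(both(alpha, beta, gamma)) / P(gamma)
rhs = (P(both(alpha, gamma)) / P(gamma)) * (P(both(beta, gamma)) / P(gamma))
assert abs(lhs - rhs) < 1e-12
print("dependent marginally, independent given the coin")
```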

Independence of Random Variables

Definition 2.4 (Conditional Independence of Random Variables) Let $\mathcal{X}, \mathcal{Y}, \mathcal{Z}$ be sets of random variables. $\mathcal{X}$ is conditionally independent of $\mathcal{Y}$ given $\mathcal{Z}$ in a distribution $P$ if $P$ satisfies $(\mathcal{X} = x \bot \mathcal{Y} = y | \mathcal{Z} = z)$ for all values $x \in Val(\mathcal{X})$, $y \in Val(\mathcal{Y})$, and $z \in Val(\mathcal{Z})$.

If $\mathcal{Z}$ is empty, then we write $(\mathcal{X} \bot \mathcal{Y})$ and say that the two sets of random variables are marginally independent.

Proposition 2.3 The distribution $P$ satisfies $(\mathcal{X} \bot \mathcal{Y} | \mathcal{Z})$ if and only if $P(\mathcal{X}, \mathcal{Y} | \mathcal{Z}) = P(\mathcal{X} | \mathcal{Z})P(\mathcal{Y} | \mathcal{Z})$.

Conditional independence satisfies the following properties:

  • Symmetry: $(\mathcal{X} \bot \mathcal{Y} | \mathcal{Z}) \Rightarrow (\mathcal{Y} \bot \mathcal{X} | \mathcal{Z})$.

  • Decomposition: $(\mathcal{X} \bot \mathcal{Y}, \mathcal{W} | \mathcal{Z}) \Rightarrow (\mathcal{X} \bot \mathcal{Y} | \mathcal{Z})$.

Proof (the second equality uses the premise $(\mathcal{X} \bot \mathcal{Y},\mathcal{W} | \mathcal{Z})$ via Proposition 2.3):

$$
\begin{aligned}
P(\mathcal{X},\mathcal{Y}|\mathcal{Z}) &= \sum_w P(\mathcal{X},\mathcal{Y},w|\mathcal{Z})\\
&= \sum_w P(\mathcal{X}|\mathcal{Z})P(\mathcal{Y},w|\mathcal{Z})\\
&= P(\mathcal{X}|\mathcal{Z}) \sum_w P(\mathcal{Y},w|\mathcal{Z})\\
&= P(\mathcal{X}|\mathcal{Z})\,P(\mathcal{Y}|\mathcal{Z})
\end{aligned}
$$
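The proof can be mirrored numerically: build a joint in which $\mathcal{X} \bot (\mathcal{Y}, \mathcal{W})$ holds by construction, marginalize out $\mathcal{W}$, and confirm $\mathcal{X} \bot \mathcal{Y}$. A sketch with single binary variables and an empty $\mathcal{Z}$ (conditioning on a fixed $z$ would look the same):

```python
import random
from itertools import product

random.seed(0)

def rand_dist(keys):
    """A random strictly positive distribution over keys."""
    w = [random.random() + 0.01 for _ in keys]
    s = sum(w)
    return {k: v / s for k, v in zip(keys, w)}

p_X = rand_dist([0, 1])
p_YW = rand_dist(list(product([0, 1], repeat=2)))
# X ⊥ (Y, W) by construction: the joint is a product
joint = {(x, y, w): p_X[x] * p_YW[(y, w)] for x in (0, 1) for (y, w) in p_YW}

# Marginalize out W, then check P(X, Y) = P(X) P(Y)
p_XY = {(x, y): sum(joint[(x, y, w)] for w in (0, 1))
        for x in (0, 1) for y in (0, 1)}
p_Y = {y: sum(p_XY[(x, y)] for x in (0, 1)) for y in (0, 1)}
assert all(abs(p_XY[(x, y)] - p_X[x] * p_Y[y]) < 1e-12
           for x in (0, 1) for y in (0, 1))
print("decomposition verified: X ⊥ Y after summing out W")
```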

  • Weak union: $(\mathcal{X} \bot \mathcal{Y},\mathcal{W} | \mathcal{Z}) \Rightarrow (\mathcal{X} \bot \mathcal{Y} | \mathcal{Z},\mathcal{W})$.

Proof (note the denominators are $P(\mathcal{W}|\mathcal{Z})$ throughout; the second equality uses the premise, and the fourth uses $(\mathcal{X} \bot \mathcal{W} | \mathcal{Z})$, which follows from the premise by decomposition):

$$
\begin{aligned}
P(\mathcal{X}, \mathcal{Y} | \mathcal{Z},\mathcal{W}) &= \dfrac{P(\mathcal{X},\mathcal{Y},\mathcal{W}|\mathcal{Z})}{P(\mathcal{W}|\mathcal{Z})} \\
&= P(\mathcal{X}|\mathcal{Z}) \dfrac{P(\mathcal{Y},\mathcal{W}|\mathcal{Z})}{P(\mathcal{W}|\mathcal{Z})}\\
&= \dfrac{P(\mathcal{X}|\mathcal{Z})P(\mathcal{W}|\mathcal{Z})}{P(\mathcal{W}|\mathcal{Z})}\, P(\mathcal{Y} | \mathcal{Z},\mathcal{W})\\
&= \dfrac{P(\mathcal{X},\mathcal{W}|\mathcal{Z})}{P(\mathcal{W}|\mathcal{Z})}\, P(\mathcal{Y} | \mathcal{Z},\mathcal{W})\\
&= P(\mathcal{X} | \mathcal{Z},\mathcal{W})\,P(\mathcal{Y} | \mathcal{Z},\mathcal{W})
\end{aligned}
$$
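The same product construction as above also demonstrates weak union: starting from $\mathcal{X} \bot (\mathcal{Y}, \mathcal{W})$, the independence of $\mathcal{X}$ and $\mathcal{Y}$ survives conditioning on $\mathcal{W}$ (again a binary sketch with empty $\mathcal{Z}$):

```python
import random
from itertools import product

random.seed(1)

def rand_dist(keys):
    """A random strictly positive distribution over keys."""
    w = [random.random() + 0.01 for _ in keys]
    s = sum(w)
    return {k: v / s for k, v in zip(keys, w)}

p_X = rand_dist([0, 1])
p_YW = rand_dist(list(product([0, 1], repeat=2)))
joint = {(x, y, w): p_X[x] * p_YW[(y, w)] for x in (0, 1) for (y, w) in p_YW}

# For each value w, check P(X, Y | W = w) = P(X | W = w) P(Y | W = w)
for w in (0, 1):
    p_w = sum(joint[(x, y, w)] for x in (0, 1) for y in (0, 1))
    for x in (0, 1):
        p_x_w = sum(joint[(x, y, w)] for y in (0, 1)) / p_w
        for y in (0, 1):
            p_y_w = sum(joint[(xx, y, w)] for xx in (0, 1)) / p_w
            assert abs(joint[(x, y, w)] / p_w - p_x_w * p_y_w) < 1e-12
print("weak union verified: X ⊥ Y | W")
```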

  • Contraction: $(\mathcal{X} \bot \mathcal{W} | \mathcal{Z},\mathcal{Y})\ \&\ (\mathcal{X} \bot \mathcal{Y} | \mathcal{Z}) \Rightarrow (\mathcal{X} \bot \mathcal{Y},\mathcal{W} | \mathcal{Z})$.

Proof (the second equality uses the first premise; the last uses the second premise, $P(\mathcal{X},\mathcal{Y}|\mathcal{Z}) = P(\mathcal{X}|\mathcal{Z})P(\mathcal{Y}|\mathcal{Z})$):

$$
\begin{aligned}
P(\mathcal{X},\mathcal{Y},\mathcal{W}|\mathcal{Z}) &= P(\mathcal{X},\mathcal{W}|\mathcal{Z},\mathcal{Y})\,P(\mathcal{Y}|\mathcal{Z})\\
&= P(\mathcal{X}|\mathcal{Z},\mathcal{Y})\,P(\mathcal{W}|\mathcal{Z},\mathcal{Y})\,P(\mathcal{Y}|\mathcal{Z})\\
&= \dfrac{P(\mathcal{X},\mathcal{Y}|\mathcal{Z})}{P(\mathcal{Y}|\mathcal{Z})}\, \dfrac{P(\mathcal{W},\mathcal{Y}|\mathcal{Z})}{P(\mathcal{Y}|\mathcal{Z})}\, P(\mathcal{Y}|\mathcal{Z})\\
&= P(\mathcal{X}|\mathcal{Z})\,P(\mathcal{W},\mathcal{Y}|\mathcal{Z})
\end{aligned}
$$

Definition 2.5 (Positive Distribution) A distribution $P$ is said to be positive if $\forall \alpha \in \mathcal{S} \ \cdot \ \alpha \ne \varnothing \Rightarrow P(\alpha) > 0$.

  • Intersection: for positive distributions, and for mutually disjoint sets $\mathcal{X},\mathcal{Y},\mathcal{Z},\mathcal{W}$:

    $$(\mathcal{X} \bot \mathcal{Y} | \mathcal{Z},\mathcal{W})\ \&\ (\mathcal{X} \bot \mathcal{W} | \mathcal{Z},\mathcal{Y}) \Rightarrow (\mathcal{X} \bot \mathcal{Y},\mathcal{W} | \mathcal{Z})$$

    Proof:

$$
\begin{aligned}
P(\mathcal{X},\mathcal{Y},\mathcal{W}|\mathcal{Z}) &= P(\mathcal{X},\mathcal{Y}|\mathcal{Z},\mathcal{W})\,P(\mathcal{W}|\mathcal{Z})\\
&= P(\mathcal{X}|\mathcal{Z},\mathcal{W})\,P(\mathcal{Y}|\mathcal{Z},\mathcal{W})\,P(\mathcal{W}|\mathcal{Z})\\
&= P(\mathcal{X}|\mathcal{Z},\mathcal{W})\, \dfrac{P(\mathcal{Y},\mathcal{W}|\mathcal{Z})}{P(\mathcal{W}|\mathcal{Z})}\,P(\mathcal{W}|\mathcal{Z})\\
&= P(\mathcal{X}|\mathcal{Z},\mathcal{W})\, P(\mathcal{Y},\mathcal{W}|\mathcal{Z})\\
&= P(\mathcal{X}|\mathcal{Z})\,P(\mathcal{Y},\mathcal{W}|\mathcal{Z})
\end{aligned}
$$

The last equality is where positivity is needed: by the two premises, $P(\mathcal{X}|\mathcal{Z},\mathcal{W}) = P(\mathcal{X}|\mathcal{Z},\mathcal{Y},\mathcal{W}) = P(\mathcal{X}|\mathcal{Z},\mathcal{Y})$. Since positivity makes all these conditionals well defined, and the first does not depend on $\mathcal{Y}$ while the last does not depend on $\mathcal{W}$, they are constant in both and hence equal $P(\mathcal{X}|\mathcal{Z})$.
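To see why positivity cannot be dropped, consider the standard counterexample: $\mathcal{X}, \mathcal{Y}, \mathcal{W}$ are perfect copies of a single fair coin (and $\mathcal{Z}$ is empty). Both premises hold, yet the conclusion fails. A sketch:

```python
# A non-positive joint: X = Y = W, each the same fair coin flip
joint = {(x, y, w): (0.5 if x == y == w else 0.0)
         for x in (0, 1) for y in (0, 1) for w in (0, 1)}

P = lambda pred: sum(pr for k, pr in joint.items() if pred(*k))

# (X ⊥ Y | W): given W = w, both X and Y are deterministically w
for w in (0, 1):
    p_w = P(lambda x, y, ww: ww == w)
    for x in (0, 1):
        for y in (0, 1):
            p_xy = P(lambda xx, yy, ww: (xx, yy, ww) == (x, y, w)) / p_w
            p_x = P(lambda xx, yy, ww: (xx, ww) == (x, w)) / p_w
            p_y = P(lambda xx, yy, ww: (yy, ww) == (y, w)) / p_w
            assert abs(p_xy - p_x * p_y) < 1e-12
# (X ⊥ W | Y) holds by the same argument.  But (X ⊥ Y, W) fails:
assert abs(P(lambda x, y, w: (x, y, w) == (0, 0, 0))      # P(X=0, Y=0, W=0) = 0.5
           - P(lambda x, y, w: x == 0)                    # P(X=0) = 0.5
           * P(lambda x, y, w: (y, w) == (0, 0))) > 0.2   # P(Y=0, W=0) = 0.5
print("intersection fails without positivity")
```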


