3.4. Multi-dimensional stochastic variables
Joint distribution,
\[p_{XY}(x,y) \]
Marginal distribution. For continuous variables
\[p_X(x) := \int_{y} p_{XY}(x,y) \, dy\]
while for discrete variables
\[p_X(x_i) = \sum_j p_{XY}(x_i,y_j)\]
Conditional distribution, \(p_{X|Y}(x|y)\). The following holds
\[p_{XY}(x,y) = p_{X|Y}(x|y) \, p_Y(y) = p_{Y|X}(y|x) \, p_X(x)\]
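For continuous r.v., integrating over \(x\) the relation \(p(x,y) = p(x|y) p(y)\) gives the marginal distribution of \(Y\),
\[\int_{x} p(x,y) \, dx = p(y) \int_{x} p(x|y) \, dx = p(y) \ ,\]
as the normalization condition \(\int_{x} p(x|y) \, dx = 1\) holds for the conditional distribution \(p(x|y)\).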
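The relations between joint, marginal and conditional distributions can be checked numerically in the discrete case. The following is a minimal sketch, assuming NumPy is available; the joint table `p_xy` holds made-up values chosen only for illustration.

```python
import numpy as np

# Hypothetical joint pmf: p_xy[i, j] = P(X = x_i, Y = y_j). Illustrative values only.
p_xy = np.array([
    [0.10, 0.20, 0.10],
    [0.05, 0.25, 0.30],
])
assert np.isclose(p_xy.sum(), 1.0)

# Marginals: sum the joint over the other variable.
p_x = p_xy.sum(axis=1)   # p_X(x_i) = sum_j p_XY(x_i, y_j)
p_y = p_xy.sum(axis=0)   # p_Y(y_j) = sum_i p_XY(x_i, y_j)

# Conditionals (every marginal entry is non-zero in this table).
p_x_given_y = p_xy / p_y            # column j holds p_{X|Y}(. | y_j)
p_y_given_x = p_xy / p_x[:, None]   # row i holds p_{Y|X}(. | x_i)

# Normalization of the conditional distributions.
print(p_x_given_y.sum(axis=0))
print(p_y_given_x.sum(axis=1))

# Factorization p_XY = p_{X|Y} p_Y = p_{Y|X} p_X.
print(np.allclose(p_x_given_y * p_y, p_xy))
print(np.allclose(p_y_given_x * p_x[:, None], p_xy))
```

For continuous variables the sums become integrals, which would have to be approximated by quadrature on a grid.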
Property 3.1
3.4.1. Moments
Expected value,
\[\boldsymbol{\mu}_{\mathbf{X}} := \mathbb{E}\left[ \mathbf{X} \right] = \int_{\mathbf{x}} p(\mathbf{x}) \, \mathbf{x} \, d \mathbf{x}\]
Covariance,
\[\boldsymbol{\sigma}^2_{\mathbf{X}} := \mathbb{E} \left[ \Delta \mathbf{X} \, \Delta \mathbf{X}^T \right] = \int_{\mathbf{x}} p(\mathbf{x}) \, \Delta \mathbf{x} \Delta \mathbf{x}^T \, d \mathbf{x} \ ,\]
with \(\Delta \mathbf{X} := \mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}} \), and \(\Delta \mathbf{x} = \mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}}\).
Taking a pair of components \(X_i\), \(X_j\) of the random vector \(\mathbf{X}\), their covariance is the \(ij\) component of the array \(\boldsymbol{\sigma}^2\),
\[\sigma^2_{ij} := \mathbb{E}\left[ \Delta X_i \, \Delta X_j \right] =: \rho_{ij} \sigma_i \sigma_j \ ,\]
where \(\rho_{ij}\) is the (Pearson) correlation between the random variables \(X_i\) and \(X_j\), and \(\sigma_i\) is the standard deviation of \(X_i\), i.e. the square root of its variance \(\sigma^2_i\),
\[\begin{split}\begin{aligned} \sigma^2_i & = \mathbb{E}\left[ \left( X_i - \mu_i \right)^2 \right] = \\ & = \int_{\mathbf{x}} (x_i - \mu_i)^2 p_{\mathbf{X}}(\mathbf{x}) \, d \mathbf{x} = \\ & = \int_{x_i} (x_i - \mu_i)^2 p_i (x_i) \, d x_i \end{aligned}\end{split}\]
Here the integrals over the whole vector \(\mathbf{x}\) reduce to integrals over the marginal distribution \(p_i(x_i) = p(x_i)\) of the single component \(X_i\). For the expected value, for instance,
\[\begin{split}\begin{aligned} \mu_i & = \int_{\mathbf{x}} x_i \, p_{\mathbf{X}}(\mathbf{x}) \, d \mathbf{x} = \\ & = \int_{\mathbf{x}} x_i \, p(x_1, x_2, \dots, x_i, \dots, x_n) d x_1 d x_2 \dots d x_i \dots d x_n = \\ & = \int_{\mathbf{x}} x_i \, p(x_i) p(x_1, x_2, \dots, x_{i-1}, x_{i+1}, \dots, x_n | x_i) d x_1 d x_2 \dots d x_i \dots d x_n = \\ & = \int_{x_i} x_i \, p(x_i) \underbrace{\int_{x_1} \dots \int_{x_n} p(x_1, x_2, \dots, x_{i-1}, x_{i+1}, \dots, x_n | x_i) d x_1 \dots d x_{i-1} d x_{i+1} \dots d x_n}_{= 1 \text{ $\forall x_i$}} d x_i = \\ & = \int_{x_i} x_i \, p(x_i) \, d x_i \ . \end{aligned}\end{split}\]
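The definitions of expected value, covariance and correlation, and the reduction of a component's variance to an integral over its marginal, can be illustrated with a discrete random vector, where integrals become sums. This is only a sketch under stated assumptions: the supports `x1`, `x2` and the joint table `p` are hypothetical values, and NumPy is assumed.

```python
import numpy as np

# Hypothetical supports and joint pmf of a 2-component random vector (X_1, X_2).
x1 = np.array([0.0, 1.0])
x2 = np.array([-1.0, 0.0, 2.0])
p = np.array([
    [0.10, 0.20, 0.10],
    [0.05, 0.25, 0.30],
])

# All support points of the vector, one row per (x1, x2) pair, with their weights.
g1, g2 = np.meshgrid(x1, x2, indexing="ij")
points = np.stack([g1.ravel(), g2.ravel()], axis=1)   # shape (6, 2)
weights = p.ravel()                                    # shape (6,)

# Expected value: mu = sum_x p(x) x
mu = weights @ points

# Covariance: sigma2 = sum_x p(x) (x - mu)(x - mu)^T
delta = points - mu
sigma2 = (weights[:, None] * delta).T @ delta

# Standard deviations and Pearson correlation, sigma2_ij = rho_ij sigma_i sigma_j
sigma = np.sqrt(np.diag(sigma2))
rho = sigma2 / np.outer(sigma, sigma)

# Variance of X_1 computed from its marginal alone.
p1 = p.sum(axis=1)
mu1 = p1 @ x1
var1 = p1 @ (x1 - mu1) ** 2

print(mu)
print(sigma2)
print(rho)
print(np.isclose(var1, sigma2[0, 0]))
```

The last line compares the variance obtained from the full joint distribution with the one obtained from the marginal of \(X_1\) only, mirroring the derivation above.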
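Property of correlation. \(|\rho_{XY}| \le 1\). Proof sketch with the Cauchy-Schwarz inequality: \(\left| \mathbb{E}\left[ \Delta X \, \Delta Y \right] \right|^2 \le \mathbb{E}\left[ \Delta X^2 \right] \mathbb{E}\left[ \Delta Y^2 \right]\), i.e. \(|\sigma^2_{XY}| \le \sigma_X \sigma_Y\).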
Notation
Here, the covariance is denoted by \(\boldsymbol{\sigma}^2\). The superscript is not a power \(2\) but part of the symbol, at most a reminder that the covariance matrix is positive semi-definite.
Properties of covariance.
symmetric
positive semi-definite
spectrum…
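The listed properties can be observed on a sample covariance matrix. The sketch below assumes NumPy; the `mixing` matrix and the sample size are arbitrary choices made only to produce correlated components. It checks symmetry, computes the eigenvalues (non-negative for a positive semi-definite matrix), and checks the bound \(|\rho_{ij}| \le 1\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1000 samples of a 3-component random vector,
# obtained by linearly mixing independent standard normal variables.
mixing = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [-0.3, 0.2, 0.1],
])
samples = rng.standard_normal((1000, 3)) @ mixing.T

cov = np.cov(samples, rowvar=False)         # sample covariance, shape (3, 3)
corr = np.corrcoef(samples, rowvar=False)   # sample correlation matrix

# eigvalsh is meant for symmetric matrices and returns real eigenvalues.
eigenvalues = np.linalg.eigvalsh(cov)

print("symmetric:", np.allclose(cov, cov.T))
print("eigenvalues:", eigenvalues)
print("max |rho|:", np.abs(corr).max())
```

Floating-point round-off can make an eigenvalue that should be zero come out slightly negative, so a small tolerance is advisable when testing positive semi-definiteness numerically.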
3.4.2. Bayes’ theorem
Theorem 3.1 (Bayes’ theorem)
Where \(p_Y(y) \ne 0\),
\[p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x) \, p_X(x)}{p_Y(y)} \ .\]
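A small discrete example may help to see Bayes' theorem at work: starting from \(p_X\) and \(p_{Y|X}\), it returns \(p_{X|Y}\). The numbers below are hypothetical, and NumPy is assumed.

```python
import numpy as np

# Hypothetical prior p_X and conditional p_{Y|X}. Illustrative values only.
p_x = np.array([0.4, 0.6])            # p_X(x_i)
p_y_given_x = np.array([
    [0.25, 0.50, 0.25],               # p_{Y|X}(. | x_0)
    [1 / 12, 5 / 12, 6 / 12],         # p_{Y|X}(. | x_1)
])

# Marginal of Y: p_Y(y_j) = sum_i p_{Y|X}(y_j | x_i) p_X(x_i)
p_y = p_x @ p_y_given_x

# Bayes' theorem: p_{X|Y}(x_i | y_j) = p_{Y|X}(y_j | x_i) p_X(x_i) / p_Y(y_j),
# valid here because every entry of p_y is non-zero.
p_x_given_y = p_y_given_x * p_x[:, None] / p_y

print(p_y)
print(p_x_given_y)              # column j holds p_{X|Y}(. | y_j)
print(p_x_given_y.sum(axis=0))  # each column is a normalized distribution
```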
3.4.3. Statistical independence
Definition 3.6 (Independent random variables)
Given two random variables \(X\), \(Y\) with joint distribution \(p_{XY}(x,y)\), the random variable \(X\) is independent of \(Y\) if its conditional probability equals its marginal probability,
\[p_{X|Y}(x|y) = p_X(x) \ ,\]
i.e. the probability of \(X\) doesn’t depend on \(Y\).
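For discrete variables, the definition can be tested on a joint table: \(p(x|y) = p(x)\) wherever \(p(y) \ne 0\) is equivalent to the joint being the product of its marginals. The helper `is_independent` below is a hypothetical name introduced for this sketch, not a library function, and the two tables are made-up examples; NumPy is assumed.

```python
import numpy as np

def is_independent(p_xy, atol=1e-12):
    """Hypothetical helper: True if the joint pmf equals the product of its marginals."""
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    return bool(np.allclose(p_xy, np.outer(p_x, p_y), atol=atol))

# A joint table whose rows are not proportional to each other: X and Y are dependent.
dependent = np.array([
    [0.10, 0.20, 0.10],
    [0.05, 0.25, 0.30],
])

# A joint table built as a product of marginals: X and Y are independent.
independent = np.outer([0.4, 0.6], [0.15, 0.45, 0.40])

print(is_independent(dependent))
print(is_independent(independent))
```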
3.4.3.1. Independence implies no correlation
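Two random variables \(X\), \(Y\) are independent if \(p(x|y) = p(x)\), and thus \(p(x,y) = p(x) p(y)\). The covariance of two random variables reads
\[\sigma^2_{XY} = \mathbb{E}\left[ \left( X - \mathbb{E}[X] \right) \left( Y - \mathbb{E}[Y] \right) \right] \ ,\]
and if they’re independent, it immediately follows that their covariance \(\sigma^2_{XY}\) is zero (and so is their correlation \(\rho_{XY}\)),
\[\sigma^2_{XY} = \mathbb{E}\left[ X - \mathbb{E}[X] \right] \, \mathbb{E}\left[ Y - \mathbb{E}[Y] \right] = 0 \ ,\]
as the expected value of the deviation from the expected value is zero, \(\mathbb{E} \left[ X - \mathbb{E}[X] \right] = 0\).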
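Proof for continuous r.v.
\[\begin{split}\begin{aligned} \sigma^2_{XY} & = \int_{x} \int_{y} (x - \mathbb{E}[X]) (y - \mathbb{E}[Y]) \, p(x,y) \, dx \, dy = \\ & \overset{(1)}{=} \int_{x} \int_{y} (x - \mathbb{E}[X]) (y - \mathbb{E}[Y]) \, p(x) \, p(y) \, dx \, dy = \\ & = \int_{x} (x - \mathbb{E}[X]) \, p(x) \, dx \int_{y} (y - \mathbb{E}[Y]) \, p(y) \, dy = \\ & \overset{(2)}{=} 0 \ , \end{aligned}\end{split}\]
having used here the common abuse of notation \(p_X(x) = p(x)\), \((1)\) statistical independence, \(p(x,y) = p(x) p(y)\), and \((2)\) \(\mathbb{E}\left[ X - \mathbb{E}[X] \right] = 0\).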
Proof for discrete r.v.
Repeat the proof for continuous r.v. using summations instead of integrals.
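The discrete version of the proof can also be checked numerically. In this sketch (NumPy assumed, supports and marginals are hypothetical values) the joint pmf is built as the product of the marginals, and the covariance comes out as zero up to floating-point round-off.

```python
import numpy as np

# Hypothetical supports and marginal pmfs of two independent variables.
x = np.array([0.0, 1.0, 3.0])
y = np.array([-1.0, 2.0])
p_x = np.array([0.2, 0.5, 0.3])
p_y = np.array([0.7, 0.3])

# Step (1): independence, the joint pmf is the product of the marginals.
p_xy = np.outer(p_x, p_y)

mu_x = p_x @ x
mu_y = p_y @ y

# Covariance: sum_ij (x_i - mu_x) (y_j - mu_y) p(x_i, y_j)
cov_xy = (x - mu_x) @ p_xy @ (y - mu_y)

# Step (2): the expected deviation from the expected value vanishes.
mean_dev_x = p_x @ (x - mu_x)
mean_dev_y = p_y @ (y - mu_y)

print(cov_xy, mean_dev_x, mean_dev_y)
```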