Several Questions Related to Mathematical Statistics

Why the denominator of the sample variance is \(n-1\)

Suppose the random variable \(X\) has known expectation \(\mu\) and variance \(\sigma^{2}=E[(X-\mu)^{2}]\).

Then, after drawing samples, we can use \[ S^{2}=\frac{1}{n}\sum_{i=1}^{n} (X_i-\mu)^{2} \]

to approximate \(\sigma^{2}\).

In general, however, \(\mu\) is unknown; all we have is the sample mean \[ \bar{X}=\frac{1}{n}\sum_{i=1}^{n} X_{i}. \]

We want to show that \[ S^{2}=\frac{1}{n-1}\sum_{i=1}^{n} (X_{i}-\bar{X})^{2} \]

can be used to approximate \(\sigma^{2}\).

First, we know that \[ E\left[\frac{1}{n}\sum_{i=1}^{n} (X_{i}-\mu)^{2}\right]=\sigma^{2}. \]

That is, the expectation of this estimator equals \(\sigma^{2}\), which is exactly what is meant by an unbiased estimator. (By the central limit theorem, its value over repeated samples is approximately normally distributed around \(\sigma^{2}\).)

It is easy to see that \[ \sum_{i=1}^{n} (X_{i}-\bar{X})^{2}\leqslant \sum_{i=1}^{n} (X_{i}-\mu)^{2}, \] since \(\bar{X}\) is the value of \(c\) that minimizes \(\sum_{i=1}^{n}(X_{i}-c)^{2}\); simply replacing \(\mu\) by \(\bar{X}\) therefore tends to underestimate \(\sigma^{2}\).

If we let

\[ \tilde{S}^{2}=\frac{1}{n}\sum_{i=1}^{n} (X_{i}-\bar{X})^{2} \]

then, using \(\sum_{i=1}^{n}(X_{i}-\mu)=n(\bar{X}-\mu)\) to simplify the cross term, \[ \begin{aligned} E[\tilde{S}^{2}]&=E\left[\frac{1}{n}\sum_{i=1}^{n} (X_{i}-\bar{X})^{2}\right]=E\left[\frac{1}{n}\sum_{i=1}^{n} \left((X_{i}-\mu)-(\bar{X}-\mu)\right)^{2}\right] \\ &=E\left[\frac{1}{n}\sum_{i=1}^{n} (X_{i}-\mu)^{2}-\frac{2}{n}(\bar{X}-\mu)\sum_{i=1}^{n} (X_{i}-\mu)+(\bar{X}-\mu)^{2}\right]\\ &=E\left[\frac{1}{n}\sum_{i=1}^{n} (X_{i}-\mu)^{2}-(\bar{X}-\mu)^{2}\right]\\ &=\sigma^{2}-E[(\bar{X}-\mu)^{2}]. \end{aligned} \]

Moreover, since the \(X_{i}\) are independent and identically distributed, \[ \begin{aligned} E[(\bar{X}-\mu)^{2}]=E[(\bar{X}-E[\bar{X}])^{2}]&=\operatorname{var}(\bar{X}) \\ &=\operatorname{var}\left(\frac{1}{n}\sum_{i=1}^{n} X_{i}\right)\\ &=\frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{var}(X_{i}) \\ &=\frac{\sigma^{2}}{n}. \end{aligned} \]

Combining the two results gives \[ E[\tilde{S}^{2}]=\frac{n-1}{n}\sigma^{2}, \]

so rescaling by \(\frac{n}{n-1}\), i.e. using \[ S^{2}=\frac{1}{n-1}\sum_{i=1}^{n} (X_{i}-\bar{X})^{2}, \]

yields an unbiased estimator of \(\sigma^{2}\).
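
A quick Monte Carlo sketch (with arbitrary, illustrative values of \(\mu\), \(\sigma^{2}\) and \(n\)) makes the bias visible: dividing by \(n\) underestimates \(\sigma^{2}\) by the factor \((n-1)/n\), while dividing by \(n-1\) does not.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, trials = 3.0, 4.0, 10, 200_000

# `trials` independent samples of size n from N(mu, sigma2)
samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
xbar = samples.mean(axis=1, keepdims=True)

s2_biased = ((samples - xbar) ** 2).sum(axis=1) / n          # divide by n
s2_unbiased = ((samples - xbar) ** 2).sum(axis=1) / (n - 1)  # divide by n - 1

print(s2_biased.mean())    # ≈ (n - 1) / n * sigma2 = 3.6
print(s2_unbiased.mean())  # ≈ sigma2 = 4.0
```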

Gaussian processes characterized by first- and second-order statistics only

Suppose \(M\) and \(N\) are two Gaussian processes. If only the first- and second-order statistics matter, we can write \[ M=\mu_{M}+\Sigma_{M}\sqrt{1-\rho}\,X_1+\Sigma_{M}\sqrt{\rho}\,Y, \]

\[ N=\mu_{N}+\Sigma_{N}\sqrt{1-\rho}\,X_2+\Sigma_{N}\sqrt{\rho}\,Y, \]

where \(X_1,X_2,Y\) are mutually independent standard Gaussian processes. Then

\[ E[M]=\mu_{M},\quad E[N]=\mu_{N}, \]

\[ \operatorname{var}(M)=\Sigma_{M}^{2},\quad \operatorname{var}(N)=\Sigma_{N}^{2}, \]

\[ \operatorname{cov}(M,N)=\rho\,\Sigma_{M}\Sigma_{N}. \]

This is closely analogous to a bivariate normal distribution.
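
A minimal numerical check of these moment identities, treating \(\Sigma_{M}\) and \(\Sigma_{N}\) as scalar standard deviations and looking at a single time point (the variable names and parameter values below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_M, mu_N, sig_M, sig_N, rho = 1.0, -2.0, 2.0, 3.0, 0.6
n = 1_000_000  # independent draws of the processes at one fixed time point

X1, X2, Y = rng.standard_normal((3, n))
M = mu_M + sig_M * (np.sqrt(1 - rho) * X1 + np.sqrt(rho) * Y)
N = mu_N + sig_N * (np.sqrt(1 - rho) * X2 + np.sqrt(rho) * Y)

print(M.mean(), N.mean())   # ≈ mu_M, mu_N
print(M.var(), N.var())     # ≈ sig_M**2 = 4, sig_N**2 = 9
print(np.cov(M, N)[0, 1])   # ≈ rho * sig_M * sig_N = 3.6
```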

Low-pass filtering

Low-pass filtering of a function (?): for a function \(\kappa(t)\), define \(\tilde{\kappa}(t)\) by \[ \left(1+\frac{\mathrm{d}}{\mathrm{d}t}\right)\tilde{\kappa}(t)=\kappa(t). \]

This can be regarded as a rearrangement of \[ \frac{\mathrm{d}}{\mathrm{d}t}\tilde{\kappa}(t)=-\tilde{\kappa}(t)+\kappa(t). \]
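
As a sketch of why this acts as a low-pass filter (not part of the original note): discretizing the rearranged equation with a forward-Euler step \(h\) gives exponential smoothing, which attenuates fast oscillations while passing slow trends. The function name and test signal below are made up for illustration.

```python
import numpy as np

def lowpass(kappa, h=0.01):
    """Forward-Euler integration of d/dt k_tilde = -k_tilde + kappa,
    i.e. exponential smoothing with weight h on each new sample."""
    k_tilde = np.zeros_like(kappa)
    for k in range(1, len(kappa)):
        k_tilde[k] = (1 - h) * k_tilde[k - 1] + h * kappa[k - 1]
    return k_tilde

# A slow sinusoid plus a fast oscillation: the filter keeps the slow part.
t = np.arange(0, 50, 0.01)
kappa = np.sin(0.2 * t) + 0.5 * np.sin(20.0 * t)
kappa_tilde = lowpass(kappa, h=0.01)
```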

Formulas for statistics

For a Markov chain \(Z \rightarrow X_1,X_2\), we have \[ \mathbb{E}_{X_1}(X_1)=\mathbb{E}_{Z}(\mathbb{E}_{X_1|Z}(X_1)), \]

\[ \operatorname{var}_{X_1}(X_1)=\mathbb{E}_{Z}(\operatorname{var}_{X_1|Z}(X_1))+\operatorname{var}_{Z}(\mathbb{E}_{X_1|Z}(X_1)), \]

\[ \operatorname{cov}_{X_1,X_2}(X_1,X_2)=\mathbb{E}_{Z}(\operatorname{cov}_{X_1,X_2|Z}(X_1,X_2))+\operatorname{cov}_{Z}(\mathbb{E}_{X_1|Z}(X_1),\mathbb{E}_{X_2|Z}(X_2)). \]
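
A minimal Monte Carlo check of the variance and covariance decompositions, using a hypothetical hierarchical model (\(Z\) standard normal; \(X_1\), \(X_2\) conditionally independent given \(Z\) with unit conditional variance):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Z -> X1, X2: X1 and X2 are conditionally independent given Z.
Z = rng.standard_normal(n)
X1 = Z + rng.standard_normal(n)   # X1 | Z ~ N(Z, 1)
X2 = Z + rng.standard_normal(n)   # X2 | Z ~ N(Z, 1)

# var(X1) = E[var(X1|Z)] + var(E[X1|Z]) = 1 + 1 = 2
print(X1.var())
# cov(X1, X2) = E[cov(X1,X2|Z)] + cov(E[X1|Z], E[X2|Z]) = 0 + var(Z) = 1
print(np.cov(X1, X2)[0, 1])
```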

PCA, ICA, CCA, PLS

PCA, ICA

PCA and ICA are both dimensionality-reduction algorithms. Unlike PCA, whose components are uncorrelated in both time and space (both the left and the right singular vectors are orthogonal), the components obtained by ICA are maximally statistically independent in only one of the two domains.

The goal of PCA is to find a set of axes such that the variance of the data projected onto those axes is maximized. The idea behind ICA is that the observed signals are linear mixtures of independent source signals. By the central limit theorem, any linear mixture of independent variables will be more "Gaussian" than the original variables, so ICA seeks a new set of axes oriented such that the projections of the data points onto them are maximally non-Gaussian. Non-Gaussianity can be measured with kurtosis, negentropy, or mutual information.

Independent components can be interpreted as the dominant functional networks or modes of activity that contribute to the observed neuroimaging data[1].
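
A minimal scikit-learn sketch contrasting the two (the toy sources and mixing matrix are invented for illustration): PCA returns uncorrelated directions of maximal variance, while FastICA searches for maximally non-Gaussian, independent components.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(3)
t = np.linspace(0, 8, 2000)

# Two independent, non-Gaussian sources mixed linearly, plus a little noise.
sources = np.c_[np.sign(np.sin(3 * t)),      # square wave
                (t % 1.0) - 0.5]             # sawtooth
mixing = np.array([[1.0, 0.5],
                   [0.7, 1.0]])
observed = sources @ mixing.T + 0.02 * rng.standard_normal((len(t), 2))

pca_comps = PCA(n_components=2).fit_transform(observed)   # uncorrelated, max variance
ica_comps = FastICA(n_components=2, random_state=0).fit_transform(observed)  # ≈ sources
```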

Canonical Correlation Analysis (CCA)

The goal of CCA is to relate two sets of data, \(\mathbf{X}_{n \times p}\) and \(\mathbf{Y}_{n \times q}\), with \(p\) and \(q\) variables in their respective columns and \(n\) observations in the rows.

Canonical variates analysis (CVA; Friston et al. 1996, Strother et al. 2002), linear discriminant analysis (LDA), and multivariate analysis of variance are special cases of CCA.

We want to create pairs of new variables that are linear combinations of the original variables in \(\mathbf{X}\) (the canonical variates \(\mathbf{XU}(i)\), of size \(n \times 1\)) and of the variables in \(\mathbf{Y}\) (\(\mathbf{YV}(i)\), also \(n \times 1\)), such that each pair has the maximum possible correlation. In addition, each pair of canonical variates is orthogonal to all the other pairs. This is done by factorizing the adjusted correlation matrix \[ (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1/2} \mathbf{X}^{\mathsf{T}}\mathbf{Y}(\mathbf{Y}^{\mathsf{T}}\mathbf{Y})^{-1/2}=\mathbf{USV}^{\mathsf{T}}. \]
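
A small numpy sketch of this SVD-based recipe on toy data (all variable names and the example data are made up); the singular values in \(\mathbf{S}\) are the sample canonical correlations, and the canonical weights are recovered by undoing the whitening:

```python
import numpy as np

def cca(X, Y):
    """CCA via SVD of the adjusted correlation matrix
    (X^T X)^{-1/2} X^T Y (Y^T Y)^{-1/2}. Columns of X, Y must be centered."""
    def inv_sqrt(A):
        w, V = np.linalg.eigh(A)             # symmetric inverse square root
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(X.T @ X), inv_sqrt(Y.T @ Y)
    U, S, Vt = np.linalg.svd(Kx @ (X.T @ Y) @ Ky, full_matrices=False)
    return Kx @ U, Ky @ Vt.T, S              # X-weights, Y-weights, canonical correlations

# Toy data: the two views share a single latent variable z.
rng = np.random.default_rng(4)
n = 500
z = rng.standard_normal((n, 1))
X = np.c_[z + 0.5 * rng.standard_normal((n, 1)), rng.standard_normal((n, 2))]
Y = np.c_[-z + 0.5 * rng.standard_normal((n, 1)), rng.standard_normal((n, 1))]
X -= X.mean(axis=0); Y -= Y.mean(axis=0)

A, B, corrs = cca(X, Y)
print(corrs)   # first canonical correlation ≈ 0.8, the second ≈ 0
```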

PLS

Reference

  1. Anthony R. McIntosh & Bratislav Misic, 2013. Multivariate Statistical Analyses for Neuroimaging Data. Annual Review of Psychology.
