A Few Problems Related to Mathematical Statistics

Why the denominator of the sample variance is $n-1$

Suppose a random variable $X$ has known expectation $\mu$ and variance $\sigma^2 = E[(X-\mu)^2]$.

After sampling, we can then use
$$S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\mu)^2$$
to approximate $\sigma^2$.

In general, however, $\mu$ is unknown; we only have the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

We want to show that we can instead use
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$$
to approximate $\sigma^2$.

First, we know that $E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i-\mu)^2\right] = \sigma^2$.

By the central limit theorem, the sampling distribution of this estimator (with $\mu$ known) is approximately normal and centered at $\sigma^2$; since its expectation equals $\sigma^2$, this is what we call an unbiased estimate.

It is easy to see that
$$E\left[\sum_{i=1}^{n}(X_i-\bar{X})^2\right] = E\left[\sum_{i=1}^{n}(X_i-\mu)^2\right] - n\,E\left[(\bar{X}-\mu)^2\right] = n\sigma^2 - \sigma^2 = (n-1)\sigma^2.$$

If we let
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2,$$

then we have
$$E[S^2] = \sigma^2,$$

and what we obtain is the unbiased estimate.
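
As a quick sanity check, here is a small NumPy sketch comparing the estimator that divides by $n$ with the one that divides by $n-1$ (the sample size and true variance below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_repeats = 10, 100_000
sigma2 = 4.0  # true variance

biased, unbiased = [], []
for _ in range(n_repeats):
    x = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=n)
    ssq = np.sum((x - x.mean()) ** 2)
    biased.append(ssq / n)          # divide by n
    unbiased.append(ssq / (n - 1))  # divide by n - 1

print(np.mean(biased))    # ≈ (n - 1) / n * sigma2 = 3.6
print(np.mean(unbiased))  # ≈ sigma2 = 4.0
```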

Gaussian processes considering only first- and second-order statistics

Suppose there are two Gaussian processes $X(t)$ and $Y(t)$. If we only consider their first- and second-order statistics, we can set
$$X(t) = \mu_X + \sigma_X\,\xi_1(t), \qquad Y(t) = \mu_Y + \sigma_Y\left(\rho\,\xi_1(t) + \sqrt{1-\rho^2}\,\xi_2(t)\right),$$

where $\xi_1(t)$ and $\xi_2(t)$ are independent standard Gaussian processes. Then we have
$$E[X] = \mu_X,\quad E[Y] = \mu_Y,\quad \mathrm{Var}[X] = \sigma_X^2,\quad \mathrm{Var}[Y] = \sigma_Y^2,\quad \mathrm{Cov}(X, Y) = \rho\,\sigma_X\sigma_Y.$$

This is very similar to the bivariate normal distribution.
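
A minimal NumPy sketch of this construction (the means, variances, and correlation $\rho$ below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000  # number of time points
mu_x, mu_y = 0.0, 1.0
sigma_x, sigma_y, rho = 2.0, 0.5, 0.8

# Two independent standard Gaussian (white-noise) processes.
xi1 = rng.standard_normal(T)
xi2 = rng.standard_normal(T)

x = mu_x + sigma_x * xi1
y = mu_y + sigma_y * (rho * xi1 + np.sqrt(1 - rho**2) * xi2)

# Empirical first- and second-order statistics match the construction.
print(x.mean(), y.mean())        # ≈ mu_x, mu_y
print(x.std(), y.std())          # ≈ sigma_x, sigma_y
print(np.corrcoef(x, y)[0, 1])   # ≈ rho
```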

Low-pass filtering

Low-pass filtering (?) of a function. For a function , define

This can be regarded as a variant of

.

Formulas for statistics

For a Markov chain , we have

PCA, ICA, CCA, PLS

PCA, ICA

PCA and ICA are both dimensionality-reduction algorithms. Unlike PCA, whose principal components are uncorrelated in both time and space (the left and right singular vectors are each orthogonal), the components obtained by ICA have maximal statistical independence in only one of the two domains.

The goal of PCA is to find a set of axes such that the projections of the data onto these axes have maximal variance. The idea behind ICA is that the observed signals are mixtures of several independent source signals. By the central limit theorem, any linear mixture of independent variables will be more "Gaussian" than the original variables. Thus, ICA seeks to create a new set of axes, oriented such that the projection of the data points onto them is maximally non-Gaussian. We can use kurtosis, negentropy, or mutual information to measure the non-Gaussianity.
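
A minimal scikit-learn sketch of this difference (the source signals and mixing matrix are made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(2)
t = np.linspace(0, 8, 2000)

# Two independent, non-Gaussian sources: a sine wave and a square wave.
S = np.c_[np.sin(2 * np.pi * t), np.sign(np.sin(3 * np.pi * t))]

# Observed signals are linear mixtures of the sources.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
X = S @ A.T

# PCA finds orthogonal directions of maximal variance;
# FastICA finds maximally non-Gaussian (independent) directions.
X_pca = PCA(n_components=2).fit_transform(X)
X_ica = FastICA(n_components=2, random_state=0).fit_transform(X)

# ICA recovers the sources up to scale, sign, and permutation.
print(np.abs(np.corrcoef(X_ica.T, S.T)[:2, 2:]).round(2))
```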

Independent components can be interpreted as the dominant functional networks or modes of activity that contribute to the observed neuroimaging data[1].

cvPCA

Cross-validated PCA (cvPCA) is a method to derive an unbiased estimate of (the major part of) the eigenspectrum of the (sampled) covariance matrix.

Assume we have several trials of neural activity recordings $X^{(k)}$, where $X^{(k)}_{ij}$ is the activity of the $i$-th neuron/ROI under the $j$-th stimulus (e.g., position) in the $k$-th trial. The 'signal' $S = E[X^{(k)}]$ is the expected response. The 'noise' $\epsilon^{(k)} = X^{(k)} - S$ is the residual and is assumed to be i.i.d. across trials. It has zero expectation: $E[\epsilon^{(k)}] = 0$.

Theorems 1 and 2 in the SI of Stringer et al., 2019[2] tell us how the cvPCA method is implemented. Note that the requirement in Theorem 2 that one source of noise has dimensions orthogonal to the signal dimensions cannot be satisfied in some cases, e.g., when the sample correlation matrix is full rank. However, in practice the number of stimuli is often much smaller than the number of neurons, so the sample correlation matrix is not full rank.

Let's now describe how cvPCA is implemented. Suppose we have a training recording $X^{(1)}$ and another test/cross-validating recording $X^{(2)}$.

First, we can do PCA on the training recording $X^{(1)}$:
$$X^{(1)} = U \Sigma V^\top,$$

where the columns of $U$ are the (orthonormal) principal components. So the train scores are
$$S^{(1)} = U^\top X^{(1)}.$$

Next, we project the test recording $X^{(2)}$ onto the principal components of the training recording to get the cross-validated test scores:
$$S^{(2)} = U^\top X^{(2)}.$$

The estimated $k$-th eigenvalue of the covariance matrix is
$$\hat{\lambda}_k = \frac{1}{M}\sum_{j=1}^{M} S^{(1)}_{kj}\, S^{(2)}_{kj},$$
i.e., (up to normalization) the inner product of the train and test scores along the $k$-th principal component.
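
A minimal NumPy sketch of this procedure (the array shapes and the synthetic data are assumptions made for illustration; the paper's own code may differ in details such as centering and normalization):

```python
import numpy as np

def cvpca(x_train, x_test):
    """Cross-validated PCA eigenspectrum estimate.

    x_train, x_test : arrays of shape (n_neurons, n_stimuli), two
        independent trials of responses to the same stimuli.
    """
    # Center both trials using the training-trial mean response.
    mu = x_train.mean(axis=1, keepdims=True)
    x_train, x_test = x_train - mu, x_test - mu
    # Principal components (neuron space) of the training trial.
    u, _, _ = np.linalg.svd(x_train, full_matrices=False)
    s_train = u.T @ x_train  # train scores
    s_test = u.T @ x_test    # cross-validated test scores
    # Inner product of train and test scores along each PC.
    return (s_train * s_test).sum(axis=1) / x_train.shape[1]

# Toy example: a low-rank signal shared across trials plus independent noise.
rng = np.random.default_rng(3)
n_neurons, n_stimuli, rank = 200, 500, 5
signal = rng.standard_normal((n_neurons, rank)) @ rng.standard_normal((rank, n_stimuli))
trial1 = signal + rng.standard_normal((n_neurons, n_stimuli))
trial2 = signal + rng.standard_normal((n_neurons, n_stimuli))

print(cvpca(trial1, trial2)[:10].round(2))  # large for the first `rank` PCs, ≈ 0 after
```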

Canonical Correlation Analysis (CCA)

The goal of CCA is to relate two sets of data, $X$ and $Y$, with $p$ and $q$ variables in their respective columns and $n$ observations in the rows.

CVA (Friston et al. 1996, Strother et al. 2002), linear discriminant analysis (LDA), and multivariate analysis of variance are special cases of CCA.

We want to create pairs of new variables (canonical variates) that are linear combinations of the original variables in $X$ and the variables in $Y$, and that have the maximum correlation with each other. In addition, each pair of canonical variates is mutually orthogonal to all the other pairs. Factorize the adjusted correlation matrix
$$R_{XX}^{-1/2} R_{XY} R_{YY}^{-1/2} = U D V^\top,$$
where the singular values in $D$ are the canonical correlations and $R_{XX}^{-1/2}U$, $R_{YY}^{-1/2}V$ give the canonical weights.
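
A small NumPy sketch of this factorization (the toy data and variable names are illustrative; in practice one could also use sklearn.cross_decomposition.CCA):

```python
import numpy as np

def cca(X, Y):
    """Canonical correlations via SVD of the whitened (adjusted) correlation matrix."""
    # Standardize columns so that covariance equals correlation.
    X = (X - X.mean(0)) / X.std(0)
    Y = (Y - Y.mean(0)) / Y.std(0)
    n = X.shape[0]
    Rxx, Ryy, Rxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

    def inv_sqrt(R):
        # Inverse matrix square root via eigendecomposition.
        w, v = np.linalg.eigh(R)
        return v @ np.diag(1.0 / np.sqrt(w)) @ v.T

    K = inv_sqrt(Rxx) @ Rxy @ inv_sqrt(Ryy)  # adjusted correlation matrix
    U, d, Vt = np.linalg.svd(K)
    A = inv_sqrt(Rxx) @ U     # canonical weights for X
    B = inv_sqrt(Ryy) @ Vt.T  # canonical weights for Y
    return d, A, B

# Toy example: X and Y share a single latent variable z.
rng = np.random.default_rng(4)
n = 1000
z = rng.standard_normal(n)
X = np.c_[z + 0.5 * rng.standard_normal(n), rng.standard_normal(n)]
Y = np.c_[rng.standard_normal(n), z + 0.5 * rng.standard_normal(n)]

d, A, B = cca(X, Y)
print(d.round(2))  # the first canonical correlation is high, the rest ≈ 0
```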

References

  1. McIntosh, Anthony R., & Bratislav Mišić. Multivariate Statistical Analyses for Neuroimaging Data. Annual Review of Psychology 64, 1 (2013): 499–525. https://doi.org/10.1146/annurev-psych-113011-143804.
  2. Stringer, Carsen, Marius Pachitariu, Nicholas Steinmetz, Matteo Carandini & Kenneth D. Harris. High-Dimensional Geometry of Population Responses in Visual Cortex. Nature 571, 7765 (2019): 361–65. https://doi.org/10.1038/s41586-019-1346-5.
