Takens's Theorem
Takens's theorem establishes a connection between space and time, and it is an important result in dynamical systems. It also relates to ergodic theory and has applications in many fields. The proof is rather obscure, since it is mainly based on ideas from differential topology. The original proof is, of course, by Takens [8]. Sauer, Yorke and Casdagli [6] generalized Takens's work; one key point is that manifolds are generalized to subsets of \(\mathbb{R}^{k}\) with a given box-counting dimension. Here we do not give the proof of Takens's theorem; both the original and explanatory materials exist [2][5][6][8]. We focus instead on applications of Takens's theorem, including reconstruction of an attractor from time series data [4] and change-point detection [1].
Box-counting dimension
The Minkowski–Bouligand dimension is also known as the Minkowski dimension or the box-counting dimension.
Let \(N(\varepsilon)\) be the number of boxes of side length \(\varepsilon\) required to cover the set \(S\). Then the box-counting dimension is defined as \[ \dim_{\text{box}}(S):=\lim_{\varepsilon\to 0}\frac{\log N(\varepsilon)}{\log 1 / \varepsilon} \]
If \(S\) is a smooth space (a manifold) of integer dimension \(d\), then \(N(1 / n) \thickapprox Cn^{d}\), which is consistent with the usual definition of dimension.
When the above limit does not exist, upper and lower box-counting dimensions can be defined, and they are strongly related to the Hausdorff dimension.
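Numerically, the limit can be estimated by counting occupied grid boxes at several scales and fitting the slope of \(\log N(\varepsilon)\) against \(\log 1/\varepsilon\). A minimal Python sketch (the function name and scale choices here are mine, for illustration only):

```python
import numpy as np

def box_counting_dim(points, epsilons):
    """Estimate dim_box(S) as the slope of log N(eps) versus log(1/eps)."""
    log_n, log_inv = [], []
    for eps in epsilons:
        # indices of the grid boxes of side eps occupied by the sample
        occupied = np.unique(np.floor(points / eps), axis=0)
        log_n.append(np.log(len(occupied)))
        log_inv.append(np.log(1.0 / eps))
    slope, _ = np.polyfit(log_inv, log_n, 1)  # least-squares slope
    return slope

# sanity check: a densely sampled circle (a smooth 1-manifold) gives ~1
theta = np.linspace(0, 2 * np.pi, 20000)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
print(box_counting_dim(circle, epsilons=[0.1, 0.05, 0.02, 0.01]))
```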
Some definitions
First we need to clarify what an attractor is.
Def 1 Let \(\phi(t,x)\) be a flow. A positively invariant set \(A\) for the flow \(\phi(t,x)\) is a set such that if \(\phi(t_0,x) \in A\) for some \(t_0\), then \(\phi(t,x)\in A\) for all \(t\geqslant t_0\).
Def 2 A set \(S\) is a stable set for a continuous dynamical system with flow \(\phi(t,x)\) if there exists a neighborhood \(B\) of \(S\) such that if \(\phi(t_0,x)\in B\), then \(\phi(t,x)\in B\) for all \(t\geqslant t_0\).
Furthermore, if there exists a neighborhood \(B\) such that, for every neighborhood \(B' \subset B\), if \(\phi(t_0,x) \in B\), then there exists \(t_1\geqslant t_0\) such that \(\phi(t,x)\in B'\) for every \(t\geqslant t_1\), then \(S\) is also an asymptotically stable set.
Def 3 An attracting set of an ODE is a closed, positively invariant and asymptotically stable set. An attractor of an ODE is an attracting set that contains a dense orbit.
Def 4 Let \(M\) be a manifold. The set of \(C^{r}\) maps from \(M\) to itself which are also diffeomorphisms (i.e., have a \(C^{r}\) inverse) is denoted \(Diff^{r}(M)\).
Takens's embedding theorem
Theorem 1 Let \(M\) be a compact manifold of dimension \(m\). For pairs \((\phi, y)\), with \(\phi \in Diff^{2}(M)\), \(y \in C^{2}(M, \mathbb{R})\), it is a generic property that the map \(\Phi_{(\phi, y)}\colon M \rightarrow \mathbb{R}^{2m+1}\), defined by \[ \Phi_{(\phi, y)}(x) = (y(x), y(\phi(x)), \cdots , y(\phi^{2m}(x))) \]
is an embedding. Recall the Whitney embedding theorem: this means that \(\Phi_{(\phi,y)}(M)\) is diffeomorphic to \(M\) (in the \(C^{2}\) sense).
Here 'generic' means open and dense, and we use the \(C^{1}\) topology. We refer to the functions \(y\in C^{2}(M,\mathbb{R})\) as measurement functions.
Another version, which focuses on one particular \(\phi\), is the following:
Theorem 2 Let \(M\) be as above. Let \(\phi \colon M\rightarrow M\) be a diffeomorphism with the following properties:

- the periodic points of \(\phi\) with period less than or equal to \(2m\) are finite in number;
- if \(x\) is any periodic point with period \(k\leqslant 2m\), then the eigenvalues of the derivative of \(\phi^{k}\) at \(x\) are all distinct.
Then for generic \(y \in C^{2}(M, \mathbb{R})\), the map \(\Phi_{(\phi,y)}\colon M \rightarrow \mathbb{R}^{2m+1}\), defined as in Theorem 1, is an embedding.
Remark While Theorem 1 gives a stronger result, Theorem 2 gives a sufficient condition on the diffeomorphism \(\phi\), which may be informative in applications. Although the statement and proof of Takens's theorem are all about existence, in many setups we have natural candidates for \(\phi\) and \(y\): if \(M\) is the strange attractor of a dynamical system, then \(\phi\) is usually the flow map for a certain time delay, and \(y\) is the observable or measurement. However, we do not know a priori whether such \(\phi\) and \(y\) are appropriate.
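To make the remark concrete, here is a minimal sketch of the classical setup, assuming SciPy: the Lorenz system plays the role of the dynamical system, the measurement \(y\) is the first coordinate, \(\phi\) is the time-\(\tau\) flow map, and each row of the delay matrix is a point \(\Phi_{(\phi,y)}(x)\). The values of \(\tau\) and \(E\) are illustrative only:

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = u
    return [sigma * (y - x), x * (rho - y) - z, x * y - beta * z]

# sample a trajectory on (a neighborhood of) the strange attractor
sol = solve_ivp(lorenz, (0, 100), [1.0, 1.0, 1.0],
                t_eval=np.arange(0, 100, 0.01))
obs = sol.y[0]          # measurement function y: first coordinate only

# delay map Phi(x) = (y(x), y(phi(x)), ..., y(phi^{E-1}(x)))
tau, E = 10, 3          # tau = 10 samples; E = 3 for illustration
delays = np.column_stack([obs[i * tau : len(obs) - (E - 1 - i) * tau]
                          for i in range(E)])
# each row of `delays` is a reconstructed state; for suitable E the
# reconstructed set is diffeomorphic to the original attractor
```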
Applications of Takens's theorem
I read two papers [4][1] before deciding to write this post. A framework called randomly distributed embedding (RDE) was proposed to reconstruct the state space and predict future states. Later, the same group developed another framework, temporal change-point detection (TCD), to detect change-points, in which RDE serves as the very first step. Neither is a direct application of Takens's theorem, but both are based on the idea of the embedding theorem.
Reconstruction of state space and prediction
The first idea is rooted in the theory of generalized embeddings, whereby one can reconstruct the original system through both delayed and non-delayed embeddings.
Suppose we have a time series \(\{\bm{x}(t)\}\) with \(\bm{x}(t)\in \mathbb{R}^{n}\), and let \(x_s(t)\) be one of the observables of interest. Then:

- A delayed embedding: \(\mathcal{N}=\{(x_s(t),x_s(t+\tau),\cdots ,x_s(t+(E-1)\tau))\}\).
- A non-delayed embedding: \(\mathcal{M}=\{(x_{k_1}(t),x_{k_2}(t),\cdots ,x_{k_E}(t))\}\).
Here, theoretically, an embedding dimension \(E>2d\) is required, where \(d\) is the box-counting dimension of the system dynamics. Determining \(E\) in practice, however, is not easy.
In principle, there exists a diffeomorphism \(\bm{\Psi}\colon \mathcal{M}\rightarrow \mathcal{N}\). In particular, there is a map \(\psi\) such that \[ x_s(t+\tau)=\psi(x_{k_1}(t),x_{k_2}(t),\cdots ,x_{k_E}(t)) \]
How do we reconstruct the state space and make a one-step prediction? The process consists of several steps (a code sketch follows the list):

1. Randomly select a tuple \(\bm{k}=(k_1,\cdots ,k_{E})\).
2. Fit a predictor \(\psi_{\bm{k}}\) by minimizing \(\left\| x_s(t+\tau)-\psi_{\bm{k}}(x_{k_1}(t),x_{k_2}(t),\cdots ,x_{k_E}(t)) \right\|\) over \(t=t_1,\cdots,t_q\). The authors used Gaussian process regression (GPR) to fit the predictor.
3. Make a one-step prediction \(\hat{x}_s^{\bm{k}}(T+\tau)=\psi_{\bm{k}}(x_{k_1}(T),x_{k_2}(T),\cdots ,x_{k_E}(T))\).
4. Repeat steps 1-3 with further randomly chosen tuples, for \(r\) tuples in total. We now have \(r\) predictions \(\{\hat{x}_s^{\bm{k}}(T+\tau)\}\).
5. Analyze \(\{\hat{x}_s^{\bm{k}}(T+\tau)\}\) statistically: exclude outliers using the quartiles of the predictions.
6. Use kernel density estimation to estimate the probability density function of the predictions, \(p(\hat{x}_s(T+\tau))\).
7. Take the final prediction to be the expectation under this density, \(\mathbb{E}_{p}[\hat{x}_s(T+\tau)]\).
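The sketch below puts the steps together. It is not the authors' implementation: scikit-learn's GaussianProcessRegressor stands in for their GPR step, and since the expectation of a KDE with a symmetric kernel and uniform weights equals the sample mean of its centers, steps 6-7 reduce to taking the mean of the predictions retained after the quartile-based outlier removal:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def rde_predict(X, s, E, tau, r, rng=None):
    """One-step RDE prediction of x_s from a series X of shape (n_vars, n_times)."""
    rng = rng or np.random.default_rng()
    n_vars, _ = X.shape
    preds = []
    for _ in range(r):
        k = rng.choice(n_vars, size=E, replace=False)   # step 1: random tuple
        Z = X[k, :-tau].T                 # training inputs  x_k(t)
        y = X[s, tau:]                    # training targets x_s(t + tau)
        gpr = GaussianProcessRegressor(normalize_y=True).fit(Z, y)  # step 2
        preds.append(gpr.predict(X[k, -1:].T)[0])       # step 3: predict at T
    preds = np.array(preds)                             # step 4: r predictions
    q1, q3 = np.percentile(preds, [25, 75])             # step 5: quartiles
    iqr = q3 - q1
    keep = preds[(preds >= q1 - 1.5 * iqr) & (preds <= q3 + 1.5 * iqr)]
    return keep.mean()                                  # steps 6-7 (see above)
```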
Using the false nearest neighbor criterion to determine the embedding dimension
The purpose of choosing a large enough \(E\) is to eliminate all self-crossings of the attractor. \(E>2d\) is certainly sufficient, but we often do not know the exact value of \(d\), and some \(E\leqslant 2d\) may already be enough. The false nearest neighbor (FNN) criterion was proposed by Kennel, Brown and Abarbanel [3] to determine \(E\) in a model-free way.
Denote \(\bm{x}_s(t,E)=(x_s(t),x_s(t+\tau),\cdots ,x_s(t+(E-1)\tau))\). For a fixed \(t\), let \(\bm{x}_s(t^{(r)},E)\) be the \(r\)th nearest neighbor of \(\bm{x}_s(t,E)\). The square of the Euclidean distance between \(\bm{x}_s(t,E)\) and \(\bm{x}_s(t^{(r)},E)\) is
\[ R_{E}^{2}(t,r)=\sum_{k=0}^{E-1} [x_s(t+k\tau)-x_s(t^{(r)}+k\tau)]^{2} \]
The idea is that if \(\bm{x}_s(t^{(r)},E)\) is a false nearest neighbor, meaning it appears close only because of a self-crossing due to projection, then \(R_{E+1}^{2}(t,r)-R_{E}^{2}(t,r)\) becomes large once \(E\) passes the critical value. The neighbor is declared false if \[ \left( \frac{R_{E+1}^{2}(t,r)-R_{E}^{2}(t,r)}{R_{E}^{2}(t,r)} \right)^{\frac{1}{2}}=\frac{\lvert x_s(t+E\tau)-x_s(t^{(r)}+E\tau) \rvert }{R_{E}(t,r)} > R_{tol} \]
Another issue is the limited size of the data set. The authors therefore used a second criterion, which declares a neighbor false when the distance in dimension \(E+1\) becomes comparable to the size of the attractor: \[ \frac{R_{E+1}(t)}{R_{A}}>A_{tol} \]
Here \(R_{A}\) is a measure of the size of the attractor: \[ R_{A}^{2}=\frac{1}{N}\sum_{n=1}^{N} [x_s(t_n)-\bar{x}_s]^{2}, \qquad \bar{x}_s=\frac{1}{N}\sum_{n=1}^{N} x_s(t_n) \]
where \(t_n\) ranges over all time points of the series. \(E\) is chosen so that there are no false nearest neighbors.
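A minimal sketch of the criterion for a scalar series follows, with \(R_{tol}\) and \(A_{tol}\) set to values in the range suggested in [3]; the function name and looping strategy are mine:

```python
import numpy as np

def fnn_fraction(x, E, tau, R_tol=15.0, A_tol=2.0):
    """Fraction of false nearest neighbors among delay vectors of dimension E."""
    x = np.asarray(x, dtype=float)
    N = len(x) - E * tau                  # number of usable delay vectors
    # delay vectors with E + 1 coordinates; the first E define the neighbors
    emb = np.column_stack([x[i * tau : i * tau + N] for i in range(E + 1)])
    R_A = x.std()                         # attractor size, as defined above
    n_false = 0
    for i in range(N):
        d2 = np.sum((emb[:, :E] - emb[i, :E]) ** 2, axis=1)
        d2[i] = np.inf                    # exclude the point itself
        j = np.argmin(d2)                 # nearest neighbor in dimension E
        R_E = np.sqrt(d2[j])
        growth = abs(emb[i, E] - emb[j, E])   # |x(t + E*tau) - x(t' + E*tau)|
        if R_E > 0 and (growth / R_E > R_tol
                        or np.sqrt(R_E**2 + growth**2) / R_A > A_tol):
            n_false += 1
    return n_false / N

# pick E as the smallest dimension whose FNN fraction (nearly) vanishes:
# for E in range(1, 11):
#     print(E, fnn_fraction(x, E, tau=1))
```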
Using kernel density estimation to estimate the probability density function
The underlying probability density function is estimated as \[ \hat{f}(x)=\sum_{i=1}^{n} \alpha_i K(x-x_i) \]
where \(K\) is a kernel, typically a Gaussian, centered at the data points \(x_i,\ i=1,\cdots ,n\), and \(\alpha_i\) are weighting coefficients, typically uniform: \(\alpha_i=\frac{1}{n}\). A good discussion of kernel estimation techniques can be found in [7].
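As an illustration, SciPy's gaussian_kde implements exactly this estimator with a Gaussian kernel and uniform weights; the data below are synthetic placeholders standing in for the RDE predictions:

```python
import numpy as np
from scipy.stats import gaussian_kde

# stand-in for the predictions retained after outlier removal
preds = np.random.default_rng(0).normal(1.0, 0.1, size=50)

kde = gaussian_kde(preds)            # Gaussian K, alpha_i = 1/n
grid = np.linspace(preds.min() - 0.5, preds.max() + 0.5, 400)
density = kde(grid)                  # estimated density on the grid

# numerical expectation under the estimated density; for a symmetric
# kernel with uniform weights this matches preds.mean()
dx = grid[1] - grid[0]
x_final = np.sum(grid * density) * dx
print(x_final, preds.mean())
```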
Change-point detection
TBC.
References
- J.-W. Hou, H.-F. Ma, D. He, J. Sun, Q. Nie, and W. Lin, “Harvesting random embedding for high-frequency change-point detection in temporal complex systems,” National Science Review, vol. 9, no. 4, p. nwab228, May 2022, doi: 10.1093/nsr/nwab228.
- J. P. Huke, “Embedding Nonlinear Dynamical Systems: A Guide to Takens’ Theorem,” Mar. 2006. [Online]. Available: https://api.semanticscholar.org/CorpusID:55183186
- M. B. Kennel, R. Brown, and H. D. I. Abarbanel, “Determining embedding dimension for phase-space reconstruction using a geometrical construction,” Phys. Rev. A, vol. 45, no. 6, pp. 3403–3411, Mar. 1992, doi: 10.1103/PhysRevA.45.3403.
- H. Ma, S. Leng, K. Aihara, W. Lin, and L. Chen, “Randomly distributed embedding making short-term high-dimensional data predictable,” Proc. Natl. Acad. Sci. U.S.A., vol. 115, no. 43, Oct. 2018, doi: 10.1073/pnas.1802987115.
- J. Penalva Vadell, “Takens’s theorem proof and applications,” 2018.
- T. Sauer, J. A. Yorke, and M. Casdagli, “Embedology,” J. Stat. Phys., vol. 65, no. 3–4, pp. 579–616, Nov. 1991, doi: 10.1007/BF01053745.
- D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, 1st ed., Wiley Series in Probability and Statistics. Wiley, 1992. doi: 10.1002/9780470316849.
- F. Takens, “Detecting strange attractors in turbulence,” in Dynamical Systems and Turbulence, Warwick 1980, D. Rand and L.-S. Young, Eds., Lecture Notes in Mathematics, vol. 898. Berlin, Heidelberg: Springer, 1981, pp. 366–381. doi: 10.1007/BFb0091924.