PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
From: Stanford University; Jure Leskovec (60,000+ citations).
Problem:
subsequence clustering.
Challenges:
Discovering patterns is challenging because it requires simultaneous segmentation and clustering of the time series, and because interpreting the clustering results is difficult.
Why is discovering time series patterns a challenge? Think about it yourself! There are already many distance measures (DTW, manifold distance) and clustering methods (kNN, k-means, etc.). But I admit that interpretation is difficult.
Introduction:
long time series --(break down)--> a sequence of states/patterns --> the time series can be expressed as a sequential timeline of a few key states --> discover repeated patterns / understand trends / detect anomalies / better interpret large, high-dimensional datasets.
Key steps: simultaneously segment and cluster the time series.
Unsupervised learning: the results are hard to interpret; after clustering, you still have to look at the data itself.
How to discover interpretable structure in the data?
Traditional clustering methods are not particularly well suited to discovering interpretable structure in the data, because they typically rely on distance-based metrics such as DTW.
Distance-based algorithms are at a disadvantage on multivariate time series: they cannot see subtle structural similarities in the data.
Proposed: a new method for multivariate time series clustering, TICC:
- define each cluster as a dependency network showing the relationships between the different sensors in a short subsequence.
- each cluster is a Markov random field (MRF).
- In these MRFs, an edge represents a partial correlation between two variables.
- learn each cluster's MRF by estimating a sparse Gaussian inverse covariance matrix.
- This network has multiple layers.
- the number of layers corresponds to the window size of a short subsequence.
- the inverse covariance matrix defines the adjacency matrix of the MRF dependency network.
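To make the link between the inverse covariance matrix and the dependency network concrete, here is a minimal sketch in plain Python. It relies on the standard identity that the partial correlation between variables i and j is -theta[i][j] / sqrt(theta[i][i] * theta[j][j]). The function names, the tolerance `tol`, and the 3x3 example matrix are my own assumptions for illustration; they are not taken from the paper or from the TICC code.

```python
import math


def partial_correlations(theta):
    """Turn an inverse covariance (precision) matrix into partial correlations."""
    n = len(theta)
    rho = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                rho[i][j] = 1.0
            else:
                # Standard identity for Gaussian precision matrices.
                rho[i][j] = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
    return rho


def adjacency(theta, tol=1e-8):
    """An edge exists wherever the off-diagonal precision entry is non-zero."""
    n = len(theta)
    return [
        [1 if i != j and abs(theta[i][j]) > tol else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical sparse precision matrix for three sensors (a chain 0 - 1 - 2).
theta = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

if __name__ == "__main__":
    for row in adjacency(theta):
        print(row)
    print(partial_correlations(theta)[0][1])
```

In this toy example sensors 0 and 2 have a zero entry in the precision matrix, so they are not connected: given sensor 1, they are conditionally independent. This is the kind of structure a sparse estimate is meant to expose.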
Related work:
time series clustering and convex optimization;
variations of DTW; symbolic representations; rule-based motif discovery;
However, these methods generally rely on distance-based metrics.
TICC: a model-based clustering method, like ARMA, Gaussian mixture, or hidden Markov models.
- define each cluster by a Gaussian inverse covariance.
- so the Gaussian inverse covariance defines a Markov random field encoding the structural representation.
- K clusters/ inverse covariances.
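A rough sketch of what "model-based" means here: each cluster is a Gaussian described by its inverse covariance, and a point can be scored by its log-likelihood under each cluster. The code below is only an illustration in plain Python. The helper names, the zero means, and the two 2x2 matrices are assumptions of mine, and the sketch scores each point independently, ignoring any temporal smoothing between neighbouring points.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def log_likelihood(x, mu, theta):
    """Gaussian log-likelihood of x, parameterised by mean mu and precision theta."""
    n = len(x)
    d = [xi - mi for xi, mi in zip(x, mu)]
    quad = sum(d[i] * theta[i][j] * d[j] for i in range(n) for j in range(n))
    low = cholesky(theta)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * quad + 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


def assign(x, clusters):
    """Index of the cluster (mu, theta) under which x is most likely."""
    scores = [log_likelihood(x, mu, theta) for mu, theta in clusters]
    return scores.index(max(scores))


# Two hypothetical clusters with the same mean but opposite dependency structure.
clusters = [
    ([0.0, 0.0], [[2.0, -1.5], [-1.5, 2.0]]),  # variables move together
    ([0.0, 0.0], [[2.0, 1.5], [1.5, 2.0]]),    # variables move in opposition
]

if __name__ == "__main__":
    print(assign([1.0, 1.0], clusters))
    print(assign([1.0, -1.0], clusters))
```

Both points are at the same distance from both cluster means, so a purely mean-based distance cannot separate them; only the dependency structure encoded in the inverse covariance does.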
Selecting the number of clusters: cross-validation, normalized mutual information, BIC, or silhouette score.
I don't really understand this part T_T
Supplementary knowledge:
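1. For unsupervised learning, interpreting the results and choosing intermediate parameters currently rely entirely on experience.
2. Aarhus data, Martin: multivariate time series forecasting.
3. Toeplitz matrices: constant-diagonal matrices (every descending diagonal holds a single value).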
4. ticc code
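For item 3 above, a small plain-Python sketch of what a Toeplitz (constant-diagonal) matrix looks like. The helpers `toeplitz` and `is_toeplitz` are hypothetical names written for this note, not functions from the TICC code.

```python
def toeplitz(first_col, first_row=None):
    """Build a Toeplitz matrix from its first column and (optionally) first row.

    If first_row is omitted, the matrix is symmetric.
    """
    if first_row is None:
        first_row = first_col
    n, m = len(first_col), len(first_row)
    return [
        [first_col[i - j] if i >= j else first_row[j - i] for j in range(m)]
        for i in range(n)
    ]


def is_toeplitz(matrix):
    """Check that every entry equals its upper-left neighbour."""
    return all(
        matrix[i][j] == matrix[i - 1][j - 1]
        for i in range(1, len(matrix))
        for j in range(1, len(matrix[0]))
    )


if __name__ == "__main__":
    t = toeplitz([1, 2, 3])
    for row in t:
        print(row)
    print(is_toeplitz(t))
```

Entry (i, j) depends only on the offset i - j. As far as I understand, TICC uses the block version of this idea: the entries are sub-matrices rather than scalars, so the relationship between two time steps inside a window depends only on how far apart they are.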