Time Series Classification using a Frequency Domain EM Algorithm

Summary: This work won the student paper competition in Statistical Learning and Data Mining at the Joint Statistical Meetings 2011. You can find “A Frequency Domain EM Algorithm for Time Series Classification with Applications to Spike Sorting and Macro-Economics” on arXiv; it is also published in SAM.

Let’s say you have n time series and you want to classify them into groups with similar dynamic structure. For example, you have a time series of per-capita income for each of the lower 48 US states and you want to group them. While each state’s economy differs in subtle ways, we can expect only a couple of grand-theme dynamics across the US (e.g., the East Coast and the Midwest probably have different economic dynamics). There are several ways to classify such time series (see the paper for references).

I introduce a nonparametric EM algorithm for time series classification by viewing the (normalized) spectral density of a time series as a density on the unit circle and treating it just like a plain pdf. And what do we do to classify data in statistics/machine learning? We model the data as a mixture distribution and find the classes using an EM algorithm. That’s what I do too, but I apply it to the spectral density and periodograms rather than to the “true” multivariate pdf of the time series. Applying this methodology to the per-capita income time series gives 3 clusters, and a map of the US shows that these clusters also make sense geographically.
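
To make the idea concrete, here is a minimal base-R sketch. It is not the algorithm from the paper: it simply normalizes each raw periodogram to sum to one, treats it as a discrete density over the Fourier frequencies, and fits a mixture with a multinomial-style EM. The simulated AR(1) data, the helper names (`sim_ar1`, `norm_pgram`, `freq_em_sketch`), and the `eff_n` scaling (an assumed "effective sample size" for each periodogram) are all illustrative assumptions.

```r
set.seed(1)

# Toy data (assumption): 10 series each from two AR(1) processes.
n_per_group <- 10
T_len <- 256
sim_ar1 <- function(phi) as.numeric(arima.sim(list(ar = phi), n = T_len))
series <- c(replicate(n_per_group, sim_ar1(0.8), simplify = FALSE),
            replicate(n_per_group, sim_ar1(-0.8), simplify = FALSE))

# Raw periodogram, normalized to sum to 1 so it acts like a pmf over frequencies.
norm_pgram <- function(x) {
  p <- spec.pgram(x, taper = 0, detrend = FALSE, plot = FALSE)$spec
  p / sum(p)
}
P <- t(sapply(series, norm_pgram))  # one row per series, one column per frequency

# Hypothetical EM sketch: each cluster k has its own normalized spectrum.
freq_em_sketch <- function(P, K, eff_n = ncol(P), n_iter = 50) {
  n <- nrow(P)
  R <- matrix(runif(n * K), n, K)   # random initial responsibilities
  R <- R / rowSums(R)
  for (iter in seq_len(n_iter)) {
    # M-step: mixing weights and responsibility-weighted average spectra.
    w <- colMeans(R)
    spectra <- t(R) %*% P + 1e-12   # small constant guards against empty clusters
    spectra <- spectra / rowSums(spectra)
    # E-step: score each series by its (scaled) negative cross-entropy
    # against each cluster spectrum, then normalize per series.
    log_r <- eff_n * (P %*% t(log(spectra)))
    log_r <- sweep(log_r, 2, log(w), "+")
    log_r <- log_r - apply(log_r, 1, max)  # stabilize before exponentiating
    R <- exp(log_r)
    R <- R / rowSums(R)
  }
  list(cluster = max.col(R, ties.method = "first"),
       weights = w, spectra = spectra, responsibilities = R)
}

fit <- freq_em_sketch(P, K = 2)
table(fit$cluster)
```

The returned `spectra` rows are the estimated cluster-level densities over frequency; plotting them against the Fourier frequencies shows which dynamics each cluster captures. As with any EM, the result depends on the random start, so several restarts are advisable.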


May the ForeC be with you: R package ForeCA v0.2.0

I just submitted a new, much improved version of the ForeCA R package to CRAN. Motivated by a bug report on whiten(), I went ahead and rewrote and tested many of the main functions in the package; ForeCA is now shinier than ever.
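
For readers unfamiliar with whitening: it linearly transforms a multivariate series so that its sample covariance becomes the identity matrix. The base-R sketch below illustrates that idea only; `whiten_sketch` is a hypothetical helper and is not ForeCA's whiten() implementation or its return format.

```r
# Hypothetical helper illustrating whitening; not ForeCA's whiten().
whiten_sketch <- function(X) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # remove column means
  eig <- eigen(cov(X), symmetric = TRUE)
  # Symmetric inverse square root of the sample covariance matrix.
  W <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)
  list(U = X %*% W, whitening = W)
}

ret <- diff(log(EuStockMarkets)) * 100   # log-returns in percent
out <- whiten_sketch(ret)
round(cov(out$U), 6)                     # approximately the identity matrix
```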

For R users not much will change (changelog): just use it as usual via foreca(X), where X is your multivariate (approximately) stationary time series (as a matrix, data.frame, or ts object in R).

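library(ForeCA)

# Daily log-returns (in percent) of four European stock indices
ret <- ts(diff(log(EuStockMarkets)) * 100)
# Estimate forecastable components; spectra are estimated with method "wosa"
mod <- foreca(ret, spectrum.control = list(method = "wosa"))
mod           # print the fitted model
summary(mod)
plot(mod)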

I will add a vignette in upcoming versions.

Oops I did it again: winner of the JSM 2011 student paper competition

My paper “A Frequency Domain EM Algorithm to Detect Similar Dynamics in Time Series with Applications to Spike Sorting and Macro-Economics” was selected as one of three major winners in the JSM 2011 student paper competition on Statistical Learning and Data Mining. arXiv: 1103.3300.

This is my second JSM award, after the one in 2007 for the paper on time-varying long memory.