Why do we need supervised event detection in multivariate time series?

protonAutoML · 5 min read · Jun 25, 2021

Supervised event detection in multivariate time series is an important research topic in data mining with a wide range of industrial applications. Efficient and accurate event detection lets companies monitor their key metrics continuously and raise alerts for potential incidents in time. In this article, we will see why we need supervised event detection in multivariate time series.

In real-world applications, many data sets must be analyzed with methods that do not match the time series structure of the data. For instance, in a system with an unpredictable arrival rate at a certain hour of the day, no analysis method can predict what the value will be, and a naive model may even forecast a negative number of events for that hour, which is impossible. Such irregular behavior also differs in degree. If, out of five cases where people arrive late at work, two involve public transportation that was randomly delayed, then only a fraction of those cases could have been predicted beforehand. In some scenarios we may even know why a change happened, but that knowledge does not make statistical models designed for regularly sampled, continuous data any more appropriate.

If the number of irregular cases is small, we may disregard them and hope their effect is negligible. Another common practice is to throw away values considered too extreme for the model (spikes). If such spikes occur too frequently on both sides of the distribution, more drastic measures have to be taken: either rescale the series so that all values lie between 0 and 1 (or some other range), or drop the outliers from the analysis entirely. However, even after one or several of these changes, models designed exclusively for regularly sampled data can still fail, because certain issues that arise with irregularly sampled data are simply not represented in those models and may not be predictable.
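To make those two remedies concrete, here is a minimal sketch in Python, assuming a pandas Series and a hypothetical z-score cutoff of 3: spikes on both sides are dropped, and the remaining values are rescaled into the [0, 1] range.

```python
import numpy as np
import pandas as pd

def clean_series(values: pd.Series, z_thresh: float = 3.0) -> pd.Series:
    """Drop extreme spikes, then rescale the remainder to [0, 1].

    z_thresh = 3.0 is a hypothetical cutoff; tune it to your data.
    """
    z = (values - values.mean()) / values.std()
    kept = values[z.abs() <= z_thresh]        # drop outliers on both sides
    lo, hi = kept.min(), kept.max()
    return (kept - lo) / (hi - lo)            # rescale into [0, 1]

# Example: a well-behaved series contaminated with two artificial spikes
s = pd.Series(np.r_[np.random.default_rng(0).normal(10, 1, 100), [50.0, -30.0]])
print(clean_series(s).head())
```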

When analyzing irregularly sampled data, the first thing to do is to find out which kind of irregularity you are dealing with. This depends on the nature of your problem, but most likely it will involve at least one of the following characteristics:

Abrupt changes

A series of small, gradual changes leads up to a very sudden one (think of stock market prices). The event that triggers such an abrupt change does not have to be related in any way to previous changes. In many cases these events completely change the course and character of the time series as it enters a new regime or boundary condition. For instance, a short-term flu outbreak can cause a drastic drop in productivity in a factory whose workers are used to uncommonly high performance. Some studies have shown that changes like these occur more often than one would expect, which can be relevant for modeling purposes. And even abrupt changes too infrequent for statistical analysis can serve as an additional source of information for understanding specific patterns or trends.
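As an illustration, here is a minimal sketch of flagging abrupt changes with a rolling z-score, assuming a univariate NumPy array; the window size and threshold are illustrative defaults, not tuned values. A point is flagged when it lands far outside the recent local behavior, which is one simple operationalization of an abrupt regime change.

```python
import numpy as np

def abrupt_changes(x: np.ndarray, window: int = 30, z_thresh: float = 4.0):
    """Flag points that jump far from the recent local mean.

    window and z_thresh are illustrative defaults, not tuned values.
    """
    flags = []
    for t in range(window, len(x)):
        recent = x[t - window:t]
        mu, sigma = recent.mean(), recent.std()
        if sigma > 0 and abs(x[t] - mu) > z_thresh * sigma:
            flags.append(t)                   # candidate regime change
    return flags

# A flat series that suddenly shifts to a new level at t = 200
rng = np.random.default_rng(1)
x = np.r_[rng.normal(0, 1, 200), rng.normal(8, 1, 100)]
print(abrupt_changes(x))                      # indices just after t = 200
```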

Aperiodic events

A series does not consist entirely of periodic occurrences: certain values were recorded only once or almost never occur (think of temperature records for cities over long periods). Many real-world data sets fall into this category because, for many phenomena (e.g., earthquakes), record keeping is done by humans, and there are often breaks in the information flow over time. At first sight this category might not seem to differ much from a regular series: most statistical analysis methods handle it no better or worse than any other irregularly sampled data set. However, we can still use these events as additional clues for understanding the pattern or trend they are part of, even if the fitted model represents them only with some bias.
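One concrete symptom of this category is a long break in the information flow. The sketch below, assuming pandas timestamps and a hypothetical factor of 5 over the median spacing, flags gaps that are much longer than the typical interval between records.

```python
import pandas as pd

def recording_gaps(timestamps: pd.Series, factor: float = 5.0) -> pd.Series:
    """Return the gaps that are much longer than the typical spacing.

    factor is a hypothetical multiplier on the median inter-record gap.
    """
    deltas = timestamps.sort_values().diff().dropna()
    return deltas[deltas > factor * deltas.median()]

# Hourly records with a break of roughly a day between two blocks
ts = pd.Series(pd.date_range("2021-01-01", periods=48, freq="h"))
ts = pd.concat([ts, ts + pd.Timedelta(days=3)], ignore_index=True)
print(recording_gaps(ts))                     # the single long gap is flagged
```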

Bounded changes

A bounded change is one where, within a certain range, values tend to oscillate around a mean, while outside that range anything can happen, including jumps to values far away from the range. Although a bounded glitch may not affect a model much (it can occur in an otherwise well-fitting data set), it is worth examining how the predictability of this kind of change depends on the parameters of your system, so such changes should still be taken into account.
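Here is a minimal sketch of spotting such excursions, assuming the band is defined as the mean plus or minus k standard deviations (k is an illustrative choice): values inside the band count as ordinary oscillation, values outside are flagged.

```python
import numpy as np

def excursions(x: np.ndarray, k: float = 3.0):
    """Indices where the series leaves the band mean ± k * std.

    Inside the band the series just oscillates; outside, anything goes.
    k is an illustrative band width, not a tuned value.
    """
    mu, sigma = x.mean(), x.std()
    return np.where(np.abs(x - mu) > k * sigma)[0]

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 500)
x[250] = 12.0                                 # a jump far outside the band
print(excursions(x, k=4.0))                   # the injected jump at 250 is flagged
```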

Self-similarity

A series follows rules that cause some parts to resemble other parts (think of coastlines). These rules are usually very specific to each type of structure we observe and do not necessarily bear any direct relation to the real-world physical laws governing the system. Models of self-similar series commonly contain many more parameters than are needed just to fit the time series at hand. This makes them harder to deal with, especially for fitting procedures that rely on identifying order parameters. Still, it can be worth investigating how many parameters, and which ones, are good predictors of outcomes even when they were not used to identify the model itself.
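One quick, if rough, way to probe self-similarity is a simplified Hurst-exponent estimate: regress the log dispersion of lagged differences on the log lag. The sketch below is a diagnostic under that assumption, not a rigorous estimator.

```python
import numpy as np

def hurst_exponent(x: np.ndarray, lags=range(2, 100)) -> float:
    """Rough Hurst estimate from how dispersion of lagged differences grows.

    H ≈ 0.5 suggests a random walk; H > 0.5 suggests persistent,
    self-similar trending behavior. A quick diagnostic only.
    """
    tau = [np.std(x[lag:] - x[:-lag]) for lag in lags]
    slope = np.polyfit(np.log(list(lags)), np.log(tau), 1)[0]
    return slope

rng = np.random.default_rng(3)
walk = np.cumsum(rng.normal(size=2000))       # Brownian-like path
print(round(hurst_exponent(walk), 2))         # typically close to 0.5
```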

In addition, an irregularly sampled data set can have properties from multiple categories (e.g., it might contain both abrupt changes and self-similarity). As a rule, such datasets require more complex modeling methods, because all of these issues must be dealt with together rather than separated by category as we did above. On average, larger sample sizes help with most of the issues mentioned, so it is always better to look for the longest time series available, even if the phenomena it describes seem less important.

Supervised event detection algorithms are helpful for several reasons. First, they can be used to detect missing or incorrect data values. Second, they indicate when unusual activity may have occurred relative to a prior baseline period. Third, their output can serve as the basis for subsequent statistical modeling, in which predictors are developed for use in predictive modeling steps such as linear regression and classification trees. Supervised event detection (SED) analyzes multivariate time series of events, such as stock market data, and searches for specific patterns of activity. These patterns are chosen beforehand by an analyst through the creation of a template. The templates specify the general pattern to look for in the data, so SED programs must be trained on past examples of these events before they can detect them in new data.
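As a minimal sketch of that workflow, the code below slices a multivariate series into overlapping windows, labels the windows an analyst might have tagged from a template, and trains a standard scikit-learn classifier on those past examples; the data, sizes, and injected incident are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(series: np.ndarray, width: int = 20) -> np.ndarray:
    """Turn a multivariate series (T x d) into overlapping flattened windows."""
    return np.stack([series[t:t + width].ravel()
                     for t in range(len(series) - width)])

# Hypothetical setup: X_raw is a (T x d) metric matrix; labels mark the
# windows an analyst tagged as matching the event template.
rng = np.random.default_rng(4)
X_raw = rng.normal(size=(1000, 3))
X_raw[400:420] += 5                           # an injected "incident"
windows = window_features(X_raw)
labels = np.zeros(len(windows), dtype=int)
labels[400:420] = 1                           # windows starting inside the event

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(windows, labels)                      # train on past labeled examples
print(clf.predict(windows[405:410]))          # training-set sanity check
```

In practice the classifier would, of course, be evaluated on held-out events rather than on the windows it was trained on.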

Originally published at https://protonautoml.com on June 25, 2021.
