At the latest New Relic FutureTalk, Dr. Tom Dietterich spoke about his research on anomaly detection, and he offered a number of opinions and results of interest for those of us in the data science and machine learning (DSML) community.
Dr. Dietterich noted the relative lack of investment in anomaly detection by the machine learning research community, despite the critical role anomaly detectors play in production systems. One particular problem he highlighted is the absence of publicly available training data and of benchmarks comparing different algorithms across different situations.
Benchmarking anomaly detection algorithms
Dr. Dietterich’s benchmarking technique is described in part in Systematic Construction of Anomaly Detection Benchmarks from Real Data, which starts with 19 data sets matching criteria for anomaly detection from the UCI Machine Learning Repository and generates 25,685 synthetic data sets with well-defined variations in problem dimensions, including relative frequency, point difficulty, irrelevant features, and clusteredness.
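The core idea of the construction can be sketched as follows. This is a simplified illustration, not the paper's exact procedure: take a labeled multiclass dataset, designate one class as nominal and another as the anomaly source, and subsample the anomaly class to hit a target relative frequency (the paper additionally controls point difficulty, irrelevant features, and clusteredness).

```python
import numpy as np

def make_benchmark(X, y, nominal_class, anomaly_class, rel_freq, rng):
    """Build one anomaly-detection benchmark from a labeled dataset by
    treating one class as nominal and subsampling another as anomalies
    until anomalies make up roughly `rel_freq` of the result.
    Simplified sketch of the paper's construction."""
    nominal = X[y == nominal_class]
    candidates = X[y == anomaly_class]
    # choose n_anom so that n_anom / (n_anom + n_nominal) ~= rel_freq
    n_anom = int(round(rel_freq * len(nominal) / (1.0 - rel_freq)))
    n_anom = min(n_anom, len(candidates))
    picked = candidates[rng.choice(len(candidates), size=n_anom, replace=False)]
    data = np.vstack([nominal, picked])
    labels = np.concatenate([np.zeros(len(nominal)), np.ones(n_anom)])
    return data, labels

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = np.repeat([0, 1, 2], 200)   # stand-in for a 3-class UCI-style dataset
data, labels = make_benchmark(X, y, nominal_class=0, anomaly_class=1,
                              rel_freq=0.05, rng=rng)
print(labels.mean())            # close to the requested relative frequency
```

Varying `rel_freq` and the choice of nominal/anomaly classes over many source datasets is what yields the thousands of benchmark variants.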
Having generated the test set, Dr. Dietterich selected eight unsupervised anomaly detection algorithms and benchmarked each across the test set: robust kernel density estimation (rkde), ensemble of gaussian mixture models (egmm), one-class svm (ocsvm), support vector data description (svdd), local outlier factor (lof), kNN angle-based outlier detector (abod), Isolation Forest, and lightweight on-line detector of anomalies (loda).
Evaluating the performance of all eight algorithms against the synthetic test data allowed him to determine which factors most influence anomaly detection accuracy. The most significant factor was the underlying source dataset, suggesting that some data domains are significantly harder than others. The second most significant was the relative frequency of anomalies, with rare anomalies being much easier to detect.
The choice of algorithm ranked third in significance, with Isolation Forest proving superior in both robustness and handling of tight data clusters.
The two SVM-based algorithms (ocsvm and svdd) were the clear losers, underperforming in accuracy while also incurring higher computational cost than the other models. The remaining five algorithms placed similarly in the middle. Dr. Dietterich highlighted both Isolation Forest and loda as having efficient online algorithms for scalable real-world implementations.
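A small-scale version of this kind of comparison can be run with scikit-learn, which ships implementations of three of the eight detectors. This is a toy sketch on synthetic data, not a reproduction of the benchmark; the data layout and parameters are my own assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
nominal = rng.normal(0.0, 1.0, size=(500, 2))      # one tight nominal cluster
anomalies = rng.uniform(-6.0, 6.0, size=(25, 2))   # scattered anomalies
X = np.vstack([nominal, anomalies])
y = np.r_[np.zeros(500), np.ones(25)]              # 1 = anomaly (held out)

scores = {
    # each detector's convention is "higher = more normal", hence the sign flips
    "iforest": -IsolationForest(random_state=0).fit(X).score_samples(X),
    "ocsvm":   -OneClassSVM(gamma="scale").fit(X).score_samples(X),
    "lof":     -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_,
}
for name, s in scores.items():
    print(name, round(roc_auc_score(y, s), 3))     # AUC against ground truth
```

The ground-truth labels are used only for scoring: each detector is unsupervised and never sees `y`.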
In addition to benchmarking detection performance, Dr. Dietterich presented some interesting benchmark results on generating automated, human-readable explanations for detected anomalies. The problem is modeled with the goal of convincing a simulated analyst that a true-positive candidate anomaly is real: the explanation algorithm exposes one feature at a time until the analyst is convinced. The research compared three techniques: an oracle that exposes the minimum possible number of features, a random feature selector, and sequential feature explanation. Dr. Dietterich found that sequential feature explanation generally performed very well, typically requiring between the oracle's feature count and twice that number, and significantly outperforming the random feature selector in all cases.
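The flavor of the approach can be illustrated with a deliberately simplified greedy version. This is not the actual sequential feature explanation algorithm from the research; it stands in a per-feature z-score against the nominal data as the anomalousness measure, which makes the greedy reveal order a simple sort:

```python
import numpy as np

def explain_sequentially(X_nominal, x_anom):
    """Greedy sketch of sequential feature explanation: reveal features in
    the order that most quickly makes the candidate look anomalous, scoring
    each feature by its |z-score| under the nominal data. Illustrative
    only; the real algorithm works with the detector's own score."""
    mu = X_nominal.mean(axis=0)
    sd = X_nominal.std(axis=0)
    z = np.abs((x_anom - mu) / sd)   # per-feature anomalousness
    order = np.argsort(-z)           # most convincing feature first
    return list(order), z[order]

rng = np.random.default_rng(1)
X_nominal = rng.normal(size=(1000, 4))
x_anom = np.array([0.1, -0.2, 8.0, 0.3])   # anomalous only in feature 2
order, zs = explain_sequentially(X_nominal, x_anom)
print(order[0])   # feature 2 is revealed first
```

A simulated analyst would stop as soon as the revealed features make the anomaly convincing; the oracle corresponds to the shortest such prefix.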
Semi-supervised anomaly detection
Dr. Dietterich then extended the model to learn from supervised feedback: analysts reviewed cases highlighted by the anomaly detector, then gave feedback on whether each anomaly was real or not. This experiment extended the loda anomaly detector using the accuracy-at-the-top algorithm to reweight the projections. The benchmark included a baseline with no learning, random ordering of cases, the technique under test (aad), AI2, and the semi-supervised anomaly detector "margin and cluster" variant (SSAD-MC). Dr. Dietterich's benchmark results found that aad worked very well, performing comparably to or considerably better than the other methods and often doubling the system's capacity for generating true alarms.
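The feedback loop can be sketched in miniature. This is a much-simplified stand-in, not the accuracy-at-the-top algorithm: a loda-like ensemble of random 1-D projections scores the data, a simulated analyst labels the top-ranked case each round, and multiplicative weight updates boost projectors that scored confirmed anomalies highly:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data: a nominal cluster plus anomalies shifted along one dimension.
nominal = rng.normal(0, 1, size=(300, 5))
anomalies = rng.normal(0, 1, size=(10, 5))
anomalies[:, 0] += 4
X = np.vstack([nominal, anomalies])
y = np.r_[np.zeros(300), np.ones(10)]   # hidden ground truth (analyst's view)

# loda-like ensemble: K random 1-D projections; per-projector score = |z|.
K = 20
W = rng.normal(size=(K, 5))             # projection directions
P = X @ W.T                             # (n, K) projected values
Z = np.abs((P - P.mean(0)) / P.std(0))  # per-projector anomaly scores
w = np.ones(K) / K                      # ensemble weights, updated by feedback

def ranked(w):
    return np.argsort(-(Z @ w))         # most anomalous case first

# Analyst loop: show the top unlabeled case, collect a label, then boost or
# damp projectors in proportion to how strongly they flagged that case.
seen = set()
for _ in range(15):
    top = next(i for i in ranked(w) if i not in seen)
    seen.add(top)
    if y[top] == 1:                     # analyst confirms a true anomaly
        w *= np.exp(0.5 * Z[top] / Z[top].max())
    else:                               # false alarm: damp those projectors
        w *= np.exp(-0.5 * Z[top] / Z[top].max())
    w /= w.sum()

print("true anomalies among first 10 alarms:", int(y[ranked(w)[:10]].sum()))
```

The real aad method optimizes the ranking near the top of the alarm list directly; this sketch only conveys the shape of the interaction between detector and analyst.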
Practical applications of anomaly detection
As part of the TAHMO project, an interconnected network of approximately 20,000 weather sensors is being deployed across sub-Saharan Africa with automated data quality control. As lead of the SENSOR-DX team, Dr. Dietterich designed an architecture that pairs an anomaly detector with a probabilistic graphical model. The anomaly detector evaluates every data stream and generates a state variable. The graphical model then explains those anomalous results, identifying the minimal set of failing components by conditioning on the state of the parents.
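The "minimal set of failing components" idea can be illustrated with a greedy set-cover over an assumed component-to-streams dependency map. This is an illustrative stand-in for inference in the probabilistic graphical model, with hypothetical component and stream names:

```python
def minimal_failing_set(depends_on, anomalous_streams):
    """Greedy set-cover sketch: repeatedly pick the component whose failure
    explains the most still-unexplained anomalous streams. Stand-in for
    the graphical-model inference described in the talk."""
    uncovered = set(anomalous_streams)
    failing = []
    while uncovered:
        best = max(depends_on, key=lambda c: len(depends_on[c] & uncovered))
        if not depends_on[best] & uncovered:
            break                        # remaining anomalies unexplained
        failing.append(best)
        uncovered -= depends_on[best]
    return failing

# Hypothetical station: which data streams each component feeds.
depends_on = {
    "solar_panel": {"battery_voltage", "temperature", "humidity", "rainfall"},
    "rain_gauge":  {"rainfall"},
    "thermometer": {"temperature"},
}
print(minimal_failing_set(depends_on, {"temperature", "humidity",
                                       "battery_voltage", "rainfall"}))
# → ['solar_panel']
```

When every sensor fed by the solar panel misbehaves at once, a single power failure is a more parsimonious explanation than four independent sensor failures, which is the intuition behind conditioning on parent state.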
Dr. Dietterich also discussed Open Category Classification, where anomaly detectors work well when paired with a classifier to enable recognition of unknown categories. In this architecture, the anomaly detector short-circuits the classifier and triggers a new-class exception; only nominal data is passed to the classifier. For example, a machine learning classifier in a self-driving car, when paired with an anomaly detector, could decide it did not have enough information to make a decision in certain cases and return control to the human driver. A self-driving car that had never seen a kangaroo would rightly understand it could not predict the kangaroo's behavior, whereas driving on the left side of the road in England differs from the training scenarios but not enough to be anomalous. In this way, anomaly detectors reduce the scope of responsibility and allow further specialization of the classifier, while increasing the overall safety of the system.
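The short-circuit architecture can be sketched as a gate in front of the classifier. The detector here is a deliberately simple max-z-score check against the training distribution, and the classifier is nearest-centroid; both are assumptions for illustration, not the components from the talk:

```python
import numpy as np

class OpenCategoryClassifier:
    """Sketch of the short-circuit architecture: a detector screens every
    input, and only nominal-looking data reaches the classifier; anything
    else raises a new-class exception. A real system would use a much
    stronger detector and classifier."""

    def __init__(self, threshold=4.0):
        self.threshold = threshold

    def fit(self, X, y):
        self.mu = X.mean(axis=0)
        self.sd = X.std(axis=0)
        self.centroids = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
        return self

    def predict(self, x):
        # anomaly gate: short-circuit before the classifier ever runs
        if np.abs((x - self.mu) / self.sd).max() > self.threshold:
            raise ValueError("new-class exception: defer to a human")
        # nearest-centroid classifier, trained on nominal data only
        return min(self.centroids,
                   key=lambda c: np.linalg.norm(x - self.centroids[c]))

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.r_[np.zeros(200), np.ones(200)]
model = OpenCategoryClassifier().fit(X, y)
print(model.predict(np.array([2.1, 1.9])))   # nominal input: classified
try:
    model.predict(np.array([40.0, 40.0]))    # far outside training support
except ValueError as e:
    print(e)                                 # the kangaroo case: defer
```

The key design point is that the classifier's accuracy is only ever measured on inputs the detector accepts, which is what lets it specialize safely.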
Don’t miss our next FutureTalks event
For more information about our FutureTalks series, make sure to join our Meetup group, New Relic FutureTalks PDX, and follow us on Twitter @newrelic for the latest developments and updates on upcoming events.
Note: Event dates, participants, and topics are subject to change without notice.