Large scale anomaly detection in mixed numerical and categorical input spaces
Use this link to cite
http://hdl.handle.net/2183/35327
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International
Collections
- GI-LIDIA - Artigos [64]
Metadata
Title
Large scale anomaly detection in mixed numerical and categorical input spaces
Author(s)
Date
2019
Citation
Eiras-Franco, C., Martínez-Rego, D., Guijarro-Berdiñas, B., Alonso-Betanzos, A., Bahamonde, A. (2019) ‘Large scale anomaly detection in mixed numerical and categorical input spaces’, Information Sciences, 487, pp. 115-127. doi:10.1016/j.ins.2019.03.013.
Abstract
This work presents the ADMNC method, designed to tackle anomaly detection for large-scale problems with a mixture of categorical and numerical input variables. A flexible parametric probability measure is fitted to the input data, so that points with low likelihood values can be flagged as anomalies. The main contribution of this method is that, to cope with the mixed nature of the variables, the joint probability measure is factorized into two parts: the marginal density of the continuous variables and the conditional probability of the categorical variables given the continuous part of the feature vector. The resulting model is trained through a maximum likelihood objective function optimized with stochastic gradient descent, which yields an effective and scalable algorithm. Compared with other well-known anomaly detection algorithms on several datasets, ADMNC both offers top-level accuracy on datasets that are out of reach for the most effective existing methods and scales well to very large datasets. This makes it a powerful tool for a problem that is growing in popularity and currently lacks suitable scalable algorithms.
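The factorization described in the abstract, p(x, c) = p(x) · p(c | x), can be illustrated with a minimal sketch. This is not the authors' implementation: as simplifying assumptions, the continuous part is a single one-dimensional Gaussian and the categorical part is one binary variable modelled by a logistic function trained with stochastic gradient ascent on the log-likelihood. All function names, the toy data and the hyperparameters are hypothetical.

```python
import math
import random


def fit_gaussian(xs):
    """Maximum-likelihood mean and variance of a 1-D sample (stand-in for p(x))."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, max(var, 1e-9)


def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_sgd(xs, cs, epochs=50, lr=0.05, seed=0):
    """Fit p(c=1 | x) = sigmoid(w*x + b) by stochastic gradient ascent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Gradient of the per-sample log-likelihood w.r.t. the logit.
            err = cs[i] - sigmoid(w * xs[i] + b)
            w += lr * err * xs[i]
            b += lr * err
    return w, b


def anomaly_score(x, c, gauss, logit):
    """Negative joint log-likelihood: -(log p(x) + log p(c | x))."""
    mean, var = gauss
    w, b = logit
    p1 = sigmoid(w * x + b)
    p_c = p1 if c == 1 else 1.0 - p1
    p_c = max(p_c, 1e-12)  # avoid log(0)
    return -(gaussian_logpdf(x, mean, var) + math.log(p_c))


# Toy data (assumption): c tends to be 1 when x is positive.
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
cs = [1 if x + rng.gauss(0.0, 0.5) > 0 else 0 for x in xs]

gauss = fit_gaussian(xs)
logit = fit_logistic_sgd(xs, cs)

typical = anomaly_score(2.0, 1, gauss, logit)      # category agrees with x
mismatched = anomaly_score(2.0, 0, gauss, logit)   # category contradicts x
far_away = anomaly_score(8.0, 1, gauss, logit)     # x itself is unlikely
```

In this sketch a point can score as anomalous for two distinct reasons: its continuous value is unlikely under p(x), or its category is unlikely given that value under p(c | x). The factorized objective is meant to capture both cases.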
Keywords
Anomaly detection
Outlier detection
Scalability
Big data
Mixed data
Synthetic dataset generator
Description
© 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article Eiras-Franco, C., Martínez-Rego, D., Guijarro-Berdiñas, B., Alonso-Betanzos, A., Bahamonde, A. (2019) ‘Large scale anomaly detection in mixed numerical and categorical input spaces’ has been accepted for publication in: Information Sciences, 487, pp. 115-127. The Version of Record is available online at https://doi.org/10.1016/j.ins.2019.03.013.
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
ISSN
0020-0255