Large scale anomaly detection in mixed numerical and categorical input spaces

Use this link to cite
http://hdl.handle.net/2183/35327
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International
Collections
- Investigación (FIC) [1632]
Metadata
Title
Large scale anomaly detection in mixed numerical and categorical input spaces
Author(s)
Eiras-Franco, C.; Martínez-Rego, D.; Guijarro-Berdiñas, B.; Alonso-Betanzos, A.; Bahamonde, A.
Date
2019
Bibliographic citation
Eiras-Franco, C., Martínez-Rego, D., Guijarro-Berdiñas, B., Alonso-Betanzos, A., Bahamonde, A. (2019) ‘Large scale anomaly detection in mixed numerical and categorical input spaces’, Information Sciences, 487, pp. 115-127. doi:10.1016/j.ins.2019.03.013.
Abstract
[Abstract]: This work presents the ADMNC method, designed to tackle anomaly detection in large-scale problems whose inputs mix categorical and numerical variables. A flexible parametric probability measure is fitted to the input data, so that points with low likelihood can be flagged as anomalies. The main contribution of the method is that, to cope with the mixed nature of the variables, the joint probability measure is factorized into two parts: the marginal density of the continuous variables and the conditional probability of the categorical variables given the continuous part of the feature vector. The resulting model is trained with a maximum likelihood objective optimized by stochastic gradient descent, which yields an effective and scalable algorithm. Compared with other well-known anomaly detection algorithms on several datasets, ADMNC both offers top-level accuracy on datasets that are out of reach for the most effective existing methods and scales well to very large datasets. This makes it a powerful tool for a problem that is growing in popularity and currently lacks suitable scalable algorithms.
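The factorization described in the abstract can be written as log p(x_num, x_cat) = log p(x_num) + log p(x_cat | x_num), with the negative of this sum used as an anomaly score. The following is a minimal sketch of that idea only, not the ADMNC implementation. The model families are assumptions made for illustration: a single multivariate Gaussian for the continuous marginal, and a softmax regression trained by stochastic gradient descent for one categorical variable. All function names are hypothetical.

```python
import numpy as np

def fit_gaussian(X_num):
    """Maximum-likelihood Gaussian for the continuous part (assumed density family)."""
    mean = X_num.mean(axis=0)
    # Small ridge keeps the covariance invertible.
    cov = np.cov(X_num, rowvar=False) + 1e-6 * np.eye(X_num.shape[1])
    return mean, cov

def gaussian_logpdf(X_num, mean, cov):
    """Row-wise log p(x_num) under the fitted Gaussian."""
    d = X_num.shape[1]
    diff = X_num - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def fit_softmax_sgd(X_num, y_cat, n_classes, lr=0.1, epochs=20, seed=0):
    """Softmax regression for p(x_cat | x_num), trained by SGD on the negative log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X_num.shape
    Xb = np.hstack([X_num, np.ones((n, 1))])  # append a bias column
    W = np.zeros((d + 1, n_classes))
    for _ in range(epochs):
        for i in rng.permutation(n):
            logits = Xb[i] @ W
            p = np.exp(logits - logits.max())
            p /= p.sum()
            p[y_cat[i]] -= 1.0  # gradient of the NLL with respect to the logits
            W -= lr * np.outer(Xb[i], p)
    return W

def categorical_logprob(X_num, y_cat, W):
    """Row-wise log p(x_cat | x_num) under the fitted softmax model."""
    Xb = np.hstack([X_num, np.ones((len(X_num), 1))])
    logits = Xb @ W
    logits = logits - logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return log_p[np.arange(len(y_cat)), y_cat]

def anomaly_score(X_num, y_cat, mean, cov, W):
    """Negative joint log-likelihood: higher values mean more anomalous."""
    return -(gaussian_logpdf(X_num, mean, cov) + categorical_logprob(X_num, y_cat, W))
```

Under these assumptions, a record can score as anomalous for two distinct reasons: its numerical values are unlikely on their own, or its category is unlikely given otherwise ordinary numerical values. Extending the sketch to several categorical variables would need a further modelling choice that the abstract does not specify.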
Keywords
Anomaly detection
Outlier detection
Scalability
Big data
Mixed data
Synthetic dataset generator
Description
© 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article Eiras-Franco, C., Martínez-Rego, D., Guijarro-Berdiñas, B., Alonso-Betanzos, A., Bahamonde, A. (2019) ‘Large scale anomaly detection in mixed numerical and categorical input spaces’ has been accepted for publication in: Information Sciences, 487, pp. 115-127. The Version of Record is available online at https://doi.org/10.1016/j.ins.2019.03.013.
Publisher's version
https://doi.org/10.1016/j.ins.2019.03.013
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
ISSN
0020-0255