
Showing items 1-8 of 8

Interpretable market segmentation on high dimension data

Eiras-Franco, Carlos; Guijarro-Berdiñas, Bertha; Alonso-Betanzos, Amparo; Bahamonde, Antonio (MDPI AG, 2018-09-17)

[Abstract]: Obtaining relevant information from the vast amount of data generated by interactions in a market or, in general, from a dyadic dataset, is a broad problem of great interest both for industry and academia. Also, ...

Regression Tree Based Explanation for Anomaly Detection Algorithm

López-Riobóo Botana, Iñigo Luis; Eiras-Franco, Carlos; Alonso-Betanzos, Amparo (MDPI AG, 2020-08-18)

[Abstract]: This work presents EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical spaces), a novel approach to address explanation using an anomaly detection algorithm, ADMNC, which provides accurate ...

Scalable Feature Selection Using ReliefF Aided by Locality-Sensitive Hashing

Eiras-Franco, Carlos; Guijarro-Berdiñas, Bertha; Alonso-Betanzos, Amparo; Bahamonde, Antonio (Wiley, 2021)

[Abstract]: Feature selection algorithms, such as ReliefF, are very important for processing high-dimensionality data sets. However, widespread use of popular and effective such algorithms is limited by their computational ...

On the scalability of feature selection methods on high-dimensional data

Bolón-Canedo, Verónica; Rego-Fernández, Diego; Peteiro Barral, Diego; Alonso-Betanzos, Amparo; Guijarro-Berdiñas, Bertha; Sánchez-Maroño, Noelia (Springer, 2018)

[Abstract]: Lately, derived from the explosion of high dimensionality, researchers in machine learning became interested not only in accuracy, but also in scalability. Although scalability of learning methods is a trending ...

Fast Distributed kNN Graph Construction Using Auto-tuned Locality-sensitive Hashing

Eiras-Franco, Carlos; Martínez Rego, David; Kanthan, Leslie; Piñeiro, César; Bahamonde, Antonio; Guijarro-Berdiñas, Bertha; Alonso-Betanzos, Amparo (Association for Computing Machinery, 2020)

[Abstract]: The k-nearest-neighbors (kNN) graph is a popular and powerful data structure that is used in various areas of Data Science, but the high computational cost of obtaining it hinders its use on large datasets. ...

Large scale anomaly detection in mixed numerical and categorical input spaces

Eiras-Franco, Carlos; Martínez Rego, David; Guijarro-Berdiñas, Bertha; Alonso-Betanzos, Amparo; Bahamonde, Antonio (Elsevier, 2019)

[Abstract]: This work presents the ADMNC method, designed to tackle anomaly detection for large-scale problems with a mixture of categorical and numerical input variables. A flexible parametric probability measure is ...

Fast anomaly detection with locality-sensitive hashing and hyperparameter autotuning

Meira, Jorge; Eiras-Franco, Carlos; Bolón-Canedo, Verónica; Marreiros, Goreti; Alonso-Betanzos, Amparo (Elsevier, 2022-08)

[Abstract]: This paper presents LSHAD, an anomaly detection (AD) method based on Locality Sensitive Hashing (LSH), capable of dealing with large-scale datasets. The resulting algorithm is highly parallelizable and its ...

Distributed correlation-based feature selection in Spark

Palma Mendoza, Raúl José; Marcos, Luis de; Rodríguez, Daniel; Alonso-Betanzos, Amparo (Elsevier, 2019-09)

[Abstract]: Feature selection (FS) is a key preprocessing step in data mining. CFS (Correlation-Based Feature Selection) is an FS algorithm that has been successfully applied to classification problems in many domains. We ...