Authors: Figueira-Domínguez, J. Guzmán; Remeseiro, Beatriz; Bolón-Canedo, Verónica
Dates: 2026-02-04; 2026-02-04; 2026-01-05
Citation: Figueira-Domínguez, J. G., B. Remeseiro, and V. Bolón-Canedo. 2026. “Optimising Image Feature Extraction and Selection: A Comprehensive Review With Spark Case Studies.” Expert Systems 43, no. 2: e70188. https://doi.org/10.1111/exsy.70188
ISSN: 1468-0394
Handle: https://hdl.handle.net/2183/47231
Data availability: The data that support the findings of this study are openly available in Imagenet Features Extracted with VGG-19 at https://zenodo.org/records/12791398

[Abstract]: As benchmark image datasets expand in sample size and feature complexity, the challenge of managing increased dimensionality becomes apparent. Contrary to the expectation that more features equate to enhanced information and improved outcomes, the curse of dimensionality often hampers performance. This paper reviews existing literature on filter feature selection techniques applied to image features, highlighting their use in both classical and deep-learning-based feature extraction methods. Building on these findings, this study proposes a scalable approach for image feature extraction and selection using Big Data technologies, specifically Apache Spark, to efficiently process large and high-dimensional datasets. The proposed framework integrates filter-based feature selection methods within a distributed environment to evaluate their effectiveness in image analysis tasks. Several experiments were performed to compare the results using feature selection techniques with various reduction percentages. Results show that significant feature reduction can be achieved without compromising classification accuracy, demonstrating the potential of Spark-based distributed processing for large-scale image analytics.

Language: eng
License: Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/
Keywords: Benchmarking; Data accuracy; Deep learning; Dimensionality reduction; Feature Selection; Image analysis; Image enhancement; Large datasets; Reduction
Title: Optimising Image Feature Extraction and Selection: A Comprehensive Review With Spark Case Studies
Type: journal article
Access: open access
DOI: 10.1111/exsy.70188