guardIAn: an LLM-based chatbot specialized in cybersecurity

UDC.coleccion: Academic works
UDC.tipotrab: TFM
UDC.titulacion: Master's Degree in Informatics Engineering
dc.contributor.advisor: Bao, Eliseo
dc.contributor.advisor: Pérez, Anxo
dc.contributor.author: Toirán Freire, Juan
dc.contributor.other: Universidade da Coruña. Facultade de Informática
dc.date.accessioned: 2025-07-18T13:01:13Z
dc.date.available: 2025-07-18T13:01:13Z
dc.date.issued: 2025-06
dc.description.abstract: [Resumo]: The spectacular growth of large language models (LLMs) opens new opportunities to automate critical tasks in the defense against digital threats. However, many of these models operate as “black boxes” and lack up-to-date knowledge of attack techniques and hardening best practices. This work presents guardIAn, an open-source chatbot built to serve as a specialized cybersecurity assistant for blue and red teams, SMEs, and the research community. The core of the tool is a LLaMA 3.1-8B LLM fine-tuned with Low-Rank Adaptation (LoRA) on a curated corpus of technical forums (Gentoo, Debian…) and reinforced with a Retrieval-Augmented Generation (RAG) pipeline that combines semantic search in Elasticsearch with contextual generation via LangChain. The system handles logs, code fragments, or security policies, retrieves relevant evidence, and offers recommendations (vulnerability analysis, mitigation steps, or controlled exploitation scripts). The interface, developed with React and Django, supports multilingual dialogue, persistent history, and attachments for incident response. Validation with experts showed a reduction in diagnosis time. The project releases all of its code (spiders, web application, data-cleaning scripts, and training script) as fully open source to facilitate reproducibility and encourage future collaboration.
dc.description.abstract: [Abstract]: The spectacular growth of large language models (LLMs) is opening new opportunities to automate critical tasks in the defense against digital threats. Yet many existing models behave as “black boxes” and lack up-to-date knowledge of attack techniques and hardening best practices. This work introduces guardIAn, an open-source chatbot conceived as a specialized cybersecurity assistant for blue and red teams, SMEs, and the research community. At its core is an 8-billion-parameter LLaMA 3.1 model fine-tuned with Low-Rank Adaptation (LoRA) on a curated corpus drawn from technical forums (Gentoo, Debian…) and complemented by a Retrieval-Augmented Generation (RAG) pipeline that combines semantic search in Elasticsearch with contextual generation through LangChain. The system ingests logs, code snippets, or security policies, retrieves relevant evidence, and delivers actionable guidance ranging from vulnerability analyses and mitigation steps to controlled exploitation scripts. A React + Django interface supports multilingual dialogue, persistent conversation history, and file attachments for incident-response workflows. Expert validation showed a measurable reduction in diagnosis time. All assets (crawlers, web components, data-cleaning scripts, and training routines) are released under an open-source license to ensure reproducibility and foster future collaboration.
dc.description.traballos: Master's thesis (UDC.FIC). Informatics Engineering. Academic year 2024/2025
dc.identifier.uri: https://hdl.handle.net/2183/45529
dc.language.iso: glg
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (en)
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Chatbot
dc.subject: Red/blue Team
dc.subject: LangChain
dc.subject: Low-Rank Adaptation
dc.subject: Large language model
dc.subject: Cybersecurity
dc.subject: Retrieval-Augmented Generation
dc.subject: Open-source software
dc.subject: Hardening
dc.subject: Embeddings and dense indexes
dc.title: guardIAn: an LLM-based chatbot specialized in cybersecurity
dc.type: master thesis
dspace.entity.type: Publication
relation.isAdvisorOfPublication: 99ed6581-6dee-442a-9b37-c35da63bef8a
relation.isAdvisorOfPublication: c673c8b1-1afc-48f6-85e9-8f29f9cffb91
relation.isAdvisorOfPublication.latestForDiscovery: 99ed6581-6dee-442a-9b37-c35da63bef8a
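
The abstract describes a retrieval step (semantic search over an indexed corpus) that feeds contextual generation. The following is a minimal, hypothetical Python sketch of that general RAG idea, not guardIAn's actual code: an in-memory cosine-similarity search stands in for Elasticsearch's dense-vector search, the corpus snippets and three-dimensional vectors are invented for illustration, and `retrieve` and `build_prompt` are illustrative names rather than LangChain APIs.

```python
import math

# Hypothetical toy corpus: doc_id -> (text, embedding vector).
# In a real pipeline the texts would come from the curated forum corpus and
# the vectors from an embedding model; these values are made up.
CORPUS = {
    "doc-ssh": ("Disable root login over SSH by setting PermitRootLogin no.", [0.9, 0.1, 0.0]),
    "doc-fw": ("Use a default-deny firewall policy and open only required ports.", [0.2, 0.9, 0.1]),
    "doc-log": ("Repeated 'Failed password' lines in auth.log suggest brute forcing.", [0.7, 0.2, 0.6]),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, (_, vec) in CORPUS.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


def build_prompt(question, query_vec, k=2):
    """Assemble a grounded prompt from the retrieved snippets (hypothetical template)."""
    context = "\n".join(f"- {CORPUS[d][0]}" for d in retrieve(query_vec, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("Is this host under attack?", [0.8, 0.1, 0.5]))
```

With the query vector `[0.8, 0.1, 0.5]`, `retrieve` ranks `doc-log` and `doc-ssh` highest, so the prompt carries the auth.log and SSH snippets and omits the firewall one. In a full pipeline the query vector would come from the same embedding model used to index the corpus, and the assembled prompt would be passed to the fine-tuned model.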

Files

Original bundle

Name: ToiranFreire_Juan_TFM_2025.pdf
Size: 4.65 MB
Format: Adobe Portable Document Format