guardIAn: chatbot baseado en LLM’s especializado en ciberseguridade
| UDC.coleccion | Academic works | |
| UDC.tipotrab | TFM | |
| UDC.titulacion | Máster Universitario en Enxeñaría Informática | |
| dc.contributor.advisor | Bao, Eliseo | |
| dc.contributor.advisor | Pérez, Anxo | |
| dc.contributor.author | Toirán Freire, Juan | |
| dc.contributor.other | Universidade da Coruña. Facultade de Informática | |
| dc.date.accessioned | 2025-07-18T13:01:13Z | |
| dc.date.available | 2025-07-18T13:01:13Z | |
| dc.date.issued | 2025-06 | |
| dc.description.abstract | [Resumo]: O espectacular crecemento dos modelos de linguaxe de gran tamaño (LLMs) abre novas oportunidades para automatizar tarefas críticas na defensa fronte ás ameazas dixitais. Con todo, moitos destes modelos operan como “caixas negras” e carecen de coñecemento actualizado sobre técnicas de ataque e boas prácticas de hardening. Neste traballo preséntase guardIAn, un chatbot open-source creado para servir de asistente especializado en ciberseguridade a equipos blue e red team, PEMEs e comunidade investigadora. O núcleo da ferramenta é un LLM LLaMA 3.1-8B afinado con Low-Rank Adaptation (LoRA) sobre un corpus curado de foros técnicos (Gentoo, Debian…) e reforzado cunha pipeline de Retrieval-Augmented Generation (RAG) que combina busca semántica en ElasticSearch con xeración contextual vía LangChain. O sistema xestiona rexistros (logs), fragmentos de código ou políticas de seguridade, recupera evidencias relevantes e ofrece recomendacións (análise de vulnerabilidades, pasos de mitigación ou scripts de explotación controlada). A interface, desenvolta con React e Django, permite diálogos multi-idioma, historial persistente e anexos para respostas a incidentes. A validación con expertos demostrou unha redución no tempo de diagnose. O proxecto libera todo o código; spiders, web, scripts de limpeza de datos, script de adestramento, é completamente open source con fin de facilitar a reproducibilidade e fomentar a colaboración futura. | |
| dc.description.abstract | [Abstract]: The rapid growth of large language models (LLMs) is opening new opportunities to automate critical tasks in the defense against digital threats. Yet many existing models behave as “black boxes” and lack up-to-date knowledge of attack techniques and hardening best practices. This work introduces guardIAn, an open-source chatbot conceived as a specialized cybersecurity assistant for blue and red teams, SMEs, and the research community. At its core lies an 8-billion-parameter LLaMA 3.1 model fine-tuned via Low-Rank Adaptation (LoRA) on a curated corpus drawn from technical forums (Gentoo, Debian…) and reinforced with a Retrieval-Augmented Generation (RAG) pipeline that combines semantic search in ElasticSearch with contextual generation through LangChain. The system handles logs, code snippets, or security policies, retrieves relevant evidence, and delivers recommendations ranging from vulnerability analyses and mitigation steps to controlled exploitation scripts. The interface, built with React and Django, supports multilingual dialogue, persistent conversation history, and file attachments for incident response. Validation with experts showed a reduction in diagnosis time. All project code (crawlers, web components, data-cleaning scripts, and the training script) is released as open source to ease reproducibility and foster future collaboration. | |
| dc.description.traballos | Master's thesis (UDC.FIC). Computer Engineering. Academic year 2024/2025 | |
| dc.identifier.uri | https://hdl.handle.net/2183/45529 | |
| dc.language.iso | glg | |
| dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | en |
| dc.rights.accessRights | open access | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | |
| dc.subject | Chatbot | |
| dc.subject | Modelo grande de linguaxe | |
| dc.subject | Ciberseguridade | |
| dc.subject | Recuperación Aumentada por Xeración | |
| dc.subject | Software libre | |
| dc.subject | Endurecemento de sistemas | |
| dc.subject | Red/blue Team | |
| dc.subject | LangChain | |
| dc.subject | Low-Rank Adaptation | |
| dc.subject | Embeddings e índices densos | |
| dc.subject | Large language model | |
| dc.subject | Cybersecurity | |
| dc.subject | Retrieval-Augmented Generation | |
| dc.subject | Open-source software | |
| dc.subject | Hardening | |
| dc.subject | Embeddings and dense indexes | |
| dc.title | guardIAn: chatbot baseado en LLM’s especializado en ciberseguridade | |
| dc.type | master thesis | |
| dspace.entity.type | Publication | |
| relation.isAdvisorOfPublication | 99ed6581-6dee-442a-9b37-c35da63bef8a | |
| relation.isAdvisorOfPublication | c673c8b1-1afc-48f6-85e9-8f29f9cffb91 | |
| relation.isAdvisorOfPublication.latestForDiscovery | 99ed6581-6dee-442a-9b37-c35da63bef8a |
Files
Original bundle
- Name: ToiranFreire_Juan_TFM_2025.pdf
- Size: 4.65 MB
- Format: Adobe Portable Document Format

