Use this link to cite:
http://hdl.handle.net/2183/33361
Automated deployment and configuration of a high-performance computing cluster (original title: Despregue e configuración automatizada dun clúster de computación de altas prestacións)
Authors
Rega Domato, Sergio
Other responsibilities
Universidade da Coruña. Facultade de Informática
Abstract
[Abstract]: This project proposes the deployment and configuration, from scratch, of a cluster of servers based on a GNU/Linux distribution and intended for High Performance Computing (HPC). One of the fundamental objectives is that the deployment and configuration of the cluster's servers, or nodes, can be carried out in a centralized and automated way, minimizing as far as possible the manual configuration steps that are most prone to human error. The basic HPC cluster deployment consists of a front node or head node that acts as the single point of access from the outside, a node that acts as a Network File System (NFS) server to provide shared storage across all cluster nodes, and a set of compute nodes that run the users' applications. The resources of these compute nodes are accessed exclusively through a resource manager/scheduler of the kind commonly found on HPC systems, such as Slurm, PBS, or Torque, which is installed and configured as part of the deployment. Because of the hardware that an HPC cluster like the one described would require, the project is developed on Virtual Machines (VMs) that emulate the complete system, but the resulting solution is equally valid for deploying a cluster on physical machines.
The deployment is automated using tools that follow the Infrastructure as Code (IaC) paradigm, such as Ansible. This paradigm makes it possible to manage systems and infrastructure through source code, thereby automating the provisioning and management of computing resources such as servers, networks, databases, storage, and other components of a computing infrastructure.
The source code resulting from this project is publicly available in the following GitHub repository: https://github.com/srdomato/TFG_SRD_HPC.
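As an illustration of the cluster layout described in the abstract (one head node, one NFS server, and a set of compute nodes managed through Ansible), the following minimal Python sketch builds an inventory in the JSON shape that Ansible accepts from dynamic inventory scripts. It is not taken from the project's repository: the function name build_inventory, the group names (head, nfs, compute, cluster), and the host names are assumptions chosen for this example.

```python
import json


def build_inventory(num_compute_nodes: int) -> dict:
    """Return an Ansible dynamic-inventory style dictionary for a small HPC cluster.

    All group and host names here are hypothetical; the original project may
    organize and name its machines differently.
    """
    if num_compute_nodes < 1:
        raise ValueError("a cluster needs at least one compute node")

    # Compute nodes are numbered from 1: compute1, compute2, ...
    compute_hosts = [f"compute{i}" for i in range(1, num_compute_nodes + 1)]

    return {
        # Single point of access from the outside.
        "head": {"hosts": ["head"]},
        # Node exporting the shared storage over NFS.
        "nfs": {"hosts": ["nfs"]},
        # Nodes that run the users' jobs under the scheduler.
        "compute": {"hosts": compute_hosts},
        # Parent group so that common tasks can target every node at once.
        "cluster": {"children": ["head", "nfs", "compute"]},
        # An empty hostvars map tells Ansible not to query each host separately.
        "_meta": {"hostvars": {}},
    }


if __name__ == "__main__":
    print(json.dumps(build_inventory(3), indent=2))
```

Grouping the nodes by role in this way lets a playbook apply role-specific configuration (for example, the NFS export on one group and the mounts on the others) while still addressing the whole cluster through the parent group.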
Rights
Attribution-NonCommercial-ShareAlike 3.0 Spain (CC BY-NC-SA 3.0 ES)








