
Deep Contextual Bandit and Reinforcement Learning for IRS-Assisted MU-MIMO Systems

View/Open
Pereira_Ruisanchez_Dariel_2023_Deep_Contextual_Bandit_and_Reinforcement_Learning_for_IRS_Assisted_MU_MIMO_Systems.pdf (1.601Mb)
Use this link to cite
http://hdl.handle.net/2183/34562
Collections
  • Investigación (FIC) [1682]
Metadata
Title
Deep Contextual Bandit and Reinforcement Learning for IRS-Assisted MU-MIMO Systems
Author(s)
Pereira-Ruisánchez, Dariel
Fresnedo, Óscar
Pérez-Adán, Darian
Castedo, Luis
Date
2023-07
Bibliographic citation
D. Pereira-Ruisánchez, Ó. Fresnedo, D. Pérez-Adán and L. Castedo, "Deep Contextual Bandit and Reinforcement Learning for IRS-Assisted MU-MIMO Systems," in IEEE Transactions on Vehicular Technology, vol. 72, no. 7, pp. 9099-9114, July 2023, doi: 10.1109/TVT.2023.3249353.
Is version of
https://doi.org/10.1109/TVT.2023.3249353
Abstract
The combination of multiple-input multiple-output (MIMO) systems and intelligent reflecting surfaces (IRSs) is foreseen as a critical enabler of beyond 5G (B5G) and 6G. In this work, two different approaches are considered for the joint optimization of the IRS phase-shift matrix and MIMO precoders of an IRS-assisted multi-stream (MS) multi-user MIMO (MU-MIMO) system. Both approaches aim to maximize the system sum-rate for every channel realization. The first proposed solution is a novel contextual bandit (CB) framework with continuous state and action spaces called deep contextual bandit-oriented deep deterministic policy gradient (DCB-DDPG). The second is an innovative deep reinforcement learning (DRL) formulation where the states, actions, and rewards are selected such that the Markov decision process (MDP) property of reinforcement learning (RL) is appropriately met. Both proposals perform remarkably better than state-of-the-art heuristic methods in scenarios with high multi-user interference.
Keywords
Deep contextual bandit
DDPG
Deep reinforcement learning
Intelligent reflecting surfaces
MIMO
 
Description
© 2023 IEEE. This version of the article has been accepted for publication, after peer review. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The Version of Record is available online at: https://doi.org/10.1109/TVT.2023.3249353.
Publisher's version
https://doi.org/10.1109/TVT.2023.3249353
Rights
© 2023 IEEE. All rights reserved.
ISSN
0018-9545
