ReDSM5: A Reddit Dataset for DSM-5 Depression Detection
| UDC.coleccion | Research | |
| UDC.conferenceTitle | CIKM2025 | |
| UDC.departamento | Ciencias da Computación e Tecnoloxías da Información | |
| UDC.endPage | 6327 | |
| UDC.grupoInv | Information Retrieval Lab (IRlab) | |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | |
| UDC.startPage | 6323 | |
| dc.contributor.author | Bao, Eliseo | |
| dc.contributor.author | Pérez, Anxo | |
| dc.contributor.author | Parapar, Javier | |
| dc.date.accessioned | 2026-06-04T11:07:47Z | |
| dc.date.available | 2026-06-04T11:07:47Z | |
| dc.date.issued | 2025 | |
| dc.description | Presented at: CIKM '25: The 34th ACM International Conference on Information and Knowledge Management, Seoul, Republic of Korea, November 10 - 14, 2025 © Authors | ACM 2025. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in CIKM '25: Proceedings of the 34th ACM International Conference on Information and Knowledge Management, https://doi.org/10.1145/3746252.3761610 | |
| dc.description.abstract | [Abstract]: Depression is a pervasive mental health condition that affects hundreds of millions of individuals worldwide, yet many cases remain undiagnosed due to barriers in traditional clinical access and pervasive stigma. Social media platforms, and Reddit in particular, offer rich, user-generated narratives that can reveal early signs of depressive symptomatology. However, existing computational approaches often label entire posts simply as depressed or not depressed, without linking language to specific criteria from the DSM-5, the standard clinical framework for diagnosing depression. This limits both clinical relevance and interpretability. To address this gap, we introduce ReDSM5, a novel Reddit corpus comprising 1484 long-form posts, each exhaustively annotated at the sentence level by a licensed psychologist for the nine DSM-5 depression symptoms. For each label, the annotator also provides a concise clinical rationale grounded in DSM-5 methodology. We conduct an exploratory analysis of the collection, examining lexical, syntactic, and emotional patterns that characterize symptom expression in social media narratives. Compared to prior resources, ReDSM5 uniquely combines symptom-specific supervision with expert explanations, facilitating the development of models that not only detect depression but also generate human-interpretable reasoning. We establish baseline benchmarks for both multi-label symptom classification and explanation generation, providing reference results for future research on detection and interpretability. | |
| dc.description.sponsorship | This work was supported by the project PID2022-137061OB-C21 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, ERDF, A way of making Europe by the European Union); the Consellería de Educación, Universidade e Formación Profesional, Spain (grant number ED481A-2024-079 and accreditations 2019-2022 ED431G/01 and GRC ED431C 2025/49); and the European Regional Development Fund, which supports the CITIC Research Center. | |
| dc.description.sponsorship | Xunta de Galicia; ED481A-2024-079 | |
| dc.description.sponsorship | Xunta de Galicia; 2019-2022 ED431G/01 | |
| dc.description.sponsorship | Xunta de Galicia; GRC ED431C 2025/49 | |
| dc.identifier.citation | Eliseo Bao, Anxo Pérez, and Javier Parapar. 2025. ReDSM5: A Reddit Dataset for DSM-5 Depression Detection. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM ’25), November 10–14, 2025, Seoul, Republic of Korea. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3746252.3761610 | |
| dc.identifier.doi | 10.1145/3746252.3761610 | |
| dc.identifier.isbn | 979-8-4007-2040-6 | |
| dc.identifier.uri | https://hdl.handle.net/2183/48523 | |
| dc.language.iso | eng | |
| dc.publisher | ACM | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2022-137061OB-C21/ES/BUSQUEDA, SELECCION Y ORGANIZACION DE CONTENIDOS PARA NECESIDADES DE INFORMACION RELACIONADAS CON LA SALUD - CONSTRUCCION DE RECURSOS Y PERSONALIZACION | |
| dc.relation.uri | https://doi.org/10.1145/3746252.3761610 | |
| dc.rights | © 2025 | |
| dc.rights.accessRights | open access | |
| dc.subject | Depression symptom detection | |
| dc.subject | Mental health | |
| dc.subject | DSM-5 | |
| dc.subject | Social media | |
| dc.subject | Health informatics | |
| dc.subject | NLP | |
| dc.subject | Language resources | |
| dc.title | ReDSM5: A Reddit Dataset for DSM-5 Depression Detection | |
| dc.type | conference output | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | 99ed6581-6dee-442a-9b37-c35da63bef8a | |
| relation.isAuthorOfPublication | c673c8b1-1afc-48f6-85e9-8f29f9cffb91 | |
| relation.isAuthorOfPublication | fef1a9cb-e346-4e53-9811-192e144f09d0 | |
| relation.isAuthorOfPublication.latestForDiscovery | 99ed6581-6dee-442a-9b37-c35da63bef8a |
Files
Original bundle
- Name: Bao_Eliseo_2025_ReDSM5.pdf
- Size: 713.7 KB
- Format: Adobe Portable Document Format

