Comparing LLM-generated and human-authored news text using formal syntactic theory

UDC.coleccion: Investigación
UDC.conferenceTitle: 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
UDC.departamento: Ciencias da Computación e Tecnoloxías da Información
UDC.endPage: 9060
UDC.grupoInv: Lingua e Sociedade da Información (LYS)
UDC.institutoCentro: CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
UDC.startPage: 9041
dc.contributor.author: Zamaraeva, Olga
dc.contributor.author: Flickinger, Dan
dc.contributor.author: Bond, Francis
dc.contributor.author: Gómez-Rodríguez, Carlos
dc.date.accessioned: 2026-02-23T09:23:00Z
dc.date.available: 2026-02-23T09:23:00Z
dc.date.issued: 2025
dc.description: The conference took place in Vienna, Austria, from July 27 to August 1, 2025.
dc.description.abstract: This study provides the first comprehensive comparison of New York Times-style text generated by six large language models against real, human-authored NYT writing. The comparison is based on a formal syntactic theory. We use Head-driven Phrase Structure Grammar (HPSG) to analyze the grammatical structure of the texts. We then investigate and illustrate the differences in the distributions of HPSG grammar types, revealing systematic distinctions between human and LLM-generated writing. These findings contribute to a deeper understanding of the syntactic behavior of LLMs as well as humans, within the NYT genre.
dc.description.sponsorship: We thank Ann Copestake and Emily M. Bender, and more generally the DELPH-IN community, for the useful discussion related to the paper. We acknowledge grants SCANNER-UDC (PID2020-113230RB-C21) funded by MICIU/AEI/10.13039/501100011033; GAP (PID2022-139308OA-I00) funded by MICIU/AEI/10.13039/501100011033 and ERDF, EU; LATCHING (PID2023-147129OB-C21) funded by MICIU/AEI/10.13039/501100011033 and ERDF, EU; and TSI-100925-2023-1 funded by Ministry for Digital Transformation and Civil Service and “NextGenerationEU” PRTR; as well as funding by Xunta de Galicia (ED431C 2024/02). CITIC, as a center accredited for excellence within the Galician University System and a member of the CIGUS Network, receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia. Additionally, it is co-financed by the EU through the FEDER Galicia 2021-27 operational program (Ref. ED431G 2023/01).
dc.description.sponsorship: Xunta de Galicia; ED431C 2024/02
dc.description.sponsorship: Xunta de Galicia; ED431G 2023/01
dc.identifier.citation: O. Zamaraeva, D. Flickinger, F. Bond, and C. Gómez-Rodríguez, "Comparing LLM-generated and human-authored news text using formal syntactic theory," in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria: Association for Computational Linguistics, 2025, pp. 9041-9060. doi: 10.18653/v1/2025.acl-long.443.
dc.identifier.doi: 10.18653/v1/2025.acl-long.443
dc.identifier.isbn: 979-8-89176-251-0
dc.identifier.issn: 0736-587X
dc.identifier.uri: https://hdl.handle.net/2183/47473
dc.language.iso: eng
dc.publisher: ACL Anthology
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113230RB-C21/ES/MODELOS MULTITAREA DE ETIQUETADO SECUENCIAL PARA EL RECONOCIMIENTO DE ENTIDADES ENRIQUECIDO CON INFORMACION LINGUISTICA: SINTAXIS E INTEGRACION MULTITAREA (SCANNER-UDC)
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-139308OA-I00/ES/REPRESENTACIONES ESTRUCTURADAS VERDES Y ENCHUFABLES
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2023-147129OB-C21/ES/TECNOLOGÍAS DEL LENGUAJE DESDE UNA PERSPECTIVA VERDE (LATCHING): DOMINIOS CON ESCASOS RECURSOS
dc.relation.projectID: info:eu-repo/grantAgreement/MTDPF//TSI-100925-2023-1/ES/CÁTEDRA UDC-INDITEX DE IA EN ALGORITMOS VERDES
dc.relation.uri: https://doi.org/10.18653/v1/2025.acl-long.443
dc.rights: ©2025 Association for Computational Linguistics
dc.rights: Attribution 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Large Language Models (LLMs)
dc.subject: Head-driven Phrase Structure Grammar (HPSG)
dc.subject: English Resource Grammar (ERG)
dc.subject: Formal Syntactic Theory
dc.subject: Linguistic Diversity Analysis
dc.title: Comparing LLM-generated and human-authored news text using formal syntactic theory
dc.type: conference output
dspace.entity.type: Publication
relation.isAuthorOfPublication: e70a3969-39f6-4458-9339-3b71756fa56e
relation.isAuthorOfPublication.latestForDiscovery: e70a3969-39f6-4458-9339-3b71756fa56e

Files

Original bundle

Name: Zamaraeva_Olga_2025_Comparing_LLM_human_text.pdf
Size: 599.44 KB
Format: Adobe Portable Document Format