Comparing LLM-generated and human-authored news text using formal syntactic theory
| UDC.coleccion | Investigación | |
| UDC.conferenceTitle | 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) | |
| UDC.departamento | Ciencias da Computación e Tecnoloxías da Información | |
| UDC.endPage | 9060 | |
| UDC.grupoInv | Lingua e Sociedade da Información (LYS) | |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | |
| UDC.startPage | 9041 | |
| dc.contributor.author | Zamaraeva, Olga | |
| dc.contributor.author | Flickinger, Dan | |
| dc.contributor.author | Bond, Francis | |
| dc.contributor.author | Gómez-Rodríguez, Carlos | |
| dc.date.accessioned | 2026-02-23T09:23:00Z | |
| dc.date.available | 2026-02-23T09:23:00Z | |
| dc.date.issued | 2025 | |
| dc.description | The conference took place in Vienna, Austria, from July 27 to August 1, 2025 | |
| dc.description.abstract | [Abstract]: This study provides the first comprehensive comparison of New York Times-style text generated by six large language models against real, human-authored NYT writing. The comparison is based on a formal syntactic theory. We use Head-driven Phrase Structure Grammar (HPSG) to analyze the grammatical structure of the texts. We then investigate and illustrate the differences in the distributions of HPSG grammar types, revealing systematic distinctions between human and LLM-generated writing. These findings contribute to a deeper understanding of the syntactic behavior of LLMs as well as humans, within the NYT genre. | |
| dc.description.sponsorship | We thank Ann Copestake and Emily M. Bender, and more generally the DELPH-IN community, for the useful discussion related to the paper. We acknowledge grants SCANNER-UDC (PID2020-113230RB-C21) funded by MICIU/AEI/10.13039/501100011033; GAP (PID2022-139308OA-I00) funded by MICIU/AEI/10.13039/501100011033 and ERDF, EU; LATCHING (PID2023-147129OB-C21) funded by MICIU/AEI/10.13039/501100011033 and ERDF, EU; and TSI-100925-2023-1 funded by Ministry for Digital Transformation and Civil Service and “NextGenerationEU” PRTR; as well as funding by Xunta de Galicia (ED431C 2024/02). CITIC, as a center accredited for excellence within the Galician University System and a member of the CIGUS Network, receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia. Additionally, it is co-financed by the EU through the FEDER Galicia 2021-27 operational program (Ref. ED431G 2023/01). | |
| dc.description.sponsorship | Xunta de Galicia; ED431C 2024/02 | |
| dc.description.sponsorship | Xunta de Galicia; ED431G 2023/01 | |
| dc.identifier.citation | O. Zamaraeva, D. Flickinger, F. Bond, and C. Gómez-Rodríguez, "Comparing LLM-generated and human-authored news text using formal syntactic theory," in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria: Association for Computational Linguistics, 2025, pp. 9041-9060. doi: 10.18653/v1/2025.acl-long.443. | |
| dc.identifier.doi | 10.18653/v1/2025.acl-long.443 | |
| dc.identifier.isbn | 979-8-89176-251-0 | |
| dc.identifier.issn | 0736-587X | |
| dc.identifier.uri | https://hdl.handle.net/2183/47473 | |
| dc.language.iso | eng | |
| dc.publisher | ACL Anthology | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113230RB-C21/ES/MODELOS MULTITAREA DE ETIQUETADO SECUENCIAL PARA EL RECONOCIMIENTO DE ENTIDADES ENRIQUECIDO CON INFORMACION LINGUISTICA: SINTAXIS E INTEGRACION MULTITAREA (SCANNER-UDC) | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-139308OA-I00/ES/REPRESENTACIONES ESTRUCTURADAS VERDES Y ENCHUFABLES | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2023-147129OB-C21/ES/TECNOLOGÍAS DEL LENGUAJE DESDE UNA PERSPECTIVA VERDE (LATCHING): DOMINIOS CON ESCASOS RECURSOS | |
| dc.relation.projectID | info:eu-repo/grantAgreement/MTDPF//TSI-100925-2023-1/ES/CÁTEDRA UDC-INDITEX DE IA EN ALGORITMOS VERDES | |
| dc.relation.uri | https://doi.org/10.18653/v1/2025.acl-long.443 | |
| dc.rights | ©2025 Association for Computational Linguistics | |
| dc.rights | Attribution 4.0 International | en |
| dc.rights.accessRights | open access | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | Large Language Models (LLMs) | |
| dc.subject | Head-driven Phrase Structure Grammar (HPSG) | |
| dc.subject | English Resource Grammar (ERG) | |
| dc.subject | Formal Syntactic Theory | |
| dc.subject | Linguistic Diversity Analysis | |
| dc.title | Comparing LLM-generated and human-authored news text using formal syntactic theory | |
| dc.type | conference output | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | e70a3969-39f6-4458-9339-3b71756fa56e | |
| relation.isAuthorOfPublication.latestForDiscovery | e70a3969-39f6-4458-9339-3b71756fa56e |
Files
- Name: Zamaraeva_Olga_2025_Comparing_LLM_human_text.pdf
- Size: 599.44 KB
- Format: Adobe Portable Document Format

