ChatGPT as an Instructor’s Assistant for Generating and Scoring Exams

Authors: Fernández López, Alberto Ángel; López-Torres, Margarita; Fernández Sánchez, Jesús José; Vázquez, Digna
Date accessioned/available: 2026-04-24
Date issued: 2024-08-14
Citation: J. Chem. Educ. 2024, 101, 3780–3788
ISSN: 0021-9584; eISSN: 1938-1328
Handle: https://hdl.handle.net/2183/48095
DOI: 10.1021/ACS.JCHEMED.4C00231
Type: journal article (open access)
Funding: Funded for open-access publication by Universidade da Coruña/CISUG
Language: English (eng)
License: Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)
Keywords: General Public; Cheminformatics; Public Understanding/Outreach; Computer-Based Learning; Testing/Assessment

[Abstract] Generative artificial intelligence technologies such as ChatGPT hold significant promise across many sectors, particularly education. This study assessed ChatGPT's proficiency in answering questions from the University Entrance Exams typically taken by senior secondary students. Our findings indicate that ChatGPT 4.0 consistently outperformed students, achieving higher average scores on the exams from the past four years. However, it still made errors in about 20% of its responses. Despite this, ChatGPT 4.0 demonstrated a robust ability to comprehend and produce natural language in a chemical context. Consequently, by applying a range of prompt engineering techniques, we were able to have it create short-answer questions and numerical problems that closely mimic the format and conceptual content of the University Entrance Exams. We also confirmed that ChatGPT 4.0 can grade exams: its scores correlated significantly with those given by human evaluators, although this correlation was lower than the correlation among the human graders themselves. This discrepancy, together with other practical considerations, limits its use for grading exams.