Human-AI Collaborative Feedback in Translator Training: A Mixed-Methods Study of Translation Quality, Revision Behavior, and Learner Perceptions
DOI: https://doi.org/10.69760/aghel.026002009

Keywords: human-AI collaborative feedback, translation pedagogy, translator training, feedback literacy, translation quality assessment

Abstract
The integration of large language models (LLMs) into language education has prompted renewed interest in AI-assisted feedback, yet purely automated feedback remains vulnerable to contextual misalignment, cultural misreading, and reliability concerns that are particularly consequential in translation training. A human-AI collaborative feedback model, in which an instructor curates, corrects, and supplements LLM-generated commentary before students revise, offers a theoretically motivated alternative, but its pedagogical effects in translator education remain empirically underexplored. This mixed-methods study examines the impact of such a hybrid feedback approach on undergraduate Chinese-to-English student translators. Forty senior undergraduates translated a 1,500-word cultural heritage text and received ChatGPT-4o-generated feedback that was subsequently reviewed and annotated by an experienced instructor using a color-coded transparency system. Quantitative analysis using a Multidimensional Quality Metrics (MQM) rubric revealed significant pre-to-post gains across all measured dimensions (overall MQM composite: Δ +1.20 on a 5-point scale, p < .001), with the largest improvements in terminology (Δ +1.47) and accuracy (Δ +1.32) and meaningful gains in cohesion, cultural adaptation, register, language conventions, and format (all p < .001). Think-aloud protocols revealed a consistent two-stage revision pattern and active source-evaluation behavior: students were more decisive when AI and instructor annotations converged and deliberated more deeply when they diverged. Student perception surveys indicated high ratings for clarity, trustworthiness, usefulness, and pedagogical value, with no significant differences between high- and low-performing students. Instructors reported meaningful workload relief on routine corrections while retaining pedagogical authority over higher-order feedback. These findings suggest the potential of a human-in-the-loop feedback framework for translator training in which AI handles systematic error detection while instructors validate, contextualize, and model evaluative judgment.
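To make the shape of the quantitative analysis concrete, the sketch below illustrates a paired pre/post comparison of MQM dimension scores of the kind the abstract reports. It is a minimal illustration, not the authors' analysis code: the dimension names are taken from the abstract, while the equal-weight composite, the paired t-test, and the score arrays are assumptions made for the example.

```python
# Minimal sketch (not the authors' code): paired pre/post comparison of
# MQM dimension scores on a 5-point scale, mirroring the kind of results
# reported in the abstract. Dimension names follow the abstract; the
# equal-weight composite and paired t-test are illustrative assumptions.
import numpy as np
from scipy.stats import ttest_rel

DIMENSIONS = [
    "accuracy", "terminology", "cohesion", "cultural_adaptation",
    "register", "language_conventions", "format",
]

def mqm_composite(scores: dict[str, np.ndarray]) -> np.ndarray:
    """Equal-weight mean across dimensions for each student (one plausible
    composite; the paper's exact weighting is not specified here)."""
    return np.mean([scores[d] for d in DIMENSIONS], axis=0)

def report_paired_gains(pre: dict[str, np.ndarray],
                        post: dict[str, np.ndarray]) -> None:
    """Print the mean gain (Δ) and a paired t-test per dimension, then for
    the composite. `pre` and `post` map each dimension to an array of
    per-student scores in the same student order."""
    for d in DIMENSIONS:
        t, p = ttest_rel(post[d], pre[d])  # paired: same students twice
        print(f"{d:22s} Δ = {np.mean(post[d] - pre[d]):+.2f}  "
              f"t = {t:.2f}  p = {p:.4f}")
    comp_pre, comp_post = mqm_composite(pre), mqm_composite(post)
    t, p = ttest_rel(comp_post, comp_pre)
    print(f"{'MQM composite':22s} Δ = {np.mean(comp_post - comp_pre):+.2f}  "
          f"t = {t:.2f}  p = {p:.4f}")

# Hypothetical usage with 40 students (random data, illustration only):
# rng = np.random.default_rng(0)
# pre  = {d: rng.uniform(2.0, 3.5, size=40) for d in DIMENSIONS}
# post = {d: rng.uniform(3.0, 4.8, size=40) for d in DIMENSIONS}
# report_paired_gains(pre, post)
```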