Minghao Wu - The University of North Carolina at Chapel Hill
I am currently a Postdoctoral Research Associate at The University of North Carolina at Chapel Hill, working with Prof. Mohit Bansal. I obtained my Ph.D. degree in Computer Science from Monash University in 2025, where I was advised by Prof. Gholamreza (Reza) Haffari. Before that, I received my M.Eng. degree in Information Technology from The University of Melbourne in 2018, under the supervision of Prof. Trevor Cohn, and my B.Sc. degree in Statistics and Information Systems from The University of Sydney in 2016.
My research interests mainly lie in the following areas:
- Language Agents: My current primary research interest is multi-agent communication and collaboration. My recent work (Wu et al., 2025) proposed a multi-agent framework that mimics human translation workflows using language agents, achieving state-of-the-art results on literary translation tasks.
- Data Efficiency: I have explored data-efficient methods for training language models, including fine-tuning large language models with heterogeneous data sources (Wu et al., 2024, Wang et al., 2025), leveraging a bipartite graph to balance quality and diversity in data selection (Wu et al., 2025), and synthesizing large-scale datasets for fine-tuning language models (Wu et al., 2023).
- Multilinguality and Machine Translation: I have conducted research on multilinguality and machine translation, including multilingual supervised fine-tuning for large language models (Li et al., 2023), activation steering for cross-lingual transfer in multilingual language models (Wang et al., 2025, Wang et al., 2025), multilingual process reward modeling (Wang et al., 2025), document-level machine translation (Wu et al., 2023, Wu et al., 2024, Wu et al., 2024), and multilingual evaluation (Wu et al., 2025).
- Miscellaneous: Additionally, I have contributed to various other topics, including large language model evaluation (Wu and Aji, 2024), multi-modal large language models (Lyu et al., 2023, Wang et al., 2024), and recommender systems (Zhang et al., 2024).
My CV can be downloaded from here.
Education
- Ph.D. Computer Science, Monash University, 2022 - 2025, supervised by Prof. Gholamreza (Reza) Haffari.
- M.Eng. Information Technology, The University of Melbourne, 2016 - 2018, supervised by Prof. Trevor Cohn.
- B.Sc. Statistics and Information Systems, The University of Sydney, 2013 - 2016.
Work Experience
- Jun. 2025 - Present: Postdoc at The University of North Carolina at Chapel Hill, supervised by Prof. Mohit Bansal.
- Nov. 2024 - Apr. 2025: Research Intern at Alibaba Group, supervised by Dr. Longyue Wang.
- Jul. 2023 - Oct. 2023: Research Intern at Tencent AI Lab, supervised by Dr. Longyue Wang.
- Apr. 2023 - Jul. 2023: Visiting Researcher at Mohamed bin Zayed University of Artificial Intelligence, supervised by Dr. Alham Fikri Aji.
- Jul. 2020 - Jul. 2021: Research Intern at Huawei Noah’s Ark Lab, supervised by Dr. Meng Zhang.
- Aug. 2018 - Jul. 2019: Research Engineer at JD AI Research, JD.com, Inc., supervised by Dr. Xiaodong He.
Selected Publications
You can find the complete list of my articles on my Google Scholar profile.
HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models
Weixuan Wang*, Minghao Wu*, Barry Haddow, and Alexandra Birch. HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models. abs/2505.12300.
ExpertSteer: Intervening in LLMs through Expert Knowledge
Weixuan Wang*, Minghao Wu*, Barry Haddow, and Alexandra Birch. ExpertSteer: Intervening in LLMs through Expert Knowledge. abs/2505.12313.
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu, Jiahao Xu, Yulin Yuan, Gholamreza Haffari, Longyue Wang, Weihua Luo, and Kaifu Zhang. (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts. TACL 2025
Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention
Weixuan Wang*, Minghao Wu*, Barry Haddow, and Alexandra Birch. Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention. ACL 2025
The Best of Both Worlds: Bridging Quality and Diversity in Data Selection with Bipartite Graph
Minghao Wu, Thuy-Trang Vu, Lizhen Qu, and Gholamreza Haffari. The Best of Both Worlds: Bridging Quality and Diversity in Data Selection with Bipartite Graph. ICML 2025
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Minghao Wu, Weixuan Wang, Sinuo Liu, Huifeng Yin, Xintong Wang, Yu Zhao, Chenyang Lyu, Longyue Wang, Weihua Luo, and Kaifu Zhang. The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks. abs/2504.15521.
Demystifying Multilingual Chain-of-Thought in Process Reward Modeling
Weixuan Wang*, Minghao Wu*, Barry Haddow, and Alexandra Birch. Demystifying Multilingual Chain-of-Thought in Process Reward Modeling. abs/2502.12663.
TransAgents: Build Your Translation Company with Language Agents
Minghao Wu, Jiahao Xu, and Longyue Wang. TransAgents: Build Your Translation Company with Language Agents. EMNLP 2024
Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models
Minghao Wu, Thuy-Trang Vu, Lizhen Qu, and Gholamreza Haffari. Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models. EMNLP 2024
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, and Alham Fikri Aji. LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions. EACL 2024
Importance-Aware Data Augmentation for Document-Level Neural Machine Translation
Minghao Wu, Yufei Wang, George Foster, Lizhen Qu, and Gholamreza Haffari. Importance-Aware Data Augmentation for Document-Level Neural Machine Translation. EACL 2024
Adapting Large Language Models for Document-Level Machine Translation
Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, and Gholamreza Haffari. Adapting Large Language Models for Document-Level Machine Translation. abs/2401.06468.
Document Flattening: Beyond Concatenating Context for Document-Level Neural Machine Translation
Minghao Wu, George Foster, Lizhen Qu, and Gholamreza Haffari. Document Flattening: Beyond Concatenating Context for Document-Level Neural Machine Translation. EACL 2023
Uncertainty-Aware Balancing for Multilingual and Multi-Domain Neural Machine Translation Training
Minghao Wu, Yitong Li, Meng Zhang, Liangyou Li, Gholamreza Haffari, and Qun Liu. Uncertainty-Aware Balancing for Multilingual and Multi-Domain Neural Machine Translation Training. EMNLP 2021
Evaluating the Utility of Hand-crafted Features in Sequence Labelling
Minghao Wu, Fei Liu, and Trevor Cohn. Evaluating the Utility of Hand-crafted Features in Sequence Labelling. EMNLP 2018