Ph.D. student at Monash University, Australia.
Published:
This post is based on the talk I gave at the Institute for Language, Cognition and Computation (ILCC), University of Edinburgh, on November 14, 2025. The slides are available here.
Published:
This is my first blog post. I’m excited to start sharing my thoughts and experiences here. Stay tuned for more content!
Published:
Today’s research landscape showcases exciting advancements in multi-agent systems and model optimization, with several papers exploring how Large Language Models (LLMs) can collaborate more effectively. The theme of multi-agent collaboration appears prominently across multiple studies, including frameworks like MARS (Multi-Agent System for Deep ReSearch) and MATPO (Multi-Agent Tool-Integrated Policy Optimization), which demonstrate how specialized agent roles can enhance complex reasoning tasks. Another significant trend involves improving training efficiency through innovative approaches to combining Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), with methods like MIFO (Mitigating Forgetting Between Supervised and Reinforcement Learning) showing remarkable data efficiency gains. In the multilingual domain, analysis of Mixture-of-Experts (MoE) architectures reveals fascinating routing patterns, while new benchmarks like LLM-Hanabi provide sophisticated ways to evaluate Theory-of-Mind (ToM) capabilities in collaborative settings. These developments collectively point toward more efficient, collaborative, and capable AI systems that better mimic human reasoning processes.
Published:
Today’s research landscape showcases significant advancements in agentic systems and cross-lingual modeling, with a strong emphasis on memory architectures and optimization techniques. Several papers introduce novel frameworks for enhancing Large Language Model (LLM) capabilities: CAM (Constructivist Agentic Memory) draws from cognitive theory to build hierarchical memory structures for long-document comprehension, while AgentFlow introduces “in-the-flow” optimization using Flow-GRPO (Flow-based Group Refined Policy Optimization) to train planners within multi-turn agentic loops. Concurrently, ARM (Agentic Reasoning Modules) presents an evolutionary approach to discover specialized reasoning components, and Parallel Tokenizers proposes a new vocabulary alignment method to improve cross-lingual transfer in low-resource settings. These works collectively highlight a trend toward more modular, trainable, and cognitively-inspired agent architectures that demonstrate strong generalization and efficiency gains across diverse reasoning and multilingual tasks.
Published:
Today’s research landscape showcases significant advances in multi-agent collaboration frameworks, with several papers proposing innovative approaches to enhance reasoning capabilities through structured interaction. The Double-Loop Multi-Agent (DLMA) framework introduces a bilevel optimization strategy where “professor” agents evolve research plans while “doctoral student” agents execute them, achieving state-of-the-art results in automated scientific research. Similarly, Self-Signals Driven Multi-LLM Debate (SID) leverages internal model confidence and attention patterns to optimize multi-agent debate efficiency, while ToolMem enhances multimodal agents with learnable capability memories for improved tool selection. In reinforcement learning, λ-GRPO addresses length bias in Group Relative Policy Optimization (GRPO) through adaptive token weighting, and the PiKa dataset demonstrates that expert-level synthetic data can achieve superior alignment with just 30k examples—dramatically improving data efficiency. These works collectively highlight a trend toward more sophisticated, efficient, and self-aware AI systems capable of complex, multi-step problem-solving.
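Since GRPO and its length bias come up repeatedly in these digests, here is a small sketch of group-relative advantages and of how the token-aggregation choice controls length bias. The `lam` knob below simply interpolates between per-sequence and per-token averaging; it is an illustration under my own assumptions, not λ-GRPO's actual adaptive weighting, and PPO-style clipping and KL terms are omitted.

```python
# Illustrative sketch only: group-relative advantages + a length-aggregation knob.
# Not the lambda-GRPO formulation; clipping and KL penalties are omitted for brevity.
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (G,) scalar rewards for G responses sampled from the same prompt
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def policy_loss(token_logprobs, mask, rewards, lam: float = 0.0):
    # token_logprobs, mask: (G, T) with zero padding; rewards: (G,)
    adv = group_relative_advantages(rewards)            # (G,)
    lengths = mask.sum(dim=1).float()                   # (G,)
    seq_logprob = (token_logprobs * mask).sum(dim=1)    # (G,)
    # lam = 0: average within each response (short and long responses count equally);
    # lam = 1: average over all tokens in the group (longer responses weigh more).
    norm = lengths.pow(1.0 - lam) * lengths.mean().pow(lam)
    return -(adv * seq_logprob / norm).mean()
```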
Published:
Today’s research highlights significant advancements in multi-agent systems and multilingual AI, revealing a clear trend toward collaborative intelligence and cross-lingual efficiency. A standout innovation is Guided Topology Diffusion (GTD), which dynamically generates optimized communication structures for multiple LLM agents, balancing performance with cost efficiency. In multilingual domains, Multilingual Generative Retrieval via Cross-lingual Semantic Compression (MGR-CSC) introduces a novel framework that unifies semantically equivalent keywords across languages into “atoms,” drastically reducing identifier space while improving retrieval accuracy. Meanwhile, WaltzRL refines safety alignment through multi-agent reinforcement learning, training a conversation agent and a feedback agent to collaboratively reduce unsafe outputs and overrefusals. These contributions underscore a broader movement toward more adaptive, resource-conscious, and robust AI systems.
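The "atom" idea in MGR-CSC can be pictured as clustering semantically equivalent keywords across languages under one shared identifier. The sketch below is a rough illustration under my own assumptions (a placeholder multilingual encoder and an arbitrary similarity threshold), not the paper's actual algorithm.

```python
# Rough sketch of grouping cross-lingual keywords into shared "atoms".
# `embed` is a placeholder for any multilingual sentence encoder; the threshold is arbitrary.
import numpy as np

def embed(texts):
    """Placeholder: return unit-norm embeddings of shape (len(texts), d)."""
    raise NotImplementedError

def build_atoms(keywords, threshold=0.85):
    """Greedily assign each keyword to an existing atom if similar enough, else open a new one."""
    vectors = embed(keywords)
    atom_of, centroids = {}, []
    for keyword, vec in zip(keywords, vectors):
        sims = [float(vec @ c) for c in centroids]
        if sims and max(sims) >= threshold:
            atom_of[keyword] = int(np.argmax(sims))   # reuse an existing atom id
        else:
            atom_of[keyword] = len(centroids)         # open a new atom
            centroids.append(vec)
    return atom_of  # e.g. {"machine translation": 0, "traduction automatique": 0, "机器翻译": 0}
```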
Published:
Today’s literature highlights significant advances in multi-agent systems and model optimization, with several papers exploring how Large Language Models (LLMs) can collaborate effectively. Notable developments include LLM×MapReduce-V3, which introduces a hierarchically modular agent system using the Model Context Protocol (MCP) for dynamic, human-in-the-loop survey generation, and StoryBox, a hybrid bottom-up framework where agents interact in a simulated environment to produce coherent, long-form narratives. In optimization, PerSyn (Personalized data Synthesis) proposes a “Route then Generate” paradigm for multi-teacher distillation, efficiently assigning prompts to optimal teachers based on student learnability. Meanwhile, Rollout Routing Replay (R3) addresses instability in Reinforcement Learning (RL) for Mixture-of-Experts (MoE) models by aligning training and inference routers, preventing catastrophic collapse. Another study focuses on mitigating memorization risks during fine-tuning using n-gram-based early stopping and regularization. Together, these works underscore a trend toward more modular, efficient, and stable AI systems capable of complex, collaborative tasks.
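On the memorization point, the core idea is simple enough to sketch: periodically sample from the model during fine-tuning, measure how many generated n-grams appear verbatim in the training data, and stop (or regularize harder) once that rate crosses a threshold. This is a simplified illustration, not the paper's exact procedure; the n-gram size and threshold are arbitrary choices.

```python
# Simplified illustration of n-gram-based memorization monitoring during fine-tuning.
# Not the paper's exact method; n and the threshold are arbitrary.
def ngrams(tokens, n=8):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def memorization_rate(generations, train_docs, n=8):
    """Fraction of generated n-grams that appear verbatim in the training corpus."""
    train_set = set()
    for doc in train_docs:
        train_set |= ngrams(doc, n)
    gen_set = set()
    for gen in generations:
        gen_set |= ngrams(gen, n)
    return len(gen_set & train_set) / max(len(gen_set), 1)

# Hypothetical usage inside a training loop:
# if memorization_rate(sampled_outputs, train_token_lists) > 0.05:
#     break  # stop early to limit verbatim memorization
```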
Published:
Today’s research highlights an emerging focus on enhancing the reliability and equity of Large Language Models (LLMs) through introspection and infrastructure reform. A key theme is the drive to improve Retrieval-Augmented Generation (RAG) systems, with one study proposing CLEAR (Conflict-Localized and Enhanced Attention for RAG), a framework that uses hidden-state probing to detect and resolve knowledge conflicts for more faithful generation. Another paper tackles a fundamental bias in AI infrastructure, revealing systematic tokenization disparities that create computational and economic inequities for non-Latin and low-resource languages. Complementing these efforts to build more robust systems, a third work challenges the necessity of costly human annotations, introducing PARO (Pattern-Aware LLMs as Rationale AnnOtators), which shows that instilling correct reasoning patterns is more critical than the volume of human rationales for training LLMs on procedural tasks.
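The tokenization disparity is easy to observe directly: encoding parallel sentences with the same subword tokenizer typically yields far more tokens for non-Latin scripts, which translates into higher compute and API cost. The snippet below is a quick illustration; the tokenizer choice and the (approximate) translations are my own, not taken from the paper.

```python
# Quick illustration of tokenization disparity across scripts.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any byte-level BPE tokenizer works

parallel = {
    "English": "The weather is very nice today.",
    "Hindi": "आज मौसम बहुत अच्छा है।",   # approximate translation of the same sentence
    "Chinese": "今天天气很好。",
}

for lang, sentence in parallel.items():
    print(f"{lang}: {len(tokenizer.encode(sentence))} tokens")
# Non-Latin scripts often need several times more tokens for equivalent content.
```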
Published:
Today’s research landscape showcases significant advances in enhancing the reasoning and specialization of large language models (LLMs), with several papers focusing on structured reasoning frameworks like Chain-of-Thought (CoT) fine-tuning and Program-of-Thoughts (PoT). A notable trend is the use of evolutionary and multi-agent strategies to improve model performance: CoT-Evo applies evolutionary algorithms to distill high-quality reasoning traces for scientific domains, while EvoTest introduces a test-time learning framework where agents evolve their configurations across episodes. In parallel, methods like GatePro optimize Mixture-of-Experts (MoE) models by promoting expert diversity without additional parameters, and M²PO (Multi-Pair, Multi-Perspective Preference Optimization) refines preference learning for machine translation by integrating multi-perspective rewards. Industrial applications are also prominent, as seen in Meituan’s WOWService, which leverages multi-agent systems for scalable, real-world dialogue systems. Additionally, multilingual adaptation is advanced through sparse subnetwork fine-tuning, efficiently enhancing LLM capabilities for underrepresented languages.
Minghao Wu, Fei Liu, and Trevor Cohn. Evaluating the Utility of Hand-crafted Features in Sequence Labelling. EMNLP 2018 [Link to ACL anthology|More info]
Minghao Wu, Yitong Li, Meng Zhang, Liangyou Li, Gholamreza Haffari, and Qun Liu. Uncertainty-Aware Balancing for Multilingual and Multi-Domain Neural Machine Translation Training. EMNLP 2021 [Link to ACL anthology|More info]
Minghao Wu, George Foster, Lizhen Qu, and Gholamreza Haffari. Document Flattening: Beyond Concatenating Context for Document-Level Neural Machine Translation. EACL 2023 [Link to ACL anthology|More info]
Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, and Gholamreza Haffari. Adapting Large Language Models for Document-Level Machine Translation. abs/2401.06468. [Link to arXiv|More info]
Minghao Wu, Yufei Wang, George Foster, Lizhen Qu, and Gholamreza Haffari. Importance-Aware Data Augmentation for Document-Level Neural Machine Translation. EACL 2024 [Link to ACL anthology|More info]
Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, and Alham Fikri Aji. LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions. EACL 2024 [Link to ACL anthology|More info]
Minghao Wu, Thuy-Trang Vu, Lizhen Qu, and Gholamreza Haffari. Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models. EMNLP 2024 [Link to arXiv|More info]
Minghao Wu, Jiahao Xu, and Longyue Wang. TransAgents: Build Your Translation Company with Language Agents. EMNLP 2024 [Link to ACL anthology|More info]
Weixuan Wang*, Minghao Wu*, Barry Haddow, and Alexandra Birch. Demystifying Multilingual Chain-of-Thought in Process Reward Modeling. abs/2502.12663. [Link to arXiv|More info]
Minghao Wu, Weixuan Wang, Sinuo Liu, Huifeng Yin, Xintong Wang, Yu Zhao, Chenyang Lyu, Longyue Wang, Weihua Luo, and Kaifu Zhang. The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks. abs/2504.15521. [Link to arXiv|More info]
Minghao Wu, Thuy-Trang Vu, Lizhen Qu, and Gholamreza Haffari. The Best of Both Worlds: Bridging Quality and Diversity in Data Selection with Bipartite Graph. ICML 2025 [Link to arXiv|More info]
Weixuan Wang*, Minghao Wu*, Barry Haddow, and Alexandra Birch. Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention. ACL 2025 Outstanding Paper Award [Link to arXiv|More info]
Minghao Wu, Jiahao Xu, Yulin Yuan, Gholamreza Haffari, Longyue Wang, Weihua Luo, and Kaifu Zhang. (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts. TACL 2025 [Link to arXiv|More info]
Weixuan Wang*, Minghao Wu*, Barry Haddow, and Alexandra Birch. ExpertSteer: Intervening in LLMs through Expert Knowledge. abs/2505.12313. [Link to arXiv|More info]
Weixuan Wang*, Minghao Wu*, Barry Haddow, and Alexandra Birch. HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models. abs/2505.12300. [Link to arXiv|More info]
Weixuan Wang*, Minghao Wu*, Barry Haddow, and Alexandra Birch. Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization. abs/2509.20900. [Link to arXiv|More info]
Published:
Many think that machine translation (MT) is a solved problem, but is it really? While significant progress has been made with models like GPT-4 and other large language models, challenges remain.
Published:
Test-time scaling (TTS) refers to investing additional computational resources at inference time to improve model performance on complex tasks, especially reasoning. TTS covers a range of methods, such as chain-of-thought reasoning, self-consistency, and majority voting; broadly, any technique that spends extra computation during inference to improve output quality can be considered a form of TTS.
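To make this concrete, here is a minimal sketch of one TTS method, self-consistency with majority voting: sample several reasoning chains and keep the most common final answer. The `generate_answer` function is a placeholder for whatever sampling-enabled model call you use, not a real API.

```python
# Minimal self-consistency sketch. `generate_answer` is a placeholder, not a real API.
from collections import Counter

def generate_answer(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder: call an LLM with sampling enabled and return its final answer."""
    raise NotImplementedError

def self_consistency(prompt: str, n_samples: int = 16) -> str:
    # Extra inference-time compute: n_samples forward passes instead of one.
    answers = [generate_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```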