ArXiv Daily Digests

There are hundreds of papers uploaded to arXiv every day, making it challenging to read them all. To stay updated with the latest research, I use a large language model to filter the daily stream for papers that align with my research interests. Below are the daily digests of selected papers that my AI assistant has identified as noteworthy.

If you find that some dates are missing, don’t be surprised. There may have been no papers matching my interests that day, a failure to fetch the data, or some other technical issue. I will do my best to keep this page updated.

Change Log:

Disclaimer: All the content below is generated by a large language model and may not be entirely accurate or complete.

ArXiv Daily Digest on 2025-11-19

Today’s research highlights a significant trend toward optimizing human-agent and agent-agent collaboration frameworks across diverse domains. In human-computer interaction, the Computer-Use Agent (CUA) as Judge paradigm introduces a Coder-CUA collaboration framework that shifts interface design from human-centric aesthetics toward agent-native efficiency, achieving substantial improvements in task solvability and navigation success. Meanwhile, in language agent efficiency, DEPO (Dual-Efficiency Preference Optimization) addresses both step-level and trajectory-level efficiency, cutting token usage by up to 60.9% while maintaining performance. In clinical NLP, the OEMA (Ontology-Enhanced Multi-Agent Collaboration Framework) leverages SNOMED CT (Systematized Nomenclature of Medicine—Clinical Terms) to achieve near-supervised performance in zero-shot Named Entity Recognition (NER), demonstrating how multi-agent systems with structured knowledge can bridge the gap between zero-shot and supervised learning.

ArXiv Daily Digest on 2025-11-10

Today’s research highlights two significant advances in efficient and multilingual language model development. The first paper introduces Importance-Aware Data Selection, proposing a novel Model Instruction Weakness Value (MIWV) metric that leverages In-Context Learning (ICL) to identify high-impact training samples, achieving superior performance with just 1% of data compared to full-dataset training. The second work, “Beyond English,” presents the LMT (Large-scale Multilingual Translation) model suite, addressing English-centric bias through innovative techniques like Strategic Downsampling to counter directional degeneration and Parallel Multilingual Prompting (PMP) for enhanced cross-lingual transfer, establishing new state-of-the-art results in multilingual machine translation with remarkable parameter efficiency.

ArXiv Daily Digest on 2025-11-06

Today’s research landscape showcases a strong emphasis on enhancing the reliability and collaborative capabilities of large language models (LLMs) through neuro-symbolic and multi-agent frameworks. A key trend is the integration of formal logic and symbolic reasoning to validate and improve Chain-of-Thought (CoT) processes, as demonstrated by VeriCoT, which uses first-order logic and automated solvers to verify reasoning steps. Meanwhile, in the domain of multi-agent systems, studies like BAPPA and DR. WELL (Dynamic Reasoning and Learning with Symbolic World Model) explore how structured collaboration—through agent discussion, planner-coder pipelines, and dynamic world models—can significantly boost performance in complex tasks like Text-to-SQL generation and embodied planning, enabling more efficient, adaptive, and interpretable AI systems.

ArXiv Daily Digest on 2025-11-05

Today’s research highlights significant advances in enhancing computational efficiency and security for large language models (LLMs), with two key themes emerging: innovative architectural scaling for continual learning and sophisticated adversarial evaluation frameworks. In continual learning, the SCALE (Upscaled Continual Learning) architecture demonstrates that width upscaling with strategic parameter freezing can dramatically mitigate catastrophic forgetting in small language models (SLMs), enabling effective knowledge acquisition while preserving original capabilities. Concurrently, the EQ-Negotiator framework shows how dynamic emotional personas can empower SLMs to match or exceed LLM performance in complex tasks like credit negotiation, emphasizing strategic intelligence over model scale. Meanwhile, in evaluation methodologies, studies reveal how source-aware neural machine translation (MT) metrics can be reliably adapted for speech translation (ST) using synthetic sources, and novel attack frameworks like Dynamic Deceptor (DyDec) and Static Deceptor (StaDec) expose critical vulnerabilities in LLMs through adaptive, transferable adversarial examples, underscoring the urgent need for improved robustness measures.

ArXiv Daily Digest on 2025-11-04

Today’s research landscape reveals a strong emphasis on enhancing multi-agent systems through sophisticated coordination frameworks, with several papers addressing the critical challenge of “lazy agent” behavior—where one agent dominates interactions, undermining collaborative potential. Innovative solutions include reinforcement learning for budget-aware centralized controllers, test-time multimodal reasoning via model orchestration, and causal influence measurements to foster balanced deliberation. Notably, the identified “collaboration gap” highlights that individual model performance does not guarantee effective teamwork, prompting new benchmarks and strategies like “relay inference” to elicit latent collaborative skills. Concurrently, contributions in multilingual NLP introduce PragExTra, the first corpus for pragmatic explicitation, enriching machine translation with cultural adaptability through active learning. These works collectively advance scalable, efficient, and interpretable multi-agent and cross-lingual systems.

ArXiv Daily Digest on 2025-10-30

Today’s research landscape reveals a compelling focus on enhancing the reasoning and collaboration capabilities of large language models (LLMs), with several papers proposing innovative structural and methodological refinements. A key theme is the development of multi-agent systems and novel reasoning paradigms, such as “asynchronous thinking (AsyncThink),” which organizes internal model computations into concurrently executable structures for improved efficiency. This is complemented by data-driven frameworks for forming synergistic multi-agent teams through conversation analysis and community detection. Simultaneously, other studies address foundational training challenges, demonstrating that a simple switch from BF16 to FP16 floating-point precision can resolve the notorious training-inference mismatch in reinforcement learning (RL) fine-tuning, while another traces “value drifts” to show that a model’s ethical alignment is predominantly shaped during supervised fine-tuning (SFT), with preference optimization playing a surprisingly minor role.
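The precision gap behind that BF16-to-FP16 finding is easy to see in isolation: both are 16-bit formats, but FP16 spends 10 bits on the mantissa where BF16 spends only 7, trading them away for FP32’s 8-bit exponent range. A minimal stdlib-only sketch (the `simulate_*` helpers are my own illustration, not code from the paper):

```python
import struct

def simulate_bf16(x: float) -> float:
    """Round-toward-zero emulation of bfloat16: keep float32's 8-bit
    exponent but truncate the mantissa from 23 bits down to 7
    (bfloat16 is the top 16 bits of a float32)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (out,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return out

def simulate_fp16(x: float) -> float:
    """Round-to-nearest float16 via struct's 'e' format (10 mantissa bits)."""
    (out,) = struct.unpack("<e", struct.pack("<e", x))
    return out

x = 1.001  # a small perturbation above 1.0, e.g. a probability ratio
print(f"exact: {x}")
print(f"fp16 : {simulate_fp16(x)}")  # still distinguishes 1.001 from 1.0
print(f"bf16 : {simulate_bf16(x)}")  # collapses back to exactly 1.0
```

Perturbations of this size are exactly what training/inference probability ratios look like, which is why mantissa precision, not range, decides whether the two sides agree.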

ArXiv Daily Digest on 2025-10-29

Today’s research landscape showcases significant advancements in multi-agent AI systems, with several papers exploring how Large Language Model (LLM) agents can collaborate through sophisticated reasoning and communication frameworks. The papers collectively demonstrate a shift toward outcome-supervised training paradigms that incentivize autonomous exploration, particularly in knowledge-intensive tasks like Knowledge Base Question Answering (KBQA). Notable innovations include graph-based planning for parallel tool execution in GAP (Graph-based Agent Planning), socio-cognitive evaluation frameworks for proactive mediators in ProMediate, and large-scale empirical benchmarks like DEBATE for assessing authentic multi-agent dynamics. These works highlight growing emphasis on efficient collaboration under information asymmetry, with reinforcement learning (RL) and curriculum learning emerging as key techniques for developing more robust and human-aligned AI systems.

ArXiv Daily Digest on 2025-10-23

Today’s research landscape showcases a compelling focus on enhancing large reasoning models (LRMs) through tool integration, multilingual reasoning, and multi-agent collaboration. Several papers explore how LRMs can be trained to effectively use external tools like Code Interpreters (CIs), with frameworks such as CoRT (Code-Optimized Reasoning Training) demonstrating significant gains in mathematical reasoning efficiency. In multilingual contexts, studies reveal that while English often serves as a “reasoning lingua franca,” this approach risks “Lost in Translation” errors, highlighting the need for robust native-language reasoning capabilities. Meanwhile, multi-agent reinforcement learning systems like Mixture-of-Minds are achieving state-of-the-art results in complex tasks like table understanding by decomposing problems into specialized planning, coding, and answering roles. Another emerging trend is data efficiency, with methods like LM-Mixup showing that low-quality data, when properly distilled via instruction distillation, can rival the performance of training on full datasets. Finally, new evaluation techniques such as ThinMQM (Thinking-calibrated Multidimensional Quality Metrics) are calibrating LRM “thinking” to improve machine translation assessment, reducing computational budgets while boosting accuracy.

ArXiv Daily Digest on 2025-10-22

Today’s research landscape showcases significant advancements in enhancing large language model (LLM) reasoning through sophisticated reinforcement learning (RL) and multi-agent frameworks. A prominent trend is addressing the “learning cliff” phenomenon, where models plateau on problems beyond their current capabilities, with novel solutions like Scaf-GRPO (Scaffolded Group Relative Policy Optimization) providing hierarchical hints to restore learning signals. Concurrently, AgenticMath demonstrates the power of multi-agent systems for generating high-quality, mathematically rigorous synthetic data, proving that targeted data quality can outperform massive-scale alternatives. The scope of agent capabilities continues to expand, as evidenced by ColorAgent’s robust, personalized operating system agent, while multilingual performance sees a paradigm shift with prompt-space optimization—systematically transforming prompts for naturalness, cultural adaptation, and difficulty rather than merely translating them. Furthermore, LoongRL tackles advanced long-context reasoning by synthesizing high-difficulty tasks that induce emergent “plan–retrieve–reason–recheck” patterns, enabling impressive generalization from shorter training contexts.

ArXiv Daily Digest on 2025-10-21

Today’s research highlights innovative strategies for enhancing model performance while mitigating critical limitations. A prominent theme is the integration of Large Language Models (LLMs) with specialized frameworks to overcome data and semantic gaps: the HYDRE (HYbrid Distantly supervised Relation Extraction) framework combines distantly supervised models with in-context learning to tackle noisy annotations in relation extraction, achieving significant gains in both monolingual and cross-lingual settings. Meanwhile, CodeRL+ introduces execution semantics alignment into Reinforcement Learning with Verifiable Rewards (RLVR), effectively bridging the gap between textual patterns and functional code correctness. Complementing these advances, a systematic analysis of catastrophic forgetting reveals that reinforcement learning (RL), due to its use of on-policy data, consistently preserves prior knowledge better than supervised fine-tuning (SFT) during post-training, offering practical guidelines for continual adaptation.

ArXiv Daily Digest on 2025-10-20

Today’s research landscape showcases significant advancements in multi-agent systems and efficient model architectures, with several papers introducing novel frameworks for enhanced collaboration and scalability. A prominent theme is verification-aware planning, exemplified by VeriMAP, which integrates explicit verification functions into multi-agent workflows to improve robustness in complex reasoning tasks. Another key development is Enterprise Deep Research (EDR), a steerable multi-agent system that enables human-in-the-loop guidance for enterprise analytics, demonstrating the growing emphasis on controllable AI systems. In architectural innovations, ReXMoE (Reusing Experts with Mixture-of-Experts) presents a novel approach to Mixture-of-Experts (MoE) by enabling cross-layer expert reuse, while Contextual Attention Modulation (CAM) offers a new parameter-efficient fine-tuning method for multi-task adaptation in Large Language Models (LLMs). Complementing these, QueST introduces a sophisticated framework for generating challenging synthetic coding problems through difficulty-aware training, addressing the critical bottleneck of high-quality training data for reasoning tasks.

ArXiv Daily Digest on 2025-10-16

Today’s research landscape showcases a significant push towards enhancing the efficiency and robustness of large language models (LLMs), with a strong emphasis on reinforcement learning (RL), multi-agent systems (MAS), and multilingual adaptation. A key trend is the retrofitting of smaller models to match or surpass the performance of their larger counterparts, as demonstrated by a 300M-parameter model achieving retrieval scores comparable to 7B models. Innovations in RL are particularly prominent, with novel frameworks like Reinforcement Learning with Supervised Reward (RLSR) reframing supervised fine-tuning (SFT) within an RL loop, and methods such as Last-Token Self-Rewarding (LaSeR) and Information Gain-based Policy Optimization (IGPO) introducing lightweight, intrinsic rewards to tackle reward sparsity in multi-turn agents. Furthermore, research is increasingly tackling the challenges of complex reasoning and subjective evaluation, evidenced by frameworks that distill MAS capabilities into single models and new benchmarks that reveal the limitations of current preference learning methods in capturing nuanced creative quality.

ArXiv Daily Digest on 2025-10-15

Today’s research landscape showcases significant advances in enhancing the reasoning and specialization of large language models (LLMs), with several papers focusing on structured reasoning frameworks like Chain-of-Thought (CoT) fine-tuning and Program-of-Thoughts (PoT). A notable trend is the use of evolutionary and multi-agent strategies to improve model performance: CoT-Evo applies evolutionary algorithms to distill high-quality reasoning traces for scientific domains, while EvoTest introduces a test-time learning framework where agents evolve their configurations across episodes. In parallel, methods like GatePro optimize Mixture-of-Experts (MoE) models by promoting expert diversity without additional parameters, and M²PO (Multi-Pair, Multi-Perspective Preference Optimization) refines preference learning for machine translation by integrating multi-perspective rewards. Industrial applications are also prominent, as seen in Meituan’s WOWService, which leverages multi-agent systems for scalable, real-world dialogue systems. Additionally, multilingual adaptation is advanced through sparse subnetwork fine-tuning, efficiently enhancing LLM capabilities for underrepresented languages.

ArXiv Daily Digest on 2025-10-14

Today’s research highlights an emerging focus on enhancing the reliability and equity of Large Language Models (LLMs) through introspection and infrastructure reform. A key theme is the drive to improve Retrieval-Augmented Generation (RAG) systems, with one study proposing CLEAR (Conflict-Localized and Enhanced Attention for RAG), a framework that uses hidden-state probing to detect and resolve knowledge conflicts for more faithful generation. Another paper tackles a fundamental bias in AI infrastructure, revealing systematic tokenization disparities that create computational and economic inequities for non-Latin and low-resource languages. Complementing these efforts to build more robust systems, a third work challenges the necessity of costly human annotations, introducing PARO (Pattern-Aware LLMs as Rationale AnnOtators), which shows that instilling correct reasoning patterns is more critical than the volume of human rationales for training LLMs on procedural tasks.
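The tokenization-disparity point has a simple mechanical root: byte-level vocabularies start from UTF-8, and many non-Latin scripts need two to three bytes per character before any merges apply, so the same sentence starts from a longer byte sequence. A rough stdlib-only illustration (the sample strings are my own, not from the paper):

```python
# UTF-8 bytes per character as a crude proxy for byte-level tokenizer
# cost: more bytes per character generally means more tokens per word,
# and therefore higher per-request compute and API cost.
samples = {
    "English (Latin)": "Hello, how are you?",
    "Hindi (Devanagari)": "नमस्ते, आप कैसे हैं?",
    "Amharic (Ethiopic)": "ሰላም እንደምን ነህ?",
}
for label, text in samples.items():
    ratio = len(text.encode("utf-8")) / len(text)
    print(f"{label:20s} bytes/char = {ratio:.2f}")
```

ASCII stays at 1.00 bytes per character, while Devanagari and Ethiopic code points each take 3 bytes, so their sentences start roughly 2.5x longer at the byte level.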

ArXiv Daily Digest on 2025-10-13

Today’s literature highlights significant advances in multi-agent systems and model optimization, with several papers exploring how Large Language Models (LLMs) can collaborate effectively. Notable developments include LLM×MapReduce-V3, which introduces a hierarchically modular agent system using the Model Context Protocol (MCP) for dynamic, human-in-the-loop survey generation, and StoryBox, a hybrid bottom-up framework where agents interact in a simulated environment to produce coherent, long-form narratives. In optimization, PerSyn (Personalized data Synthesis) proposes a “Route then Generate” paradigm for multi-teacher distillation, efficiently assigning prompts to optimal teachers based on student learnability. Meanwhile, Rollout Routing Replay (R3) addresses instability in Reinforcement Learning (RL) for Mixture-of-Experts (MoE) models by aligning training and inference routers, preventing catastrophic collapse. Another study focuses on mitigating memorization risks during fine-tuning using n-gram-based early stopping and regularization. Together, these works underscore a trend toward more modular, efficient, and stable AI systems capable of complex, collaborative tasks.
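On the memorization-mitigation study: the digest does not spell out the exact criterion, but an n-gram-based signal of the kind it names is straightforward to sketch, e.g. the fraction of generated n-grams that appear verbatim in the training text (function names, the n=4 default, and the example strings are illustrative assumptions):

```python
def ngram_set(text: str, n: int) -> set:
    """All whitespace-token n-grams occurring in a text."""
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def memorization_score(generated: str, train_text: str, n: int = 4) -> float:
    """Fraction of the generation's n-grams copied verbatim from the
    training text; a rising score during fine-tuning could trigger
    early stopping or stronger regularization."""
    gen = ngram_set(generated, n)
    if not gen:
        return 0.0
    return len(gen & ngram_set(train_text, n)) / len(gen)

train = "the quick brown fox jumps over the lazy dog"
novel = "a quick red fox ran past the calm dog"
copied = "the quick brown fox jumps over a fence"

print(memorization_score(novel, train))   # → 0.0
print(memorization_score(copied, train))  # → 0.6
```

Set intersection makes the check O(tokens) per text, so it is cheap enough to evaluate on held-out generations every few training steps.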

ArXiv Daily Digest on 2025-10-09

Today’s research highlights significant advancements in multi-agent systems and multilingual AI, revealing a clear trend toward collaborative intelligence and cross-lingual efficiency. A standout innovation is Guided Topology Diffusion (GTD), which dynamically generates optimized communication structures for multiple LLM agents, balancing performance with cost efficiency. In multilingual domains, Multilingual Generative Retrieval via Cross-lingual Semantic Compression (MGR-CSC) introduces a novel framework that unifies semantically equivalent keywords across languages into “atoms,” drastically reducing identifier space while improving retrieval accuracy. Meanwhile, WaltzRL refines safety alignment through multi-agent reinforcement learning, training a conversation agent and a feedback agent to collaboratively reduce unsafe outputs and overrefusals. These contributions underscore a broader movement toward more adaptive, resource-conscious, and robust AI systems.

ArXiv Daily Digest on 2025-10-08

Today’s research landscape showcases significant advances in multi-agent collaboration frameworks, with several papers proposing innovative approaches to enhance reasoning capabilities through structured interaction. The Double-Loop Multi-Agent (DLMA) framework introduces a bilevel optimization strategy where “professor” agents evolve research plans while “doctoral student” agents execute them, achieving state-of-the-art results in automated scientific research. Similarly, Self-Signals Driven Multi-LLM Debate (SID) leverages internal model confidence and attention patterns to optimize multi-agent debate efficiency, while ToolMem enhances multimodal agents with learnable capability memories for improved tool selection. In reinforcement learning, λ-GRPO addresses length bias in Group Relative Policy Optimization (GRPO) through adaptive token weighting, and the PiKa dataset demonstrates that expert-level synthetic data can achieve superior alignment with just 30k examples—dramatically improving data efficiency. These works collectively highlight a trend toward more sophisticated, efficient, and self-aware AI systems capable of complex, multi-step problem-solving.

ArXiv Daily Digest on 2025-10-07

Today’s research landscape showcases significant advancements in agentic systems and cross-lingual modeling, with a strong emphasis on memory architectures and optimization techniques. Several papers introduce novel frameworks for enhancing Large Language Model (LLM) capabilities: CAM (Constructivist Agentic Memory) draws from cognitive theory to build hierarchical memory structures for long-document comprehension, while AgentFlow introduces “in-the-flow” optimization using Flow-GRPO (Flow-based Group Refined Policy Optimization) to train planners within multi-turn agentic loops. Concurrently, ARM (Agentic Reasoning Modules) presents an evolutionary approach to discover specialized reasoning components, and Parallel Tokenizers proposes a new vocabulary alignment method to improve cross-lingual transfer in low-resource settings. These works collectively highlight a trend toward more modular, trainable, and cognitively-inspired agent architectures that demonstrate strong generalization and efficiency gains across diverse reasoning and multilingual tasks.

ArXiv Daily Digest on 2025-10-06

Today’s research landscape showcases exciting advancements in multi-agent systems and model optimization, with several papers exploring how Large Language Models (LLMs) can collaborate more effectively. The theme of multi-agent collaboration appears prominently across multiple studies, including frameworks like MARS (Multi-Agent System for Deep ReSearch) and MATPO (Multi-Agent Tool-Integrated Policy Optimization), which demonstrate how specialized agent roles can enhance complex reasoning tasks. Another significant trend involves improving training efficiency through innovative approaches to combining Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), with methods like MIFO (Mitigating Forgetting Between Supervised and Reinforcement Learning) showing remarkable data efficiency gains. In the multilingual domain, analysis of Mixture-of-Experts (MoE) architectures reveals fascinating routing patterns, while new benchmarks like LLM-Hanabi provide sophisticated ways to evaluate Theory-of-Mind (ToM) capabilities in collaborative settings. These developments collectively point toward more efficient, collaborative, and capable AI systems that better mimic human reasoning processes.