
Multiple Papers Accepted by Top-Tier Conferences ACL and EMNLP! DataCanvas' Research and Innovation Strength Gains International Recognition Once Again

Release date: 2025.08.29

Recently, four papers jointly completed by Professor Zhao Xin's team from the Gaoling School of Artificial Intelligence at Renmin University of China and DataCanvas were accepted by two top-tier academic conferences, ACL and EMNLP. The acceptance of this series of research achievements signifies that DataCanvas' research results in the fields of algorithms and accessible AI computing power technology have gained widespread recognition from the international academic community.

 

ACL, the Annual Meeting of the Association for Computational Linguistics, is one of the top-tier academic conferences in the fields of natural language processing (NLP) and computational linguistics, and its papers typically represent cutting-edge research achievements in the field. EMNLP (Conference on Empirical Methods in Natural Language Processing), one of the most influential international conferences in the field, has played a significant role in promoting major technological breakthroughs such as pre-trained language models and machine translation.

 

This series of research achievements makes breakthrough progress in cutting-edge areas such as model reasoning, retrieval augmentation, and reinforcement learning, effectively addressing core pain points in the AI computing power industry.

 

DataCanvas provides comprehensive artificial intelligence infrastructure, ample accessible AI computing power support, and toolchains, significantly shortening the innovation validation cycle for researchers. The company has built an "academia-technology-application" trinity innovation ecosystem, offering researchers end-to-end services from computing power support to industrial validation while feeding real-world application scenarios back into academic innovation, forming a virtuous cycle between research and industry.

 

It is reported that the papers accepted from the joint team not only achieve breakthroughs in underlying AI foundational technologies but also propose innovative solutions to pain points in the computing power industry's applications.

 

The paper "YuLan-Mini: Pushing the Limits of Open Data-efficient Language Model" proposes a fully transparent training scheme. Validation based on the Alaya NeW Cloud platform shows: a 72% reduction in training stability loss and long-context support extended to 28K tokens. This efficient pre-training technology significantly lowers the computing power threshold for training large models with hundreds of billions of parameters.  


Alaya NeW Cloud supports practical training of the paper's methods


The paper "SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis" proposes the SimpleDeepSearcher framework. Experimental validation completed on the Alaya NeW Cloud platform addresses issues such as the lack of high-quality training trajectories and distribution mismatches in simulated environments through strategic data engineering (rather than complex training paradigms), achieving a 48.3% improvement in model performance with optimized GPU resources. It provides an accessible AI computing power solution for scenarios with limited computing resources, balancing computational efficiency and performance requirements.  

 

To address the challenges of inference efficiency and robustness to interference in large language models (LLMs), DataCanvas and the joint research team propose a progressive information retrieval framework in the paper "CAFE: Retrieval Head-based Coarse-to-Fine Information Seeking to Enhance Multi-Document QA Capability". The framework balances efficiency and accuracy in inclusive computing power scenarios and outperforms existing industry methods.

 

The paper "R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning" innovatively proposes a two-stage reinforcement learning framework. Validation on the Alaya NeW Cloud platform shows that this framework supports a 30% reduction in retrieval times, can be seamlessly transferred to real-world search engine scenarios, and significantly reduces computing power costs.  

 

The aforementioned paper achievements effectively promote technological innovations such as improved computing power utilization in asynchronous training and optimized reinforcement learning computational costs. In addition, DataCanvas has conducted extensive and close research collaborations with top universities such as Tsinghua University, Peking University, and Shanghai Jiao Tong University, laying a solid foundation for DataCanvas' artificial intelligence infrastructure innovation system.

 

Through continuous dedication to accessible AI computing power technology, DataCanvas has built a mature "inclusive computing power" technology system that has been successfully applied in fields such as scientific research, intelligent manufacturing, and embodied intelligence. It has established solid technological barriers in key technical directions such as model inference, asynchronous training, retrieval augmentation, and reinforcement learning, continuously leading the development of the accessible AI computing power industry in China.

 

 

To download the papers in this series:

 

1. "CAFE: Retrieval Head-based Coarse-to-Fine Information Seeking to Enhance Multi-Document QA Capability"  

Authors: Han Peng, Jinhao Jiang, Zican Dong, Xin Zhao, Lei Fang  

Download URL: https://arxiv.org/abs/2505.10063  

 

2. "Smart-Searcher: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning"  

Authors: Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Xin Zhao, Lei Fang, Ji-Rong Wen  

Download URL: https://arxiv.org/abs/2505.17005  

 

3. "SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis"  

Authors: Shuang Sun, Huatong Song, Yuhao Wang, Ruiyang Ren, Jinhao Jiang, Junjie Zhang, Fei Bai, Jia Deng, Xin Zhao, Zheng Liu, Lei Fang, Zhongyuan Wang, Ji-Rong Wen  

Download URL: https://arxiv.org/abs/2505.16834  

 

4. "YuLan-Mini: Pushing the Limits of Open Data-efficient Language Model"  

Authors: Yiwen Hu, Huatong Song, Jie Chen, Jia Deng, Jiapeng Wang, Kun Zhou, Yutao Zhu, Jinhao Jiang, Zican Dong, Yang Lu, Xu Miao, Xin Zhao, Ji-Rong Wen  

Download URL: https://aclanthology.org/2025.acl-long.268/