Dynamic Retrieval-Augmented Generation for Mixed Task Scenarios Based on Large Language Models

Yu Chuanming; Li Haoxuan

Citation: Yu Chuanming, Li Haoxuan. Dynamic Retrieval-Augmented Generation for Mixed Task Scenarios Based on Large Language Models[J]. Digital Library Forum, 2025, 21(7): 1-12

Submitted: 2025-07-17

DOI：10.3772/j.issn.1673-2286.2025.07.001

Keywords: Large Language Model; Retrieval-Augmented Generation; Reinforcement Learning; Multi-Task Learning; Intelligent Information Processing

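Funding: This research was supported by two General Program projects of the National Natural Science Foundation of China, "Research on Domain Knowledge Representation and Fusion Models for Cross-Lingual Opinion Summarization" (No. 71974202) and "Research on Knowledge-Enhanced Models for Identifying and Evaluating Innovation in Scientific and Technical Literature" (No. 72374219), and by the Zhongnan University of Economics and Law project "Construction and Application of High-Quality Teaching Cases for Multilingual Natural Language Processing" (No. ALJS202520).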

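Author	Affiliation
Yu Chuanming	School of Information Engineering, Zhongnan University of Economics and Law
Li Haoxuan	School of Information Engineering, Zhongnan University of Economics and Law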


Abstract:

Large language models (LLMs) used in multi-task, multilingual scenarios integrate cross-domain knowledge inefficiently and generalize poorly across tasks. To address these problems, this paper proposes a dynamic, retrieval-based knowledge enhancement framework for mixed-task scenarios that improves the quality of LLM-generated content. The core of its retrieval mechanism is a neural network classification tree trained with reinforcement learning. The tree is labeled, and each heterogeneous knowledge base is mapped as a module to one of its leaf nodes. For a given input, the model retrieves the optimal knowledge base as its target, extracts data-augmented knowledge from it, and combines that knowledge with the LLM, so that external knowledge is matched to the input adaptively. The experiments cover two dimensions, knowledge retrieval and augmented generation. First, retrieval accuracy is evaluated in a mixed-task scenario. Second, taking Japanese text summarization as an example task, the performance gains are studied empirically on two public datasets, XL-Sum and WikiLingua. The results show that the framework retrieves relevant knowledge with high accuracy in a mixed-task scenario comprising 24 datasets, and that on the summarization task it improves ROUGE scores more than conventional retrieval-augmented methods. The framework is practical and scalable, and offers an effective way to adapt LLMs to mixed-task scenarios.
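The abstract describes the retrieval core only at a high level: a classification tree, trained with reinforcement learning, whose leaves are knowledge bases. The Python sketch below shows one way such a router could work. It is not the authors' implementation. It rests on several assumptions, all hypothetical. Each input is already encoded as a fixed-length feature vector. Every internal node is a linear softmax policy over its children. Training uses plain REINFORCE with a moving-average baseline. The reward is 1 when the reached leaf is the intended knowledge base, which stands in for a downstream signal such as a ROUGE gain. The names `Node`, `route`, `reinforce_update`, `build_demo_tree`, `train_demo` and the four leaf labels are invented for illustration.

```python
import math
import random


class Node:
    """Node of a labeled routing tree (hypothetical structure).

    Leaves carry a knowledge-base ID; each internal node keeps one linear
    weight vector per child and acts as a softmax policy over its children.
    """

    def __init__(self, label, children=None, kb_id=None, dim=0):
        self.label = label
        self.children = children or []
        self.kb_id = kb_id
        # Zero init gives a uniform policy over children.
        self.weights = [[0.0] * dim for _ in self.children]

    def is_leaf(self):
        return not self.children

    def probs(self, x):
        """Softmax over children for feature vector x."""
        scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in self.weights]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


def route(root, x, greedy=True, rng=random):
    """Descend from root to a leaf; return (kb_id, trajectory).

    trajectory lists (node, chosen_child_index, child_probs) per step.
    greedy=True takes the most probable child (inference);
    greedy=False samples a child (exploration during training).
    """
    node, trajectory = root, []
    while not node.is_leaf():
        p = node.probs(x)
        if greedy:
            k = max(range(len(p)), key=p.__getitem__)
        else:
            k = rng.choices(range(len(p)), weights=p)[0]
        trajectory.append((node, k, p))
        node = node.children[k]
    return node.kb_id, trajectory


def reinforce_update(trajectory, x, reward, baseline, lr=0.2):
    """REINFORCE step applied to every decision along the path.

    For a linear softmax, d log p_k / d w_j = (1[j == k] - p_j) * x,
    so each w_j moves by lr * (reward - baseline) * (1[j == k] - p_j) * x.
    """
    advantage = reward - baseline
    for node, k, p in trajectory:
        for j, w in enumerate(node.weights):
            coeff = lr * advantage * ((1.0 if j == k else 0.0) - p[j])
            for i, x_i in enumerate(x):
                w[i] += coeff * x_i


def build_demo_tree(dim):
    """Toy tree: the root splits by task type, the second level by language."""
    leaf = {name: Node(name, kb_id=name, dim=dim)
            for name in ("sum-ja", "sum-en", "qa-ja", "qa-en")}
    summ = Node("summarization", [leaf["sum-ja"], leaf["sum-en"]], dim=dim)
    qa = Node("qa", [leaf["qa-ja"], leaf["qa-en"]], dim=dim)
    return Node("root", [summ, qa], dim=dim)


def train_demo(steps=3000, seed=0):
    """Train on synthetic inputs [task, language, bias] encoded as +/-1."""
    rng = random.Random(seed)
    root = build_demo_tree(dim=3)
    samples = [([1.0, 1.0, 1.0], "sum-ja"), ([1.0, -1.0, 1.0], "sum-en"),
               ([-1.0, 1.0, 1.0], "qa-ja"), ([-1.0, -1.0, 1.0], "qa-en")]
    baseline = 0.0
    for _ in range(steps):
        x, target = rng.choice(samples)
        kb_id, trajectory = route(root, x, greedy=False, rng=rng)
        reward = 1.0 if kb_id == target else 0.0  # stand-in reward signal
        reinforce_update(trajectory, x, reward, baseline)
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    return root, samples


if __name__ == "__main__":
    root, samples = train_demo()
    for x, target in samples:
        print(x, "->", route(root, x)[0], "(target:", target + ")")
```

Running the script trains the toy router on four synthetic inputs and prints the leaf chosen greedily for each one. The inputs encode task type and language as ±1 features plus a bias. Splitting the decision into a root-to-leaf path keeps each node's choice small. In this sketch, supporting another knowledge base means attaching a leaf and adding one weight vector at its parent. The abstract does not say whether the paper's model is organized this way.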
