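Article Abstract
Meng Qiuqing (孟秋晴), Zheng Mingrui (郑铭瑞), Tian Yuelu (田玥璐), Liu Yipin (刘逸品), Wang Qiongdi (王琼弟). Construction of Medical Health Knowledge Map for UGC in Online Health Community: Taking Child Diarrheal Disease as an Example[J]. Digital Library Forum (数字图书馆论坛), 2024, 20(8): 9-18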
面向在线健康社区UGC的医疗健康知识图谱构建研究——以小儿腹泻病为例
Construction of Medical Health Knowledge Map for UGC in Online Health Community: Taking Child Diarrheal Disease as an Example
Submitted: 2024-04-14
DOI:10.3772/j.issn.1673-2286.2024.08.002
Keywords (Chinese): 知识图谱构建;在线健康社区;用户生成内容;LDA;知识抽取
Keywords (English): Knowledge Map Construction; Online Health Community; UGC; LDA; Knowledge Extraction
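Funding: This research was supported by the Guizhou Provincial Department of Science and Technology science and technology program project "Research on Medical Resource Recommendation Based on User Feature Mining in the 'Internet + Healthcare' Context" (No. 黔科合基础-ZK[2021]一般336) and by the Guizhou Provincial Department of Education Young Scientific and Technological Talent Growth Project "Research on Information Recommendation in Online Medical Communities Based on Knowledge Graphs" (No. 黔教合KY字[2022]192号).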
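Author  Affiliation
Meng Qiuqing (孟秋晴)  School of Information, Guizhou University of Finance and Economics
Zheng Mingrui (郑铭瑞)  School of Information, Guizhou University of Finance and Economics
Tian Yuelu (田玥璐)  School of Information, Guizhou University of Finance and Economics
Liu Yipin (刘逸品)  School of Information, Guizhou University of Finance and Economics
Wang Qiongdi (王琼弟)  Software Institute, Nanjing University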
Abstract views: 219
Full-text downloads: 200
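Abstract (translated from Chinese):
      Constructing a medical and health knowledge graph from user-generated content (UGC) in online health communities, and exploring health knowledge extraction driven by users' latent needs, is important for optimizing information organization and retrieval in such communities and for supporting innovation in their knowledge services. This paper proposes a combined entity recognition model, LDA-BERT-BiLSTM-CRF, for online health community UGC data. First, the LDA topic model performs topic clustering on the UGC data to derive entity types, and the BERT-BiLSTM-CRF model then performs named entity recognition over these fine-grained entity types. Next, the MC-BERT-CasRel model extracts overlapping triples from the UGC data, and the SBERT model is used for entity alignment. Finally, the Neo4j graph database stores and visualizes the knowledge graph. Taking pediatric diarrhea as an example, the proposed method yields a knowledge graph containing 939 entities and 3,224 relations. Comparative experiments against current mainstream models show that the combined model LDA-BERT-BiLSTM-CRF and the relation extraction model MC-BERT-CasRel extract knowledge more accurately than traditional methods, and that the resulting entity classification is more targeted.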
English abstract:
      Constructing a medical health knowledge map from the user generated content (UGC) data of online health communities, and exploring health knowledge extraction based on the potential needs of users, is of great significance for optimizing the information organization and retrieval of online health communities and for supporting their knowledge service innovation. This paper proposes a combined entity recognition model, LDA-BERT-BiLSTM-CRF, based on UGC data of online health communities. First, the LDA topic model is used to perform topic cluster analysis on the UGC data in order to extract entity types. Based on these subdivided entity types, the BERT-BiLSTM-CRF model is used for named entity recognition. Then, the MC-BERT-CasRel model is used to extract overlapping triples from the UGC data, and entity alignment is realized with the SBERT model. Finally, the knowledge map is stored and visualized using the Neo4j graph database. Taking child diarrheal disease as an example, a knowledge map of child diarrheal disease containing 939 entities and 3,224 relationships is constructed with this method. Comparative experiments against current mainstream models show that the combined model LDA-BERT-BiLSTM-CRF and the relationship extraction model MC-BERT-CasRel extract knowledge more accurately than traditional methods, and that the entity classification is more targeted.
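The last two stages of the pipeline described in the abstract (entity alignment, then storage in Neo4j) can be illustrated with a minimal sketch. It is not the authors' implementation. The embedding vectors are hand-written toy values standing in for SBERT sentence embeddings, the 0.9 similarity threshold is an arbitrary assumption, and the entity names, the `Entity` label, and the relation types `HAS_SYMPTOM` and `TREATED_WITH` are hypothetical. The function names `align_entities` and `to_cypher` are also invented here. The sketch only builds parameterized Cypher `MERGE` statements and does not connect to a database.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align_entities(embeddings, threshold=0.9):
    """Greedily map each mention to the first earlier mention that is similar enough.

    `embeddings` maps a mention string to its vector. In a real pipeline the
    vectors would come from a sentence encoder such as SBERT; here they are
    supplied by the caller. The threshold is an assumed value.
    """
    canonical = []  # mentions chosen as representatives, in insertion order
    mapping = {}
    for name, vec in embeddings.items():
        for rep in canonical:
            if cosine(vec, embeddings[rep]) >= threshold:
                mapping[name] = rep
                break
        else:
            canonical.append(name)
            mapping[name] = name
    return mapping


def to_cypher(triples, mapping):
    """Turn (head, relation, tail) triples into deduplicated (query, params) pairs.

    Entity names are passed as Cypher parameters. Relationship types cannot be
    parameterized in Cypher, so they are validated and interpolated.
    """
    statements = []
    seen = set()
    for head, rel, tail in triples:
        if not rel.isidentifier():
            raise ValueError(f"unsafe relationship type: {rel!r}")
        key = (mapping.get(head, head), rel, mapping.get(tail, tail))
        if key in seen:
            continue  # alignment merged this triple into an earlier one
        seen.add(key)
        query = (
            "MERGE (h:Entity {name: $head}) "
            "MERGE (t:Entity {name: $tail}) "
            f"MERGE (h)-[:{rel}]->(t)"
        )
        statements.append((query, {"head": key[0], "tail": key[2]}))
    return statements


# Toy data: hypothetical mentions and hand-written vectors, not real SBERT output.
toy_embeddings = {
    "loose stools": [1.0, 0.0, 0.1],
    "watery stool": [0.98, 0.05, 0.12],
    "oral rehydration salts": [0.0, 1.0, 0.0],
}
toy_triples = [
    ("pediatric diarrhea", "HAS_SYMPTOM", "loose stools"),
    ("pediatric diarrhea", "HAS_SYMPTOM", "watery stool"),
    ("pediatric diarrhea", "TREATED_WITH", "oral rehydration salts"),
]
toy_mapping = align_entities(toy_embeddings)
toy_statements = to_cypher(toy_triples, toy_mapping)
```

With the toy data, the two near-identical symptom mentions collapse into one node, so the three input triples produce two `MERGE` statements. Each `(query, params)` pair could then be passed to a Neo4j driver session. Using `MERGE` rather than `CREATE` keeps repeated loads from duplicating nodes or relationships.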