Research on Railway Field Topic Discovery Based on Improved LDA Model
Keywords: Topic Discovery; Railway Field; Semantic Enhancement; LDA Topic Model
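龙艺璇, Institute of Scientific and Technical Information, China Academy of Railway Sciences
安源, Institute of Scientific and Technical Information, China Academy of Railway Sciences
王东晋, Institute of Scientific and Technical Information, China Academy of Railway Sciences
翟夏普, Institute of Scientific and Technical Information, China Academy of Railway Sciences
伊惠芳, National Science Library, Chinese Academy of Sciences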
The era of big data has made it difficult for researchers in the railway field to quickly identify the main research directions, follow international research trends, and understand international research hotspots. Efficiently mining the main content of the massive body of railway scientific and technological literature has therefore become an urgent problem for railway researchers. Topic models represented by LDA are the mainstream method for topic discovery, but their usefulness is limited on railway scientific and technological literature, which contains many multi-word phrases. In this study, we propose a semantically enhanced LDA topic model. Building on in-depth preprocessing that extracts noun phrases, verb phrases, nouns, and verbs, we combine the TextRank and PMI (pointwise mutual information) algorithms to obtain keyword chunks, and then use the ranked keyword chunks to replace the topic words in the LDA topic recognition results. We conduct an empirical study using "traction power supply system" as an example. The results show that the proposed semantically enhanced LDA topic model can improve the interpretability and recognizability of topic discovery results in the railway field. It can also provide effective methodological support for knowledge services in railway scientific research management.
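The chunk-building and replacement steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the documents are already tokenized and reduced to nouns and verbs (the phrase-extraction preprocessing is not reproduced), and it only considers adjacent two-word chunks. The combination rule is hypothetical: word pairs are filtered by a PMI threshold and a minimum count, then ranked by the sum of their words' TextRank scores. The LDA step is assumed to have run elsewhere, so a topic's top words are passed in as a plain list. All function names, thresholds, and the toy corpus are illustrative.

```python
import math
from collections import Counter, defaultdict


def textrank(docs, window=2, damping=0.85, iterations=50):
    """TextRank word scores on an undirected co-occurrence graph.

    Words are linked when they fall inside the same sliding window of
    `window` consecutive tokens (window=2 links adjacent words only).
    """
    neighbors = defaultdict(set)
    for tokens in docs:
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + window, len(tokens))):
                if tokens[j] != w:
                    neighbors[w].add(tokens[j])
                    neighbors[tokens[j]].add(w)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # Each word receives a share of its neighbours' scores, split by their degree.
        scores = {
            w: (1 - damping)
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    return scores


def bigram_stats(docs):
    """Count adjacent word pairs and compute PMI(x, y) = log(p(x, y) / (p(x) p(y)))."""
    unigrams = Counter(w for tokens in docs for w in tokens)
    bigrams = Counter(p for tokens in docs for p in zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    stats = {}
    for (x, y), c in bigrams.items():
        p_xy = c / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        stats[(x, y)] = (c, math.log(p_xy / (p_x * p_y)))
    return stats


def keyword_chunks(docs, min_pmi=1.0, min_count=2):
    """Hypothetical rule: keep frequent, high-PMI pairs; rank by summed TextRank."""
    tr = textrank(docs)
    chunks = {}
    for (x, y), (count, pmi) in bigram_stats(docs).items():
        if count >= min_count and pmi >= min_pmi:
            chunks[f"{x} {y}"] = tr.get(x, 0.0) + tr.get(y, 0.0)
    return sorted(chunks.items(), key=lambda kv: kv[1], reverse=True)


def replace_topic_words(topic_words, ranked_chunks, top_n=5):
    """Swap an LDA topic's single words for the best-ranked chunks containing them."""
    chosen = [c for c, _ in ranked_chunks if set(c.split()) & set(topic_words)]
    return chosen[:top_n] or list(topic_words)  # fall back to the plain words


# Toy corpus (illustrative only): documents already reduced to nouns/verbs.
docs = [
    ["traction", "power", "supply", "system", "fault", "detection"],
    ["traction", "power", "supply", "catenary", "voltage"],
    ["power", "supply", "system", "reliability"],
    ["catenary", "voltage", "fault", "detection"],
]

if __name__ == "__main__":
    ranked = keyword_chunks(docs)
    print(ranked)
    # Pretend LDA returned ["power", "voltage"] as one topic's top words.
    print(replace_topic_words(["power", "voltage"], ranked))
```

In this sketch PMI decides which word pairs are cohesive enough to count as a chunk, while TextRank decides how central the chunk is in the corpus. A topic whose top words match no chunk keeps its original words.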
