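Digital Library Special Topic, Cheng Yuhua, Wang Jiandong. A Research into the Topic Distribution of Chinese Weblogs Based on Automated Classification [J]. Digital Library Forum (数字图书馆论坛), 2008, (12): |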
A Research into the Topic Distribution of Chinese Weblogs Based on Automated Classification (基于自动分类的中文博客主题分布研究) |
Received: 2008-10-30 Revised: 2008-10-30 |
DOI: |
Keywords: Automated Text Classification; Weblogs; Topic Distribution |
Funding: |
|
Abstract: |
Taking Hexun Blog as its object of study, this article builds a classification scheme and a corpus dedicated to classifying Chinese weblog articles, and then classifies those articles with a method that combines support vector machines (SVM) with information gain (IG). On that basis, it mines the articles and their classification results in greater depth, gives a quantitative description of how far individual Chinese blogs concentrate on a single topic and of the correlations between topics, and discusses the sociological reasons behind the results. |