| Virtual health communities are popular venues where people obtain useful information from others and exchange experiences. However, much of this valuable health information is buried in unstructured documents, which makes it difficult to use in assisting patients and doctors. Existing studies mostly focus on extracting information from electronic health records using Natural Language Processing technologies, and the health information in virtual communities is usually ignored. Yet virtual health communities have become a new medium for communicating health information and contain a large amount of health knowledge; extracting and discovering this knowledge is of great significance for supporting medical decisions. This study proposes a new method for recognizing health-related named entities and their assertions in virtual communities, an essential part of information extraction and knowledge discovery. A new medical dictionary is developed from Chinese medical websites, and the Chinese Unified Medical Language System is also adopted to identify health concepts in virtual communities. On this basis, Latent Dirichlet Allocation (LDA) and rule-based methods are used to extract features from the text, the BIEO scheme is applied to label them, and Conditional Random Fields (CRF) are used to recognize health named entities and their types. In addition, this paper proposes recognizing entity assertions with semantic rules, covering negation (denial) assertions, time assertions, and assertions for inspection. The semantic rules are extracted by analyzing the Chinese text and are then used to recognize entity assertions. Finally, experiments are conducted on data from a Chinese virtual health community, and a comparison with other methods demonstrates the effectiveness of the proposed method. |
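
The BIEO labeling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `bieo_tags`, the span format, and the entity type names are hypothetical, and it assumes that B, I, E, and O mark the beginning, inside, end, and outside of an entity, with a single-token entity tagged as B.

```python
from typing import List, Tuple


def bieo_tags(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Return one BIEO tag per token.

    spans holds (start, end_exclusive, entity_type) triples over token
    indices. This span format is an assumption made for illustration.
    """
    tags = ["O"] * n_tokens  # every token starts outside any entity
    for start, end, etype in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"invalid span: {(start, end, etype)}")
        tags[start] = f"B-{etype}"  # first token of the entity
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{etype}"  # interior tokens
        if end - start > 1:
            tags[end - 1] = f"E-{etype}"  # last token of a multi-token entity
    return tags


# Five tokens with a hypothetical three-token DISEASE entity at positions 1..3.
print(bieo_tags(5, [(1, 4, "DISEASE")]))
```

Sequences tagged this way could then serve as the target labels for a sequence model such as a CRF, with the LDA and rule-based features as inputs.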