Sentence Extraction And Reduction For Indonesian Text Summarization

Posted on:2016-04-09

Degree:Master

Type:Thesis

Institution:University

Candidate:Ahmad Najibullah


GTID:2308330470966802

Subject:Computer application technology

Abstract/Summary:

Automatic summarization is the process of using a computer system to reduce a document to a summary. Text summarization is a research area within computational intelligence, machine learning, and natural language processing. This thesis presents research on automatic Indonesian text summarization. The study focuses on the extraction-based approach to summarization, covering sentence extraction and sentence reduction. We use two methods to generate a summary: Naïve Bayes for sentence extraction and a Hidden Markov Model for sentence reduction. The initial step in summarization is the identification of important features. Each document is prepared by pre-processing, which includes sentence segmentation, part-of-speech tagging, tokenization, stop word removal, and stemming.

The first part of this work generates a summary by sentence extraction. To determine the weight of each sentence, we use text features such as sentence position, relative sentence length, average term frequency, keyword extraction, key phrase extraction, sentence similarity to the title, sentence centrality, inclusion of numerical data, inclusion of entity names, and inclusion of news emphasis words. We also investigate the effect of a semantic feature, derived using latent semantic analysis, on the summarization task. Our experiments show that the semantic feature increases precision and F-measure by 9.8% and 2.4%, respectively, at a 20% compression rate.

In the second part of this thesis, we present Indonesian sentence reduction, which automatically removes extraneous phrases from sentences based on a Hidden Markov Model and part-of-speech information. The compressed result should be grammatical, shorter, and preserve the important information. The Hidden Markov Model is used to remove redundant and irrelevant information. Part-of-speech tagging is used to add tags in the preprocessing step and to create Indonesian grammar patterns. The experiment shows that using part-of-speech for grammar checking can improve system performance by 65.2%.
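
As a rough illustration of the sentence extraction step, the sketch below computes four of the surface features named in the abstract (sentence position, relative length, similarity to the title, and inclusion of numerical data) and keeps the top-scoring sentences. It is not the thesis implementation: the function names, the simple regex tokenizer, and the unweighted sum of features are assumptions made for illustration, standing in for the trained Naïve Bayes classifier. It also assumes that a 20% compression rate means keeping about 20% of the sentences.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    # The thesis pipeline also applies stop word removal and stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_features(sentences, title):
    """Return one dict of surface features per sentence."""
    if not sentences:
        return []
    title_terms = set(tokenize(title))
    # Guard against division by zero when every sentence is empty.
    max_len = max(len(tokenize(s)) for s in sentences) or 1
    rows = []
    for i, sentence in enumerate(sentences):
        terms = tokenize(sentence)
        overlap = len(title_terms & set(terms))
        rows.append({
            # Earlier sentences get a higher position score.
            "position": 1.0 - i / len(sentences),
            # Length relative to the longest sentence in the document.
            "relative_length": len(terms) / max_len,
            # Fraction of title terms that also occur in the sentence.
            "title_similarity": overlap / len(title_terms) if title_terms else 0.0,
            # 1.0 if the sentence contains a purely numeric token.
            "has_number": 1.0 if any(t.isdigit() for t in terms) else 0.0,
        })
    return rows


def extract(sentences, title, compression_rate=0.2):
    """Keep the highest-scoring sentences, returned in document order."""
    if not sentences:
        return []
    features = sentence_features(sentences, title)
    # Assumption: an unweighted sum replaces the learned classifier score.
    scores = [sum(f.values()) for f in features]
    keep = max(1, round(len(sentences) * compression_rate))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]
```

In a supervised setting such as the one the abstract describes, the feature dictionaries would instead be used as inputs to a classifier trained on sentences labeled as summary or non-summary.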

Keywords/Search Tags:

Text summarization, Naïve Bayes Classifier, Hidden Markov Model, text features, part-of-speech

Related items

1	Research On Problems Of Text-To-Speech System
2	The Research Of Part-of-speech Tagging Based On Hidden Markov Model
3	Research On Automatic Text Summarization Method Based On Deep Learning
4	Software Requirements Verification Based On Natural Language Processing
5	Algorithm Research For Text Information Extraction Based On Hidden Markov Model
6	HMM-based Chinese Part-of-Speech Tagging And Improvement
7	Research Of Web Text Mining Technology Based On Hidden Markov Model
8	Research On Statistical Parametric Speech Synthesis Integrating Speech Production Mechanisms
9	Research On Error Correction Technology Of Text Recognition Based On Hidden Markov Model
10	Research On Abstractive Short Text Automatic Summarization Method In Speech Recognition Scenarios