Automatic summarization is the process of using a computer system to condense a document into a summary. Text summarization is a research area within computational intelligence, machine learning, and natural language processing. This thesis presents research on automatic summarization of Indonesian text. The study focuses on extraction-based summarization, covering sentence extraction and sentence reduction. We use two approaches to generate a summary: Naïve Bayes for sentence extraction and a Hidden Markov Model for sentence reduction. The initial step in summarization is the identification of important features. Each document is prepared by pre-processing, which includes sentence segmentation, part-of-speech tagging, tokenization, stop word removal, and stemming.

The first part of this work generates a summary by sentence extraction. To determine the weight of each sentence, we use text features such as sentence position, relative sentence length, average term frequency, keyword extraction, key phrase extraction, sentence similarity to the title, sentence centrality, inclusion of numerical data, inclusion of entity names, and inclusion of news emphasis words. We also investigate the effect of a semantic feature, obtained with latent semantic analysis, on the summarization task. Our experiments show that the semantic feature increases precision and F-measure by 9.8% and 2.4%, respectively, at a 20% compression rate.

In the second part of this thesis, we present an Indonesian sentence reduction method that automatically removes extraneous phrases from sentences based on a Hidden Markov Model and part-of-speech information. The compressed sentence should be grammatical, shorter, and preserve the important information. The Hidden Markov Model is used to remove redundant and irrelevant information. Part-of-speech tagging is used to add tags in the pre-processing step and to create Indonesian grammar patterns.
The experiment shows that using part-of-speech for grammar checking can improve system performance by 65.2%.
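
As a rough illustration of feature-based sentence extraction at a given compression rate, the following minimal sketch scores each sentence on four of the features listed above (position, relative length, similarity to the title, and inclusion of numerical data) and keeps the top-scoring fraction in document order. It is not the thesis implementation: the equal-weight average stands in for the Naïve Bayes classifier, the regular-expression segmentation and tokenization are simplifying assumptions, and the function names (`split_sentences`, `score_sentence`, `summarize`) are hypothetical.

```python
import re


def split_sentences(text):
    # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    # Assumption: lowercase alphanumeric runs are an adequate tokenization.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def score_sentence(index, sentence, n_sentences, title_tokens, max_len):
    words = tokens(sentence)
    # Sentence position: earlier sentences receive a higher score.
    position = 1.0 - index / n_sentences
    # Relative length: length divided by the longest sentence length.
    rel_length = len(words) / max_len if max_len else 0.0
    # Similarity to the title: fraction of title words found in the sentence.
    overlap = len(set(words) & title_tokens)
    title_sim = overlap / len(title_tokens) if title_tokens else 0.0
    # Inclusion of numerical data: 1 if any token is purely numeric.
    has_number = 1.0 if any(w.isdigit() for w in words) else 0.0
    # Equal-weight average (an assumption, not a trained classifier).
    return (position + rel_length + title_sim + has_number) / 4.0


def summarize(title, text, compression_rate=0.2):
    sentences = split_sentences(text)
    if not sentences:
        return []
    title_tokens = set(tokens(title))
    max_len = max(len(tokens(s)) for s in sentences)
    scored = [
        (score_sentence(i, s, len(sentences), title_tokens, max_len), i)
        for i, s in enumerate(sentences)
    ]
    # Keep at least one sentence; otherwise the requested fraction.
    keep = max(1, round(len(sentences) * compression_rate))
    top = sorted(scored, reverse=True)[:keep]
    # Restore document order for the selected sentences.
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

Calling `summarize(title, text, compression_rate=0.2)` on a five-sentence document would return a single sentence. In a trained setting, the per-feature values would instead be passed to a classifier that estimates whether each sentence belongs in the summary.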