
Research Of Kazakh Parsing Based On Span

Posted on: 2020-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: W Chai
Full Text: PDF
GTID: 2415330590954692
Subject: Computer application technology
Abstract/Summary:
With the development of neural network technology, Kazakh parsing has made great progress. As rule-based and statistical parsing methods have gradually been integrated with neural networks, the accuracy of Kazakh parsing has improved considerably. Parsing techniques fall into two main families: transition-based parsing and chart-based parsing. The methods used in this paper build on both families, and both improve the accuracy of Kazakh parsing.

In the transition-based method, the span is the minimal unit. The shift-reduce system has two main kinds of operation: structure actions and phrase-label actions. A structure action records the split points of a span, and a phrase-label action assigns the phrase tag of the span. On top of this transition system, a BiLSTM neural network extracts the span features, and the parameters are trained with a multi-layer perceptron. We decode with dynamic programming, the greedy algorithm, and beam search, and compare their efficiency. The experimental results support the following conclusions: 1) When a BiLSTM is used to extract span features, a two-layer BiLSTM captures more feature information than a single-layer BiLSTM. 2) Greedy decoding is faster but less accurate, while beam search gives higher parsing accuracy. 3) Choosing an appropriate beam size is vital, because the beam size affects parsing accuracy. Based on the experiments in this paper, we set the beam size to 20.

We also use a chart-based parsing method, in which a BiLSTM neural network extracts the features and a multi-layer perceptron is used to train the parameters. The structure score and the label score are trained separately, each with its own penalty function, and the CKY algorithm is used for decoding. In chart-based parsing, both the BiLSTM hidden-layer size and the sentence length influence parsing accuracy. The experiments support the following conclusions: 1) As the hidden-layer size of the BiLSTM increases, parsing accuracy also increases, but beyond a size of 200 the improvement is no longer obvious, so we set the hidden-layer size to 200. 2) Sentence length also affects parsing accuracy. Longer sentences usually give worse results, mainly because long sentences have more complex phrase structure, which makes them harder to parse.
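The chart-based decoding step described above can be illustrated with a minimal sketch of CKY over span scores. This is not the thesis's implementation: the function name `cky_decode`, the dictionary inputs `struct_scores` and `label_scores`, and the assumption that a tree's score is the sum of its spans' structure scores and best label scores are all hypothetical choices made for illustration. In the thesis these scores would come from the BiLSTM span features and the multi-layer perceptron.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) word boundaries, end exclusive


def cky_decode(
    n: int,
    struct_scores: Dict[Span, float],
    label_scores: Dict[Span, Dict[str, float]],
) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Find the highest-scoring binary tree over n words.

    Assumption: the tree score is the sum, over all spans in the tree,
    of the span's structure score plus its best label score.
    """
    # best[(i, j)] = (total score, split point or None, chosen label)
    best: Dict[Span, Tuple[float, object, str]] = {}

    for length in range(1, n + 1):          # shorter spans first
        for i in range(0, n - length + 1):
            j = i + length
            # The label is chosen independently of the split point.
            label, lscore = max(label_scores[(i, j)].items(),
                                key=lambda kv: kv[1])
            local = struct_scores[(i, j)] + lscore
            if length == 1:
                best[(i, j)] = (local, None, label)
                continue
            # Try every split point and keep the best pair of children.
            split = max(range(i + 1, j),
                        key=lambda k: best[(i, k)][0] + best[(k, j)][0])
            total = local + best[(i, split)][0] + best[(split, j)][0]
            best[(i, j)] = (total, split, label)

    def build(i: int, j: int) -> List[Tuple[int, int, str]]:
        """Recover the labelled spans of the best tree, top-down."""
        _, split, label = best[(i, j)]
        if split is None:
            return [(i, j, label)]
        return [(i, j, label)] + build(i, split) + build(split, j)

    return best[(0, n)][0], build(0, n)
```

Because every span of every length is filled in, the chart costs cubic time in the sentence length. Greedy decoding avoids that cost by committing to one action at a time, which is the speed and accuracy trade-off the abstract reports.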
Keywords/Search Tags:span, BiLSTM, multi-layer perceptron