| Word segmentation is a basic task in Chinese language processing. The word is the smallest language unit that can be used independently. Chinese differs from English in this respect: English words are naturally separated by spaces, whereas Chinese text is written as continuous strings of characters. Without word segmentation, a computer cannot learn the exact boundaries of Chinese words, so segmentation plays an important role in helping computers understand Chinese text. In 2006, researchers proposed the concept of deep learning, which was then applied to computer vision, natural language processing, speech recognition and other fields, where it achieved many breakthroughs. Among these methods, recurrent neural networks have been widely used for part-of-speech tagging, translation, named entity recognition and other related natural language processing problems. Abstracting most natural language processing problems as sequence generation and processing problems, and solving them with an appropriate recurrent neural network structure, has become a hotspot and the mainstream of current research. Chinese word segmentation is the prerequisite and basis of Chinese speech synthesis, and it is a key technology in Chinese natural language processing. Sequence annotation has an extremely important application in Chinese word segmentation. The dominant state-of-the-art methods for Chinese word segmentation are based on traditional machine learning. However, these methods have some disadvantages: features must be designed and extracted from Chinese text by hand, context information is not fully used when segmenting, and long-distance information constraints are lacking. To solve these problems, an attention mechanism is added to the memory unit of a bidirectional long short-term memory (LSTM) network, and the LSTM is used to train the word segmentation model. The
attention mechanism can use the information stored in the memory unit. Applied to the annotation set for Chinese word segmentation, the model can obtain the context information of the text and avoid the restriction that a fixed window size places on the context. It can also effectively address the gradient explosion and data sparsity problems. Experimental results show that the algorithm achieves 97.8% accuracy on Chinese word segmentation. |
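
To make the sequence annotation view of segmentation concrete, here is a minimal sketch of how segmented words can be turned into per-character tags and back. It assumes the common four-tag B/M/E/S scheme (Begin, Middle, End, Single); the abstract does not say which annotation set is actually used, and the function names `words_to_tags` and `tags_to_words` are hypothetical helpers introduced only for this illustration. A tagging model such as the attention-augmented bidirectional LSTM described above would predict one such tag per character.

```python
def words_to_tags(words):
    """Convert a list of segmented words into one B/M/E/S tag per character.

    Assumes the B/M/E/S scheme: S for a single-character word,
    otherwise B for the first character, E for the last, M in between.
    """
    tags = []
    for word in words:
        if not word:
            continue  # skip empty strings so they produce no tags
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from a character sequence and its B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends without E or S
        words.append(current)
    return words


words = ["我", "喜欢", "自然语言"]
tags = words_to_tags(words)
print(tags)
print(tags_to_words("".join(words), tags))
```

Framing segmentation this way is what lets a sequence model handle it: the input is the raw character string, the output is a tag sequence of the same length, and word boundaries are read off from the predicted tags.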