| With the development of modern science and information technology, patent analysis plays an indispensable role in measuring technological competitiveness and forecasting technology trends, and it provides valuable research results to industry, commerce, the legal sector and academia. In patent analysis, extracting patent features and measuring the similarity of patents efficiently and accurately is a challenging task. Deep learning (DL) has shown great advantages in natural language processing (NLP) tasks such as text classification and sentiment analysis. This thesis therefore proposes a DL-based patent mining method that mines, extracts and quantifies the features of patent texts, providing high-quality patent feature information for patent analysis tasks such as patent retrieval and similar-patent measurement. A systematic feature extraction method for patent text is proposed. Firstly, a structured data set of patent texts is constructed based on the structure and the appearance features of patent texts. Secondly, the latent semantic structure of the patent texts is extracted, reduced in dimensionality and clustered using statistical models such as topic models, and the training set for DL is constructed from the clustering results. Thirdly, a convolutional neural network (CNN) is constructed to extract and combine the key information of patent texts by supervised learning, and then to map the patent data from text to vectors in a vector space. Finally, a comparative experiment is conducted to compare the accuracy achieved on patent text by different patent feature extraction methods. Moreover, in order to improve the semantic expression and feature extraction of the neural network model, the above CNN model is optimized with various pre-trained word embedding models and a multi-layer classifier, which improves the accuracy of the model and yields a variety of feature vectors with different dimensions. On this basis, an improved recurrent convolutional neural network (RCNN) for text processing is constructed, and k-max pooling is used in the RCNN model for the first time. Finally, experiments on different datasets and pre-trained word embedding models are conducted to evaluate the performance of the proposed method; the results show that k-max pooling performs better in patent text processing. |
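
The abstract names k-max pooling but does not define it. As a minimal sketch, assuming the commonly used definition (for each feature channel, keep the k largest activations and preserve their original order in the sequence), the operation can be illustrated as follows. The function names `k_max_pooling` and `k_max_pool_feature_maps` are hypothetical and chosen for illustration only; this is not the thesis's implementation.

```python
from typing import List, Sequence


def k_max_pooling(sequence: Sequence[float], k: int) -> List[float]:
    """Keep the k largest values of one feature channel, in original order.

    Assumption: this follows the usual definition of k-max pooling; the
    thesis abstract does not spell out the exact variant it uses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if k >= len(sequence):
        # Nothing to discard when the channel has at most k activations.
        return list(sequence)
    # Indices of the k largest activations; ties go to the earlier position.
    top = sorted(range(len(sequence)), key=lambda i: (-sequence[i], i))[:k]
    # Re-sort the chosen indices so the relative order of the text is kept.
    return [sequence[i] for i in sorted(top)]


def k_max_pool_feature_maps(
    feature_maps: Sequence[Sequence[float]], k: int
) -> List[List[float]]:
    """Apply k-max pooling independently to every channel (row)."""
    return [k_max_pooling(channel, k) for channel in feature_maps]


if __name__ == "__main__":
    activations = [0.1, 0.9, 0.3, 0.7, 0.2]
    print(k_max_pooling(activations, 1))  # equivalent to ordinary max pooling
    print(k_max_pooling(activations, 3))
```

With k = 1 the operation reduces to ordinary max pooling. Larger k retains several strong activations per channel together with their relative order, which is one plausible reason it may suit long documents such as patents.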