Traditional Chinese Medicine (TCM) classics are the essence of TCM, and text vectorization is fundamental to TCM text-processing tasks. High-quality vector representations rich in feature information guarantee, at the source, the quality of entity recognition and other downstream tasks in the TCM field, which is important for the intelligent learning and application of TCM texts. The deep pre-trained representation model BERT generates vector representations rich in semantic and syntactic information by stacking multiple layers of feature extractors, but its large parameter size in turn demands a large amount of training data. Meanwhile, although the shallow neural network model CBOW has a simple structure, it treats all words in a sentence equally, ignoring both the semantic information carried by different sentence components and the word-order information inherent to the sentence. To construct a lightweight word representation model with low computational complexity whose word vectors retain the rich feature information of TCM text, the main work is as follows:

1) Based on the "verb-core structure" theory, the linguistic logic of TCM texts is studied at both the sentence-pattern and word levels. Exploiting the fixed sentence structures centered on verbs and the differing grammatical logic within sentences, nine verb-centered sentence-meaning representation rules and classification criteria for words with different grammatical logic are formulated, enhancing the word representation model's ability to extract semantic features from TCM texts.

2) To address the weakness of the shallow word representation model CBOW in extracting semantic and syntactic features of text, an enhanced word representation model is proposed that builds on the semantic logic rules and incorporates syntactic logic such as part of speech and word order. For verb-centered words, sentence-level semantic information is extracted by matching sentence structures against the syntactic representation rules; for non-verb-centered words, the differing semantic contributions of words to the sentence meaning are used to strengthen the role of strongly syntactic-logical words during word vector generation. Word-order features are then extracted by convolution operations. Synonym, antonym, and analogy word lists are introduced at the word vector generation stage to further improve how well the word vectors capture the relevant semantic information.

3) Several groups of experiments were conducted, covering both intrinsic similarity analysis and extrinsic quantitative comparison. The results show that the proposed logic-rule-enhanced model achieves better performance in both semantic similarity analysis and entity recognition. In the entity recognition task, the F1 score improves by 4.66 percentage points over the traditional CBOW model. The model is a lightweight word representation model: it reduces training time by 51% compared with BERT and has clear advantages in resource usage.

26 figures; 16 tables; 63 references.
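The enhanced model described in point 2) could be sketched roughly as follows: instead of averaging context embeddings equally as in standard CBOW, each context word gets a learnable "syntactic logic" weight (so strongly logical words such as verbs contribute more), and a 1-D convolution over the ordered context embeddings captures word-order features. This is a minimal illustrative sketch under assumed dimensions and module names (`LogicWeightedCBOW`, `logic_weight`, etc. are hypothetical), not the thesis's actual implementation:

```python
import torch
import torch.nn as nn

class LogicWeightedCBOW(nn.Module):
    """Sketch of a CBOW variant that (a) weights context words by a
    per-word syntactic-logic score rather than averaging them equally,
    and (b) convolves over the ordered context to extract word-order
    features. All hyperparameters here are illustrative assumptions."""

    def __init__(self, vocab_size, emb_dim=64, conv_channels=64, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # Hypothetical learnable per-word weight (e.g. larger for verbs
        # and other strongly syntactic-logical words).
        self.logic_weight = nn.Embedding(vocab_size, 1)
        # 1-D convolution over the embedding sequence for word order.
        self.conv = nn.Conv1d(emb_dim, conv_channels, kernel, padding=kernel // 2)
        self.out = nn.Linear(emb_dim + conv_channels, vocab_size)

    def forward(self, context_ids):
        # context_ids: (batch, context_len), in sentence order.
        e = self.emb(context_ids)                                 # (B, L, D)
        # Softmax-normalized logic weights replace CBOW's uniform average.
        w = torch.softmax(self.logic_weight(context_ids).squeeze(-1), dim=-1)
        weighted = (w.unsqueeze(-1) * e).sum(dim=1)               # (B, D)
        # Max-pooled convolution output as a word-order feature vector.
        order = self.conv(e.transpose(1, 2)).max(dim=-1).values   # (B, C)
        # Predict the center word from both feature vectors.
        return self.out(torch.cat([weighted, order], dim=-1))     # (B, vocab)

model = LogicWeightedCBOW(vocab_size=100)
logits = model(torch.tensor([[3, 17, 42, 9]]))  # one 4-word context window
print(logits.shape)  # torch.Size([1, 100])
```

Training would then proceed as in ordinary CBOW (cross-entropy against the center word), so the model stays shallow and cheap relative to BERT while the logic weights and convolution inject the syntactic information that plain CBOW discards.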