Font Size: a A A

Regular Expression Learning Method Based From Positive Examples

Posted on:2024-08-12Degree:MasterType:Thesis
Country:ChinaCandidate:D Q LiuFull Text:PDF
GTID:2568307055975009Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Inferring regular expressions from finite examples is one of the classic problems in computer theory.Currently,most of the existing methods need to use automata or production to accomplish inductive learning,and the regular expressions inferred by such methods have the problems of being lengthy and poorly readable.In order to alleviate this problem,this thesis proposes a learning method which can directly infer regular expressions from positive examples.The method mainly includes two stages: candidate space construction and candidate space optimization.The former constructs a complete and diversified regular expression candidate space,while the latter infers concise and readable regular expressions based on the candidate space.The candidate space construction phase includes four regular expression generalization mechanisms,namely,multiscaling augmentation,cycle pattern discovery,multiple strings generalization,and multiple regular expressions generalization.For a single sample,multiscaling augmentation is generalized from the perspective of improving the level of character abstraction and expanding the range of contrast.The cycle pattern discovery completes the generalization by identifying the cycle pattern of strings and regular expressions in the sample.For multiple examples,multiple strings generalization and multiple regular expressions generalization generalize regular expressions based on common subsequence and common subexpression respectively,and design search methods based on coding cost and metacharacter respectively,to generalize regular expressions with stronger expression ability while improving search efficiency.In the candidate space optimization phase,combining the MML idea,the regular expression optimization problem is transformed into a multi-objective optimization problem.Two optimization schemes are selected to ensure the applicability of the learning method,namely,using integer linear programming for small-scale sample sets and using carnivorous plant algorithms for large-scale sample sets.Experimental results on different data sets show that the length of regular expressions generated by the regular expression learning method from positive samples is 50% to 70% shorter than that of similar methods,which proves the effectiveness of the method.
Keywords/Search Tags:regular expression learning, regular inference, regular language, heuristic algorithm, integer programming
PDF Full Text Request
Related items