Regular Expression Learning Method Based From Positive Examples

Posted on:2024-08-12

Degree:Master

Type:Thesis

Country:China

Candidate:D Q Liu

Full Text:PDF

GTID:2568307055975009

Subject:Software engineering

Abstract/Summary:

Inferring regular expressions from finite examples is one of the classic problems in computer theory.Currently,most of the existing methods need to use automata or production to accomplish inductive learning,and the regular expressions inferred by such methods have the problems of being lengthy and poorly readable.In order to alleviate this problem,this thesis proposes a learning method which can directly infer regular expressions from positive examples.The method mainly includes two stages: candidate space construction and candidate space optimization.The former constructs a complete and diversified regular expression candidate space,while the latter infers concise and readable regular expressions based on the candidate space.The candidate space construction phase includes four regular expression generalization mechanisms,namely,multiscaling augmentation,cycle pattern discovery,multiple strings generalization,and multiple regular expressions generalization.For a single sample,multiscaling augmentation is generalized from the perspective of improving the level of character abstraction and expanding the range of contrast.The cycle pattern discovery completes the generalization by identifying the cycle pattern of strings and regular expressions in the sample.For multiple examples,multiple strings generalization and multiple regular expressions generalization generalize regular expressions based on common subsequence and common subexpression respectively,and design search methods based on coding cost and metacharacter respectively,to generalize regular expressions with stronger expression ability while improving search efficiency.In the candidate space optimization phase,combining the MML idea,the regular expression optimization problem is transformed into a multi-objective optimization problem.Two optimization schemes are selected to ensure the applicability of the learning method,namely,using integer linear programming for small-scale sample sets and using carnivorous plant algorithms for large-scale sample sets.Experimental results on different data sets show that the length of regular expressions generated by the regular expression learning method from positive samples is 50% to 70% shorter than that of similar methods,which proves the effectiveness of the method.

Keywords/Search Tags:

regular expression learning, regular inference, regular language, heuristic algorithm, integer programming

Related items

1	The Design And Implementation Of Regular Expression Engines Based On Deterministic Finite Automata
2	Research On Multi-dimensional Regular Expression Matching Algorithm For Network Security
3	Application Of Programming By Example In Text Search
4	GPU Based High Speed Regular Expression Matching Engine
5	Gpu Based High Speed Regular Expression Matching Engine
6	The Research And Implementation For The Matching System Of Regular Expression In Bgp Protocol
7	Research On Regular Expression Matching Algorithm
8	An Efficient Algorithm For TCAM-based Regular Expression Matching
9	Research On Regular Expression Matching Of Network Data Flow
10	An Optimization Algorithm For Matching Text With Regular Expressions