![]() |
Home | Libraries | Author | Links |
![]() |
The SysToMath Base C Library module Match provides abstract data types to perform pattern matching in character strings using regular expressions.
Files | |
file | match.h |
Specification of data types and functions for pattern matching in strings by means of regular expressions. | |
Modules | |
Match Test | |
Demonstration of the usage and tests of the implementation of pattern matching in strings by means of regular expressions. | |
Abstract Data Type StmRe | |
StmRe is an abstract data type used to represent regular expressions. |
reStr
serves the purpose of pattern matching in a character string str
and is a string with the following syntax in yacc
format (deviating from yacc
optional parts are included in brackets, informal descriptions of nonterminals are quoted with '"' characters and comments extending till the end of the line are preceded by two slash characters '//'): reStr // The regular expression reStr // matches, if : alt [ '||' reStr ] // alt matches, or else its regular // subexpression reStr matches. ; alt // The alternative alt matches, if : term [ '|' alt ] // term matches with at least as many // characters as alt or else alt // matches. ; term // The term term matches, if : qualifRe [ term ] // qualifRe and, if applicable,its // subterm term matches. ; qualifRe // The qualified regular expression // qualifRe matches, if there is a : optInverter // positive or inverse (see below) simpleRe // match of simpleRe optQualifier // q (see below) times. optActions // If so, perform action functions optGoback // and optionally go back just before // the part of str matched by qualifRe. ; optInverter : // Positive match: the corresponding // qualifRe matches, if the following // simpleRe matches q times. | '!' // Inverse match: the corresponding // qualifRe matches, if the following // simpleRe matches q times not (for // each of the q non matches one // character of str is consumed). ; simpleRe // The simple regular expression // simpleRe matches, if the regular : '(' reStr ')' // subexpression reStr matches, | '&' reName '&' // reStr reName matches (see below), | '[' group ']' // group matches (see below), | '%' escSeq // character escSeq matches, | '.' // arbitrary character matches, or | reChar // character reChar matches. ; optQualifier // Conditions for the effective // repetition factor q (see above): : // q = 1, | '?' // q = 0 or q = 1, | '+' // q >= 1, | '*' // q >= 0, or | '{' range '}' // q in range (see below). ; range // Effective repetition factor q: : minmax // q = minmax and q > 0, | min ',' // q >= min, or | min ',' max // min <= q and q <= max. ; optActions : | ':' actionNames ':' // If the corresponding qualifRe // matches, the action functions in // actionNames are executed. ; actionNames : actionName | actionNames ',' actionName ; optGoback : | ''' // If the corresponding qualifRe // matches, go back just before the // part of str matched by qualifRe. // This way for a part of str an and // connection (conjunction) between a // qualifRe and any following qualifRe // may be achieved. ; minmax min max : "unsigned non-negative decimal number not bigger than INT_MAX" ; reName : "one or more C identifiers separated by single '-' characters" // Name of a named regular expression // defined by function stmReCreate(). ; actionName : "one or more C identifiers separated by single '-' characters" // Name of a qualified regular // expression action defined by // function stmQreActDef(). ; group // group matches, if : [ '^' ] // optionally no character, else [ ('[' | '-') ] // one exactly from '[' or '-' or groupEnd // groupEnd matches. ; groupEnd : | groupElement groupEnd ; groupElement : groupChar [ '-' groupChar ] // matches, if groupChar matches or if // any character from the character // range groupChar '-' groupChar // matches. ; groupChar : '%' escSeq | greChar ; escSeq : 'n' // matches newline. | 't' // matches horizontal tab. | 'v' // matches vertical tab. | 'b' // matches backspace. | 'r' // matches carriage return. | 'f' // matches formfeed. | 'a' // matches audible alert. | 'x' hexDigit [ hexDigit ] // matches a hexadecimal number. | octDigit [ octDigit ] [ octDigit ] // matches an octal number. | "any character (except \0 for stmReMatch()) not mentioned above matches itself" ; hexDigit : "any hexadecimal digit fron the character ranges 0-9, a-f or A-F" ; octDigit : "any octal digit from the character range 0-7" ; greChar : "any reChar except [ and -" ; reChar : "any character except +?*|()[{&%.!:' (and \0 for stmReMatch())" ;
Only when using function stmReMatchSub() '\0' characters may be contained in the string str
to be matched, since using function stmReMatch() a '\0' character would be interpreted as end of string condition.
Generally a character or a character string from a regular expression is said to match, if it coincides with the next character(s) of the string str
to be matched after these have been transformed (see stmReCtrlCreate ()), if applicable.
The elements of the sequence of terms of a reStr
separated by '|' or '||' are numbered beginning with 0. The uniquely determined term
of this sequence causing reStr's
match is called its matching term and the number of this term
its matching term number.
The effective repetition factor q
of a qualifRe
specified by optQualifier
always is chosen maximal so that a possibly following term
with which qualifRe
forms another term
, matches.
The match length of the first qualifRe
of a term
always is chosen maximal so that the possibly following subterm of term
matches.
If the minimal value of the effective repetion factor q
of a qualified regular expression qualifRe
is 0 (which is the case for optQualifier
'?', '*', or '{0,max}'), qualifRe
is called potentially optional. If a term term
consists of all potentially optional qualified regular expressions also term
is called potentially optional. If an alternative alt
contains at least one potentially optional term also alt
is called potentially optional. Finally if a regular expression reStr
contains at least one potentially optional alternative, reStr
is called potentially optional.
Before a regular expression with the syntax reStr
can be used for pattern matching by means of the functions stmReMatch() or stmReMatchSub() it has to be named and defined by the function stmReCreate().
The regular expressions with the syntax reStr
referenceable by the construct '&'reName
'&' also have to be defined by the function stmReCreate().
The action functions for qualified regular expressions qualifRe
referenceable by the construct ':'actionNames
':' have to be defined by function stmQreActDef().
© Copyright Tom Michaelis 2002-2007
Distributed under the SysToMath Software License (See the accompanying file license.txt or a copy at www.SysToMath.com).