Home | Libraries | Author | Links |

[SysToMath Base C Library]

Collaboration diagram for Match: Pattern Matching with Regular Expressions:

The SysToMath Base C Library module Match provides abstract data types to perform pattern matching in character strings using regular expressions.

## Files | |

file | match.h |

Specification of data types and functions for pattern matching in strings by means of regular expressions. | |

## Modules | |

Match Test | |

Demonstration of the usage and tests of the implementation of pattern matching in strings by means of regular expressions. | |

Abstract Data Type StmRe | |

StmRe is an abstract data type used to represent regular expressions. |

`reStr`

serves the purpose of pattern matching in a character string `str`

and is a string with the following syntax in `yacc`

format (deviating from `yacc`

optional parts are included in brackets, informal descriptions of nonterminals are quoted with '"' characters and comments extending till the end of the line are preceded by two slash characters '//'): reStr // The regular expression reStr // matches, if : alt [ '||' reStr ] // alt matches, or else its regular // subexpression reStr matches. ; alt // The alternative alt matches, if : term [ '|' alt ] // term matches with at least as many // characters as alt or else alt // matches. ; term // The term term matches, if : qualifRe [ term ] // qualifRe and, if applicable,its // subterm term matches. ; qualifRe // The qualified regular expression // qualifRe matches, if there is a : optInverter // positive or inverse (see below) simpleRe // match of simpleRe optQualifier // q (see below) times. optActions // If so, perform action functions optGoback // and optionally go back just before // the part of str matched by qualifRe. ; optInverter : // Positive match: the corresponding // qualifRe matches, if the following // simpleRe matches q times. | '!' // Inverse match: the corresponding // qualifRe matches, if the following // simpleRe matches q times not (for // each of the q non matches one // character of str is consumed). ; simpleRe // The simple regular expression // simpleRe matches, if the regular : '(' reStr ')' // subexpression reStr matches, | '&' reName '&' // reStr reName matches (see below), | '[' group ']' // group matches (see below), | '%' escSeq // character escSeq matches, | '.' // arbitrary character matches, or | reChar // character reChar matches. ; optQualifier // Conditions for the effective // repetition factor q (see above): : // q = 1, | '?' // q = 0 or q = 1, | '+' // q >= 1, | '*' // q >= 0, or | '{' range '}' // q in range (see below). ; range // Effective repetition factor q: : minmax // q = minmax and q > 0, | min ',' // q >= min, or | min ',' max // min <= q and q <= max. ; optActions : | ':' actionNames ':' // If the corresponding qualifRe // matches, the action functions in // actionNames are executed. ; actionNames : actionName | actionNames ',' actionName ; optGoback : | ''' // If the corresponding qualifRe // matches, go back just before the // part of str matched by qualifRe. // This way for a part of str an and // connection (conjunction) between a // qualifRe and any following qualifRe // may be achieved. ; minmax min max : "unsigned non-negative decimal number not bigger than INT_MAX" ; reName : "one or more C identifiers separated by single '-' characters" // Name of a named regular expression // defined by function stmReCreate(). ; actionName : "one or more C identifiers separated by single '-' characters" // Name of a qualified regular // expression action defined by // function stmQreActDef(). ; group // group matches, if : [ '^' ] // optionally no character, else [ ('[' | '-') ] // one exactly from '[' or '-' or groupEnd // groupEnd matches. ; groupEnd : | groupElement groupEnd ; groupElement : groupChar [ '-' groupChar ] // matches, if groupChar matches or if // any character from the character // range groupChar '-' groupChar // matches. ; groupChar : '%' escSeq | greChar ; escSeq : 'n' // matches newline. | 't' // matches horizontal tab. | 'v' // matches vertical tab. | 'b' // matches backspace. | 'r' // matches carriage return. | 'f' // matches formfeed. | 'a' // matches audible alert. | 'x' hexDigit [ hexDigit ] // matches a hexadecimal number. | octDigit [ octDigit ] [ octDigit ] // matches an octal number. | "any character (except \0 for stmReMatch()) not mentioned above matches itself" ; hexDigit : "any hexadecimal digit fron the character ranges 0-9, a-f or A-F" ; octDigit : "any octal digit from the character range 0-7" ; greChar : "any reChar except [ and -" ; reChar : "any character except +?*|()[{&%.!:' (and \0 for stmReMatch())" ;

Only when using function stmReMatchSub() '\0' characters may be contained in the string `str`

to be matched, since using function stmReMatch() a '\0' character would be interpreted as end of string condition.

Generally a character or a character string from a regular expression is said to match, if it coincides with the next character(s) of the string `str`

to be matched after these have been transformed (see stmReCtrlCreate ()), if applicable.

The elements of the sequence of terms of a `reStr`

separated by '|' or '||' are numbered beginning with 0. The uniquely determined `term`

of this sequence causing `reStr's`

match is called its matching term and the number of this `term`

its matching term number.

The effective repetition factor `q`

of a `qualifRe`

specified by `optQualifier`

always is chosen maximal so that a possibly following `term`

with which `qualifRe`

forms another `term`

, matches.

The match length of the first `qualifRe`

of a `term`

always is chosen maximal so that the possibly following subterm of `term`

matches.

If the minimal value of the effective repetion factor `q`

of a qualified regular expression `qualifRe`

is 0 (which is the case for `optQualifier`

'?', '*', or '{0,max}'), `qualifRe`

is called potentially optional. If a term `term`

consists of all potentially optional qualified regular expressions also `term`

is called potentially optional. If an alternative `alt`

contains at least one potentially optional term also `alt`

is called potentially optional. Finally if a regular expression `reStr`

contains at least one potentially optional alternative, `reStr`

is called potentially optional.

Before a regular expression with the syntax `reStr`

can be used for pattern matching by means of the functions stmReMatch() or stmReMatchSub() it has to be named and defined by the function stmReCreate().

The regular expressions with the syntax `reStr`

referenceable by the construct '&'`reName`

'&' also have to be defined by the function stmReCreate().

The action functions for qualified regular expressions `qualifRe`

referenceable by the construct ':'`actionNames`

':' have to be defined by function stmQreActDef().

© Copyright Tom Michaelis 2002-2007

Distributed under the SysToMath Software License (See the accompanying file license.txt or a copy at www.SysToMath.com).