CS 342 Lecture                                          -*- Outline -*-

* Lexical syntax

Lexical syntax is typically described by a regular language (because it is very simple).

** Lexical conventions

*** blanks and other whitespace are often used to separate tokens (words)

*** reserved words: if, then, etc. cannot be used as identifiers

*** keywords (in Algol 60): distinguished typographically (by font, quotes, etc.)

*** keywords in context: keywords are recognized only in certain contexts,
    so the same words can also be used as identifiers
-------------------
/* PL/I example */
IF IF=THEN THEN THEN=ELSE; ELSE; IF=THEN=ELSE;
-------------------

** Graphical notation (state diagrams) for programs that recognize tokens

do a recognizer for identifiers

** Regular expressions
---------------
Denotational description of regular expressions

Syntax:
 <regexp> ::= <char> | \epsilon | <regexp> <regexp> | <regexp> '|' <regexp>
            | <regexp> * | <regexp> + | ( <regexp> )

Semantics: c in <char>; r1, r2 in <regexp>

 M[[c]]        = {"c"}
 M[[\epsilon]] = {""}
 M[[r1 r2]]    = {st | s in M[[r1]], t in M[[r2]]}
 M[[r1 | r2]]  = M[[r1]] \cup M[[r2]]
 M[[r1*]]      = {s1 s2 ... sn | n >= 0, each si in M[[r1]]}   (n = 0 gives "")
 M[[r1+]]      = M[[r1 (r1 *)]]
 M[[(r1)]]     = M[[r1]]
---------------

Examples:
 ab                      denotes {"ab"}
 a|b                     denotes {"a", "b"}
 a*                      denotes {"", "a", "aa", ...}  (all sequences of a's)
 (ab)*                   denotes {"", "ab", "abab", "ababab", ...}
 (a|b)*                  denotes {"", "a", "b", "aa", "ab", "ba", "bb", "aaa", ...}
                                 (all sequences of a's and b's)
 (\epsilon|a)(a)*b       denotes the same set as a*b  -- many equivalent forms
 "(a|b|c|...|z|_|"")*"   -- string literals, with "" standing for " inside

Every regular language can be described by a regular expression,
and every regular expression describes a regular language.

** Regular grammars: BNF with no recursion, using Kleene star and +
   (a nonstandard definition, à la MacLennan)

Since there is no recursion, a regular expression can be recovered
by substituting each nonterminal's definition.
They generate only regular languages.

Example:
 <string literal> ::= " <str char>* "
 <str char>       ::= a | b | ... | z | <underscore> | ""
 <underscore>     ::= _

*** Advantages

nonterminals name logical units, giving abstraction (compared with a bare regular expression)

*** Limitations

cannot express matching structures (nesting, parentheses).
For example, the language of
 <parens> ::= () | ( <parens> )
is not regular: it is {"("^n ")"^n | n >= 1}, and describing it requires recursion.
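The note "do a recognizer for identifiers" presumably means drawing a state diagram for identifier tokens. Below is a minimal Python sketch of such a recognizer. It assumes, hypothetically since the notes do not define identifiers, that an identifier is an ASCII letter followed by any number of ASCII letters, digits, or underscores. Each case of `step` corresponds to one arrow in the diagram. The names `step` and `is_identifier` are illustrative only.

```python
import string

# Hypothetical identifier rule: letter (letter | digit | _)*
LETTERS = set(string.ascii_letters)
ID_CHARS = LETTERS | set(string.digits) | {"_"}

START, IN_ID, ERROR = "start", "in_id", "error"  # states of the diagram

def step(state: str, ch: str) -> str:
    """Transition function: one case per arrow in the state diagram."""
    if state == START:
        return IN_ID if ch in LETTERS else ERROR
    if state == IN_ID:
        return IN_ID if ch in ID_CHARS else ERROR
    return ERROR  # ERROR is a trap state: every character loops back to it

def is_identifier(text: str) -> bool:
    """Run the machine over the whole input and check the final state."""
    state = START
    for ch in text:
        state = step(state, ch)
    return state == IN_ID  # IN_ID is the only accepting state
```

For example, `is_identifier("x1_y")` accepts, while `"1x"` and the empty string are rejected because the machine never reaches the accepting state.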
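The semantic equations for regular expressions can be transcribed almost directly into code. Because M[[r1*]] is usually infinite, this Python sketch computes M[[r]] restricted to strings of length at most k. The tuple encoding of the abstract syntax is an assumption made for illustration, and parentheses do not appear in it because M[[(r1)]] = M[[r1]].

```python
def meaning(r: tuple, k: int) -> frozenset:
    """M[[r]] restricted to strings of length <= k (M[[r]] itself may be infinite).

    Assumed encoding: ("char", c), ("eps",), ("cat", r1, r2),
    ("alt", r1, r2), ("star", r1), ("plus", r1).
    """
    tag = r[0]
    if tag == "char":                      # M[[c]] = {"c"}
        return frozenset({r[1]}) if k >= 1 else frozenset()
    if tag == "eps":                       # M[[\epsilon]] = {""}
        return frozenset({""})
    if tag == "cat":                       # {st | s in M[[r1]], t in M[[r2]]}
        left, right = meaning(r[1], k), meaning(r[2], k)
        return frozenset(s + t for s in left for t in right if len(s) + len(t) <= k)
    if tag == "alt":                       # M[[r1]] \cup M[[r2]]
        return meaning(r[1], k) | meaning(r[2], k)
    if tag == "star":                      # s1 s2 ... sn, n >= 0, each si in M[[r1]]
        base = meaning(r[1], k)
        result, frontier = {""}, {""}      # n = 0 contributes ""
        while frontier:                    # add one more factor until nothing new fits
            new = {s + t for s in frontier for t in base if len(s) + len(t) <= k} - result
            result |= new
            frontier = new
        return frozenset(result)
    if tag == "plus":                      # M[[r1+]] = M[[r1 (r1*)]]
        return meaning(("cat", r[1], ("star", r[1])), k)
    raise ValueError(f"unknown regular expression node: {tag!r}")
```

With `a, b = ("char", "a"), ("char", "b")`, the call `meaning(("star", ("alt", a, b)), 2)` yields all seven strings of a's and b's of length at most 2, including "ab". Each factor si of a starred expression is chosen independently, which is why mixed strings such as "ab" belong to (a|b)*.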
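Because a regular grammar has no recursion, each nonterminal can be replaced by its definition until only a regular expression remains. The sketch below does this for the string-literal grammar using Python's `re` module. Each constant stands for one nonterminal, and the constant and function names are hypothetical. The result is the regular expression `"(a|b|c|...|z|_|"")*"` from the examples.

```python
import re

# One constant per nonterminal of the regular grammar (names are illustrative).
# <underscore>     ::= _
UNDERSCORE = "_"
# <str char>       ::= a | b | ... | z | <underscore> | ""
STR_CHAR = f'(?:[a-z]|{UNDERSCORE}|"")'
# <string literal> ::= " <str char>* "
STRING_LITERAL = re.compile(f'"{STR_CHAR}*"')

def is_string_literal(text: str) -> bool:
    """True if the whole of text is a <string literal>."""
    return STRING_LITERAL.fullmatch(text) is not None
```

For instance, `"say""hi"""` is accepted (each `""` inside stands for one quote), while `"a"b"` is rejected because its inner quote is not doubled.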
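The grammar for `<parens>` generates exactly the strings of n left parentheses followed by n right parentheses, n >= 1, so a recognizer must remember n. A finite automaton with m states cannot do this: among the m + 1 prefixes "(", "((", ..., "("*(m+1), two of them, say "("*i and "("*j with i != j, lead to the same state. The automaton then treats "("*i ")"*i and "("*j ")"*i alike, so it accepts a string outside the language or rejects one inside it. The Python sketch below (hypothetical function name) recognizes the language with an integer counter, which is exactly the unbounded memory a finite-state recognizer lacks.

```python
def is_nested_parens(text: str) -> bool:
    """Recognize the language of <parens>: "("*n + ")"*n for n >= 1."""
    # Count the leading "(" characters. This counter can grow without bound,
    # which is the memory a finite-state recognizer does not have.
    n = 0
    while n < len(text) and text[n] == "(":
        n += 1
    # The rest must be exactly n ")" characters, and n must be at least 1.
    return n >= 1 and text[n:] == ")" * n
```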