CS 342 Lecture -*- Outline -*-

* Lexical syntax
        typically described by a regular language (token structure is simple: no nesting)

** Lexical conventions
***	blanks, whitespace
		often used to separate tokens (words)
***     reserved words: if, then, etc. cannot be used as identifiers
***     keywords: (in Algol 60) distinguished from identifiers typographically (by font, quotes, etc.)
***      keywords in context: keywords only recognized in certain contexts
-------------------
/* PL/I example */
IF IF=THEN THEN THEN=ELSE; ELSE; IF=THEN=ELSE;
-------------------

** Graphical notation for programs that recognize tokens
        do recognizer for identifiers
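
        A minimal sketch of such a recognizer as a two-state machine, assuming
        an identifier is a letter followed by letters, digits, or underscores
        (the notes do not fix the identifier syntax; the name is_identifier
        and the state names are hypothetical):

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def is_identifier(text):
    """Two-state recognizer: START --letter--> IN_ID --letter|digit|_--> IN_ID."""
    state = "START"
    for ch in text:
        if state == "START" and ch in LETTERS:
            state = "IN_ID"
        elif state == "IN_ID" and (ch in LETTERS or ch in DIGITS or ch == "_"):
            state = "IN_ID"
        else:
            return False  # no transition on this character: reject
    return state == "IN_ID"  # IN_ID is the only accepting state
```

        Each state is a node in the graphical notation and each branch of the
        if/elif is a labeled edge; the recognizer needs no memory beyond the
        current state.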


** Regular expressions
---------------
Denotational description of regular expressions
    Syntax:
	<re> ::= <char>
	  | \epsilon
	  | <re> <re>
	  | <re> '|' <re>
	  | <re> *
	  | <re> +
	  | ( <re> )

    Semantics:
	 c in <char>, r1 r2 in <re>
	      M[[c]] = {"c"}
	      M[[\epsilon]] = {""}
	  M[[r1 r2]] = {st | s in M[[r1]], t in M[[r2]]}
	M[[r1 | r2]] = M[[r1]] \cup  M[[r2]]
	    M[[r1*]] = {s1 s2 ... sn | n >= 0, each si in M[[r1]]}  (n = 0 gives "")
	    M[[r1+]] = M[[r1 (r1 *)]]
	   M[[(r1)]] = M[[r1]]
---------------
    Examples:
                ab denotes {"ab"}
                a|b denotes {"a", "b"}
                a* denotes {"", "a", "aa", ...}  (all sequences of a's)
                (ab)* denotes {"", "ab", "abab", "ababab", ...}
                (a|b)* denotes {"", "a", "b", "aa", "ab", "ba", "bb", "aaa", ...}  (all strings of a's and b's)
                (\epsilon|a)(a)*b denotes the same language as a*b  -- many equivalent forms
                "(a|b|c|...|z|_|"")*"  -- string literals with "" for " inside

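        The semantic equations can be tried out directly. The sketch below is
        one possible encoding, not part of the notes: regular expressions are
        nested tuples, and denote(r, max_len) computes only the strings of
        M[[r]] of length at most max_len, since the full sets may be infinite.
        All names (char, EPS, cat, alt, star, plus, denote) are hypothetical.

```python
def char(c): return ("char", c)
EPS = ("eps",)
def cat(r1, r2): return ("cat", r1, r2)
def alt(r1, r2): return ("alt", r1, r2)
def star(r): return ("star", r)
def plus(r): return cat(r, star(r))  # M[[r+]] = M[[r (r*)]]


def denote(r, max_len):
    """Return the strings of M[[r]] whose length is <= max_len (max_len >= 0)."""
    tag = r[0]
    if tag == "char":
        return {r[1]} if max_len >= 1 else set()
    if tag == "eps":
        return {""}
    if tag == "alt":
        return denote(r[1], max_len) | denote(r[2], max_len)
    if tag == "cat":
        left = denote(r[1], max_len)
        right = denote(r[2], max_len)
        return {s + t for s in left for t in right if len(s + t) <= max_len}
    if tag == "star":
        # Concatenate zero or more strings, each drawn from M[[r1]].
        base = denote(r[1], max_len) - {""}
        result = {""}
        frontier = {""}
        while frontier:
            frontier = {s + t for s in frontier for t in base
                        if len(s + t) <= max_len} - result
            result |= frontier
        return result
    raise ValueError("unknown regular expression node: %r" % (tag,))


a, b = char("a"), char("b")
# Two of the equivalent forms from the examples agree on every bounded prefix.
same = denote(cat(cat(alt(EPS, a), star(a)), b), 4) == denote(cat(star(a), b), 4)
```

        Note that the star case concatenates possibly different strings from
        M[[r1]], which is why (a|b)* contains "ab" and "ba".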
        Each regular language can be described by a regular expression
                (and vice versa).


** Regular grammars: BNF, no recursion, use Kleene star and +
	(nonstandard definition, a la MacLennan)
        (since no recursion, can recover regular expression)

        Generate regular languages (only)

        Example:
                <string> ::= " <string elem>* "
                <string elem> ::= a | b | ... | z | <blank> | ""
                <blank> ::= _

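        Because no rule is recursive, the nonterminals can be inlined until
        only a regular expression remains. A minimal sketch using Python's re
        module; the GRAMMAR encoding (right-hand sides as regex fragments,
        with {name} marking a nonterminal) and the function expand are
        assumptions made for illustration:

```python
import re

# Right-hand sides as Python regex fragments; {name} refers to a nonterminal.
GRAMMAR = {
    "blank": r"_",
    "string_elem": r'(?:[a-z]|{blank}|"")',
    "string": r'"{string_elem}*"',
}


def expand(name, grammar):
    """Inline nonterminals until none remain; terminates since no rule is recursive."""
    body = grammar[name]
    return re.sub(r"\{(\w+)\}",
                  lambda m: "(?:" + expand(m.group(1), grammar) + ")",
                  body)


STRING_RE = re.compile(expand("string", GRAMMAR))

ok = STRING_RE.fullmatch('"say_""hi"""') is not None
bad = STRING_RE.fullmatch('"a"b"') is not None
```

        With a recursive rule, expand would never terminate, which mirrors
        why the no-recursion restriction keeps the grammar regular.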
***	Advantages: can name logical units, giving abstraction (vs. reg. exps)


***	Limitations: cannot express matching structures (nesting, parentheses).
        For example: the language of <S> ::= () | ( <S> ) is not regular.
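
        An informal illustration of why: a recognizer for this language has
        to remember how many "(" it has seen, and that count has no upper
        bound, whereas a finite-state recognizer has only a fixed number of
        states. A sketch with an explicit counter (the name in_nested_parens
        is hypothetical):

```python
def in_nested_parens(text):
    """Recognize the language of <S> ::= () | ( <S> ): n '(' then n ')', n >= 1."""
    depth = 0           # unbounded counter: number of '(' not yet matched
    seen_close = False  # once a ')' appears, no further '(' is allowed
    for ch in text:
        if ch == "(" and not seen_close:
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            seen_close = True
        else:
            return False
    return seen_close and depth == 0
```

        This is intuition rather than a proof; the usual proof that the
        language is not regular goes through the pumping lemma.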