CS 541 Lecture          -*- Outline -*-

* Lexical (or micro-) syntax

** motivation is abstraction (working at the right level)
   want to describe the syntax of the language at a higher level than characters;
   the higher-level things are called tokens (words).
   also may want to have different sets of tokens (publication vs. working syntax),
   but to do that we have to define the syntax of tokens.
   token syntax is typically described as a regular language because it is
     - very simple
     - regularity guarantees it is not ambiguous
     - very fast to parse

** Lexical conventions

*** blanks, whitespace
    often used to separate tokens (words)

*** reserved words: if, then, etc.
    cannot be used as identifiers

*** keywords: (in Algol 60) are distinguished (by font, quotes, etc.)

*** keywords in context: keywords only recognized in certain contexts
      /* PL/I example */
      IF IF=THEN THEN THEN=ELSE; ELSE; IF=THEN=ELSE;

   (a small tokenizer sketch illustrating these conventions appears at the
    end of these notes)

** Regular expressions (Watt, section 2.2.2)
   go quickly...
---------------
Denotational semantics of REGULAR EXPRESSIONS

Syntax:
  re ::= char | \epsilon | re re | re '|' re | re * | ( re ) | re +

Semantics: for all c in char, r1, r2 in re
  M[[c]]        = {"c"}
  M[[\epsilon]] = {""}
  M[[r1 r2]]    = {st | s in M[[r1]], t in M[[r2]]}
  M[[r1 | r2]]  = M[[r1]] \union M[[r2]]
  M[[r1*]]      = {s1 s2 ... sn | n >= 0, each si in M[[r1]]}  (includes "")
  M[[(r1)]]     = M[[r1]]
  M[[r1+]]      = M[[r1 (r1*)]]
---------------
Examples:
  ab          denotes {"ab"}
  a|b         denotes {"a", "b"}
  a*          denotes {"", "a", "aa", ...}  (all sequences of a's)
  (ab)*       denotes {"", "ab", "abab", "ababab", ...}
  (a|b)*      denotes {"", "a", "b", "ab", "ba", "aaa", ...}  (all strings of a's and b's)
  (\epsilon|a)(a)*b   denotes the same language as a*b  -- many equivalent forms
  "(a|b|c|...|z|_|"")*"   -- string literals, with "" standing for " inside

Each regular language can be described by a regular expression (and vice versa).

Precedence: * has the highest priority, | the lowest
  (Exercise: rewrite the grammar above to show that.)

   (a Python sketch of the M[[.]] equations appears at the end of these notes)

** regular exps vs. regular grammars
   grammars help by giving names to things;
   better for syntax-directed documentation.
   res are closer to finite automata, and thus more convenient for compilers.

*** translation
    see Watt 2.2.3
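
For the translation just mentioned, here is a sketch of what the end product
looks like (not Watt's construction itself): a hand-written DFA transition
table for a*b, plus a driver loop that runs it in time linear in the input,
which is why regular token syntax is "very fast to parse". The table and
names are made up here for illustration.
---------------
# DFA for a*b, written by hand (state 0 = start, state 1 = accepting).
DFA = {
    (0, "a"): 0,   # loop on a's
    (0, "b"): 1,   # the final b
}
ACCEPTING = {1}

def matches(s):
    state = 0
    for ch in s:
        if (state, ch) not in DFA:
            return False          # no transition: reject
        state = DFA[(state, ch)]
    return state in ACCEPTING

# "b", "ab", "aaab" are in M[[a*b]]; "", "a", "abb" are not.
for s in ["b", "ab", "aaab", "", "a", "abb"]:
    print(repr(s), matches(s))
---------------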
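
The M[[.]] equations for regular expressions can also be read as a small
program. Below is a rough Python sketch under one assumption: since M[[r*]]
is an infinite set, we enumerate only the strings up to a length bound. The
function names are invented for these notes.
---------------
def chars(c):                 # M[[c]] = {"c"}
    return {c}

def epsilon():                # M[[\epsilon]] = {""}
    return {""}

def seq(L1, L2, bound):       # M[[r1 r2]] = {st | s in L1, t in L2}
    return {s + t for s in L1 for t in L2 if len(s + t) <= bound}

def alt(L1, L2):              # M[[r1 | r2]] = L1 union L2
    return L1 | L2

def star(L1, bound):          # M[[r1*]]: all concatenations of strings from L1
    result = {""}
    while True:
        bigger = result | seq(result, L1, bound)
        if bigger == result:  # no new strings under the length bound
            return result
        result = bigger

def plus(L1, bound):          # M[[r1+]] = M[[r1 (r1*)]]
    return seq(L1, star(L1, bound), bound)

# (a|b)* restricted to length <= 2:
print(sorted(star(alt(chars("a"), chars("b")), 2)))
# -> ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
---------------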
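
Finally, a minimal tokenizer sketch for the lexical conventions section (not
the course's scanner; the token set, names, and reserved words are made up
for illustration). It uses Python's re module, treats whitespace purely as a
separator, and checks identifiers against a reserved-word set.
---------------
import re

RESERVED = {"if", "then", "else"}

TOKEN_RE = re.compile(r"""
      (?P<NUM>   \d+)
    | (?P<ID>    [a-z_][a-z0-9_]*)
    | (?P<OP>    [=+\-*/;()])
    | (?P<SKIP>  \s+)
""", re.VERBOSE | re.IGNORECASE)

def tokenize(text):
    for m in TOKEN_RE.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                      # whitespace only separates tokens
        if kind == "ID" and lexeme.lower() in RESERVED:
            kind = lexeme.upper()         # reserved word, not an identifier
        yield (kind, lexeme)

# Example: "if x = 1 then y = 2;" yields
#   IF, ID x, OP =, NUM 1, THEN, ID y, OP =, NUM 2, OP ;
print(list(tokenize("if x = 1 then y = 2;")))
---------------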