CS 641 Lecture -*- Outline -*-

* Type reconstruction algorithm overview
	important as a review of compilation, abstract types

** Data Structures
***	environment (map from ids to types)
***	type expressions (type variables, type constructors)
		expressions may be represented by
			parenthesized strings or trees
				(can grow exponentially under substitution:
					h(f(X), f(X))  becomes  h(f(g(a)), f(g(a)))
				        g(a) must be copied once per occurrence of X)
			graphs (dags), which allow sharing of subterms
***	stack of type variables (allows generic/non-generic distinction)
***	substitutions

** Operations
***	environment (expand, get value, ...)
***	type variables
		-generic?
		-equal?
		-copy (=new)
***	type constructors
		-equal?
		-arity?
***	type expressions
		-is it a variable?
		-get type constructor and arguments (if not variable)
		-apply substitution
		-occurs? (variable in type expression)
		-copy
		-unify (make types equal)
***	stack of type variables (push, pop)
***	substitutions
		-create (singleton)
		-compose
		-apply (to a variable)

** Unification
	skip the details unless asked

*** Specification

****	Definitions

*****		Terms, Term algebra
	V = set of variables
	Sigma = set of function symbols, not including "=", each with arity
			function symbols of arity 0 called constants
	Sigma and V are disjoint

	Term algebra (T_{Sigma}(V)) = least set containing
		each v in V, each constant in Sigma, and
		f(t1,...,tn) for f in Sigma of arity n and terms t1,...,tn
	Ground terms (Herbrand Universe, T_{Sigma}) =
		set of terms with no variables
		assume this set has >1 elements

	vars: syntactic thing -> set of variables in it

*****		Substitutions
	substitution = finite map from variables to terms
		such that s(x) ~= x (not equal) for every variable x in its domain
		a grounding substitution is one that maps to ground terms

	extension to terms: s(t) = s(x) if t is variable x in domain(s)
				   x    if t is variable x not in domain(s)
				   a    if t is constant a
				   f(s(t1),...,s(tn))  if t is f(t1,...,tn)
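
	The definitions above can be sketched in ML as follows. This is
	only an illustration under assumed representations: variables and
	function symbols are strings, a substitution is an association
	list, and the names vars and apply are hypothetical.

```sml
(* Assumed representation: a constant is a TERM with no arguments. *)
datatype term = VAR of string | TERM of string * term list

(* A substitution as a finite map, here an association list (assumption). *)
type subst = (string * term) list

(* vars t: the variables occurring in t, without duplicates *)
fun vars (VAR x) = [x]
  | vars (TERM (_, args)) =
      List.foldl
        (fn (t, acc) =>
           List.foldl
             (fn (v, a) => if List.exists (fn w => w = v) a then a else a @ [v])
             acc (vars t))
        [] args

(* apply s t: the extension of s from variables to terms *)
fun apply (s : subst) (VAR x) =
      (case List.find (fn (y, _) => y = x) s of
           SOME (_, t) => t
         | NONE => VAR x)          (* x is not in domain(s) *)
  | apply s (TERM (f, args)) = TERM (f, map (apply s) args)

(* h(f(X), f(X)) under {X |-> g(a)} becomes h(f(g(a)), f(g(a))) *)
val ga = TERM ("g", [TERM ("a", [])])
val t  = TERM ("h", [TERM ("f", [VAR "X"]), TERM ("f", [VAR "X"])])
val t' = apply [("X", ga)] t
```

	Here t' is a ground term, so vars t' is empty, while vars t
	contains only X.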

	exercise: define composition of substitutions such that
		for all terms t, (s1 o s2) t = s1(s2(t))
		Q: does this make the term algebra into a category?
			no, get category with objects: sets of terms
				arrows: substitutions

*****		Unifiers, most general unifiers	
	terms t1 and t2 are *unifiable* iff there is a substitution s
		such that s(t1) is syntactically identical to s(t2)
		such an s is called a *unifier* for t1 and t2

	a substitution is a *unifier for an equation set* iff
		it unifies the terms of each equation

	a substitution s is *idempotent* iff s = (s o s)
		Lemma: s is idempotent iff domain(s) does not intersect vars(range(s))

	ordering: s1 is *more general* than s2 iff there exists s3 s.t. s2 = (s3 o s1)

	s is a most general unifier (mgu) of a set of equations E
		iff whenever s1 is a unifier of E,
			then s is more general than s1.

	Lemma: If s is an idempotent mgu of E,
		then for any unifier s1 of E, s1 = (s1 o s)
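
	A small check of idempotence and of the lemma on one example,
	E = { x = f(y) }. This is a sketch under assumptions: substitutions
	are association lists, and compose, equalSubst and idempotent are
	hypothetical names. The compose shown is one possible definition
	satisfying (s1 o s2) t = s1(s2(t)).

```sml
datatype term = VAR of string | TERM of string * term list
type subst = (string * term) list   (* assumed representation *)

fun apply (s : subst) (VAR x) =
      (case List.find (fn (y, _) => y = x) s of
           SOME (_, t) => t
         | NONE => VAR x)
  | apply s (TERM (f, args)) = TERM (f, map (apply s) args)

(* compose (s1, s2) = s1 o s2: apply s2 first, then s1.
   Identity bindings x |-> x are dropped, as the definition requires. *)
fun compose (s1 : subst, s2 : subst) : subst =
  let
    val s2' = map (fn (x, t) => (x, apply s1 t)) s2
    val s1' = List.filter
                (fn (x, _) => not (List.exists (fn (y, _) => y = x) s2)) s1
  in
    List.filter (fn (x, t) => t <> VAR x) (s2' @ s1')
  end

(* Two substitutions are equal as maps if they agree on both domains. *)
fun equalSubst (s1 : subst, s2 : subst) =
  List.all (fn (x, _) => apply s1 (VAR x) = apply s2 (VAR x)) (s1 @ s2)

fun idempotent s = equalSubst (compose (s, s), s)

val a  = TERM ("a", [])
val s  = [("x", TERM ("f", [VAR "y"]))]           (* an mgu of x = f(y) *)
val s1 = [("x", TERM ("f", [a])), ("y", a)]       (* another unifier    *)
val r  = [("x", TERM ("f", [VAR "x"]))]           (* x occurs in range  *)

val sIdempotent = idempotent s                     (* true  *)
val rIdempotent = idempotent r                     (* false *)
val lemmaHolds  = equalSubst (compose (s1, s), s1) (* true: s1 = s1 o s *)
```

	The substitution r fails the test because its domain {x}
	intersects vars(range(r)) = {x}.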

****	the problem
		given an equation e between terms
			return an mgu for e,
			or fail (if there is no unifier for e).
		can recover solution to any variables of interest from mgu

****	basic algorithm
	------------------
	fun unify(VAR x,VAR y) =
		if Variable.equal(VAR x, VAR y) then Substitution.null
		else Substitution.singleton(VAR x, VAR y)
	| unify(x as TERM(f,args), VAR y) = unify(VAR y, x)
	| unify(VAR x, y as TERM(f,args)) =
		(* occurs check: x = f(...x...) has no finite solution *)
		if Term.occurs(VAR x, y) then raise Failure
		else Substitution.singleton(VAR x, y)
	| unify(TERM(f,fargs), TERM(g,gargs)) =
		(* constructors must match in both name and arity *)
		if not (Constructor.equal(f,g) andalso
			length(fargs) = length(gargs))
		then raise Failure
		else let fun unifyargs([], [], sigma) = sigma
			   | unifyargs(t1::rest1, t2::rest2, sigma) =
				(* apply the bindings found so far before
				   unifying the next pair of arguments;
				   Term.apply is the "apply substitution"
				   operation (name and argument order assumed) *)
				let val tau = unify(Term.apply(sigma,t1),
						    Term.apply(sigma,t2))
				in unifyargs(rest1, rest2,
					     Substitution.compose(tau, sigma))
				end
			   | unifyargs _ = raise Failure
		     in
			unifyargs(fargs,gargs,Substitution.null)
		     end;
	------------------
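
	A self-contained version of the same idea that can be loaded and
	tried. It is a sketch, not the course's implementation: the
	Variable, Constructor, Term and Substitution modules are replaced
	by assumed stand-ins (strings for names, an association list for
	substitutions), and apply, compose and occurs are hypothetical
	helper names.

```sml
datatype term = VAR of string | TERM of string * term list
type subst = (string * term) list   (* assumed representation *)
exception Failure

fun apply (s : subst) (VAR x) =
      (case List.find (fn (y, _) => y = x) s of
           SOME (_, t) => t
         | NONE => VAR x)
  | apply s (TERM (f, args)) = TERM (f, map (apply s) args)

(* compose (s1, s2) = s1 o s2: apply s2 first, then s1 *)
fun compose (s1 : subst, s2 : subst) : subst =
  let
    val s2' = map (fn (x, t) => (x, apply s1 t)) s2
    val s1' = List.filter
                (fn (x, _) => not (List.exists (fn (y, _) => y = x) s2)) s1
  in
    List.filter (fn (x, t) => t <> VAR x) (s2' @ s1')
  end

(* occurs (x, t): does variable x occur in term t? *)
fun occurs (x, VAR y) = (x = y)
  | occurs (x, TERM (_, args)) = List.exists (fn t => occurs (x, t)) args

fun unify (VAR x, VAR y) : subst = if x = y then [] else [(x, VAR y)]
  | unify (t as TERM _, VAR y) = unify (VAR y, t)
  | unify (VAR x, t as TERM _) =
      if occurs (x, t) then raise Failure else [(x, t)]
  | unify (TERM (f, fargs), TERM (g, gargs)) =
      if f <> g orelse length fargs <> length gargs then raise Failure
      else
        let
          fun unifyargs ([], [], sigma) = sigma
            | unifyargs (t1 :: r1, t2 :: r2, sigma) =
                (* unify the next pair under the bindings found so far *)
                let val tau = unify (apply sigma t1, apply sigma t2)
                in unifyargs (r1, r2, compose (tau, sigma)) end
            | unifyargs _ = raise Failure
        in
          unifyargs (fargs, gargs, [])
        end

(* h(f(X), Y) = h(Y, f(g(a))) *)
val ga    = TERM ("g", [TERM ("a", [])])
val lhs   = TERM ("h", [TERM ("f", [VAR "X"]), VAR "Y"])
val rhs   = TERM ("h", [VAR "Y", TERM ("f", [ga])])
val sigma = unify (lhs, rhs)

(* the occurs check rejects X = f(X) *)
val cyclic = (unify (VAR "X", TERM ("f", [VAR "X"])); false)
             handle Failure => true
```

	Applying sigma to lhs and to rhs gives the same term,
	h(f(g(a)), f(g(a))), and cyclic is true because the occurs check
	raises Failure.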
	
** complexity
	Unification algorithms exist that
		run in time linear in the size of the terms to be unified
	Unification is log-space complete for P,
		so it is hard to do in really small space

	But deciding whether an ML expression is well-typed is
		 exponential! (complete for deterministic exponential time)

	Why?
	
----------------
- fun pair x y = (fn z => z x y);
val pair = fn : 'a -> 'b -> ('a -> 'b -> 'c) -> 'c

- let fun x0 z = z
in
   let val x1 = pair x0 x0
   in
     let val x2 = pair x1 x1
     in
       x2
     end
   end
end;
val it = fn : (((('a -> 'a) -> ('b -> 'b) -> 'c) -> 'c)
	    -> ((('d -> 'd) -> ('e -> 'e) -> 'f) -> 'f) -> 'g) -> 'g
----------------
	Continuing the pattern up to xn, the type includes
		Omega(2^{cn}) distinct type variables
		for an expression of size O(n).

----------------
- let fun x1 y = pair y y
in
    let fun x2 y = x1(x1(y))
    in
	x2 (fn z => z)
    end
end;
val it = fn : (((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b)
	    -> ((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) -> 'c
----------------
	Continuing the pattern up to xn, the type has length
		Omega(2^{2^{cn}}) when printed as a string,
		or Omega(2^{cn}) nodes when represented as a graph,
		for an expression of size O(n).
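
	The gap between the string (tree) size and the graph (dag) size
	can be measured directly. The sketch below is an illustration
	with hypothetical names (ty, pairTy, iterate, treeSize, dagSize):
	it builds the type obtained by applying the pairing k times to
	'a -> 'a, using a fresh result variable each time, and counts
	nodes both ways. It assumes the chain continues as
	x3 y = x2(x2(y)) and so on, so that xi applies the pairing
	2^(i-1) times.

```sml
datatype ty = TVar of int | Arrow of ty * ty

(* type of (pair y y) when y : t, with b the fresh result variable *)
fun pairTy (t, b) = Arrow (Arrow (t, Arrow (t, b)), b)

(* iterate k: apply the pairing k times to 'a -> 'a,
   using TVar i as the fresh variable at step i *)
fun iterate k =
  let fun go (i, t) = if i > k then t else go (i + 1, pairTy (t, TVar i))
  in go (1, Arrow (TVar 0, TVar 0)) end

(* number of nodes when the type is written out as a tree *)
fun treeSize (TVar _) = 1
  | treeSize (Arrow (a, b)) = 1 + treeSize a + treeSize b

(* distinct subterms = nodes of the dag with maximal sharing *)
fun subterms (t, seen) =
  if List.exists (fn u => u = t) seen then seen
  else case t of
           TVar _ => t :: seen
         | Arrow (a, b) => t :: subterms (b, subterms (a, seen))

fun dagSize t = length (subterms (t, []))

(* k = 2 is the type printed for x2 (fn z => z) above *)
val tree2 = treeSize (iterate 2)   (* 27  *)
val dag2  = dagSize  (iterate 2)   (* 10  *)
val tree4 = treeSize (iterate 4)   (* 123 *)
val dag4  = dagSize  (iterate 4)   (* 18  *)
```

	Each pairing roughly doubles the tree but adds only four dag
	nodes, so a chain that performs 2^(n-1) pairings gives a doubly
	exponential string and a singly exponential graph.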

	So (in essence), for a program of size O(n),
		the system of equations generated may have size Omega(2^{cn}),
			even when represented compactly.