COP 5021 meeting -*- Outline -*-

* Type and Effect Systems (1.6)

** setting
------------------------------------------
           SETTING

Type checking is usually syntax-directed
     and compositional

So can be implemented by:


------------------------------------------
   ... a recursive walk over the AST (of a program)

       e.g., each expression has a type,
             and the type is a function
             of the types of its subexpressions
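
   The recursive walk can be sketched as follows. This is only an
   illustration: the mini-language (Num, Bool, Add, Less, If) and the
   function name type_of are assumptions made for this sketch, not part
   of the course material.

```python
from dataclasses import dataclass

# Hypothetical mini-language, invented for this sketch.
@dataclass
class Num:
    value: int

@dataclass
class Bool:
    value: bool

@dataclass
class Add:
    left: object
    right: object

@dataclass
class Less:
    left: object
    right: object

@dataclass
class If:
    cond: object
    then: object
    other: object

def type_of(e):
    """Return "Int" or "Bool" for e, or raise TypeError if e is ill-typed.

    The type of each node is computed only from the types of its children,
    which is what makes the check syntax-directed and compositional.
    """
    if isinstance(e, Num):
        return "Int"
    if isinstance(e, Bool):
        return "Bool"
    if isinstance(e, (Add, Less)):
        if type_of(e.left) != "Int" or type_of(e.right) != "Int":
            raise TypeError("operands must be Int")
        return "Int" if isinstance(e, Add) else "Bool"
    if isinstance(e, If):
        if type_of(e.cond) != "Bool":
            raise TypeError("condition must be Bool")
        t = type_of(e.then)
        if type_of(e.other) != t:
            raise TypeError("branches must have the same type")
        return t
    raise TypeError(f"unknown expression: {e!r}")

# if 1 < 2 then 1 + 2 else 0
example = If(Less(Num(1), Num(2)), Add(Num(1), Num(2)), Num(0))
print(type_of(example))  # Int
```

   An ill-typed term such as Add(Num(1), Bool(True)) makes type_of raise,
   which corresponds to failing to find a typing proof.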

** goals

   To fit program analysis into
     the syntax-directed machinery of type checking.
   To extend type systems with information about effects
     (or other semantic information).

** idea

   Q: What's the basic idea?
------------------------------------------
       TYPE AND EFFECT SYSTEMS (1.6)

Basic idea?


------------------------------------------
  ... Use types to express the analysis.
      Ignores control flow (conventionally)

   Two standard techniques:
      - annotated type systems
      - effect systems

** annotated type system example

*** type notation
------------------------------------------
     NATURAL DEDUCTION STYLE NOTATION

Type systems are written with rules
   of the form

         G |- f : T1 -> T,
           G |- e : T1
[e-rule] _________________   if C
          G |- f(e) : T

[int c]    |- 0 : Int
------------------------------------------
      ... explain that:
            "[e-rule]" and "[int c]" are rule names,
            the horizontal line separates
               hypotheses (above) from the conclusion(s) below
            all (meta-)variable names used (like G, f, e, T1, T, C)
               are universally quantified over the entire rule
            ":" means "has type"
            "|-" (\vdash) means [the context to the left] proves
                 the conclusion to the right
            "G |- e : T" is called a judgment
            Other symbols are typically in the grammar of the language
                or the type system.
             G (\Gamma) is a type environment usually,
                which is a finite mapping from variable names to types
             e is an expression,
             T (often \tau) is a type
             C is a condition (written in mathematical notation),
               The rule can be used in a proof (only) when C holds
         Axioms, like [int c], are rules with no hypotheses;
                 they are often written as just the conclusion
                 (with no horizontal line), as here

         Conventionally, the rules only state what can be proven;
            failure to find a proof means that the program is ill-typed.
         We say that the proof gets "stuck" on type errors.
         This assumes that there is only one way to do a proof
            (the system is deterministic or at least Church-Rosser,
             meaning it always comes to the same conclusion)

*** annotated type system example

   For taint analysis, we seek sets of variables that may
   have been tainted up to a program point.

------------------------------------------
 ANNOTATED TYPE SYSTEM FOR TAINT ANALYSIS

Types are sets of variables
   that may be tainted
   
[asg]  [x := a]^l : T1 -> T2
              if T2 = (T1 - {x}) \cup {x | FV(a) \cap T1 \neq {} }

[skip] [skip]^l : T -> T

       S1 : T1 -> T2,  S2: T2 -> T3
[seq]  ____________________________
             S1; S2 : T1 -> T3

       S1 : T1 -> T2,  S2: T1 -> T2
[if] ___________________________________
     if [b]^l then S1 else S2 : T1 -> T2

            S : T1 -> T1
[wh]  _____________________________
      while [b]^l do S : T1 -> T1

[re]  [read x]^l : T1 -> T2
                       if T2 = T1 \cup {x}

[sa]  [sanitize x]^l : T1 -> T2
                       if T2 = T1 - {x}

[pr]  [print x]^l : T1 -> T1
                       if x \not\in T1

      S : T2 -> T3
[sub] _____________  if T1 \subseteq T2,
      S : T1 -> T4      T3 \subseteq T4

------------------------------------------

        Q: What's the meaning of the basic types?
           set of variables that might be tainted

        Q: What's the meaning of the arrow types, like T1 -> T4?
           if the set of possibly tainted variables on entry to the
           statement is contained in T1, then the set of possibly
           tainted variables on exit is contained in T4
           (see the sub rule)

        Q: How would you explain the assignment rule?
        Q: How would you explain the if rule?
        Q: Why don't we have to check the type of the condition in an if?
             the grammar does that and it doesn't affect the tainting
        Q: How would you explain the while rule?
        Q: How would you explain the sub rule?
           Subsumption: it builds in the meaning of arrow types (as above);
           a statement may be given a smaller input set
           and a larger output set.
           Note: the subsumption rule is correct as written!

        Q: What would we need to change if we wanted to do an
            information flow security analysis instead of a taint analysis?
           Use the tainted variable tested in conditions
             to taint each assignment in the body (or bodies)
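
        The rules can be read as a forward computation: given T1, compute
        a T2 such that S : T1 -> T2 is derivable. The sketch below does
        this under some assumptions: statements are encoded as tuples
        (a hypothetical encoding), labels are dropped, and [sub] is used
        implicitly in the "if" and "while" cases. The names taint_out and
        prog are made up for this illustration.

```python
def taint_out(stmt, t_in):
    """Return a set T_out such that stmt : t_in -> T_out is derivable.

    Raises ValueError when a [pr] side condition fails, i.e. when a
    possibly tainted variable would be printed.
    """
    t_in = frozenset(t_in)
    kind = stmt[0]
    if kind == "skip":                      # [skip]
        return t_in
    if kind == "asg":                       # [asg]: ("asg", x, FV(a))
        _, x, fv = stmt
        tainted = {x} if set(fv) & t_in else set()
        return (t_in - {x}) | tainted
    if kind == "read":                      # [re]
        return t_in | {stmt[1]}
    if kind == "sanitize":                  # [sa]
        return t_in - {stmt[1]}
    if kind == "print":                     # [pr]
        if stmt[1] in t_in:
            raise ValueError(f"print of possibly tainted variable {stmt[1]}")
        return t_in
    if kind == "seq":                       # [seq]
        return taint_out(stmt[2], taint_out(stmt[1], t_in))
    if kind == "if":                        # [if] with [sub]: join both branches
        return taint_out(stmt[1], t_in) | taint_out(stmt[2], t_in)
    if kind == "while":                     # [wh] with [sub]: grow an invariant T
        inv = t_in                          # until the body maps T into T
        while True:
            nxt = inv | taint_out(stmt[1], inv)
            if nxt == inv:
                return inv
            inv = nxt
    raise ValueError(f"unknown statement kind: {kind}")

# read x; y := x; sanitize y; print y
prog = ("seq", ("read", "x"),
        ("seq", ("asg", "y", ["x"]),
         ("seq", ("sanitize", "y"), ("print", "y"))))
print(sorted(taint_out(prog, set())))  # ['x']
```

        The loop in the "while" case terminates because the invariant
        only grows and the set of program variables is finite.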

**** type checking
------------------------------------------
             EXAMPLE

  [y := 0]^1;
  [print y]^2;
  [read x]^3;
  while [x < 0]^4
  do ([y := y+1]^5;
      [print y]^6;
      [read x]^7);
  [z := x]^8

------------------------------------------

------------------------------------------
       TYPE CHECKING

Idea: accumulate constraints.

  [y := 0]^1: T1 -> T2  [asg]
  [print y]^2: T2 -> T3 [pr]
  ___________________________________ [seq]
  ([y := 0]^1;[print y]^2)
     : T1 -> T3
        if T2 = T1-{y}
        and y \not\in T2
        and T3 = T2


------------------------------------------
 ...
  ([y := 0]^1;[print y]^2): T1 -> T3 [seq]
  [read x]^3: T3 -> T4               [re]
  ___________________________________ [seq]
  ([y := 0]^1;[print y]^2; [read x]^3)
    : T1 -> T4
       if T2 = T1-{y}
       and y \not\in T2
       and T3 = T2
       and T4 = T3 \cup {x}


  [y := y+1]^5 : T5 -> T6 [asg]
  [print y]^6 : T6 -> T7  [pr]
  ___________________________________ [seq]
  ([y := y+1]^5; [print y]^6) : T5 -> T7
     if T6 = (T5 - {y}) \cup ({y} \cap T5)
     and T7 = T6 and y \not\in T6


  ([y := y+1]^5;[print y]^6) : T5 -> T7 [seq]
  [read x]^7: T7 -> T8  [re]
  ___________________________________ [seq]
  ([y := y+1]^5; [print y]^6; [read x]^7) : T5 -> T8
     if T6 = (T5 - {y}) \cup ({y} \cap T5)
     and T7 = T6 and y \not\in T6
     and T8 = T7 \cup {x}

If we try to apply the [wh] rule, we see that the body must have a type
of the form T -> T (which is then also the type of the while statement),
but the body has type T5 -> T8, so we need to use the [sub] rule...

 ([y := y+1]^5; [print y]^6; [read x]^7): T5 -> T8
 _________________________________________________ [sub]
 ([y := y+1]^5; [print y]^6; [read x]^7): T9 -> T9
       if T9 \subseteq T5
       and T8 \subseteq T9

     these constraints mean
        T9 \subseteq T5 
        T6 = (T5 - {y}) \cup ({y} \cap T5)
        T7 = T6 and y \not\in T6
        T8 = T7 \cup {x}
        T8 \subseteq T9

        
 ([y := y+1]^5; [print y]^6; [read x]^7): T9 -> T9 [sub]
  ___________________________________ [wh]
  while [x < 0]^4
  do ([y := y+1]^5; [print y]^6; [read x]^7) : T9 -> T9


  ([y := 0]^1;[print y]^2; [read x]^3) : T1 -> T4 [seq]
  while [x < 0]^4
  do ([y := y+1]^5; [print y]^6; [read x]^7) : T9 -> T9 [wh]
 _______________________________________________ [seq]
 ([y := 0]^1;[print y]^2; [read x]^3);
  while [x < 0]^4
  do ([y := y+1]^5; [print y]^6; [read x]^7) : T1 -> T9
    if T4 = T9

 ([y := 0]^1;[print y]^2; [read x]^3);
  while [x < 0]^4
  do ([y := y+1]^5; [print y]^6; [read x]^7) : T1 -> T9
 [z := x]^8 : T9 -> T10
 _______________________________________________ [seq]
 (([y := 0]^1;[print y]^2; [read x]^3);
  while [x < 0]^4
  do ([y := y+1]^5; [print y]^6; [read x]^7); 
  [z := x]^8) : T1 -> T10
    if T10 = (T9-{z}) \cup {z | x \in T9}

       Q: So what are all the constraints?
------------------------------------------
             CONSTRAINTS

  T2 = T1-{y}
  y \not\in T2
  T3 = T2
  T4 = T3 \cup {x}
  T6 = (T5-{y}) \cup ({y} \cap T5)
  T7 = T6
  y \not\in T6
  T8 = T7 \cup {x}
  T9 \subseteq T5
  T8 \subseteq T9
  T4 = T9
  T10 = (T9-{z}) \cup {z | x \in T9}

So, what's a solution?


------------------------------------------
     ...
          If T1 = {}, we have
          T2 = {}   (note: y \not\in T2 follows)
          T3 = {}
          T4 = {x}
          let's guess T5 = {x}, by analogy to data flow analysis
          T6 = {x}
          T7 = {x}
          T8 = {x}
          T9 = {x}
          T10 = {x,z}   (by [asg]: z := x taints z because x \in T9)

    Can solve constraints as previously.
    Can accumulate constraints in different orders.
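
    A candidate solution can be checked mechanically. The sketch below
    uses Python sets, starts from T1 = {} and the guess T5 = {x}, and
    computes T10 directly from the [asg] rule for z := x. The variable
    and dictionary names are chosen for this illustration only.

```python
T1 = set()
T2 = T1 - {"y"}
T3 = T2
T4 = T3 | {"x"}
T5 = {"x"}                              # the guess, by analogy to data flow analysis
T6 = (T5 - {"y"}) | ({"y"} & T5)
T7 = T6
T8 = T7 | {"x"}
T9 = T4                                 # forced by the constraint T4 = T9
# [asg] for z := x: z becomes tainted exactly when x is tainted on entry.
T10 = (T9 - {"z"}) | ({"z"} if "x" in T9 else set())

# The remaining constraints are side conditions that must hold.
checks = {
    "y not in T2": "y" not in T2,
    "y not in T6": "y" not in T6,
    "T9 subset of T5": T9 <= T5,
    "T8 subset of T9": T8 <= T9,
}
assert all(checks.values()), checks
print(sorted(T9), sorted(T10))  # ['x'] ['x', 'z']
```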

*** annotated type constructors

    Ideas:
      - express modification to analysis results (deltas)
      - annotate the type constructors instead of using analysis
          results as the types.
      - simultaneously do definitely/possibly (must/may) analyses

------------------------------------------
            EXAMPLE

Judgments:
                    XMust
     S : Sigma -------------> Sigma
                    YMay

Where XMust and YMay are sets of variables
(that S must assign and may assign).

                           {x}
[asg]  [x := a]^l : Sigma ----> Sigma
                           {x}

                          {}
[skip] [skip]^l : Sigma -----> Sigma
                          {}

                    X1
       S1 : Sigma -----> Sigma,
                    Y1
                       X2
          S2 : Sigma -----> Sigma
                       Y2
[seq]  --------------------------------
                         X3
         S1; S2 : Sigma ----> Sigma
                         Y3
                if X3 = X1 \cup X2,
                   Y3 = Y1 \cup Y2

                    X1
       S1 : Sigma -----> Sigma,
                    Y1
                       X2
          S2 : Sigma -----> Sigma
                       Y2
[if]  --------------------------------
     if [b]^l then S1 else S2
                         X3
                : Sigma ----> Sigma
                         Y3
                if X3 = X1 \cap X2,
                   Y3 = Y1 \cup Y2

                      X
          S : Sigma -----> Sigma
                      Y
[wh]  --------------------------------
      while [b]^l do S 
                         {}
                : Sigma ----> Sigma
                         Y

                 X
      S : Sigma ----> Sigma
                 Y
[sub] ----------------------
                  X'
      S : Sigma -----> Sigma
                  Y'
           if X' \subseteq X,
              Y \subseteq Y'
------------------------------------------

    Q:  Why does the assignment rule make sense?
    Q:  How do you explain the if rule?
    Q:  How do you explain the while rule?
    Q:  How would you deal with assert statements?
    Q:  How would you deal with a try statement?

------------------------------------------
        TYPE CHECKING EXAMPLE

Idea: accumulate constraints.

                    {q}
  [q := 0]^1: Sigma ---> Sigma, [asg]
                    {q}

                    {r}
  [r := x]^2: Sigma ---> Sigma [asg]
                    {r}
  _________________________________ [seq]
  ([q := 0]^1;[r := x]^2)
             {q,r}
     : Sigma ----> Sigma
             {q,r}

                      {r}
  [r := r-y]^4: Sigma ---> Sigma, [asg]
                      {r}

                      {q}
  [q := q+1]^5: Sigma ---> Sigma [asg]
                      {q}
  _________________________________ [seq]
  ([r := r-y]^4;[q := q+1]^5)
             {q,r}
     : Sigma ----> Sigma
             {q,r}
  ___________________________________[wh]
  while [r >= y]^3
  do ([r := r-y]^4;[q := q+1]^5)
              {}
     : Sigma ----> Sigma
             {q,r}

so by the seq rule, 


------------------------------------------

  ([q := 0]^1;[r := x]^2);
  while [r >= y]^3
  do ([r := r-y]^4;[q := q+1]^5)
             {q,r}
     : Sigma ----> Sigma
             {q,r}

And with seq again

  (([q := 0]^1;[r := x]^2);
  while [r >= y]^3
  do ([r := r-y]^4;[q := q+1]^5));
  assert [0<=r and r<y and q*y+r == x]^6
             {q,r}
     : Sigma ----> Sigma
             {q,r}

    Q:  How is this different from annotated base types?
      It presents the changes (deltas) instead of the accumulations.
      This is "higher-order".

** effect systems

   These are similar to annotated base types, except that in
   higher-order languages, we have latent effects associated with
   functions.

*** example: call tracking analysis

    We want to know which functions may be called while evaluating a given expression.
------------------------------------------
              EXAMPLE

Judgments:

   Gamma |- e : t & phi

 where Gamma : Var -> Type
       e : Expression
       t : Type
       phi : Effect
                              phi
       Type = int | bool | t1 ---> t2

       Effect = Powerset(FunName)


[var] Gamma |- x : t & {}, if Gamma(x) = t

     Gamma[x |-> tx] |- e : t & phi
[fn] --------------------------------
     Gamma |- fn_pi x => e
                     phi2
               : tx ------> t & {}
            if phi2 = phi \cup {pi}

                       phi
      Gamma |- e1 : t2 ---> t & phi1,
      Gamma |- e2 : t2 & phi2
[app] --------------------------------
      Gamma |- e1 e2 : t & phi3
         if phi3 = phi1 \cup phi2 \cup phi
------------------------------------------

    Q:  What do the judgments mean?
    Q:  What do the arrow types mean?
    Q:  How would you explain the fn rule?
    Q:  How would you explain the app rule?

    One advantage of this is that "effect systems are often
    implemented as extensions of type inference algorithms."
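
    The three rules can be turned into a small checker. The sketch below
    makes several assumptions that are not in the rules above: function
    parameters carry explicit type annotations (so no inference is
    needed), integer literals are added with type int and effect {},
    and argument types must match exactly (no subeffecting). All class
    and function names are invented for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arrow:
    arg: object
    latent: frozenset      # phi on the arrow: effect released when applied
    res: object

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lit:                 # assumption: integer literals, not in the rules above
    value: int

@dataclass(frozen=True)
class Fn:
    pi: str                # the function's name, as in fn_pi x => e
    param: str
    param_type: object     # assumption: explicit annotation for tx
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def check(env, e):
    """Return (t, phi) with env |- e : t & phi, or raise TypeError."""
    if isinstance(e, Lit):
        return "int", frozenset()
    if isinstance(e, Var):                              # [var]
        if e.name not in env:
            raise TypeError(f"unbound variable {e.name}")
        return env[e.name], frozenset()
    if isinstance(e, Fn):                               # [fn]
        t, phi = check({**env, e.param: e.param_type}, e.body)
        # Building a function calls nothing; the body's effect becomes latent.
        return Arrow(e.param_type, phi | {e.pi}, t), frozenset()
    if isinstance(e, App):                              # [app]
        t1, phi1 = check(env, e.fun)
        t2, phi2 = check(env, e.arg)
        if not isinstance(t1, Arrow) or t1.arg != t2:
            raise TypeError("ill-typed application")
        return t1.res, phi1 | phi2 | t1.latent
    raise TypeError(f"unknown expression: {e!r}")

ident = Fn("F", "x", "int", Var("x"))                   # fn_F x => x
f_type = Arrow("int", frozenset({"F"}), "int")
apply_g = Fn("G", "f", f_type, App(Var("f"), Lit(1)))   # fn_G f => f 1

t, phi = check({}, App(apply_g, ident))
print(t, sorted(phi))  # int ['F', 'G']
```

    Note that checking ident alone yields the empty effect: the name F
    only appears in the latent effect on its arrow type, and is released
    when the function is applied.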