Com S 641 Lecture -*- Outline -*-

* Theoretical Properties (2.2)

  this section uses structural operational semantics
  to show that the live variables analysis is correct.

  Q:  Why are formal correctness proofs useful for program analysis?
  a way to find subtle errors and correct them

** kinds of semantics
------------------------------------------
      KINDS OF SEMANTICS

Axiomatic

  Semantics as declarative specifications
     specifications + proof techniques

Denotational

  Semantics as functional programming
     domains + meaning functions

Operational

  Semantics as logic programming
     configurations + rewrite rules

------------------------------------------

*** denotational example
------------------------------------------
   DENOTATIONAL SEMANTICS OF EXPRESSIONS
           FOR WHILE LANGUAGE

Syntax (1.2):

  x,y \in Var
  n \in NumericLiteral
  a \in AExp
  b \in BExp
  op_a \in Op_a
  op_b \in Op_b
  op_r \in Op_r

  a ::= x | n | a1 op_a a2
  b ::= true | false | not b | b1 op_b b2
        | a1 op_r a2

Domains:
        Z = { ..., -1, 0, 1, ... }
        T = { true, false }
  s \in State = Var -> Z

Meaning Functions:

  A : AExp -> State -> Z

  A[[x]]s = s(x)
  A[[n]]s = N(n)
  A[[a1 op_a a2]]s
     = (O_a[[op_a]]) (A[[a1]]s) (A[[a2]]s)

where

  N : NumericLiteral -> Z
  N[[n]] = the value of literal n

  O_a : Op_a -> (Z -> Z -> Z)

  O_a[[+]] = \n1 n2 . n1 + n2
  ...

------------------------------------------

        E.g., in C or C++, N[[015]] = 13, since 015 is in octal.

     Q:  Why is the state a parameter to the meaning of arithmetic
         expressions?
         it's needed to evaluate variable references
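
         The meaning function A can be sketched in Python. This is only
         an illustration: the tuple encoding of expressions, the
         dictionary representation of states, and the operator table
         are assumptions made here, not part of the notes. N follows
         the C convention mentioned above, where a leading 0 marks an
         octal literal.

```python
import operator

# Assumed encoding (not from the notes): arithmetic expressions as tuples
#   ("var", "x") | ("num", "015") | ("op", "+", a1, a2)
# and states as dictionaries from variable names to integers.

O_a = {"+": operator.add, "-": operator.sub, "*": operator.mul}  # O_a[[op_a]]

def N(literal):
    """N[[n]]: the value of literal n; a leading 0 means octal, as in C."""
    if len(literal) > 1 and literal.startswith("0"):
        return int(literal, 8)
    return int(literal, 10)

def A(a):
    """A[[a]]: returns a function from states to Z, so A(a)(s) mirrors A[[a]]s."""
    tag = a[0]
    if tag == "var":
        return lambda s: s[a[1]]          # A[[x]]s = s(x)
    if tag == "num":
        return lambda s: N(a[1])          # A[[n]]s = N(n), independent of s
    if tag == "op":
        _, op, a1, a2 = a
        return lambda s: O_a[op](A(a1)(s), A(a2)(s))
    raise ValueError(f"not an arithmetic expression: {a!r}")

s = {"x": 6, "y": 2}
print(A(("op", "+", ("var", "x"), ("num", "015")))(s))   # 6 + 13
```

         Only the "var" case consults s, which is why the state has to
         be a parameter.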

     Q: How would you define B : BExp -> (State -> T)?


** structural operational semantics (2.2.1)

*** varieties
------------------------------------------
              VARIATIONS
 ON STRUCTURAL OPERATIONAL SEMANTICS

Big step = map from initial to final state

Small step = map from a state to next

------------------------------------------

*** terminal transition system, little step, computation semantics
	terminal transition system (Plotkin),
	little step (C. Gunter), computation (M. Hennessy)

	essentially using rewriting as a universal machine

**** idea
	To give the semantics of a programming language,
		use two auxiliary functions: input and output
------------------------------------------
   COMPUTATION (LITTLE STEP) SEMANTICS

	       Meaning
     Programs <-------> Answers
       |                 ^
 input |                 | output
       |                 |
       v     reducesto*  |
     Gamma <-----------> T


def: a in Meaning[[P]] iff


------------------------------------------
	the Meaning relation is often partial,
		and defined by going around the diagram

        ... there is a g in T
           such that input[[P]] reducesto* g
                 and output(g) = a.

	Meaning[[P]] is a function
		when reducesto* is Church-Rosser (confluent)

**** terminal transition systems
	this is the guts of the system,
	defining the abstract machine by defining --> (reducesto)
------------------------------------------
    TERMINAL TRANSITION SYSTEM (TTS)

(Gamma, -->, Terminal)

Gamma : 

--> :

Terminal : 


reducesto*
	reflexive, transitive closure
------------------------------------------
        ... a set of configurations (e.g., g)

	i.e., configurations of the abstract machine

        ... a binary relation on Gamma

	--> is sometimes written as ==> or reducesto

        ... subset of terminal configurations (Plotkin called it T),
          must be such that
          if g in Terminal,
          then there is no g'
             such that (g --> g')
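
        A toy terminal transition system, going around the Meaning
        diagram, might look like the sketch below. The machine
        (Euclid's algorithm on pairs of integers) and the names
        input_, output, step, and meaning are hypothetical and chosen
        only for illustration.

```python
# Assumed toy machine (not from the notes):
#   Gamma    = pairs (a, b) of non-negative integers
#   -->      : (a, b) --> (b, a mod b)   when b != 0
#   Terminal = { (a, 0) }

def step(g):
    """The --> relation, as a partial function: None means no transition."""
    a, b = g
    return (b, a % b) if b != 0 else None

def is_terminal(g):
    return g[1] == 0

def input_(P):
    """input: here a 'program' is simply the pair of numbers."""
    return P

def output(g):
    """output: read the answer from a terminal configuration."""
    return g[0]

def meaning(P, fuel=1000):
    """a in Meaning[[P]] iff input[[P]] reducesto* g, g in Terminal, output(g) = a."""
    g = input_(P)
    while not is_terminal(g):
        if fuel == 0:
            raise RuntimeError("no terminal configuration reached")
        g = step(g)
        fuel -= 1
    return output(g)

# Terminal configurations must have no successors.
assert all(step((a, 0)) is None for a in range(10))
print(meaning((12, 18)))   # (12,18) --> (18,12) --> (12,6) --> (6,0)
```

        Because step is a function, --> is deterministic here, so
        Meaning is a (partial) function.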

*** semantics of the while language (2.2.1)

------------------------------------------
      CONFIGURATIONS AND TRANSITIONS

Configurations (Gamma):

     Gamma = (Stmt x State) + State
s in State = Var -> Z

Terminal Configurations:

    Terminal = State
   
------------------------------------------

    Q:  What does a configuration of the form (S, s) mean?
        statement S is executing and
        state s is the current state

    Q:  What does a configuration of the form s mean?
        state s is a terminal (final) state

    Q:  Must every execution reach a final state?
        not in the while language...

    Note that we're ignoring errors like division by zero...

    Q:  What part of the state does an expression depend on?
        only the values of its free variables

        Lemma: Let a \in AExp be given.
        If (\forall x \in FV(a) :: s1(x) = s2(x))
        then A[[a]]s1 = A[[a]]s2.

        The proof is by structural induction

------------------------------------------
     TRANSITIONS (Table 2.6)

[ass] ([x := a]^l, s) --> s[x |-> A[[a]]s]

[skip] ([skip]^l, s) --> s

          (S1, s) --> (S1', s')
[seq1] --------------------------
       (S1;S2, s) --> (S1';S2, s')

          (S1, s) --> s'
[seq2] --------------------------
       (S1;S2, s) --> (S2, s')

[if1] (if [b]^l then S1 else S2, s)
         --> (S1, s)
                       if B[[b]]s = true

[if2] (if [b]^l then S1 else S2, s)
         --> (S2, s)
                       if B[[b]]s = false


[wh1] (while [b]^l do S, s)
         --> (S; while [b]^l do S, s)
                       if B[[b]]s = true

[wh2] (while [b]^l do S, s)
         --> s
                       if B[[b]]s = false

------------------------------------------

        Q:  What do the sequence rules do?
        Q:  Do the rules allow evaluation of the true or false part of
            an if-statement before evaluating the condition?
            no

------------------------------------------
           EXAMPLE

Let sqrxy be the state in which
    q has value q, ..., y has value y
E.g., s7062(q) = 7, s0062(x) = 6.

  <[q := 0]^1; [r := x]^2;
   while [r >= y]^3 do ([r := r-y]^4;
                        [q := q+1]^5),
   s7062>
--> {by [seq2] and [ass]}
  <[r := x]^2;
   while [r >= y]^3 do ([r := r-y]^4;
                        [q := q+1]^5),
   s0062>
--> {by [seq2] and [ass]}
  <while [r >= y]^3 do ([r := r-y]^4;
                        [q := q+1]^5),
   s0662>
--> 


------------------------------------------
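
        The transition rules of Table 2.6 can be sketched as a
        one-step function in Python. The encoding is an assumption
        made for this sketch: statements are tuples, states are
        dictionaries, and A[[a]] and B[[b]] are supplied directly as
        Python functions on states. A result of (None, s') stands for
        the terminal configuration s'.

```python
# Assumed encoding (not from the notes):
#   ("ass", x, a, l) | ("skip", l) | ("seq", S1, S2)
#   ("if", b, l, S1, S2) | ("while", b, l, S)

def step(S, s):
    """One --> step from (S, s): returns (S', s'), or (None, s') if terminal."""
    kind = S[0]
    if kind == "ass":                          # [ass]
        _, x, a, _l = S
        return None, {**s, x: a(s)}
    if kind == "skip":                         # [skip]
        return None, s
    if kind == "seq":
        _, S1, S2 = S
        S1p, sp = step(S1, s)
        if S1p is None:                        # [seq2]
            return S2, sp
        return ("seq", S1p, S2), sp            # [seq1]
    if kind == "if":                           # [if1] / [if2]
        _, b, _l, S1, S2 = S
        return (S1 if b(s) else S2), s
    if kind == "while":
        _, b, _l, body = S
        if b(s):                               # [wh1]
            return ("seq", body, S), s
        return None, s                         # [wh2]
    raise ValueError(f"not a statement: {S!r}")

def run(S, s, fuel=10_000):
    """Iterate --> until a terminal configuration; fuel guards against loops."""
    while S is not None:
        if fuel == 0:
            raise RuntimeError("out of fuel")
        S, s = step(S, s)
        fuel -= 1
    return s

loop = ("while", lambda s: s["r"] >= s["y"], 3,
        ("seq", ("ass", "r", lambda s: s["r"] - s["y"], 4),
                ("ass", "q", lambda s: s["q"] + 1, 5)))
prog = ("seq", ("ass", "q", lambda s: 0, 1),
        ("seq", ("ass", "r", lambda s: s["x"], 2), loop))
s7062 = {"q": 7, "r": 0, "x": 6, "y": 2}

print(step(prog, s7062)[1])   # state after the first [seq2]/[ass] step
print(run(prog, s7062))       # final state
```

        Note that step never inspects the labels; they are carried
        along only so that the analysis can refer to program points.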

        Q:  How are the labels used in the semantics?

        Q:  What would be a rule for a one-armed if-statement?
        Q:  How would you add a rule for for loops?

*** properties of the semantics

    To try to prove something about an analysis based on the semantics
    we have to be able to relate the flow graph and the
    configurations.

    Q:  As the configurations evolve, does the flow graph for the
    statement in the configuration change?  If so, how?

------------------------------------------
        PROPERTIES OF THE SEMANTICS

How does the flow graph change
as the configurations change?

Case 1:  <S,s> --> <S',s'>

  Compare           vs.
           final(S)    final(S')
           flow(S)     flow(S')
           blocks(S)   blocks(S')

Case 2:  <S,s> --> s'

  what can we say about the graph of S?

------------------------------------------
      ... \supseteq
      ... \supseteq
      ... \supseteq  (a homework problem)

      ... final(S) = {init(S)}  (S is an elementary block)

      This is all in lemma 2.14
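
      The inclusions above can be checked on the division example with
      a small sketch of init, final, and flow, following their
      definitions in section 2.1. The tuple encoding of statements is
      an assumption made here, and expressions are kept as plain
      strings because only the labels matter.

```python
# Assumed encoding (not from the notes):
#   ("ass", x, a, l) | ("skip", l) | ("seq", S1, S2)
#   ("if", b, l, S1, S2) | ("while", b, l, S)

def init(S):
    kind = S[0]
    if kind in ("ass", "skip"):
        return S[-1]
    if kind == "seq":
        return init(S[1])
    return S[2]                       # if / while: label of the test

def final(S):
    kind = S[0]
    if kind in ("ass", "skip"):
        return {S[-1]}
    if kind == "seq":
        return final(S[2])
    if kind == "if":
        return final(S[3]) | final(S[4])
    return {S[2]}                     # while: label of the test

def flow(S):
    kind = S[0]
    if kind in ("ass", "skip"):
        return set()
    if kind == "seq":
        _, S1, S2 = S
        return flow(S1) | flow(S2) | {(l, init(S2)) for l in final(S1)}
    if kind == "if":
        _, _b, l, S1, S2 = S
        return flow(S1) | flow(S2) | {(l, init(S1)), (l, init(S2))}
    _, _b, l, body = S                # while
    return flow(body) | {(l, init(body))} | {(lp, l) for lp in final(body)}

loop = ("while", "r >= y", 3,
        ("seq", ("ass", "r", "r-y", 4), ("ass", "q", "q+1", 5)))
S_after = ("seq", ("ass", "r", "x", 2), loop)        # after one [seq2] step
S_before = ("seq", ("ass", "q", "0", 1), S_after)

print(sorted(flow(S_before)))
print(flow(S_before) >= flow(S_after), final(S_before) >= final(S_after))
```

      Here the step drops the edge (1, 2) from the flow graph and
      leaves the final labels unchanged.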

** correctness of the live variables analysis (2.2.2)

*** failed attempt
------------------------------------------
          EQUATION SYSTEM

LV^=(S*) defined by:

LVexit(l) = 
  if l \in final(S*) then {}
  else \bigcup { LVentry(l') |
                 (l', l) \in flow^R(S*) }
LVentry(l) =
 (LVexit(l) \ killLV(B^l)) \cup genLV(B^l)
   where B^l \in blocks(S*)

Functional form, for a given S*:

F^S*_LV: ({entry,exit} -> Lab* -> P(Var*))
      -> ({entry,exit} -> Lab* -> P(Var*))

F^S*_LV(F)(exit)(l) = 
  if l \in final(S*) then {}
  else \bigcup { F(entry)(l') |
                 (l', l) \in flow^R(S*) }
F^S*_LV(F)(entry)(l) =
 (F(exit)(l) \ killLV(B^l)) \cup genLV(B^l)
   where B^l \in blocks(S*)

Solutions:

  live: {entry,exit} -> Lab* -> P(Var*)

def: live solves LV^=(S*),
     written live |= LV^=(S*),
     iff live is a fixpoint of F^S*_LV.

------------------------------------------

        Book uses F^S_LV instead of F^S*_LV
             doesn't require live be a function of entry and exit

        Q: Why use functions instead of tuples for solutions?
           less messy notation, don't have to assume labels ordered

------------------------------------------
              EXAMPLE

Q: What is F^S*_LV for:

      [x := 3]^1;
      [y := x+2]^2;
      [y := y+1]^3


------------------------------------------

     ... calculate the following

F^S*_LV(F)(exit)(3) = {}
F^S*_LV(F)(entry)(3) =
 (F(exit)(3) \ {y}) \cup {y}

F^S*_LV(F)(exit)(2) = F(entry)(3)
F^S*_LV(F)(entry)(2) =
 (F(exit)(2) \ {y}) \cup {x}

F^S*_LV(F)(exit)(1) = F(entry)(2)
F^S*_LV(F)(entry)(1) =
 (F(exit)(1) \ {x}) \cup {}
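
     As a sketch, the functional for this three-statement example can
     be iterated from the everywhere-empty assignment until it stops
     changing, which yields the least fixpoint. The nested-dictionary
     representation of live (live["entry"][l], live["exit"][l]) and
     the hard-coded kill/gen tables are assumptions made for this
     illustration.

```python
# [x := 3]^1; [y := x+2]^2; [y := y+1]^3
labels = [1, 2, 3]
flow = {(1, 2), (2, 3)}
final = {3}
kill = {1: {"x"}, 2: {"y"}, 3: {"y"}}
gen = {1: set(), 2: {"x"}, 3: {"y"}}

def F(live):
    """One application of F^S*_LV to live."""
    new = {"entry": {}, "exit": {}}
    for l in labels:
        # (l', l) in flow^R(S*) means (l, l') in flow(S*)
        succ = [live["entry"][lp] for (l0, lp) in flow if l0 == l]
        new["exit"][l] = set() if l in final else set().union(*succ)
        new["entry"][l] = (live["exit"][l] - kill[l]) | gen[l]
    return new

live = {point: {l: set() for l in labels} for point in ("entry", "exit")}
while F(live) != live:                # iterate up to the least fixpoint
    live = F(live)

for l in labels:
    print(l, "entry:", sorted(live["entry"][l]), "exit:", sorted(live["exit"][l]))
```

     The result has x live between labels 1 and 2, y live between
     labels 2 and 3, and nothing live at the entry of 1 or the exit
     of 3.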

   Q: What do we want to prove for correctness?

   Have to compare the static analysis
    vs. "the truth" of the operational semantics.

------------------------------------------
   DEFINING LIVE VARIABLES SEMANTICALLY

What should a solution to LV mean?

  At a given point,
  only the live variables matter.

def: States s1 and s2 are similar with
     respect to a set of variables V,
     written s1 ~_V s2, iff
      (\forall x \in V :: s1(x) = s2(x)).

Conjecture: Let S be label consistent.
Suppose live |= LV^=(S) and
        s1 ~_{live(entry)(init(S))} s2.
Then:

(i) if (S, s1) --> (S', s1'),
    then there is some s2' such that
        (S, s2) --> (S', s2')
    and s1' ~_{live(entry)(init(S'))} s2'.

(ii) if (S, s1) --> s1'
    then there is some s2' such that
        (S, s2) --> s2'
    and s1' ~_{live(exit)(init(S))} s2'.

------------------------------------------

     Q: What happens to ~_V as V shrinks?
        it relates more and more states.

  Antimonotonicity Lemma: Suppose V1 \supseteq V2.
      Then ~_V1 \subseteq ~_V2.  That is, s1 ~_V1 s2 ==> s1 ~_V2 s2.

      The antimonotonicity lemma is essentially similar to Lemma 2.20.
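
      A brute-force illustration of ~_V and the antimonotonicity
      lemma, assuming states are dictionaries over two variables with
      values in {0, 1} (an assumption made only to keep the state
      space finite):

```python
from itertools import product

def similar(s1, s2, V):
    """s1 ~_V s2: the two states agree on every variable in V."""
    return all(s1[x] == s2[x] for x in V)

variables = ["x", "y"]
states = [dict(zip(variables, vals)) for vals in product([0, 1], repeat=2)]
V1, V2 = {"x", "y"}, {"x"}            # V1 is a superset of V2

related_V1 = [(a, b) for a in states for b in states if similar(a, b, V1)]
related_V2 = [(a, b) for a in states for b in states if similar(a, b, V2)]

# ~_V1 is a subset of ~_V2: shrinking V relates more pairs of states.
assert all(pair in related_V2 for pair in related_V1)
print(len(related_V1), len(related_V2))
```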

     Q: How would we prove the conjecture?

     Try induction on the inference used to establish the
     transition, and see why this fails.

     Proof attempt: by induction on the inferences used to establish
     (S, s1) --> (S', s1') or (S, s1) --> s1'.
     Let live |= LV^=(S) and let s1 and s2 be such that

        s1 ~_{live(entry)(init(S))} s2      (1)

     Base cases (can skip): 

     Suppose the rule applied was [ass].
     Then S = [x:=a]^l, and by [ass]

          ([x:=a]^l, s1) --> s1[x|->A[[a]]s1]
          ([x:=a]^l, s2) --> s2[x|->A[[a]]s2]

     So a choice for s2' is s2[x|->A[[a]]s2].  Let's see if that is
     related as desired by calculation.

         s1 ~_{live(entry)(init(S))} s2 
        = <by init(S) = l>
         s1 ~_{live(entry)(l)} s2 
        = <by definition of LV analysis and live is a solution>
         s1 ~_{(live(exit)(l) \ {x}) \cup FV(a)} s2 
      ==> <by the free-variables lemma, definition of store extension and ~>
         s1[x|->A[[a]]s1] ~_{live(exit)(l) \cup {x} \cup FV(a)} s2[x|->A[[a]]s2]
      ==> <by set theory and antimonotonicity lemma>
         s1[x|->A[[a]]s1] ~_{live(exit)(l)} s2[x|->A[[a]]s2]
        = <by init(S) = l>
         s1[x|->A[[a]]s1] ~_{live(exit)(init(S))} s2[x|->A[[a]]s2]

     (end of [ass] case)

     Suppose the rule applied was [skip].
     Then S = [skip]^l, and by [skip]

          ([skip]^l, s1) --> s1
          ([skip]^l, s2) --> s2

     So a choice for s2' is s2, which works because:

         s1 ~_{live(entry)(init(S))} s2
       = <by definition of init>
         s1 ~_{live(entry)(l)} s2
       = <by LV analysis and live is a solution>
         s1 ~_{live(exit)(l)} s2
       = <by definition of init>
         s1 ~_{live(exit)(init(S))} s2

     (end of skip case)

     The inductive hypothesis is that for all substatements Si,
     if live |= LV^=(Si) and s1 ~_{live(entry)(init(Si))} s2,
     then (i) and (ii) follow with Si for S.

     Inductive cases:

     Suppose the rule applied was [seq1].
     Then S = S1; S2 and by [seq1]

           (S1, s1) --> (S1', s1')
        ____________________________
       (S1;S2, s1) --> (S1';S2, s1')

     We need to exercise the inductive hypothesis, so we must show
     that live |= LV^=(S1;S2) implies live |= LV^=(S1)
     (so we can find an s2'...).

     But this isn't true.  

     Counterexample:

         Let S1;S2 be [x := 3]^1; [y := x]^2
         Then if live |= LV^=(S1;S2), it must have live(exit)(1) = {x}.
         But live does not solve LV^=(S1): live(exit)(1) = {x},
         whereas LV^=(S1) requires LVexit(1) = {} because 1 \in final(S1).

     (end of counterexample)

     (end of proof attempt)

*** fix using constraint system
------------------------------------------
           CONSTRAINT SYSTEM

LV^{\subseteq}(S*) defined by:

LVexit(l) \supseteq 
  if l \in final(S*) then {}
  else \bigcup { LVentry(l') |
                 (l', l) \in flow^R(S*) }
LVentry(l) \supseteq 
 (LVexit(l) \ killLV(B^l)) \cup genLV(B^l)
   where B^l \in blocks(S*)

Functional form, for a given S*, as above

F^S*_LV: ({entry,exit} -> Lab* -> P(Var*))
      -> ({entry,exit} -> Lab* -> P(Var*))

F^S*_LV(F)(exit)(l) =
  if l \in final(S*) then {}
  else \bigcup { F(entry)(l') |
                 (l', l) \in flow^R(S*) }
F^S*_LV(F)(entry)(l) =
 (F(exit)(l) \ killLV(B^l)) \cup genLV(B^l)
   where B^l \in blocks(S*)

Solutions:

  live: {entry,exit} -> Lab* -> P(Var*)

def: live solves LV^{\subseteq}(S*),
     written live |= LV^{\subseteq}(S*),
     iff live \sqsupseteq F^S*_LV(live).

Lemma 2.15: If S* is label consistent,
 then live |= LV^=(S*)
      implies
      live |= LV^{\subseteq}(S*).
Furthermore, the least solutions coincide.
------------------------------------------
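
     The two notions of solution can be compared on the program
     [x := 3]^1; [y := x]^2 and its substatement [x := 3]^1. The
     sketch below is an illustration under assumptions: a program is
     represented only by its (labels, flow, final, kill, gen) data,
     and live is a nested dictionary.

```python
def F(prog, live):
    """One application of F^S_LV, restricted to the labels of prog."""
    labels, flow, final, kill, gen = prog
    new = {"entry": {}, "exit": {}}
    for l in labels:
        succ = [live["entry"][lp] for (l0, lp) in flow if l0 == l]
        new["exit"][l] = set() if l in final else set().union(*succ)
        new["entry"][l] = (live["exit"][l] - kill[l]) | gen[l]
    return new

def solves_eq(prog, live):
    """live |= LV^=(S): live is a fixpoint of F on the labels of S."""
    new = F(prog, live)
    return all(live[p][l] == new[p][l] for p in new for l in new[p])

def solves_sub(prog, live):
    """live |= LV^subseteq(S): live is a superset of F(live), pointwise."""
    new = F(prog, live)
    return all(live[p][l] >= new[p][l] for p in new for l in new[p])

# S1;S2 = [x := 3]^1; [y := x]^2   and   S1 = [x := 3]^1
S1_S2 = ([1, 2], {(1, 2)}, {2}, {1: {"x"}, 2: {"y"}}, {1: set(), 2: {"x"}})
S1 = ([1], set(), {1}, {1: {"x"}}, {1: set()})

live = {"entry": {1: set(), 2: {"x"}}, "exit": {1: {"x"}, 2: set()}}

print(solves_eq(S1_S2, live), solves_sub(S1_S2, live))
print(solves_eq(S1, live), solves_sub(S1, live))
```

     For S1 the equation system forces LVexit(1) = {}, so live fails
     it, but live(exit)(1) = {x} is still a superset of {}, so the
     constraint system is satisfied.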

     Q: What lemma would give us the property where we got stuck?

------------------------------------------
    SOLUTIONS WORK FOR SUBSTATEMENTS

Lemma 2.16:
Suppose S1 is label consistent,
and live |= LV^{\subseteq}(S1).
If flow(S1) \supseteq flow(S2),
   blocks(S1) \supseteq blocks(S2),
then S2 is label consistent and
     live |= LV^{\subseteq}(S2).

------------------------------------------

    Q: Why is this true?

------------------------------------------
        SOLUTIONS PRESERVED

Corollary 2.17:
Suppose S is label consistent,
and live |= LV^{\subseteq}(S).
If (S,s) --> (S',s'),
then live |= LV^{\subseteq}(S').

------------------------------------------

     Q: Why does that follow?
        using facts about --> (lemma 2.14)
     
------------------------------------------
     SOLUTION CAN ONLY SHRINK FORWARD

Lemma 2.18:
Suppose S is label consistent,
and live |= LV^{\subseteq}(S).
Then for all (l,l') \in flow(S),

  live(exit)(l) \supseteq live(entry)(l').

Proof: by construction of LV^{\subseteq}(S).
------------------------------------------

------------------------------------------
        GETTING CORRECTNESS RIGHT

Theorem 2.21: Let S be label consistent.
Suppose live |= LV^{\subseteq}(S) and
        s1 ~_{live(entry)(init(S))} s2.
Then:

(i) if (S, s1) --> (S', s1'),
    then there is some s2' such that
        (S, s2) --> (S', s2')
    and s1' ~_{live(entry)(init(S'))} s2'.

(ii) if (S, s1) --> s1'
    then there is some s2' such that
        (S, s2) --> s2'
    and s1' ~_{live(exit)(init(S))} s2'.

------------------------------------------
     
     Proof: by induction on the inferences used to establish
     (S, s1) --> (S', s1') or (S, s1) --> s1'.
     Let live |= LV^{\subseteq}(S) and let s1 and s2 be such that

        s1 ~_{live(entry)(init(S))} s2      (1)

     Base cases (can skip): 

     Suppose the rule applied was [ass].
     Then S = [x:=a]^l, and by [ass]

          ([x:=a]^l, s1) --> s1[x|->A[[a]]s1]
          ([x:=a]^l, s2) --> s2[x|->A[[a]]s2]

     So a choice for s2' is s2[x|->A[[a]]s2].  Let's see if that is
     related as desired by calculation.

         s1 ~_{live(entry)(init(S))} s2 
        = <by init(S) = l>
         s1 ~_{live(entry)(l)} s2 
      ==> <by live is a solution and antimonotonicity lemma (different)>
         s1 ~_{(live(exit)(l) \ {x}) \cup FV(a)} s2 
      ==> <by the free-variables lemma, definition of store extension and ~>
         s1[x|->A[[a]]s1] ~_{live(exit)(l) \cup {x} \cup FV(a)} s2[x|->A[[a]]s2]
      ==> <by set theory and antimonotonicity lemma>
         s1[x|->A[[a]]s1] ~_{live(exit)(l)} s2[x|->A[[a]]s2]
        = <by init(S) = l>
         s1[x|->A[[a]]s1] ~_{live(exit)(init(S))} s2[x|->A[[a]]s2]

     (end of [ass] case)

     Suppose the rule applied was [skip].
     Then S = [skip]^l, and by [skip]

          ([skip]^l, s1) --> s1
          ([skip]^l, s2) --> s2

     So a choice for s2' is s2, which works because:

         s1 ~_{live(entry)(init(S))} s2
       = <by definition of init>
         s1 ~_{live(entry)(l)} s2
     ==> <by live is a solution and antimonotonicity lemma (different)>
         s1 ~_{live(exit)(l)} s2
       = <by definition of init>
         s1 ~_{live(exit)(init(S))} s2

     (end of skip case)

     The inductive hypothesis is that for all substatements Si,
     if live |= LV^{\subseteq}(Si) and s1 ~_{live(entry)(init(Si))} s2,
     then (i) and (ii) follow with Si for S.

     Inductive cases:

     Suppose the rule applied was [seq1].
     Then S = S1; S2 and by [seq1]

           (S1, s1) --> (S1', s1')
        ____________________________
       (S1;S2, s1) --> (S1';S2, s1')

     By lemma 2.16, live |= LV^{\subseteq}(S1), so (different)
     by the inductive hypothesis, there is some s2' such that

       (S1, s2) --> (S1', s2') and s1' ~_{live(entry)(init(S1'))} s2'.
     
     So, by the [seq1] rule:

       (S1, s2) --> (S1', s2')
       ____________________________
       (S1;S2, s2) --> (S1';S2, s2')

     So it remains to prove that s1' ~_{live(entry)(init(S1';S2))} s2',
     but this follows immediately from the definition of init.

     (end of [seq1] case)

     See the book for the other cases, of which [seq2] also relies on
     lemma 2.16.

     (end of proof)

     Q: What does this theorem tell us about execution sequences?
        By induction on length, they also preserve solutions.
        See Corollary 2.22