COP 5021 Lecture -*- Outline -*-

* Intraprocedural Analysis (2.1)

  Several examples of data flow analysis for the WHILE language.
    Goal: learn how to define them precisely
          (precisely enough to implement on a computer)

  Analyses defined by pairs of entry/exit functions,
   that map labels to analysis information (sets).

  The analysis operates on the control flow graph of the program...

** definitions and notation

   The point of these is to represent the control flow graph.
   These are all defined by structural induction.

*** initial and final labels

------------------------------------------
       INITIAL LABEL

init: Stmt -> Lab

  init([x := a]^l) = l
  init([skip]^l) = l
  init(S1; S2) =
  init(if [b]^l then S1 else S2) = 
  init (while [b]^l do S) = 

------------------------------------------

        ... init(S1)
            l
            l

     Q:  What would the initial label of a nondeterministic choice
         statement be?  Of a parallel composition statement?

     Q:  How would we generalize the formalism to handle such statements?
         perhaps return a set of labels.

------------------------------------------
      FINAL LABELS

final: Stmt -> Powerset(Lab)

  final([x := a]^l) = {l}
  final([skip]^l) = {l}
  final(S1; S2) =
  final(if [b]^l then S1 else S2) = 
  final (while [b]^l do S) = 

------------------------------------------

        ... final(S2)
            final(S1) \cup final(S2)
            {l}

     Q:  What would the final label set of a nondeterministic choice
         statement be?  Of a parallel composition statement?
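
     A minimal Python sketch of init and final, following the
     structural induction above. The tuple encoding of statements
     (tags and field order) is an assumption made for illustration;
     it is not from the book.

```python
# Hypothetical tuple encoding of WHILE statements (not from the book):
#   ("assign", x, a, l)   ("skip", l)   ("seq", S1, S2)
#   ("if", b, l, S1, S2)  ("while", b, l, S)

def init(s):
    """Initial label of statement s (a single label)."""
    tag = s[0]
    if tag == "assign":
        return s[3]
    if tag == "skip":
        return s[1]
    if tag == "seq":
        return init(s[1])
    if tag in ("if", "while"):
        return s[2]
    raise ValueError(f"unknown statement: {tag}")

def final(s):
    """Set of final labels of statement s."""
    tag = s[0]
    if tag == "assign":
        return {s[3]}
    if tag == "skip":
        return {s[1]}
    if tag == "seq":
        return final(s[2])
    if tag == "if":
        return final(s[3]) | final(s[4])
    if tag == "while":
        return {s[2]}
    raise ValueError(f"unknown statement: {tag}")

# if [x > 3]^1 then [y := 2]^2 else [z := 3]^3
cond = ("if", "x > 3", 1, ("assign", "y", "2", 2), ("assign", "z", "3", 3))
# while [x > 0]^1 do [skip]^2
loop = ("while", "x > 0", 1, ("skip", 2))
print(init(cond), sorted(final(cond)), sorted(final(loop)))
```

     Returning a set from final (but a single label from init) mirrors
     the types on the slides; a set-valued init would be one way to
     handle nondeterministic choice.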

*** blocks and labels

------------------------------------------
      ELEMENTARY BLOCKS

blocks: Stmt -> Powerset(Block)

  
------------------------------------------
     Q:  How would you define the set of elementary blocks in a statement?

            blocks([x := a]^l) = {[x := a]^l}
            blocks([skip]^l) = {[skip]^l}
            blocks(S1; S2) = blocks(S1) \cup blocks(S2)
            blocks(if [b]^l then S1 else S2)
                  = {[b]^l} \cup blocks(S1) \cup blocks(S2)
            blocks(while [b]^l do S) = {[b]^l} \cup blocks(S)

     Q:  What would the set of blocks in an assert statement be?
     Q:  What would the set of blocks in a nondeterministic choice
          statement be?

     Q:  How would you define the set of labels in a statement?

         labels: Stmt -> Powerset(Lab)
         labels(S) = {l | [B]^l \in blocks(S)}
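
     A small Python sketch of blocks and labels, under the assumption
     that statements and blocks are encoded as tuples (the encoding is
     hypothetical, chosen only so that blocks can be put in a set).

```python
# Hypothetical tuple encoding (not from the book):
#   statements: ("assign", x, a, l), ("skip", l), ("seq", S1, S2),
#               ("if", b, l, S1, S2), ("while", b, l, S)
#   blocks:     ("assign", x, a, l), ("skip", l), ("test", b, l)

def blocks(s):
    """Set of elementary blocks occurring in statement s."""
    tag = s[0]
    if tag in ("assign", "skip"):
        return {s}
    if tag == "seq":
        return blocks(s[1]) | blocks(s[2])
    if tag == "if":
        return {("test", s[1], s[2])} | blocks(s[3]) | blocks(s[4])
    if tag == "while":
        return {("test", s[1], s[2])} | blocks(s[3])
    raise ValueError(f"unknown statement: {tag}")

def labels(s):
    """Labels of s; the label is the last field of every block tuple."""
    return {b[-1] for b in blocks(s)}

# [x := 3]^1; while [x > 0]^2 do [x := x-1]^3
prog = ("seq", ("assign", "x", "3", 1),
        ("while", "x > 0", 2, ("assign", "x", "x-1", 3)))
print(sorted(labels(prog)))
```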

*** flows and reverse flows

------------------------------------------
      FLOWS

flow: Stmt -> Powerset(Lab x Lab)

  flow([x := a]^l) = {}
  flow([skip]^l) = {}
  flow(S1; S2) = 
                   
  flow(if [b]^l then S1 else S2) =
 
  flow(while [b]^l do S) =
  
------------------------------------------
        ... flow(S1) \cup flow(S2)
            \cup {(l, init(S2))| l \in final(S1)}

            flow(S1) \cup flow(S2)
            \cup {(l, init(S1))} \cup {(l, init(S2))}

            flow(S) \cup {(l, init(S))}
            \cup {(l', l)| l' \in final(S)}

   Q:  How would we use these functions to represent the nodes and
   edges of a control flow graph?

   Q: What are labels and edges of 
         if [x > 3]^1 then [y:=2]^2 else [z:=3]^3
      ?             

   Q:  How would you formulate the set of reverse flows?

       flow^R: Stmt -> Powerset(Lab x Lab)
       flow^R(S) = {(l', l) | (l, l') \in flow(S)}

   Q: What is flow^R of
         if [x > 3]^1 then [y:=2]^2 else [z:=3]^3
      ?             
      flow = {(1,2),(1,3)} so flow^R = {(2,1),(3,1)}
  
   Q: What are the initial nodes of flow^R?
      the final nodes of flow
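
   A possible Python rendering of flow and flow^R, as a sketch. The
   tuple encoding of statements and the helper names flow_r, cond and
   loop are assumptions for illustration only.

```python
# Hypothetical tuple encoding of WHILE statements (not from the book):
#   ("assign", x, a, l)   ("skip", l)   ("seq", S1, S2)
#   ("if", b, l, S1, S2)  ("while", b, l, S)

def init(s):
    if s[0] == "seq":
        return init(s[1])
    return s[-1] if s[0] in ("assign", "skip") else s[2]

def final(s):
    if s[0] == "seq":
        return final(s[2])
    if s[0] == "if":
        return final(s[3]) | final(s[4])
    return {s[-1]} if s[0] in ("assign", "skip") else {s[2]}

def flow(s):
    """Edges (l, l') of the control flow graph of s."""
    tag = s[0]
    if tag in ("assign", "skip"):
        return set()
    if tag == "seq":
        s1, s2 = s[1], s[2]
        return flow(s1) | flow(s2) | {(l, init(s2)) for l in final(s1)}
    if tag == "if":
        _, _, l, s1, s2 = s
        return flow(s1) | flow(s2) | {(l, init(s1)), (l, init(s2))}
    if tag == "while":
        _, _, l, body = s
        return flow(body) | {(l, init(body))} | {(l2, l) for l2 in final(body)}
    raise ValueError(f"unknown statement: {tag}")

def flow_r(s):
    """Reverse flow: every edge of flow(s) turned around."""
    return {(l2, l1) for (l1, l2) in flow(s)}

# if [x > 3]^1 then [y := 2]^2 else [z := 3]^3
cond = ("if", "x > 3", 1, ("assign", "y", "2", 2), ("assign", "z", "3", 3))
# while [z > 0]^4 do ([y := x+2]^5; [z := z-1]^6)
loop = ("while", "z > 0", 4,
        ("seq", ("assign", "y", "x+2", 5), ("assign", "z", "z-1", 6)))
print(sorted(flow(cond)), sorted(flow_r(cond)), sorted(flow(loop)))
```

   The nodes of the control flow graph are then labels(S) and the
   edges are flow(S); a backward analysis walks flow_r(S) instead.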

*** program of interest

    In the book, the stars are subscripts,
    but the ascii version here uses them in line S* vs. S
                                                         *
------------------------------------------
           PROGRAM OF INTEREST

 S* = the top level statement

 Lab* = labels(S*)

 Var* = FV(S*)

 Blocks* = blocks(S*)

------------------------------------------

------------------------------------------
      ISOLATED ENTRIES AND EXITS

def: S* has isolated entries iff


def: S* has isolated exits iff


------------------------------------------
        ... (\forall l \in Lab :: (l, init(S*)) \not\in flow(S*))

        ... (\forall l1 \in final(S*) ::
               (\forall l2 \in Lab ::
                   (l1, l2) \not\in flow(S*)))

   Q: What kind of programs would not have isolated entries?
      Those that start with a while loop:
      the loop body flows back to the initial label.

   Q: What kind of programs would not have isolated exits?
      Those that end with a while loop: the loop test is a final label,
      and it has a flow edge into the loop body.

   Q: Could we convert programs to have both isolated entries and exits?
      Yes, add skip statements as necessary
      So no loss of generality in assuming these
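
   The two definitions can be checked mechanically once flow, init and
   final are known. A sketch in Python, assuming the flow relation is
   given directly as a set of label pairs (the function names and the
   two example flow sets are made up for illustration):

```python
def has_isolated_entries(init_label, flow):
    """True iff no edge of flow leads into the initial label."""
    return all(target != init_label for (_, target) in flow)

def has_isolated_exits(final_labels, flow):
    """True iff no edge of flow leaves a final label."""
    return all(source not in final_labels for (source, _) in flow)

# while [z > 0]^1 do [z := z-1]^2        init = 1, final = {1}
loop_flow = {(1, 2), (2, 1)}

# [skip]^0; while [z > 0]^1 do [z := z-1]^2; [skip]^3
# (the same loop with a skip added at each end)  init = 0, final = {3}
padded_flow = {(0, 1), (1, 2), (2, 1), (1, 3)}

print(has_isolated_entries(1, loop_flow), has_isolated_exits({1}, loop_flow))
print(has_isolated_entries(0, padded_flow), has_isolated_exits({3}, padded_flow))
```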

------------------------------------------
           LABEL CONSISTENT

def: S is label consistent if and only if
     no two distinct blocks in S
     have the same label.

------------------------------------------

   Q:  How would you formalize that?

       [B1]^l, [B2]^l \in blocks(S) ==> B1 == B2

   Q:  Is there any reason not to have label consistent programs?

** available expressions analysis (2.1.1)

   Now we're going to look at 4 different analyses,
   to illustrate the formalism and to see how to precisely define
   dataflow analyses.

*** trivial and non-trivial expressions
      Q:  What's a trivial expression?
          a single variable or a constant, in other words, a base case

------------------------------------------
       SUBEXPRESSIONS

 Aexp(a) = non-trivial arithmetic
           subexpressions of a

 Aexp(b) = non-trivial arithmetic
           subexpressions of b

 Aexp* = nontrivial arithmetic expressions
             in S*

------------------------------------------

*** idea, goal
------------------------------------------
      AVAILABLE EXPRESSIONS ANALYSIS

 "For each program point,
  which [non-trivial] expressions 
  must have already been
  computed, and not later modified,
  on all paths to that program point."

Example:

   [k := i*j-1]^1;
   while [i*j-1 < n]^2
   do (  [t := a+k]^3;
         [j := j+1]^4;
         [k := i*j-1]^5  )
------------------------------------------

    Q: What non-trivial expressions are available at entry to block 2?
       i*j and (i*j)-1

*** formalization

    The basic idea is to define functions on each elementary block,
    using two sub-functions: gen and kill...

------------------------------------------
        FORMAL DEFINITION

AEentry(l) = 
   if l = init(S*) then {}
   else \bigcap {AEexit(l')
                   | (l',l) \in flow(S*)}

AEexit(l) = 
 (AEentry(l) \ killAE(B^l)) \cup genAE(B^l)
    where B^l \in blocks(S*)

killAE: Blocks* -> Powerset(Aexp*)

killAE([x:= a]^l) =
      {a' \in Aexp* | x \in FV(a')}
killAE([skip]^l) = {}
killAE([b]^l) = {}

genAE: Blocks* -> Powerset(Aexp*)

genAE([x:= a]^l) =
      {a' \in Aexp(a) | x \not\in FV(a')}
genAE([skip]^l) = {}
genAE([b]^l) = Aexp(b)
------------------------------------------

    Q: What's the role of the control flow graph here?
       it's an implicit parameter,
       so these equations generate particular sets for every control flow
       graph (program).

    Q: What does the kill function mean?
       It says what expressions are no longer available,
       since they were assigned (what to take out of the analysis).

    Q: What does the gen function mean?
       It says what expressions become available.

    Q: Why don't we have to define the analysis for while loops and if
       statements? 
       it's implicit in the control flow graph, i.e., in flow(S*)

    Q: What are we assuming with this formalism?
       That the program is label consistent.  Why?
            because of the use of blocks(S*)
       That the program has isolated entries.  Why?
            because of the use of l == init(S*)

    Q: How would we adjust this if we didn't have isolated entries?
       (1) put a skip at the beginning of the program.
       (2) simulate the effect of that, by intersecting {} with other exits

*** observations
    Q: This is a forward analysis. Why?
       because we don't use flow^R
       
    Q: What makes a "solution" unsafe?
       too much in the set
    Q: What makes it imprecise?
       too little in the set

    Q: We want the largest (safe) sets. Why?
       because we want more information to use in optimizations

    Q: Note the use of the word "must", what impact does that have on the
    analysis?
       intersections of entry information, makes us want the largest set

*** example revisited
------------------------------------------
        EXAMPLE

   [k := i*j-1]^1;
   while [i*j-1 < n]^2
   do ([t := a+k]^3;
       [j := j+1]^4;
       [k := i*j-1]^5)

What's the control flow graph?


------------------------------------------

------------------------------------------
           KILL AND GEN

What are killAE and genAE for this?

l  killAE(l)      genAE(l)
============================
1
2
3
4
5
------------------------------------------


------------------------------------------
     EXAMPLE EQUATIONS

AEentry(1) =
AEentry(2) =
AEentry(3) =
AEentry(4) =
AEentry(5) =

AEexit(1) =
AEexit(2) =
AEexit(3) =
AEexit(4) =
AEexit(5) =

------------------------------------------

    Q: So what sets do we start with to find a solution?
       Aexp*, because this is a must (intersection) analysis

    Q:  So what would the solution be?
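
    One way to check an answer is to run the equations to a fixed
    point. The Python sketch below assumes expressions are encoded as
    nested tuples and blocks as a dictionary keyed by label; these
    encodings and all names (fv, aexp, solve_ae, ...) are hypothetical.
    It starts every set at Aexp* and iterates downward, and it assumes
    every non-initial label has at least one predecessor.

```python
def fv(e):
    """Free variables of an expression (nested tuple, variable name, or constant)."""
    if isinstance(e, str):
        return {e}
    if isinstance(e, tuple):
        return fv(e[1]) | fv(e[2])
    return set()

def aexp(e):
    """Non-trivial arithmetic subexpressions of e (nodes with +, - or *)."""
    if not isinstance(e, tuple):
        return set()
    sub = aexp(e[1]) | aexp(e[2])
    return sub | {e} if e[0] in {"+", "-", "*"} else sub

IJ = ("*", "i", "j")
IJ1 = ("-", IJ, 1)
AK = ("+", "a", "k")
J1 = ("+", "j", 1)
blocks = {
    1: ("assign", "k", IJ1),
    2: ("test", ("<", IJ1, "n")),
    3: ("assign", "t", AK),
    4: ("assign", "j", J1),
    5: ("assign", "k", IJ1),
}
flow = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 2)}
init = 1
aexp_star = set().union(*(aexp(b[-1]) for b in blocks.values()))

def kill_ae(block):
    if block[0] == "assign":
        return {e for e in aexp_star if block[1] in fv(e)}
    return set()

def gen_ae(block):
    if block[0] == "assign":
        return {e for e in aexp(block[2]) if block[1] not in fv(e)}
    return aexp(block[1])

def solve_ae():
    """Largest solution: start from Aexp* everywhere and shrink."""
    entry = {l: set(aexp_star) for l in blocks}
    exit_ = {l: set(aexp_star) for l in blocks}
    changed = True
    while changed:
        changed = False
        for l in sorted(blocks):
            if l == init:
                new_entry = set()
            else:
                preds = [exit_[p] for (p, q) in flow if q == l]
                new_entry = set.intersection(*preds)
            new_exit = (new_entry - kill_ae(blocks[l])) | gen_ae(blocks[l])
            if (new_entry, new_exit) != (entry[l], exit_[l]):
                entry[l], exit_[l], changed = new_entry, new_exit, True
    return entry, exit_

ae_entry, ae_exit = solve_ae()
```

    Under these assumptions ae_entry[2] holds i*j and i*j-1, which
    agrees with the answer given for entry to block 2 earlier.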

** Reaching Definitions Analysis (2.1.2)

------------------------------------------
   REACHING DEFINITIONS ANALYSIS (2.1.2)

"For each program point,
 which assignments may have been made
 and not overwritten,
 when program execution reaches that point
 along some path?"


------------------------------------------

   Q: Is this a forward or backward analysis?
      forward, uses flow not flow^R
   Q: What makes the analysis imprecise?
      larger sets
   Q: So what solution do we want?
      the smallest set

   Q: Note the use of the word "may" in the analysis statement,
      what impact does that have on the analysis?
       unions of entry information, makes us want the smallest set
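
   These notes do not spell out the RD equations. The Python sketch
   below uses the usual formulation from section 2.1.2 of the book as
   best recalled (an assumption worth checking against the text):
   a definition is a pair (x, l), "?" marks an uninitialized variable,
   an assignment to x kills every definition of x and generates (x, l).
   The example program and all names are made up for illustration.

```python
# [x := 5]^1; while [x > 0]^2 do [x := x-1]^3
assigned = {1: "x", 2: None, 3: "x"}   # variable assigned at each label
flow = {(1, 2), (2, 3), (3, 2)}
init = 1
variables = {"x"}

def kill_rd(l):
    x = assigned[l]
    if x is None:
        return set()
    return {(x, "?")} | {(x, l2) for l2, y in assigned.items() if y == x}

def gen_rd(l):
    return set() if assigned[l] is None else {(assigned[l], l)}

def solve_rd():
    """Smallest solution: start from empty sets and grow."""
    entry = {l: set() for l in assigned}
    exit_ = {l: set() for l in assigned}
    changed = True
    while changed:
        changed = False
        for l in sorted(assigned):
            if l == init:
                new_entry = {(x, "?") for x in variables}
            else:
                new_entry = set().union(*(exit_[p] for (p, q) in flow if q == l))
            new_exit = (new_entry - kill_rd(l)) | gen_rd(l)
            if (new_entry, new_exit) != (entry[l], exit_[l]):
                entry[l], exit_[l], changed = new_entry, new_exit, True
    return entry, exit_

rd_entry, rd_exit = solve_rd()
print(sorted(rd_entry[2]))
```

   Starting from empty sets and taking unions is what makes this the
   smallest solution, in line with the "may" reading above.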


** Very Busy Expressions Analysis (2.1.3)

*** idea and goals
    This can be useful in hoisting expressions:
    evaluating an expression
    and storing it for later use (e.g., in a register).

------------------------------------------
      VERY BUSY EXPRESSIONS

def: A non-trivial expression e
     is *very busy*
     at exit from block l if,
     on every path from l,
     e must be used before any
     x \in FV(e) is assigned.

At what points is a+b very busy in:

  [x := a+b]^1;
  [y := a+b]^2

  if [a-b > a+b]^3
  then [x := a+b]^4 else [y := a+b]^5

  [q := r]^7;
  [z := a+b]^8;

  if [a>b]^9
  then [x := a+b]^10 else [y := a+b]^11

  if [a>b]^12
  then [x := a+b]^13 else [y := 641]^14

------------------------------------------

        ... exits of 1, 3, 7, 9
            (not 12 or any of the others)

       "an expression is very busy at exit from a block if it is very
       busy at the entry to every block that follows."

       "However, no expressions are very busy at the exit from any
       final block"


------------------------------------------
      VERY BUSY EXPRESSIONS ANALYSIS

 "For each program point,
  which [non-trivial] expressions 
  must be very busy
  at the exit from the point."

------------------------------------------

  Q: When should variables be live at the end?
     If a program is a procedure body and the variables are all either
     global or call-by-reference parameters

  Q:  What could we use this for?
      Hoisting, which is to store the value of the expression for
      later use.

*** formal definition
------------------------------------------
         FORMAL DEFINITION

VBexit(l) = 
  if l \in final(S*) then {}
  else \bigcap { VBentry(l') |
                 (l', l) \in flow^R(S*) }
VBentry(l) =
 (VBexit(l) \ killVB(B^l)) \cup genVB(B^l)
   where B^l \in blocks(S*)

killVB: Blocks* -> Powerset(Aexp*)

killVB([x:= a]^l) =
      {a' \in Aexp* | x \in FV(a')}
killVB([skip]^l) = {}
killVB([b]^l) = {}

genVB: Blocks* -> Powerset(Aexp*)

genVB([x:= a]^l) = Aexp(a)
genVB([skip]^l) = {}
genVB([b]^l) = Aexp(b)

------------------------------------------

   Q:  Is this a forward or backward analysis?
       backwards, because we use flow^R.

   Q: Does this analysis need isolated exits?
      the authors say so, why?
      because of the initial condition for VBexit()
      it's a "must" analysis

   Q:  Why is there an intersection for VBexit?
      because this is a "must" analysis and we care about all paths

   Q:  Do we want the largest or the smallest solution?
       largest, because smaller is imprecise

*** example
------------------------------------------
              EXAMPLE

  if [a-b > a+b]^1
  then [x := a+b]^2
  else [y := a+b]^3;
  [z := a]^4

l  killVB(l)   genVB(l)
=======================
1
2
3
4

VBentry(1) =
VBexit(1) =
VBentry(2) =
VBexit(2) =
VBentry(3) =
VBexit(3) =
VBentry(4) =
VBexit(4) =
------------------------------------------
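
   A sketch of how the VB equations for this example could be solved
   by iteration, in Python. The nested-tuple encoding of expressions,
   the block dictionary and the function names are assumptions for
   illustration; flow is read as ((if ...); [z := a]^4). Every
   non-final label is assumed to have at least one successor.

```python
def fv(e):
    """Free variables of an expression (nested tuple, variable name, or constant)."""
    if isinstance(e, str):
        return {e}
    if isinstance(e, tuple):
        return fv(e[1]) | fv(e[2])
    return set()

def aexp(e):
    """Non-trivial arithmetic subexpressions of e (nodes with +, - or *)."""
    if not isinstance(e, tuple):
        return set()
    sub = aexp(e[1]) | aexp(e[2])
    return sub | {e} if e[0] in {"+", "-", "*"} else sub

A_MINUS_B = ("-", "a", "b")
A_PLUS_B = ("+", "a", "b")
blocks = {
    1: ("test", (">", A_MINUS_B, A_PLUS_B)),
    2: ("assign", "x", A_PLUS_B),
    3: ("assign", "y", A_PLUS_B),
    4: ("assign", "z", "a"),
}
flow = {(1, 2), (1, 3), (2, 4), (3, 4)}
final = {4}
aexp_star = set().union(*(aexp(b[-1]) for b in blocks.values()))

def kill_vb(block):
    if block[0] == "assign":
        return {e for e in aexp_star if block[1] in fv(e)}
    return set()

def gen_vb(block):
    return aexp(block[-1])

def solve_vb():
    """Largest solution, computed backwards over the flow edges."""
    entry = {l: set(aexp_star) for l in blocks}
    exit_ = {l: set(aexp_star) for l in blocks}
    changed = True
    while changed:
        changed = False
        for l in sorted(blocks, reverse=True):
            if l in final:
                new_exit = set()
            else:
                # (l', l) in flow^R means l' is a successor of l in flow
                succs = [entry[q] for (p, q) in flow if p == l]
                new_exit = set.intersection(*succs)
            new_entry = (new_exit - kill_vb(blocks[l])) | gen_vb(blocks[l])
            if (new_entry, new_exit) != (entry[l], exit_[l]):
                entry[l], exit_[l], changed = new_entry, new_exit, True
    return entry, exit_

vb_entry, vb_exit = solve_vb()
```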

** Live Variables Analysis (2.1.4)

*** idea and goals
------------------------------------------
           LIVE VARIABLES

def: A variable x is *live*
     at exit from label l
     if there is a path from l to a use
     of x that does not redefine x.

Which variables are live at exit from 1?

   [x := 3]^1;
   if [z > 0]^2
   then [y := x+2]^3
   else [q := q+1]^4

   [x := 3]^1;
   [y := x+2]^2;
   [y := y+1]^3

   [x := 3]^1;
   [z := 4]^2;
   [x := z+2]^3;
   while [z > 0]^4
   do ([y := x+2]^5;
       [z := z-1]^6)

------------------------------------------

------------------------------------------
         LIVE VARIABLES ANALYSIS

 "For each program point,
  which variables may be live
  at the exit from the point."

Example:

   [x := 3]^1;
   [z := 4]^2;
   [x := z+2]^3

------------------------------------------
   Q: Which variables are live at exit from label 1? From label 2?
   none at exit from 1, and z at exit from 2.

   The authors take the view that "no variables are live at the
   end of the program"; however, they remark that "for some
   applications it might be better to assume that all variables are
   live at the end of the program."

   Q:  What can we use this for?
   dead code elimination: if the variable is not live at exit from a
   block that assigns to it, the assignment can be eliminated.

*** definitions and formalization

------------------------------------------
         FORMAL DEFINITION

LVexit(l) = 
  if l \in final(S*) then {}
  else \bigcup { LVentry(l') |
                 (l', l) \in flow^R(S*) }
LVentry(l) =
 (LVexit(l) \ killLV(B^l)) \cup genLV(B^l)
   where B^l \in blocks(S*)

killLV: Blocks* -> Powerset(Var*)

killLV([x:= a]^l) = {x}
killLV([skip]^l) = {}
killLV([b]^l) = {}

genLV: Blocks* -> Powerset(Var*)

genLV([x:= a]^l) = FV(a)
genLV([skip]^l) = {}
genLV([b]^l) = FV(b)
------------------------------------------

   Q:  Is this a forward or backward analysis?
       backwards, because we use flow^R.

   Q: Does this analysis need isolated exits?
      the authors say so, why?

   Q:  Why is there a union for LVexit?

   Q:  Do we want the largest or the smallest solution?
       smallest, because larger is imprecise
       We want to know exactly what the dead variables are.

------------------------------------------
            EXAMPLE

   [x := 3]^1;
   [z := 4]^2;
   [x := z+2]^3

l  killLV(l)    genLV(l)
==========================
1
2
3

LVentry(1) =
LVexit(1) =
LVentry(2) =
LVexit(2) =
LVentry(3) =
LVexit(3) =
------------------------------------------
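
   A sketch of solving the LV equations for this example in Python,
   with kill and gen written out by hand as dictionaries (an
   illustrative encoding, not the book's). It also lists assignments
   whose target is not live at exit, under the assumption that no
   variables are live at the end of the program.

```python
# [x := 3]^1; [z := 4]^2; [x := z+2]^3
flow = {(1, 2), (2, 3)}
final = {3}
kill_lv = {1: {"x"}, 2: {"z"}, 3: {"x"}}
gen_lv = {1: set(), 2: set(), 3: {"z"}}

def solve_lv():
    """Smallest solution, computed backwards over the flow edges."""
    entry = {l: set() for l in kill_lv}
    exit_ = {l: set() for l in kill_lv}
    changed = True
    while changed:
        changed = False
        for l in sorted(kill_lv, reverse=True):
            if l in final:
                new_exit = set()
            else:
                # (l', l) in flow^R means l' is a successor of l in flow
                new_exit = set().union(*(entry[q] for (p, q) in flow if p == l))
            new_entry = (new_exit - kill_lv[l]) | gen_lv[l]
            if (new_entry, new_exit) != (entry[l], exit_[l]):
                entry[l], exit_[l], changed = new_entry, new_exit, True
    return entry, exit_

lv_entry, lv_exit = solve_lv()

# Assignments whose assigned variable is not live at exit from the block.
dead = sorted(l for l in kill_lv if not (kill_lv[l] & lv_exit[l]))
print(dead)
```

   Label 3 shows up as dead only because nothing is assumed live at
   the end; assuming all variables are live there would keep it.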

** Derived Data Flow Information (2.1.5)

------------------------------------------
  LINKING DEFINITIONS AND USES

Use-definition (ud) chain:
   links use of var to its last assignment

Definition-use (du) chain:
   links last assignment of var to its use
------------------------------------------

    Q:  What might this be useful for?
    dead code elimination
    code motion (reordering)

*** formal definitions
------------------------------------------
           DEFINITIONS AND USES

definition clear path for x

  clear(x, l, l') =
    (\exists l1, ..., ln ::
         l = l1 & ln = l' & n > 0
       & (\forall i : 1 <= i < n :
          (li, li+1) \in flow(S*))
       & (\forall i : 1 <= i < n :
          not(def(x, li)))
       & use(x, ln))

  def(x, l) = (\exists B :
                 [B]^l \in blocks(S*) :
                  x \in killLV([B]^l))

  use(x, l) = (\exists B :
                 [B]^l \in blocks(S*) :
                  x \in genLV([B]^l))

------------------------------------------

     Q:  Why are the def and use functions correct?
     Q:  How do you interpret the notion of a clear path?
     Q:  Does clear(y, 3, 7) tell you anything about the use of y?
        yes, it's used at 7

------------------------------------------
           UD AND DU

ud: Var* x Lab* -> Powerset(Lab*?)
ud(x, l') =
   {l | def(x, l),
       (\exists l2 : (l, l2) \in flow(S*):
                     clear(x, l2, l'))}
  \cup
   {? | clear(x, init(S*), l')}

du: Var* x Lab*? -> Powerset(Lab*)
du(x, l) =
 if l != ?
 then
   {l'| def(x, l),
       (\exists l2 : (l, l2) \in flow(S*):
                     clear(x, l2, l'))}
 else
   {l'| clear(x, init(S*), l')}

------------------------------------------

     Q: Do these require isolated entries?
        yes

     Q: Are these must or may analyses?
        may

     Q: Can we define du in terms of ud?
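
     The definitions of clear, ud and du can be transcribed fairly
     directly. The Python sketch below does so for the program
     [z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4,
     with killLV and genLV written out by hand; the encoding and the
     graph search used for clear are assumptions for illustration.
     It suggests one answer to the last question:
     du(x, l) = {l' | l \in ud(x, l')}.

```python
flow = {(1, 2), (2, 3), (2, 4)}
init = 1
kill_lv = {1: {"z"}, 2: set(), 3: {"y"}, 4: {"y"}}   # def(x, l) iff x in kill_lv[l]
gen_lv = {1: set(), 2: {"y"}, 3: {"z"}, 4: {"y"}}    # use(x, l) iff x in gen_lv[l]
labels = set(kill_lv)

def clear(x, l, l2):
    """True iff some flow path l = l1, ..., ln = l2 has no definition of x
    at any li with i < n, and x is used at ln."""
    if x not in gen_lv[l2]:
        return False
    seen, todo = set(), [l]
    while todo:
        cur = todo.pop()
        if cur == l2:
            return True
        if cur in seen or x in kill_lv[cur]:
            continue  # already explored, or x is redefined before reaching l2
        seen.add(cur)
        todo.extend(q for (p, q) in flow if p == cur)
    return False

def ud(x, l2):
    result = {l for l in labels if x in kill_lv[l]
              and any(clear(x, q, l2) for (p, q) in flow if p == l)}
    if clear(x, init, l2):
        result.add("?")
    return result

def du(x, l):
    if l == "?":
        return {l2 for l2 in labels if clear(x, init, l2)}
    if x not in kill_lv[l]:
        return set()
    return {l2 for l2 in labels
            if any(clear(x, q, l2) for (p, q) in flow if p == l)}

for x in ("y", "z"):
    for l in sorted(labels):
        print(x, l, sorted(ud(x, l), key=str))
```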

*** example
------------------------------------------
        EXAMPLE

   [z := 3]^1;
   if [y > 0]^2
   then [y := z+2]^3
   else [y := y+1]^4

    ud(x, l)

 l \ x |   y    z
===================== 
   1
   2
   3
   4

    du(x, l)

 l \ x |   y    z
===================== 
   1
   2
   3
   4
------------------------------------------

*** computation

    Q: How could we use RD and LV to compute ud chains?
    see p. 54

    UD : Var* x Lab* -> Powerset(Lab*)
    UD(x,l) = if x \in genLV(B^l) then {l' | (x,l') \in RDentry(l)}
              else {}
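
    A sketch of this computation in Python for the program
    [z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4.
    The RD equations are not given in these notes, so the solver
    below uses the usual formulation (pairs (x, l), with "?" for an
    uninitialized variable) as an assumption; genLV is written out
    by hand and all names are hypothetical.

```python
flow = {(1, 2), (2, 3), (2, 4)}
init = 1
assigned = {1: "z", 2: None, 3: "y", 4: "y"}        # variable defined at l
gen_lv = {1: set(), 2: {"y"}, 3: {"z"}, 4: {"y"}}   # variables used at l
variables = {"y", "z"}

def rd_entry():
    """RDentry for every label: least solution, by round-robin iteration."""
    entry = {l: set() for l in assigned}
    exit_ = {l: set() for l in assigned}
    changed = True
    while changed:
        changed = False
        for l in sorted(assigned):
            if l == init:
                new_entry = {(x, "?") for x in variables}
            else:
                new_entry = set().union(*(exit_[p] for (p, q) in flow if q == l))
            x = assigned[l]
            if x is None:
                new_exit = set(new_entry)
            else:
                # kill every definition of x (including "?"), then gen (x, l)
                new_exit = {(y, d) for (y, d) in new_entry if y != x} | {(x, l)}
            if (new_entry, new_exit) != (entry[l], exit_[l]):
                entry[l], exit_[l], changed = new_entry, new_exit, True
    return entry

RD_ENTRY = rd_entry()

def UD(x, l):
    """Definitions of x reaching l, provided x is actually used at l."""
    if x in gen_lv[l]:
        return {d for (y, d) in RD_ENTRY[l] if y == x}
    return set()

print(sorted(UD("y", 2), key=str), sorted(UD("z", 3), key=str))
```

    Note that the result can contain "?", since RDentry does.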

    computing DU is an exercise.