COP 5021 meeting -*- Outline -*-

* constraint based analysis (1.4)

** goals

  The main goal is control flow analysis...

  Q:  What's the difference between a data flow analysis and a control
      flow analysis?
------------------------------------------
    DATA FLOW VS. CONTROL FLOW ANALYSIS

Main difference?


------------------------------------------

    ... in a data flow analysis we are interested in properties of
      variables and other data.

    ... in a control flow analysis we are interested in how control
      passes from one elementary block to another.

  Q:  Isn't control flow obvious in all languages?
      no, not with:
      - lack of structure, such as go to
      - or with advanced control features, such as lambda,
         or object oriented dispatch

  In such languages, the number of successors and predecessors of a
  node is no longer small.

  Control information is needed for interprocedural flow analysis.

** setting

------------------------------------------
             SETTING

1. Convert all control structures to
   functions and function calls.

2. Analysis finds what functions
   can be called from each point
------------------------------------------

   Q:  Why switch to a functional language for this section?

   Because a language with first-class functions can simulate all
   other control structures

------------------------------------------
        CONTINUATION PASSING STYLE

An intermediate language
   with one control structure

Idea: every expression takes a
      "continuation"
      to which it sends its result

Examples:

  x < 0
==>
  [fn k => [[[%< x] 0] k]]


  if [x < 0]
  then [y := 22]
  else [z := 33]
==>
  [fn k =>
    [[[%< x] 0]
     [%if [[y := 22] k]
          [[z := 33] k]]]]
------------------------------------------

   Explain how the primitives manipulate the argument continuation

   You can think of the continuation as the rest of the computation:
   it prints or returns the final result, and sending a value to it
   passes control to it.
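
   A minimal sketch of the two slide examples in Python, to make the
   role of k concrete. The names lt, cps_if, assign, and example are
   hypothetical stand-ins for %<, %if, and := (not from the book), and
   the branches are wrapped in thunks only because Python evaluates
   arguments eagerly.

```python
def lt(x):
    # %< takes its two arguments one at a time, then a continuation k,
    # and sends the Boolean result to k instead of returning it.
    return lambda y: lambda k: k(x < y)

def cps_if(then_branch, else_branch):
    # %if builds a continuation that expects a Boolean and runs one branch.
    return lambda b: then_branch() if b else else_branch()

def assign(store, name, value):
    # [name := value] in CPS: update the store, then send the store to k.
    def run(k):
        store[name] = value
        return k(store)
    return run

def example(x, store):
    # [fn k => [[[%< x] 0] [%if [[y := 22] k] [[z := 33] k]]]]
    return lambda k: lt(x)(0)(
        cps_if(lambda: assign(store, "y", 22)(k),
               lambda: assign(store, "z", 33)(k)))

# The identity continuation plays the role of "return the final result".
result = example(-5, {})(lambda s: s)
```

   With x = -5 the test succeeds, so only y is assigned before control
   reaches k; with a nonnegative x only z is assigned.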

------------------------------------------
            LANGUAGE (p. 140)

Work in a functional language:

  e \in Exp
  t \in Term
f,x \in Var
  c \in Const
 op \in Op
  l \in Lab

 e ::= t^l
 t ::= c
    | x 
    | fn x => e_0    "non-recursive fun"
    | fun f x => e_0 "recursive fun def"
    | e_1 e_2
    | if e_0 then e_1 else e_2
    | let x = e_1 in e_2
    | e_1 op e_2
------------------------------------------

    Q:  What are the atomic blocks being labeled here?
        all subexpressions
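
        A sketch of how the labeled syntax might be represented, assuming
        Python dataclasses. Only a fragment of the grammar (c, x, fn, and
        application) is shown; the class names and the labels helper are
        hypothetical. The point is that every Exp node, not just every
        statement, carries its own label.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    value: object

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Fn:              # fn x => e_0
    param: str
    body: "Exp"

@dataclass(frozen=True)
class App:             # e_1 e_2
    fun: "Exp"
    arg: "Exp"

Term = Union[Const, Var, Fn, App]

@dataclass(frozen=True)
class Exp:             # e ::= t^l
    term: Term
    label: int

def labels(e):
    """Collect the labels of e and of all its subexpressions."""
    found = {e.label}
    t = e.term
    if isinstance(t, Fn):
        found |= labels(t.body)
    elif isinstance(t, App):
        found |= labels(t.fun) | labels(t.arg)
    return found

# [[fn x => [x]^1]^2 [fn y => [y]^3]^4]^5
program = Exp(App(Exp(Fn("x", Exp(Var("x"), 1)), 2),
                  Exp(Fn("y", Exp(Var("y"), 3)), 4)), 5)
```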

** idea

   Q: What are the main ideas in this approach?

------------------------------------------
   IDEAS OF CONSTRAINT BASED ANALYSIS

- assume no side effects
  ==> associate information with labels

- use a pair of functions, (C,p):
    C: Lab* -> Powerset(Value)
    C(l) contains possible values for
         subexpression at label l

    p: Var* -> Powerset(Value)
    p(x) contains possible values for
         variable x
------------------------------------------

        "C" is an "abstract cache"
        p is an "abstract environment"

        Q:  What's the alternative to associating information directly
        with labels?
             it would be associating information with entries and exits

        Q:  How could this information be useful in an object-oriented
            program?
             to know what code is called in a dynamic dispatch
             (e.g., through an interface in a Java-like language)

        Q:  How is this different from a type system?

            not much, but we're allowing ourselves more flexibility by
            using sets of values instead of "types"

------------------------------------------
           APPROACH

- collect constraints

  for function abstractions:

    e.g., given
    
      [fn x => [x]^1]^2
    
    get
    
      {[fn x => [x]^1]} \subseteq C(2)

------------------------------------------

   Q:  What's the value of a function definition?

       A term (representing a closure)

   Q: Why do they just use the term instead of a closure?
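        Because the abstract environment p is global: it already records
        the possible values of every variable, at every program point,
        so pairing the term with an environment would add nothing.

        Also because it would be impossible to determine the actual
        environment precisely at every program point.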

   Q:  What's the general pattern here?

------------------------------------------
  for variables:

   e.g., given

       [x]^1

   get

       p(x) \subseteq C(1)

  for applications:

    e.g., given

       [[f]^1 [e]^2]^3

    get

       {v | g \in C(1), a \in C(2), and
            v = (g a)}
       \subseteq C(3)

------------------------------------------

   Q:  What's the general rule?
       Use the cache symbolically... That is:
       The fn term at a given label is an element of the cache for that
       label (its singleton set is a subset of C(l)).
       When a variable is used, we include all the possible values it
       may have, as given by p.

   Q:  What happens in  [[fn x => [x]^1]^2 [fn y => [y]^3]^4]^5 ?

       the book uses conditional constraints for this
         {fn x => [x]^1} \subseteq C(2) ==> C(4) \subseteq p(x)
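
       A sketch of constraint collection for this example, assuming
       (as I read the book's scheme) that each application gets one
       pair of conditional constraints per fn term in the program.
       The tuple encoding and the names term, exp, subexps, and
       constraints are hypothetical; "<=" in the output stands for
       \subseteq.

```python
# Labeled expressions as tuples, with the label last:
#   ("var", x, l)   ("fn", x, body, l)   ("app", e1, e2, l)

def term(e):
    """Render the term part of e, without e's own label."""
    if e[0] == "var":
        return e[1]
    if e[0] == "fn":
        return f"fn {e[1]} => {exp(e[2])}"
    return f"{exp(e[1])} {exp(e[2])}"          # application

def exp(e):
    """Render e as [t]^l."""
    return f"[{term(e)}]^{e[-1]}"

def subexps(e):
    """Yield e and all of its subexpressions."""
    yield e
    if e[0] == "fn":
        yield from subexps(e[2])
    elif e[0] == "app":
        yield from subexps(e[1])
        yield from subexps(e[2])

def constraints(program):
    fns = [e for e in subexps(program) if e[0] == "fn"]
    out = []
    for e in subexps(program):
        l = e[-1]
        if e[0] == "fn":                        # {fn x => e0} <= C(l)
            out.append(f"{{{term(e)}}} <= C({l})")
        elif e[0] == "var":                     # p(x) <= C(l)
            out.append(f"p({e[1]}) <= C({l})")
        else:                                   # application [e1 e2]^l
            l1, l2 = e[1][-1], e[2][-1]
            for f in fns:                       # f might be what e1 evaluates to
                guard = f"{{{term(f)}}} <= C({l1})"
                out.append(f"{guard} ==> C({l2}) <= p({f[1]})")     # bind the argument
                out.append(f"{guard} ==> C({f[2][-1]}) <= C({l})")  # body flows to result
    return out

program = ("app", ("fn", "x", ("var", "x", 1), 2),
                  ("fn", "y", ("var", "y", 3), 4), 5)
for c in constraints(program):
    print(c)
```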

       what's the least solution?
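
       A sketch of finding the least solution by iterating to a
       fixed point. The eight constraints are written out by hand for
       the example above, following my reading of the book's
       conditional scheme; the names FX, FY, C, p, and least_solution
       are hypothetical, and abstract values are just strings naming
       the fn terms.

```python
from collections import defaultdict

FX = "fn x => [x]^1"
FY = "fn y => [y]^3"

# Nodes are ("C", label) for the cache and ("p", var) for the environment.
def C(l): return ("C", l)
def p(x): return ("p", x)

base = [(FX, C(2)), (FY, C(4))]             # {term} <= node
subset = [(p("x"), C(1)), (p("y"), C(3))]   # src <= dst
conditional = [                             # {term} <= guard ==> src <= dst
    (FX, C(2), C(4), p("x")),
    (FX, C(2), C(1), C(5)),
    (FY, C(2), C(4), p("y")),
    (FY, C(2), C(3), C(5)),
]

def least_solution(base, subset, conditional):
    sol = defaultdict(set)                  # start from the empty sets
    for t, node in base:
        sol[node].add(t)
    changed = True
    while changed:                          # repeat until nothing grows
        changed = False
        # A conditional constraint fires only once its guard holds.
        active = list(subset) + [(s, d) for t, g, s, d in conditional
                                 if t in sol[g]]
        for src, dst in active:
            if not sol[src] <= sol[dst]:
                sol[dst] |= sol[src]
                changed = True
    return sol

sol = least_solution(base, subset, conditional)
```

       This gives C(1) = C(4) = C(5) = p(x) = {fn y => [y]^3},
       C(2) = {fn x => [x]^1}, and C(3) = p(y) = {}. The two
       constraints guarded by fn y never fire, since fn y never
       reaches C(2).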

   Q:  What would be the constraints for an expression of the form
         [[e1]^1 op [e2]^2]^3 ?

   Q:  What would be the constraints for an if-then-else expression?

   Q:  What would be the constraints for a let expression,
       like  let x = e1 in e2 ?