COP 5021 Lecture -*- Outline -*-

* Shape Analysis (2.6)

Back to ignoring procedures!
But now we will look at heap-allocated data

Goals: finite approximations of data structures (shapes)
         useful for other software tools
           (e.g., null dereference detection)
         can be used for some kinds of verification of invariants
           (e.g., procedure doesn't introduce cycles into a list)

** syntax

For reference, which we've seen and don't need to do again:

        a \in AExp      "arithmetic expressions"
        b \in BExp      "Boolean expressions"
        S \in Stmt      "statements"
        x,y \in Var     "variables"
        n \in Num       "numeric literals"
        l \in Lab       "labels"
        opa \in Op_a    "arithmetic operators"
        opb \in Op_b    "Boolean operators"
        opr \in Op_r    "relational operators"

------------------------------------------
        NEW AND MODIFIED SYNTAX

        sel \in Sel     "selector names"
        p \in PExp      "pointer expressions"
        op_p \in Op_p   "pointer operators"

        p ::=           "pointer expression"
             x          "variable dereference"
           | x.sel      "field dereference"

        a ::= p         "dereference expression"
           | n | a1 opa a2 | nil

        b ::= op_p p    "pointer test"
           | true | false | not b
           | b1 opb b2 | a1 opr a2

        S ::= [p := a]^l     "assignment"
           | [malloc p]^l    "allocation"
           | [skip]^l | S1 ; S2
           | if [b]^l then S1 else S2
           | while [b]^l do S
           | assert [b]^l

        Op_p = {is-nil, ...} \cup {has-sel | sel \in Sel}
------------------------------------------

Note: this allows pointer arithmetic to be expressed,
        because p is an arithmetic expression
      However, you can't field dereference an expression that uses
        pointer arithmetic (because field dereferences are only
        allowed on p, not a).

Q: Could we also rule out pointer arithmetic in a type system?
     Yes, if we make the types of pointers different from int

Q: Can we test for equality of pointers?
     Yes, because p is an arithmetic expression,
     so two pointers can be compared with a relational operator (a1 opr a2)

------------------------------------------
        EXAMPLE (COPY-INTO)

  assume [not (f == t)]^0;
  while [(not is-nil(f)) and (not is-nil(t))]^1 do
      ([t.val := f.val]^2;
       [t := t.next]^3;
       [f := f.next]^4);
------------------------------------------

** structural operational semantics (2.6.1)

*** domains

------------------------------------------
    OPERATIONAL SEMANTICS (2.6.1)

Domains:

    xi \in Loc          "locations"
    Storable = Z + Loc + {<>}
    s \in State = Var* ->fin Storable
    h \in Heap = (Loc x Sel) ->fin Storable
------------------------------------------
  <> is the nil value
  we assume that Loc is infinite and can be distinguished
     from integers (Z) and from <>
  ->fin is for finite functions, and hence partial
  Var* is a finite set of variables (occurring in the program)

Q: How would you explain the model of the heap?
     it's a finite map from location-selector pairs to storable values
     it describes the fields of an object at a given location

*** denotational semantics of expressions

------------------------------------------
    SEMANTICS OF POINTER EXPRESSIONS

  P: PExp* -> (State x Heap) ->fin Storable

  P[[x]](s, h) = s(x)
  P[[x.sel]](s, h) = if s(x) \in Loc and (s(x), sel) \in dom(h)
                     then h(s(x), sel)
                     else undef
------------------------------------------
  PExp* is the subset of PExp such that
     (\forall p \in PExp* :: FV(p) \subseteq Var*)

Q: Do we have to modify the semantics of arithmetic
   and Boolean expressions now?
     Yes, since pointer expressions can occur in them, and since
     the semantics of pointer expressions is partial and needs the heap...

------------------------------------------
    SEMANTICS OF ARITHMETIC EXPRESSIONS

  A: AExp -> (State x Heap) ->fin Storable

  A[[p]](s, h) = P[[p]](s, h)
  A[[nil]](s, h) = <>
  A[[n]](s, h) = N[[n]]
  A[[a_1 op_a a_2]](s, h) =
      let v_1 = A[[a_1]](s,h) in
      let v_2 = A[[a_2]](s,h) in
      if v_1 in Z and v_2 in Z
      then (OP_a[[op_a]])(v_1)(v_2)
      else undef
------------------------------------------

Q: What happens if P[[p]](s, h) is undefined?
     Then A[[p]](s, h) is undefined (returns undef)

Q: Are there any other cases where A is undefined?
     Yes, trying to do pointer arithmetic,
     or cases like division by 0...

Q: How do we prevent pointer arithmetic?
     by checking that each argument to an arithmetic operator
     is an integer (not a pointer)
     Could also change the meaning of the operators
      (e.g., OP_a[[+]] = \n.\m. if n in Z and m in Z
                                then n+m else undef)

------------------------------------------
    SEMANTICS OF BOOLEAN EXPRESSIONS

  B: BExp -> (State x Heap) ->fin T
     where T = {tt,ff}

  B[[a_1 op_r a_2]](s, h) =
     (OP_r[[op_r]]) (A[[a_1]](s,h)) (A[[a_2]](s,h))
  B[[op_p p]](s, h) = (OP_p[[op_p]]) (P[[p]](s, h))

  OP_p: Op_p -> Storable ->fin T
  OP_p[[is-nil]](v) = (v = <>)
------------------------------------------

Q: What kinds of changes would be needed to OP_r, if any?
     Want to allow equality comparison of pointers (i.e., locations),
     but not mixed comparisons between locations and integers;
     the mixed cases should be undefined.
     Also if either argument is undef, then undef results (strictness):
     e.g.,
     OP_r[[==]] = \a.\b. if a in Z and b in Z then a = b
                         else if a in Loc and b in Loc then a = b
                         else undef

*** operational semantics of statements

------------------------------------------
    OPERATIONAL SEMANTICS OF STATEMENTS

Configurations:
    (Stmt x State x Heap) + (State x Heap)

Terminal configurations:
    (State x Heap)
------------------------------------------

Q: How does that change of configurations affect
   the description of the transitions?
     have to add the heap everywhere...
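
A minimal Python sketch of the partial functions P and A defined above, to make the "undef" cases concrete. The encoding is an assumption made only for illustration: states and heaps are dicts, a pointer expression is either a variable name or a pair (x, sel), a binary expression is a triple (op, a1, a2), undef is modeled as None, and the names Loc, NIL, UNDEF and OPS are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """A location; a separate type so it is distinguishable from Z and from <>."""
    name: str


NIL = "<>"     # the nil value <> (assumed encoding)
UNDEF = None   # stands for undef (assumed encoding)

# A few arithmetic operators, standing in for OP_a
OPS = {
    "+": lambda m, n: m + n,
    "-": lambda m, n: m - n,
    "*": lambda m, n: m * n,
}


def is_int(v):
    """True iff v is in Z (bool is excluded, since it is an int subtype in Python)."""
    return isinstance(v, int) and not isinstance(v, bool)


def P(p, s, h):
    """P[[x]] and P[[x.sel]]; p is a variable name or a pair (x, sel)."""
    if isinstance(p, str):
        return s.get(p, UNDEF)
    x, sel = p
    v = s.get(x, UNDEF)
    if isinstance(v, Loc) and (v, sel) in h:
        return h[(v, sel)]
    return UNDEF


def A(a, s, h):
    """A[[a]]; a is an int literal, "nil", a pointer expression, or (op, a1, a2)."""
    if is_int(a):
        return a
    if a == "nil":
        return NIL
    if isinstance(a, tuple) and len(a) == 3:
        op, a1, a2 = a
        v1, v2 = A(a1, s, h), A(a2, s, h)
        # Both arguments must be integers; this rules out pointer arithmetic
        # and also propagates undef (strictness).
        if is_int(v1) and is_int(v2):
            return OPS[op](v1, v2)
        return UNDEF
    return P(a, s, h)
```

With s = {x |-> L0, n |-> 3} and h = {(L0,next) |-> L1, (L0,val) |-> 4}, the sketch gives A[[x.val + n]] = 7, while A[[x + 1]] is undef because s(x) is a location.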
------------------------------------------
       TRANSITIONS WITH THE HEAP

[asgn] ([x := a]^l, s, h) --> (s',h)
          if A[[a]](s,h) is defined
          and s' = s[x |-> A[[a]](s,h)]

[fasg] ([x.sel := a]^l, s, h) --> (s,h')
          if s(x) \in Loc and A[[a]](s,h) is defined
          and h' = h[(s(x),sel) |-> A[[a]](s,h)]

[mal]  ([malloc x]^l, s, h) --> (s',h)
          if xi \in Loc does not occur in s or h
          and s' = s[x |-> xi]

[fmal] ([malloc x.sel]^l, s, h) --> (s,h')
          if xi \in Loc does not occur in s or h
          and s(x) \in Loc
          and h' = h[(s(x), sel) |-> xi]
------------------------------------------
  Note: a value is defined if it is not undef
  Note: we say that xi is "fresh" in [mal] and [fmal]

Q: Does x.sel need to already be allocated when an fasg is executed?
     No, this "allocates" the field.

Q: What happens if the side conditions are not met in [asgn] or [fasg]?
     the semantics is "stuck", because no transitions can take place,
     and since the configuration is not terminal, this indicates an error

Q: Can that happen in [fmal]?
     yes, when s(x) is not a location

Q: In [mal]?
     no, Loc is infinite (and the rule adds to the domain of s)

Q: What would the skip rule look like? seq1?

  [skip] ([skip]^l, s, h) --> (s, h)

            (S1, s, h) --> (S1', s', h')
  [seq1] ----------------------------------
          (S1;S2, s, h) --> (S1';S2, s', h')

  Don't need to show the following...

            (S1, s, h) --> (s', h')
  [seq2] ----------------------------
          (S1;S2, s, h) --> (S2, s', h')

Q: How does the partiality of the Boolean semantics affect the if rules?

  [if1] (if [b]^l then S1 else S2, s, h) --> (S1, s, h)
            if B[[b]](s, h) = tt

  [if2] (if [b]^l then S1 else S2, s, h) --> (S2, s, h)
            if B[[b]](s, h) = ff

  Thus, if B[[b]](s, h) is undefined, then the semantics is "stuck",
  which indicates an error.
  The same would happen with the "while" rules

  [wh1] (while [b]^l do S, s, h) --> (S; while [b]^l do S, s, h)
            if B[[b]](s,h) = tt

  [wh2] (while [b]^l do S, s, h) --> (s, h)
            if B[[b]](s,h) = ff

------------------------------------------
              EXAMPLE

 What transitions happen for

   [malloc x]^1; [malloc x.next]^2; [y := x.next]^3; [y.next := x]^4;

 We write L0 and L1 for locations.
 Let S_i be the statement with label i
 Let S_1234 be S_1; (S_2; (S_3; S_4)).
 Let S_234 be S_2; (S_3; S_4), etc.
 Let s?? be the initial state (x and y not yet assigned)
 Let s0? be the state {x |-> L0}
 Let s01 be the state s.t., s01(x) = L0 and s01(y) = L1
 Let h0 = {}
     h1 = {(L0,next) |-> L1}
     h2 = {(L0,next) |-> L1, (L1,next) |-> L0}

    (S_1234, s??, h0)
--> {by [seq2]}
    * ([malloc x]^1, s??, h0)
      --> {by [mal], with L0 as new loc.}
      (s0?, h0)
    .
    (S_234, s0?, h0)
--> {by [seq2]}
    * ([malloc x.next]^2, s0?, h0)
      --> {by [fmal], with L1 as new loc.}
      (s0?, h1)
    .
    (S_34, s0?, h1)
--> {by [seq2]}
    * ([y := x.next]^3, s0?, h1)
      --> {by [asgn], A[[x.next]](s0?,h1)=L1}
      (s01, h1)
    .
    ([y.next := x]^4, s01, h1)
--> {by [fasg], A[[x]](s01,h1) = L0}
    (s01, h2)
------------------------------------------

** a shape analysis

  This follows the paper:

------------------------------------------
 SHAPE ANALYSIS WITH PREDICATE ABSTRACTION

Based on (largely quoted from):

  Roman Manevich, E. Yahav, G. Ramalingam,
  and Mooly Sagiv. "Predicate abstraction
  and canonical abstraction for
  singly-linked lists."
  In Verification, Model Checking, and
  Abstract Interpretation (VMCAI),
  LNCS vol. 3385, pp. 181--198,
  Berlin, 2005. Springer-Verlag.
------------------------------------------

*** representing states with predicates (section 2.1)

  states are represented by logical formulas
  instead of sets and functions;
  this allows theorem proving tools to compute the analysis...
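
Before switching to logical structures, the set-and-function view of 2.6.1 can be sketched in Python for the four-statement example above. This is only an illustration under stated assumptions: statements are tuples, the right-hand side is restricted to pointer expressions, an unassigned variable is treated as undefined, and the names Stuck, val, step, run, make_fresh and PROGRAM are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Loc:
    name: str


class Stuck(Exception):
    """No transition applies: the configuration is stuck (an error)."""


def make_fresh():
    """A supply of fresh locations L0, L1, ... (Loc is assumed infinite)."""
    ids = count()
    return lambda: Loc(f"L{next(ids)}")


def val(p, s, h):
    """Value of pointer expression p ('x' or (x, sel)); Stuck if undefined."""
    if isinstance(p, str):
        if p not in s:
            raise Stuck(p)
        return s[p]
    x, sel = p
    if not isinstance(s.get(x), Loc) or (s[x], sel) not in h:
        raise Stuck(p)
    return h[(s[x], sel)]


def step(stmt, s, h, fresh):
    """One transition for an atomic statement; returns the new (s, h)."""
    op, target = stmt[0], stmt[1]
    if isinstance(target, str):                      # target is a variable x
        v = fresh() if op == "malloc" else val(stmt[2], s, h)  # [mal] / [asgn]
        return {**s, target: v}, h
    x, sel = target                                  # target is a field x.sel
    if not isinstance(s.get(x), Loc):
        raise Stuck(target)                          # s(x) \in Loc fails
    v = fresh() if op == "malloc" else val(stmt[2], s, h)      # [fmal] / [fasg]
    return s, {**h, (s[x], sel): v}


def run(stmts, fresh):
    """Run a sequence of atomic statements from the empty state and heap."""
    s, h = {}, {}
    for stmt in stmts:   # plays the role of [seq1]/[seq2] for straight-line code
        s, h = step(stmt, s, h, fresh)
    return s, h


PROGRAM = [
    ("malloc", "x"),                    # [malloc x]^1
    ("malloc", ("x", "next")),          # [malloc x.next]^2
    ("assign", "y", ("x", "next")),     # [y := x.next]^3
    ("assign", ("y", "next"), "x"),     # [y.next := x]^4
]
```

Calling run(PROGRAM, make_fresh()) ends in the state s01 and heap h2 of the example, and running only [malloc x.next] from the empty state raises Stuck, matching the answer for [fmal].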
------------------------------------------
   REPRESENTING PROGRAM STATES AS
   A FIRST-ORDER LOGICAL STRUCTURE

Objects represented by "individuals"

State represented by a formula
  over a fixed set of predicates
------------------------------------------
   individuals are just meta-variables

   So this representation is just substituting
   one kind of math for another,
   but it's helpful for computation

------------------------------------------
   EXAMPLE FOR SINGLY-LINKED LISTS

Objects (v) with a field "n"

The set P^C = {eq, x, n}

  predicates       meaning
  =======================================
  eq(v1, v2)       v1 equals v2
  x(v)             var x points to object v,
                     (for x \in Var*)
  n(v1, v2)        the next field of v1
                     points to v2

Consider the program:
  [malloc x]^1; [malloc x.next]^2; [y := x.next]^3; [y.next := x]^4;

We could encode the state and heap at the end of this code as:

  Individuals: U = {L0, L1}
  eq(L0, L0) = tt    eq(L1, L1) = tt
  eq(L0, L1) = ff    eq(L1, L0) = ff
  x(L0) = tt         y(L1) = tt
  x(L1) = ff         y(L0) = ff
  n(L0, L1) = tt     n(L1, L0) = tt
  n(L0, L0) = ff     n(L1, L1) = ff
------------------------------------------
   P^C stands for the set of concrete predicates
   "n" is short for "next"

Q: How can we tell if a variable y points to <> or an integer?
     if there is no location, l, such that y(l) = tt

------------------------------------------
    2-STRUCTS OVER P (MODELS)

def: A *2-valued logical structure*
   over a set of predicates P
   (an element of 2-STRUCT[P])
   is a pair S = (U,i)
   where U is the universe (all individuals)
   and i is the interpretation function,
   such that for all p in P of arity k:
       i(p): U^k -> T
   where T = {tt, ff}
------------------------------------------

Q: What's the arity of eq? x? n?
     2, 1, 2

  Thus, we will use 2-STRUCTS over P^C for modeling/encoding states

------------------------------------------
  ABSTRACTION FROM FORMER CONCRETE STATES

 toPC(s, h) = (U,i)
   U = {xi \in Loc | xi \in ran(s) or xi \in ran(h)}
   i(eq(xi1, xi2)) = (xi1 == xi2)
   i(x(xi)) = (s(x) == xi)
   i(n(xi1, xi2)) = (h(xi1,next) == xi2)
------------------------------------------

Q: What would you take as a program's initial 2-STRUCT over P^C?
     U = {}, i is the empty function.

Q: Is this encoding (toPC) invertible?
     not exactly, as we are ignoring integer variables, etc.

Q: Could we write the language's semantics using 2-STRUCTS over P^C?
     sure (see the paper and below)

  The following is the collecting semantics using 2-structs

------------------------------------------
  TRANSITIONS USING 2-STRUCTS OVER P^C

define eqStable(i,i') =
   (forall xi1,xi2 \in U :: i'(eq(xi1,xi2)) = i(eq(xi1,xi2)))
define nStable(i,i') =
   (forall xi1,xi2 \in U :: i'(n(xi1,xi2)) = i(n(xi1,xi2)))
define varStable(i,i') =
   (forall x \in Var*, xi \in U :: i'(x(xi)) = i(x(xi)))

[asgnil] ([x := nil]^l, (U, i)) --> (U,i')
     if (forall xi \in U :: i'(x(xi)) = ff)
     and (forall y \in Var* : y \neq x :
              (forall xi \in U :: i'(y(xi)) = i(y(xi))))
     and eqStable(i,i') and nStable(i,i')

[asgv] ([x := y]^l, (U, i)) --> (U,i')
     if eqStable(i,i') and nStable(i,i')
     and (forall z \in Var* : z \neq x :
              (forall xi' \in U :: i'(z(xi')) = i(z(xi'))))
     and ((exists xi \in U :: i(y(xi)) = tt and i'(x(xi)) = tt
               and (forall xi' \in U : xi' \neq xi : i'(x(xi')) = ff))
          or (forall xi \in U :: i(y(xi)) = ff
                and (forall xi' \in U :: i'(x(xi')) = ff)))

[asgf] ([x := y.n]^l, (U, i)) --> (U,i')
     if eqStable(i,i') and nStable(i,i')
     and (forall z \in Var* : z \neq x :
              (forall xi' \in U :: i'(z(xi')) = i(z(xi'))))
     and i(y(xi)) = tt
     and ((exists xi2 \in U :: i(n(xi,xi2)) = tt and i'(x(xi2)) = tt
               and (forall xi' \in U : xi' \neq xi2 : i'(x(xi')) = ff))
          or (forall xi2 \in U :: i(n(xi,xi2)) = ff and i'(x(xi2)) = ff))

[fasgnil] ([x.n := nil]^l, (U, i)) --> (U,i')
     if eqStable(i,i') and varStable(i,i')
     and i(x(xi)) = tt
     and (forall xi' \in U :: i'(n(xi,xi')) = ff)

[fasgv] ([x.n := y]^l, (U, i)) --> (U,i')
     if eqStable(i,i') and varStable(i,i')
     and i(x(xi)) = tt
     and ((i(y(xi')) = tt and i'(n(xi,xi')) = tt
             and (forall xi2 \in U : xi2 \neq xi' : i'(n(xi,xi2)) = ff)
             and (forall xi4,xi5 \in U : xi4 \neq xi :
                     i'(n(xi4,xi5)) = i(n(xi4,xi5))))
          or ((forall xi' \in U :: i(y(xi')) = ff)
                and (forall xi2 \in U :: i'(n(xi,xi2)) = ff)))

[fasgf] ([x.n := y.n]^l, (U, i)) --> (U,i')
     if eqStable(i,i') and varStable(i,i')
     and i(x(xi)) = tt and i(y(xi')) = tt
     and ((exists xi2 \in U :: i(n(xi',xi2)) = tt and i'(n(xi,xi2)) = tt
               and (forall xi3 \in U : xi3 \neq xi2 : i'(n(xi,xi3)) = ff))
          or (forall xi4 \in U :: i(n(xi',xi4)) = ff and i'(n(xi,xi4)) = ff))

[mal] ([malloc x]^l, (U, i)) --> (U',i')
     if not(xi' \in U) and U' = U \cup {xi'}
     and eqStable(i,i')
     and (forall xi \in U :: i'(x(xi)) = ff and i'(eq(xi,xi')) = ff)
     and i'(x(xi')) = tt and i'(eq(xi',xi')) = tt
     and (forall y \in Var* : y \neq x :
              (forall xi2 \in U :: i'(y(xi2)) = i(y(xi2))))
     and nStable(i,i')

[fmal] ([malloc x.n]^l, (U, i)) --> (U',i')
     if i(x(xi)) = tt
     and not(xi' \in U) and U' = U \cup {xi'}
     and eqStable(i,i') and varStable(i,i')
     and i'(x(xi)) = tt and i'(n(xi,xi')) = tt
     and (forall xi2 \in U :: i'(eq(xi2,xi')) = ff)
     and i'(eq(xi',xi')) = tt
     and nStable(i,i')
------------------------------------------
   Note that n must relate two locations or <>

Q: How would you generalize to arbitrary selectors?
     have more than just the n predicate
     (one binary predicate per selector)

Q: Is there a limit to the size of the 2-STRUCTS in these concrete rules?
     no, because each malloc can add new locations to U...

Q: So are they suitable for doing an analysis?
     no, because the set of 2-STRUCTS is potentially infinite,
     so we want an abstraction that makes the property space finite
     (so that fixed point computations are easier and more efficient)

   The paper gives an encoding of the book's abstraction using predicates,
   but this isn't much fun and the paper shows it's superseded by
   the predicate abstraction, so we study...
*** predicate abstraction (section 2.3)

   The goal is to have a finite representation of the heap,
   even if the heap has unbounded size.

------------------------------------------
       PREDICATE ABSTRACTION

Let P^A be a set of nullary predicates of interest.

A *predicate abstraction* is defined as:
    P^A = {P_1, ..., P_m}
    where each P_j is defined by some
       phi_j: 2-STRUCT[P^C] -> T
    where T = {tt,ff}

An *abstract state* is a truth assignment
     A: {1, ..., m} -> T

An *abstraction mapping*
    beta: 2-STRUCT[P^C] -> 2-STRUCT[P^A]
  where
    beta(U,i) = (A, i')
  and i'(P_j) = phi_j(U,i)
------------------------------------------

Q: Given a concrete state, can we decide on P_j's abstract value?
     yes, i'(P_j) = phi_j(U,i), where (A,i') = beta(U,i)

------------------------------------------
         EXAMPLE

Suppose we want to prove that the result of a program
is a cyclic singly linked list.

Idea: Design predicates useful for stating this property.

Example set of nullary predicates:
   Let K > 0 be a fixed number.
   Let P^A = {NotNull[x] | x \in Var*}
        \cup {EqualsNext[k][x,y] | 0 <= k <= K and x, y \in Var*}

Defining formulas:
   NotNull[x] = (exists v : v \neq <> : x(v))
   EqualsNext[k][x,y] =
       (exists v_0, ..., v_k :: x(v_0) and y(v_k)
           and (forall j : 0 <= j < k : n(v_j, v_{j+1})))
------------------------------------------

Q: How do these predicates help describe this property?
     they let you state that you can reach from x back to itself
     (in k steps)

Q: Are these nullary predicates? Don't they take arguments?
     they are nullary: we model them as nullary predicates
     by writing the arguments as part of their names

Q: So what does EqualsNext[3][q,r] mean?
     from q's value one can reach r's value
     by using 3 field dereferences of field next.

Q: How would you represent the set of predicates?
     with a bit vector!

Q: What does it mean if a predicate P_i is true?
     that the corresponding information must be true in the state

Q: What does it mean if NotNull[x] is false?
     x may be nil (<>)

Q: What does it mean if EqualsNext[3][q,r] is false?
     we might not be able to reach what r points to from what
     q points to by following the next field 3 times.

     We could use an "unknown" logical value also.

------------------------------------------
       IMPLEMENTING THE ANALYSIS

Idea: Use the defining formulas to induce
      the transfer functions

In our example A is fixed (for a program),
 so we have configurations of the forms: (S,i') and i'.

[asgnil] ([x := nil]^l, i) --> i'
    if i'(NotNull[x]) = ff
    and (forall y \in Var* : y \neq x : i'(NotNull[y]) = i(NotNull[y]))
    and (forall z \in Var*, k <= K :: i'(EqualsNext[k][x,z]) = ff)

[asgv] ([x := y]^l, i) --> i'
    if i'(NotNull[x]) = i(NotNull[y])
    and (forall z \in Var* : z \neq x : i'(NotNull[z]) = i(NotNull[z]))
    and (i(NotNull[y]) = tt implies (i'(EqualsNext[0][x,y]) = tt))

[asgf] ([x := y.n]^l, i) --> i'
    if i(NotNull[y]) = tt
    and (forall k < K, z \in Var* ::
            i(EqualsNext[k+1][y,z]) = i'(EqualsNext[k][x,z]))
    and (i'(NotNull[x])
         = (exists k, z : 1 <= k <= K and z \in Var* :
               i(EqualsNext[k][y,z]) = tt))
    and (forall z \in Var* : z \neq x : i'(NotNull[z]) = i(NotNull[z]))

[fasgnil] ([x.n := nil]^l, i) --> i'
    if i(NotNull[x]) = tt and i'(NotNull[x]) = tt
    and (forall k : 1 <= k <= K :
           (forall z \in Var* :: i'(EqualsNext[k][x,z]) = ff))
    and (forall y \in Var* : y \neq x :
           (i'(NotNull[y]) = i(NotNull[y]))
           and (forall k <= K, z \in Var* ::
                  i'(EqualsNext[k][y,z]) = i(EqualsNext[k][y,z])))
------------------------------------------

Q: How would you do the rest of these?
     ...

[fasgv] ([x.n := y]^l, i')
          --> i'[EqualsNext[k][x,z] |-> i'(EqualsNext[k-1][y,z])]
    if i'(NotNull[x]), for all z and 1 <= k <= K

[fasgf] ([x.n := y.n]^l, i')
          --> i'[EqualsNext[k][x,z] |-> i'(EqualsNext[k][y,z])]
    if i'(NotNull[x]) and i'(NotNull[y]), for all z, 1 <= k <= K

[mal] ([malloc x]^l, i') --> i'[NotNull[x] |-> tt]

[fmal] ([malloc x.n]^l, i') --> i'[EqualsNext[k][x,z] |-> ff]
    if i'(NotNull[x]), for all z, 1 <= k <= K

Q: How do these relate to the transfer functions?
     the transfer function takes the block and i' value,
     and produces the new i' as directed.
     The transfer function definition only cares about the atomic blocks...

   Inducing the analysis from an abstraction function in this way
   is the heart of abstract interpretation...
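
To make the abstraction function concrete, here is a minimal Python sketch of toPC followed by beta for the NotNull and EqualsNext predicates. It is an illustration under assumptions, not the paper's implementation: the interpretation i is stored as sets of true tuples, the abstract state is a dict keyed by predicate name, and the function names (to_pc, not_null, equals_next, beta) are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Loc:
    name: str


def to_pc(s, h, variables):
    """toPC(s, h) = (U, i); each predicate is stored as its set of true tuples."""
    U = {v for v in list(s.values()) + list(h.values()) if isinstance(v, Loc)}
    i = {
        "eq": {(v, v) for v in U},
        "n": {(u, v) for u in U for v in U if h.get((u, "next")) == v},
        "var": {x: {v for v in U if s.get(x) == v} for x in variables},
    }
    return U, i


def not_null(U, i, x):
    """phi for NotNull[x]: some individual v satisfies x(v)."""
    return any(v in i["var"][x] for v in U)


def equals_next(U, i, k, x, y):
    """phi for EqualsNext[k][x,y]: exists v_0..v_k with x(v_0), y(v_k),
    and n(v_j, v_{j+1}) for 0 <= j < k."""
    frontier = set(i["var"][x])
    for _ in range(k):
        # Follow one n-edge from every individual reached so far.
        frontier = {w for (v, w) in i["n"] if v in frontier}
    return bool(frontier & i["var"][y])


def beta(U, i, variables, K):
    """Abstract state: one truth value per nullary predicate in P^A."""
    abs_state = {("NotNull", x): not_null(U, i, x) for x in variables}
    for k, x, y in product(range(K + 1), variables, variables):
        abs_state[("EqualsNext", k, x, y)] = equals_next(U, i, k, x, y)
    return abs_state
```

For the two-cell cyclic list built earlier (s01, h2) with K = 2, this gives EqualsNext[1][x,y] = tt and EqualsNext[2][x,x] = tt, which is exactly the "reach from x back to itself" information. The abstract state has |Var*| + (K+1)|Var*|^2 entries no matter how large U is, which is the finiteness the analysis needs.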