COP 5021 Lecture -*- Outline -*-

* Shape Analysis (2.6)

    Back to ignoring procedures!
    But now we look at heap-allocated data

    Goals: finite approximations of data structures (shapes)
        useful for other software tools
            (e.g., null dereference detection)
        can be used for some kinds of verification of invariants
            (e.g., procedure doesn't introduce cycles into a list)

** syntax

    For reference, which we've seen and don't need to do again:

        a \in AExp      "arithmetic expressions"
        b \in BExp      "Boolean expressions"
        S \in Stmt      "statements"
        x,y \in Var     "variables"
        n \in Num       "numeric literals"
        l \in Lab       "labels"
        opa \in Op_a    "arithmetic operators"
        opb \in Op_b    "Boolean operators"
        opr \in Op_r    "relational operators"

------------------------------------------
        NEW AND MODIFIED SYNTAX

    sel \in Sel     "selector names"
    p \in PExp      "pointer expressions"
    op_p \in Op_p   "pointer operators"

    p ::=                   "pointer expression"
          x                 "variable dereference"
        | x.sel             "field dereference"
    a ::= p                 "dereference expression"
        | n | a1 opa a2 | nil
    b ::= op_p p            "pointer test"
        | true | false | not b | b1 opb b2 | a1 opr a2
    S ::= [p := a]^l        "assignment"
        | [malloc p]^l      "allocation"
        | [skip]^l | S1 ; S2
        | if [b]^l then S1 else S2
        | while [b]^l do S
        | assert [b]^l

    Op_p = {is-nil, ...} \cup {has-sel | sel \in Sel}
------------------------------------------

    Note: this allows pointer arithmetic to be expressed,
        because p is an arithmetic expression.
        However, you can't field dereference an expression that
        uses pointer arithmetic (because field dereferences
        are only allowed on p, not a).

    Q: Could we rule out pointer arithmetic in a type system?
        Yes, if we make the types of pointers different from int

    Q: Can we test for equality of pointers?
        Yes, we can because p is an arithmetic expression

------------------------------------------
        EXAMPLE (COPY-INTO)

    assume [not (f == t)]^0;
    while [(not is-nil(f)) and (not is-nil(t))]^1 do
        ([t.val := f.val]^2;
         [t := t.next]^3;
         [f := f.next]^4);
------------------------------------------

** structural operational semantics (2.6.1)

*** domains

------------------------------------------
        OPERATIONAL SEMANTICS (2.6.1)

    Domains:

    xi \in Loc      "locations"
    s \in State = Var* -> Storable
    Storable = Z + Loc + {<>}
    h \in Heap = (Loc x Sel) ->fin Storable
------------------------------------------
    we assume that Loc is infinite
    <> is the nil value
    ->fin is for finite functions, and hence partial
    Var* is a finite set of variables (occurring in the program)

    Q: How would you explain the model of the heap?
        describes the fields of an object at a given location

*** denotational semantics of expressions

------------------------------------------
        SEMANTICS OF POINTER EXPRESSIONS

    P: PExp* -> (State x Heap) ->fin Storable

    P[[x]](s, h) = s(x)
    P[[x.sel]](s, h) = if s(x) \in Loc and (s(x), sel) \in dom(h)
                       then h(s(x), sel)
                       else undef
------------------------------------------
    PExp* is subset of PExp such that
        (\forall p \in PExp* :: FV(p) \subseteq Var*)

    Q: Do we have to modify the semantics of arithmetic
       and Boolean expressions now?
        Yes, since pointers can be involved in the syntax,
        and since their semantics is partial and needs the heap...

------------------------------------------
        SEMANTICS OF ARITHMETIC EXPRESSIONS

    A: AExp -> (State x Heap) ->fin Storable

    A[[p]](s, h) = P[[p]](s, h)
    A[[nil]](s, h) = <>
    A[[n]](s, h) = N[[n]]
    A[[a_1 op_a a_2]](s, h) = (OP_a[[op_a]])
                                  (A[[a_1]](s,h))
                                  (A[[a_2]](s,h))
------------------------------------------

    Q: What happens if P[[p]](s, h) is undefined?
        Then A[[p]](s, h) is undefined

    Q: Are there any other cases where A is undefined?
        Yes...

    Q: How do we prevent pointer arithmetic?
        by making the meaning of the operators undefined
        unless both arguments are integers
        (i.e., by changing OP_a[[.]])

------------------------------------------
        SEMANTICS OF BOOLEAN EXPRESSIONS

    B: BExp -> (State x Heap) ->fin T

    B[[a_1 op_r a_2]](s, h) = (OP_r[[op_r]])
                                  (A[[a_1]](s,h))
                                  (A[[a_2]](s,h))
    B[[op_p p]](s, h) = (OP_p[[op_p]]) (P[[p]](s, h))

    OP_p: Op_p -> Storable ->fin T
    OP_p[[is-nil]](v) = if v = <> then tt else ff
------------------------------------------

    Q: What kinds of changes would be needed to OP_r, if any?
        Want to allow equality comparison of pointers,
        but not mixed comparisons between pointers and integers;
        the mixed cases should be undefined.

*** operational semantics of statements

------------------------------------------
        OPERATIONAL SEMANTICS OF STATEMENTS

    Configurations:
        (Stmt x State x Heap) + (State x Heap)

    Terminal configurations:
        (State x Heap)
------------------------------------------

    Q: How does that change of configurations
       affect the description of the transitions?
        have to add the heap everywhere...

------------------------------------------
        TRANSITIONS

    [asgn] ([x := a]^l, s, h) --> (s[x |-> A[[a]](s,h)], h)
                if A[[a]](s,h) is defined

    [fasg] ([x.sel := a]^l, s, h)
                --> (s, h[(s(x),sel) |-> A[[a]](s,h)])
                if s(x) \in Loc and A[[a]](s,h) is defined

    [mal]  ([malloc x]^l, s, h) --> (s[x |-> xi], h)
                if xi \in Loc does not occur in s or h

    [fmal] ([malloc x.sel]^l, s, h)
                --> (s, h[(s(x), sel) |-> xi])
                if xi \in Loc does not occur in s or h
                and s(x) \in Loc
------------------------------------------

    Q: Does x.sel need to already be allocated
       when an fasg is executed?
        No, this "allocates" the field.

    Q: What happens if the side conditions are not met
       in [asgn] or [fasg]?
        the semantics is "stuck", because no transitions can take place,
        and since the configuration is not terminal,
        this indicates an error

    Q: Can that happen in [fmal]?
        yes

    Q: In [mal]?
        no, Loc is infinite

    xi is "fresh" in [mal] and [fmal]

    Q: What would the skip rule look like? seq1?
    [skip] ([skip]^l, s, h) --> (s, h)

               (S1, s, h) --> (S1', s', h')
    [seq1] ----------------------------------
            (S1;S2, s, h) --> (S1';S2, s', h')

    Don't need to show the following...

               (S1, s, h) --> (s', h')
    [seq2] ----------------------------
            (S1;S2, s, h) --> (S2, s', h')

    Q: How does the partiality of the Boolean semantics
       affect the if rules?

    [if1] (if [b]^l then S1 else S2, s, h) --> (S1, s, h)
                if B[[b]](s, h) = true

    [if2] (if [b]^l then S1 else S2, s, h) --> (S2, s, h)
                if B[[b]](s, h) = false

        So if B[[b]](s, h) is undefined, the semantics is "stuck",
        which indicates an error.

        The same would happen with the "while" rules

    [wh1] (while [b]^l do S, s, h)
                --> (S; while [b]^l do S, s, h)
                if B[[b]](s,h) = true

    [wh2] (while [b]^l do S, s, h) --> (s, h)
                if B[[b]](s,h) = false

------------------------------------------
        EXAMPLE

    What transitions happen for

        [malloc x]^1;
        [malloc x.next]^2;
        [y := x.next]^3;
        [y.next := x]^4;

    We write L0 and L1 for locations.
    Let S_i be the statement with label i
    Let S_1234 be S_1; S_2; S_3; S_4,
        let S_234 be S_2; S_3; S_4, etc.
    Let s01 be the state where x has value L0 and y value L1
    Let s0? be like s01 but with y's value unspecified,
        and s?? have both values unspecified
    Let h0 = {}
        h1 = {(L0,next) |-> L1}
        h2 = {(L0,next) |-> L1, (L1,next) |-> L0}

        (S_1234, s??, h0)
    --> {by [seq2]}
        * ([malloc x]^1, s??, h0)
          --> {by [mal], using L0 as the new location}
          (s0?, h0)
        .
        (S_234, s0?, h0)
    --> {by [seq2]}
        * ([malloc x.next]^2, s0?, h0)
          --> {by [fmal], using L1 as the new location}
          (s0?, h1)
        .
        (S_34, s0?, h1)
    --> {by [seq2]}
        * ([y := x.next]^3, s0?, h1)
          --> {by [asgn], as A[[x.next]](s0?,h1) = L1}
          (s01, h1)
        .
        ([y.next := x]^4, s01, h1)
    --> {by [fasg], A[[x]](s01,h1) = L0}
        (s01, h2)
------------------------------------------

** a shape analysis

    This follows the paper:

------------------------------------------
        SHAPE ANALYSIS WITH PREDICATE ABSTRACTION

    Based on (largely quoted from):

    Roman Manevich, E. Yahav, G. Ramalingam, and Mooly Sagiv.
    "Predicate abstraction and canonical abstraction
    for singly-linked lists."
    In Verification, Model Checking, and Abstract Interpretation,
    volume 3385 of Lecture Notes in Computer Science,
    pages 181--198, Berlin, 2005. Springer-Verlag.
------------------------------------------

*** representing states with predicates (section 2.1)

    states are represented by logical formulas
    instead of sets and functions;
    this is helpful for using theorem proving tools
    to compute the analysis...

------------------------------------------
        REPRESENTING PROGRAM STATES
        AS A FIRST-ORDER LOGICAL STRUCTURE

    Objects represented by "individuals"

    State represented by a formula
        over a fixed set of predicates
------------------------------------------

    Just substituting one kind of math for another,
    but helpful for computation

------------------------------------------
        EXAMPLE FOR SINGLY-LINKED LISTS

    Objects (v) with a field "n"

    The set P^C = {eq, x, n}

    predicates      meaning
    =======================================
    eq(v1, v2)      v1 equals v2
    x(v)            var x points to object v,
                    for x \in Var*
    n(v1, v2)       the next field of v1 points to v2

    Consider the program:

        [malloc x]^1;
        [malloc x.next]^2;
        [y := x.next]^3;
        [y.next := x]^4;

    We could encode the state and heap at the end of this code as:

    Individuals: U = {L0, L1}

    eq(L0, L0) = tt     eq(L1, L1) = tt
    eq(L0, L1) = ff     eq(L1, L0) = ff
    x(L0) = tt          y(L1) = tt
    x(L1) = ff          y(L0) = ff
    n(L0, L1) = tt      n(L1, L0) = tt
    n(L0, L0) = ff      n(L1, L1) = ff
------------------------------------------
    P^C stands for the set of concrete predicates
    "n" is short for "next"

    Q: How can we tell if a variable y points to <> or an integer?
        if there is no v such that y(v)

------------------------------------------
        2-STRUCTS OVER P

    def: A *2-valued logical structure*
         over a set of predicates P, 2-STRUCT_P
         is a pair S = (U,i)
         where U is the universe (set of individuals)
         and i is the interpretation function,
         such that for all p in P of arity k:
            i(p): U^k -> {tt, ff}
------------------------------------------

    Q: What's the arity of eq? x? n?
        2, 1, 2

    We're using 2-STRUCTS over P^C for encoding states

------------------------------------------
        ABSTRACTION FROM FORMER CONCRETE STATES

    toPC(s, h) = (U,i)

    U = {xi \in Loc | xi \in ran(s) or xi \in ran(h)}
    i(eq(xi1, xi2)) = (xi1 == xi2)
    i(x(xi)) = (s(x) == xi)
    i(n(xi1, xi2)) = (h(xi1,next) == xi2)
------------------------------------------

    Q: What would you take as a program's initial 2-STRUCT over P^C?
        U = {}, i is the empty function.

    Q: Is i invertible?
        not exactly, as we are ignoring integer variables, etc.

    Q: Could we write the language's semantics
       using 2-STRUCTS over P^C?
        sure (see the paper and below)

------------------------------------------
        TRANSITIONS USING 2-STRUCTS OVER P^C

    [asgnil] ([x := nil]^l, (U, i)) --> (U, i[x(xi) |-> ff])
                if xi \in U and i(x(xi))

    [asgv] ([x := y]^l, (U, i)) --> (U, i[x(xi) |-> tt])
                if xi \in U and i(y(xi))

    [asgf] ([x := y.n]^l, (U, i)) --> (U, i[x(xi) |-> tt])
                if xi, xi2 \in U, i(y(xi2)), and i(n(xi2, xi))

    [fasgnil] ([x.n := nil]^l, (U, i))
                --> (U, i[n(xi,xi2) |-> ff])
                if xi, xi2 \in U, i(x(xi)), and i(n(xi, xi2))

    [fasgv] ([x.n := y]^l, (U, i))
                --> (U, i[n(xi,xi2) |-> tt])
                if xi, xi2 \in U, i(x(xi)), and i(y(xi2))
                and there is no xi3 such that i(n(xi, xi3))

    [fasgf] ([x.n := y.n]^l, (U, i))
                --> (U, i[n(xi,xi4) |-> tt])
                if xi, xi2, xi4 \in U, i(x(xi)), i(y(xi2)),
                and i(n(xi2, xi4))
                and there is no xi3 such that i(n(xi, xi3))

    [mal] ([malloc x]^l, (U, i))
                --> (U \cup {xi}, i[x(xi) |-> tt])
                if not(xi \in U)

    [fmal] ([malloc x.n]^l, (U, i))
                --> (U \cup {xi}, i[n(xi2, xi) |-> tt])
                if not(xi \in U) and i(x(xi2))
------------------------------------------

    Q: How would you generalize to arbitrary selectors?
        have more than just the n predicate

    Q: Is there a limit to the size of the 2-STRUCTS
       in these concrete rules?
        no

    Q: So are they suitable for doing an analysis?
        no, so we need an abstraction

    The paper gives an encoding of the book's abstraction
    using predicates, but this isn't much fun
    and the paper shows it's superseded by the predicate abstraction,
    so we study...

*** predicate abstraction (section 2.3)

------------------------------------------
        PREDICATE ABSTRACTION

    Let P^A be a set of nullary predicates of interest.
        P^A = {P_1, ..., P_m}
    where each P_j is defined by some
        phi_j: 2-STRUCT[P^C] -> Boolean

    An *abstract state* is a truth assignment
        A: {1, ..., m} -> Boolean

    An *abstraction mapping*
        beta: 2-STRUCT[P^C] -> 2-STRUCT[P^A]
    where beta(U,i) = (A, i') and i'(P_j) = phi_j(U,i)
------------------------------------------

    Q: Given a concrete state, can we decide on P_i's abstract value?
        yes

------------------------------------------
        EXAMPLE

    Suppose we want to prove that the result of a program
    is a cyclic singly linked list.

    Idea: Design predicates useful in stating this property.

    Example set of nullary predicates:

    Let K > 0 be a fixed number.
    Let P^A = {NotNull[x] | x \in Var*}
              \cup {EqualsNext[k][x,y] |
                        0 <= k <= K and x, y \in Var*}

    Defining formulas:

    NotNull[x] = (exists v : v \neq <> : x(v))
    EqualsNext[k][x,y] =
        (exists v_0, ..., v_k ::
            x(v_0) and y(v_k)
            and (forall j : 0 <= j < k : n(v_j, v_{j+1})))
------------------------------------------

    Q: How do these predicates help describe this property?
        they let you state that you can reach from x back to itself.

    Q: Are these nullary predicates? Don't they take arguments?
        strictly they are parameterized families,
        but we model each instance as a nullary predicate
        by writing the arguments as part of its name

    Q: So what does EqualsNext[3][q,r] mean?
        from q's value you can reach r's value
        by using 3 field dereferences of field next.

    Q: How would you represent the set of predicates?
        with a bit vector!

    Q: What does it mean if a predicate P_i is true?
        we know this information must be true in the state

    Q: What does it mean if NotNull[x] is false?
        x may be nil

    Q: What does it mean if EqualsNext[3][q,r] is false?
        we might not be able to reach what r points to
        from what q points to in 3 steps.

    We could use an "unknown" logical value also.

------------------------------------------
        IMPLEMENTING THE ANALYSIS

    Idea: Use the defining formulas to induce the transfer functions

    In our example A is fixed (for a program),
    so we have configurations of the form (S,i') and i'.

    [asgnil] ([x := nil]^l, i')
                --> i'[NotNull[x] |-> ff,
                       (forall k, y :: EqualsNext[k][x,y] |-> ff)]

    [asgv] ([x := y]^l, i')
                --> i'[NotNull[x] |-> b,
                       EqualsNext[0][x,y] |-> tt]
                if b = i'(NotNull[y])

    [asgf] ([x := y.n]^l, i')
                --> i'[NotNull[x] |-> (exists k, z :: EqualsNext[k][y,z]),
                       (forall k, z : k < K :
                            EqualsNext[k][x,z] |-> EqualsNext[k+1][y,z])]
                if i'(NotNull[y])

    [fasgnil] ([x.n := nil]^l, i')
                --> i'[(forall k, z : k <= K :
                            EqualsNext[k][x,z] |-> ff)]
                if i'(NotNull[x])
------------------------------------------

    Q: How would you do the rest of these?
        ...

    [fasgv] ([x.n := y]^l, i')
                --> i'[(forall k, z : 1 <= k <= K :
                            EqualsNext[k][x,z] |-> EqualsNext[k-1][y,z])]
                if i'(NotNull[x])

    [fasgf] ([x.n := y.n]^l, i')
                --> i'[(forall k, z : k <= K :
                            EqualsNext[k][x,z] |-> EqualsNext[k][y,z])]
                if i'(NotNull[x]) and i'(NotNull[y])

    [mal] ([malloc x]^l, i') --> i'[NotNull[x] |-> tt]

    [fmal] ([malloc x.n]^l, i')
                --> i'[(forall k, z : k <= K :
                            EqualsNext[k][x,z] |-> ff)]
                if i'(NotNull[x])

    Q: How do these relate to the transfer functions?
        the transfer function takes the block and i' value,
        and produces the new i' as directed.
        The transfer function definition only cares about
        the atomic blocks...

    This kind of inducement of the analysis from an abstraction function
    is the heart of abstract interpretation...
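
    A minimal Python sketch of toPC and beta for the example predicates,
    to make the abstraction mapping concrete. It is an illustration only,
    not code from the paper or the book. Assumptions: locations are
    strings, nil is None, only the "next" selector is tracked, and a
    2-struct is stored as (U, points_to, n) rather than as an
    interpretation function. The names to_pc, not_null, equals_next, and
    beta are hypothetical. EqualsNext[k][x,y] is read here as "there is
    a path of exactly k next-edges from x's object to y's object".

```python
from itertools import product


def to_pc(s, h):
    """Encode state s and heap h as a 2-struct (U, points_to, n).

    Assumption: locations are strings, nil is None, integers are ints.
    """
    def is_loc(v):
        return isinstance(v, str)

    # U: every location in the range of s or h.
    universe = {v for v in list(s.values()) + list(h.values()) if is_loc(v)}
    # x(v) holds iff s(x) == v; non-pointer variables point to nothing.
    points_to = {x: ({v} if is_loc(v) else set()) for x, v in s.items()}
    # n(v1, v2) holds iff h(v1, next) == v2.
    n = {(src, dst) for (src, sel), dst in h.items()
         if sel == "next" and is_loc(dst)}
    return universe, points_to, n


def not_null(struct, x):
    """NotNull[x]: some individual v satisfies x(v)."""
    universe, points_to, _ = struct
    return any(v in points_to.get(x, set()) for v in universe)


def equals_next(struct, k, x, y):
    """EqualsNext[k][x,y]: k next-edges lead from x's object to y's."""
    _, points_to, n = struct
    frontier = set(points_to.get(x, set()))
    for _ in range(k):
        frontier = {dst for (src, dst) in n if src in frontier}
    return bool(frontier & points_to.get(y, set()))


def beta(struct, variables, K):
    """Map a concrete 2-struct to a truth assignment over P^A."""
    abstract = {f"NotNull[{x}]": not_null(struct, x) for x in variables}
    for k in range(K + 1):
        for x, y in product(variables, repeat=2):
            name = f"EqualsNext[{k}][{x},{y}]"
            abstract[name] = equals_next(struct, k, x, y)
    return abstract


# Final state of the four-statement example: x -> L0, y -> L1, cyclic.
s = {"x": "L0", "y": "L1"}
h = {("L0", "next"): "L1", ("L1", "next"): "L0"}
abstract = beta(to_pc(s, h), ["x", "y"], K=2)
```

    With K = 2 the abstract state has 2 + 3*4 = 14 predicates. The entry
    EqualsNext[2][x,x] is the one that witnesses the cycle, which is the
    kind of fact the cyclic-list property needs.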