Com S 541 Lecture -*- Outline -*-

* Kernel Language for the Declarative Computation Model

  Based on Peter van Roy and Seif Haridi's book,
  "Concepts, Techniques, and Models of Computer Programming"
  (MIT Press, 2004), where all references that are not
  otherwise attributed are found.

** motivation (2)

   Q: What does programming involve?
      - a computation model
      - a programming model
      - a set of reasoning techniques (correctness and efficiency)

   Q: What computation models should we be interested in?
      those useful for programmers

   The declarative model has simple reasoning techniques
   and can be made efficient.
   It's also fundamental to lots of programming languages,
   especially functional and logic ones.
   It can be made concurrent without losing its good properties.

** Defining languages (2.1)

*** how languages are defined

    Define languages by giving syntax and semantics.
    Syntax is given in terms of a context-free grammar + typing rules.
    Semantics can be done in several ways, including operationally.

*** syntax
------------------------------------------
   EXTENDED BACKUS-NAUR FORM (EBNF)

   Example

     <int> ::= <digit> { <digit> }
     <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
------------------------------------------
    Describe this as a game (p. 33) to make a derivation.
    The book defines partial grammar rules using '...' in a production.

    Q: Can you give an EBNF grammar for phone numbers of the form 555-1212?

*** semantics (2.1.2)
------------------------------------------
              SEMANTICS
   Goals:
     - simple
     - mathematical
     - reason about correctness
     - reason about efficiency

   Kernel Language Approach:

     [ Practical Language ]
              |
              |
              v
     [ Kernel Language ]

   E.g.,
      fun {Sqr X} X * X end
   -translates-to->
      proc {Sqr X Y} {'*' X X Y} end
------------------------------------------
    Q: How to define the kernel language?
       in some other way (e.g., operational semantics)

    Q: What characteristics are needed for a kernel language? Why?
       - minimal
       - easy to translate into, without destroying structure
       - easy to understand
       - easy to reason about
       - has formal semantics

**** linguistic abstractions

     In Oz, "fun" is a linguistic abstraction; it gets translated into proc.

     Q: What is a linguistic abstraction?
        An abstraction that adds to a language,
        e.g., a for loop, functions (fun), classes, etc.
        Can be defined by macros.
        Semantics is given by translation into the kernel.
        Can also help efficiency (e.g., switch statement in C/Java).

**** syntactic sugar

     In Oz, can leave out "local" and "end", as in

        if N == 1 then [1]
        else L in
           % ...
        end

     which is sugar for

        if N == 1 then [1]
        else local L in
                % ...
             end
        end

     Q: What is a syntactic sugar?
     Q: How is it different from a linguistic abstraction?
        It's a shorthand, but not separately named, so not a new abstraction.

     Note: Many authors don't make this distinction.

** declarative computation model details (2.2-2.5)

*** single-assignment store (2.2)
------------------------------------------
   SINGLE-ASSIGNMENT STORE (2.2)

   Stores are sets of variables partitioned into:
     - sets of unbound but equal variables
     - singleton sets of determined variables
       bound to numbers, records, or procedures

   Notation and Type:

     sigma, s in Store = Variable -> PUValue
     x,y,z in Variable
     PUValue = Value + Set(Variable)
     V in Value = Number + Record + Closure + Variable
     (P,E) in Closure = <procedure> x Environment
     ES in Set(Variable)

     dom(s) is the domain of s
     unbound(s,x) = (s(x) in Set(Variable))
     aliases(s,x) = s(x), if unbound(s,x)
     determined(s,x) = (s(x) in Value)
------------------------------------------
    The Set(Variable) is used for equivalence sets of unbound variables;
    we assume it can be distinguished from Values.

    A Variable, or the Value it may refer to, is a "store entity".
------------------------------------------
   NOTATION FOR SINGLE-ASSIGNMENT STORES

   Example 1:
      s = { x1, x2 = x3 }
   means
      s(x1) = {x1}   // x1 is unbound
      s(x2) = {x2, x3}
      s(x3) = {x2, x3}
   pictured as:
      x1 [ *-]--->[unbound {x1}]
      x2 [ *-]--->[unbound {x2, x3}]
      x3 [ *-]-/
   Example 2:
      { x1 = 541, x2 = [3,4], x3 = [3,4], x4 }
   pictured as:
      x1 [541]
      x2 [ *-]--->[ 3|*-]->[ 4|nil]
      x3 [ *-]--/
      x4 [unbound]
------------------------------------------
    Q: How would you implement this?
       Variables are addresses (locations) for memory.
       Values are addressed through the location's contents in memory;
       values are allocated on the heap (in general).

    Q: What is a value store?
       One where all variables are bound to (complete) values.

**** value creation (2.2.3) and operations on the store (2.8.2.2)
------------------------------------------
   OPERATIONS ON THE STORE

   Allocation
      next: Store -> Variable
      alloc: Store -> Store
    such that:
      next(s) not in dom(s)
      next(s) in dom(alloc(s))
      unbound(alloc(s), next(s))

   Binding of variables to values:
      bind: Store -> (Set(Variable) x Value) -> Store
      (bind(s)(ES, value))(y) = if y in ES then value else s(y)

   Binding of variables to variables:
      bind: Store -> (Set(Variable) x Set(Variable)) -> Store
      (bind(s)(ES1, ES2))(y) = if y in ES1 or y in ES2
                               then ES1 U ES2
                               else s(y)
   Example:
      bind({x1=x2, x3, x4=nil})({x1,x2}, 5)
        = {x1=5, x2=5, x3, x4=nil}

   Suppose s is {x1=x2=x3, x4, x5, x6=6, x7=7}
   What is:
      bind(s)({x4},{x5})
      bind(s)({x4},{x1,x2,x3})
      bind(s)({x5}, 5)
      bind(s)({x1,x2,x3}, 123)
------------------------------------------
------------------------------------------
   Unification of variables:
      unify: Store -> (Variable x Variable) -> (Store + (Failure x Store))
      where Failure = String

      unify(s)(x, y) =
        if unbound(s,x) and unbound(s,y)
        then bind(s)(s(x), s(y))
        elseif unbound(s,x) and determined(s,y)
        then bind(s)(s(x), s(y))
        elseif unbound(s,y) and determined(s,x)
        then bind(s)(s(y), s(x))
        elseif determined(s,x) and determined(s,y)
        then if s(x) in Number and s(y) in Number
             then if s(x) == s(y) then s else ("equality...", s) end
             elseif s(x) in Closure and s(y) in Closure
             then if equalClosures(s(x), s(y)) then s else ("equality...", s) end
             else // assuming both are records
               let l(l1:x1, ..., ln:xn) = s(x)
                   l'(l1':y1,...,lm':ym) = s(y)
               in if l==l' and
                     [l1,...,ln] == [l1',...,lm']
                  then listUnify(s)([x1,...,xn], [y1,...,ym])
                  else ("equality...", s)
             end

   E.g., suppose s is {x1=x2=x3, x4, x5=n(l:7), x6=n(l:x1)}
   What is:
      unify(s)(x1,x4)
      unify(s)(x4,x1)
      unify(s)(x1,x5)
      unify(s)(x5,x6)
------------------------------------------
    The failure cases are used to communicate to the rest of the semantics.
    If there is a failure, side effects on the store are not undone,
    hence we need to pass back the store with the failure message.

    Q: Can you write listUnify?

**** environment (2.2.4-5)
------------------------------------------
   ENVIRONMENT

   E in Environment = VIdentifier -> Variable

   Example:
     Program text: X = 541
       1. value creation: x1 = 541
       2. identifier binding: X = x1
     Now have
       environment: E(X) = x1
       store: s(x1) = 541
   Pictured:
        E              s
      X [ *-]----> x1 [541]

   Notation
      E = {X --> x1}
      s = {x1 = 541}
------------------------------------------
    Q: What is the value of X?
    Q: What is dereferencing?
------------------------------------------
   NOTATIONS

   Environment manipulations

   adjunction:
      (E + {X --> x})(Y) = if Y == X then x else E(Y)

   restriction:
      (E|{X1, ..., Xn})(Y) = if Y in {X1, ..., Xn} then E(Y) else undefined
------------------------------------------
    Q: What's the domain of the restriction?

**** partial values (2.2.6)
------------------------------------------
   PARTIAL VALUES (2.2.6)

   def: A *partial value* is a data structure
        that may contain unbound variables.

   Example:
      declare X Y
      X = person(name:"George", age: Y)

   What environment and store does this produce?
------------------------------------------
    See Fig. 2.12

    X [ *-]-------> x1 [ *-]--> [ person | * | *-]-> [age|*]
                                           |            |
                                           v            |
                                       [name | * ]      |
                                               |        |
                                               v        |
                                          ["George"]    |
                                                       /
          /--------------------------------/
          v
    Y [ *-]---------x2 [ *-]----------------------> [unbound {x2}]

    Q: What happens after the binding Y = "25" is done?
------------------------------------------
   COMPATIBILITY OF VALUES

   Declarative variables can be bound to variables.
      X = Y

   Pictured:

      X [ *-]-----> x1 [ *-]---> [unbound {x1, x2}]
                                /
      Y [ *-]-----> x2 [ *-]-/

   Whenever a value is bound to one, the other sees it.

      declare X Y
      X = Y
      Y = 25
      {Browse X}
------------------------------------------
      X [ *-]-----> x1 [ 25]
      Y [ *-]-----> x2 [ 25]

    Q: Can this work for more than 2 variables?
       Yes, with the right implementation (union-find data structure).

**** dataflow variables (2.2.8)
------------------------------------------
   DATAFLOW EXECUTION

   What if a variable is used before it is bound?

     1. Get garbage from memory
     2. Get default value for type (0)
     3. Give error message and stop
     4. Illegal, complain at compile time
     5. Wait until variable is bound
------------------------------------------
    Q: Why do these?
    Q: What languages do these?
       1: C++, 2: Java fields and arrays, 3: Prolog in arithmetic,
       4: Java for locals, 5: Oz
    Q: What does Oz do?

*** Kernel Language (2.3)

**** Syntax (2.3.1)

     See Tables 2.1 and 2.2

     Q: Why would you include all of these in the language? E.g., why skip?
     Q: Why leave out some things, e.g., = ?
        can get away with it at little cost
     Q: Why records?
        needed for data completeness

     Note: variable identifiers can also be written in backquotes

**** Values and types (2.3.2)
------------------------------------------
   TYPES

   def: a *type* is a set of values with a set of operations

   Types in the declarative model

      Value
        Number
          Int
            Char
          Float
        Record
          Tuple
            Literal
              Bool
              Atom
              ...
            List
              String
            ...
        Procedure
          ...
        ...
------------------------------------------
     See Figure 2.16
     Can use IsT to test membership in T, e.g., IsProcedure

     Q: What does "v has type T" mean?
     Q: What does a subtype mean in this interpretation of types?
     Q: What is dynamic typing? Why do it?

***** Basic types (2.3.3)

      See the book, already covered

***** Records (2.3.4-5)
------------------------------------------
   RECORDS (2.3.4)

   Creation

      newrec(field1: Val1)

   Operations:

------------------------------------------
      Q: What's the operation for creating a record?
         Just write it down!
      Q: What are the operations on records?
         See 2.3.5: Arity, Label, and '.' for field selection.
         Arity gives a list of field names.
      Q: How can records be used to make enumerations? variants?
      Q: What does == do for records?
         compares for structural equality, not object identity!
      Q: What is the difference between X = Y and X == Y ?

***** Procedures (2.3.4-5)
------------------------------------------
   PROCEDURES

   Creation:

      proc { $ X Y } Y = X+X end

   Operations:

------------------------------------------
      Q: What are the operations on procedures?
         See p. 55: creation (with proc) and calling (with {})

*** Kernel Language Semantics (2.4)

**** basic concepts (2.4.1)

***** Static Scoping
------------------------------------------
   STATIC SCOPING

   def: In *static scoping*, each identifier, X,


   def: In *dynamic scoping*, each identifier, X,


   Example:

      local X F T in
         local X in
            X=2
            F = proc {$ Y ?Z} Z=X+Y end
         end
         X=1
         {F 10 T}
         {Browse T}
      end
------------------------------------------
      ... denotes the variable bound to X by the closest textually
          surrounding declaration of X.
          This is also known as lexical scoping.

      ... denotes the variable bound to X by the most recent
          still-active binding for X.

      The ? in the declaration of F is a comment
      indicating the output parameter.

      Q: What does the example give with each kind of scoping?
      Q: In the example, does it matter if X=1 occurs
         after the declaration of F?
      Q: What is the meaning of the procedure
            local X in
               X = 541
               proc {$ Y Z} Z = X+Y end
            end
         with dynamic scoping?

***** Free and bound identifier occurrences (p. 64)
------------------------------------------
   FREE AND BOUND IDENTIFIER USES

   def: In  proc {$ X ?Y} Exp end
        X and Y are declarations (formal parameters).
        They bind all occurrences of those identifiers in Exp,
        unless there is an intervening declaration of these formals.
      {proc {$ X ?Y} Y=X end F1 Z}

      {{{proc {$ X ?Y}
            Y=proc {$ X ?Y} Y={X F} end
         end} F Z} F1 Z1}

   def: an identifier X *occurs free* in an expression Exp iff


   def: an identifier X *occurs bound* in Exp iff
        Exp contains a use of X that is bound by a declaration of X in Exp
------------------------------------------
      Q: In the first expression, what does X refer to?
      Draw arrows from uses to declarations in the two examples.
      Be sure they understand what "intervening declarations" mean
      before going on.

      ... Exp contains a use of X that is not bound
          by any declaration of X in Exp
------------------------------------------
   EXAMPLES

   F, F1 occur free in:
      F1
      {F F1}
      proc {$ B ?Res} Res=F end

   B, B1 occur bound in:
      proc {$ B ?Res} Res = B end
      proc {$ B1 ?R1}
         R1 = proc {$ B ?R} R={B1 B} end
      end

   There can be a mix:
      {fun {$ B} B end F}
                 ^     ^
          bound-/       \-free
       occurrence      occurrence

   The same variable can occur both ways:
      {fun {$ N} N end N}
                 ^     ^
          bound-/       \-free
       occurrence      occurrence

   Variables that are free in a subexpression
   may be bound in a larger expression:
      fun {$ F}
         {fun {$ B} {B F1} end
          F}
      end
------------------------------------------
      Q: So if n occurs free in an expression,
         does that mean it doesn't occur bound?
------------------------------------------
   FOR YOU TO DO

   What are the (a) free, and (b) bound identifiers in ...

      fun {$ X} fun {$ Y} X end end

      {G {Tail X}}

      fun {$ X} {G {Tail X}} end

      fun {$ G} fun {$ X} {G {Tail X}} end end
------------------------------------------

***** procedure values or closures (p. 65)

      Currying is named after Haskell Curry, a logician
      (although it was actually invented by Frege and Schoenfinkel).
------------------------------------------
   CURRYING

   function:
      fun {$ LS1 LS2 LS3}
         {Append LS1 {Append LS2 LS3}}
      end

   curried form:
      fun {$ LS1}
         fun {$ LS2}
            fun {$ LS3}
               {Append LS1 {Append LS2 LS3}}
            end
         end
      end

   FOR YOU TO DO

   Curry the following definition:

      fun {$ X Y} X+Y end
------------------------------------------
      Q: Can this be done in C++?
------------------------------------------
   CURRYING IN C++?

   #include <iostream>
   using namespace std;
   typedef int (*func)(int);

   int takes_y(int y) {
      return(x + y);   // x is not in scope here
   }

   func cadd(int x) {
      return(&takes_y);
   }

   int main() {
      cout << (cadd(2))(3) << endl;
   }
------------------------------------------
      Q: Does this work?
         No, what's the value of x in takes_y?

      To solve that problem, simulate the notion of a closure:

      // corrected C++ program
      #include <iostream>
      using namespace std;
      typedef int (*func)(int, int);

      // a closure pairs code (f) with a little environment (x)
      class closure {
      public:
        closure(int x_val, func f_val)
          : x(x_val), f(f_val) {}
        int call(int arg) { return f(x, arg); }
      private:
        const int x;
        const func f;
      };

      int add(int x, int y) { return x + y; }

      closure* cadd(int x) {
        return new closure(x, add);   // never deleted; fine for a demo
      }

      int main() {
        cout << cadd(2)->call(3) << endl;
      }

      (This is one reason we don't use C/C++; it's too hard to do this...)
------------------------------------------
   PROCEDURE VALUES ARE CLOSURES

   def: a *closure* is:
        code for a procedure
        and an environment
------------------------------------------
      The environment gives values for the free identifiers.

      Q: So, in general, what in C++ is like a closure?
         An object: it has a little environment (data members)
         and code (member functions).
         But again, in C++ (before C++11 added lambda expressions),
         we don't have anonymous classes, and can't capture the
         environment at run-time without preparing with a class
         definition ahead of time.

**** The abstract machine (2.4.2)

     This semantics is a terminal transition system (Plotkin),
     little step (C. Gunter), or computation (M. Hennessy) semantics,
     essentially using rewriting as a universal machine.

***** little step semantics in general

      To give the semantics of a programming language,
      use two auxiliary functions: input and output.
------------------------------------------
   COMPUTATION (LITTLE STEP) SEMANTICS

                  Meaning
      Programs <-------> Answers
          |                 ^
    input |                 | output
          |                 |
          v       -->*      |
        State ------------> T

   def: a in Meaning[[P]] iff

------------------------------------------
      The Meaning relation is often partial,
      and is defined by going around the diagram.

      ...
          there is a g in T such that input[[P]] reducesto* g
          and output(g) = a.

      Meaning[[P]] is a function when reducesto* is
      Church-Rosser (confluent).

***** terminal transition systems

      This is the guts of the system, defining the abstract machine
      by defining reducesto.
------------------------------------------
   TERMINAL TRANSITION SYSTEM (TTS)

      (State, -->, T)

   State:

   -->:

   T:

   -->* reflexive, transitive closure
------------------------------------------
      ... a set of configurations (e.g., g),
          i.e., configurations of the abstract machine

      ... a binary relation on State
          Note that --> may be partial.
          The --> relation (Hennessy) is often written as ==>

      ... subset of terminal configurations,
          must be such that if g in T,
          then there is no g' such that g --> g'

      Q: What are the possible outcomes for a program?
         a set of terminal states, an infinite computation,
         or getting stuck!

***** TTS for Oz
------------------------------------------
   TTS FOR OZ

   (ST,s) in State = Stack x Store + Message
   Stack = (<s> x Environment)*
   T = {(nil,s) | s in Store} + Message
   Message = String

   input[[S]] = ([(S,{})], {})
   output((nil,s)) = s
   output(Msg) = Msg
------------------------------------------
      As an alternative, one could define, for example,
         output((nil,s)) = s(x_1)
      assuming that x_1 is some standard "answer" location.

      The messages are used for halting Oz with an error
      (e.g., for uncaught exceptions).

      Q: What states are terminal?
      Q: How should we define the transitions (-->)?
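      Before looking at the Oz rules, the three possible outcomes of a
      terminal transition system (terminal state, infinite computation,
      getting stuck) can be sketched with a small driver. This is only
      an illustration in Python, not part of the book's definition: the
      names run, toy_step, toy_is_terminal, and the fuel bound are all
      hypothetical, and the toy "language" has just skip and loop.

```python
from typing import Callable, Optional, Tuple, TypeVar

G = TypeVar("G")  # type of configurations (elements of State)


def run(step: Callable[[G], Optional[G]],
        is_terminal: Callable[[G], bool],
        start: G,
        fuel: int = 1000) -> Tuple[str, G]:
    """Follow --> from `start`, i.e., compute along -->*.

    `step` returns the successor configuration, or None when no rule
    applies.  `fuel` bounds the number of steps, as a stand-in for
    detecting a possibly infinite computation (an assumption of this
    sketch; the semantics itself has no such bound).
    """
    g = start
    for _ in range(fuel):
        if is_terminal(g):
            return ("terminal", g)
        successor = step(g)
        if successor is None:
            return ("stuck", g)       # not in T, but no g' with g --> g'
        g = successor
    return ("terminal", g) if is_terminal(g) else ("out of fuel", g)


# Toy instance: configurations are (stack, store) pairs, shaped like the
# Oz TTS, but the only statements are the strings "skip" and "loop".
def toy_step(g):
    stack, store = g
    if not stack:
        return None                   # terminal states have no successor
    top, rest = stack[0], stack[1:]
    if top == "skip":
        return (rest, store)          # pop the statement, like [skip]
    if top == "loop":
        return (stack, store)         # steps to itself forever
    return None                       # unknown statement: no rule applies


def toy_is_terminal(g):
    stack, _store = g
    return len(stack) == 0            # like T = {(nil, s) | s in Store}
```

      With this driver, a stack of two skips reaches a terminal
      configuration, an unknown statement gets stuck, and loop exhausts
      the fuel.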
------------------------------------------
   TRANSITIONS (-->)

   [skip]
     ((skip, E) | Rest, s) --> (Rest, s)

   [sequence]
     ((S1 S2, E) | Rest, s) --> ((S1, E) | (S2, E) | Rest, s)

   [local]
     ((local X in S end,E)|Rest, s) --> ((S,E')|Rest, s')
       where E' = E+{X-->x} and x = next(s), and s' = alloc(s)

   [var-var binding]
     ((X=Y, E) | Rest, s) --> (Rest, s')
       where s' = unify(s)(E(X), E(Y)) and isStore(s')

   [var-var binding error]
     ((X=Y, E) | Rest, s) --> ((raise failure(Msg) end, E) | Rest, s')
       where (Msg,s') = unify(s)(E(X), E(Y))

   [value creation to unbound]
     ((X=V, E) | Rest, s) --> (Rest, s')
       where unbound(s, E(X)) and v = MV[[V]](E)
       and s' = bind(s)(s(E(X)),v)

   [value unification]
     ((X=V, E) | Rest, s) --> (Rest, s3)
       where determined(s, E(X))
       and y = next(s) and s' = alloc(s)
       and v = MV[[V]](E) and s'' = bind(s')({y},v)
       and s3 = unify(s'')(E(X),y) and isStore(s3)

   [value creation error]
     ((X=V, E) | Rest, s) --> ((raise failure(Msg) end, E) | Rest, s3)
       where y = next(s) and s' = alloc(s)
       and v = MV[[V]](E) and s'' = bind(s')({y},v)
       and (Msg, s3) = unify(s'')(E(X),y)

   [if-true]
     ((if X then S1 else S2 end,E)|Rest, s) --> ((S1, E)|Rest, s)
       where determined(s, E(X)) and s(E(X)) == true

   [if-false]
     ((if X then S1 else S2 end, E)|Rest, s) --> ((S2, E)|Rest, s)
       where determined(s, E(X)) and s(E(X)) == false

   [if-error]
     ((if X then S1 else S2 end, E)|Rest, s)
       --> ((raise error(...) end, E)|Rest, s)
       where determined(s, E(X))
       and not(s(E(X)) == true) and not(s(E(X)) == false)

   [application]
     (({X Y1 ... Yn}, E)|Rest, s) --> ((Body, E')|Rest, s)
       where determined(s,E(X)) and s(E(X)) in Closure
       and [Z1, ..., Zn] = args(s(E(X)))
       and Body = body(s(E(X)))
       and E' = env(s(E(X))) + {Z1 -->E(Y1)} + ... + {Zn -->E(Yn)}

   [application-error]
     (({X Y1 ... Yn}, E)|Rest, s) --> ((raise error(...) end, E) | Rest, s)
       where determined(s,E(X))
       and (not(s(E(X)) in Closure)
            or s(E(X)) does not have n arguments)

   [case-match]
     ((case X of L(F1: X1 ...
               Fn:Xn) then S1 else S2 end, E)|Rest, s)
       --> ((S1, E') | Rest, s)
       where determined(s,E(X)) and isRecord(s(E(X)))
       and Label(s(E(X))) == L and Arity(s(E(X))) == [F1, ..., Fn]
       and E' = E + {X1 -->s(E(X)).F1, ..., Xn -->s(E(X)).Fn}

   [case-else]
     ((case X of L(F1: X1 ... Fn:Xn) then S1 else S2 end, E)|Rest, s)
       --> ((S2, E) | Rest, s)
       where determined(s,E(X))
       and (not(isRecord(s(E(X))))
            or not(Label(s(E(X))) == L)
            or not(Arity(s(E(X))) == [F1, ..., Fn]))
------------------------------------------
      Could merge the two cases of [value creation] into 1 rule,

      [value creation]
        ((X=V, E) | Rest, s) --> (Rest, s3)
          where y = next(s) and s' = alloc(s)
          and v = MV[[V]](E) and s'' = bind(s')({y},v)
          and s3 = unify(s'')(E(X),y) and isStore(s3)

      but this generates garbage more quickly.

      Q: What happens if one of the variables in var-var binding is not
         in the domain of the environment, E?
         It's illegal as a program, so rejected.
      Q: What free variables are allowed in a program?
         None; in real Oz, some standard environment is used.
      Q: What's the parameter passing mechanism?
         Call by reference!
      Q: What happens to an if-statement when the condition
         variable is not determined?
         It gets stuck (suspends).
      Q: Can the matching case change the store?
------------------------------------------
   MEANING OF VALUE EXPRESSIONS

   MV: ValueExpression -> Environment -> Value

   MV[[X]](E) = E(X), where X in <x>
   MV[[N]](E) = N, where N in <number>
   MV[[L]](E) = L, where L in <literal>
   MV[[L(F1:X1, ..., Fn:Xn)]](E)
      = L(F1: MV[[X1]](E), ..., Fn: MV[[Xn]](E)),
        where L(F1:X1, ..., Fn:Xn) in <record>
   MV[[proc {$ F1 ... Fn} Body end]](E)
      = (proc {$ F1 ... Fn} Body end, E|FVP),
        where FVP = FV(Body) \ {F1 ... Fn}
        is the set of free identifiers in the procedure
------------------------------------------
      Note that this is a denotational semantics.

      The value of a procedure is a Closure, which is a pair of
      the procedure's text and the environment restricted to its
      free identifiers.
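      The closure case of MV can be sketched in Python as follows.
      This is a minimal illustration under stated assumptions, not the
      notes' definition: statements are encoded as tuples with
      hypothetical tags ("skip", "seq", "local", "bind", "bindval",
      "app", "if"), records and case are omitted, and store variables
      are plain strings such as "x1".

```python
def fv(stmt):
    """Free identifiers of a kernel statement (tuple encoding assumed)."""
    tag = stmt[0]
    if tag == "skip":                       # skip
        return set()
    if tag == "seq":                        # S1 S2
        return fv(stmt[1]) | fv(stmt[2])
    if tag == "local":                      # local X in S end
        return fv(stmt[2]) - {stmt[1]}
    if tag == "bind":                       # X = Y
        return {stmt[1], stmt[2]}
    if tag == "bindval":                    # X = V
        return {stmt[1]} | fv_value(stmt[2])
    if tag == "app":                        # {X Y1 ... Yn}
        return {stmt[1]} | set(stmt[2])
    if tag == "if":                         # if X then S1 else S2 end
        return {stmt[1]} | fv(stmt[2]) | fv(stmt[3])
    raise ValueError("unknown statement tag: %r" % (tag,))


def fv_value(v):
    """Free identifiers of a value expression: a number or a proc."""
    if isinstance(v, tuple) and v[0] == "proc":
        _tag, params, body = v
        return fv(body) - set(params)       # FVP = FV(Body) \ {F1 ... Fn}
    return set()                            # numbers have none


def mv_proc(proc, env):
    """MV[[proc ...]](E) = (proc text, E restricted to FVP).

    Raises KeyError if a free identifier is not in dom(E), which
    corresponds to a program that would be rejected as illegal.
    """
    restricted = {x: env[x] for x in fv_value(proc)}
    return (proc, restricted)


# proc {$ ?R} R=X end, evaluated where X --> x1 and Y --> x2
p = ("proc", ["R"], ("bind", "R", "X"))
closure = mv_proc(p, {"X": "x1", "Y": "x2"})
```

      Here the closure's environment is {"X": "x1"}: the formal R is
      not free, and Y is dropped because the body never mentions it.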
***** Examples (2.4.5)
------------------------------------------
   EXAMPLES

   let S1 = local X in
               X=2
               {Browse X}
            end

   Then
       ([(S1,{})], {})
   --> ([(X=2 {Browse X}, {X-->x1})], {x1})
   --> ([(X=2, {X-->x1}) ({Browse X}, {X-->x1})], {x1})
   --> ([({Browse X}, {X-->x1})], {x1=2,x2=2})
   -->
------------------------------------------
      We stop at Browse, because we don't know its definition.
      What would be a suitable rule for Browse?
      Could have part of the store or the configurations be the
      browse output (a sequence?).
------------------------------------------
   ANOTHER EXAMPLE

   local X in
      local Y in
         Y = proc {$ ?R} R=X end
         X=1
         local X in
            X=2
            local Z in
               {Y Z}
               {Browse Z}
            end
         end
      end
   end
------------------------------------------
      Work this together in class.

      Another example is the translation of
         fun {K X} fun {$ Y} X end end
         A = {{K 3} 4}

***** sugars
------------------------------------------
   SYNTACTIC SUGARS

   local X1 ... Xn in S end
   ==> local X1 in ... local Xn in S end ... end

   {P E}
   ==> local X in X=E {P X} end
------------------------------------------
      Q: How would you generalize the last one?

**** Memory management (2.5)

***** Last call optimization (2.5.1)

      Q: What is a last call optimization?
      Q: What is it useful for?
      Q: Does the semantics already do this?
         yes

***** Garbage collection (2.5.2-4)

      Q: Why is garbage collection useful?
      Q: Does the semantics already allow it?
         Not by the rules, but it's okay to add such a rule,
         although that could make the semantics nondeterministic.
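
      One way such a garbage collection rule might look is
      (ST, s) --> (ST, collect(ST, s)), where collect keeps only the
      store variables reachable from the stack. The Python sketch below
      illustrates that idea under several assumptions that are not in
      the notes: the names reachable and collect are hypothetical, the
      store is a dict from variable names to values, an unbound
      variable is represented by None (aliasing between unbound
      variables is ignored), and records and closures are tagged tuples
      whose third component maps names to store variables.

```python
def reachable(stack, store):
    """Store variables reachable from the environments on the stack.

    `stack` is a list of (statement, environment) pairs; only the
    environments matter here.  A variable is reachable if some
    environment on the stack maps an identifier to it, or if it
    appears inside the value of another reachable variable.
    """
    seen = set()
    todo = [x for (_stmt, env) in stack for x in env.values()]
    while todo:
        x = todo.pop()
        if x in seen:
            continue
        seen.add(x)
        v = store.get(x)
        # ("record", label, {feature: var}) or ("closure", text, {id: var})
        if isinstance(v, tuple) and v[0] in ("record", "closure"):
            todo.extend(v[2].values())
    return seen


def collect(stack, store):
    """Drop every store variable that is not reachable from the stack."""
    live = reachable(stack, store)
    return {x: v for x, v in store.items() if x in live}


store = {
    "x1": ("record", "person", {"name": "x2", "age": "x3"}),
    "x2": "George",
    "x3": None,                                   # unbound
    "x4": 7,
    "x5": ("closure", "proc {$ Y ?Z} Z=X+Y end", {"X": "x4"}),
    "x6": 99,
}
stack = [(("skip",), {"P": "x1"})]
live_store = collect(stack, store)
```

      In this example only x1, x2, and x3 survive; x4, x5, and x6 are
      garbage because nothing on the stack refers to them. Since the
      rule could fire at any point in a computation, adding it is what
      would make the semantics nondeterministic.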