3.1 Composition

Next: 3.2 Code Inheritance and Up: 3 Customization of Components Previous: 3 Customization of Components

3.1.1 Proving Properties of Imported Code (Obligation 2)
- Method Specifications
- Semantical Constraints
3.1.2 Reducing Proof Obligations for New Code (Obligation 3)
- Behavioral Subtyping
- Semantic Modules

3.1 Composition

Recall, that specifying a class invariant means that this invariant has to be preserved by all public methods of a program. Assume a module M with classes C₁ to C_m. M does not import any modules, i.e., M is a basic component. Verifying M means to prove the triple above for each public method of M.

Consider a module N which imports M. N contains the classes C_m+1 to C_n. I.e., the resulting program P is a composition of the modules M and N containing the classes C₁ to C_n. What triples have to be shown to verify P? Again, we have to prove that every public method of P preserves each invariant of any C_i. For most practical applications, this is equivalent to showing that (1) C_M::m preserves INV_M, (2) C_M::m preserves INV_N, (3) C_N::m preserves INV_M, and (4) C_N::m preserves INV_N, where C_M::m and C_N::m denote a public method m of a class declared in module M or N, respectively and INV_M and INV_N denote the conjunction of all class invariants in module M or N.

Obligation 1 is fulfilled as M is verified. Obligation 4 causes no further problems as it can be solved locally in module N (maybe using some specifications of M). Obligation 2 means to show properties of an imported module. This is not desirable as the imported code may come from a library and, thus, be not accessible to the verifier. Obligation 3 results in a huge amount of proof obligations for complex components as it contains one triple for each class in the import path of the module. So, in large systems, hundreds or thousands of triples have to be proved. In the next two paragraphs, we will discuss possible solutions to the problems caused by obligations 2 and 3.

3.1.1 Proving Properties of Imported Code (Obligation 2)

If imported code comes from a software library, clients of that code may only have access to interfaces and specifications, not to the implementation. Thus, we have to develop techniques that allow one to prove obligations of the second kind (see above) without revisiting imported code. This can be achieved by two approaches: Preservation of the invariants can be proved using specifications of imported methods, or can be achieved by requiring the invariants of new classes to fulfill certain semantical constraints.

Method Specifications

To deduce the preservation of invariants from method specifications, these have to be sufficiently strong and detailed. E.g., if the specification of method m states that m only modifies objects of class C, and if objects of a class D cannot reach C objects in memory, the invariant of D cannot be violated by m. This example shows two important aspects of method behavior: Sharing properties and invariance properties.

Sharing Properties describe how objects in memory are shared by different object structures. E.g., the last object of a singly linked list is shared by all other list objects of the list structure. Specification of sharing properties is crucial for verification because, in general, modification of one object changes the abstract value of all structures that share this object. Basic constructs to express sharing properties are reachability of objects or disjointness of object structures.

These constructs are quite coarse grained. [PH95] shows, that more elaborated techniques can be used to specify sharing properties of certain data representations on an abstract level. Thus, we have to study typical storage layouts to develop such techniques. In particular, these techniques should allow us to prove the absence of sharing of certain objects and thus to deduce the preservation of invariants.

Invariance Properties are used to describe side-effects of methods by specifying which parts of the object environment remain unchanged under method execution. They can be specified in two ways: One can enumerate all objects that might be modified (or created or deleted) under method execution in a so-called modifies-clause. I.e., all objects not mentioned have to be left unchanged. An other possibility is to directly specify which parts of the object environment stay untouched. This can be done by relating the pre- and poststate of the method. [PH97b] introduces two relations between object environments stating that (a) all objects reachable from a certain object remain unchanged or (b) all objects reachable from any object of a certain type remain unchanged.

The advantages of the direct technique are: (1) Sharing can be taken into account. In general, modification of one object affects the abstract values of all object structures referencing this object. This can hardly be expressed by modifies-clauses. (2) Relations between the pre- and poststate can be included into the usual pre-post-specification and do not require an extra modifies-clause. This makes the use of invariance properties in correctness proofs easier as they fit into the schema of Hoare-triples. (3) Object modification, creation, and deletion can be handled by the same technique. To exploit these advantages, more sophisticated techniques for relating object environments are needed. In particular, we want to specify side-effects in a very detailed and abstract way.

Specification Methodology might be the most important research topic in this area. To bring specification and verification techniques to practice, we need guidelines for programmers telling which properties of methods have to be specified to enable verification. Most work on specification methodology focuses on the specification of functional behavior. But as we showed in the last two paragraphs, verification of component-based software requires the specification of invariance and sharing properties as well. Programmers usually have detailed knowledge about the algorithms they implement. But they are not really aware of side-effects and the sharing properties of the data structures they use. Thus, we have to develop specification techniques that allow them to intuitively specify these properties.

Semantical Constraints

Not every first-order formula is acceptable as an invariant. E.g., invariants should not make statements about all allocated objects. In most cases, this would lead to unimplementable specifications. Thus, the expressivity of invariants has to be restricted. As such restrictions are semantical properties, they are called semantical constraints. A simple semantical constraint is that the invariant of an object X may only state properties of objects reachable from X.

By putting stronger constraints on invariants, we can achieve that the invariants of new classes cannot be violated by imported methods. E.g., if an invariant only states properties of objects of classes that are declared in the current module (i.e., not imported), this invariant can only be violated by methods of these classes or their subclasses, i.e., not by imported methods.

The development of such semantical constraints will allow one to add new classes to a program and neither have to revisit imported code nor have to prove the preservation of invariants based on the specification of imported methods. Besides the technical aspects, we have to study the influence of such restrictions on programming methodology.

3.1.2 Reducing Proof Obligations for New Code (Obligation 3)

Although it is theoretically possible to prove that every new method preserves the invariant of each imported class, this is not desired because of two reasons: (1) In large systems the proof obligations become too big and, thus, unmanageable. (2) To support encapsulation of implementations, it is desirable to hide implementation-dependent parts of invariants from client modules. In the next two paragraphs, we show how behavioral subtyping can be used to reduce verification effort, and present the idea of semantic module concepts that allow one to drop the proof obligations about semantically private classes.

Behavioral Subtyping

Informally, behavioral subtyping is defined as follows: S is a behavioral subtype of T, if (1) each public method of S fulfills the specification of the corresponding method of T, in case the self object is of type S, and (2) if the invariant of S implies the invariant of T for all S objects.

Enforcing all subtypes to be behavioral subtypes reduces the effort of verification dramatically as it is sufficient to show the preservation of invariants of classes which are leaves of the subtyping tree. In usual programming languages, determining whether a class is a leaf is not possible until the whole program context -- and, thus, the whole subtyping tree -- is known. This is never the case for components. Therefore, we require all superclasses to be abstract. I.e., concrete classes can never be superclasses. Thus, all concrete classes are leaves of the subtyping tree. We do not have to prove the invariant of abstract classes as there is no implementation for that class and, thus, objects of this class will never exist. To sum up, enforcing behavioral subtyping reduces verification effort to proving the preservation of the invariant of all concrete classes. As a consequence of this technique, subtyping and code inheritance have to be discerned conceptually. We discuss this aspect in section 3.2.

In the context of semantical program development, enforcing behavioral subtyping is no further restriction to the programmer: Point (1) of the definition above is required anyway because due to dynamic binding, methods of subtypes can be executed in contexts where methods of the supertype were expected. Thus, they have to behave identically in these contexts. Otherwise, the context could not be verified. Aspect (2) is already implied by the semantics of invariants (see section 2.2). One can formally prove that the semantics presented in section 2.2 is equivalent to the proof obligations sketched in the last paragraph (the proof itself is beyond the scope of this paper and, thus, not presented here).

Semantic Modules

Syntactic module concepts support the encapsulation of classes. E.g., a module providing a doubly linked list can be implemented based on a public class for the head of the list and a private class C which stores the list elements. As the list can only be accessed via the head, C can be hidden from clients of the module. From a semantic point of view, this is no longer true, as clients of the module might violate the invariant of C or vice versa: A private class C may contain an attribute a of a public type D. Thus, the invariant of C, denoted by inv_C, may state properties of the objects stored in a. All methods of the module M containing C and D preserve inv_C. But, a client module N may contain a subtype of D providing a method that violates inv_C. Thus, we need a module concept which provides semantically private classes, i.e., classes that can be hidden from clients of the module in the sense that their invariants are not part of the interface of the module. This helps to solve the two problems sketched above: (1) Semantically private classes do not contribute to the proof obligations for client modules, which eases verification. (2) It is no longer necessary to reveal details of the implementation by exporting invariants which specify properties of the data representation.

Semantical privacy can be achieved by enforcing certain syntactical restrictions of the implementation, by putting semantical constraints on invariants. To illustrate this approach, we introduce a predicate P on types. P(T) holds, iff T is concrete and P holds for the type of each attribute of T. The invariant of a type for which P holds can not be affected by program extensions as subtyping is ruled out. This simple syntactical limitation is of course too restrictive as it forbids the use of subtyping. By combining this idea with a simple constraint on the invariant, the situation becomes better: The invariant of T may only state properties of objects which are reachable via attributes whose types fulfill P. This combination enables semantical privacy and a restricted use of subtyping.

Pushing the constraints on invariants further leads to the following notion: The invariant of a class C has only access to the invariants of reachable objects, not to their implementation. In combination with behavioral subtyping, this constraint assures the preservation of the invariant. The disadvantage of this approach is that it leads to an increasing number of types as showed by the following example: Assume, we have a class fraction with two attributes numerator and denominator of type int. The invariant states that the denominator is non-zero. Now we implement a class C which has an attribute a of class fraction. We want to specify that the fraction stored in a is non-negative. Because of the constraint sketched above, this can not be specified in the invariant of C as it is a property of fraction. Thus, we have to build a new class which essentially behaves like fraction but has a stronger invariant. This approach fully supports subtyping.

This discussion reveals two interesting research topics: (1) We need an appropriate mix between syntactical and semantical constraints. Therefore, we have to study implementations of software libraries to find the balance between the (partly contradictory) aims of supporting subtyping, providing semantical private types, keeping the amount of types low, and making the whole approach applicable in practice. (2) As shown in the example above, constraining the invariants may lead to lots of types with different specifications but very similar (or identical) implementations. Thus, we need techniques to share code between classes and derive the correctness of an importing class from the correctness of the exporting class. This is discussed in the next section.

Next: 3.2 Code Inheritance and Up: 3 Customization of Components Previous: 3 Customization of Components

Peter Mueller and Arnd Poetzsch-Heffter
Sept. 2, 1997