INTRODUCTION TO DATA STRUCTURES: COURSE SPECIFICATION

Gary T. Leavens
Department of Computer Science, Iowa State University
Ames, Iowa 50011-1040 USA
leavens@cs.iastate.edu
$Date: 1995/01/17 18:37:04 $

ABSTRACT

Computer Science 228 is an introduction to data structures, with emphasis on data abstraction and information hiding. This document specifies the course's general and specific objectives. The specific objectives are specified in enough detail to determine the kinds of questions that will appear on homeworks and tests.

1. INTRODUCTION

Computer Science 228 is a course that introduces students to data structures. The study of data structures seeks to answer the following questions.

* How does one match a data structure to a given problem? Conversely, what problems does each kind of data structure help solve?
* How can one correctly implement a specified abstract data type (ADT)?
* How can one ensure that an ADT implementation is efficient? What efficiency trade-offs are involved?

Com S 228 addresses all of these questions.

1.1 COURSE DESCRIPTION

This class is designed to give you essential programming tools and knowledge. You will learn how to model data in a computer, how to specify and use standard ADTs, and how to implement such ADTs with standard data structures. You will learn how efficient or expensive various combinations of data structures and algorithms are. Along the way, you will develop skills in imperative programming. You will learn some design principles and ideas for managing small (up to 500 line) programs. Finally, we hope to share our sense of the satisfaction that comes from crafting an elegant, correct and efficient solution to a programming problem.

The ISU catalog description of the course is as follows:

An object-oriented approach to data structures and algorithms using [the] C++ language. Object-oriented programming. Program correctness. Stacks, queues, trees, searching, sorting, analysis of algorithms, graphs, and file processing.
Emphasis on writing and running programs. This course is designed for majors. (4 credits).

Object-oriented programming involves the use of abstract data types (C++ classes are an implementation technique for ADTs), message passing (the C++ term is virtual function calls), and inheritance (the C++ term is class derivation). Technically, this class will not use object-oriented programming, but instead will use mainly object-based programming; this means using ADTs, but not message passing or inheritance. Sorry, but the catalog description is inaccurate.

A data structure is a representation (in a computer) for a collection of related information. Examples from Scheme are lists and vectors. A data structure can be used to implement an ADT, which can be thought of as a useful interface to several related data structures. The main ADTs discussed in the course are listed in the catalog description: stacks, queues, trees. (We will not cover graphs.) (The distinction between an ADT and a data structure is that a particular data structure implies particular space and time resource usage, whereas such details are usually suppressed in the specification of an ADT.)

An algorithm is a method for performing some computational task. For example, a Scheme procedure embodies an algorithm, provided it always terminates when called. Many algorithms, such as varieties of searching and sorting algorithms, work with specific ADTs, and thus are best studied together with them and the data structures that implement the corresponding ADTs.

A program is correct when its possible execution sequences are a subset of those given by its specification. It is thus impossible to discuss program correctness without having a specification for the program. Informal specifications will be discussed in the course, as will informal techniques for developing correct programs.

Analysis of algorithms means deriving some measure or measures of the resource usage of a program.
A computer scientist would probably phrase it as ``estimating the efficiency of an algorithm.'' The efficiency of an algorithm is often expressed using asymptotic notation (e.g., heap-sort is an O(n log n) time algorithm).

Graphs will not be discussed in this offering of 228. File processing will only be a minor part of the course.

To clarify a common misconception: this is not a course about C++. We will use C++, and you will learn about C++, but C++ is not the focus of the course. The focus is on data structures, and C++ is a good vehicle for that.

1.2 ACKNOWLEDGEMENTS

The specification for the course described here was developed with the help of professors Baker and Fernandez-Baca. Many other faculty in the Computer Science department at ISU have contributed ideas and discussions. Special thanks to professor Baker for his help with this offering and in making available material from previous offerings.

2. WHO SHOULD NOT TAKE THIS COURSE

Some students try to take this course just to learn C++. I want to explicitly discourage you if you are not taking this course to learn data structures. For example, if you are a graduate student or advanced undergraduate, this course is not for you; drop it and just read a book and work some problems on your own. (I'm happy to offer suggestions.)

Also, some students (especially advanced undergraduates) try to take this course just to have a course with C++ on their resume. This is a mistake if you already have credit for Com S 208; if so, drop this course, as you can't also get credit for Com S 228. If you already have credit for Com S 212, then you can get credit for 228 as well, but be warned that you'll be seeing a lot of the data structure material you've already learned along with some C++.

3. PREREQUISITES

The formal prerequisite for Com S 228 in the ISU catalog is ``[Com S] 227, [and] credit or enrollment in Math 165.'' You *must* talk with me if you don't meet the prerequisites.
One reason for the math corequisite is that both programming and math involve problem solving. You won't get the maximum benefit from Com S 228 if you haven't done enough problem solving. Another reason is that there are various mathematical subjects (e.g., algorithm analysis, logic) that are used in the textbook. Finally, Math 165 is a prerequisite for subsequent courses in the theoretical part of the Computer Science curriculum (e.g., Com S 330). The reasons for the Com S 227 prerequisite are discussed below.

3.1 SKILLS NEEDED FROM Com S 227

You should have the following skills from Com S 227:

* ability to work with computers and Unix: logging in, commands, editing, etc.
* data modeling.
* small algorithm coding and development: case analysis, recursion, iteration.
* ability to communicate clearly to humans by using: helping procedures, modularization, and programming at the right level of detail (abstraction).
* handling input and output.
* avoiding redundancy by using/building functional abstractions.
* some imperative programming using vectors, assignment, and mutation.

One important point is that we will assume that you are fully versed in the important topic of recursion.

3. GENERAL OBJECTIVES

The general objectives for Com S 228 are divided into two parts: a set of essential objectives and a set of enrichment objectives. The essential objectives will be helpful for your career as a computer scientist or engineer. You are encouraged to explore the enrichment objectives, both for their own sake and because learning more about those will help deepen your understanding of the essential objectives.

3.1 ESSENTIAL OBJECTIVES

In one sentence, the essential goal is for you to acquire skills and knowledge in imperative programming and data structures. In more detail, you should:

* Have a working knowledge of basic algorithms and data structures.
  This includes:
  - a working knowledge of such implementation data structures as arrays, records, variants, and pointers, and
  - the use of such implementation data structures to implement ADTs such as stacks, queues, lists, sets, and trees, and algorithms such as searching and sorting.
* Understand ADTs, their importance in object-based software development, how they are specified (informally), and how they can be implemented in C++.
* Design, write, and document C++ programs of up to 500 lines in size. This includes the ability to:
  - design and write code that uses previously compiled C++ classes, given (informal) specifications of those classes,
  - design and (informally) specify ADTs to modularize and solve programming problems (i.e., object-based design),
  - pick from known data structures those appropriate to model the information needed to solve the problem, and
  - creatively develop algorithms to solve such problems.
* Understand the significance of information hiding in software development, and how information hiding can be achieved by using C++ modules and classes.
* Understand asymptotic notation and notions of time and space complexity, and be able to roughly estimate the efficiency of algorithms.
* Be able to carefully (albeit informally) reason about the correctness of programs that use ADTs and about the correctness of implementations of ADTs.

The goal of this course is to give you a set of design tools (modules, ADTs, specifications as contracts) and computational tools (imperative programming, data structures, and algorithms) and to introduce you to two fascinating areas of Computer Science: the design of software systems, and the design and analysis of data structures and algorithms.

A working knowledge of standard ADTs and data structures is fundamental to both the theory and practice of computing. Most interesting computational problems require the use of some data structure.
Implementation data structures are fundamental to most programming languages. By increasing your knowledge of ADTs, such as stacks, queues, lists, sets, trees, etc., you will be able to solve common programming problems quickly, correctly, and in a modular fashion.

ADTs are a key idea behind modern programming practice. An ADT captures the client's view of a data structure, and abstracts it from its possible implementations by a specification. This specification forms the contract between the implementor and the client. Seeing how ADTs are implemented in C++ will help you understand them, and will aid your programming in whatever language you happen to use. Since C++ is a fairly low-level language, learning how to implement ADTs in C++ will help you understand the efficiency (resource) issues involved.

Skills in C++ programming will be important later in the curriculum and currently seem to be useful in getting certain jobs. However, note that C++ itself (as a language, in all its gory details) is *not* the focus of this course; only the details of C++ that are needed to achieve the other objectives will be taught. On the other hand, this is quite a useful subset of the language. What is more important are the skills of object-based design and programming that C++ gives a notation for.

The ability to use existing classes is of increasing importance in writing production software, because one has to interface one's code to window systems, database systems, etc., which can all be presented as a collection of existing ADTs. The ability to design ADTs as needed, and to pick useful ADTs and implementation techniques for them from one's knowledge of data structures, is important if you wish to write larger programs. Finally, the ability to creatively develop algorithms is needed to actually compute results using the ADTs and data structures.

Information hiding is important when working with many people on a project, or when developing a larger program.
It consists of the ability to compartmentalize the knowledge and decisions that make up a design and implementation.

Notions of time and space complexity are fundamental to the study of data structures and algorithms. Asymptotic notation is the most basic tool used in such analysis. This notation helps you compare data structures and algorithms, and estimate which may be best for a given situation. Such distinctions are of practical importance for large amounts of data.

The ability to carefully reason about correctness is important in making you a more productive programmer. The time spent in careful design and reasoning is more than repaid in lessening the amount of time needed to debug your code.

3.2 ENRICHMENT OBJECTIVES

Enrichment objectives could be multiplied endlessly. Listed here are those that would be easy to teach based on the text, or that various instructors might wish to teach.

* Understand the basics of object-oriented programming. Object-oriented programming adds to object-based programming the concepts of message passing (along with subtype polymorphism) and inheritance. The former is important in the design of flexible systems that can withstand change gracefully. The latter is important for the building of application frameworks that can be extended by users.
* Understand standard algorithm design strategies. These are standard concepts (like ``divide and conquer'') that help you design your own algorithms.
* Understand graph algorithms. Graph algorithms occur often in practical programming, and have been extensively studied.

3.3 SPECIFIC OBJECTIVES

In general we will adopt the objectives of the Headington and Riley text. How these will appear on homeworks and tests is described below.

4. CONVENTIONS FOR EVALUATION OF ANSWERS

To ensure more objectivity in grading essay and programming questions, we establish the following conventions here.

4.1 SHORT ANSWER AND ESSAY QUESTIONS

Short answer questions will be graded on the basis of completeness (did you list all the relevant parts of the answer?), soundness (did you make any errors of fact or contradict yourself?), and clarity. You can help ensure clarity by giving examples where possible.

For essays, critical justifications, and the like there may be no one right answer; so the criteria for judgement will be whether your essay is:

* clear (you use examples where appropriate, you write clear sentences and diagrams, and you avoid excess verbosity),
* sound (you start from facts or reasonable assumptions, your logic is convincing), and
* complete (you consider relevant aspects of the problem and alternatives).

Extra consideration will be given to answers that are especially creative.

4.2 PROGRAM QUESTIONS

You are encouraged to use helping procedures and ADTs. For programming problems, there is usually more than one right answer, hence it is important to write a clear solution.

Answers to programming questions will be scored with the following breakdown of points:

* 60% for correctness (whether your program solves the given problem correctly in the given subset of the language)
* 40% for elegance and clarity (a good, modular design using appropriate ADTs that is well-documented with clear specifications, good use of language features, and the creativity of your solution)

You should try to eliminate all syntax errors from your programs, as they affect both the correctness and clarity of your solution; you may receive no points for programs with major syntax errors. Checking of external inputs, and other conventions for dealing with humans, are not important (for this class), unless the problem says so.

5. TESTS

Test questions will be similar to, but not the same as, the problems in the book from the chapter in question. The problems will only involve programming ideas that are discussed in the textbook, perhaps from earlier chapters that we have covered.
The specific ideas needed will not be stated, and to solve some problems two or more ideas may need to be combined. Problems may be stated by giving a specification and requiring that the entire solution be developed, or by giving a partial solution and requiring the completion of the unfinished parts of the solution. The specifications will be given in careful English as discussed in class, augmented with concrete examples.

6. DISCLAIMER

The details of this course are subject to change as experience dictates. You will be informed of any changes.

The following is the version information for this file:
$Id: course-spec.txt,v 1.5 1995/01/17 18:37:04 leavens Exp leavens $